# HG changeset patch
# User peterjc
# Date 1307484350 14400
# Node ID 81caef04ce8bee25b335c34cc652cc387eb4cced
# Parent  f3b373a41f8176a690f8aca6f1fcb9543a79a5cb
Migrated tool version 0.0.7 from old tool shed archive to new tool shed repository

diff -r f3b373a41f81 -r 81caef04ce8b tools/protein_analysis/README
--- a/tools/protein_analysis/README	Tue Jun 07 18:05:13 2011 -0400
+++ b/tools/protein_analysis/README	Tue Jun 07 18:05:50 2011 -0400
@@ -73,6 +73,9 @@
 v0.0.4 - Ignore comment lines in tmhmm2 output.
 v0.0.5 - Explicitly request tmhmm short output (may not be the default)
 v0.0.6 - Improvement to how sub-jobs are run (should be faster)
+v0.0.7 - Change SignalP default truncation from 60 to 70 to match the
+         SignalP webservice.
+
 
 Developers
 ==========
diff -r f3b373a41f81 -r 81caef04ce8b tools/protein_analysis/seq_analysis_utils.py
diff -r f3b373a41f81 -r 81caef04ce8b tools/protein_analysis/signalp3.xml
--- a/tools/protein_analysis/signalp3.xml	Tue Jun 07 18:05:13 2011 -0400
+++ b/tools/protein_analysis/signalp3.xml	Tue Jun 07 18:05:50 2011 -0400
@@ -1,4 +1,4 @@
-
+
 Find signal peptides in protein sequences
 signalp3.py $organism $truncate 8 $fasta_file $tabular_file
@@ -11,7 +11,7 @@
-
+
@@ -46,6 +46,12 @@
+
+
+
+
+
+
@@ -67,9 +73,9 @@
 The NN output comprises three different scores (C-max, S-max and Y-max) and two scores derived from them (S-mean and D-score).
 
-The C-score is the 'cleavage site' score. For each position in the submitted sequence, a C-score is reported, which should only be significantly high at the cleavage site. Confusion is often seen with the position numbering of the cleavage site. When a cleavage site position is referred to by a single number, the number indicates the first residue in the mature protein, meaning that a reported cleavage site between amino acid 26-27 corresponds to that the mature protein starts at (and include) position 27.
+The C-score is the 'cleavage site' score. For each position in the submitted sequence, a C-score is reported, which should only be significantly high at the cleavage site. Confusion is often seen with the position numbering of the cleavage site. When a cleavage site position is referred to by a single number, the number indicates the first residue in the mature protein, meaning that a predicted cleavage site between amino acid 26-27 is reported as 27, corresponding to the mature protein starting at (and including) position 27.
 
-The S-score for the signal peptide prediction is calculateded for every single amino acid position in the submitted sequence (not shown in the output via Galaxy), with high scores indicating that the corresponding amino acid is part of a signal peptide, and low scores indicating that the amino acid is part of a mature protein.
+The S-score for the signal peptide prediction is calculated for every single amino acid position in the submitted sequence (not shown in the output via Galaxy), with high scores indicating that the corresponding amino acid is part of a signal peptide, and low scores indicating that the amino acid is part of a mature protein.
 
 Y-max is a derivative of the C-score combined with the S-score resulting in a better cleavage site prediction than the raw C-score alone. This is due to the fact that multiple high-peaking C-scores can be found in one sequence, where only one is the true cleavage site. The cleavage site is assigned from the Y-score where the slope of the S-score is steep and a significant C-score is found.
diff -r f3b373a41f81 -r 81caef04ce8b tools/protein_analysis/suite_config.xml
--- a/tools/protein_analysis/suite_config.xml	Tue Jun 07 18:05:13 2011 -0400
+++ b/tools/protein_analysis/suite_config.xml	Tue Jun 07 18:05:50 2011 -0400
@@ -1,9 +1,9 @@
-
+
 Wrappers for TMHMM and SignalP
 Find transmembrane domains in protein sequences
-
+
 Find signal peptides in protein sequences
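
The two conventions this changeset touches (N-terminal truncation to 70 residues, and reporting a cleavage site as the first residue of the mature protein) can be illustrated with a minimal Python sketch. The function names and the toy sequence below are hypothetical and are not taken from signalp3.py or seq_analysis_utils.py; the sketch only assumes that truncation keeps the N-terminal residues and that reported positions are 1-based, as the help text describes.

```python
# Illustrative sketch only: these helpers are hypothetical and are not
# the wrapper's actual code.

DEFAULT_TRUNCATE = 70  # v0.0.7 default (was 60), per the README entry


def truncate_sequence(seq, truncate=DEFAULT_TRUNCATE):
    """Keep only the first `truncate` N-terminal residues of a protein."""
    if truncate <= 0:
        raise ValueError("truncate must be a positive integer")
    return seq[:truncate]


def reported_cleavage_position(last_signal_residue):
    """Convert 'cleavage between residues N and N+1' to the single reported
    number, which is the first residue of the mature protein (1-based)."""
    return last_signal_residue + 1


def split_at_cleavage(seq, reported_position):
    """Split a sequence into (signal peptide, mature protein) using the
    1-based reported position of the first mature residue."""
    index = reported_position - 1  # convert to a 0-based Python index
    return seq[:index], seq[index:]


if __name__ == "__main__":
    # Toy sequence: 26 'M' residues as a stand-in signal peptide, then 74 'A'.
    protein = "M" * 26 + "A" * 74
    print(len(truncate_sequence(protein)))
    site = reported_cleavage_position(26)
    signal, mature = split_at_cleavage(protein, site)
    print(site, len(signal), mature[0])
```

With a cleavage between residues 26 and 27, the reported position is 27, so the signal peptide covers residues 1 to 26 and the mature protein begins at residue 27.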