comparison tools/protein_analysis/signalp3.xml @ 20:a19b3ded8f33 draft

v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
author peterjc
date Thu, 21 Sep 2017 11:35:20 -0400
parents eb6ac44d4b8e
children 238eae32483c
19:f3ecd80850e2 20:a19b3ded8f33
1 <tool id="signalp3" name="SignalP 3.0" version="0.0.15"> 1 <tool id="signalp3" name="SignalP 3.0" version="0.0.19">
2 <description>Find signal peptides in protein sequences</description> 2 <description>Find signal peptides in protein sequences</description>
3 <!-- If job splitting is enabled, break up the query file into parts --> 3 <!-- If job splitting is enabled, break up the query file into parts -->
4 <!-- Chunks of 2000 sequences let 4 threads process 500 each, which is ideal --> 4 <!-- Chunks of 2000 sequences let 4 threads process 500 each, which is ideal -->
5 <parallelism method="basic" split_inputs="fasta_file" split_mode="to_size" split_size="2000" merge_outputs="tabular_file"></parallelism> 5 <parallelism method="basic" split_inputs="fasta_file" split_mode="to_size" split_size="2000" merge_outputs="tabular_file"></parallelism>
6 <requirements> 6 <requirements>
7 <requirement type="binary">signalp</requirement>
8 <requirement type="package">signalp</requirement> 7 <requirement type="package">signalp</requirement>
9 </requirements> 8 </requirements>
10 <stdio> 9 <version_command>
11 <!-- Anything other than zero is an error --> 10 python $__tool_directory__/signalp3.py --version
12 <exit_code range="1:" /> 11 </version_command>
13 <exit_code range=":-1" /> 12 <command detect_errors="aggressive">
14 </stdio> 13 python $__tool_directory__/signalp3.py $organism $truncate "\$GALAXY_SLOTS" '$fasta_file' '$tabular_file'
15 <command interpreter="python">
16 signalp3.py $organism $truncate "\$GALAXY_SLOTS" $fasta_file $tabular_file
17 ##If the environment variable isn't set, get "", and the python wrapper
18 ##defaults to four threads.
19 </command> 14 </command>
20 <inputs> 15 <inputs>
21 <param name="fasta_file" type="data" format="fasta" label="FASTA file of protein sequences"/> 16 <param name="fasta_file" type="data" format="fasta" label="FASTA file of protein sequences"/>
22 <param name="organism" type="select" display="radio" label="Organism"> 17 <param name="organism" type="select" display="radio" label="Organism">
23 <option value="euk">Eukaryote</option> 18 <option value="euk">Eukaryote</option>
24 <option value="gram+">Gram positive</option> 19 <option value="gram+">Gram positive</option>
25 <option value="gram-">Gram negative</option> 20 <option value="gram-">Gram negative</option>
26 </param> 21 </param>
33 </outputs> 28 </outputs>
34 <tests> 29 <tests>
35 <test> 30 <test>
36 <param name="fasta_file" value="four_human_proteins.fasta" ftype="fasta"/> 31 <param name="fasta_file" value="four_human_proteins.fasta" ftype="fasta"/>
37 <param name="organism" value="euk"/> 32 <param name="organism" value="euk"/>
38 <param name="truncate" value="0"/> 33 <param name="truncate" value="0"/>
39 <output name="tabular_file" file="four_human_proteins.signalp3.tabular" ftype="tabular"/> 34 <output name="tabular_file" file="four_human_proteins.signalp3.tabular" ftype="tabular"/>
40 </test> 35 </test>
41 <test> 36 <test>
42 <param name="fasta_file" value="empty.fasta" ftype="fasta"/> 37 <param name="fasta_file" value="empty.fasta" ftype="fasta"/>
43 <param name="organism" value="euk"/> 38 <param name="organism" value="euk"/>
44 <param name="truncate" value="60"/> 39 <param name="truncate" value="60"/>
45 <output name="tabular_file" file="empty_signalp3.tabular" ftype="tabular"/> 40 <output name="tabular_file" file="empty_signalp3.tabular" ftype="tabular"/>
46 </test> 41 </test>
47 <test> 42 <test>
48 <param name="fasta_file" value="empty.fasta" ftype="fasta"/> 43 <param name="fasta_file" value="empty.fasta" ftype="fasta"/>
49 <param name="organism" value="gram+"/> 44 <param name="organism" value="gram+"/>
50 <param name="truncate" value="80"/> 45 <param name="truncate" value="80"/>
51 <output name="tabular_file" file="empty_signalp3.tabular" ftype="tabular"/> 46 <output name="tabular_file" file="empty_signalp3.tabular" ftype="tabular"/>
52 </test> 47 </test>
53 <test> 48 <test>
54 <param name="fasta_file" value="empty.fasta" ftype="fasta"/> 49 <param name="fasta_file" value="empty.fasta" ftype="fasta"/>
55 <param name="organism" value="gram-"/> 50 <param name="organism" value="gram-"/>
56 <param name="truncate" value="0"/> 51 <param name="truncate" value="0"/>
57 <output name="tabular_file" file="empty_signalp3.tabular" ftype="tabular"/> 52 <output name="tabular_file" file="empty_signalp3.tabular" ftype="tabular"/>
58 </test> 53 </test>
59 <test> 54 <test>
60 <param name="fasta_file" value="rxlr_win_et_al_2007.fasta" ftype="fasta"/> 55 <param name="fasta_file" value="rxlr_win_et_al_2007.fasta" ftype="fasta"/>
61 <param name="organism" value="euk"/> 56 <param name="organism" value="euk"/>
62 <param name="truncate" value="70"/> 57 <param name="truncate" value="70"/>
63 <output name="tabular_file" file="rxlr_win_et_al_2007_sp3.tabular" ftype="tabular"/> 58 <output name="tabular_file" file="rxlr_win_et_al_2007_sp3.tabular" ftype="tabular"/>
64 </test> 59 </test>
65 </tests> 60 </tests>
66 <help> 61 <help>
67 62
68 **What it does** 63 **What it does**
69 64
70 This calls the SignalP v3.0 tool for prediction of signal peptides, which uses both a Neural Network (NN) and Hidden Markov Model (HMM) to produce two sets of scores. 65 This calls the SignalP v3.0 tool for prediction of signal peptides, which uses both a Neural Network (NN) and Hidden Markov Model (HMM) to produce two sets of scores.
71 66
72 The input is a FASTA file of protein sequences, and the output is tabular with twenty columns (one row per protein): 67 The input is a FASTA file of protein sequences, and the output is tabular with twenty columns (one row per protein):
81 76
82 Internally the input FASTA file is divided into parts (to allow multiple processors to be used), and the proteins truncated as specified (see below). The raw output from SignalP is then reformatted into a tabular layout suitable for Galaxy (see below). 77 Internally the input FASTA file is divided into parts (to allow multiple processors to be used), and the proteins truncated as specified (see below). The raw output from SignalP is then reformatted into a tabular layout suitable for Galaxy (see below).
83 78
84 **Neural Network Scores** 79 **Neural Network Scores**
85 80
86 For each organism class (Eukaryote, Gram-negative and Gram-positive), two different neural networks are used, one for predicting the actual signal peptide and one for predicting the position of the signal peptidase I (SPase I) cleavage site. 81 For each organism class (Eukaryote, Gram-negative and Gram-positive), two different neural networks are used, one for predicting the actual signal peptide and one for predicting the position of the signal peptidase I (SPase I) cleavage site.
87 82
88 The NN output comprises three different scores (C-max, S-max and Y-max) and two scores derived from them (S-mean and D-score). 83 The NN output comprises three different scores (C-max, S-max and Y-max) and two scores derived from them (S-mean and D-score).
89 84
90 ====== ======= =============================================================== 85 ====== ======= ===============================================================
91 Column Name Description 86 Column Name Description
92 ------ ------- --------------------------------------------------------------- 87 ------ ------- ---------------------------------------------------------------
93 2-4 C-score The C-score is the 'cleavage site' score. For each position in 88 2-4 C-score The C-score is the 'cleavage site' score. For each position in
94 the submitted sequence, a C-score is reported, which should 89 the submitted sequence, a C-score is reported, which should
95 only be significantly high at the cleavage site. Confusion is 90 only be significantly high at the cleavage site. Confusion is
96 often seen with the position numbering of the cleavage site. 91 often seen with the position numbering of the cleavage site.
139 134
140 **Notes** 135 **Notes**
141 136
142 The raw 'short' output from SignalP v3.0 looks something like this (21 space-separated columns, shown here formatted nicely). Notice that the identifiers are given twice, the first time truncated (as part of the NN predictions) and the second time in full (in the HMM predictions). 137 The raw 'short' output from SignalP v3.0 looks something like this (21 space-separated columns, shown here formatted nicely). Notice that the identifiers are given twice, the first time truncated (as part of the NN predictions) and the second time in full (in the HMM predictions).
143 138
144 ==================== ===== === = ===== === = ===== === = ===== = ===== = =================================== = ===== === = ===== = 139 ==================== ===== === = ===== === = ===== === = ===== = ===== = =================================== = ===== === = ===== =
145 # SignalP-NN euk predictions # SignalP-HMM euk predictions 140 # SignalP-NN euk predictions # SignalP-HMM euk predictions
146 ----------------------------------------------------------------------------- ------------------------------------------------------------ 141 ----------------------------------------------------------------------------- ------------------------------------------------------------
147 # name Cmax pos ? Ymax pos ? Smax pos ? Smean ? D ? # name ! Cmax pos ? Sprob ? 142 # name Cmax pos ? Ymax pos ? Smax pos ? Smean ? D ? # name ! Cmax pos ? Sprob ?
148 gi|2781234|pdb|1JLY| 0.061 17 N 0.043 17 N 0.199 1 N 0.067 N 0.055 N gi|2781234|pdb|1JLY|B Q 0.000 17 N 0.000 N 143 gi|2781234|pdb|1JLY| 0.061 17 N 0.043 17 N 0.199 1 N 0.067 N 0.055 N gi|2781234|pdb|1JLY|B Q 0.000 17 N 0.000 N
149 gi|4959044|gb|AAD342 0.099 191 N 0.012 38 N 0.023 12 N 0.014 N 0.013 N gi|4959044|gb|AAD34209.1|AF069992_1 Q 0.000 0 N 0.000 N 144 gi|4959044|gb|AAD342 0.099 191 N 0.012 38 N 0.023 12 N 0.014 N 0.013 N gi|4959044|gb|AAD34209.1|AF069992_1 Q 0.000 0 N 0.000 N
150 gi|671626|emb|CAA856 0.139 381 N 0.020 8 N 0.121 4 N 0.067 N 0.044 N gi|671626|emb|CAA85685.1| Q 0.000 0 N 0.000 N 145 gi|671626|emb|CAA856 0.139 381 N 0.020 8 N 0.121 4 N 0.067 N 0.044 N gi|671626|emb|CAA85685.1| Q 0.000 0 N 0.000 N
151 gi|3298468|dbj|BAA31 0.208 24 N 0.184 38 N 0.980 32 Y 0.613 Y 0.398 N gi|3298468|dbj|BAA31520.1| Q 0.066 24 N 0.139 N 146 gi|3298468|dbj|BAA31 0.208 24 N 0.184 38 N 0.980 32 Y 0.613 Y 0.398 N gi|3298468|dbj|BAA31520.1| Q 0.066 24 N 0.139 N
152 ==================== ===== === = ===== === = ===== === = ===== = ===== = =================================== = ===== === = ===== = 147 ==================== ===== === = ===== === = ===== === = ===== = ===== = =================================== = ===== === = ===== =
153 148
154 In order to make this easier to use in Galaxy, the wrapper script simplifies this to remove the redundant column and use tabs for separation. It also includes a header line with unique column names. 149 In order to make this easier to use in Galaxy, the wrapper script simplifies this to remove the redundant column and use tabs for separation. It also includes a header line with unique column names.
155 150
156 =================================== ============= =========== ============ ============= =========== ============ ============= =========== ============ ============== ============= ========== ========= ======== ============== ============ ============= =============== ============== 151 =================================== ============= =========== ============ ============= =========== ============ ============= =========== ============ ============== ============= ========== ========= ======== ============== ============ ============= =============== ==============
157 #ID NN_Cmax_score NN_Cmax_pos NN_Cmax_pred NN_Ymax_score NN_Ymax_pos NN_Ymax_pred NN_Smax_score NN_Smax_pos NN_Smax_pred NN_Smean_score NN_Smean_pred NN_D_score NN_D_pred HMM_type HMM_Cmax_score HMM_Cmax_pos HMM_Cmax_pred HMM_Sprob_score HMM_Sprob_pred 152 #ID NN_Cmax_score NN_Cmax_pos NN_Cmax_pred NN_Ymax_score NN_Ymax_pos NN_Ymax_pred NN_Smax_score NN_Smax_pos NN_Smax_pred NN_Smean_score NN_Smean_pred NN_D_score NN_D_pred HMM_type HMM_Cmax_score HMM_Cmax_pos HMM_Cmax_pred HMM_Sprob_score HMM_Sprob_pred