comparison tools/get_orfs_or_cdss/get_orfs_or_cdss.xml @ 7:705a2e2df7fb draft

v0.1.1 fix typo; v0.1.0 BED output (Eric Rasche), NCBI genetic code 24; v0.0.7 embeds citation
author peterjc
date Thu, 30 Jul 2015 12:35:31 -0400
parents 5208c15805ec
children 09a8be9247ca
6:64e67f172188 7:705a2e2df7fb
1 <tool id="get_orfs_or_cdss" name="Get open reading frames (ORFs) or coding sequences (CDSs)" version="0.0.5"> 1 <tool id="get_orfs_or_cdss" name="Get open reading frames (ORFs) or coding sequences (CDSs)" version="0.1.1">
2 <description>e.g. to get peptides from ESTs</description> 2 <description>e.g. to get peptides from ESTs</description>
3 <requirements> 3 <requirements>
4 <requirement type="package" version="1.62">biopython</requirement> 4 <requirement type="package" version="1.65">biopython</requirement>
5 <requirement type="python-module">Bio</requirement> 5 <requirement type="python-module">Bio</requirement>
6 </requirements> 6 </requirements>
7 <version_command interpreter="python">get_orfs_or_cdss.py --version</version_command>
8 <command interpreter="python">
9 get_orfs_or_cdss.py $input_file $input_file.ext $table $ftype $ends $mode $min_len $strand $out_nuc_file $out_prot_file
10 </command>
11 <stdio> 7 <stdio>
12 <!-- Anything other than zero is an error --> 8 <!-- Anything other than zero is an error -->
13 <exit_code range="1:" /> 9 <exit_code range="1:" />
14 <exit_code range=":-1" /> 10 <exit_code range=":-1" />
15 </stdio> 11 </stdio>
12 <version_command interpreter="python">get_orfs_or_cdss.py --version</version_command>
13 <command interpreter="python">
14 get_orfs_or_cdss.py -i $input_file -f $input_file.ext --table $table -t $ftype -e $ends -m $mode --min_len $min_len -s $strand --on $out_nuc_file --op $out_prot_file --ob $out_bed_file
15 </command>
16 <inputs> 16 <inputs>
17 <param name="input_file" type="data" format="fasta,fastq,sff" label="Sequence file (nucleotides)" help="FASTA, FASTQ, or SFF format." /> 17 <param name="input_file" type="data" format="fasta,fastq,sff" label="Sequence file (nucleotides)" help="FASTA, FASTQ, or SFF format." />
18 <param name="table" type="select" label="Genetic code" help="Tables from the NCBI, these determine the start and stop codons"> 18 <param name="table" type="select" label="Genetic code" help="Tables from the NCBI, these determine the start and stop codons">
19 <option value="1">1. Standard</option> 19 <option value="1">1. Standard</option>
20 <option value="2">2. Vertebrate Mitochondrial</option> 20 <option value="2">2. Vertebrate Mitochondrial</option>
31 <option value="15">15. Blepharisma Macronuclear</option> 31 <option value="15">15. Blepharisma Macronuclear</option>
32 <option value="16">16. Chlorophycean Mitochondrial</option> 32 <option value="16">16. Chlorophycean Mitochondrial</option>
33 <option value="21">21. Trematode Mitochondrial</option> 33 <option value="21">21. Trematode Mitochondrial</option>
34 <option value="22">22. Scenedesmus obliquus</option> 34 <option value="22">22. Scenedesmus obliquus</option>
35 <option value="23">23. Thraustochytrium Mitochondrial</option> 35 <option value="23">23. Thraustochytrium Mitochondrial</option>
36 <option value="24">24. Pterobranchia Mitochondrial</option>
36 </param> 37 </param>
37 <param name="ftype" type="select" value="True" label="Look for ORFs or CDSs"> 38 <param name="ftype" type="select" value="True" label="Look for ORFs or CDSs">
38 <option value="ORF">Look for ORFs (check for stop codons only, ignore start codons)</option> 39 <option value="ORF">Look for ORFs (check for stop codons only, ignore start codons)</option>
39 <option value="CDS">Look for CDSs (with start and stop codons)</option> 40 <option value="CDS">Look for CDSs (with start and stop codons)</option>
40 </param> 41 </param>
47 <option value="all">All ORFs/CDSs from each sequence</option> 48 <option value="all">All ORFs/CDSs from each sequence</option>
48 <option value="top">All ORFs/CDSs from each sequence with the maximum length</option> 49 <option value="top">All ORFs/CDSs from each sequence with the maximum length</option>
49 <option value="one">First ORF/CDS from each sequence with the maximum length</option> 50 <option value="one">First ORF/CDS from each sequence with the maximum length</option>
50 </param> 51 </param>
51 <param name="min_len" type="integer" size="5" value="30" label="Minimum length ORF/CDS (in amino acids, e.g. 30 aa = 90 bp plus any stop codon)" /> 52 <param name="min_len" type="integer" size="5" value="30" label="Minimum length ORF/CDS (in amino acids, e.g. 30 aa = 90 bp plus any stop codon)" />
52 <param name="strand" type="select" label="Strand to search" help="Use the forward only option if your sequence directionality is known (e.g. from poly-A tails, or strand specific RNA sequencing."> 53 <param name="strand" type="select" label="Strand to search" help="Use the forward only option if your sequence directionality is known (e.g. from poly-A tails, or strand specific RNA sequencing).">
53 <option value="both">Search both the forward and reverse strand</option> 54 <option value="both">Search both the forward and reverse strand</option>
54 <option value="forward">Only search the forward strand</option> 55 <option value="forward">Only search the forward strand</option>
55 <option value="reverse">Only search the reverse strand</option> 56 <option value="reverse">Only search the reverse strand</option>
56 </param> 57 </param>
57 </inputs> 58 </inputs>
58 <outputs> 59 <outputs>
59 <data name="out_nuc_file" format="fasta" label="${ftype.value}s (nucleotides)" /> 60 <data name="out_nuc_file" format="fasta" label="${ftype.value}s (nucleotides)" />
60 <data name="out_prot_file" format="fasta" label="${ftype.value}s (amino acids)" /> 61 <data name="out_prot_file" format="fasta" label="${ftype.value}s (amino acids)" />
62 <data name="out_bed_file" format="bed6" label="${ftype.value}s (bed)" />
61 </outputs> 63 </outputs>
62 <tests> 64 <tests>
63 <test> 65 <test>
64 <param name="input_file" value="get_orf_input.fasta" /> 66 <param name="input_file" value="get_orf_input.fasta" />
65 <param name="table" value="1" /> 67 <param name="table" value="1" />
68 <param name="mode" value="all" /> 70 <param name="mode" value="all" />
69 <param name="min_len" value="10" /> 71 <param name="min_len" value="10" />
70 <param name="strand" value="forward" /> 72 <param name="strand" value="forward" />
71 <output name="out_nuc_file" file="get_orf_input.t1_nuc_out.fasta" /> 73 <output name="out_nuc_file" file="get_orf_input.t1_nuc_out.fasta" />
72 <output name="out_prot_file" file="get_orf_input.t1_prot_out.fasta" /> 74 <output name="out_prot_file" file="get_orf_input.t1_prot_out.fasta" />
75 <output name="out_bed_file" file="get_orf_input.t1_bed_out.bed" />
73 </test> 76 </test>
74 <test> 77 <test>
75 <param name="input_file" value="get_orf_input.fasta" /> 78 <param name="input_file" value="get_orf_input.fasta" />
76 <param name="table" value="11" /> 79 <param name="table" value="11" />
77 <param name="ftype" value="CDS" /> 80 <param name="ftype" value="CDS" />
78 <param name="ends" value="closed" /> 81 <param name="ends" value="closed" />
79 <param name="mode" value="all" /> 82 <param name="mode" value="all" />
80 <param name="min_len" value="10" /> 83 <param name="min_len" value="10" />
81 <param name="strand" value="forward" /> 84 <param name="strand" value="forward" />
82 <output name="out_nuc_file" file="get_orf_input.t11_nuc_out.fasta" /> 85 <output name="out_nuc_file" file="get_orf_input.t11_nuc_out.fasta" />
83 <output name="out_prot_file" file="get_orf_input.t11_prot_out.fasta" /> 86 <output name="out_prot_file" file="get_orf_input.t11_prot_out.fasta" />
87 <output name="out_bed_file" file="get_orf_input.t11_bed_out.bed" />
84 </test> 88 </test>
85 <test> 89 <test>
86 <param name="input_file" value="get_orf_input.fasta" /> 90 <param name="input_file" value="get_orf_input.fasta" />
87 <param name="table" value="11" /> 91 <param name="table" value="11" />
88 <param name="ftype" value="CDS" /> 92 <param name="ftype" value="CDS" />
90 <param name="mode" value="all" /> 94 <param name="mode" value="all" />
91 <param name="min_len" value="10" /> 95 <param name="min_len" value="10" />
92 <param name="strand" value="forward" /> 96 <param name="strand" value="forward" />
93 <output name="out_nuc_file" file="get_orf_input.t11_open_nuc_out.fasta" /> 97 <output name="out_nuc_file" file="get_orf_input.t11_open_nuc_out.fasta" />
94 <output name="out_prot_file" file="get_orf_input.t11_open_prot_out.fasta" /> 98 <output name="out_prot_file" file="get_orf_input.t11_open_prot_out.fasta" />
99 <output name="out_bed_file" file="get_orf_input.t11_open_bed_out.bed" />
95 </test> 100 </test>
96 <test> 101 <test>
97 <param name="input_file" value="Ssuis.fasta" /> 102 <param name="input_file" value="Ssuis.fasta" />
98 <param name="table" value="11" /> 103 <param name="table" value="11" />
99 <param name="ftype" value="ORF" /> 104 <param name="ftype" value="ORF" />
101 <param name="mode" value="all" /> 106 <param name="mode" value="all" />
102 <param name="min_len" value="100" /> 107 <param name="min_len" value="100" />
103 <param name="strand" value="both" /> 108 <param name="strand" value="both" />
104 <output name="out_nuc_file" file="get_orf_input.Suis_ORF.nuc.fasta" /> 109 <output name="out_nuc_file" file="get_orf_input.Suis_ORF.nuc.fasta" />
105 <output name="out_prot_file" file="get_orf_input.Suis_ORF.prot.fasta" /> 110 <output name="out_prot_file" file="get_orf_input.Suis_ORF.prot.fasta" />
111 <output name="out_bed_file" file="get_orf_input.Suis_ORF.bed" />
106 </test> 112 </test>
107 </tests> 113 </tests>
108 <help> 114 <help>
109 **What it does** 115 **What it does**
110 116
132 When searching for ORFs, the sequences will run from stop codon to stop 138 When searching for ORFs, the sequences will run from stop codon to stop
133 codon, and any start codons are ignored. When searching for CDSs, the first 139 codon, and any start codons are ignored. When searching for CDSs, the first
134 potential start codon will be used, giving the longest possible CDS within 140 potential start codon will be used, giving the longest possible CDS within
135 each ORF, and thus the longest possible protein sequence. This is useful 141 each ORF, and thus the longest possible protein sequence. This is useful
136 for things like BLAST or domain searching, but since this may not be the 142 for things like BLAST or domain searching, but since this may not be the
137 correct start codon may not be appropriate for signal peptide detection 143 correct start codon, it may not be appropriate for signal peptide detection
138 etc. 144 etc.
139 145
140 **Example Usage** 146 **Example Usage**
141 147
142 Given some EST sequences (Sanger capillary reads) assembled into unigenes, 148 Given some EST sequences (Sanger capillary reads) assembled into unigenes,
143 or a transcriptome assembly from some RNA-Seq, each of your nucleotide 149 or a transcriptome assembly from some RNA-Seq, each of your nucleotide
144 sequences should (barring sequencing, assembly errors, frame-shifts etc) 150 sequences should (barring sequencing, assembly errors, frame-shifts etc)
145 encode one protein as a single ORF/CDS, which you wish to extract (and 151 encode one protein as a single ORF/CDS, which you wish to extract (and
146 perhaps translate into amino acids). 152 perhaps translate into amino acids).
147 153
148 If your RNS-Seq data was strand specific, and assembled taking this into 154 If your RNA-Seq data was strand specific, and assembled taking this into
149 account, you should only search for ORFs/CDSs on the forward strand. 155 account, you should only search for ORFs/CDSs on the forward strand.
150 156
151 **Citation** 157 **Citation**
152 158
153 If you use this Galaxy tool in work leading to a scientific publication please 159 If you use this Galaxy tool in work leading to a scientific publication please
166 http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878. 172 http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878.
167 173
168 This tool is available to install into other Galaxy Instances via the Galaxy 174 This tool is available to install into other Galaxy Instances via the Galaxy
169 Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/get_orfs_or_cdss 175 Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/get_orfs_or_cdss
170 </help> 176 </help>
177 <citations>
178 <citation type="doi">10.7717/peerj.167</citation>
179 <citation type="doi">10.1093/bioinformatics/btp163</citation>
180 </citations>
171 </tool> 181 </tool>
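
Note on the ORF/CDS distinction described in the help text above: the following is a minimal sketch, not the code in get_orfs_or_cdss.py. It assumes NCBI table 1 stop codons, treats ATG as the only start codon (a simplification), searches the forward strand only, and treats the sequence ends as open boundaries. The function name `find_regions` and its return shape are hypothetical, chosen for illustration.

```python
# Illustrative sketch only; names and simplifications are assumptions,
# not taken from get_orfs_or_cdss.py.
STOPS = {"TAA", "TAG", "TGA"}   # stop codons of NCBI table 1
STARTS = {"ATG"}                # simplification: real tables list more starts


def codons(seq, frame):
    """Split seq into complete codons for the given reading frame (0, 1 or 2)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def find_regions(seq, ftype="ORF", min_len=30):
    """Return (start, end, nucleotides) tuples for the forward strand.

    Coordinates are 0-based and half-open; min_len is counted in codons
    (amino acids), so 30 corresponds to 90 bp excluding any stop codon.
    """
    results = []
    for frame in range(3):
        current = []            # codons seen since the last stop codon
        region_start = frame
        # The trailing None is a sentinel that flushes the final region.
        for idx, codon in enumerate(codons(seq, frame) + [None]):
            if codon is not None and codon not in STOPS:
                current.append(codon)
                continue
            segment, seg_start = current, region_start
            if ftype == "CDS":
                # Trim to the first start codon, giving the longest CDS
                # within this ORF; discard the ORF if it has no start.
                for k, c in enumerate(segment):
                    if c in STARTS:
                        segment = segment[k:]
                        seg_start = region_start + 3 * k
                        break
                else:
                    segment = []
            if len(segment) >= min_len:
                end = seg_start + 3 * len(segment)
                results.append((seg_start, end, "".join(segment)))
            current = []
            region_start = frame + 3 * (idx + 1)
    return results


example = "GGGATGAAACCCTAAGG"
orfs = find_regions(example, ftype="ORF", min_len=3)
cdss = find_regions(example, ftype="CDS", min_len=3)
```

In ORF mode the region in frame 0 begins at the start of the sequence and ignores the ATG; in CDS mode the same region is trimmed so that it begins at the first ATG.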