comparison tools/get_orfs_or_cdss/get_orfs_or_cdss.xml @ 7:705a2e2df7fb draft
v0.1.1 fix typo; v0.1.0 BED output (Eric Rasche), NCBI genetic code 24; v0.0.7 embeds citation
author | peterjc |
---|---|
date | Thu, 30 Jul 2015 12:35:31 -0400 |
parents | 5208c15805ec |
children | 09a8be9247ca |
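In the comparison that follows, the `<command>` stops passing ten positional arguments and switches to named options, adding `--ob` for the new BED file. The sketch below shows one way a Python script could parse that interface with the standard-library `argparse`. Only the option strings come from the new `<command>` line. `build_parser`, the `dest` names, the defaults and the `choices` lists are hypothetical, so this is not the actual `get_orfs_or_cdss.py` code.

```python
import argparse


def build_parser():
    """Hypothetical parser for the named options used by the 0.1.1 <command> line."""
    parser = argparse.ArgumentParser(prog="get_orfs_or_cdss.py")
    parser.add_argument("-i", dest="input_file", required=True)
    parser.add_argument("-f", dest="input_format", default="fasta")  # Galaxy passes $input_file.ext
    parser.add_argument("--table", type=int, default=1)  # NCBI genetic code number
    parser.add_argument("-t", dest="ftype", choices=["ORF", "CDS"], default="ORF")
    parser.add_argument("-e", dest="ends")  # allowed values are not shown in this diff
    parser.add_argument("-m", dest="mode", choices=["all", "top", "one"], default="all")
    parser.add_argument("--min_len", type=int, default=30)  # amino acids, excluding any stop
    parser.add_argument("-s", dest="strand", choices=["both", "forward", "reverse"], default="both")
    parser.add_argument("--on", dest="out_nuc_file")   # nucleotide FASTA
    parser.add_argument("--op", dest="out_prot_file")  # protein FASTA
    parser.add_argument("--ob", dest="out_bed_file")   # BED6, added between these revisions
    parser.add_argument("--version", action="version", version="%(prog)s 0.1.1")
    return parser


# Arguments in the same order as the new <command>, with values from the second <test>.
example = build_parser().parse_args([
    "-i", "get_orf_input.fasta", "-f", "fasta", "--table", "11", "-t", "CDS",
    "-e", "closed", "-m", "all", "--min_len", "10", "-s", "forward",
    "--on", "nuc.fasta", "--op", "prot.fasta", "--ob", "out.bed",
])
print(example.ftype, example.table, example.out_bed_file)  # CDS 11 out.bed
```

With named options the order no longer matters, so adding `--ob` does not shift any existing argument. This is the main practical gain over the old positional form.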
6:64e67f172188 | 7:705a2e2df7fb |
---|---|
1 <tool id="get_orfs_or_cdss" name="Get open reading frames (ORFs) or coding sequences (CDSs)" version="0.0.5"> | 1 <tool id="get_orfs_or_cdss" name="Get open reading frames (ORFs) or coding sequences (CDSs)" version="0.1.1"> |
2 <description>e.g. to get peptides from ESTs</description> | 2 <description>e.g. to get peptides from ESTs</description> |
3 <requirements> | 3 <requirements> |
4 <requirement type="package" version="1.62">biopython</requirement> | 4 <requirement type="package" version="1.65">biopython</requirement> |
5 <requirement type="python-module">Bio</requirement> | 5 <requirement type="python-module">Bio</requirement> |
6 </requirements> | 6 </requirements> |
7 <version_command interpreter="python">get_orfs_or_cdss.py --version</version_command> | |
8 <command interpreter="python"> | |
9 get_orfs_or_cdss.py $input_file $input_file.ext $table $ftype $ends $mode $min_len $strand $out_nuc_file $out_prot_file | |
10 </command> | |
11 <stdio> | 7 <stdio> |
12 <!-- Anything other than zero is an error --> | 8 <!-- Anything other than zero is an error --> |
13 <exit_code range="1:" /> | 9 <exit_code range="1:" /> |
14 <exit_code range=":-1" /> | 10 <exit_code range=":-1" /> |
15 </stdio> | 11 </stdio> |
12 <version_command interpreter="python">get_orfs_or_cdss.py --version</version_command> | |
13 <command interpreter="python"> | |
14 get_orfs_or_cdss.py -i $input_file -f $input_file.ext --table $table -t $ftype -e $ends -m $mode --min_len $min_len -s $strand --on $out_nuc_file --op $out_prot_file --ob $out_bed_file | |
15 </command> | |
16 <inputs> | 16 <inputs> |
17 <param name="input_file" type="data" format="fasta,fastq,sff" label="Sequence file (nucleotides)" help="FASTA, FASTQ, or SFF format." /> | 17 <param name="input_file" type="data" format="fasta,fastq,sff" label="Sequence file (nucleotides)" help="FASTA, FASTQ, or SFF format." /> |
18 <param name="table" type="select" label="Genetic code" help="Tables from the NCBI, these determine the start and stop codons"> | 18 <param name="table" type="select" label="Genetic code" help="Tables from the NCBI, these determine the start and stop codons"> |
19 <option value="1">1. Standard</option> | 19 <option value="1">1. Standard</option> |
20 <option value="2">2. Vertebrate Mitochondrial</option> | 20 <option value="2">2. Vertebrate Mitochondrial</option> |
31 <option value="15">15. Blepharisma Macronuclear</option> | 31 <option value="15">15. Blepharisma Macronuclear</option> |
32 <option value="16">16. Chlorophycean Mitochondrial</option> | 32 <option value="16">16. Chlorophycean Mitochondrial</option> |
33 <option value="21">21. Trematode Mitochondrial</option> | 33 <option value="21">21. Trematode Mitochondrial</option> |
34 <option value="22">22. Scenedesmus obliquus</option> | 34 <option value="22">22. Scenedesmus obliquus</option> |
35 <option value="23">23. Thraustochytrium Mitochondrial</option> | 35 <option value="23">23. Thraustochytrium Mitochondrial</option> |
36 <option value="24">24. Pterobranchia Mitochondrial</option> | |
36 </param> | 37 </param> |
37 <param name="ftype" type="select" value="True" label="Look for ORFs or CDSs"> | 38 <param name="ftype" type="select" value="True" label="Look for ORFs or CDSs"> |
38 <option value="ORF">Look for ORFs (check for stop codons only, ignore start codons)</option> | 39 <option value="ORF">Look for ORFs (check for stop codons only, ignore start codons)</option> |
39 <option value="CDS">Look for CDSs (with start and stop codons)</option> | 40 <option value="CDS">Look for CDSs (with start and stop codons)</option> |
40 </param> | 41 </param> |
47 <option value="all">All ORFs/CDSs from each sequence</option> | 48 <option value="all">All ORFs/CDSs from each sequence</option> |
48 <option value="top">All ORFs/CDSs from each sequence with the maximum length</option> | 49 <option value="top">All ORFs/CDSs from each sequence with the maximum length</option> |
49 <option value="one">First ORF/CDS from each sequence with the maximum length</option> | 50 <option value="one">First ORF/CDS from each sequence with the maximum length</option> |
50 </param> | 51 </param> |
51 <param name="min_len" type="integer" size="5" value="30" label="Minimum length ORF/CDS (in amino acids, e.g. 30 aa = 90 bp plus any stop codon)" /> | 52 <param name="min_len" type="integer" size="5" value="30" label="Minimum length ORF/CDS (in amino acids, e.g. 30 aa = 90 bp plus any stop codon)" /> |
52 <param name="strand" type="select" label="Strand to search" help="Use the forward only option if your sequence directionality is known (e.g. from poly-A tails, or strand specific RNA sequencing."> | 53 <param name="strand" type="select" label="Strand to search" help="Use the forward only option if your sequence directionality is known (e.g. from poly-A tails, or strand specific RNA sequencing)."> |
53 <option value="both">Search both the forward and reverse strand</option> | 54 <option value="both">Search both the forward and reverse strand</option> |
54 <option value="forward">Only search the forward strand</option> | 55 <option value="forward">Only search the forward strand</option> |
55 <option value="reverse">Only search the reverse strand</option> | 56 <option value="reverse">Only search the reverse strand</option> |
56 </param> | 57 </param> |
57 </inputs> | 58 </inputs> |
58 <outputs> | 59 <outputs> |
59 <data name="out_nuc_file" format="fasta" label="${ftype.value}s (nucleotides)" /> | 60 <data name="out_nuc_file" format="fasta" label="${ftype.value}s (nucleotides)" /> |
60 <data name="out_prot_file" format="fasta" label="${ftype.value}s (amino acids)" /> | 61 <data name="out_prot_file" format="fasta" label="${ftype.value}s (amino acids)" /> |
62 <data name="out_bed_file" format="bed6" label="${ftype.value}s (bed)" /> | |
61 </outputs> | 63 </outputs> |
62 <tests> | 64 <tests> |
63 <test> | 65 <test> |
64 <param name="input_file" value="get_orf_input.fasta" /> | 66 <param name="input_file" value="get_orf_input.fasta" /> |
65 <param name="table" value="1" /> | 67 <param name="table" value="1" /> |
68 <param name="mode" value="all" /> | 70 <param name="mode" value="all" /> |
69 <param name="min_len" value="10" /> | 71 <param name="min_len" value="10" /> |
70 <param name="strand" value="forward" /> | 72 <param name="strand" value="forward" /> |
71 <output name="out_nuc_file" file="get_orf_input.t1_nuc_out.fasta" /> | 73 <output name="out_nuc_file" file="get_orf_input.t1_nuc_out.fasta" /> |
72 <output name="out_prot_file" file="get_orf_input.t1_prot_out.fasta" /> | 74 <output name="out_prot_file" file="get_orf_input.t1_prot_out.fasta" /> |
75 <output name="out_bed_file" file="get_orf_input.t1_bed_out.bed" /> | |
73 </test> | 76 </test> |
74 <test> | 77 <test> |
75 <param name="input_file" value="get_orf_input.fasta" /> | 78 <param name="input_file" value="get_orf_input.fasta" /> |
76 <param name="table" value="11" /> | 79 <param name="table" value="11" /> |
77 <param name="ftype" value="CDS" /> | 80 <param name="ftype" value="CDS" /> |
78 <param name="ends" value="closed" /> | 81 <param name="ends" value="closed" /> |
79 <param name="mode" value="all" /> | 82 <param name="mode" value="all" /> |
80 <param name="min_len" value="10" /> | 83 <param name="min_len" value="10" /> |
81 <param name="strand" value="forward" /> | 84 <param name="strand" value="forward" /> |
82 <output name="out_nuc_file" file="get_orf_input.t11_nuc_out.fasta" /> | 85 <output name="out_nuc_file" file="get_orf_input.t11_nuc_out.fasta" /> |
83 <output name="out_prot_file" file="get_orf_input.t11_prot_out.fasta" /> | 86 <output name="out_prot_file" file="get_orf_input.t11_prot_out.fasta" /> |
87 <output name="out_bed_file" file="get_orf_input.t11_bed_out.bed" /> | |
84 </test> | 88 </test> |
85 <test> | 89 <test> |
86 <param name="input_file" value="get_orf_input.fasta" /> | 90 <param name="input_file" value="get_orf_input.fasta" /> |
87 <param name="table" value="11" /> | 91 <param name="table" value="11" /> |
88 <param name="ftype" value="CDS" /> | 92 <param name="ftype" value="CDS" /> |
90 <param name="mode" value="all" /> | 94 <param name="mode" value="all" /> |
91 <param name="min_len" value="10" /> | 95 <param name="min_len" value="10" /> |
92 <param name="strand" value="forward" /> | 96 <param name="strand" value="forward" /> |
93 <output name="out_nuc_file" file="get_orf_input.t11_open_nuc_out.fasta" /> | 97 <output name="out_nuc_file" file="get_orf_input.t11_open_nuc_out.fasta" /> |
94 <output name="out_prot_file" file="get_orf_input.t11_open_prot_out.fasta" /> | 98 <output name="out_prot_file" file="get_orf_input.t11_open_prot_out.fasta" /> |
99 <output name="out_bed_file" file="get_orf_input.t11_open_bed_out.bed" /> | |
95 </test> | 100 </test> |
96 <test> | 101 <test> |
97 <param name="input_file" value="Ssuis.fasta" /> | 102 <param name="input_file" value="Ssuis.fasta" /> |
98 <param name="table" value="11" /> | 103 <param name="table" value="11" /> |
99 <param name="ftype" value="ORF" /> | 104 <param name="ftype" value="ORF" /> |
101 <param name="mode" value="all" /> | 106 <param name="mode" value="all" /> |
102 <param name="min_len" value="100" /> | 107 <param name="min_len" value="100" /> |
103 <param name="strand" value="both" /> | 108 <param name="strand" value="both" /> |
104 <output name="out_nuc_file" file="get_orf_input.Suis_ORF.nuc.fasta" /> | 109 <output name="out_nuc_file" file="get_orf_input.Suis_ORF.nuc.fasta" /> |
105 <output name="out_prot_file" file="get_orf_input.Suis_ORF.prot.fasta" /> | 110 <output name="out_prot_file" file="get_orf_input.Suis_ORF.prot.fasta" /> |
111 <output name="out_bed_file" file="get_orf_input.Suis_ORF.bed" /> | |
106 </test> | 112 </test> |
107 </tests> | 113 </tests> |
108 <help> | 114 <help> |
109 **What it does** | 115 **What it does** |
110 | 116 |
132 When searching for ORFs, the sequences will run from stop codon to stop | 138 When searching for ORFs, the sequences will run from stop codon to stop |
133 codon, and any start codons are ignored. When searching for CDSs, the first | 139 codon, and any start codons are ignored. When searching for CDSs, the first |
134 potential start codon will be used, giving the longest possible CDS within | 140 potential start codon will be used, giving the longest possible CDS within |
135 each ORF, and thus the longest possible protein sequence. This is useful | 141 each ORF, and thus the longest possible protein sequence. This is useful |
136 for things like BLAST or domain searching, but since this may not be the | 142 for things like BLAST or domain searching, but since this may not be the |
137 correct start codon may not be appropriate for signal peptide detection | 143 correct start codon, it may not be appropriate for signal peptide detection |
138 etc. | 144 etc. |
139 | 145 |
140 **Example Usage** | 146 **Example Usage** |
141 | 147 |
142 Given some EST sequences (Sanger capillary reads) assembled into unigenes, | 148 Given some EST sequences (Sanger capillary reads) assembled into unigenes, |
143 or a transcriptome assembly from some RNA-Seq, each of your nucleotide | 149 or a transcriptome assembly from some RNA-Seq, each of your nucleotide |
144 sequences should (barring sequencing, assembly errors, frame-shifts etc) | 150 sequences should (barring sequencing, assembly errors, frame-shifts etc) |
145 encode one protein as a single ORF/CDS, which you wish to extract (and | 151 encode one protein as a single ORF/CDS, which you wish to extract (and |
146 perhaps translate into amino acids). | 152 perhaps translate into amino acids). |
147 | 153 |
148 If your RNS-Seq data was strand specific, and assembled taking this into | 154 If your RNA-Seq data was strand specific, and assembled taking this into |
149 account, you should only search for ORFs/CDSs on the forward strand. | 155 account, you should only search for ORFs/CDSs on the forward strand. |
150 | 156 |
151 **Citation** | 157 **Citation** |
152 | 158 |
153 If you use this Galaxy tool in work leading to a scientific publication please | 159 If you use this Galaxy tool in work leading to a scientific publication please |
166 http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878. | 172 http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878. |
167 | 173 |
168 This tool is available to install into other Galaxy Instances via the Galaxy | 174 This tool is available to install into other Galaxy Instances via the Galaxy |
169 Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/get_orfs_or_cdss | 175 Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/get_orfs_or_cdss |
170 </help> | 176 </help> |
177 <citations> | |
178 <citation type="doi">10.7717/peerj.167</citation> | |
179 <citation type="doi">10.1093/bioinformatics/btp163</citation> | |
180 </citations> | |
171 </tool> | 181 </tool> |
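The help text above says that ORFs run from stop codon to stop codon and ignore start codons, while CDSs begin at the first potential start codon in each ORF. Below is a minimal pure-Python sketch of that rule on the forward strand, plus formatting of the hits as BED6 (the new `bed6` output). It rests on several assumptions:

- Only the standard code (table 1) is covered, with stops TAA/TAG/TGA and ATG as the only start. The tool instead uses the start and stop codons of the selected NCBI table.
- Sequence ends are treated as open, so partial ORFs at either end are kept.
- Spans exclude the stop codon, in line with the `min_len` label ("plus any stop codon").
- `find_orfs`, `to_bed6` and the BED name scheme are hypothetical.

```python
# Assumption: NCBI table 1 (Standard) only, with ATG as the sole start codon.
STOPS = {"TAA", "TAG", "TGA"}
START = "ATG"


def find_orfs(seq, ftype="ORF", min_len=30):
    """Return sorted (start, end) spans on the forward strand, 0-based half-open.

    Spans exclude the stop codon, so (end - start) // 3 is the length in amino acids.
    Sequence ends are treated as open: partial ORFs at either end are kept.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        n_codons = max(0, (len(seq) - frame) // 3)
        run_start = frame
        for k in range(n_codons + 1):
            i = frame + 3 * k
            # k == n_codons marks the end of the last whole codon in this frame.
            if k == n_codons or seq[i:i + 3] in STOPS:
                start = run_start
                if ftype == "CDS":
                    # First in-frame start codon gives the longest CDS in this ORF.
                    start = next((j for j in range(run_start, i, 3)
                                  if seq[j:j + 3] == START), None)
                if start is not None and i > start and (i - start) // 3 >= min_len:
                    hits.append((start, i))
                run_start = i + 3
    return sorted(hits)


def to_bed6(chrom, spans, ftype="ORF"):
    """Format forward-strand spans as BED6: chrom, start, end, name, score, strand."""
    return [f"{chrom}\t{start}\t{end}\t{chrom}|{ftype}{n}\t0\t+"  # name scheme is hypothetical
            for n, (start, end) in enumerate(spans, 1)]


demo = "CCATGAAATTTGGGTAACC"
print(find_orfs(demo, "ORF", min_len=3))  # [(1, 19), (2, 14), (6, 18)]
cds = find_orfs(demo, "CDS", min_len=3)   # [(2, 14)], i.e. ATG AAA TTT GGG -> MKFG
print("\n".join(to_bed6("seq1", cds, "CDS")))
```

BED coordinates are 0-based and half-open, so the span `(2, 14)` can be written directly as the start and end columns. Searching the reverse strand would mean scanning the reverse complement and mapping each span back (forward start = `len(seq) - end`). That step is left out here to keep the sketch short.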