comparison affiliation_OTU.xml @ 0:59bc96331073 draft default tip
planemo upload for repository https://github.com/geraldinepascal/FROGS-wrappers/tree/v3.1.0 commit 08296fc88e3e938c482c631bd515b3b7a0499647
author | frogs |
---|---|
date | Thu, 28 Feb 2019 10:14:49 -0500 |
parents | |
children |
-1:000000000000 | 0:59bc96331073 |
---|---|
1 <?xml version="1.0"?> | |
2 <!-- | |
3 # Copyright (C) 2015 INRA | |
4 # | |
5 # This program is free software: you can redistribute it and/or modify | |
6 # it under the terms of the GNU General Public License as published by | |
7 # the Free Software Foundation, either version 3 of the License, or | |
8 # (at your option) any later version. | |
9 # | |
10 # This program is distributed in the hope that it will be useful, | |
11 # but WITHOUT ANY WARRANTY; without even the implied warranty of | |
12 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | |
13 # GNU General Public License for more details. | |
14 # | |
15 # You should have received a copy of the GNU General Public License | |
16 # along with this program. If not, see <http://www.gnu.org/licenses/>. | |
17 --> | |
18 <tool id="FROGS_affiliation_OTU" name="FROGS Affiliation OTU" version="3.1"> | |
19 <description>Taxonomic affiliation of each OTU's seed by RDPtools and BLAST</description> | |
20 <requirements> | |
21 <requirement type="package" version="3.1.0">frogs</requirement> | |
22 </requirements> | |
23 <stdio> | |
24 <exit_code range="1:" /> | |
25 <exit_code range=":-1" /> | |
26 </stdio> | |
27 <command> | |
28 #set $reference_filename = str( $ref_file.fields.path ) | |
29 affiliation_OTU.py | |
30 --reference "${reference_filename}" | |
31 --input-biom $biom_abundance | |
32 --input-fasta $fasta_sequences | |
33 --output-biom $biom_affiliation | |
34 --summary $summary | |
35 --nb-cpus \${GALAXY_SLOTS:-1} | |
36 --java-mem $mem | |
37 #if $rdp | |
38 --rdp | |
39 #end if | |
40 </command> | |
41 <inputs> | |
42 <!-- JOB Parameter --> | |
43 <param name="mem" type="hidden" label="Memory allocation" help="Memory to allocate to Java, in GB" value="20"></param> | |
44 <!-- Database Choice --> | |
45 <param name="ref_file" type="select" label="Using reference database" help="Select reference from the list"> | |
46 <options from_data_table="frogs_db"></options> | |
47 <validator type="no_options" message="A built-in database is not available" /> | |
48 </param> | |
49 <param name="rdp" type="boolean" label="Also perform RDP assignment?" help="Taxonomic affiliation is performed with BLAST. This option also runs the RDP classifier (default: No)" /> | |
50 <!-- Files --> | |
51 <param format="fasta" name="fasta_sequences" type="data" label="OTU seed sequence" help="OTU sequences (format: fasta)."/> | |
52 <param format="biom1" name="biom_abundance" type="data" label="Abundance file" help="OTU abundances (format: BIOM)."/> | |
53 </inputs> | |
54 <outputs> | |
55 <data format="biom1" name="biom_affiliation" label="${tool.name}: affiliation.biom" from_work_dir="affiliation.biom" /> | |
56 <data format="html" name="summary" label="${tool.name}: report.html" from_work_dir="report.html"/> | |
57 </outputs> | |
58 <tests> | |
59 <test> | |
60 <param name="ref_file" value="ITS1_test"/> | |
61 <param name="fasta_sequences" value="references/04-filters.fasta"/> | |
62 <param name="biom_abundance" value="references/04-filters.biom"/> | |
63 <!-- <param name="fasta_sequences" value="swarm.fasta"/> | |
64 <param name="biom_abundance" value="swarm.biom"/> --> | |
65 <output name="biom_affiliation" file="references/06-affiliation.biom"/> | |
66 <output name="summary" file="references/06-affiliation.html" compare="sim_size" delta="0" /> | |
67 </test> | |
68 </tests> | |
69 | |
70 <help> | |
71 | |
72 .. image:: static/images/frogs_images/FROGS_logo.png | |
73 :height: 144 | |
74 :width: 110 | |
75 | |
76 | |
77 .. class:: infomark page-header h2 | |
78 | |
79 What it does | |
80 | |
81 Adds a taxonomic affiliation to each OTU in the abundance file. | |
82 | |
83 | |
84 .. class:: infomark page-header h2 | |
85 | |
86 Inputs/outputs | |
87 | |
88 .. class:: h3 | |
89 | |
90 Inputs | |
91 | |
92 **Sequence file**: | |
93 | |
94 The OTU seed sequences (format `FASTA <https://en.wikipedia.org/wiki/FASTA_format>`_). | |
95 | |
96 **Abundance file**: | |
97 | |
98 The abundance of each OTU in each sample (format `BIOM <http://biom-format.org/>`_). | |
99 | |
100 .. class:: h3 | |
101 | |
102 Outputs | |
103 | |
104 **Abundance file** (affiliation.biom): | |
105 | |
106 The abundance file with affiliation (format `BIOM <http://biom-format.org/>`_). | |
107 | |
108 **Summary file** (report.html): | |
109 | |
110 This file reports the number of sequences affiliated by BLAST and the number of multi-affiliations (format `HTML <https://en.wikipedia.org/wiki/HTML>`_). | |
111 | |
112 .. image:: static/images/frogs_images/FROGS_affiliation_summary.png | |
113 :height: 800 | |
114 :width: 600 | |
115 | |
116 | |
117 .. class:: infomark page-header h2 | |
118 | |
119 Reference database | |
120 | |
121 All the databases we format (on demand) for RDPClassifier and NCBI Blast+ are listed here: http://genoweb.toulouse.inra.fr/frogs_databanks/assignation/readme.txt | |
122 | |
123 .. class:: infomark page-header h2 | |
124 | |
125 How it works | |
126 | |
127 .. csv-table:: | |
128 :header: "Steps", "Description" | |
129 :widths: 5, 150 | |
130 :class: table table-striped | |
131 | |
132 "1", "`RDPClassifier <https://rdp.cme.msu.edu/tutorials/classifier/RDPtutorial_RDP_classifier.html>`_ may be used with the database to assign each OTU a taxonomy with a bootstrap value for each rank (example: *Bacteria;(1.0);Firmicutes;(1.0);Clostridia;(1.0);Clostridiales;(1.0);Clostridiaceae 1;(1.0);Clostridium sensu stricto;(1.0);*)." | |
133 "2", "`blastn+ <http://blast.ncbi.nlm.nih.gov/>`_ or `needleall <http://emboss.sourceforge.net/apps/release/6.6/emboss/apps/needleall.html>`_ is used to align each OTU against the database. Only the best hits, i.e. those sharing the top score, are reported. blastn+ is used for merged read pairs, and needleall is used for artificially combined sequences. For each alignment returned, several metrics are computed: identity percentage, coverage percentage, and alignment length." | |
134 "3", "For each OTU with several blastn+/needleall alignment results, a consensus is determined at each taxonomic rank. If all the taxa at a rank are identical, the taxon name is reported; otherwise *Multi-affiliation* is reported. For example, if an OTU has two corresponding sequences, the first being *Bacteria;Proteobacteria;Gamma Proteobacteria;Enterobacteriales* and the second *Bacteria;Proteobacteria;Beta Proteobacteria;Methylophilales*, the consensus will be *Bacteria;Proteobacteria;Multi-affiliation;Multi-affiliation*." | |
135 | |
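The consensus rule of step 3 can be sketched in a few lines of Python. This is only an illustration of the rule as described in the table, not the code used by the tool: the function name ``consensus_taxonomy`` and its ``placeholder`` parameter are hypothetical, and all taxonomies are assumed to have the same number of ranks.

```python
def consensus_taxonomy(taxonomies, placeholder="Multi-affiliation"):
    """Return a rank-by-rank consensus of several taxonomies.

    Each taxonomy is a list of taxon names ordered from the highest rank
    to the lowest. All taxonomies are assumed to have the same depth
    (an assumption of this sketch, not a documented guarantee).
    """
    if not taxonomies:
        return []
    consensus = []
    # zip(*taxonomies) groups together the taxa found at the same rank.
    for taxa_at_rank in zip(*taxonomies):
        if len(set(taxa_at_rank)) == 1:
            consensus.append(taxa_at_rank[0])
        else:
            consensus.append(placeholder)
    return consensus


# The two hits used as an example in step 3.
hits = [
    "Bacteria;Proteobacteria;Gamma Proteobacteria;Enterobacteriales".split(";"),
    "Bacteria;Proteobacteria;Beta Proteobacteria;Methylophilales".split(";"),
]
print(";".join(consensus_taxonomy(hits)))
# prints: Bacteria;Proteobacteria;Multi-affiliation;Multi-affiliation
```

The sketch compares each rank independently, which is how the rule is worded above.
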
136 .. class:: infomark page-header h2 | |
137 | |
138 Alignment metrics: details on identity percentage calculation | |
139 | |
140 With the classical %id computation method, we obtain the following for overlapped amplicon sequences and for artificially combined amplicon sequences: | |
141 | |
142 * **Case 1: sequencing of overlapping sequences, e.g. 16S V3-V4 amplicon MiSeq sequencing** | |
143 | |
144 .. image:: static/images/frogs_images/FROGS_affiliation_overlapped_percent_id.png | |
145 :height: 325 | |
146 :width: 807 | |
147 | |
148 * **Case 2: sequencing of non-overlapping sequences, e.g. ITS1 amplicon MiSeq sequencing** | |
149 | |
150 .. image:: static/images/frogs_images/FROGS_affiliation_combined_percent_id.png | |
151 :height: 310 | |
152 :width: 887 | |
153 | |
154 * **Finally, how is the percentage identity computed?** | |
155 | |
156 With the classical method of %id calculation, filtering on %id would systematically remove “FROGS combined” OTUs. We therefore propose replacing the classical %id with a %id computed on the sequenced bases only. | |
157 | |
158 .. image:: static/images/frogs_images/FROGS_affiliation_percent_id_formula.png | |
159 :height: 36 | |
160 :width: 637 | |
161 | |
162 For the two cases above, we obtain: | |
163 | |
164 * Case 1: 16S V3V4 overlapped sequence | |
165 % sequenced bases identity = 400 matches / 400 bp = 100% | |
166 | |
167 * Case 2: very large ITS1 “FROGS combined” shorter than the real sequence | |
168 % sequenced bases identity = (250 + 250) / (600 - 100) = 100% | |
169 | |
170 With perfect sequencing, this calculation returns 100% identity on sequenced bases for “FROGS combined” sequences, whether they are shorter or longer than the real sequence. It returns a lower identity percentage when a small overlap repeat is kept in the FROGS combined sequence. | |
171 | |
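The two worked cases can be reproduced with a small Python function. This is a sketch under an assumption: the formula itself is only given as an image, so the form used here (matches divided by the alignment length minus the positions that were not sequenced) is inferred from the two numeric examples above. The function and parameter names are hypothetical.

```python
def sequenced_bases_identity(matches, alignment_length, unsequenced_bases=0):
    """Percentage identity computed on sequenced bases only.

    matches           -- number of matching positions in the alignment
    alignment_length  -- total length of the aligned sequence
    unsequenced_bases -- positions that were not actually sequenced, such as
                         the artificial junction of a "FROGS combined"
                         sequence (assumed meaning, inferred from Case 2)
    """
    sequenced = alignment_length - unsequenced_bases
    if sequenced > 0:
        return 100.0 * matches / sequenced
    raise ValueError("the alignment contains no sequenced base")


# Case 1: 16S V3V4 overlapped sequence, 400 matches over 400 bp.
print(sequenced_bases_identity(400, 400))

# Case 2: ITS1 "FROGS combined" sequence, 250 + 250 matches over 600 - 100 bp.
print(sequenced_bases_identity(250 + 250, 600, unsequenced_bases=100))
```

Both calls print ``100.0``, matching the two cases above.
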
172 .. class:: infomark page-header h2 | |
173 | |
174 | |
175 Advice | |
176 | |
177 This tool can take a long time to run. We recommend filtering your abundance and sequence files before running it (see **FROGS Filters**). | |
178 | |
179 The affiliation of each OTU is not human-readable in the output abundance file. We provide a tool to convert this BIOM file into a tab-separated file: see the **FROGS BIOM to TSV** tool. | |
180 | |
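For a quick look at the affiliations without running the conversion tool, a BIOM 1.0 file can be read as plain JSON. The sketch below relies only on the ``rows``, ``id`` and ``metadata`` fields of the BIOM 1.0 format. The metadata key ``blast_taxonomy`` is an assumption about how the affiliation is stored and may need adjusting. The function name ``otu_taxonomies`` is hypothetical, and the tiny table is hand-written for illustration, not real output.

```python
import json


def otu_taxonomies(biom, metadata_key="blast_taxonomy"):
    """Yield (OTU id, taxonomy string) pairs from a parsed BIOM 1.0 table.

    metadata_key is an assumed name for the field holding the taxonomy.
    """
    for row in biom["rows"]:
        metadata = row.get("metadata") or {}
        taxonomy = metadata.get(metadata_key)
        if isinstance(taxonomy, (list, tuple)):
            taxonomy = ";".join(taxonomy)
        yield row["id"], taxonomy or "no data"


# Hand-written stand-in for the content of an affiliation BIOM file.
# With a real file: example = json.load(open("affiliation.biom"))
example = json.loads("""
{
  "rows": [
    {"id": "Cluster_1", "metadata": {"blast_taxonomy": ["Bacteria", "Firmicutes"]}},
    {"id": "Cluster_2", "metadata": null}
  ]
}
""")

for otu_id, taxonomy in otu_taxonomies(example):
    print(otu_id + "\t" + taxonomy)
```

This only lists one taxonomy string per OTU. The **FROGS BIOM to TSV** tool remains the way to get a full tabulated file.
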
181 | |
182 ---- | |
183 | |
184 **Contact** | |
185 | |
186 Contacts: frogs@inra.fr | |
187 | |
188 Repository: https://github.com/geraldinepascal/FROGS | |
189 Website: http://frogs.toulouse.inra.fr/ | |
190 | |
191 Please cite the **FROGS article**: *Escudie F., et al. Bioinformatics, 2018. FROGS: Find, Rapidly, OTUs with Galaxy Solution.* | |
192 | |
193 </help> | |
194 <citations> | |
195 <citation type="doi">10.1093/bioinformatics/btx791</citation> | |
196 </citations> | |
197 | |
198 </tool> |