Mercurial > repos > frogs > frogs_3_1_0
diff affiliation_OTU.xml @ 0:59bc96331073 draft default tip
planemo upload for repository https://github.com/geraldinepascal/FROGS-wrappers/tree/v3.1.0 commit 08296fc88e3e938c482c631bd515b3b7a0499647
author | frogs |
---|---|
date | Thu, 28 Feb 2019 10:14:49 -0500 |
parents | |
children |
line wrap: on
line diff
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/affiliation_OTU.xml Thu Feb 28 10:14:49 2019 -0500 @@ -0,0 +1,198 @@ +<?xml version="1.0"?> +<!-- +# Copyright (C) 2015 INRA +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see <http://www.gnu.org/licenses/>. +--> +<tool id="FROGS_affiliation_OTU" name="FROGS Affiliation OTU" version="3.1"> + <description>Taxonomic affiliation of each OTU's seed by RDPtools and BLAST</description> + <requirements> + <requirement type="package" version="3.1.0">frogs</requirement> + </requirements> + <stdio> + <exit_code range="1:" /> + <exit_code range=":-1" /> + </stdio> + <command> + #set $reference_filename = str( $ref_file.fields.path ) + affiliation_OTU.py + --reference "${reference_filename}" + --input-biom $biom_abundance + --input-fasta $fasta_sequences + --output-biom $biom_affiliation + --summary $summary + --nb-cpus \${GALAXY_SLOTS:-1} + --java-mem $mem + #if $rdp + --rdp + #end if + </command> + <inputs> + <!-- JOB Parameter --> + <param name="mem" type="hidden" label="Memory allocation" help="The number of Go to allocation for java" value="20"></param> + <!-- Database Choice --> + <param name="ref_file" type="select" label="Using reference database" help="Select reference from the list"> + <options from_data_table="frogs_db"></options> + <validator type="no_options" message="A built-in database is not available" /> + </param> + <param name="rdp" type="boolean" label="Also perform RDP assignation?" help="Taxonomy affiliation will be perform thanks to Blast. This option allow you to perform it also with RDP classifier (default No)" /> + <!-- Files --> + <param format="fasta" name="fasta_sequences" type="data" label="OTU seed sequence" help="OTU sequences (format: fasta)."/> + <param format="biom1" name="biom_abundance" type="data" label="Abundance file" help="OTU abundances (format: BIOM)."/> + </inputs> + <outputs> + <data format="biom1" name="biom_affiliation" label="${tool.name}: affiliation.biom" from_work_dir="affiliation.biom" /> + <data format="html" name="summary" label="${tool.name}: report.html" from_work_dir="report.html"/> + </outputs> + <tests> + <test> + <param name="ref_file" value="ITS1_test"/> + <param name="fasta_sequences" value="references/04-filters.fasta"/> + <param name="biom_abundance" value="references/04-filters.biom"/> + <!-- <param name="fasta_sequences" value="swarm.fasta"/> + <param name="biom_abundance" value="swarm.biom"/> --> + <output name="biom_affiliation" file="references/06-affiliation.biom"/> + <output name="summary" file="references/06-affiliation.html" compare="sim_size" delta="0" /> + </test> + </tests> + + <help> + +.. image:: static/images/frogs_images/FROGS_logo.png + :height: 144 + :width: 110 + + +.. class:: infomark page-header h2 + +What it does + +Add taxonomic affiliation in abundance file. + + +.. class:: infomark page-header h2 + +Inputs/outputs + +.. class:: h3 + +Inputs + +**Sequence file**: + +The sequences (format `FASTA <https://en.wikipedia.org/wiki/FASTA_format>`_). + +**Abundance file**: + +The abundance of each OTU in each sample (format `BIOM <http://biom-format.org/>`_). + +.. class:: h3 + +Outputs + +**Abundance file** (tax_affiliation.biom): + + The abundance file with affiliation (format `BIOM <http://biom-format.org/>`_). + +**Summary file** (report.html): + + This file presents the number of sequences affiliated by blast, and the number of multi-affiliation (format `HTML <https://en.wikipedia.org/wiki/HTML>`_). + + .. image:: static/images/frogs_images/FROGS_affiliation_summary.png + :height: 800 + :width: 600 + + +.. class:: infomark page-header h2 + +Reference database + +All the databases we format (on demand) for RDPClassifier and NCBI Blast+ are inventoried here: http://genoweb.toulouse.inra.fr/frogs_databanks/assignation/readme.txt + +.. class:: infomark page-header h2 + +How it works + +.. csv-table:: + :header: "Steps", "Description" + :widths: 5, 150 + :class: table table-striped + + "1", "`RDPClassifier <https://rdp.cme.msu.edu/tutorials/classifier/RDPtutorial_RDP_classifier.html>`_ may be used with database to associate to each OTU a taxonomy and a bootstrap (example: *Bacteria;(1.0);Firmicutes;(1.0);Clostridia;(1.0);Clostridiales;(1.0);Clostridiaceae 1;(1.0);Clostridium sensu stricto;(1.0);*)." + "2", "`blastn+ <http://blast.ncbi.nlm.nih.gov/>`_ or `needlall <http://emboss.sourceforge.net/apps/release/6.6/emboss/apps/needleall.html>`_ is used to find alignment between each OTU and the database. Only the bests hits with the same score are reported. blastn+ is used for merged read pair, and needall is used for artificially combined sequence. For each alignment returned, several metrics are computed: identity percentage, coverage percentage, and alignment length" + "3", "For each OTU with several blastn+/needlall alignment results a consensus is determined on each taxonomic level. If all the taxa in a taxonomic rank are identical the taxon name is reported otherwise *Multi-affiliation* is reported. By example, if you have an OTU with two corresponding sequences, the first is a *Bacteria;Proteobacteria;Gamma Proteobacteria;Enterobacteriales*, the second is a *Bacteria;Proteobacteria;Beta Proteobacteria;Methylophilales*, the consensus will be *Bacteria;Proteobacteria;Multi-affiliation;Multi-affiliation*." + +.. class:: infomark page-header h2 + +Alignment metrics details on identity percentage calculation + +With classicala %id computation method, we will obtain for overlapped amplicon sequence and for artificially combined amplicon sequence : + +* **Case 1: a sequencing of overlapping sequences i.e. 16S V3-V4 amplicon MiSeq sequencing** + +.. image:: static/images/frogs_images/FROGS_affiliation_overlapped_percent_id.png + :height: 325 + :width: 807 + +* **Case 2 : a sequencing of non-overlapping sequences: case of ITS1 amplicon MiSeq sequencing** + +.. image:: static/images/frogs_images/FROGS_affiliation_combined_percent_id.png + :height: 310 + :width: 887 + +* **Finally, how percentage identity is computed ?** + +With the classical method of %id calculation, filtering on %id will systematically removed “FROGS combined” OTUs. So, we proposed to replace the classical %id by a %id computed on the sequenced bases only. + +.. image:: static/images/frogs_images/FROGS_affiliation_percent_id_formula.png + :height: 36 + :width: 637 + +For the precedent use cases we will obtain: + +* Case 1: 16S V3V4 overlapped sequence + % sequenced bases identity = 400 matches / 400 bp = 100% + +* Case 2: very large ITS1 “FROGS combined” shorter than the real sequence + % sequenced bases identity = (250 + 250 ) / (600 - 100) = 100% + +This calculation allows to return 100% of identity on sequenced bases for “FROGS combined” shorter or longer than reality in case of perfect sequencing, and a smaller percentage of identity in the case of small overlap repeat kept in FROGS combined sequence. + +.. class:: infomark page-header h2 + + +Advices + +This tool can take large time. It is recommended to filter your abundance and your sequence file before this tool (see **FROGS Filters**). + +As you can see the affiliation of each OTU is not human readable in outputed abundance file. We provide a tools to convert these BIOM file in tabulated file, see the **FROGS BIOM to TSV** tool. + + +---- + +**Contact** + +Contacts: frogs@inra.fr + +Repository: https://github.com/geraldinepascal/FROGS +website: http://frogs.toulouse.inra.fr/ + +Please cite the **FROGS article**: *Escudie F., et al. Bioinformatics, 2018. FROGS: Find, Rapidly, OTUs with Galaxy Solution.* + + </help> + <citations> + <citation type="doi">10.1093/bioinformatics/btx791</citation> + </citations> + +</tool>