RDP MultiClassifier (version 1.1)
The Multi-Classifier provides two training models: 16S rRNA or Fungal LSU genes.
Selecting OTU Table generation completes the intermediate OTU table generated by the 'Map Reads to OTU' tool of the metagenomics workflow. It requires the intermediate OTU table and the relabelled OTUs output files of the 'Map Reads to OTU' tool.
The confidence cutoff specifies the assignment confidence used to determine the assignment count in the hierarchical format. Range [0-1]; the default is 0.8. For sequences shorter than 250 base pairs, a cutoff of 0.5 (50%) is recommended to improve classification coverage.
For the 'Tab delimited output format' options, see the description below.

Description

The RDP MultiClassifier allows rapid assignment of rRNA sequences to the new bacterial taxonomy. This version of the RDP MultiClassifier can also complete the intermediate OTU table generated by the USEARCH 'Map Reads to OTU' tool of the metagenomics workflow.


Input

No OTU Table generation selected:

  1. File of reads in FASTA format.

Input sequences should be at least 50 bp long for accurate results. Both uppercase and lowercase sequence characters are accepted.
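
A quick pre-check of read lengths can flag inputs that fall below the 50 bp guideline before they are submitted. The following is a minimal sketch and not part of the tool: the function names read_fasta and short_reads and the sample records are hypothetical.

```python
import io

MIN_LENGTH = 50  # minimum read length suggested in the Input section


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def short_reads(handle, min_length=MIN_LENGTH):
    """Return the headers of reads shorter than min_length."""
    return [name for name, seq in read_fasta(handle) if len(seq) < min_length]


# Hypothetical data: read1 is 80 bp (lowercase), read2 is 8 bp.
example = io.StringIO(">read1\n" + "acgt" * 20 + "\n>read2\nACGTACGT\n")
print(short_reads(example))
```

Case is not touched here because the tool accepts both uppercase and lowercase input; only the length is checked.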

OTU Table generation is selected:

  1. Relabelled OTU input reads in FASTA format of the 'Map Reads to OTU' tool.

Please note that the 'relabelled OTU' output of the 'Map Reads to OTU' tool is hidden. To access the hidden output, click on the cog wheel in the upper right corner of the History panel and select 'Include Hidden Datasets'. The output dataset will appear with a dialog box. Follow the instructions in the dialog box and click 'here' to unhide the dataset.

  2. Pre-OTU Table of the 'Map Reads to OTU' tool in tabular format.

Parameters

Gene Training Model
The RDP naive Bayesian Classifier offers two hierarchy models: one for 16S rRNA genes and one for Fungal LSU genes.
OTU Table generation
For OTU Table generation, check the above checkbox and provide the intermediate OTU table (Pre-OTU Table) and the 'relabelled OTU' input reads of the 'Map Reads to OTU' tool of the metagenomics workflow.
Confidence cutoff
Used to determine the assignment count in the hierarchical format. Range [0-1]; the default is 0.8. For sequences shorter than 250 base pairs, a cutoff of 0.5 (50%) is recommended to improve classification coverage.
Tab delimited output format
  1. allrank: outputs the results for all ranks applied to each sequence: seqname, orientation, taxon name, rank, conf, etc.
  2. fixrank: only outputs the results for fixed ranks in order: domain, phylum, class, order, family, genus
  3. db: outputs the seqname, trainset_no, tax_id, conf

Output

The tool generates two or three outputs, depending on whether 'OTU Table generation' is selected.

No OTU Table generation selected:

  1. Sequence count for each taxon in the hierarchy in tab-format: classification_assignment_hierarchical.tab
  2. Sequence-by-sequence classification results including confidence scores at each level of the hierarchy in tab-format: classification_assignment_details.tab

OTU Table generation is selected:

  1. Sequence count for each taxon in the hierarchy in tab-format: classification_assignment_hierarchical.tab
  2. Sequence-by-sequence classification results including confidence scores at each level of the hierarchy in tab-format: classification_assignment_details.tab
  3. OTU Table in tab-format
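
The idea behind the hierarchical count (output 1) can be sketched as follows. This is only an illustration, not the tool's implementation: it assumes each sequence's lineage has already been trimmed at the confidence cutoff, and the function name hierarchical_counts and the sample lineages are hypothetical.

```python
from collections import Counter


def hierarchical_counts(lineages):
    """Count how many sequences fall under each (rank, taxon) pair."""
    counts = Counter()
    for lineage in lineages:
        for rank, taxon in lineage:
            counts[(rank, taxon)] += 1
    return counts


# Hypothetical lineages for three sequences, one list per sequence.
lineages = [
    [("domain", "Bacteria"), ("phylum", "Firmicutes")],
    [("domain", "Bacteria"), ("phylum", "Firmicutes"), ("class", "Bacilli")],
    [("domain", "Bacteria")],
]
for (rank, taxon), n in hierarchical_counts(lineages).items():
    print(f"{rank}\t{taxon}\t{n}")
```

Each sequence contributes to every taxon along its retained lineage, so counts at higher ranks are always at least as large as those beneath them.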

Resources

RDP_MultiClassifier_Tutorial

Wrapper Author

QFAB Bioinformatics (support@qfab.org)