What it does
Resolves multi-hit ambiguities when exact amplicon reference sequences (and therefore their lengths) are available, and aggregates OTUs that share the same taxonomy according to alignment metric thresholds.
Inputs/outputs
Inputs
Abundance file:
The abundance of each OTU in each sample (format BIOM), with taxonomic affiliation metadata.
Sequence file:
The sequences (format FASTA) of each OTU seed.
Reference file (optional):
The exact amplicon reference sequences (format FASTA).
Outputs
Abundance file:
The abundance file of OTUs and aggregated OTUs with their affiliations (format BIOM), potentially with fewer ambiguities.
Sequence file:
The sequences (format FASTA) of each aggregated OTU seed.
Composition file:
The aggregation composition file (format text) describing the composition of each resulting OTU.
How it works
If a reference FASTA file is provided, then for each OTU with a multi-affiliation, only the affiliation whose reference sequence is the shortest among the possible affiliations is kept. The aim is to resolve ambiguities caused by reference sequences that may be included in one another, as can happen with ITS.
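The shortest-reference rule above can be sketched as follows. This is a minimal illustration, not the FROGS implementation. The function names read_fasta_lengths and keep_shortest are hypothetical, each affiliation is assumed to be a (reference_id, taxonomy) pair, ties on length are assumed to keep every tied affiliation, and an OTU is assumed to be left unchanged if any of its candidate references is missing from the reference file.

```python
from typing import Dict, Iterable, List, Tuple

Affiliation = Tuple[str, str]  # hypothetical: (reference_id, taxonomy)


def read_fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return the sequence length of each record in a FASTA file."""
    lengths: Dict[str, int] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record ID is the first word after '>'.
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            # Sequences may be wrapped over several lines.
            lengths[current] += len(line)
    return lengths


def keep_shortest(affiliations: List[Affiliation],
                  ref_lengths: Dict[str, int]) -> List[Affiliation]:
    """Keep only the affiliations whose reference amplicon is the shortest."""
    # Assumption: leave the OTU untouched if any reference length is unknown.
    if not affiliations or any(ref not in ref_lengths for ref, _ in affiliations):
        return list(affiliations)
    shortest = min(ref_lengths[ref] for ref, _ in affiliations)
    return [a for a in affiliations if ref_lengths[a[0]] == shortest]
```

For example, with a hypothetical reference file where refA is 8 bp and refB is 6 bp, an OTU affiliated to both keeps only the refB affiliation, because the shorter amplicon is assumed to be the one actually sequenced.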
The second step aggregates OTUs that share the same taxonomy, based on alignment metrics. Starting with the most abundant OTU, if an OTU shares at least one affiliation with another OTU with at least I% identity and C% alignment coverage, the two OTUs are aggregated: their different affiliations are merged (which then generates the multi-affiliation tag) and their abundance counts are summed. The seed of the most abundant OTU is kept.
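The greedy aggregation described above could look roughly like this. This is a sketch under stated assumptions, not the FROGS code. The classes Otu and Affiliation and the function aggregate are hypothetical. Each affiliation is assumed to carry the identity and coverage of the OTU alignment on that reference. An affiliation counts as shared only if it meets both thresholds in both OTUs. A candidate is compared with the seed's own affiliations only, so the group does not grow transitively.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Affiliation:
    taxonomy: str
    identity: float  # % identity of the alignment (assumed field)
    coverage: float  # % alignment coverage (assumed field)


@dataclass
class Otu:
    name: str
    counts: Dict[str, int]  # sample -> abundance
    affiliations: List[Affiliation]

    @property
    def total(self) -> int:
        return sum(self.counts.values())


def eligible_taxa(otu: Otu, min_identity: float, min_coverage: float) -> Set[str]:
    """Taxonomies whose alignment passes both I% and C% thresholds."""
    return {a.taxonomy for a in otu.affiliations
            if a.identity >= min_identity and a.coverage >= min_coverage}


def aggregate(otus: List[Otu], min_identity: float,
              min_coverage: float) -> Tuple[List[Otu], Dict[str, List[str]]]:
    # Most abundant first; sorted() is stable, so ties keep input order.
    ordered = sorted(otus, key=lambda o: o.total, reverse=True)
    used: Set[str] = set()
    result: List[Otu] = []
    composition: Dict[str, List[str]] = {}
    for seed in ordered:
        if seed.name in used:
            continue
        used.add(seed.name)
        seed_taxa = eligible_taxa(seed, min_identity, min_coverage)
        members = [seed]
        for other in ordered:
            if other.name not in used and seed_taxa & eligible_taxa(
                    other, min_identity, min_coverage):
                used.add(other.name)
                members.append(other)
        # Sum abundances per sample and merge the distinct affiliations.
        counts: Dict[str, int] = {}
        merged: List[Affiliation] = []
        seen: Set[str] = set()
        for m in members:
            for sample, n in m.counts.items():
                counts[sample] = counts.get(sample, 0) + n
            for a in m.affiliations:
                if a.taxonomy not in seen:
                    seen.add(a.taxonomy)
                    merged.append(a)
        # The aggregated OTU keeps the name (and seed) of the most abundant one.
        result.append(Otu(seed.name, counts, merged))
        composition[seed.name] = [m.name for m in members]
    return result, composition
```

The composition dictionary plays the role of the composition file, listing which original OTUs each resulting OTU contains. A greedy, abundance-ordered pass is used here because it matches the description of starting with the most abundant OTU and keeping its seed.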
Contact
Support: please contact your Galaxy support team first.
Contacts: frogs@inra.fr
Repository: https://github.com/geraldinepascal/FROGS
Website: http://frogs.toulouse.inra.fr/
Please cite the FROGS article: Escudie F., et al. Bioinformatics, 2018. FROGS: Find, Rapidly, OTUs with Galaxy Solution.