This tool implements dada2's assignTaxonomy and assignSpecies functions.
Input
Output
The intended use of the dada2 tools for paired sequencing data is shown in the following image.
Note: In particular, when analyzing paired collections, the collections should be sorted lexicographically before the analysis.
For single-end data, the steps "Unzip collection" and "mergePairs" are not necessary.
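A plausible reason for the sorting requirement is that forward and reverse reads are matched by their position in the collections. dada2's R function mergePairs, for example, expects its forward and reverse inputs to correspond element by element. The following is a minimal Python sketch of that idea, assuming position-based matching after "Unzip collection"; the helper `pair_by_sorted_position` and the sample names are hypothetical and not part of dada2 or Galaxy.

```python
# Hypothetical helper illustrating position-based pairing; not a dada2 or Galaxy API.
def pair_by_sorted_position(forward_ids, reverse_ids):
    """Sort both identifier lists lexicographically and pair them by position.

    Assumes the i-th forward element belongs to the i-th reverse element
    once both lists are sorted in the same (lexicographic) order.
    """
    fwd = sorted(forward_ids)  # plain str sort = lexicographic (code point) order
    rev = sorted(reverse_ids)
    if len(fwd) != len(rev):
        raise ValueError("forward and reverse collections differ in size")
    return list(zip(fwd, rev))


pairs = pair_by_sorted_position(["sample2", "sample10", "sample1"],
                                ["sample10", "sample1", "sample2"])
print(pairs)
# [('sample1', 'sample1'), ('sample10', 'sample10'), ('sample2', 'sample2')]
```

Note that lexicographic order places "sample10" before "sample2". This is harmless as long as both sides are sorted the same way.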
More information can be found on the dada2 homepage: https://benjjneb.github.io/dada2/index.html (in particular the tutorials), or in the documentation of dada2's R package: https://bioconductor.org/packages/release/bioc/html/dada2.html (in particular the PDF, which contains the full documentation of all parameters).
For **taxonomy assignment** the following is needed:
The reference FASTA database for taxonomic assignment (FASTA or compressed FASTA) must encode the taxonomy of each sequence in its header line as follows (note that the second sequence is not assigned down to level 6):
>Level1;Level2;Level3;Level4;Level5;Level6;
ACCTAGAAAGTCGTAGATCGAAGTTGAAGCATCGCCCGATGATCGTCTGAAGCTGTAGCATGAGTCGATTTTCACATTCAGGGATACCATAGGATAC
>Level1;Level2;Level3;Level4;Level5;
CGCTAGAAAGTCGTAGAAGGCTCGGAGGTTTGAAGCATCGCCCGATGGGATCTCGTTGCTGTAGCATGAGTACGGACATTCAGGGATCATAGGATAC
The required list of taxonomic ranks could, for instance, be: "Kingdom,Phylum,Class,Order,Family,Genus"
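To illustrate how the header levels line up with the list of ranks, here is a minimal Python sketch. The helper `parse_taxonomy_header` is hypothetical and only demonstrates the header convention; it is not how dada2's assignTaxonomy reads the file internally. It assumes the `;`-separated levels map to the ranks in order and that missing trailing levels mean "not assigned".

```python
# Hypothetical helper: illustrates the header convention only; not part of dada2.
def parse_taxonomy_header(header, ranks):
    """Map a ';'-separated taxonomy header onto the given rank names.

    Ranks beyond the last level present in the header become None.
    """
    # Drop the leading '>' and the trailing ';', then split into levels.
    levels = header.lstrip(">").strip().rstrip(";").split(";")
    return {rank: (levels[i] if i < len(levels) else None)
            for i, rank in enumerate(ranks)}


ranks = "Kingdom,Phylum,Class,Order,Family,Genus".split(",")
full = parse_taxonomy_header(">Level1;Level2;Level3;Level4;Level5;Level6;", ranks)
partial = parse_taxonomy_header(">Level1;Level2;Level3;Level4;Level5;", ranks)
print(full["Genus"])     # Level6
print(partial["Genus"])  # None
```

The second header from the example has only five levels, so its Genus rank stays unassigned.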
The reference database for **species assignment** is a FASTA file (or compressed FASTA file) with the ID line formatted as follows:
>ID Genus species
ACCTAGAAAGTCGTAGATCGAAGTTGAAGCATCGCCCGATGATCGTCTGAAGCTGTAGCATGAGTCGATTTTCACATTCAGGGATACCATAGGATAC
>ID Genus species
CGCTAGAAAGTCGTAGAAGGCTCGGAGGTTTGAAGCATCGCCCGATGGGATCTCGTTGCTGTAGCATGAGTACGGACATTCAGGGATCATAGGATAC
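As a rough check that a species reference file follows this layout, the sketch below reads FASTA records and splits each ID line into its three fields. Both helpers (`read_fasta`, `parse_species_header`) are hypothetical illustrations, not dada2 functions. The example sequences are the ones shown above, shortened to their first 20 bases.

```python
# Hypothetical helpers for checking the '>ID Genus species' layout; not part of dada2.
def read_fasta(lines):
    """Yield (header, sequence) pairs; sequences may span several lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def parse_species_header(header):
    """Split an ID line into (ID, Genus, species); extra words stay in species."""
    seq_id, genus, species = header.lstrip(">").split(maxsplit=2)
    return seq_id, genus, species


# Example records from above, sequences shortened to 20 bases.
text = """>ID Genus species
ACCTAGAAAGTCGTAGATCG
>ID Genus species
CGCTAGAAAGTCGTAGAAGG
"""
records = list(read_fasta(text.splitlines()))
for header, seq in records:
    print(parse_species_header(header), len(seq))
```

An ID line with fewer than three whitespace-separated fields makes `parse_species_header` raise a `ValueError`, which flags a malformed header.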