This tool removes chimeric sequences, i.e. sequences that can be constructed by combining a left segment and a right segment from two more abundant “parent” sequences. Two methods to identify chimeras are supported: identification from pooled sequences and identification by consensus across samples.
Note: The pooled method should only be used in combination with pooled denoising.
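The definition of a chimera given above can be illustrated with a small Python sketch. It is not the dada2 implementation (dada2 is an R package and aligns sequences rather than comparing them exactly); the function names `is_exact_bimera` and `find_bimeras`, the toy sequences and the rule "any strictly more abundant sequence may act as a parent" are all assumptions made for illustration.

```python
def is_exact_bimera(seq, parents):
    """Return True if seq equals a left segment of one parent joined to a
    right segment of a different parent (exact matches only, no alignment)."""
    candidates = [p for p in parents if p != seq]
    # Try every possible breakpoint between the left and the right segment.
    for k in range(1, len(seq)):
        left, right = seq[:k], seq[k:]
        left_parents = [p for p in candidates if p.startswith(left)]
        right_parents = [p for p in candidates if p.endswith(right)]
        # The two segments must come from two distinct parents.
        if any(a != b for a in left_parents for b in right_parents):
            return True
    return False


def find_bimeras(abundance):
    """Flag sequences that are exact bimeras of strictly more abundant ones.

    abundance maps each sequence to its read count (toy assumption)."""
    flagged = set()
    for seq, count in abundance.items():
        parents = [p for p, c in abundance.items() if c > count]
        if is_exact_bimera(seq, parents):
            flagged.add(seq)
    return flagged


# Hypothetical toy data: two abundant parents, one chimera, one unrelated read.
parent_a = "ACGTACGTAC"
parent_b = "TTGGCCAATT"
chimera = parent_a[:4] + parent_b[4:]  # left part of A, right part of B
toy_abundance = {parent_a: 100, parent_b: 80, chimera: 5, "GGGGGGGGGG": 3}
flagged = find_bimeras(toy_abundance)
```

In this toy example only the constructed chimera is flagged; the two parents and the unrelated sequence are kept.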
Input
Output
A data set of type:
- dada2_sequenceTable (resp. dada2_mergepairs) if the input is of type dada2_sequenceTable (resp. dada2_mergepairs)
- dada2_uniques otherwise
The frequency of chimeric sequences varies substantially from dataset to dataset, and depends on factors including experimental procedures and sample complexity. Here chimeras make up about 21% of the merged sequence variants, but when we account for the abundances of those variants we see they account for only about 4% of the merged sequence reads.
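The difference between the share of chimeric variants and the share of chimeric reads can be made concrete with a short calculation. The sketch below uses made-up counts (not the data behind the 21% and 4% quoted above), and the function name `chimera_fractions` is hypothetical.

```python
def chimera_fractions(read_counts, chimeras):
    """Return (fraction of variants, fraction of reads) flagged as chimeric.

    read_counts maps each sequence variant to its total read count;
    chimeras is the set of variants flagged as chimeric."""
    n_variants = len(read_counts)
    n_reads = sum(read_counts.values())
    chim_variants = sum(1 for v in read_counts if v in chimeras)
    chim_reads = sum(c for v, c in read_counts.items() if v in chimeras)
    return chim_variants / n_variants, chim_reads / n_reads


# Made-up example: five variants, one of them a low-abundance chimera.
toy_counts = {"var1": 480, "var2": 300, "var3": 150, "var4": 30, "var5": 40}
toy_chimeras = {"var5"}
variant_fraction, read_fraction = chimera_fractions(toy_counts, toy_chimeras)
print(f"chimeric variants: {variant_fraction:.0%}, chimeric reads: {read_fraction:.0%}")
```

Because chimeras tend to be low in abundance, a large share of the variants can be removed while only a small share of the reads is lost.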
Considerations for your own data: Most of your reads should remain after chimera removal (although it is not uncommon for a majority of sequence variants to be removed). If most of your reads were removed as chimeric, upstream processing may need to be revisited. In almost all cases this is caused by primer sequences with ambiguous nucleotides that were not removed prior to beginning the DADA2 pipeline. You can check for remaining primer sequences with the tool dada2: primer check.
The intended use of the dada2 tools for paired sequencing data is shown in the following image.
Note: In particular for the analysis of paired collections, the collections should be sorted lexicographically before the analysis.
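Why the sorting matters can be seen in a small Python sketch: if forward and reverse elements are both sorted the same way, the elements at the same position belong to the same sample. The file names and the helper `pair_sorted` are hypothetical and only illustrate the idea; they do not describe how Galaxy handles collections internally.

```python
def pair_sorted(forward, reverse):
    """Sort both lists lexicographically and pair them position by position."""
    fwd = sorted(forward)
    rev = sorted(reverse)
    if len(fwd) != len(rev):
        raise ValueError("forward and reverse collections differ in size")
    return list(zip(fwd, rev))


# Hypothetical element names, listed in different orders.
forward = ["sample10_R1.fastq", "sample2_R1.fastq", "sample1_R1.fastq"]
reverse = ["sample1_R2.fastq", "sample10_R2.fastq", "sample2_R2.fastq"]

pairs = pair_sorted(forward, reverse)
for fwd_name, rev_name in pairs:
    print(fwd_name, rev_name)
```

Note that lexicographic order is not numeric order: here sample10 sorts before sample1 and sample2, which is harmless as long as both collections are sorted by the same rule.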
For single-end data the steps "Unzip collection" and "mergePairs" are not necessary.
More information may be found on the dada2 homepage: https://benjjneb.github.io/dada2/index.html (in particular the tutorials) or in the documentation of dada2's R package: https://bioconductor.org/packages/release/bioc/html/dada2.html (in particular the PDF, which contains the full documentation of all parameters).