Nonpareil uses the redundancy of the reads in metagenomic datasets to estimate the average coverage and predict the amount of sequences that will be required to achieve "nearly complete coverage".
Nonpareil outputs three files and one optional log file:
- Redundancy summary is a tab-delimited file with six columns. The first column indicates the sequencing effort (in number of reads), and the remaining columns indicate the summary of the distribution of redundancy (from the replicates, 1,024 by default) at the given sequencing effort. These five columns are: average redundancy, standard deviation, quartile 1, median (quartile 2), and quartile 3.
- Redundancy values is a tab-delimited file with three columns. Similar to the redundancy summary file, it contains information about the redundancy at each sequencing effort, but it provides ALL the results from the replicates, not only the summary at each point. The first column indicates the sequencing effort (as a fraction of the dataset), the second column indicates the ID of the replicate (a number used only to introduce some controlled noise in plots), and the third column indicates the estimated redundancy value.
- Mates distribution is a raw list with the number of reads in the dataset matching a query read. A set of query reads is randomly drawn by Nonpareil (1,000 by default), and compared against all reads in the dataset. Each line on this file corresponds to a query read (the order is not important). We have seen certain correspondance between these numbers and the distribution of abundances in the community (compared, for example, as rank-abundance plots), but this file is provided only for quality-control purposes and comparisons with other tools.
- Log is a verbose log of internal Nonpareil processing. The number to the left (inside squared brackets) indicate the CPU time (in minutes). This file also provide quality assessment of the Nonpareil run (automated consistency evaluation). Ideally, the last line should read "Everything seems correct". Otherwise, it suggests alternative parameters that may improve the estimation.
For more details about the tool, please check: http://nonpareil.readthedocs.io/en/latest/index.html