Description

Collector's curve calculation based on mothur's collect.single command.

Collector's curves can be calculated using calculators that describe the richness, diversity, and other features of individual samples. A collector's curve describes how richness or diversity changes as you sample additional individuals. If a collector's curve becomes parallel to the x-axis, you can be reasonably confident that you have sampled adequately and can trust the last value in the curve. Otherwise, you need to keep sampling. For calculator parameter choices, see: mothur_wiki
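
To make the idea concrete, here is a minimal Python sketch of a collector's curve. It is not mothur's implementation: the function names, the toy sample, and the use of the bias-corrected Chao1 formula S_obs + n1(n1 - 1) / (2(n2 + 1)) are assumptions made for illustration, and mothur's calculators may differ in detail.

```python
from collections import Counter


def chao1(otu_counts):
    """Bias-corrected Chao1 richness estimate from a mapping OTU -> count."""
    sobs = len(otu_counts)
    n1 = sum(1 for c in otu_counts.values() if c == 1)  # singletons
    n2 = sum(1 for c in otu_counts.values() if c == 2)  # doubletons
    return sobs + n1 * (n1 - 1) / (2 * (n2 + 1))


def collectors_curve(otu_ids, freq=1):
    """Return (number sampled, observed OTUs, Chao1) every `freq` individuals.

    The final individual is always reported, even off the `freq` grid.
    """
    counts = Counter()
    curve = []
    for i, otu in enumerate(otu_ids, start=1):
        counts[otu] += 1
        if i % freq == 0 or i == len(otu_ids):
            curve.append((i, len(counts), chao1(counts)))
    return curve


# Hypothetical sample: the OTU assigned to each sequence, in sampling order.
sample = ["A", "A", "B", "A", "C", "B", "A", "B", "C", "A"]
for n_sampled, sobs, estimate in collectors_curve(sample, freq=2):
    print(n_sampled, sobs, round(estimate, 2))
```

When the printed values stop changing as the number sampled grows, the curve has flattened in the sense described above.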

Input

OTU list: rabund, sabund, list or shared format

Parameters

Labels - OTU labels

Select the labels you want the collector's curve calculated for (e.g. lines labelled 0.03). By default, collector's curves are calculated for all labels listed.

Calculators

The available calculators are listed below (see mothur_wiki for a more detailed description). Select the calculators to use for calculating collector's curves. Default selection: chao (community richness), and invsimpson and npshannon (community diversity; npshannon is non-parametric).

Calculator	Category	Description
chao	Community richness	the Chao1 estimator
invsimpson	Community diversity	the inverse Simpson index
npshannon	Community diversity	the non-parametric Shannon index
ace	Community richness	the ACE estimator
bootstrap	Community richness	the bootstrap estimator
jack	Community richness	the jackknife estimator
sobs	Community richness	the observed richness
simpsoneven	Community evenness	a Simpson index-based measure of evenness
shannoneven	Community evenness	a Shannon index-based measure of evenness
heip	Community evenness	Heip's metric of community evenness
smithwilson	Community evenness	Smith and Wilson's metric of community evenness
bergerparker	Community diversity	the Berger-Parker index
coverage	Community diversity	the sampling coverage
goodscoverage	Community diversity	Good's estimate of sampling coverage
simpson	Community diversity	the Simpson index
qstat	Community diversity	the Q statistic
shannon	Community diversity	the Shannon index
boneh	Estimator	Boneh's estimator
efron	Estimator	Efron's estimator
shen	Estimator	Shen's estimator
solow	Estimator	Solow's estimator
logseries	Statistical distribution	tests whether observed data follow the log series distribution
geometric	Statistical distribution	tests whether observed data follow the geometric series distribution
bstick	Statistical distribution	tests whether observed data follow the broken stick distribution
nseqs	Utility	the number of sequences in a sample
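
As a rough illustration of what some of the diversity calculators measure, the following Python sketch computes the Simpson, inverse Simpson, and Shannon indices from a list of OTU abundances. The function names and the example counts are hypothetical, and the formulas are the common textbook forms (Simpson as the sum of n_i(n_i - 1) over N(N - 1), Shannon with the natural logarithm); mothur's own calculators may apply different conventions or corrections, so see mothur_wiki for the authoritative definitions.

```python
import math


def simpson(counts):
    """Simpson index D = sum(n_i * (n_i - 1)) / (N * (N - 1))."""
    total = sum(counts)
    if total < 2:
        raise ValueError("need at least two individuals")
    return sum(c * (c - 1) for c in counts) / (total * (total - 1))


def inverse_simpson(counts):
    """Inverse Simpson index 1 / D; infinite when every OTU is a singleton."""
    d = simpson(counts)
    return float("inf") if d == 0 else 1.0 / d


def shannon(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over OTUs with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical abundances for three OTUs in one sample.
abundances = [5, 3, 2]
print(simpson(abundances), inverse_simpson(abundances), shannon(abundances))
```

A more even community gives a lower Simpson index and therefore a higher inverse Simpson value.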

Optional advanced parameters

ACE estimator threshold: By default, the ACE estimator uses 10 as the cutoff between rare and abundant OTUs, so an OTU with more than 10 individuals is considered abundant. This is an empirical choice that follows the lead of Anne Chao and others, who use 10 in their software. If you would like to use a different cutoff, use the abund option.
Sample size: mothur includes a set of calculators that predict the number of additional OTUs that will be observed for a given sample size. By default, these calculators base the prediction on a sample that is the same size as the initial sampling. If you would like to use a different sample size, use the size option. The value of size should be between 1 and the size of the initial sampling.
Frequency: For larger datasets, you might not be interested in obtaining data for every sequence sampled. For instance, if you have 100,000 sequences, you may only want to output the data every 100 sequences. Alternatively, if you only have 100 sequences, you may want to output all of the data. The default setting is to output data every 100 sequences.
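
The threshold and frequency settings can be sketched in Python as follows. This only illustrates the behaviour described above and is not mothur code: the function names are hypothetical, and reporting the final sample size when it does not fall on a multiple of the frequency is an assumption.

```python
def split_by_abundance(otu_counts, abund=10):
    """Partition OTUs into rare (count <= abund) and abundant (count > abund)."""
    rare = {otu: c for otu, c in otu_counts.items() if c <= abund}
    abundant = {otu: c for otu, c in otu_counts.items() if c > abund}
    return rare, abundant


def output_points(n_sequences, freq=100):
    """Sample sizes at which curve values would be written.

    Assumption: the final sample size is always included, even when it is
    not a multiple of `freq`.
    """
    points = list(range(freq, n_sequences + 1, freq))
    if not points or points[-1] != n_sequences:
        points.append(n_sequences)
    return points


# Hypothetical OTU counts: only Otu2 exceeds the default cutoff of 10.
rare, abundant = split_by_abundance({"Otu1": 10, "Otu2": 11, "Otu3": 1})
print(sorted(rare), sorted(abundant))
print(output_points(350, freq=100))
```

Lowering the frequency value gives a smoother curve at the cost of a larger output file.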

Output

Please note that the number of outputs depends on the number of selected calculators. Each selected calculator produces an extra output, indicated by the calculator name in brackets at the end of the output's filename. If the outputs for the selected calculators do not appear in the History panel, refresh your history by clicking the refresh icon.

A summary file in table format containing the following fields: the number of sequences, the sample coverage, the number of observed OTUs, the chao richness estimate, the invsimpson diversity estimate, and the npshannon non-parametric diversity estimate. The summary gives the value of each field when all available data are used.
This is followed by one file for each selected calculator (indicated by the calculator's name in brackets at the end of the output's filename), which can be plotted as a collector's curve and used to evaluate how the calculator's result changes with sampling effort.

Use Galaxy's integrated visualization tool to plot the collector's curve. The visualization tool is accessible via the 'Visualize' icon in the extended dataset information area. After launching the integrated visualization tool, select the 'Data Controls' tab. There, select column 1 (number sampled) as 'Data column for X' and column 2 (minimum identity, according to the selected labels) as 'Data column for Y'.
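
If you prefer to inspect a calculator output outside Galaxy, the sketch below reads the same two columns from tab-delimited text and applies a simple check for whether the curve has flattened. Several things here are assumptions rather than documented behaviour: the file is assumed to have one header row, the header names and values in the example are made up, and the flatness rule (the last few values vary by no more than 1% of the final value) is an arbitrary heuristic that is not part of mothur or Galaxy.

```python
import csv
import io


def read_curve(text, x_col=0, y_col=1):
    """Parse tab-delimited text into (number sampled, value) pairs."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader)  # skip the assumed header row
    return [(int(row[x_col]), float(row[y_col])) for row in reader if row]


def has_levelled_off(curve, last_n=3, tolerance=0.01):
    """Heuristic: the last `last_n` values span at most `tolerance` of the final value."""
    if len(curve) < last_n:
        return False
    tail = [value for _, value in curve[-last_n:]]
    return (max(tail) - min(tail)) <= tolerance * abs(tail[-1])


# Made-up example content for one calculator output at label 0.03.
example = (
    "numsampled\t0.03\n"
    "100\t40.0\n"
    "200\t55.0\n"
    "300\t59.8\n"
    "400\t60.0\n"
    "500\t60.1\n"
)
curve = read_curve(example)
print(curve[-1], has_levelled_off(curve))
```

A curve that fails this check suggests that more sampling is needed before the final value can be trusted.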

Resources

mothur

Author

Patrick D. Schloss (pschloss@umich.edu)

Wrapper Author

QFAB Bioinformatics (support@qfab.org) based on jjohnson mothur_toolsuite wrapper