Evaluate expected vs. observed taxonomic composition of samples
This visualizer compares the feature composition of pairs of observed and
expected samples that share the same sample ID in two separate feature
tables. Typically, feature composition consists of taxonomy
classifications or other semicolon-delimited feature annotations. Taxon
accuracy rate, taxon detection rate, and linear regression scores between
expected and observed feature abundances are calculated at each
semicolon-delimited rank, and per-level accuracy and expected vs. observed
abundance correlations are plotted. A histogram of distances between
false-positive observations and the nearest expected feature is also
generated, where distance equals the number of ranks separating the
observed feature from its nearest common lineage with an expected feature.
This visualizer is most suitable for testing per-run data quality on
sequencing runs that contain mock communities or other samples with known
composition. It is also suitable for sanity checks of bioinformatics
pipeline performance.
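The per-rank evaluation and the false-positive distance described above can be illustrated with a small standalone sketch. It is not the plugin's implementation: the helper names `collapse` and `false_positive_distance` and the example taxa are hypothetical, and it assumes that collapsing to a rank means truncating each semicolon-delimited annotation to its first `depth` ranks and summing the relative frequencies of features that become identical. Whitespace around ranks (as in Greengenes-style `k__Bacteria; p__Firmicutes`) is stripped before comparison.

```python
from collections import defaultdict


def collapse(abundances, depth):
    """Hypothetical helper: truncate semicolon-delimited features to `depth`
    ranks and sum the relative frequencies of features that then coincide."""
    collapsed = defaultdict(float)
    for feature, freq in abundances.items():
        ranks = [r.strip() for r in feature.split(";")]
        collapsed[";".join(ranks[:depth])] += freq
    return dict(collapsed)


def false_positive_distance(observed_feature, expected_features):
    """Hypothetical helper: number of ranks between an observed feature and
    its nearest common lineage with any expected feature (0 = exact match)."""
    obs = [r.strip() for r in observed_feature.split(";")]
    best_shared = 0
    for exp_feature in expected_features:
        exp = [r.strip() for r in exp_feature.split(";")]
        shared = 0
        # Count leading ranks the two lineages have in common.
        for a, b in zip(obs, exp):
            if a != b:
                break
            shared += 1
        best_shared = max(best_shared, shared)
    return len(obs) - best_shared


# Illustrative data only (relative frequencies per feature).
expected = {
    "k__Bacteria;p__Firmicutes;c__Bacilli": 0.5,
    "k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria": 0.5,
}
observed = {
    "k__Bacteria;p__Firmicutes;c__Bacilli": 0.5,
    "k__Bacteria;p__Firmicutes;c__Clostridia": 0.25,
    "k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria": 0.25,
}

print(collapse(observed, 2))
false_positives = [f for f in observed if f not in expected]
print({f: false_positive_distance(f, expected) for f in false_positives})
```

Here the false-positive class `c__Clostridia` shares kingdom and phylum with an expected feature, so its distance is 1; a feature that diverged at phylum would have distance 2.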
Parameters
- expected_features : FeatureTable[RelativeFrequency]
- Expected feature compositions
- observed_features : FeatureTable[RelativeFrequency]
- Observed feature compositions
- depth : Int, optional
- Maximum depth of semicolon-delimited taxonomic ranks to test (e.g., 1 =
root, 7 = species for the Greengenes reference sequence database).
- palette : Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow'), optional
- Color palette to utilize for plotting.
- plot_tar : Bool, optional
- Plot taxon accuracy rate (TAR) on score plot. TAR is the number of true
positive features divided by the total number of observed features (TAR
= true positives / (true positives + false positives)).
- plot_tdr : Bool, optional
- Plot taxon detection rate (TDR) on score plot. TDR is the number of
true positive features divided by the total number of expected features
(TDR = true positives / (true positives + false negatives)).
- plot_r_value : Bool, optional
- Plot expected vs. observed linear regression r value on score plot.
- plot_r_squared : Bool, optional
- Plot expected vs. observed linear regression r-squared value on score
plot.
- plot_bray_curtis : Bool, optional
- Plot expected vs. observed Bray-Curtis dissimilarity scores on score
plot.
- plot_jaccard : Bool, optional
- Plot expected vs. observed Jaccard distance scores on score plot.
- plot_observed_features : Bool, optional
- Plot the number of observed features on score plot.
- plot_observed_features_ratio : Bool, optional
- Plot the ratio of observed to expected feature counts on score plot.
- metadata : MetadataColumn[Categorical], optional
- Optional sample metadata that maps observed_features sample IDs to
expected_features sample IDs.
Returns
- visualization : Visualization