VKMZ 1.4dev2

VKMZ is a metabolomics prediction and visualization tool that creates van Krevelen diagrams from mass spectrometry data. A van Krevelen diagram (VKD) places each molecule on a scatterplot by its oxygen-to-carbon ratio (O:C) against its hydrogen-to-carbon ratio (H:C). Classes of metabolites cluster together on a VKD [0]. Plotting a complex mixture of metabolites on a VKD gives a compact overview of untargeted metabolomics data.
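As an illustration of how a molecule gets its position on a VKD, the following minimal sketch computes the O:C and H:C ratios from a molecular formula. The function names and the simple formula parser (no brackets, charges, or isotope labels) are assumptions made for this example and are not taken from the VKMZ source.

```python
import re


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6'.

    Hypothetical helper: handles element symbols followed by an optional
    count, but not brackets, charges, or isotope labels.
    """
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def van_krevelen_coordinates(formula):
    """Return (O:C, H:C) for a formula, or None when it contains no carbon."""
    counts = parse_formula(formula)
    carbon = counts.get("C", 0)
    if carbon == 0:
        return None  # the ratios are undefined without carbon
    return counts.get("O", 0) / carbon, counts.get("H", 0) / carbon


glucose = van_krevelen_coordinates("C6H12O6")         # sugar: high O:C
palmitic_acid = van_krevelen_coordinates("C16H32O2")  # fatty acid: low O:C
```

Here the sugar and the fatty acid share the same H:C ratio but differ widely in O:C, which is the kind of separation that lets metabolite classes cluster on the diagram.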

Documentation

Input Data

VKMZ is designed to use W4M-XCMS [1] or tabular data as input.

W4M mode requires three files which W4M's XCMS wrapper generates: the data matrix, sample metadata, and variable metadata files.

Tabular mode requires a tab-delimited file whose first five columns are: sample_id, polarity, mz, retention_time, and intensity.
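A minimal sketch of reading such a file with Python's standard csv module is shown below. The function name, the has_header flag, and the assumption that the file starts with a header row are illustrative only. The sample values are made up.

```python
import csv
import io

# Column order required by tabular mode.
COLUMNS = ("sample_id", "polarity", "mz", "retention_time", "intensity")


def read_features(handle, has_header=True):
    """Read features from a tab-delimited file handle.

    Only the first five columns are used; extra columns are ignored.
    has_header is an assumption of this sketch, not a VKMZ option.
    """
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row
    features = []
    for row in reader:
        if len(row) < len(COLUMNS):
            continue  # skip blank or incomplete lines
        sample_id, polarity, mz, rt, intensity = row[:5]
        features.append({
            "sample_id": sample_id,
            "polarity": polarity,
            "mz": float(mz),
            "retention_time": float(rt),
            "intensity": float(intensity),
        })
    return features


# Made-up example data standing in for a real input file.
example = io.StringIO(
    "sample_id\tpolarity\tmz\tretention_time\tintensity\n"
    "sampleA\tpositive\t181.070665\t42.5\t150000\n"
    "sampleB\tnegative\t179.056112\t40.1\t98000\n"
)
features = read_features(example)
```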

Advanced Input Options

Override polarity sets the polarity of all features to either Positive or Negative. Use it when the input does not contain correct polarity information. Do not use it when the data contains both positive and negative polarity features.

Neutral disables the mass adjustment of features. Use it when the input contains neutral (uncharged) exact masses instead of m/z.
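The mass adjustment that the Neutral option switches off can be sketched as follows. This example assumes singly charged ions formed by gaining or losing one proton, which is a common simplification and not necessarily what VKMZ does internally. The function name, the neutral parameter, and the accepted polarity spellings are hypothetical.

```python
PROTON_MASS = 1.007276  # approximate mass of a proton in daltons


def uncharged_mass(mz, polarity, neutral=False):
    """Estimate the uncharged mass of a feature from its m/z.

    Assumes a single charge from one gained (positive mode) or lost
    (negative mode) proton. With neutral=True the value is returned
    unchanged, mirroring the Neutral option.
    """
    if neutral:
        return mz
    polarity = polarity.lower()
    if polarity == "positive":
        return mz - PROTON_MASS  # remove the added proton
    if polarity == "negative":
        return mz + PROTON_MASS  # restore the lost proton
    raise ValueError(f"unknown polarity: {polarity!r}")


adjusted = uncharged_mass(181.070665, "positive")
unchanged = uncharged_mass(180.063389, "positive", neutral=True)
```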

Prediction Options

For each feature, VKMZ attempts to predict a molecular formula by comparing the feature's uncharged mass to a database of known formula masses. A prediction is made when a known mass lies within the specified mass error of the observed, uncharged mass. VKMZ finds all such predictions for each observed mass. The prediction with the lowest delta (absolute difference between observed and known mass) is plotted. Features without predictions are discarded. Low-resolution data may yield too many predictions per feature to be useful, especially for high-mass metabolites.

Mass error sets the mass error in parts per million (ppm). The appropriate value depends on your mass spectrometer, its calibration, and your methods. It can be approximated by running similar methods on targeted standards that span a range of masses.
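The prediction step and the ppm mass error can be sketched together as below. This is an illustration under stated assumptions, not the VKMZ implementation: the three-entry database is a toy stand-in with approximate monoisotopic masses, the function name is hypothetical, and the tolerance is taken relative to the observed mass.

```python
from bisect import bisect_left, bisect_right

# Toy database sorted by mass: (approximate monoisotopic mass, formula).
DATABASE = [
    (180.042259, "C9H8O4"),
    (180.063388, "C6H12O6"),
    (180.078644, "C10H12O3"),
]


def predict(mass, database, ppm):
    """Return all (known_mass, formula, delta) hits within ppm of mass.

    Hits are sorted by delta, so the first one is the prediction that
    would be plotted. An empty list means the feature is discarded.
    """
    tolerance = mass * ppm / 1e6  # absolute window in daltons
    masses = [known for known, _ in database]
    low = bisect_left(masses, mass - tolerance)
    high = bisect_right(masses, mass + tolerance)
    hits = [
        (known, formula, abs(mass - known))
        for known, formula in database[low:high]
    ]
    return sorted(hits, key=lambda hit: hit[2])


narrow = predict(180.0634, DATABASE, ppm=10)   # one candidate
wide = predict(180.0634, DATABASE, ppm=100)    # two candidates
```

Widening the mass error from 10 to 100 ppm in this toy case adds a second candidate formula, which shows why low-resolution data can produce more predictions per feature than is useful.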

Database can be set to the provided heuristically generated databases for unlabeled and labeled molecules [2] or to a custom database.

Tabular Output

Tabular output contains the columns: sample_id, polarity, mz, rt (retention time), intensity, predictions (a list of lists, each containing the predicted mz, predicted formula, and predicted delta), hc (hydrogen-to-carbon ratio), oc (oxygen-to-carbon ratio), and nc (nitrogen-to-carbon ratio).

HTML Output

The HTML web page is an interactive van Krevelen diagram for exploring data.

Predicted features are plotted as circle symbols.

Min Size and Max Size set the minimum and maximum area of symbols (absolute scaling).

The Sizer dropdown sets the algorithm used to size each symbol.

Uniform sets the symbols of all features to the Max Size.

Relative Intensity sets the symbol size of each feature to the feature's intensity divided by the maximum intensity in the dataset, multiplied by the Max Size.

Relative Log Intensity sets the symbol size of each feature to the feature's log intensity divided by the maximum log intensity in the dataset, multiplied by the Max Size.
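The three Sizer algorithms can be written out as a short sketch. The function name and the sizer keys are hypothetical, and Min Size is left out because its exact role in the scaling is not specified here. The log variant assumes a positive intensity and a maximum intensity greater than 1.

```python
import math


def symbol_size(intensity, max_intensity, max_size, sizer="uniform"):
    """Return the symbol size for one feature under a given sizer."""
    if sizer == "uniform":
        # Every feature gets the maximum size.
        return max_size
    if sizer == "relative_intensity":
        return intensity / max_intensity * max_size
    if sizer == "relative_log_intensity":
        # A ratio of logarithms does not depend on the log base.
        return math.log(intensity) / math.log(max_intensity) * max_size
    raise ValueError(f"unknown sizer: {sizer!r}")


linear = symbol_size(50, 100, 20, "relative_intensity")
logarithmic = symbol_size(50, 100, 20, "relative_log_intensity")
```

Log scaling compresses the range of sizes, so a feature at half the maximum intensity is drawn larger than it would be under linear scaling.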

Threshold is a slider that removes low-intensity features. The slider scales exponentially.

Setting the slider to 50% removes features with intensities lower than 25% of the maximum intensity.

Setting the slider to 75% removes features with intensities lower than 50% of the maximum intensity.

x-axis and y-axis set each axis to an alternative elemental ratio.

Opacity sets the opacity of feature symbols.

Visible Samples sets the visibility of features from given sample IDs. Checked samples are visible. By default all samples are visible.