Mercurial repository eslerm/vkmz: diff README @ 6:35b984684450 draft
planemo upload for repository https://github.com/HegemanLab/VKMZ commit 5ef8d2b36eb35ff5aad5d5e9b78c38405fc95c1a
| Field | Value |
|---|---|
| author | eslerm |
| date | Tue, 10 Jul 2018 17:58:35 -0400 |
| parents | 04079c34452a |
| children | b0ce669ce794 |
````diff
--- a/README Thu May 31 12:06:20 2018 -0400
+++ b/README Tue Jul 10 17:58:35 2018 -0400
@@ -1,72 +1,31 @@
-# VKMZ version 1.0
+# VKMZ version 1.2.0
+
+VKMZ is a metabolomics prediction and vizualization tool which creates van Krevelen diagrams from mass spectrometry data. A van Krevelen diagram (VKD) plots a molecule on a 2D scatterplot based on the molecule's oxygen to carbon ratio (O:C) against it's hydrogen to carbon ratio (H:C). Classes of metabolites cluster together on a VKD [0]. Plotting a complex mixture of metabolites on a VKD can be used to briefly convey untargeted metabolomics data.
 
-VKMZ is a metabolomics vizualization tool which creates van Krevelen diagrams from mass spectrometry data. A van Krevelen diagram (VKD) plots a molecule on a scatterplot based on the molecule's oxygen to carbon ratio (O:C) against it's hydrogen to carbon ratio (H:C). Classes of metabolites cluster together on a VKD [0]. Plotting a complex mixture of metabolites on a VKD can be used to briefly convey untargeted metabolomics data.
+VKMZ attempts to predict a molecular formula for each feature in LC-MS data. Each feature's mass is compared to a database of known formula masses. A prediction is made when a known mass is within the mass error range of an feature's uncharged (neutral) mass. A binary search algorithm is used to quickly make matches. Heristically generated databases for labeled and unlabeled metabolites are included [1]. VKMZ finds all predictions for an observed mass within the mass error. The prediction with the lowest delta (absolute difference between an feature's neutral mass and the predicted mass) is plotted. Features without predictions are discarded. Outputed is saved as a tabular and html file.
 
-For each feature in the data VKMZ attempts to predict a molecular formula by comparing the feature's mass to a database of known formula masses though a binary search. Heristically generated databases for labeled and unlabeled data are included with VKMZ [1]. A prediction is made when a known mass is within a mass error of the observed mass. VKMZ finds all predictions for an observed mass within the mass error. The prediction with the lowest delta (absolute difference between an observed and predicted mass) is plotted. Features without predictions are discarded. Using low resolution data may result in finding too many predictions per feature to be useful, especially for large mass metabolites. A VKD is created from predictions and outputed as a tabular and html file. Predictions and original feature information can be found in VKMZ' output.
+This software works best with, accurate, high resolution LC-MS data. A well calibrated LC-MS is essential for correct predictions. It is best to emperically derive mass error etiher from the data or from data using the same methods and spiked standards. Using low resolution data will result in false positive predictions, especially for large mass metabolites.
 VKMZ can be used as a command line tool or on the Galaxy web platform [2]. A Galaxy wrapper for VKMZ is maintatined in this repository. VKMZ was developed on the Workflow4Metabolomics version of Galaxy [3].
 
-## Using VKMZ from command line
-
-VKMZ is designed to use data processed by XCMS [4] as input. Tabular data can also be used as input.
+## Using VKMZ command line
 
 ### Input modes
 
-VKMZ has three modes:
+VKMZ has two input modes:
 
  1. `xcms` mode reads features from XCMS data
  2. `tsv` mode reads a specially formatted tabular file
- 3. `plot` mode replots VKMZ tabular data
 
 Select a mode by declaring it as the first argument to `vkmz.py`.
 
 > **Example:**
 > ```
-> python vkmz.py xcms [options]
+> python vkmz.py xcms [other parameters]
 > ```
 
 Different modes allow different parameters.
 
````
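The prediction step the README describes (a feature's neutral mass compared against known formula masses by binary search, every match inside the ppm error window kept, and the lowest-delta match plotted) can be sketched in Python as follows. This is an illustrative sketch, not VKMZ's source: the function names, the four-formula toy database standing in for the bundled heuristically generated databases, and the choice to scale the ppm window by the feature's neutral mass are assumptions. It also computes the O:C and H:C ratios that position a formula on a VKD, plus a signed ppm deviation of the kind one might compute for spiked standards when deriving the mass error empirically.

```python
import bisect
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
}


def parse_formula(formula):
    """Count atoms in a simple formula string such as 'C9H11NO2'."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def formula_mass(formula):
    """Monoisotopic neutral mass of a formula."""
    return sum(MONOISOTOPIC_MASS[el] * n for el, n in parse_formula(formula).items())


def build_database(formulas):
    """Return parallel lists of masses and formulas, sorted by mass for binary search."""
    pairs = sorted((formula_mass(f), f) for f in formulas)
    return [mass for mass, _ in pairs], [formula for _, formula in pairs]


def find_predictions(neutral_mass, masses, formulas, ppm_error):
    """Return every (formula, delta) within ppm_error, closest match first.

    Assumption: the ppm window is scaled by the feature's neutral mass.
    An empty list means no prediction, so the feature would be discarded.
    """
    tolerance = neutral_mass * ppm_error / 1e6
    # Two binary searches bound every database mass inside the error window.
    lo = bisect.bisect_left(masses, neutral_mass - tolerance)
    hi = bisect.bisect_right(masses, neutral_mass + tolerance)
    matches = [(formulas[i], abs(masses[i] - neutral_mass)) for i in range(lo, hi)]
    return sorted(matches, key=lambda match: match[1])


def van_krevelen_ratios(formula):
    """Return (O:C, H:C) for a formula containing carbon."""
    counts = parse_formula(formula)
    carbon = counts["C"]
    return counts.get("O", 0) / carbon, counts.get("H", 0) / carbon


def ppm_deviation(observed, theoretical):
    """Signed mass error in ppm, e.g. for a spiked standard of known mass."""
    return (observed - theoretical) / theoretical * 1e6


if __name__ == "__main__":
    # Hypothetical toy database; VKMZ ships its own heuristically generated ones.
    masses, formulas = build_database(["C6H12O6", "C6H14O6", "C5H10O5", "C9H11NO2"])
    predictions = find_predictions(180.0634, masses, formulas, ppm_error=3)
    if predictions:
        best_formula, delta = predictions[0]  # lowest delta is the one plotted
        print(best_formula, delta, van_krevelen_ratios(best_formula))
```

Because the window scales with mass, 3 ppm spans roughly ±0.0005 Da at 180 Da but ±0.003 Da at 1000 Da, which is one reason low resolution data yields more false positive predictions for large metabolites.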
````diff
-### All modes
-
-All modes require an output parameter:
- * `--output [FILENAME]`
-   * A `.tsv` and `.html` file will be generated by VKMZ with the given filename
-
-All modes allow these optional parameters:
- * `--plot-type [scatter-2d]`
-   * There is currently only one plot type
-   * Default is `scatter-2d`
- * `--size [INTEGER]`
-   * Set base size of marker dots on VKD
-   * Default size is 5
- * `--size-algorithm [{1,2}]`
-   * Choose one of the following algorithms:
-     * 1: Sets all markers to the base size specified by `--size`
-       * Default
-     * 2: Marker sizes are relative to feature's log intensity
-
-#### xcms and tsv modes
-
-Both xcms and tsv mode require the mass error, in parts-per-million, of the mass spectrometer which generated the data:
- * `--error [PPM_ERROR_NUMBER]`
-   * It is critical to set the error correctly
-
-There are several optional parameters for xcms and tsv modes:
- * `--no-adjustment`
-   * Using this flag disables nominal mass adjustment
-   * Without this flag VKMZ adjusts feature masses by adding or removing that mass of a proton based on the features polarity
- * `--database [DATABASE_FILE_PATH]`
-   * Default is BMRB's monoisotopic heuristically generated database
-   * This path is relative
- * `--directory [TOOL_PATH]`
-   * Explicitly define tool directory
-   * Sets root directory for database file path
- * `--polarity`
-   * Set polarity for all samples overriding input files
- * `--unique`
-   * Remove features with multiple predictions
- * `--no-plot`
-   * Disable html output
+### Required parameters
 
 #### xcms mode
 
@@ -77,24 +36,56 @@
 
 ##### xcms mode example:
 ```
-python vkmz.py xcms --data-matrix test-data/datamatrix.tabular --sample-metadata test-data/sampleMetadata.tabular --variable-metadata test-data/variableMetadata.tabular --output report --error 3
+python vkmz.py xcms --data-matrix test-data/datamatrix.tabular --sample-metadata test-data/sampleMetadata.tabular --variable-metadata test-data/variableMetadata.tabular [other parameters]
 ```
 
````
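Naming the mode as the first argument to `vkmz.py`, followed by mode-specific and shared flags, maps naturally onto `argparse` subcommands. The parser below is a hypothetical reconstruction, not VKMZ's actual argument handling: the flag names come from the README, while the types, defaults and which flags are marked required are assumptions. The example arguments reuse the old README's xcms command.

```python
import argparse


def build_parser():
    """Hypothetical vkmz.py-style parser: the mode is the first positional argument."""
    parser = argparse.ArgumentParser(prog="vkmz.py")
    modes = parser.add_subparsers(dest="mode", required=True)

    xcms = modes.add_parser("xcms", help="read features from XCMS data")
    xcms.add_argument("--data-matrix", required=True)
    xcms.add_argument("--sample-metadata", required=True)
    xcms.add_argument("--variable-metadata", required=True)

    tsv = modes.add_parser("tsv", help="read a specially formatted tabular file")
    tsv.add_argument("--input", required=True)

    # Parameters shared by both modes (assumed required/optional split).
    for mode_parser in (xcms, tsv):
        mode_parser.add_argument("--error", type=float, required=True,
                                 help="mass error in ppm")
        mode_parser.add_argument("--output", required=True,
                                 help="base name for the .tsv and .html files")
        mode_parser.add_argument("--database",
                                 help="database path, relative to --directory")
        mode_parser.add_argument("--directory", default="", help="tool directory")
        mode_parser.add_argument("--polarity", choices=["positive", "negative"])
        mode_parser.add_argument("--neutral", action="store_true",
                                 help="disable charged mass adjustment")
        mode_parser.add_argument("--unique", action="store_true",
                                 help="drop features with multiple predictions")
    return parser


if __name__ == "__main__":
    # Arguments taken from the old README's xcms example command.
    example = [
        "xcms",
        "--data-matrix", "test-data/datamatrix.tabular",
        "--sample-metadata", "test-data/sampleMetadata.tabular",
        "--variable-metadata", "test-data/variableMetadata.tabular",
        "--output", "report",
        "--error", "3",
    ]
    print(build_parser().parse_args(example))
```

With subcommands, each mode only accepts its own input flags, so passing `--input` to `xcms` is rejected by the parser rather than silently ignored.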
````diff
 #### tsv mode
 
 tsv mode requires a tabular file of a specific format as input:
- * `--input [TSV FILE]`
+ * `--input [TSV_FILE]`
 
 The first five columns of the input tabular file must be:
->| sample ID | polarity | mz | retention time | intensity |
->|-----------|----------|----|----------------|-----------|
+>| sample_id | polarity | mz | rt | intensity |
+>|-----------|----------|----|----|-----------|
+
+
+#### All modes
+
+Mass error of LC-MS in parts-per-million:
+ * `--error [PPM_ERROR_NUMBER]`
+   * It is critical to set the mass error correctly
 
-#### plot mode
+Output name:
+ * `--output [FILENAME]`
+   * A `.tsv` and `.html` file will be generated by VKMZ with the given filename
+
+### Optional parameters
+
+Database:
+ * `--database [DATABASE_FILE_PATH]`
+   * Default is BMRB's monoisotopic heuristically generated database
+   * Path is relative to `--directory`
 
-plot mode reads a previously generated VKMZ tabular files to create VKD html files.
+Directory:
+ * `--directory [TOOL_PATH]`
+   * Explicitly define tool directory
+   * Paths are relative if unset
+   * Affects database and web page template paths
 
-Specifying the VKMZ tabular file is required:
- * `--input [VKMZ_TSV_FILE]`
+Forced Polarity:
+ * `--polarity [positive|negative]`
+   * Set all features to have either a positive or negative polarity
+   * Overrides input files polarity information
+   * Do not use this parameter on data containing both polarities
+
+Neutral:
+ * `--neutral`
+   * Using this flag disables charged mass adjustment
+   * Without this flag VKMZ adjusts a feature mass by adding or removing that mass of a proton based on the features charged polarity
+
+Unique:
+ * `--unique`
+   * Remove features with multiple predictions from output
 
 ## Special thanks to
````
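The tsv input format and the new `--polarity`, `--neutral` and `--unique` options could be handled roughly as below. This sketch assumes the file has a header row, that the `polarity` column holds `positive` or `negative`, and that every feature is a singly charged [M+H]+ or [M-H]- ion, so its neutral mass is the m/z minus or plus one proton mass (about 1.007276 Da). These details and all function names are assumptions, not VKMZ's documented behaviour; the example row is made up.

```python
import csv
import io

PROTON_MASS = 1.007276  # Da


def neutral_mass(mz, polarity):
    """Convert a singly charged m/z to a neutral mass (assumes [M+H]+ or [M-H]-)."""
    if polarity == "positive":
        return mz - PROTON_MASS  # [M+H]+ carries an extra proton
    if polarity == "negative":
        return mz + PROTON_MASS  # [M-H]- has lost a proton
    raise ValueError(f"unknown polarity: {polarity!r}")


def read_tsv_features(handle, forced_polarity=None, neutral=False):
    """Read features whose first five columns are sample_id, polarity, mz, rt, intensity."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # assumption: the first row is a header
    features = []
    for row in reader:
        sample_id, polarity, mz, rt, intensity = row[:5]
        polarity = forced_polarity or polarity  # --polarity overrides the file
        mz = float(mz)
        features.append({
            "sample_id": sample_id,
            "polarity": polarity,
            "mz": mz,
            "rt": float(rt),
            "intensity": float(intensity),
            # --neutral: treat mz as already neutral and skip the proton adjustment
            "mass": mz if neutral else neutral_mass(mz, polarity),
        })
    return features


def keep_unique(predictions):
    """--unique: keep only features with exactly one predicted formula."""
    return {feature: formulas for feature, formulas in predictions.items()
            if len(formulas) == 1}


if __name__ == "__main__":
    # Made-up single-feature input for illustration.
    example = "sample_id\tpolarity\tmz\trt\tintensity\ns1\tpositive\t181.070664\t120.5\t1000\n"
    for feature in read_tsv_features(io.StringIO(example)):
        print(feature["sample_id"], round(feature["mass"], 6))
```

The README's warning against `--polarity` on mixed-polarity data follows from this step: forcing one polarity applies the wrong proton correction to every feature of the other polarity, shifting its mass by about 2.015 Da.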