planemo upload for repository https://github.com/HegemanLab/VKMZ commit 5e7a43415df3902b44b7623cb2c6ffb8845751ac
author eslerm
date Wed, 30 May 2018 13:17:32 -0400
parents 0b8ddf650752

# VKMZ version 1.0 

VKMZ is a metabolomics visualization tool that creates van Krevelen diagrams from mass spectrometry data. A van Krevelen diagram (VKD) plots a molecule on a scatterplot based on the molecule's oxygen to carbon ratio (O:C) against its hydrogen to carbon ratio (H:C). Classes of metabolites cluster together on a VKD [0]. Plotting a complex mixture of metabolites on a VKD gives a compact overview of untargeted metabolomics data.
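
As a rough illustration of the two ratios, the sketch below computes them for a simple molecular formula. The helper names `element_counts` and `vkd_coordinates` are hypothetical and are not part of VKMZ, and the parser only handles flat formulas such as `C6H12O6` (no parentheses, charges, or isotope labels).

```python
import re

def element_counts(formula):
    """Count atoms per element in a flat formula such as 'C6H12O6'."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts

def vkd_coordinates(formula):
    """Return (O:C, H:C) for a formula; formulas without carbon are rejected."""
    counts = element_counts(formula)
    carbon = counts.get("C", 0)
    if carbon == 0:
        raise ValueError("formula has no carbon: " + formula)
    return counts.get("O", 0) / carbon, counts.get("H", 0) / carbon

# Glucose has six carbons, twelve hydrogens, and six oxygens.
print(vkd_coordinates("C6H12O6"))  # (1.0, 2.0)
```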

For each feature in the data, VKMZ attempts to predict a molecular formula by comparing the feature's mass to a database of known formula masses through a binary search. Heuristically generated databases for labeled and unlabeled data are included with VKMZ [1]. A prediction is made when a known mass is within the mass error of the observed mass. VKMZ finds all predictions for an observed mass within the mass error. The prediction with the lowest delta (absolute difference between the observed and predicted mass) is plotted. Features without predictions are discarded. Low resolution data may yield too many predictions per feature to be useful, especially for large mass metabolites. A VKD is created from the predictions and output as a tabular file and an html file. Predictions and original feature information can be found in VKMZ's output.
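
The lookup described above can be sketched as follows. This is a minimal illustration and not VKMZ's actual code: the four-entry `DATABASE`, the `predict` function, and the choice to compute the tolerance from the observed mass are all assumptions made for the example.

```python
from bisect import bisect_left, bisect_right

# Hypothetical miniature database of (monoisotopic mass, formula), sorted by mass.
DATABASE = [
    (46.005479, "CH2O2"),
    (60.021129, "C2H4O2"),
    (180.063388, "C6H12O6"),
    (256.240230, "C16H32O2"),
]
MASSES = [mass for mass, _ in DATABASE]

def predict(observed_mass, ppm_error):
    """Return (formula, delta) pairs within the ppm window, lowest delta first."""
    # Assumption: the window is computed relative to the observed mass.
    tolerance = observed_mass * ppm_error / 1e6
    # Binary search for both edges of the mass window.
    low = bisect_left(MASSES, observed_mass - tolerance)
    high = bisect_right(MASSES, observed_mass + tolerance)
    candidates = [
        (formula, abs(mass - observed_mass)) for mass, formula in DATABASE[low:high]
    ]
    return sorted(candidates, key=lambda candidate: candidate[1])

matches = predict(180.0634, ppm_error=3)
# The first entry has the lowest delta; an empty list means the feature is discarded.
print(matches[0][0] if matches else None)
```

In this sketch a 3 ppm error at a mass near 180 gives a window of roughly ±0.0005, which is why a larger error setting or a larger mass admits more candidate formulas.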

VKMZ can be used as a command line tool or on the Galaxy web platform [2]. A Galaxy wrapper for VKMZ is maintained in this repository. VKMZ was developed on the Workflow4Metabolomics version of Galaxy [3].

## Using VKMZ from command line

VKMZ is designed to use data processed by XCMS [4] as input. Tabular data can also be used as input.

### Input modes

VKMZ has three modes:
  1. `xcms` mode reads features from XCMS data
  2. `tsv` mode reads a specially formatted tabular file
  3. `plot` mode replots VKMZ tabular data

Select a mode by declaring it as the first argument to `vkmz.py`.

> **Example:**
> ```
> python vkmz.py xcms [options]
> ```

Different modes allow different parameters.

### All modes

All modes require an output parameter:
  * `--output [FILENAME]`
    * A `.tsv` and `.html` file will be generated by VKMZ with the given filename

All modes allow these optional parameters:
  * `--plot-type [scatter-2d]`
    * There is currently only one plot type
    * Default is `scatter-2d`
  * `--size [INTEGER]`
    * Set base size of marker dots on VKD
    * Default size is 5
  * `--size-algorithm [{1,2}]`
    * Choose one of the following algorithms:
      * 1: Sets all markers to the base size specified by `--size`
        * Default
      * 2: Marker sizes are relative to feature's log intensity

#### xcms and tsv modes

Both xcms and tsv modes require the mass error, in parts-per-million, of the mass spectrometer which generated the data:
  * `--error [PPM_ERROR_NUMBER]`
    * It is critical to set the error correctly

There are several optional parameters for xcms and tsv modes:
  * `--no-adjustment`
    * Using this flag disables nominal mass adjustment
    * Without this flag VKMZ adjusts feature masses by adding or removing the mass of a proton based on the feature's polarity
  * `--database [DATABASE_FILE_PATH]`
    * Default is BMRB's monoisotopic heuristically generated database
    * This path is relative to the tool directory (see `--directory`)
  * `--directory [TOOL_PATH]`
    * Explicitly define tool directory
    * Sets root directory for database file path
  * `--no-plot`
    * Disable html output
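
The adjustment that `--no-adjustment` disables can be pictured with the sketch below. It assumes singly charged ions that gained a proton in positive mode or lost one in negative mode. The function name `neutral_mass` and the polarity labels `"positive"` and `"negative"` are hypothetical and may not match what VKMZ uses internally.

```python
PROTON_MASS = 1.007276  # mass of a proton in daltons

def neutral_mass(mz, polarity):
    """Estimate the neutral mass of a singly charged feature from its m/z."""
    if polarity == "positive":
        # Assumed [M+H]+ ion: remove the proton that was gained.
        return mz - PROTON_MASS
    if polarity == "negative":
        # Assumed [M-H]- ion: add back the proton that was lost.
        return mz + PROTON_MASS
    raise ValueError("unknown polarity: " + repr(polarity))

# Both calls recover approximately 180.063388, the monoisotopic mass of C6H12O6.
print(neutral_mass(181.070664, "positive"))
print(neutral_mass(179.056112, "negative"))
```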

#### xcms mode

xcms mode requires three tabular files generated by XCMS:
  * `--data-matrix [XCMS_DATA_MATRIX_FILE]`
  * `--sample-metadata [XCMS_SAMPLE_METADATA_FILE]`
  * `--variable-metadata [XCMS_VARIABLE_METADATA_FILE]`

##### xcms mode example:
```
python vkmz.py xcms --data-matrix test-data/datamatrix.tabular --sample-metadata test-data/sampleMetadata.tabular --variable-metadata test-data/variableMetadata.tabular --output report --error 3
```

#### tsv mode

tsv mode requires a tabular file of a specific format as input:
  * `--input [TSV_FILE]`

The first five columns of the input tabular file must be:
>| sample ID | polarity | mz | retention time | intensity |
>|-----------|----------|----|----------------|-----------|
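
One way to produce a file with this column order is sketched below. The feature rows, sample names, polarity labels, and retention time units are invented for illustration, and `format_features` is a hypothetical helper. Whether VKMZ expects a header line is not stated here, so the sketch writes data rows only.

```python
import csv
import io

# Column order required for the first five columns.
COLUMNS = ["sample ID", "polarity", "mz", "retention time", "intensity"]

def format_features(rows):
    """Render feature rows as tab-separated text, one feature per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for row in rows:
        if len(row) < len(COLUMNS):
            raise ValueError("each row needs at least %d columns" % len(COLUMNS))
        writer.writerow(row)
    return buffer.getvalue()

# Hypothetical features; the values are placeholders, not real measurements.
rows = [
    ("sample_1", "positive", 181.070664, 75.2, 120000.0),
    ("sample_1", "negative", 179.056112, 75.4, 98000.0),
]
print(format_features(rows), end="")
```

The returned text can be written to a file and passed to `--input`.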

#### plot mode

plot mode reads a previously generated VKMZ tabular file to create a VKD html file.

Specifying the VKMZ tabular file is required:
  * `--input [VKMZ_TSV_FILE]`

## Special thanks to

Adrian, Art, Eric, Jerry, Kevin, Renata, Stephen, Tim, and Yuan.

## Citations

0. Brockman et al. [doi:10.1007/s11306-018-1343-y](https://doi.org/10.1007/s11306-018-1343-y)
1. Hegeman et al. [doi:10.1021/ac070346t](https://doi.org/10.1021/ac070346t)
2. [Galaxy Project](https://galaxyproject.org/)
3. [Workflow4Metabolomics](http://workflow4metabolomics.org/)
4. Smith et al. [doi:10.1021/ac051437y](https://www.ncbi.nlm.nih.gov/pubmed/16448051)