comparison README.md @ 1:b02af8eb8e6e draft

planemo upload for repository https://github.com/HegemanLab/VKMZ commit 5e7a43415df3902b44b7623cb2c6ffb8845751ac
author eslerm
date Wed, 30 May 2018 13:17:32 -0400
parents 0b8ddf650752
children
0:0b8ddf650752 1:b02af8eb8e6e
1 # VKMZ version 1.0 1 # VKMZ version 1.0
2 2
3 VKMZ is a metabolomics visualization tool which creates van Krevelen diagrams from mass spectrometry data. A van Krevelen diagram (VKD) plots a molecule on a scatterplot based on the molecule's oxygen to carbon ratio (O:C) against its hydrogen to carbon ratio (H:C). Classes of metabolites cluster together on a VKD [0]. Plotting a complex mixture of metabolites on a VKD can be used to briefly convey untargeted metabolomics data. 3 VKMZ is a metabolomics visualization tool which creates van Krevelen diagrams from mass spectrometry data. A van Krevelen diagram (VKD) plots a molecule on a scatterplot based on the molecule's oxygen to carbon ratio (O:C) against its hydrogen to carbon ratio (H:C). Classes of metabolites cluster together on a VKD [0]. Plotting a complex mixture of metabolites on a VKD can be used to briefly convey untargeted metabolomics data.
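
To make the two axes concrete, here is a minimal sketch of how a single molecular formula maps to a point on a VKD. The helper names `element_counts` and `van_krevelen_point` are hypothetical and are not taken from `vkmz.py`; the parser only handles flat formulas such as `C6H12O6` (no parentheses or charges).

```python
import re


def element_counts(formula):
    """Count atoms in a flat molecular formula such as 'C6H12O6'."""
    counts = {}
    # Each match is an element symbol followed by an optional count.
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def van_krevelen_point(formula):
    """Return (O:C, H:C) for a formula; both ratios need at least one carbon."""
    counts = element_counts(formula)
    carbon = counts.get("C", 0)
    if carbon == 0:
        raise ValueError("formula has no carbon, ratios are undefined")
    return counts.get("O", 0) / carbon, counts.get("H", 0) / carbon


glucose = van_krevelen_point("C6H12O6")        # sugar: O:C = 1.0, H:C = 2.0
palmitic_acid = van_krevelen_point("C16H32O2")  # lipid: O:C = 0.125, H:C = 2.0
```

The two example molecules share an H:C ratio but differ strongly in O:C, which is the kind of separation that lets metabolite classes cluster on the plot.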
4 4
5 VKMZ can be used as a standalone tool or on the Galaxy Project web platform [1]. 5 For each feature in the data VKMZ attempts to predict a molecular formula by comparing the feature's mass to a database of known formula masses through a binary search. Heuristically generated databases for labeled and unlabeled data are included with VKMZ [1]. A prediction is made when a known mass is within a mass error of the observed mass. VKMZ finds all predictions for an observed mass within the mass error. The prediction with the lowest delta (absolute difference between an observed and predicted mass) is plotted. Features without predictions are discarded. Using low resolution data may result in finding too many predictions per feature to be useful, especially for large mass metabolites. A VKD is created from predictions and output as a tabular and html file. Predictions and original feature information can be found in VKMZ's output.
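
The prediction step described in the new text can be sketched as follows. This is an illustration only, not the code in `vkmz.py`: the three-entry `DATABASE`, the function name `predict`, and the choice to compute the ppm window from the observed mass are all assumptions. The masses are approximate monoisotopic values.

```python
from bisect import bisect_left, bisect_right

# Hypothetical stand-in for a formula database: (mass, formula), sorted by mass.
DATABASE = [
    (46.005479, "CH2O2"),     # formic acid
    (60.021129, "C2H4O2"),    # acetic acid
    (180.063388, "C6H12O6"),  # glucose
]
MASSES = [mass for mass, _ in DATABASE]


def predict(observed_mass, ppm_error):
    """Return all database entries within the ppm window, lowest delta first."""
    # Assumption: the window is taken relative to the observed mass.
    window = observed_mass * ppm_error / 1e6
    # Two binary searches bound the slice of masses inside the window.
    low = bisect_left(MASSES, observed_mass - window)
    high = bisect_right(MASSES, observed_mass + window)
    candidates = DATABASE[low:high]
    # Delta is the absolute difference between observed and known mass.
    return sorted(candidates, key=lambda entry: abs(entry[0] - observed_mass))


matches = predict(180.0634, ppm_error=3)
best = matches[0] if matches else None  # a feature with no match is discarded
```

With a wider `ppm_error` the slice between `low` and `high` grows, which is why low resolution data can return more candidates per feature than is useful.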
6 ## Using VKMZ
7 6
8 VKMZ is designed to use XCMS [2] data as input. Tabular data can also be used as input. For each feature in the data VKMZ attempts to predict its molecular formula by comparing the feature's mass to a database of known formula masses. Heuristically generated databases for unlabeled and labeled data are included with VKMZ. Users can define their own database. A VKD is created from formulas with predictions and output as a webpage and tabular file. 7 VKMZ can be used as a command line tool or on the Galaxy web platform [2]. A Galaxy wrapper for VKMZ is maintained in this repository. VKMZ was developed on the Workflow4Metabolomics version of Galaxy [3].
8
9 ## Using VKMZ from command line
10
11 VKMZ is designed to use data processed by XCMS [4] as input. Tabular data can also be used as input.
9 12
10 ### Input modes 13 ### Input modes
11 14
12 VKMZ has three modes: 15 VKMZ has three modes:
13 1. `tsv` mode reads a specially formatted tabular file 16 1. `xcms` mode reads features from XCMS data
14 2. `xcms` mode reads features in [XCMS](https://bioconductor.org/packages/release/bioc/html/xcms.html) data 17 2. `tsv` mode reads a specially formatted tabular file
15 3. `plot` mode replots VKMZ tabular data 18 3. `plot` mode replots VKMZ tabular data
16 19
17 Select a mode by declaring it as the first argument to `vkmz.py`. 20 Select a mode by declaring it as the first argument to `vkmz.py`.
18 21
19 > **Example:** 22 > **Example:**
20 > ``` 23 > ```
21 > python vkmz.py xcms [options] 24 > python vkmz.py xcms [options]
22 > ``` 25 > ```
23 26
24 Different modes take different parameters. 27 Different modes allow different parameters.
28
29 ### All modes
25 30
26 All modes require an output parameter: 31 All modes require an output parameter:
27 * `--output [FILENAME]` 32 * `--output [FILENAME]`
28 * A `.tsv` and/or `.html` will be generated by VKMZ with this parameter as the file name. 33 * A `.tsv` and `.html` file will be generated by VKMZ with the given filename
29 * The `.tsv` and `.html` files generated by VKMZ are named by this option
30 34
31 All modes allow these options: 35 All modes allow these optional parameters:
32 * `--plot-type [scatter-2d]` 36 * `--plot-type [scatter-2d]`
37 * There is currently only one plot type
38 * Default is `scatter-2d`
33 * `--size [INTEGER]` 39 * `--size [INTEGER]`
34 * Set base size of marker dots of the VKD 40 * Set base size of marker dots on VKD
41 * Default size is 5
35 * `--size-algorithm [{1,2}]` 42 * `--size-algorithm [{1,2}]`
36 * Choose algorithm to modify marker size 43 * Choose one of the following algorithms:
37 1. Uniform base size 44 * 1: Sets all markers to the base size specified by `--size`
38 2. Intensity relative size 45 * Default
46 * 2: Marker sizes are relative to feature's log intensity
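
As a rough illustration of the two `--size-algorithm` choices, the sketch below scales markers either uniformly or by log intensity. The exact scaling used by VKMZ is not given in this README, so the normalization to the largest log intensity, the use of base-10 logs, and the function name `marker_sizes` are assumptions.

```python
import math


def marker_sizes(intensities, base_size=5, algorithm=1):
    """Return one marker size per feature intensity."""
    if algorithm == 1:
        # Algorithm 1: every marker gets the base size.
        return [base_size for _ in intensities]
    if any(value <= 1 for value in intensities):
        raise ValueError("this sketch assumes intensities greater than 1")
    # Algorithm 2 (assumed form): scale the base size by each feature's
    # log intensity relative to the largest log intensity.
    logs = [math.log10(value) for value in intensities]
    largest = max(logs)
    return [base_size * value / largest for value in logs]


uniform = marker_sizes([100, 10000], base_size=5, algorithm=1)
relative = marker_sizes([100, 10000], base_size=5, algorithm=2)
```

Using the log keeps a few very intense features from dwarfing every other marker on the plot.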
39 47
40 #### xcms and tsv modes 48 #### xcms and tsv modes
41 49
42 Both xcms and tsv modes require the mass error, in parts-per-million, of the mass spectrometer which generated the data: 50 Both xcms and tsv modes require the mass error, in parts-per-million, of the mass spectrometer which generated the data:
43 * `--error [PPM_ERROR_NUMBER]` 51 * `--error [PPM_ERROR_NUMBER]`
52 * It is critical to set the error correctly
44 53
45 There are several options for xcms and tsv modes: 54 There are several optional parameters for xcms and tsv modes:
46 * `--database [DATABASE_FILE]` 55 * `--no-adjustment`
47 * default is BMRB's monoisotopic heuristically generated database [3] 56 * Using this flag disables nominal mass adjustment
57 * Without this flag VKMZ adjusts feature masses by adding or removing the mass of a proton based on the feature's polarity
58 * `--database [DATABASE_FILE_PATH]`
59 * Default is BMRB's monoisotopic heuristically generated database
60 * This path is relative
48 * `--directory [TOOL_PATH]` 61 * `--directory [TOOL_PATH]`
49 * define tool directory 62 * Explicitly define tool directory
63 * Sets root directory for database file path
50 * `--no-plot` 64 * `--no-plot`
51 * disable html plot generation 65 * Disable html output
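
The mass adjustment that `--no-adjustment` disables can be sketched as below. The README only says a proton mass is added or removed based on polarity; the direction used here (subtract for positive mode, add for negative mode), the singly charged assumption, the polarity labels, and the function name `neutral_mass` are assumptions rather than details taken from `vkmz.py`.

```python
PROTON_MASS = 1.007276  # approximate proton mass in daltons


def neutral_mass(mz, polarity, adjust=True):
    """Estimate a neutral mass from an observed m/z of a singly charged ion."""
    if not adjust:
        # Mirrors the effect of --no-adjustment: use the mass as observed.
        return mz
    if polarity == "positive":
        # Assumed [M+H]+ ion: remove the added proton.
        return mz - PROTON_MASS
    if polarity == "negative":
        # Assumed [M-H]- ion: add back the lost proton.
        return mz + PROTON_MASS
    raise ValueError("polarity must be 'positive' or 'negative'")


adjusted = neutral_mass(181.070664, "positive")
unadjusted = neutral_mass(181.070664, "positive", adjust=False)
```

The adjusted value is what would then be compared against the formula database within the `--error` window.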
52 66
53 #### xcms mode 67 #### xcms mode
54 68
55 xcms mode requires tabular files generated by XCMS: 69 xcms mode requires three tabular files generated by XCMS:
56 * `--data-matrix [XCMS_DATA_MATRIX_FILE]` 70 * `--data-matrix [XCMS_DATA_MATRIX_FILE]`
57 * `--sample-metadata [XCMS_SAMPLE_METADATAFILE]` 71 * `--sample-metadata [XCMS_SAMPLE_METADATAFILE]`
58 * `--variable-metadata [XCMS_VARIABLE_METADATAFILE]` 72 * `--variable-metadata [XCMS_VARIABLE_METADATAFILE]`
59 73
60 ##### xcms mode example: 74 ##### xcms mode example:
61 ``` 75 ```
62 python vkmz.py xcms --data-matrix test-data/datamatrix.tabular --sample-metadata test-data/sampleMetadata.tabular --variable-metadata test-data/variableMetadata.tabular --output report --error 3 76 python vkmz.py xcms --data-matrix test-data/datamatrix.tabular --sample-metadata test-data/sampleMetadata.tabular --variable-metadata test-data/variableMetadata.tabular --output report --error 3
63 ``` 77 ```
64 78
65 #### tsv mode 79 #### tsv mode
66 80
67 tsv mode requires a tabular file of a specific format as input. 81 tsv mode requires a tabular file of a specific format as input:
68 * `--input [TSV FILE]` 82 * `--input [TSV FILE]`
69 83
70 The first five columns of the input tabular file must be: 84 The first five columns of the input tabular file must be:
71 85 >| sample ID | polarity | mz | retention time | intensity |
72 | sample ID | polarity | mz | retention time | intensity | 86 >|-----------|----------|----|----------------|-----------|
73 |-----------|----------|----|----------------|-----------|
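
A small sketch of reading a file with this column layout follows. It assumes the file starts with a header row using exactly these column names and that polarity is spelled out as text; neither detail is confirmed by this README. The sample row and the function name `read_features` are made up for illustration.

```python
import csv
import io

COLUMNS = ["sample ID", "polarity", "mz", "retention time", "intensity"]

# Made-up example content: one header row and one feature row.
EXAMPLE_TSV = (
    "sample ID\tpolarity\tmz\tretention time\tintensity\n"
    "sample_1\tpositive\t181.070664\t42.5\t15000\n"
)


def read_features(handle):
    """Parse the first five columns of each row into a feature dictionary."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    if header[:5] != COLUMNS:
        raise ValueError("first five columns do not match the expected layout")
    features = []
    for row in reader:
        sample_id, polarity, mz, retention_time, intensity = row[:5]
        features.append({
            "sample ID": sample_id,
            "polarity": polarity,
            "mz": float(mz),
            "retention time": float(retention_time),
            "intensity": float(intensity),
        })
    return features


features = read_features(io.StringIO(EXAMPLE_TSV))
```

Only the first five columns are read, so extra columns after `intensity` would be ignored by this sketch.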
74 87
75 #### plot mode 88 #### plot mode
76 89
77 plot mode reads previously generated VKMZ tabular files to create VKD html files. 90 plot mode reads a previously generated VKMZ tabular file to create VKD html files.
78 91
79 Specifying the VKMZ tabular file is required: 92 Specifying the VKMZ tabular file is required:
80 * `--input [VKMZ_TSV_FILE]` 93 * `--input [VKMZ_TSV_FILE]`
81 94
95 ## Special thanks to
96
97 Adrian, Art, Eric, Jerry, Kevin, Renata, Stephen, Tim, and Yuan.
98
82 ## Citations 99 ## Citations
83 100
84 0. Brockman et al. [doi:10.1007/s11306-018-1343-y](https://doi.org/10.1007/s11306-018-1343-y) 101 0. Brockman et al. [doi:10.1007/s11306-018-1343-y](https://doi.org/10.1007/s11306-018-1343-y)
85 1. Galaxy Project [Galaxy](https://github.com/galaxyproject/galaxy) 102 1. Hegeman et al. [doi:10.1021/ac070346t](https://doi.org/10.1021/ac070346t)
86 2. Giacomoni et al. [doi:10.1093/bioinformatics/btu813](https://doi.org/10.1093/bioinformatics/btu813) 103 2. [Galaxy Project](https://galaxyproject.org/)
87 3. Hegeman et al. [doi:10.1021/ac070346t](https://doi.org/10.1021/ac070346t) 104 3. [Workflow4Metabolomics](http://workflow4metabolomics.org/)
105 4. Smith et al. [doi:10.1021/ac051437y](https://www.ncbi.nlm.nih.gov/pubmed/16448051)