This tool performs LC/MS matching on an input list of MZ/RT values, using either an in-house single file database provided by the user or a connection to a Peakforest database.
When selecting the database, you can choose between a Peakforest database and an in-house file.
For the Peakforest database, a default REST web service base URL is already provided, but you can change it to use a custom database. A field is also available for setting a token key in case access to the Peakforest database you want to use is restricted, which is the case for the default database URL.
For the in-house file, please refer to the paragraph "Single file database" below.
Be careful to always provide UTF-8 encoded files, unless you do not use any special characters at all. For instance, Greek letters in molecule names cause errors if the file is encoded in latin1 (ISO 8859-1) or Windows 1252 (not distinguishable from latin1).
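If you are unsure of a file's encoding, one quick check before uploading is to try decoding it as UTF-8. The Python sketch below is only an illustration and is not part of the tool; the function names `is_utf8` and `latin1_to_utf8` are hypothetical, and the conversion helper assumes you already know the file is latin1.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode bytes assumed to be latin1 (ISO 8859-1) as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


# Usage (hypothetical path): check a database file before uploading it.
# with open("my_database.tsv", "rb") as f:
#     print(is_utf8(f.read()))
```

Note that every byte sequence decodes as latin1, so the conversion never fails even on a file that was not latin1; run it only when the original encoding is known.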
With a single file database, the database is provided by the user as a single file in tabular format, through the Database file field. This file must contain a list of MS peaks, possibly with retention times. Peaks are duplicated as many times as necessary: for instance, if 3 retention times are available for a compound with 10 peaks in positive mode, then there will be 30 lines for this compound in positive mode.
The file must contain a header with the column names. The names can be chosen freely, but they must be declared through the Column names field as a comma-separated list of key/value pairs; see the default value for an example. It is of course easier if your database file uses the column names of the default value of the Column names field. The column names listed in the default value are only those used by the algorithm: you can provide any additional columns in your database file, and they will be copied to the output.
Then you must provide the values used to identify the MS modes (positive and negative) in the database, using the MS modes field.
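The duplication rule amounts to taking every combination of a compound's peaks and its retention times. Below is a minimal Python sketch, assuming a hypothetical helper `expand_peaks` and the column names of the example database file shown further down (molid, mode, mz, col, rt); it is not the tool's own code.

```python
from itertools import product


def expand_peaks(compound_id, mode, peak_mzs, retention_times):
    """Build one database row per (peak, retention time) pair.

    peak_mzs: list of M/Z values for one compound in one MS mode.
    retention_times: list of (column, rt) pairs for that compound.
    """
    return [
        {"molid": compound_id, "mode": mode, "mz": mz, "col": col, "rt": rt}
        for mz, (col, rt) in product(peak_mzs, retention_times)
    ]


peak_mzs = [100.0 + i for i in range(10)]                   # 10 fake peaks
retention_times = [("colA", 1.2), ("colB", 3.4), ("colC", 5.6)]  # 3 fake RTs
rows = expand_peaks("A10", "POS", peak_mzs, retention_times)
print(len(rows))  # 30
```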
A last information about the single file database is the unit of the retention times, either in seconds or in minutes. Use the field "Retention time unit" to provide this information.
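Since database and input retention times may be given in different units, one simple approach is to convert both to a common unit before comparing them. The sketch below converts to seconds; the function name `rt_to_seconds` and the unit labels "seconds" and "minutes" are hypothetical, and the tool's internal handling is not described here.

```python
def rt_to_seconds(rt, unit):
    """Convert a retention time to seconds.

    unit: "seconds" or "minutes" (hypothetical labels for the two choices
    offered by the Retention time unit field).
    """
    if unit == "seconds":
        return float(rt)
    if unit == "minutes":
        return float(rt) * 60.0
    raise ValueError(f"unknown retention time unit: {unit!r}")


# Usage: a database RT of 5.69 minutes expressed in seconds.
db_rt_s = rt_to_seconds(5.69, "minutes")
```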
Example of database file (totally fake, no meaning):
molid | mode | mz | composition | attribution | col | rt | molcomp | molmass | molnames |
A10 | "POS" | 112.07569 | "P9Z6W410 O" | "[(M+H)-(H2O)-(NH3)]+" | "colzz" | 5.69 | "J114L6M62O2" | 146.10553 | "Blablaine" |
A10 | "POS" | 112.07569 | "P9Z6W410 O" | "[(M+H)-(H2O)-(NH3)]+" | "col12" | 0.8 | "J114L6M62O2" | 146.10553 | "Blablaine" |
A10 | "POS" | 112.07569 | "P9Z6W410 O" | "[(M+H)-(H2O)-(NH3)]+" | "somecol" | 8.97 | "J114L6M62O2" | 146.10553 | "Blablaine" |
A10 | "POS" | 191.076694 | "P92Z6W413 Na2 O2" | "[(M-H+2Na)]+" | "colAA" | 1.58 | "J114L6M62O2" | 146.10553 | "Blablaine" |
A10 | "POS" | 191.076694 | "P92Z6W413 Na2 O2" | "[(M-H+2Na)]+" | "colzz2" | 4.08 | "J114L6M62O2" | 146.10553 | "Blablaine" |
A10 | "POS" | 294.221687 | "U1113P94ZW429 O4" | "[(2M+H)]+ (13C)" | "somecol" | 8.97 | "J114L6M62O2" | 146.10553 | "Blablaine" |
A10 | "POS" | 72.080775 | "P9Z4W410 O0" | "[(M+H)-(J15L2M6O2)]+" | "hcoltt" | 0.8 | "J114L6M62O2" | 146.10553 | "Blablaine" |
A10 | "POS" | 112.07569 | "P9Z6W410 O" | "[(M+H)-(H2O)-(NH3)]+" | "colzz3" | 4.54 | "J114L6M62O2" | 146.10553 | "Blablaine" |
A10 | "POS" | 72.080775 | "P9Z4W410 O0" | "[(M+H)-(J15L2M6O2)]+" | "colzz3" | 4.54 | "J114L6M62O2" | 146.10553 | "Blablaine" |
A10 | "POS" | 72.080775 | "P9Z4W410 O0" | "[(M+H)-(J15L2M6O2)]+" | "colpp" | 0.89 | "J114L6M62O2" | 146.10553 | "Blablaine" |
A10 | "POS" | 145.097154 | "P92Z6W413 O2" | "[(M+H)-(H2)]+" | "hcoltt" | 0.8 | "J114L6M62O2" | 146.10553 | "Blablaine" |
The corresponding value of the Column names field for this database file would be: mztheo=mz,chromcolrt=rt,compoundid=molid,chromcol=col,msmode=mode,peakattr=attribution.
And the value of the MS modes field would be: pos=POS,neg=NEG.
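Both fields use the same "key=value,key=value" syntax. The Python sketch below parses them and translates the example file's header into the internal names used by the algorithm; `parse_key_values` is a hypothetical helper, and the sketch only illustrates the mapping, not how the tool implements it.

```python
def parse_key_values(field):
    """Parse "key1=value1,key2=value2" into a dict."""
    pairs = {}
    for item in field.split(","):
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"malformed key/value pair: {item!r}")
        pairs[key.strip()] = value.strip()
    return pairs


column_names = parse_key_values(
    "mztheo=mz,chromcolrt=rt,compoundid=molid,chromcol=col,msmode=mode,peakattr=attribution"
)
ms_modes = parse_key_values("pos=POS,neg=NEG")

# Invert the mapping: file column name -> internal name. Columns not listed
# (e.g. composition) keep their own name and are simply carried along.
header = ["molid", "mode", "mz", "composition", "attribution", "col", "rt"]
file_to_internal = {v: k for k, v in column_names.items()}
internal_header = [file_to_internal.get(name, name) for name in header]
print(internal_header)
# ['compoundid', 'msmode', 'mztheo', 'composition', 'peakattr', 'chromcol', 'chromcolrt']
```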
The input is a dataset in tabular format (TSV: Tab Separated Values) containing the list of M/Z values, possibly with RT values. The dataset is chosen through the Input file - MZ(/RT) values field.
The column names for the M/Z and RT values must be provided through the field Input column names, as a comma separated list of key/value pairs. The file/dataset must contain a header line with the same names specified in the field Input column names.
The unit of the retention time has to be provided with the field Retention time unit.
Example of input file:
mz | rt |
75.02080998 | 49.38210915 |
75.05547146 | 0.658528069 |
75.08059797 | 1743.94267 |
76.03942694 | 51.23158899 |
76.07584477 | 50.51249853 |
76.07593168 | 0.149308136 |
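To check an input file before running the tool, it can be read with a standard TSV reader. The Python sketch below is a hypothetical helper (`read_mz_rt`), not part of the tool: it verifies that the header contains the names given in Input column names and returns (mz, rt) pairs, with rt set to None when no RT column is used.

```python
import csv
import io


def read_mz_rt(text, mz_col="mz", rt_col="rt"):
    """Read a tab-separated MZ(/RT) table that has a header line.

    mz_col / rt_col: names declared in the Input column names field;
    pass rt_col=None when the input has no retention times.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    wanted = [c for c in (mz_col, rt_col) if c is not None]
    missing = [c for c in wanted if c not in header]
    if missing:
        raise ValueError(f"column(s) missing from header: {missing}")
    return [
        (float(row[mz_col]), float(row[rt_col]) if rt_col is not None else None)
        for row in reader
    ]


sample = "mz\trt\n75.02080998\t49.38210915\n75.05547146\t0.658528069\n"
print(read_mz_rt(sample))
# [(75.02080998, 49.38210915), (75.05547146, 0.658528069)]
```

For a real file, open it with `open(path, newline="", encoding="utf-8")` and pass `f.read()`.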
In the simplest form of the algorithm, only the M/Z values are matched against the database peaks. This happens when both Retention time match and Precursor match are off.
The first parameter to set is the MS mode, through the MS mode field.
The parameters M/Z precision and M/Z shift are used by the algorithm in the following formula in order to match an M/Z value:
mz - shift - precision < mzref < mz - shift + precision
where mzref is the reference M/Z of the database peak being tested. If this double inequality holds, the M/Z value is matched with this peak.
The shift and precision parameters can be given either in ppm of the M/Z value or as plain M/Z values. Use the M/Z tolerance unit field to set the unit.
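The matching test can be written directly from the double inequality. In the Python sketch below, the unit labels "plain" and "ppm" are hypothetical, and converting a ppm value relative to the input mz (rather than mzref) is an assumption of the sketch.

```python
def tolerance_in_mz(value, mz, unit):
    """Convert a shift or precision value to M/Z units.

    unit: "plain" or "ppm" (hypothetical labels for the M/Z tolerance unit
    field). In ppm mode the value is taken relative to the input mz,
    which is an assumption of this sketch.
    """
    if unit == "plain":
        return value
    if unit == "ppm":
        return value * mz * 1e-6
    raise ValueError(f"unknown tolerance unit: {unit!r}")


def mz_matches(mz, mzref, shift, precision, unit="plain"):
    """Apply mz - shift - precision < mzref < mz - shift + precision."""
    s = tolerance_in_mz(shift, mz, unit)
    p = tolerance_in_mz(precision, mz, unit)
    return mz - s - p < mzref < mz - s + p


print(mz_matches(112.07569, 112.0757, shift=0, precision=5, unit="ppm"))  # True
print(mz_matches(112.07569, 112.0767, shift=0, precision=5, unit="ppm"))  # False
```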
If at least one column is selected inside the Chromatographic columns parameter section, then retention time is also matched, in addition to the M/Z value, according to the following formula:
rt - x - rt^y < colrt < rt + x + rt^y
Where x is the value of the parameter RTX and y the value of the parameter RTY.
If, for a reference compound, the database does not contain a retention time for any of the specified columns, then only the M/Z value is matched against the peaks of this compound. This means that the results may contain compounds that do not match the provided retention time value.
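The retention time test and the fallback to M/Z-only matching can be sketched as follows. The `compound_rts` dictionary (column name to list of reference RTs) is a hypothetical structure, both RTs are assumed to be in the same unit, and the fallback follows the reading that RT is skipped only when the compound has no RT on any selected column.

```python
def rt_matches(rt, colrt, x, y):
    """Apply rt - x - rt**y < colrt < rt + x + rt**y (x = RTX, y = RTY)."""
    tol = x + rt ** y
    return rt - tol < colrt < rt + tol


def compound_rt_ok(rt, compound_rts, selected_columns, x, y):
    """Check the RT of one compound on the selected chromatographic columns.

    compound_rts: hypothetical dict mapping column name -> list of reference RTs.
    If the compound has no RT on any selected column, the RT is not checked
    and the compound is kept (M/Z match only).
    """
    refs = [colrt for col in selected_columns for colrt in compound_rts.get(col, [])]
    if not refs:
        return True
    return any(rt_matches(rt, colrt, x, y) for colrt in refs)


print(rt_matches(50.0, 52.0, x=5.0, y=0.8))                            # True
print(compound_rt_ok(50.0, {"colA": [200.0]}, ["colB"], x=5.0, y=0.8))  # True
```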
The RTZ parameter is used in the Precursor match algorithm (see below).
If the Precursor match option is enabled in the parameters section, a more sophisticated version of the algorithm is used, executed in two passes: first on precursor peaks, then on all peaks of the matched molecules.
This algorithm takes two more parameters, one for each MS mode: the lists of precursors. Since the matching is run for one MS mode only, only one of these two parameters is used. Inside the single file database, all peaks whose peakattr column value is equal to one of the precursors listed in List of negative precursors or List of positive precursors, depending on the mode, are considered precursor peaks.
Without retention time matching, the steps are:
1. Using the normal M/Z matching algorithm described above, we first look only for precursor peaks ([(M+H)]+, [(M+Na)]+, [(M+Cl)]-, ...).
2. From step 1, we construct a list of matched molecules.
3. We look at all peaks of the molecules in the list obtained in step 2, using the normal M/Z matching algorithm described above.
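The steps without retention time can be sketched in Python as below. The function `precursor_match` and the test data are hypothetical; tolerances are in plain M/Z units, and building the molecule list over all input values before step 3 is an assumption of this sketch.

```python
def precursor_match(input_mzs, peaks, precursors, shift, precision):
    """Precursor match without retention time.

    peaks: list of dicts with keys compoundid, mztheo and peakattr.
    precursors: peak attributions treated as precursors for the current mode.
    Returns, for each input M/Z, the list of (compoundid, peakattr) matched.
    """
    def hit(mz, peak):
        return mz - shift - precision < peak["mztheo"] < mz - shift + precision

    # Steps 1 and 2: match input M/Z values against precursor peaks only,
    # and collect the molecules owning a matched precursor peak.
    molecules = {
        peak["compoundid"]
        for mz in input_mzs
        for peak in peaks
        if peak["peakattr"] in precursors and hit(mz, peak)
    }
    # Step 3: match every input M/Z against all peaks of those molecules.
    return [
        [(p["compoundid"], p["peakattr"]) for p in peaks
         if p["compoundid"] in molecules and hit(mz, p)]
        for mz in input_mzs
    ]


peaks = [
    {"compoundid": "A10", "mztheo": 147.1128, "peakattr": "[(M+H)]+"},
    {"compoundid": "A10", "mztheo": 112.07569, "peakattr": "[(M+H)-(H2O)-(NH3)]+"},
    {"compoundid": "B20", "mztheo": 112.0757, "peakattr": "[(M+H)-(H2O)]+"},
]
result = precursor_match([147.1128, 112.0757], peaks, ["[(M+H)]+"], 0.0, 0.001)
print(result)
# [[('A10', '[(M+H)]+')], [('A10', '[(M+H)-(H2O)-(NH3)]+')]]
```

B20 is left out because none of its precursor peaks was matched, even though one of its peaks is within tolerance of 112.0757.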
With retention time matching, the steps are:
1. Using the normal MZ/RT matching algorithm described above, we first look only for precursor peaks ([(M+H)]+, [(M+Na)]+, [(M+Cl)]-, ...).
2. From step 1, we construct a list of matched molecules, retaining the matched retention time of each molecule.
3. For each input couple (m/z, rt), we look at all peaks of the molecules obtained in step 2 whose matched retention time lies between rt - z and rt + z, where z is the value of the RTZ parameter.
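The retention time variant can be sketched the same way. This is a hypothetical illustration (`precursor_match_rt`, single chromatographic column, plain M/Z tolerances): it assumes that a molecule's "matched retention time" is the input RT at which one of its precursor peaks was matched, and treats the bounds rt - z and rt + z as inclusive.

```python
def precursor_match_rt(inputs, peaks, precursors, shift, precision, x, y, z):
    """Precursor match with retention time (x = RTX, y = RTY, z = RTZ).

    inputs: list of (mz, rt) couples.
    peaks: dicts with keys compoundid, mztheo, peakattr and chromcolrt.
    """
    def mz_hit(mz, peak):
        return mz - shift - precision < peak["mztheo"] < mz - shift + precision

    def rt_hit(rt, peak):
        tol = x + rt ** y
        return rt - tol < peak["chromcolrt"] < rt + tol

    # Steps 1 and 2: MZ/RT match on precursor peaks only, remembering for each
    # molecule the input RTs at which a precursor was matched (assumption).
    matched_rts = {}
    for mz, rt in inputs:
        for p in peaks:
            if p["peakattr"] in precursors and mz_hit(mz, p) and rt_hit(rt, p):
                matched_rts.setdefault(p["compoundid"], set()).add(rt)

    # Step 3: keep molecules whose matched RT lies in [rt - z, rt + z],
    # then M/Z-match all of their peaks.
    results = []
    for mz, rt in inputs:
        close = {cid for cid, rts in matched_rts.items()
                 if any(rt - z <= r <= rt + z for r in rts)}
        results.append([(p["compoundid"], p["peakattr"]) for p in peaks
                        if p["compoundid"] in close and mz_hit(mz, p)])
    return results


peaks = [
    {"compoundid": "A10", "mztheo": 147.1128, "peakattr": "[(M+H)]+", "chromcolrt": 50.0},
    {"compoundid": "A10", "mztheo": 112.07569, "peakattr": "[(M+H)-(H2O)-(NH3)]+", "chromcolrt": 50.0},
]
inputs = [(147.1128, 50.2), (112.0757, 50.5), (112.0757, 300.0)]
result = precursor_match_rt(inputs, peaks, ["[(M+H)]+"], 0.0, 0.001, x=5.0, y=0.0, z=2.0)
print(result)
# [[('A10', '[(M+H)]+')], [('A10', '[(M+H)-(H2O)-(NH3)]+')], []]
```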
The Multiple matches separator field sets the character used to separate multiple values inside each row of the main output dataset. The main output contains as many rows as the MZ/RT input dataset, so when the algorithm finds more than one match for an MZ/RT value, it concatenates the matches using this separator character.
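The concatenation for one row of the main output can be sketched as follows. The helper `main_output_row` and the "|" separator are hypothetical choices for illustration, and the sketch leaves out the column-name prefix the tool adds.

```python
def main_output_row(input_row, matches, sep="|"):
    """Return the input row with one added column per match field.

    matches: list of dicts, one per match found for this input row.
    Values from several matches are joined with the separator character.
    """
    row = dict(input_row)
    fields = []
    for match in matches:
        for key in match:
            if key not in fields:
                fields.append(key)
    for key in fields:
        row[key] = sep.join(str(match.get(key, "")) for match in matches)
    return row


print(main_output_row({"mz": "112.0757"}, [{"id": "A10"}, {"id": "B20"}]))
# {'mz': '112.0757', 'id': 'A10|B20'}
```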
Three files are output by the tool.
Outputs | File name | Description |
Main output | lcmsmatching_{input_file_name} | Contains the same data as the input dataset, with match result included on each row. If more than one match is found for a row, the different values of the match are concatenated using the provided separator character. |
Peak list | lcmsmatching_{input_file_name}_peaks | Contains the same data as the input dataset, with match result included on each row. If more than one match is found for a row, then the row is duplicated. Hence there is either no match for a row, or one single match. |
HTML output | lcmsmatching_{input_file_name}.html | Contains the same table as Peak list but in HTML format and with links to external databases if columns for PubChem Compound, ChEBI, HMDB Metabolites or KEGG Compounds are provided. |
The match results are output as new columns appended to the columns provided inside the MZ/RT input dataset, and prefixed with "lcmsmatching.".
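By contrast with the main output, the Peak list duplicates the input row once per match. The Python sketch below illustrates this together with the "lcmsmatching." column prefix; the helper `peak_list_rows` is hypothetical, and keeping an unmatched row once with no added columns is a simplification of this sketch.

```python
PREFIX = "lcmsmatching."


def peak_list_rows(input_row, matches):
    """Duplicate the input row once per match (Peak list output).

    Each match field becomes a new column prefixed with "lcmsmatching.".
    A row without any match is kept once, unchanged, in this sketch.
    """
    if not matches:
        return [dict(input_row)]
    return [
        {**input_row, **{PREFIX + key: value for key, value in match.items()}}
        for match in matches
    ]


for row in peak_list_rows({"mz": 112.0757, "rt": 50.5}, [{"id": "A10"}, {"id": "B20"}]):
    print(row)
# {'mz': 112.0757, 'rt': 50.5, 'lcmsmatching.id': 'A10'}
# {'mz': 112.0757, 'rt': 50.5, 'lcmsmatching.id': 'B20'}
```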
Version 4.0.0 - 02/01/2019