
<macros>

<token name="@GENERAL_HELP@">
General Information
===================

Overview
--------

recetox-aplcms is a software package for peak detection in high resolution mass spectrometry (HRMS) data.
It supports reading .mzml files in raw profile mode and uses a bi-Gaussian chromatographic peak shape for feature detection and quantification.

recetox-aplcms is based on the apLCMS package developed by Tianwei Yu at Emory University; see the citations and the apLCMS section below.
This version includes various software updates and is actively developed and maintained on `GitHub`_.
Please submit any bug reports as `issues`_ on the repository.

.. _GitHub: https://github.com/RECETOX/recetox-aplcms
.. _issues: https://github.com/RECETOX/recetox-aplcms/issues/new


Workflow
--------

.. image:: https://raw.githubusercontent.com/RECETOX/galaxytools/aee0dd6cf6c05936269efe4337c50e27cc68e86b/tools/recetox_aplcms/images/scheme.png
   :width: 2560
   :height: 788
   :scale: 40
   :alt: A picture of a workflow diagram.

The individual steps of the recetox-aplcms package can be combined into two separate workflows that process HRMS data either in an unsupervised manner or by including a priori knowledge.
The workflows consist of the following building blocks:

(1) remove noise - denoise the raw data and extract the EIC
(2) generate feature table - group features in the EIC into peaks using a peak-shape model
(3) compute clusters - compute mz and rt clusters across samples
(4) compute template - find the template for rt correction
(5) correct time - correct the rt across samples using splines
(6) align features - align identical features across samples
(7) recover weaker signals - recover missed features in samples based on the aligned features
(8) merge known table - add known features to detected features table and vice versa

For detailed documentation on the individual steps please see the individual tool wrappers.


apLCMS (Original Reference)
---------------------------

apLCMS is a software package that generates a feature table from a batch of LC/MS spectra. The m/z and retention time
tolerance levels are estimated from the data. A run-filter is used to detect peaks and remove noise.
Non-parametric statistical methods are used to fine-tune peak selection and grouping. After retention time
correction, a feature table is generated by aligning peaks across spectra. For further information on apLCMS
please refer to https://mypage.cuhk.edu.cn/academics/yutianwei/apLCMS/.
</token>

<token name="@REMOVE_NOISE_HELP@">
recetox-aplcms - remove noise
=============================

This tool is the first step of recetox-aplcms.
It removes noise from the raw data and performs a first clustering step of points with close m/z values into the extracted ion chromatograms (EICs).
Only peaks with a minimum elution length of `min_run` seconds are kept.
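
The elution-length filter can be illustrated with a minimal Python sketch. It is not the recetox-aplcms implementation: the function name `filter_by_elution_length` and the tuple layout are hypothetical, and the elution length of a group is assumed here to be the difference between its largest and smallest `rt`.

```python
from collections import defaultdict


def filter_by_elution_length(points, min_run):
    """Keep only points whose m/z group elutes for at least min_run seconds.

    points: list of (mz, rt, intensity, group_number) tuples.
    Assumption: elution length = max(rt) - min(rt) within a group.
    """
    rts = defaultdict(list)
    for _mz, rt, _intensity, group in points:
        rts[group].append(rt)
    kept_groups = {
        group for group, values in rts.items()
        if max(values) - min(values) >= min_run
    }
    return [p for p in points if p[3] in kept_groups]


points = [
    (70.01060119055192, 350.58654, 21178.330810546875, 5),
    (70.0287408273165, 134.801352, 60883.15185546875, 11),
    (70.02872416715464, 183.991896, 9201.574584960938, 11),
]
# Group 5 holds a single point and is dropped; group 11 spans about 49 s and is kept.
kept = filter_by_elution_length(points, min_run=12)
```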

Example Output
--------------
The raw data points contained in the scans of the `mzml` file are filtered for noise and grouped into clusters based on m/z values.
See an example output in the table below. The `group_number` column indicates the cluster index.

+----------------------+-------------------+-----------------------+--------------------+
| mz                   |    rt             |    intensity          |    group_number    |
+======================+===================+=======================+====================+
| 70.01060119055192    |    350.58654      |    21178.330810546875 |    5               |
+----------------------+-------------------+-----------------------+--------------------+
| 70.02334120404554    |    130.175262     |    287869.5478515625  |    10              |
+----------------------+-------------------+-----------------------+--------------------+
| 70.0287408273165     |    134.801352     |    60883.15185546875  |    11              |
+----------------------+-------------------+-----------------------+--------------------+
| 70.02872416715464    |    183.991896     |    9201.574584960938  |    11              |
+----------------------+-------------------+-----------------------+--------------------+
| ...                  |    ...            |    ...                |    ...             |
+----------------------+-------------------+-----------------------+--------------------+
</token>

<token name="@GENERATE_FEATURE_TABLE_HELP@">
recetox-aplcms - generate feature table
=======================================
This tool is the second step of the recetox-aplcms workflow and performs peak shape parameter estimation.

It takes the grouped features created with `recetox-aplcms-remove-noise`, computes the peak shape in the `rt` domain, and integrates the peak area.


Example Output
--------------
The output contains the `mz` and `rt` of the peaks, the standard deviations on both sides of the peak (`sd1` and `sd2`) for the bi-Gaussian peak shape, and the integrated peak `area`.

+----------------------+-------------------+-----------------+-------------------+----------------------+
| mz                   |   rt              |    sd1          |    sd2            |   area               |
+======================+===================+=================+===================+======================+
| 70.02317542938793    |   142.36033       |   11.436659559  |    14.592754933   |   4159269.24595184   |
+----------------------+-------------------+-----------------+-------------------+----------------------+
| 70.02869594233522    |   205.48765       |   0.263230763   |    0.285101428707 |   8849767.11861127   |
+----------------------+-------------------+-----------------+-------------------+----------------------+
| 78.04643252598305    |   294.01713       |   0.51677558617 |    1.317028944141 |   1333044.50659719   |
+----------------------+-------------------+-----------------+-------------------+----------------------+
| ...                  |    ...            |    ...          |    ...            |   ...                |
+----------------------+-------------------+-----------------+-------------------+----------------------+
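
For orientation, a bi-Gaussian peak can be thought of as two half-Gaussians that share one apex but have different widths. The sketch below is an illustration only and not the tool's code: it assumes that `sd1` describes the side before the apex and `sd2` the side after it, and the `height` parameter is hypothetical (the output table reports the area, not the height).

```python
import math


def bi_gaussian(t, apex, sd1, sd2, height):
    """Intensity of a bi-Gaussian peak at time t.

    Assumption: sd1 applies before the apex and sd2 after it.
    """
    sd = sd2 if t > apex else sd1
    return height * math.exp(-((t - apex) ** 2) / (2 * sd ** 2))


def bi_gaussian_area(sd1, sd2, height):
    """Closed-form area: each half contributes height * sd * sqrt(2 * pi) / 2."""
    return height * math.sqrt(2 * math.pi) * (sd1 + sd2) / 2


# Numerical check of the closed form with a simple Riemann sum.
apex, sd1, sd2, height = 142.36033, 11.436659559, 14.592754933, 1000.0
step = 0.01
n_steps = int(300 / step)
numeric_area = sum(
    bi_gaussian(apex - 150 + i * step, apex, sd1, sd2, height) * step
    for i in range(n_steps)
)
```

Under these assumptions the numerically integrated area and `bi_gaussian_area(sd1, sd2, height)` agree closely, which shows how the two standard deviations jointly determine the peak area.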
</token>

<token name="@COMPUTE_CLUSTERS_HELP@">
recetox-aplcms - compute clusters
=================================

Group features whose `mz` and `rt` values lie within the given tolerances into clusters, creating larger features from raw data points.
The tool takes a collection of all detected features and computes the clusters over a global feature table, adding the `sample_id` and `cluster` columns to the table.

Example Output
--------------

+----------------------+-------------------+-----------------+-------------------+----------------------+---------------------+---------------+
| mz                   |   rt              |    sd1          |    sd2            |   area               |  sample_id          |  cluster      |
+======================+===================+=================+===================+======================+=====================+===============+
| 70.02317542938793    |   142.36033       |   11.436659559  |    14.592754933   |   4159269.245951841  | 21_qc_no_dil_milliq |  7            |
+----------------------+-------------------+-----------------+-------------------+----------------------+---------------------+---------------+
| 70.02869594233522    |   205.48765       |   0.263230763   |    0.285101428707 |   8849767.11861127   | 21_qc_no_dil_milliq |  9            |
+----------------------+-------------------+-----------------+-------------------+----------------------+---------------------+---------------+
| 78.04643252598305    |   294.01713       |   0.51677558617 |    1.317028944141 |   1333044.506597194  | 21_qc_no_dil_milliq |  13           |
+----------------------+-------------------+-----------------+-------------------+----------------------+---------------------+---------------+
| ...                  |    ...            |    ...          |    ...            |   ...                | ...                 |  ...          |
+----------------------+-------------------+-----------------+-------------------+----------------------+---------------------+---------------+
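
The idea of tolerance-based grouping can be sketched in Python as follows. This is a deliberately simplified stand-in and not the algorithm used by the tool: it splits the sorted values wherever the gap between neighbours exceeds an absolute tolerance, first in `mz` and then in `rt`. The function names, the dictionary layout and the input values are made up for illustration.

```python
def split_by_gap(items, key, tol):
    """Sort items by key and start a new group wherever the gap exceeds tol."""
    groups = []
    for item in sorted(items, key=key):
        if groups and tol >= key(item) - key(groups[-1][-1]):
            groups[-1].append(item)
        else:
            groups.append([item])
    return groups


def compute_clusters(features, mz_tol, rt_tol):
    """Return copies of the feature dicts with an added cluster index.

    Assumption: absolute tolerances in both dimensions.
    """
    clustered = []
    cluster_id = 0
    for mz_group in split_by_gap(features, lambda f: f["mz"], mz_tol):
        for rt_group in split_by_gap(mz_group, lambda f: f["rt"], rt_tol):
            cluster_id += 1
            clustered.extend(dict(f, cluster=cluster_id) for f in rt_group)
    return clustered


features = [
    {"mz": 70.037066, "rt": 294.06, "sample_id": "8_qc_no_dil_milliq"},
    {"mz": 70.037075, "rt": 294.15, "sample_id": "29_qc_no_dil_milliq"},
    {"mz": 70.065045, "rt": 140.58, "sample_id": "21_qc_no_dil_milliq"},
    {"mz": 70.037070, "rt": 120.00, "sample_id": "21_qc_no_dil_milliq"},
]
result = compute_clusters(features, mz_tol=1e-4, rt_tol=5.0)
```

In this toy input the two features near `rt` 294 from different samples end up in the same cluster, while the feature with a similar `mz` but a very different `rt` gets a cluster of its own.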
</token>

<token name="@CORRECT_TIME_HELP@">
recetox-aplcms - correct time
=============================

Apply spline-based retention time correction to a feature table given the template table and the `mz` and `rt` tolerances.

Example Output
--------------
The output has the same format as `compute clusters` but the retention time values are corrected based on the template table.

+----------------------+-------------------+-----------------+-------------------+----------------------+---------------------+---------------+
| mz                   |   rt              |    sd1          |    sd2            |   area               |  sample_id          |  cluster      |
+======================+===================+=================+===================+======================+=====================+===============+
| 70.02317542938793    |   142.36033       |   11.436659559  |    14.592754933   |   4159269.245951841  | 21_qc_no_dil_milliq |  7            |
+----------------------+-------------------+-----------------+-------------------+----------------------+---------------------+---------------+
| 70.02869594233522    |   205.48765       |   0.263230763   |    0.285101428707 |   8849767.11861127   | 21_qc_no_dil_milliq |  9            |
+----------------------+-------------------+-----------------+-------------------+----------------------+---------------------+---------------+
| 78.04643252598305    |   294.01713       |   0.51677558617 |    1.317028944141 |   1333044.506597194  | 21_qc_no_dil_milliq |  13           |
+----------------------+-------------------+-----------------+-------------------+----------------------+---------------------+---------------+
| ...                  |    ...            |    ...          |    ...            |   ...                | ...                 |  ...          |
+----------------------+-------------------+-----------------+-------------------+----------------------+---------------------+---------------+
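
The principle of the correction can be sketched as follows: the retention time deviation between a sample and the template is known at matched features and is interpolated for all other retention times. To stay dependency-free, this sketch swaps the spline for plain linear interpolation, so it is not the tool's method; the function name `correct_rt` and the anchor values are hypothetical.

```python
from bisect import bisect_right


def correct_rt(rt, anchors):
    """Shift rt by a deviation interpolated between anchor points.

    anchors: list of (sample_rt, template_rt) pairs for matched features,
    sorted by sample_rt with distinct sample_rt values.
    Linear interpolation stands in for the spline fit.
    """
    xs = [sample_rt for sample_rt, _ in anchors]
    devs = [template_rt - sample_rt for sample_rt, template_rt in anchors]
    i = bisect_right(xs, rt)
    if i == 0:          # before the first anchor: reuse its deviation
        return rt + devs[0]
    if i == len(xs):    # at or after the last anchor: reuse its deviation
        return rt + devs[-1]
    weight = (rt - xs[i - 1]) / (xs[i] - xs[i - 1])
    return rt + devs[i - 1] + weight * (devs[i] - devs[i - 1])


anchors = [(100.0, 101.0), (200.0, 203.0), (300.0, 302.0)]
corrected = [correct_rt(rt, anchors) for rt in (50.0, 150.0, 200.0, 250.0, 400.0)]
```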
</token>
<token name="@COMPUTE_TEMPLATE_HELP@">
recetox-aplcms - compute template
=================================
Compute the template from a set of feature tables, choosing the one with the most features as the template.
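
As a minimal illustration of this selection rule, the following Python sketch counts the rows per `sample_id` and returns the sample with the most features. The function name `compute_template` and the input layout are assumptions made for the example, not the tool's interface.

```python
from collections import Counter


def compute_template(features):
    """Return the sample_id that contributes the most rows to the feature table."""
    counts = Counter(f["sample_id"] for f in features)
    sample_id, _count = counts.most_common(1)[0]
    return sample_id


features = [
    {"sample_id": "21_qc_no_dil_milliq", "mz": 70.02317542938793},
    {"sample_id": "21_qc_no_dil_milliq", "mz": 70.02869594233522},
    {"sample_id": "29_qc_no_dil_milliq", "mz": 78.04643252598305},
]
template = compute_template(features)
```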
</token>

<token name="@RECOVER_WEAKER_SIGNALS_HELP@">
recetox-aplcms - recover weaker signals
=======================================
Second-stage peak detection based on the aligned feature table from the `align features` step.
If a feature is contained in the aligned feature table, this step revisits the raw data and searches
for this feature at the retention time obtained by mapping the corrected retention time back to the original sample.

This recovers features which are present in a sample but might have been filtered out initially as noise due to low signal intensity.

Example Output
--------------
The table has the same format as the `compute clusters` output but might contain additional features which have been extracted based
on their presence in the aligned feature table.

+----------------------+-------------------+-----------------+-------------------+----------------------+---------------------+---------------+
| mz                   |   rt              |    sd1          |    sd2            |   area               |  sample_id          |  cluster      |
+======================+===================+=================+===================+======================+=====================+===============+
| 70.02317542938793    |   142.36033       |   11.436659559  |    14.592754933   |   4159269.245951841  | 21_qc_no_dil_milliq |  7            |
+----------------------+-------------------+-----------------+-------------------+----------------------+---------------------+---------------+
| 70.02869594233522    |   205.48765       |   0.263230763   |    0.285101428707 |   8849767.11861127   | 21_qc_no_dil_milliq |  9            |
+----------------------+-------------------+-----------------+-------------------+----------------------+---------------------+---------------+
| 78.04643252598305    |   294.01713       |   0.51677558617 |    1.317028944141 |   1333044.506597194  | 21_qc_no_dil_milliq |  13           |
+----------------------+-------------------+-----------------+-------------------+----------------------+---------------------+---------------+
| ...                  |    ...            |    ...          |    ...            |   ...                | ...                 |  ...          |
+----------------------+-------------------+-----------------+-------------------+----------------------+---------------------+---------------+
</token>

<token name="@ALIGN_FEATURES_HELP@">
recetox-aplcms - align features
===============================
This step performs feature alignment after clustering and retention time correction.
The peaks clustered across samples are grouped based on the given tolerances to create an aligned feature table, connecting identical features across samples.
A parameter controls the minimum number of samples in which a feature has to be detected in order to be included in the aligned feature table.
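
The effect of such a minimum-occurrence threshold can be sketched in Python using the per-sample presence flags of the metadata table. The function and the parameter name `min_occurrence` are hypothetical and chosen for illustration only.

```python
def filter_by_occurrence(metadata_rows, sample_names, min_occurrence):
    """Keep features that are flagged as present in at least min_occurrence samples."""
    return [
        row for row in metadata_rows
        if sum(row[name] for name in sample_names) >= min_occurrence
    ]


samples = ["21_qc_no_dil_milliq", "29_qc_no_dil_milliq", "8_qc_no_dil_milliq"]
rows = [
    {"id": 1, "21_qc_no_dil_milliq": 1, "29_qc_no_dil_milliq": 1, "8_qc_no_dil_milliq": 1},
    {"id": 2, "21_qc_no_dil_milliq": 1, "29_qc_no_dil_milliq": 0, "8_qc_no_dil_milliq": 1},
    {"id": 57, "21_qc_no_dil_milliq": 1, "29_qc_no_dil_milliq": 1, "8_qc_no_dil_milliq": 0},
]
# Requiring all three samples keeps only feature 1; requiring two keeps all three features.
strict_ids = [row["id"] for row in filter_by_occurrence(rows, samples, min_occurrence=3)]
relaxed_ids = [row["id"] for row in filter_by_occurrence(rows, samples, min_occurrence=2)]
```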

Example Output
--------------
The tool outputs three tables: the peak-related `metadata`, the `retention times` and the `intensities` for all features across all samples.

Metadata Table
~~~~~~~~~~~~~~
The `npeaks` column denotes the number of peaks which have been grouped into this feature. The columns with the sample names indicate whether this feature is present in the sample.

+-------+--------------+--------------+---------------+----------------+---------------+---------------+-----------+------------------------+------------------------+------------------------+
|  id   | mz           |  mzmin       |  mzmax        |  rt            |  rtmin        |  rtmax        |   npeaks  |  21_qc_no_dil_milliq   |  29_qc_no_dil_milliq   |  8_qc_no_dil_milliq    |
+=======+==============+==============+===============+================+===============+===============+===========+========================+========================+========================+
|  1    | 70.03707021  |  70.037066   |  70.0370750   |  294.1038014   |  294.0634942  |  294.149985   |   3       |  1                     |  1                     |  1                     |
+-------+--------------+--------------+---------------+----------------+---------------+---------------+-----------+------------------------+------------------------+------------------------+
|  2    | 70.06505677  |  70.065045   |  70.0650676   |  141.9560055   |  140.5762528  |  143.335758   |   2       |  1                     |  0                     |  1                     |
+-------+--------------+--------------+---------------+----------------+---------------+---------------+-----------+------------------------+------------------------+------------------------+
|  57   | 78.04643252  |  78.046429   |  78.0464325   |  294.0063397   |  293.9406777  |  294.072001   |   2       |  1                     |  1                     |  0                     |
+-------+--------------+--------------+---------------+----------------+---------------+---------------+-----------+------------------------+------------------------+------------------------+
|  ...  | ...          |   ...        |  ...          |  ...           |  ...          |  ...          |   ...     |  ...                   |  ...                   |  ...                   |
+-------+--------------+--------------+---------------+----------------+---------------+---------------+-----------+------------------------+------------------------+------------------------+

Intensity Table
~~~~~~~~~~~~~~~
This table contains the peak area for aligned features in all samples.

+-------+------------------------+------------------------+------------------------+
|  id   |  21_qc_no_dil_milliq   |  29_qc_no_dil_milliq   |  8_qc_no_dil_milliq    |
+=======+========================+========================+========================+
|  1    |  13187487.20482895     |  7957395.699119729     |  11700594.397257797    |
+-------+------------------------+------------------------+------------------------+
|  2    |  2075168.6398983458    |  0                     |  2574362.159289044     |
+-------+------------------------+------------------------+------------------------+
|  57   |  2934524.4406785755    |  1333044.5065971944    |  0                     |
+-------+------------------------+------------------------+------------------------+
|  ...  |  ...                   |  ...                   |  ...                   |
+-------+------------------------+------------------------+------------------------+

Retention Time Table
~~~~~~~~~~~~~~~~~~~~
This table contains the retention times for all aligned features in all samples.

+-------+------------------------+------------------------+------------------------+
|  id   |  21_qc_no_dil_milliq   |  29_qc_no_dil_milliq   |  8_qc_no_dil_milliq    |
+=======+========================+========================+========================+
|  1    |  294.09792478513236    |  294.1499853056912     |  294.0634942428341     |
+-------+------------------------+------------------------+------------------------+
|  2    |  140.57625284242982    |  0                     |  143.33575827589172    |
+-------+------------------------+------------------------+------------------------+
|  57   |  294.07200187644435    |  293.9406777222317     |  0                     |
+-------+------------------------+------------------------+------------------------+
|  ...  |  ...                   |  ...                   |  ...                   |
+-------+------------------------+------------------------+------------------------+
</token>

<token name="@MERGE_KNOWN_TABLES_HELP@">
recetox-aplcms - merge known table
==================================

This tool allows merging the detected features back into the table of known features and vice versa.
It is used in the hybrid version of recetox-aplcms to augment the aligned feature table with the suspect peaks
and to augment this table with successfully detected features.
</token>
</macros>
author	recetox
date	Wed, 24 May 2023 14:48:47 +0000
parents	82737757f3d5
children	b9b19a74ac01