Align Samples (version 2.0.0+galaxy1)
Only provide a filelist if you would like to exclude peaklists, update the metadata (e.g. classLabel), or if you have not provided a filelist for Process Scans or Replicate Filter.
Maximum tolerated m/z deviation across samples in parts per million (ppm).

Align Samples


Description

Standard DIMS processing workflow: Process Scans -> [Replicate Filter] -> Align Samples -> Blank Filter -> Sample Filter -> [Missing values sample filter] -> Pre-processing -> Statistics


This tool takes the peaklists for all study samples and merges them into a single aligned peak matrix. The peak matrix comprises a table, with samples along one axis and the mass-to-charge ratios of detected mass spectral peaks along the opposite axis. At the intersection of sample and mass-to-charge ratio, the intensity is given for a specific peak in a specific sample (if no intensity was recorded, then ‘0’ is inserted).
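
The layout described above can be sketched in a few lines of Python. This is only an illustration of the zero-filled sample-by-m/z table, assuming the peaks have already been aligned to shared m/z values; the `peaks` dictionary and the variable names are hypothetical and are not part of the tool.

```python
# Hypothetical, already-aligned peaks: sample name -> {m/z: intensity}.
peaks = {
    "Blank_1": {96.04216: 3342.626},
    "Control_10": {100.0759: 45432.2},
    "Sample_2": {99.08062: 3423.3},
}

# One axis: every m/z value seen in any sample, in ascending order.
mz_axis = sorted({mz for sample_peaks in peaks.values() for mz in sample_peaks})

# Other axis: samples. A cell holds the intensity, or 0 if no peak was recorded.
matrix = [[peaks[sample].get(mz, 0) for mz in mz_axis] for sample in peaks]

for sample, row in zip(peaks, matrix):
    print(sample, row)
```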


Parameters

Peaklists (HDF5 file) (REQUIRED) - an HDF5 file containing all peaklists to undergo alignment.


Filelist / Samplelist (REQUIRED) - a file of type ‘tabular’ with the following required columns (additional metadata columns may also be included, e.g. “collectionTime”, etc.):

  • filename - the name of the .raw or .mzML files from which peaklists were extracted using the “Process Scans” tool
  • batch - a numeric value indicating the analysis batch in which each sample was analysed (if a single large analytical run, then the default = 1)
  • classLabel - a string indicating the experiment classes the samples belong to (e.g. control, QC, blank/placebo, exposed/treatment)
  • injectionOrder - a numeric value indicating the order in which samples were analysed.
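
As an illustration of the required columns, the sketch below parses a small tab-delimited filelist with the Python standard library and checks that the four required columns are present. The file names and the helper `read_filelist` are hypothetical, and converting `batch` and `injectionOrder` to integers is an assumption based on their description as numeric values.

```python
import csv
import io

REQUIRED_COLUMNS = ("filename", "batch", "classLabel", "injectionOrder")

# Hypothetical filelist content; the file names are invented for illustration.
EXAMPLE_FILELIST = (
    "filename\tbatch\tclassLabel\tinjectionOrder\n"
    "blank_01.mzML\t1\tblank\t1\n"
    "qc_01.mzML\t1\tQC\t2\n"
    "control_01.mzML\t1\tcontrol\t3\n"
)


def read_filelist(text):
    """Parse a tab-delimited filelist and check the required columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError("missing required column(s): " + ", ".join(missing))
    rows = []
    for row in reader:
        # Assumption: both numeric columns hold whole numbers.
        row["batch"] = int(row["batch"])
        row["injectionOrder"] = int(row["injectionOrder"])
        rows.append(row)
    return rows


rows = read_filelist(EXAMPLE_FILELIST)
print([row["classLabel"] for row in rows])
```

Additional metadata columns pass through unchanged as strings, which matches the note that extra columns may be included.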

ppm error tolerance (REQUIRED; default = 2.0) - a numeric value equal to or greater than 0.

This parameter influences the alignment of peaks from the input peaklists. Peaks from distinct peaklists (corresponding to individual study samples) are aligned if the difference between their mass-to-charge ratios, divided by the average of their mass-to-charge ratios and multiplied by 1 × 10^6, is equal to or less than this parameter value (i.e. the difference between the mass-to-charge ratios, measured on the ppm scale, does not exceed the user-defined “ppm error tolerance”).
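
The pairwise criterion described above can be written out as a short Python sketch. It covers only the comparison of two m/z values; how the tool groups many peaks across samples is not shown here, and the function names are hypothetical.

```python
def ppm_difference(mz_a, mz_b):
    """Difference between two m/z values on the ppm scale,
    relative to the average of the two values."""
    mean_mz = (mz_a + mz_b) / 2.0
    return abs(mz_a - mz_b) / mean_mz * 1e6


def within_tolerance(mz_a, mz_b, ppm=2.0):
    """True if the two peaks satisfy the ppm error tolerance (default 2.0)."""
    return ppm_difference(mz_a, mz_b) <= ppm


# Roughly 1 ppm apart: aligned at the default tolerance.
print(within_tolerance(100.0759, 100.0760))
# Roughly 6 ppm apart: not aligned at the default tolerance.
print(within_tolerance(100.0759, 100.0765))
```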


Show options for additional output(s) (OPTIONAL):

  • Standard output (default = No) - boolean toggle where selection of:

    • No - prevents the export of a .txt formatted peak matrix to the active Galaxy history.
    • Yes - export a .txt formatted peak matrix to the active Galaxy history that includes only those peaks from the input peak intensity matrix that passed the filtering procedure.
  • Comprehensive output (default = "No") - boolean toggle where selection of:

    • No - prevents export of a .txt formatted comprehensive peak matrix.
    • Yes - exports a .txt formatted comprehensive peak matrix to the active Galaxy history that contains the m/z, missing values and other metrics associated with all peaks included in the input peak intensity matrix, including the metric defined by the "The peak matrix should contain intensity | m/z | SNR values" parameter.
  • Should rows or columns represent the samples? (default = rows) - binary toggle where selection of:

    • rows - sample information is presented in the rows and m/z values (for aligned mass spectral peaks) in the columns of any output peak matrix.
    • columns - sample information is presented in the columns and m/z values (for aligned mass spectral peaks) in the rows of any output peak matrix.
  • The peak matrix should contain intensity | m/z | SNR values - use this option to define which peak metric is inserted into the cells of any optionally output peak matrix:

    • Intensity - writes the absolute peak intensity to the cells of the peak matrix
    • m/z - writes the mass-to-charge ratio to the cells of the peak matrix
    • signal-to-noise ratio (SNR) - writes the signal-to-noise ratio to the cells of the peak matrix


Output file(s)

Default output - an HDF5 file containing the aligned peak intensity matrix.


Optional outputs - the metric recorded in any optionally output peak matrix is defined using the parameter "The peak matrix should contain intensity | m/z | SNR values". By default, study samples are listed row-wise, while the mass-to-charge ratios of the aligned mass spectral peaks are presented in columns (to change this, set the "Should rows or columns represent the samples?" toggle to “columns”).

  • Standard output - an aligned peak matrix in tab-delimited format (“.” as decimal and NA for missing values).

    Example of a standard peak intensity matrix:

    mz          96.04216  99.08062  100.0759  100.8672  ...
    QC_1        0         0         0         0         ...
    Blank_1     3342.626  0         0         0         ...
    Control_10  0         0         45432.2   0         ...
    Sample_2    0         3423.3    0         0         ...
    Control_5   0         0         49759     0         ...
    Control_10  0                   39890.5   0         ...
    Sample_20   0         14563.7   0         0         ...
    Sample_2    0         34676.4   0         0         ...
    Sample_14   0         13134.9   0         521.4     ...
    ...         ...       ...       ...       ...       ...
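
A standard output file of this shape could be read back with the Python standard library, as in the sketch below. It assumes the default orientation (samples in rows), a tab-delimited file, and that empty or NA cells should become NaN; the helper names and the embedded example text are illustrative only. Sample names are kept in a list rather than used as dictionary keys because the same name can appear more than once, as in the example above.

```python
import csv
import io
import math

# A few rows from the example above, written as tab-delimited text.
EXAMPLE_MATRIX = (
    "mz\t96.04216\t99.08062\t100.0759\t100.8672\n"
    "QC_1\t0\t0\t0\t0\n"
    "Blank_1\t3342.626\t0\t0\t0\n"
    "Control_10\t0\t0\t45432.2\t0\n"
    "Sample_2\t0\t3423.3\t0\t0\n"
)


def parse_cell(cell):
    """Convert one cell to float; empty or NA cells become NaN."""
    return math.nan if cell.strip() in ("", "NA") else float(cell)


def read_peak_matrix(text):
    """Return (m/z values, [(sample name, intensities), ...])."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    mz_values = [float(value) for value in header[1:]]
    samples = [(row[0], [parse_cell(c) for c in row[1:]]) for row in reader]
    return mz_values, samples


mz_values, samples = read_peak_matrix(EXAMPLE_MATRIX)
print(mz_values)
print(samples[1])
```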

  • Comprehensive output - an aligned peak matrix, as described for the "standard output" (above), including all metadata from the "Process Scans" Filelist/samplelist and the following additional mass spectral peak metrics:

    • present - a positive integer value (0 < value ≤ total number of study samples in the filelist / samplelist) that indicates the total number of study samples in which a peak was detected with the specified mass-to-charge ratio, plus or minus the user-defined ppm error tolerance.
    • occurrence - a positive integer value indicating the number of peaks that were grouped together during the alignment procedure and, thus, that were used to calculate the average mass-to-charge ratio indicated for the aligned peak. A value greater than that given in the “present” metric indicates that one or more peaklists contained more than one mass spectral peak with the specified mass-to-charge ratio, plus or minus the user-defined ppm error tolerance.
    • purity - a value ranging from 0 to 1 that indicates the proportion of peaklists in which only a single peak was detected during the peaklist alignment process. If the value in the “occurrence” metric is greater than the “present” metric, purity will be < 1. A purity < 1 means that in at least one peaklist there was more than one mass spectral peak with the specified mass-to-charge ratio, plus or minus the user-defined ppm error tolerance.
    • rsd_all - a numeric value indicating the percent relative standard deviation (otherwise termed the percent coefficient of variation) of peak intensities for peaks aligned together using the Align Samples tool. If fewer than 2 peaks were aligned across samples, then the rsd_all column will be filled in with ‘nan’.
    • blank_flag (may be absent if the “Blank Filter” tool was not applied) - a boolean value where 0 = reject peak, 1 = accept peak. A peak is accepted during blank filtering if a user-defined minimum proportion of study samples had peak intensity values greater than the product of the average of “reference” sample peak intensities and the “min_fold_change” parameter.
    • fraction_flag (may be absent if the “Sample Filter” tool was not applied) - a boolean value where 0 = reject peak, 1 = accept peak. If more than a user-defined minimum fraction of samples (whether checked across ALL experimental classes, or within ANY of the individual experimental classes) had recorded intensity values for a given peak, then this peak is accepted, i.e. it is considered in downstream processing procedures, while rejected peaks are not.
    • flags - a boolean value indicating whether a peak should be included (“1”) or excluded (“0”) from downstream processing procedures. Exclusion of a peak occurs if the thresholds for “relative standard deviation” and/or “minimum number of technical replicates a peak has to be present in” were not met.

    Example of a comprehensive peak intensity matrix:

    mz           missing values  tags_batch  tags_replicates  tags_replicate  tags_injectionOrder  tags_classLabel  tags_untyped  96.04216  99.08062  100.0759  100.8672  ...
    present*                                                                                                                      1         4         3         1         ...
    occurrence*                                                                                                                   1         4         4         1         ...
    purity*                                                                                                                       1         1         1         1         ...
    rsd_all*                                                                                                                      nan       nan       10.98     nan       ...
    flags*                                                                                                                        1         1         1         1         ...
    QC_1         2901            1           2_3_4            2               2                    QC                             0         0         0         0         ...
    Blank_1      2948            1           1_2_4            1               5                    Blank                          3342.626  0         0         0         ...
    Control_10   2921            1           2_3_4            2               10                   Control                        0         0         45432.2   0         ...
    Sample_2     2819            1           1_2_4            1               13                   Exposed                        0         3423.3    0         0         ...
    Control_5    2877            1           2_3_4            2               18                   Control                        0         0         49759     0         ...
    Control_10   2856            1           1_2_3            1               21                   Control                        0                   39890.5   0         ...
    Sample_20    2855            1           1_2_4            1               25                   Exposed                        0         14563.7   0         0         ...
    Sample_2     2814            1           1_2_4            1               29                   Exposed                        0         34676.4   0         0         ...
    Sample_14    2870            1           1_2_3            1               33                   Exposed                        0         13134.9   0         521.4     ...
    ...          ...             ...         ...              ...             ...                  ...              ...           ...       ...       ...       ...       ...
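
The present, occurrence, purity and rsd_all metrics described above can be illustrated with a rough Python sketch for a single aligned peak. This is one reading of the definitions, not the tool's own code. It assumes that purity is the share of detecting samples that contributed exactly one peak, that one intensity per sample (the largest) enters the RSD, and that the sample standard deviation (n - 1) is used. The function name and input layout are hypothetical.

```python
import math
import statistics


def peak_metrics(peaks_per_sample):
    """peaks_per_sample holds one list per sample, containing the intensity
    of every peak from that sample grouped into this aligned peak."""
    detected = [p for p in peaks_per_sample if p]
    present = len(detected)
    occurrence = sum(len(p) for p in detected)
    # Assumption: purity is the fraction of detecting samples with one peak.
    if present:
        purity = sum(1 for p in detected if len(p) == 1) / present
    else:
        purity = math.nan
    # Assumption: the largest intensity per sample is used for the RSD.
    intensities = [max(p) for p in detected]
    if len(intensities) < 2:
        rsd_all = math.nan
    else:
        rsd_all = (statistics.stdev(intensities)
                   / statistics.mean(intensities) * 100.0)
    return {"present": present, "occurrence": occurrence,
            "purity": purity, "rsd_all": rsd_all}


# Three samples with one peak each and one sample without a peak.
print(peak_metrics([[45432.2], [49759.0], [39890.5], []]))
# One sample contributed two peaks, so occurrence > present and purity < 1.
print(peak_metrics([[1.0, 2.0], [3.0]]))
```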

Developers and contributors

License

DIMSpy is released under the GNU General Public License v3.0 (see LICENSE file)