
Sample Filter (version 2.0.0+galaxy0)

Sample Filter


Description

Standard DIMS processing workflow: Process Scans -> [Replicate Filter] -> Align Samples -> Blank Filter -> Sample Filter -> [Missing values sample filter] -> Pre-processing -> Statistics


There are many reasons why a peak may not have been detected in all study samples, including: an intensity (concentration) close to the signal-to-noise limit of the system; presence in only one of the study classes (e.g. a drug administered to the ‘treatment’ class samples); ion suppression/enhancement effects in the mass spectrometer source region; etc. The Sample Filter tool allows users to remove peaks from the input peak intensity matrix that were detected in fewer than a user-defined minimum fraction of study samples.


Parameters

Peak Intensity Matrix (HDF5 file) (REQUIRED) - an HDF5 file containing the peak intensity matrix to undergo sample filtering.

Apply sample filter within each sample class (REQUIRED; default No)

  • No - check across ALL classes simultaneously whether more than the user-defined “Minimum fraction” of samples contain an intensity value for a given mass spectral peak.

  • Yes - check within EACH class separately whether more than the user-defined “Minimum fraction” of samples contain an intensity value for a given mass spectral peak.

    IMPORTANT: if a peak is detected in more than the user-defined minimum fraction of samples in ANY class, the peak is retained in the output peak matrix. For classes in which this condition is not met, any intensity recorded for that peak is still presented in the output peak matrix. If no intensity was recorded for the peak in a sample, a ‘0’ is inserted into the peak matrix.

Minimum fraction (REQUIRED, default 0.5) - a numeric value between 0 and 1 indicating the proportion of study samples in which a peak must have a recorded intensity value in order for it to be retained in the output peak intensity matrix; e.g. 0.5 means that at least 50% of samples (whether assessed across all classes, or within each class individually) must have a recorded intensity value for a specific peak in order for it to be retained in the output peak matrix.
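The filtering rule described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not DIMSpy's implementation: it assumes a samples x peaks matrix given as nested lists, treats 0 (or NaN) as "no intensity recorded", and retains a peak when the detected fraction is at least the "Minimum fraction" (whether the tool compares with ≥ or a strict > is an assumption here). The names `sample_filter` and `is_detected` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def is_detected(value: float) -> bool:
    """Assumption: 0 or NaN means no intensity was recorded for the peak."""
    return value != 0 and not math.isnan(value)


def sample_filter(
    matrix: Sequence[Sequence[float]],
    min_fraction: float = 0.5,
    class_labels: Optional[Sequence[str]] = None,
) -> List[bool]:
    """Return one keep (True) / reject (False) flag per peak (column).

    matrix: rows are samples, columns are aligned peaks.
    class_labels: one label per sample; when given, the fraction is checked
    within each class and a peak is kept if ANY class passes.
    """
    # Group sample (row) indices by class; one group means "across all classes".
    labels = class_labels if class_labels is not None else ["all"] * len(matrix)
    groups: Dict[str, List[int]] = {}
    for row_idx, label in enumerate(labels):
        groups.setdefault(label, []).append(row_idx)

    n_peaks = len(matrix[0])
    return [
        any(
            sum(is_detected(matrix[i][j]) for i in rows) / len(rows) >= min_fraction
            for rows in groups.values()
        )
        for j in range(n_peaks)
    ]


# Toy data (hypothetical): 4 samples x 3 peaks, two classes.
matrix = [
    [0.0, 3423.3, 0.0],     # Exposed
    [0.0, 14563.7, 521.4],  # Exposed
    [45432.2, 0.0, 0.0],    # Control
    [49759.0, 0.0, 0.0],    # Control
]
labels = ["Exposed", "Exposed", "Control", "Control"]

print(sample_filter(matrix, 0.75))          # across all classes -> [False, False, False]
print(sample_filter(matrix, 0.75, labels))  # within each class  -> [True, True, False]
```

With the within-class option, the two class-specific peaks survive a 0.75 threshold even though neither is present in 75% of all samples; their zeros in the other class are left in the matrix unchanged, as described for the "Yes" setting.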


Show options for additional output(s) (OPTIONAL):

  • Standard output (default = No) - boolean toggle where selection of:

    • No - prevents export of a .txt formatted peak matrix to the active Galaxy history.
    • Yes - exports a .txt formatted peak matrix to the active Galaxy history that includes only those peaks from the input peak intensity matrix that passed the filtering procedure.
  • Comprehensive output (default = No) - boolean toggle where selection of:

    • No - prevents export of a .txt formatted comprehensive peak matrix.
    • Yes - exports a .txt formatted comprehensive peak matrix to the active Galaxy history that contains the m/z, missing values and other metrics associated with all peaks included in the input peak intensity matrix, including the metric defined by the "The peak matrix should contain intensity | m/z | SNR values" parameter.
  • Should rows or columns represent the samples? (default = rows) - binary toggle where selection of:

    • rows - sample information is presented in the rows and m/z values (for aligned mass spectral peaks) in the columns of any output peak matrix.
    • columns - sample information is presented in the columns and m/z values (for aligned mass spectral peaks) in the rows of any output peak matrix.
  • The peak matrix should contain intensity | m/z | SNR values - use this option to define which peak metric is inserted into the cells of any optionally-output peak matrix:

    • Intensity - writes the absolute peak intensity to the cells of the peak matrix
    • m/z - writes the mass-to-charge ratio to the cells of the peak matrix
    • signal-to-noise ratio (SNR) - writes the signal-to-noise ratio to the cells of the peak matrix
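A minimal sketch of how the rows/columns toggle and the "NA for missing values" convention might be applied when writing a tab-delimited peak matrix, using only the Python standard library. The function names `write_peak_matrix` and `format_cell` are hypothetical, and the layout (an "mz" label in the first header cell) is modelled on the example standard output on this page rather than on DIMSpy's own writer. The `matrix` argument simply holds whichever metric (intensity, m/z or SNR) was selected.

```python
import csv
import io
import math
from typing import Sequence


def format_cell(value: float) -> str:
    """Write NaN as "NA"; Python's str() already uses "." as the decimal mark."""
    return "NA" if math.isnan(value) else str(value)


def write_peak_matrix(
    sample_ids: Sequence[str],
    mz_values: Sequence[float],
    matrix: Sequence[Sequence[float]],
    samples_as_rows: bool = True,
) -> str:
    """Serialise a samples x peaks matrix as tab-delimited text.

    With samples_as_rows=False the table is transposed so that m/z values
    label the rows and samples label the columns.
    """
    header = ["mz"] + [str(mz) for mz in mz_values]
    body = [[sid] + [format_cell(v) for v in row] for sid, row in zip(sample_ids, matrix)]
    table = [header] + body
    if not samples_as_rows:
        table = [list(column) for column in zip(*table)]  # transpose
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(table)
    return buffer.getvalue()


# Toy values (hypothetical), taken in the style of the example matrix.
ids = ["QC_1", "Blank_1"]
mzs = [96.04216, 99.08062]
values = [[0.0, float("nan")], [3342.626, 0.0]]

print(write_peak_matrix(ids, mzs, values))                         # samples in rows
print(write_peak_matrix(ids, mzs, values, samples_as_rows=False))  # samples in columns
```

Transposing the whole table (header included) keeps the sample and m/z labels attached to their cells, so both orientations carry the same information.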

Output file(s)

IMPORTANT - in all outputs except the (optional) comprehensive output, a peak is removed from the output peak matrix if fewer than the user-defined “Minimum fraction” of samples (whether assessed within a class or across all classes) had a recorded intensity value for it. Note that when assessing within classes, only one class needs to have recorded intensity values in more than the "Minimum fraction" of samples for the peak to be retained in the output peak matrix.

Default output - an HDF5 file containing the aligned peak intensity matrix after sample filtering.


Optional outputs - the metric recorded in any optionally-output peak matrix/matrices is defined using the parameter "The peak matrix should contain intensity | m/z | SNR values". By default, study samples are listed row-wise, while the mass-to-charge ratios of the aligned mass spectral peaks are presented in columns (to transpose this layout, set the "Should rows or columns represent the samples?" toggle to “columns”).

  • Standard output - an aligned peak matrix in tab-delimited format (“.” as decimal and NA for missing values).

    Example of a standard peak intensity matrix:

    mz          96.04216  99.08062  100.0759  100.8672  ...
    QC_1        0         0         0         0         ...
    Blank_1     3342.626  0         0         0         ...
    Control_10  0         0         45432.2   0         ...
    Sample_2    0         3423.3    0         0         ...
    Control_5   0         0         49759     0         ...
    Control_10  0                   39890.5   0         ...
    Sample_20   0         14563.7   0         0         ...
    Sample_2    0         34676.4   0         0         ...
    Sample_14   0         13134.9   0         521.4     ...
    ...         ...       ...       ...       ...       ...

  • Comprehensive output - an aligned peak matrix, as described for the "standard output" (above), including all metadata from the "Process Scans" Filelist/samplelist and the following additional mass spectral peak metrics:

    • present - a positive integer value (0 < value ≤ total number of study samples in the filelist / samplelist) that indicates the total number of study samples in which a peak was detected with the specified mass-to-charge ratio, plus or minus the user-defined ppm error tolerance.
    • occurrence - a positive integer value indicating the number of peaks that were grouped together during the alignment procedure and, thus, that were used to calculate the average mass-to-charge ratio indicated for the aligned peak. A value greater than that of the “present” metric indicates that one or more peaklists contained more than one mass spectral peak with the specified mass-to-charge ratio, plus or minus the user-defined ppm error tolerance.
    • purity - a proportion ranging from 0 to 1 that indicates the fraction of peaklists in which only a single peak was detected during the peaklist alignment process. If the value of the “occurrence” metric is greater than that of the “present” metric, purity will be < 1. A purity < 1 means that in at least one peaklist there was more than one mass spectral peak with the specified mass-to-charge ratio, plus or minus the user-defined ppm error tolerance.
    • rsd_all - a numeric value indicating the percent relative standard deviation (otherwise termed the percent coefficient of variation) of the intensities of peaks aligned together using the Align Samples tool. If fewer than 2 peaks were aligned across samples, the rsd_all column is filled with ‘nan’.
    • blank_flag (may be absent if the “Blank Filter” tool was not applied) - a boolean value where 0 = reject peak, 1 = accept peak. A peak is accepted during blank filtering if a user-defined minimum proportion of study samples had peak intensity values greater than the product of the average “reference” sample peak intensity and the “min_fold_change” parameter.
    • fraction_flag (may be absent if the “Sample Filter” tool was not applied) - a boolean value where 0 = reject peak, 1 = accept peak. If more than a user-defined minimum fraction of samples (whether checked across ALL experimental classes, or within ANY of the individual experimental classes) had recorded intensity values for a given peak, then this peak is accepted, i.e. it is considered in downstream processing procedures, while rejected peaks are not.
    • flags - a boolean value indicating whether a peak should be included (“1”) or excluded (“0”) from downstream processing procedures. Exclusion of a peak occurs if the thresholds for “relative standard deviation” and/or “minimum number of technical replicates a peak has to be present in” were not met.

    Example of a comprehensive peak intensity matrix:

    mz           missing values  tags_batch  tags_replicates  tags_replicate  tags_injectionOrder  tags_classLabel  tags_untyped  96.04216  99.08062  100.0759  100.8672  ...
    present*                                                                                                                      1         4         3         1         ...
    occurrence*                                                                                                                   1         4         4         1         ...
    purity*                                                                                                                       1         1         1         1         ...
    rsd_all*                                                                                                                      nan       nan       10.98     nan       ...
    flags*                                                                                                                        1         1         1         1         ...
    QC_1         2901            1           2_3_4            2               2                    QC                             0         0         0         0         ...
    Blank_1      2948            1           1_2_4            1               5                    Blank                          3342.626  0         0         0         ...
    Control_10   2921            1           2_3_4            2               10                   Control                        0         0         45432.2   0         ...
    Sample_2     2819            1           1_2_4            1               13                   Exposed                        0         3423.3    0         0         ...
    Control_5    2877            1           2_3_4            2               18                   Control                        0         0         49759     0         ...
    Control_10   2856            1           1_2_3            1               21                   Control                        0                   39890.5   0         ...
    Sample_20    2855            1           1_2_4            1               25                   Exposed                        0         14563.7   0         0         ...
    Sample_2     2814            1           1_2_4            1               29                   Exposed                        0         34676.4   0         0         ...
    Sample_14    2870            1           1_2_3            1               33                   Exposed                        0         13134.9   0         521.4     ...
    ...          ...             ...         ...              ...             ...                  ...              ...           ...       ...       ...       ...       ...
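The "present" and "rsd_all" metrics described above can be approximated from a samples x peaks intensity matrix as in the following sketch. It assumes that 0 or NaN marks a missing intensity and that rsd_all uses the sample standard deviation (n - 1) of the recorded intensities only; DIMSpy's exact computation may differ, and `peak_metrics` is a hypothetical name. "occurrence" and "purity" are left out because they depend on the alignment-level peak groupings, which a finished intensity matrix no longer contains.

```python
import math
import statistics
from typing import Dict, List, Sequence


def peak_metrics(matrix: Sequence[Sequence[float]]) -> List[Dict[str, float]]:
    """Per-peak 'present' count and 'rsd_all' (%) for a samples x peaks matrix.

    Assumptions: 0 or NaN means 'not detected'; rsd_all is the sample
    standard deviation (n - 1) divided by the mean of detected intensities.
    """
    metrics = []
    for j in range(len(matrix[0])):
        detected = [row[j] for row in matrix if row[j] != 0 and not math.isnan(row[j])]
        if len(detected) >= 2:
            rsd = 100.0 * statistics.stdev(detected) / statistics.mean(detected)
        else:
            rsd = float("nan")  # fewer than 2 aligned peaks -> 'nan'
        metrics.append({"present": len(detected), "rsd_all": rsd})
    return metrics


# Toy data (hypothetical): 4 samples x 2 peaks.
matrix = [
    [0.0, 52.0],
    [100.0, 0.0],
    [110.0, 0.0],
    [90.0, 0.0],
]
for m in peak_metrics(matrix):
    print(m)  # peak 1: present=3, rsd_all=10.0; peak 2: present=1, rsd_all=nan
```

Computing the RSD over detected intensities only (rather than including zeros) matches the description of rsd_all as a statistic over the peaks that were actually aligned together.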

Developers and contributors

License

DIMSpy is released under the GNU General Public License v3.0 (see LICENSE file)