Standard DIMS processing workflow: Process Scans -> [Replicate Filter] -> Align Samples -> Blank Filter -> Sample Filter -> [Missing values sample filter] -> Pre-processing -> Statistics
A peak may go undetected in some study samples for many reasons: its intensity (concentration) may be close to the signal-to-noise limit of the system; it may be present in only one of the study classes (e.g. a drug administered to the ‘treatment’ class samples); or it may be subject to ion suppression/enhancement effects in the mass spectrometer source region. The Sample Filter tool removes peaks from the input peak intensity matrix that were detected in fewer than a user-defined minimum fraction of study samples.
Peak Intensity Matrix (HDF5 file) (REQUIRED) - an HDF5 file containing the peak intensity matrix to undergo sample filtering.
Apply sample filter within each sample class (REQUIRED; default No)
No - check across ALL classes simultaneously whether at least the user-defined “Minimum fraction” of samples contained an intensity value for a specific mass spectral peak.
Yes - check within EACH class separately whether at least the user-defined “Minimum fraction” of samples contained an intensity value for a specific mass spectral peak.
IMPORTANT: if in ANY class a peak is detected in at least the user-defined minimum fraction of samples, then the peak is retained in the output peak matrix. For classes in which this condition is not met, any intensity recorded for that peak is still presented in the output peak matrix. If no peak intensity was recorded in a sample, then a ‘0’ is inserted into the peak matrix.
Minimum fraction (REQUIRED, default 0.5) - a numeric value between 0 and 1 indicating the proportion of study samples in which a peak must have a recorded intensity value in order for it to be retained in the output peak intensity matrix; e.g. 0.5 means that at least 50% of samples (whether assessed across all classes, or within each class individually) must have a recorded intensity value for a specific peak in order for it to be retained in the output peak matrix.
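The two filtering modes can be illustrated with a minimal pure-Python sketch. It is not the DIMSpy implementation: the function name `sample_filter`, its arguments and the toy data are hypothetical, a ‘0’ is treated as "no intensity recorded", and the threshold is applied as "at least" the minimum fraction, following the parameter description above.

```python
from collections import defaultdict


def sample_filter(matrix, class_labels, min_fraction=0.5, within_classes=False):
    """Return the column indices of peaks that pass the sample filter.

    matrix       -- list of rows, one per sample; 0 means no intensity recorded
    class_labels -- one class label per sample (same order as the rows)
    """
    n_peaks = len(matrix[0]) if matrix else 0

    # Group sample rows by class, or into a single group when assessing
    # across all classes simultaneously.
    groups = defaultdict(list)
    for row_idx, label in enumerate(class_labels):
        groups[label if within_classes else "ALL"].append(row_idx)

    kept = []
    for peak in range(n_peaks):
        for rows in groups.values():
            detected = sum(1 for r in rows if matrix[r][peak] > 0)
            # Assumption: "at least" the minimum fraction retains the peak.
            if detected / len(rows) >= min_fraction:
                kept.append(peak)  # one passing group is enough
                break
    return kept


# Hypothetical toy data: 4 samples (rows) x 4 peaks (columns).
matrix = [
    [0, 0, 45432.2, 0],      # Control
    [0, 0, 49759.0, 0],      # Control
    [0, 3423.3, 0, 0],       # Exposed
    [0, 14563.7, 0, 521.4],  # Exposed
]
labels = ["Control", "Control", "Exposed", "Exposed"]

across = sample_filter(matrix, labels, min_fraction=0.75, within_classes=False)
within = sample_filter(matrix, labels, min_fraction=0.75, within_classes=True)
print(across, within)
```

With a minimum fraction of 0.75, no peak passes when assessed across all classes, whereas the second and third peaks are retained when assessed within classes, because each is detected in every sample of one class.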
Show options for additional output(s) (OPTIONAL):
Standard output (default = No) - boolean toggle where selection of:
- No - prevents the export of a .txt formatted peak matrix to the active Galaxy history.
- Yes - exports a .txt formatted peak matrix to the active Galaxy history that includes only those peaks from the input peak intensity matrix that passed the filtering procedure.
Comprehensive output (default = No) - boolean toggle where selection of:
- No - prevents export of a .txt formatted comprehensive peak matrix.
- Yes - exports a .txt formatted comprehensive peak matrix to the active Galaxy history that contains the m/z, missing values and other metrics associated with all peaks included in the input peak intensity matrix, including the metric defined by the "The peak matrix should contain intensity | m/z | SNR values" parameter.
Should rows or columns represent the samples? (default = rows) - binary toggle where selection of:
- rows - sample information is presented in the rows and m/z values (for aligned mass spectral peaks) in the columns of any output peak matrix.
- columns - sample information is presented in the columns and m/z values (for aligned mass spectral peaks) in the rows of any output peak matrix.
The peak matrix should contain intensity | m/z | SNR values - use this option to define which peak metric is inserted into the cells of any optionally output peak matrix:
- Intensity - writes the absolute peak intensity to the cells of the peak matrix
- m/z - writes the mass-to-charge ratio to the cells of the peak matrix
- signal-to-noise ratio (SNR) - writes the signal-to-noise ratio to the cells of the peak matrix
IMPORTANT - in all outputs except the (optional) comprehensive output, a peak is removed from the output peak matrix if fewer than the user-defined “Minimum fraction” of samples (whether assessed within a class or across all classes) had a recorded intensity value for it. Note that when assessing within classes, only one class needs to have recorded intensity values in at least the "Minimum fraction" of samples for the peak to be retained in the output peak matrix.
Default output - an HDF5 file containing the aligned peak intensity matrix.
Optional outputs - the metric recorded in any optionally output peak matrix/matrices is defined using the parameter "The peak matrix should contain intensity | m/z | SNR values". By default, study samples are listed row-wise, while mass-to-charge ratios of the aligned mass spectral peaks are presented in columns (to change this, set the "Should rows or columns represent the samples?" toggle to “columns”).
Standard output - an aligned peak matrix in tab-delimited format (“.” as decimal and NA for missing values).
Example of a standard peak intensity matrix:
mz | 96.04216 | 99.08062 | 100.0759 | 100.8672 | ... |
QC_1 | 0 | 0 | 0 | 0 | ... |
Blank_1 | 3342.626 | 0 | 0 | 0 | ... |
Control_10 | 0 | 0 | 45432.2 | 0 | ... |
Sample_2 | 0 | 3423.3 | 0 | 0 | ... |
Control_5 | 0 | 0 | 49759 | 0 | ... |
Control_10 | 0 | 39890.5 | 0 |  | ... |
Sample_20 | 0 | 14563.7 | 0 | 0 | ... |
Sample_2 | 0 | 34676.4 | 0 | 0 | ... |
Sample_14 | 0 | 13134.9 | 0 | 521.4 | ... |
... | ... | ... | ... | ... | ... |
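For downstream work outside Galaxy, a standard output file can be read with the Python standard library alone. The sketch below assumes the default orientation (samples in rows), a header row whose first cell is "mz" followed by the m/z values, and tab-delimited cells, as described above. The function name `read_peak_matrix` and the inline example text are hypothetical. Samples are kept as a list rather than a dictionary so that repeated sample names are not overwritten.

```python
import csv
import io


def read_peak_matrix(handle):
    """Parse a tab-delimited peak matrix with samples in rows.

    Returns (mz_values, samples), where samples is a list of
    (sample_name, values) tuples and missing cells become None.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    mz_values = [float(v) for v in header[1:]]  # skip the leading "mz" cell

    samples = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        name, cells = row[0], row[1:]
        values = [None if c in ("", "NA") else float(c) for c in cells]
        samples.append((name, values))
    return mz_values, samples


# Hypothetical excerpt in the layout shown in the example above.
example = (
    "mz\t96.04216\t99.08062\t100.0759\t100.8672\n"
    "QC_1\t0\t0\t0\t0\n"
    "Blank_1\t3342.626\t0\t0\t0\n"
    "Sample_14\t0\t13134.9\t0\t521.4\n"
)
mz_values, samples = read_peak_matrix(io.StringIO(example))
print(mz_values)
print(samples[-1])
```

For a real file, pass an open file handle (e.g. `open(path, newline="")`) instead of the `io.StringIO` object. If the "columns" orientation was selected, the matrix would need transposing first.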
Comprehensive output - an aligned peak matrix, as described for the "standard output" (above), including all metadata from the "Process Scans" Filelist/samplelist and the additional mass spectral peak metrics marked with an asterisk in the example below (present, occurrence, purity, rsd_all and flags).
Example of a comprehensive peak intensity matrix:
mz | missing values | tags_batch | tags_replicates | tags_replicate | tags_injectionOrder | tags_classLabel | tags_untyped | 96.04216 | 99.08062 | 100.0759 | 100.8672 | ... |
present* |  |  |  |  |  |  |  | 1 | 4 | 3 | 1 | ... |
occurrence* |  |  |  |  |  |  |  | 1 | 4 | 4 | 1 | ... |
purity* |  |  |  |  |  |  |  | 1 | 1 | 1 | 1 | ... |
rsd_all* |  |  |  |  |  |  |  | nan | nan | 10.98 | nan | ... |
flags* |  |  |  |  |  |  |  | 1 | 1 | 1 | 1 | ... |
QC_1 | 2901 | 1 | 2_3_4 | 2 | 2 | QC |  | 0 | 0 | 0 | 0 | ... |
Blank_1 | 2948 | 1 | 1_2_4 | 1 | 5 | Blank |  | 3342.626 | 0 | 0 | 0 | ... |
Control_10 | 2921 | 1 | 2_3_4 | 2 | 10 | Control |  | 0 | 0 | 45432.2 | 0 | ... |
Sample_2 | 2819 | 1 | 1_2_4 | 1 | 13 | Exposed |  | 0 | 3423.3 | 0 | 0 | ... |
Control_5 | 2877 | 1 | 2_3_4 | 2 | 18 | Control |  | 0 | 0 | 49759 | 0 | ... |
Control_10 | 2856 | 1 | 1_2_3 | 1 | 21 | Control |  | 0 | 39890.5 | 0 |  | ... |
Sample_20 | 2855 | 1 | 1_2_4 | 1 | 25 | Exposed |  | 0 | 14563.7 | 0 | 0 | ... |
Sample_2 | 2814 | 1 | 1_2_4 | 1 | 29 | Exposed |  | 0 | 34676.4 | 0 | 0 | ... |
Sample_14 | 2870 | 1 | 1_2_3 | 1 | 33 | Exposed |  | 0 | 13134.9 | 0 | 521.4 | ... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
Galaxy Tool Wrappers: https://github.com/computational-metabolomics/dimspy-galaxy/
DIMSpy package: https://github.com/computational-metabolomics/dimspy/
DIMSpy is released under the GNU General Public License v3.0 (see LICENSE file)