# HG changeset patch
# User pieter.lukasse@wur.nl
# Date 1423234166 -3600
# Node ID 9bd2597c8851223a5b040e1f70188070d5713136
# Parent d685210eef3ecab31c30c55c163446369d04c1bd
r

diff -r d685210eef3e -r 9bd2597c8851 MsClust.jar
Binary file MsClust.jar has changed
diff -r d685210eef3e -r 9bd2597c8851 msclust.xml
--- a/msclust.xml Fri Dec 19 15:30:13 2014 +0100
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,356 +0,0 @@
- Extracts fragmentation spectra from aligned data
- MsClust.jar
- -peaksFileName $inputPeaks
- -dataType $dataType
- -imputationMethod $imputationMethod.type
- #if $imputationMethod.type == "valueRange"
- -rangeUpperLimit $imputationMethod.rangeUpperLimit
- #end if
- -plInputFormat $plInputFormat
- -potDensFuncType $potDensFuncType.type
- -centerSelectionType $centerSelectionType.type
- -clusteringType $clusteringType.type
- -neighborhoodWindowSize $potDensFuncType.pdf_neighborhoodWindowSize
- -clusterSearchStopCriterium $centerSelectionType.cs_stop_criterion
- -pearsonDistTreshold $potDensFuncType.pdf_pears_treshold
- -pearsonTresholdConfidence $potDensFuncType.pdf_pears_conf
- -pearsonPDReductionThreshold $centerSelectionType.cs_pears_pd_reductionTreshold
- -pearsonPDReductionSlope $centerSelectionType.cs_pears_pd_reductionSlope
- -rtDistTolUnit $potDensFuncType.rt_dist_tol_unit.type
- -rtDistTol $potDensFuncType.rt_dist_tol_unit.pdf_rt_toler
- -rtDistanceConfidence $potDensFuncType.pdf_scan_conf
- #if $clusteringType.type == "original"
- -clustMembershipCutoff $clusteringType.clust_membership_cutoff
- #end if
- -centrotypesOut $centrotypesOut
- -simOut $simOut
- -micOut $micOut
- -mspOut $mspOut
- -classOut $classOut
- -outReport $htmlReportFile
- -outReportPicturesPath $htmlReportFile.files_path
- #if $clusteringType.type == "fuzzyCMeans"
- -fcmMembershipWeightingExponent $clusteringType.fcmMembershipWeightingExponent
- -fcmStopCriterion $clusteringType.fcmStopCriterion
- -fcmCorrelationWeight $clusteringType.fcmCorrelationWeight
- -fcmFinalAssemblyType $clusteringType.finalClusterAssembly.type
- #if $clusteringType.finalClusterAssembly.type == "membershipBased"
- -fcmMembershipCutoff $clusteringType.finalClusterAssembly.fcmMembershipCutoff
- #end if
- #end if
- -verbose "false"
- #if $advancedSettings.settings == True
- -advancedSettings YES
- -saturationLimit $advancedSettings.saturationLimit
- -sampleSelectionSortType $advancedSettings.sampleSelectionSortType
- -simSelectionAlgorithm $advancedSettings.simSelectionAlgorithm
- -simMassFilter "$advancedSettings.simMassFilter"
- -simMembershipThreshold $advancedSettings.simMembershipThreshold
- -simSaturationThreshold $advancedSettings.simSaturationThreshold
- -simAbsenseThreshold $advancedSettings.simAbsenseThreshold
- -micMembershipThreshold $advancedSettings.micMembershipThreshold
- -peakIntensityCorrectionAlgorithm $advancedSettings.peakIntensityCorrectionAlgorithm
- #else
- -advancedSettings YES
- -sampleSelectionSortType SIM_INTENSITY
- -peakIntensityCorrectionAlgorithm CORRELATION_BASED
- #end if
- ( summaryReport == True )
-.. class:: infomark
-
-This tool extracts spectra from ion-wise aligned MS(/MS) results. It uses expression profiles and
-retention times of the putative ions to cluster them. Each cluster is then used to generate
-one spectrum containing the clustered ions (peaks).
-
-.. image:: msclust_summary.png
-
-
------
-
-**Input**
-
-The input file should contain the following columns (in this order), followed by the sample intensity columns (one column with the
-intensity value for each sample):
-
-*ScanNR*
-
-*Ret(umin)*
-
-*Mass(uD)*
-
-*(Optional)retentionMean*
-
-*(only required if retentionMean is present)retentionSD*
-
-*N sample intensity columns...*
-
-
------
-
-**Output**
-
-This tool returns a number of output files (described below) and a small HTML report.
-
-**Parameters index**
-
-
-*Select the approach used for imputing missing values:* only select this if you have used a specific method to
-fill in the data gaps in the input file. One example is replacing zero values with randomly generated low values.
-If MeTot is chosen, a value is considered generated if it contains a dot '.' with at least one digit
-other than 0 (zero) after it.
-
-*Effective Peaks:* Neighborhood window size to consider when calculating density. Smaller values increase
-performance but are less reliable.
-
-*Peak Width, in scans:* Width of the scan window within which scans are considered 'close'. One can see this as the
-'tolerated variation in scans' for the apex positions of the fragment peaks composing a cluster.
-Note: if MetAlign was used, this is the variation *after* pre-processing by MetAlign.
-
-*Peak Width confidence:* The higher the confidence, the stricter the threshold.
-
-*Correlation threshold (0.0 - 1.0):* Tolerance center for Pearson distance calculation. The higher this value,
-the higher the correlation between two items has to be for them to be considered 'close'.
-
-*Correlation threshold confidence:* The higher the confidence, the stricter the threshold. `More...`__
-
-*Potential Density reduction (0.0 - 1.0):* Reduction tolerance center for Pearson distance calculation.
-The higher this value, the less the potential density of weakly correlated items is reduced, giving them a chance to form a cluster of their own.
-
-*Potential Density reduction softness:* Reduction curve slope for Pearson distance tolerance. Lower
-values = stricter separation at the value determined in 'Potential Density reduction' above
-(TODO review this comment).
-
-*Stop Criterion:* When to stop reducing and looking for new clusters. Lower values = more iterations.
-
-.. __: javascript:window.open('.. image:: confidence_and_slope_params_explain.png'.replace('.. image:: ', ''),'popUpWindow','height=700,width=800,left=10,top=10,resizable=yes,scrollbars=yes,toolbar=yes,menubar=no,location=no,directories=no,status=yes')
-
-
------
-
-**Output files described below**
-
------
-
-*SPECTRA:* This file can be submitted to NIST for identification of the spectra.
-
-`Click here for more details on the Sample selection and Spectrum peak intensity correction algorithm parameters related to SPECTRA generation`_
-
-.. _Click here for more details on the Sample selection and Spectrum peak intensity correction algorithm parameters related to SPECTRA generation: javascript:window.open('.. image:: sample_sel_and_peak_height_correction.png'.replace('.. image:: ', ''),'popUpWindow','height=700,width=800,left=10,top=10,resizable=yes,scrollbars=yes,toolbar=yes,menubar=no,location=no,directories=no,status=yes')
-
------
-
-*MIC:* stands for Measured Ions Count. It contains, for each cluster, the sum of the ion count
-values (weighted by their membership) of all MEASURED cluster ions in the given sample.
-
-The MIC of **cluster i** in **sample s**, where **cluster i** has **n** members (m = 1..n), is thus:
-
-MIC(i, s) = sum over m = 1..n of ( [intensity of member m in **sample s**] x [membership value of member m in **cluster i**] )
-
------
-
-*SIM:* stands for Selective Ion Mode. It contains, for each cluster, the intensity values of the
-most representative member ion peak of this cluster. The most representative member peak is the one with the
-highest membership*average_intensity.
-This definition can lead to conflicts, because a peak can have a
-membership in two or more clusters. The assignment of a SIM peak to a cluster depends on
-the configured data type (LC-MS or GC-MS), as described below. NB: this can be overridden in the "advanced settings".
-
-(1) LC-MS SIM: a peak is selected as SIM only once, for the centrotype in which this specific mass has its
-highest membership; neighboring centrotypes use their "second best SIM", and so on. In other words,
-if a peak has been identified as the SIM of more than one cluster, it is assigned as SIM to the cluster
-in which it has the highest membership. The search for SIM peaks for the remaining clusters continues until
-all ambiguities are resolved.
-
-(2) GC-MS SIM: the SIM peak can be "shared" by multiple clusters. In that case, its intensity values are corrected
-by the membership value of the peak in each cluster. If the SIM peak is not
-"shared", the "raw" intensity values of the SIM peak are recorded in the SIM file.
-
-`Click here for more details on the SIM output file`_
-
-.. _Click here for more details on the SIM output file: javascript:window.open('.. image:: sample_SIM.png'.replace('.. image:: ', ''),'popUpWindow','height=700,width=800,left=10,top=10,resizable=yes,scrollbars=yes,toolbar=yes,menubar=no,location=no,directories=no,status=yes')
-
-
-**References**
-
-If you use this Galaxy tool in work leading to a scientific publication, please
-cite the following paper:
-
-Y. M. Tikunov, S. Laptenok, R. D. Hall, A. Bovy, and R. C. H. de Vos (2012).
-MSClust: a tool for unsupervised mass spectra extraction of
-chromatography-mass spectrometry ion-wise aligned data.
-http://dx.doi.org/10.1007%2Fs11306-011-0368-2
-
- 10.1007%2Fs11306-011-0368-2
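The MeTot rule from the imputation parameter in the removed help text (a value counts as generated if it contains a dot with a non-zero digit after it) can be checked per cell with a few lines of Python. This is a minimal sketch under stated assumptions, not MsClust's own code: the function name `is_metot_generated` is hypothetical, it works on the raw text of a cell because the rule is about how the value is written, and it assumes plain decimal notation, so only the run of digits directly after the dot is inspected.

```python
DIGITS = "0123456789"


def is_metot_generated(raw_value: str) -> bool:
    """Hypothetical helper: True if a raw cell value counts as 'generated'
    under the MeTot rule, i.e. it contains a dot '.' with at least one
    non-zero digit after it."""
    text = raw_value.strip()
    dot = text.find(".")
    if dot == -1:
        return False  # no dot: treated as a measured value
    # Assumption: only the digits directly after the dot are inspected,
    # so an exponent such as '12.0e3' is not mistaken for a fraction.
    for ch in text[dot + 1:]:
        if ch not in DIGITS:
            break
        if ch != "0":
            return True
    return False
```

For example, `"1234"` and `"1234.000"` are treated as measured values, while `"1234.05"` is flagged as generated.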
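The MIC formula in the removed help text, MIC(i, s) = sum over members m of intensity(m, s) x membership(m, i), maps directly onto a loop over ions. The sketch below is illustrative only: `mic_for_cluster` and its argument layout (ions as rows, samples or clusters as columns) are hypothetical, and the optional `measured` mask is an assumed reading of "all MEASURED cluster ions", meaning imputed values are skipped. The help text does not say how MsClust.jar actually identifies them.

```python
from typing import List, Optional, Sequence


def mic_for_cluster(
    intensities: Sequence[Sequence[float]],
    membership: Sequence[Sequence[float]],
    cluster: int,
    measured: Optional[Sequence[Sequence[bool]]] = None,
) -> List[float]:
    """MIC(cluster, s) for every sample s.

    intensities[m][s]: intensity of ion m in sample s
    membership[m][c]:  membership value of ion m in cluster c
    measured[m][s]:    optional hypothetical mask; False marks a non-measured
                       (imputed) value, which is then left out of the sum.
    """
    n_samples = len(intensities[0]) if intensities else 0
    totals = [0.0] * n_samples
    for m, row in enumerate(intensities):
        weight = membership[m][cluster]
        if weight == 0:
            continue  # ion m is not a member of this cluster
        for s, value in enumerate(row):
            if measured is not None and not measured[m][s]:
                continue  # assumption: imputed values do not contribute
            totals[s] += value * weight  # intensity x membership
    return totals
```

With `intensities = [[100.0, 200.0], [10.0, 20.0], [5.0, 5.0]]` and `membership = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]`, `mic_for_cluster(intensities, membership, 0)` returns `[105.0, 210.0]`: the first ion contributes fully and the second at half weight.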
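The two SIM assignment modes in the removed help text can be sketched as follows. This is one possible reading, not the algorithm in MsClust.jar. Candidate peaks of a cluster are ranked by membership x average intensity. For LC-MS, the assumed conflict resolution runs in rounds: every unresolved cluster proposes its best peak not yet taken, and a contested peak goes to the cluster in which its membership is highest. For GC-MS, the top peak may be shared, and "corrected by the membership value" is assumed to mean multiplication, as in the MIC formula. All function names are hypothetical, and each peak is assumed to have at least one sample.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set

Matrix = Sequence[Sequence[float]]


def ranked_candidates(membership: Matrix, intensities: Matrix, cluster: int) -> List[int]:
    """Member peaks of `cluster`, best SIM candidate first.
    Score = membership * average intensity (assumes >= 1 sample per peak)."""
    def score(p: int) -> float:
        return membership[p][cluster] * sum(intensities[p]) / len(intensities[p])

    members = [p for p in range(len(membership)) if membership[p][cluster] > 0]
    return sorted(members, key=score, reverse=True)


def assign_sim_lcms(membership: Matrix, intensities: Matrix) -> Dict[int, int]:
    """LC-MS reading: each peak is the SIM of at most one cluster.
    Returns {cluster: peak}; clusters that run out of candidates get no SIM."""
    n_clusters = len(membership[0]) if membership else 0
    ranked = {c: ranked_candidates(membership, intensities, c) for c in range(n_clusters)}
    position = {c: 0 for c in range(n_clusters)}
    assigned: Dict[int, int] = {}
    taken: Set[int] = set()
    open_clusters = set(range(n_clusters))
    while open_clusters:
        proposals: Dict[int, List[int]] = {}
        for c in sorted(open_clusters):
            # Skip peaks already claimed elsewhere: "second best SIM", and so on.
            while position[c] < len(ranked[c]) and ranked[c][position[c]] in taken:
                position[c] += 1
            if position[c] == len(ranked[c]):
                open_clusters.discard(c)
                continue
            proposals.setdefault(ranked[c][position[c]], []).append(c)
        for peak, clusters in proposals.items():
            # Conflict: the peak goes to the cluster in which its membership is highest.
            winner = max(clusters, key=lambda c: membership[peak][c])
            assigned[winner] = peak
            taken.add(peak)
            open_clusters.discard(winner)
    return assigned


def sim_rows_gcms(membership: Matrix, intensities: Matrix) -> Dict[int, List[float]]:
    """GC-MS reading: the best peak may be shared. Shared peaks are scaled by their
    membership in each cluster (assumed multiplicative); unshared peaks stay raw."""
    n_clusters = len(membership[0]) if membership else 0
    best: Dict[int, int] = {}
    for c in range(n_clusters):
        ranked = ranked_candidates(membership, intensities, c)
        if ranked:
            best[c] = ranked[0]
    usage = Counter(best.values())
    rows: Dict[int, List[float]] = {}
    for c, p in best.items():
        if usage[p] > 1:
            rows[c] = [v * membership[p][c] for v in intensities[p]]
        else:
            rows[c] = list(intensities[p])
    return rows
```

With `membership = [[0.9, 0.6], [0.1, 0.4]]` and `intensities = [[100.0, 100.0], [40.0, 60.0]]`, peak 0 is the top candidate of both clusters. The LC-MS reading gives it to cluster 0 (membership 0.9) and falls back to peak 1 for cluster 1. The GC-MS reading shares peak 0 and scales its intensities to about 90 for cluster 0 and 60 for cluster 1. Resolving conflicts in rounds keeps the outcome independent of the order in which clusters are visited.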