<tool id="msstats" name="MSstats" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@">
    <description>statistical relative protein significance analysis in DDA, SRM and DIA Mass Spectrometry</description>
    <macros>
        <token name="@TOOL_VERSION@">4.0.0</token>
        <token name="@VERSION_SUFFIX@">1</token>
        <xml name="useUniquePeptide">
            <param name="useUniquePeptide" type="boolean" truevalue="TRUE" falsevalue="FALSE" checked="true" label="Remove peptides that are assigned to more than one protein"/>
        </xml>
        <xml name="summaryforMultipleRows">
            <param name="summaryforMultipleRows" type="select" label="Summary for multiple rows" help="When a feature has multiple measurements in a run, use either the highest value (max) or the sum of all values (sum)">
                <option value="max" selected="true">max</option>
                <option value="sum">sum</option>
            </param>
        </xml>
        <xml name="fewMeasurements">
            <param name="fewMeasurements" type="select" label="Features with few measurements" help="Remove the features that have 1 or 2 measurements across runs, or keep all features (keeping them could give an error in fitting the statistical model)">
                <option value="remove" selected="true">remove</option>
                <option value="keep">keep</option>
            </param>
        </xml>
        <xml name="removeProtein_with1Peptide">
            <param name="removeProtein_with1Peptide" type="boolean" truevalue="TRUE" falsevalue="FALSE" checked="false" label="Remove the proteins which have only 1 peptide and charge"/>
        </xml>

    </macros>
    <xrefs>
        <xref type="bio.tools">msstatstmt</xref>
    </xrefs>
    <requirements>
        <requirement type="package" version="@TOOL_VERSION@">bioconductor-msstats</requirement>
    </requirements>
    <command detect_errors="exit_code"><![CDATA[
        cat '$msstats_script' > '$r_script' &&
        Rscript '$msstats_script'
    ]]></command>
    <configfiles>
        <configfile name="msstats_script"><![CDATA[

library('MSstats', warn.conflicts = F, quietly = T, verbose = F)

#if $input.input_src == 'MSstats'

  #if $input.msstats_input.is_of_type('csv')
raw <- read.csv("$input.msstats_input")
  #else
raw <- read.table("$input.msstats_input", sep="\t", header=TRUE)
  #end if

#elif $input.input_src == 'MaxQuant'
\# Read in MaxQuant files
mq_evidence <- read.table("$input.evidence", sep="\t", header=TRUE)

mq_proteinGroups <- read.table("$input.proteinGroups", sep="\t", header=TRUE)

\# Read in annotation including condition and biological replicates per run.
\# Users should make this annotation file. It is not the output from MaxQuant.
  #if $input.annotation.is_of_type('csv')
	annot <- read.csv("$input.annotation", header=TRUE)
  #else
	annot <- read.table("$input.annotation", sep="\t", header=TRUE)
  #end if

raw <- MaxQtoMSstatsFormat(evidence=mq_evidence,
                           proteinGroups=mq_proteinGroups,
                           annotation=annot,
                           proteinID="$input.proteinID",
                           useUniquePeptide=$input.input_options.useUniquePeptide,
                           summaryforMultipleRows=$input.input_options.summaryforMultipleRows,
                           fewMeasurements="$input.input_options.fewMeasurements",
                           removeMpeptides=$input.input_options.removeMpeptides,
                           removeOxidationMpeptides=$input.input_options.removeOxidationMpeptides,
                           removeProtein_with1Peptide=$input.input_options.removeProtein_with1Peptide,
                           use_log_file = TRUE,
                           append = TRUE,
                           log_file_path = "log.txt")

#elif $input.input_src == 'OpenMS'

  #if $input.openms_input.is_of_type('csv')
	input <- read.csv("$input.openms_input", header=TRUE)
  #else
	input <- read.table("$input.openms_input", sep="\t", header=TRUE)
  #end if

   #if $input.annotation:
       #if $input.annotation.is_of_type('csv')
	    annot <- read.csv("$input.annotation", header=TRUE)
       #else
	    annot <- read.table("$input.annotation", sep="\t", header=TRUE)
       #end if
   #end if

    raw <- OpenMStoMSstatsFormat(input,
                             #if $input.annotation:
                             annotation=annot,
                             #end if
                             useUniquePeptide=$input.input_options.useUniquePeptide,
                             summaryforMultipleRows=$input.input_options.summaryforMultipleRows,
                             fewMeasurements="$input.input_options.fewMeasurements",
                             removeProtein_with1Feature=$input.input_options.removeProtein_with1Feature,
                             use_log_file = TRUE,
                             append = TRUE,
                             log_file_path = "log.txt")


#elif $input.input_src == 'OpenSWATH'

  #if $input.openswath_input.is_of_type('csv')
	input <- read.csv("$input.openswath_input", header=TRUE)
  #else
	input <- read.table("$input.openswath_input", sep="\t", header=TRUE)
  #end if
  #if $input.annotation.is_of_type('csv')
	annot <- read.csv("$input.annotation", header=TRUE)
  #else
	annot <- read.table("$input.annotation", sep="\t", header=TRUE)
  #end if

raw <- OpenSWATHtoMSstatsFormat(input,
                                annotation=annot,
                                filter_with_mscore=$input.input_options.filter_with_mscore,
                                mscore_cutoff=$input.input_options.mscore_cutoff,
                                useUniquePeptide=$input.input_options.useUniquePeptide,
                                fewMeasurements="$input.input_options.fewMeasurements",
                                removeProtein_with1Feature=$input.input_options.removeProtein_with1Feature,
                                summaryforMultipleRows=$input.input_options.summaryforMultipleRows,
                                use_log_file = TRUE,
                                append = TRUE,
                                log_file_path = "log.txt")

#elif $input.input_src == 'Skyline'

  #if $input.skyline_input.is_of_type('csv')
	input <- read.csv("$input.skyline_input", header=TRUE)
  #else
	input <- read.table("$input.skyline_input", sep="\t", header=TRUE)
  #end if

  #if $input.annotation:
      #if $input.annotation.is_of_type('csv')
	    annot <- read.csv("$input.annotation", header=TRUE)
      #else
	    annot <- read.table("$input.annotation", sep="\t", header=TRUE)
      #end if
  #end if

raw <- SkylinetoMSstatsFormat(input,
                  	        #if $input.annotation:
				annotation = annot,
				#end if
				removeiRT = $input.input_options.removeiRT,
				filter_with_Qvalue = $input.input_options.filter_with_Qvalue,
				qvalue_cutoff = $input.input_options.qvalue_cutoff,
				useUniquePeptide = $input.input_options.useUniquePeptide,
				fewMeasurements="$input.input_options.fewMeasurements",
				removeOxidationMpeptides = $input.input_options.removeOxidationMpeptides,
				removeProtein_with1Feature = $input.input_options.removeProtein_with1Feature,
				use_log_file = TRUE,
				append = TRUE,
				log_file_path = "log.txt")

#end if

processed_data <- dataProcess(raw,
                          logTrans=$dp_options.logTrans,
                          normalization="$dp_options.norm.normalization",
                          #if $dp_options.norm.normalization == 'globalStandards'
                          nameStandards=c($dp_options.norm.nameStandards),
                          #end if
                          featureSubset="$dp_options.features.featureSubset",
                          #if $dp_options.features.featureSubset == 'topN'
                          n_top_feature=$dp_options.features.n_top_feature,
                          #end if
                          #if $dp_options.features.featureSubset == 'highQuality'
                          remove_uninformative_feature_outlier=$dp_options.features.remove_uninformative_feature_outlier,
                          #end if
                          summaryMethod="$dp_options.summarize.summaryMethod",
                          #if $dp_options.summarize.summaryMethod == 'TMP'
                          MBimpute=$dp_options.summarize.MBimpute,
                          remove50missing=$dp_options.summarize.remove50missing,
                          #end if
                          #if $dp_options.summarize.summaryMethod == 'linear'
                          equalFeatureVar=$dp_options.summarize.equalFeatureVar,
                          #end if
                          #if $dp_options.censoredInt == 'NULL'
                          censoredInt=NULL,
                          #else
                          censoredInt="$dp_options.censoredInt",
                          #end if
                          #if $dp_options.maxQuantileforCensored == ''
                          maxQuantileforCensored = NULL,
                          #else
                          maxQuantileforCensored = $dp_options.maxQuantileforCensored,
                          #end if
                          use_log_file = TRUE,
                          append = TRUE,
                          log_file_path = "log.txt")


#if 'raw_data' in $dp_options.selected_outputs
write.table(raw, "raw.tsv", sep = "\t", quote = F, row.names = F, dec = ".")
#end if

#if 'featurelevel_data' in $dp_options.selected_outputs
write.table(processed_data\$FeatureLevelData, "featurelevelData.tsv", sep = "\t", quote = F, row.names = F, dec = ".")
#end if

#if 'proteinlevel_data' in $dp_options.selected_outputs
write.table(processed_data\$ProteinLevelData, "proteinlevelData.tsv", sep = "\t", quote = F, row.names = F, dec = ".")
#end if

#for $plot_type in $dp_options.out_plots_opt.selected_vis_outputs


    #if $plot_type[-4:] == "Plot"

	dataProcessPlots(data = processed_data,
			     type = '$plot_type',
			     featureName = "$dp_options.out_plots_opt.proc_plots_advanced.featureName",
			     #if $dp_options.out_plots_opt.proc_plots_advanced.ylimUp:
                            ylimUp = $dp_options.out_plots_opt.proc_plots_advanced.ylimUp,
                            #end if
                            #if $dp_options.out_plots_opt.proc_plots_advanced.ylimDown:
                            ylimDown = $dp_options.out_plots_opt.proc_plots_advanced.ylimDown,
                            #end if
                            scale = $dp_options.out_plots_opt.proc_plots_advanced.scale,
                            interval = "$dp_options.out_plots_opt.proc_plots_advanced.interval",
                            x.axis.size = $dp_options.out_plots_opt.proc_plots_advanced.x_axis_size,
                            y.axis.size = $dp_options.out_plots_opt.proc_plots_advanced.y_axis_size,
                            text.size = $dp_options.out_plots_opt.proc_plots_advanced.text_size,
                            text.angle = $dp_options.out_plots_opt.proc_plots_advanced.text_angle,
                            legend.size = $dp_options.out_plots_opt.proc_plots_advanced.legend_size,
                            dot.size.profile = $dp_options.out_plots_opt.proc_plots_advanced.dot_size_profile,
                            dot.size.condition = $dp_options.out_plots_opt.proc_plots_advanced.dot_size_condition,
                            width = $dp_options.out_plots_opt.width,
                            height = $dp_options.out_plots_opt.height,
                            #if $dp_options.out_plots_opt.which_Protein.select == 'list'
                                which.Protein = unlist(read.table("$dp_options.out_plots_opt.which_Protein.protein_list", sep = "\n", header = FALSE), use.names = FALSE),
                            #elif $dp_options.out_plots_opt.which_Protein.select == 'allonly'
                            	#if $plot_type == "QCPlot"
                                    which.Protein = "allonly",
                                #else
                                    which.Protein = "all",
                                #end if
                            #else
                                which.Protein = "all",
                            #end if
                            remove_uninformative_feature_outlier = $dp_options.out_plots_opt.proc_plots_advanced.remove_uninformative_feature_outlier,
                            address="MSStats_only_")
    #end if
#end for

## Quantification
#if 'quant_sample_matrix' in $dp_options.selected_outputs
sampleQuantMatrix <- quantification(processed_data, type="Sample", use_log_file = TRUE, append = TRUE, log_file_path = "log.txt")
write.table(sampleQuantMatrix, "SampleQuantificationMatrix.tsv", sep = "\t", quote = F, row.names = F, dec = ".")
#end if

#if 'quant_sample_long' in $dp_options.selected_outputs
sampleQuantLong <- quantification(processed_data, type="Sample", format="long", use_log_file = TRUE, append = TRUE, log_file_path = "log.txt")
write.table(sampleQuantLong, "SampleQuantificationLong.tsv", sep = "\t", quote = F, row.names = F, dec = ".")
#end if

#if 'quant_group_matrix' in $dp_options.selected_outputs
groupQuantMatrix <- quantification(processed_data, type="Group", use_log_file = TRUE, append = TRUE, log_file_path = "log.txt")
write.table(groupQuantMatrix, "GroupQuantificationMatrix.tsv", sep = "\t", quote = F, row.names = F, dec = ".")
#end if

#if 'quant_group_long' in $dp_options.selected_outputs
groupQuantLong <- quantification(processed_data, type="Group", format="long", use_log_file = TRUE, append = TRUE, log_file_path = "log.txt")
write.table(groupQuantLong, "GroupQuantificationLong.tsv", sep = "\t", quote = F, row.names = F, dec = ".")
#end if

## Group Comparison
#if $group.group_comparison == 'yes'
\# Group Comparison
  #if $group.comparison_matrix.is_of_type('csv')
comp_matrix <- read.csv("$group.comparison_matrix", header=TRUE)
  #else
comp_matrix <- read.table("$group.comparison_matrix", sep="\t", header=TRUE)
  #end if

## the first column contains the comparison names; use them as row names
comparison <- comp_matrix[,-1]
row.names(comparison) <- as.character(comp_matrix[,1])

## the order of the conditions must match the order returned by levels() for GROUP
comparison <- as.matrix(comparison[levels(processed_data\$FeatureLevelData\$GROUP)])

## perform group comparison
comparisons <- groupComparison(contrast.matrix = comparison, data = processed_data, use_log_file = TRUE, append = TRUE, log_file_path = "log.txt")

#if 'fittedmodel' in $group.select_outputs
    capture.output(print(comparisons\$FittedModel), file="ComparisonFittedModel.txt")
#end if


  #if 'comparison_result' in $group.select_outputs
write.table(comparisons\$ComparisonResult, "ComparisonResult.tsv", sep = "\t", quote = F, row.names = F, dec = ".")
  #end if

  #if 'model_qc' in $group.select_outputs
write.table(comparisons\$ModelQC, "ModelQC.tsv", sep = "\t", quote = F, row.names = F, dec = ".")
  #end if

## Visualizations:

#for $plot_type in $group.comparison_plots_opt.select_comparison_plots


    #if $plot_type == "QQPlots" or $plot_type == "ResidualPlots"

	modelBasedQCPlots(data = comparisons,
				type = "$plot_type",
				axis.size = $group.comparison_plots_opt.comparison_vis_options.axis_size,
				dot.size = $group.comparison_plots_opt.comparison_vis_options.dot_size,
				width = $group.comparison_plots_opt.width,
				height = $group.comparison_plots_opt.height,
				#if $group.comparison_plots_opt.which_Protein.select != 'list'
		                which.Protein = "$group.comparison_plots_opt.which_Protein.select",
		                #else
		                which.Protein = unlist(read.table("$group.comparison_plots_opt.which_Protein.protein_list", sep = "\n", header = FALSE), use.names = FALSE),
		                #end if
				address="MSStats_group_")


    #elif $plot_type == "VolcanoPlot" or $plot_type == "Heatmap" or $plot_type == "ComparisonPlot"

	groupComparisonPlots(data = comparisons\$ComparisonResult,
				type = "$plot_type",
				sig = $group.comparison_plots_opt.comparison_vis_options.sig,
				#if $group.comparison_plots_opt.comparison_vis_options.FCcutoff:
				FCcutoff = $group.comparison_plots_opt.comparison_vis_options.FCcutoff,
				#end if
				logBase.pvalue = $group.comparison_plots_opt.comparison_vis_options.logBase_pvalue,
			        #if $group.comparison_plots_opt.comparison_vis_options.ylimUp:
				ylimUp = $group.comparison_plots_opt.comparison_vis_options.ylimUp,
				#end if
				#if $group.comparison_plots_opt.comparison_vis_options.ylimDown:
				ylimDown = $group.comparison_plots_opt.comparison_vis_options.ylimDown,
				#end if
				x.axis.size = $group.comparison_plots_opt.comparison_vis_options.x_axis_size,
				y.axis.size = $group.comparison_plots_opt.comparison_vis_options.y_axis_size,
				dot.size = $group.comparison_plots_opt.comparison_vis_options.dot_size,
				text.size = $group.comparison_plots_opt.comparison_vis_options.text_size,
				text.angle = $group.comparison_plots_opt.comparison_vis_options.text_angle,
				legend.size = $group.comparison_plots_opt.comparison_vis_options.legend_size,
				ProteinName = $group.comparison_plots_opt.comparison_vis_options.ProteinName,
				colorkey = $group.comparison_plots_opt.comparison_vis_options.colorkey,
				numProtein = $group.comparison_plots_opt.comparison_vis_options.numProtein,
				clustering = "$group.comparison_plots_opt.comparison_vis_options.clustering",
				width = $group.comparison_plots_opt.width,
				height =  $group.comparison_plots_opt.height,
				#if $group.comparison_plots_opt.which_Protein.select != 'list'
		                which.Protein = "$group.comparison_plots_opt.which_Protein.select",
		                #else
		                which.Protein = unlist(read.table("$group.comparison_plots_opt.which_Protein.protein_list", sep = "\n", header = FALSE), use.names = FALSE),
		                #end if
		                #if $group.comparison_plots_opt.comparison_vis_options.which_Comparison.select != 'list'
		                which.Comparison = "$group.comparison_plots_opt.comparison_vis_options.which_Comparison.select",
		                #else
		                which.Comparison = unlist(read.table("$group.comparison_plots_opt.comparison_vis_options.which_Comparison.comparison_list", sep = "\n", header = FALSE), use.names = FALSE),
		                #end if
		                address="MSStats_group_")


     #end if
#end for

#end if
        ]]></configfile>
    </configfiles>
    <inputs>
        <conditional name="input">
            <param name="input_src" type="select" label="input source">
                <option value="MSstats">MSstats 10-column format</option>
                <option value="MaxQuant">MaxQuant</option>
                <option value="OpenMS">OpenMS</option>
                <option value="OpenSWATH">OpenSWATH</option>
                <!--option value="DIAUmpire">DIA-Umpire</option-->
                <option value="Skyline">Skyline</option>
            </param>
            <when value="MSstats">
                <param name="msstats_input" type="data" format="tabular,csv" label="MSstats 10-column input"/>
            </when>
            <when value="MaxQuant">
                <param name="evidence" type="data" format="tabular,csv" label="evidence.txt - feature-level data"/>
                <param name="proteinGroups" type="data" format="tabular,csv" label="proteinGroups.txt - protein-level data" help="It needs to match protein group ID. If not selected use Proteins in 'evidence.txt'"/>
                <param name="annotation" type="data" format="tabular,csv" label="annotation file" help="Columns: Raw.file, Condition (the name of the condition is not allowed to start with a number or contain any special characters.), BioReplicate, Run, IsotopeLabelType information"/>

                <param name="proteinID" type="select" label="Select Protein ID in evidence.txt">
                    <option value="Proteins">Protein column</option>
                    <option value="Leading.razor.protein">Leading razor protein column</option>
                </param>
                <section name="input_options" title="MaxQtoMSstatsFormat Options" expanded="false">
                    <expand macro="useUniquePeptide"/>
                    <expand macro="summaryforMultipleRows"/>
                    <expand macro="fewMeasurements"/>
                    <param name="removeMpeptides" type="boolean" truevalue="TRUE" falsevalue="FALSE" checked="false" label="Remove the peptides including 'M' sequence"/>
                    <param name="removeOxidationMpeptides" type="boolean" truevalue="TRUE" falsevalue="FALSE" checked="false" label="Remove the peptides including Oxidized 'M' sequence"/>
                    <expand macro="removeProtein_with1Peptide"/>
                </section>
            </when>
            <when value="OpenMS">
                <param name="openms_input" type="data" format="tabular,csv" label="OpenMS input (e.g. output of MSstatsConverter)"/>
                <param name="annotation" type="data" format="tabular,csv" optional="true" label="If annotation is not yet complete in OpenMS, use annotation with Raw.file, Condition (the name of the condition is not allowed to start with a number or contain any special characters), BioReplicate, and Run information"/>
                <section name="input_options" title="OpenMStoMSstatsFormat Options" expanded="false">
                    <expand macro="useUniquePeptide"/>
                    <expand macro="summaryforMultipleRows"/>
                    <expand macro="fewMeasurements"/>
                    <param name="removeProtein_with1Feature" type="boolean" truevalue="TRUE" falsevalue="FALSE" checked="false" label="Remove the proteins which have only 1 peptide and charge"/>
                </section>
            </when>
            <when value="OpenSWATH">
                <param name="openswath_input" type="data" format="tabular,csv" label="OpenSWATH_input"/>
                <param name="annotation" type="data" format="tabular,csv" label="annotation file"/>
                <section name="input_options" title="OpenSWATHtoMSstatsFormat Options" expanded="false">
                    <param name="filter_with_mscore" type="boolean" truevalue="TRUE" falsevalue="FALSE" checked="true" label="Filter out features with an m_score greater than the cutoff"/>
                    <param name="mscore_cutoff" type="float" value="0.01" min="0" max="1.0" label="m_score cutoff"/>
                    <expand macro="useUniquePeptide"/>
                    <expand macro="fewMeasurements"/>
                    <expand macro="summaryforMultipleRows"/>
                    <param name="removeProtein_with1Feature" type="boolean" truevalue="TRUE" falsevalue="FALSE" checked="false" label="Remove the proteins which have only 1 peptide and charge"/>
                </section>
            </when>
	    <when value="Skyline">
		<param name="skyline_input" type="data" format="tabular,csv" label="Skyline input"/>
	        <param name="annotation" type="data" optional="true" format="tabular,csv" label="annotation file"/>
	        <section name="input_options" title="SkylinetoMSstatsFormat Options" expanded="false">
	            <param name="removeiRT" type="boolean" truevalue="TRUE" falsevalue="FALSE" checked="true" label="Remove iRT" help="Yes (default) will remove the proteins or peptides which are labeled 'iRT' in the 'StandardType' column. No will keep them."/>
	            <param name="filter_with_Qvalue" type="boolean" truevalue="TRUE" falsevalue="FALSE" checked="true" label="Filter with Qvalue" help="Yes (default) will filter out the intensities whose value in the Detection QValue column is greater than qvalue_cutoff. Those intensities will be replaced with zero and will be considered censored missing values for imputation purposes."/>
	            <param name="qvalue_cutoff" type="float" value="0.01" min="0" max="1.0" label="Cutoff for Detection QValue."/>
	            <expand macro="removeProtein_with1Peptide"/>
	            <expand macro="useUniquePeptide"/>
                    <expand macro="fewMeasurements"/>
	            <param name="removeOxidationMpeptides" type="boolean" truevalue="TRUE" falsevalue="FALSE" checked="false" label="Remove Oxidation M peptides" help="Yes will remove the peptides including 'oxidation (M)' in modification."/>
	            <param name="removeProtein_with1Feature" type="boolean" truevalue="TRUE" falsevalue="FALSE" checked="false" label="Remove proteins with 1 feature" help="Yes will remove the proteins which have only 1 peptide and charge."/>
	        </section>
	    </when>
        </conditional>

        <section name="dp_options" title="dataProcess Options" expanded="true">
            <param name="selected_outputs" type="select" display="checkboxes" multiple="true" label="Select outputs">
		<option value="log" selected="true">MSstats log</option>
		<option value="r_script" selected="false">MSstats Rscript</option>
		<option value="raw_data" selected="true">MSstats RawData</option>
		<option value="featurelevel_data" selected="true">MSstats FeatureLevelData</option>
		<option value="proteinlevel_data" selected="false">MSstats ProteinLevelData</option>
		<option value="quant_sample_matrix" selected="false">Sample Quantification Matrix Table</option>
		<option value="quant_sample_long" selected="false">Sample Quantification Long Table</option>
		<option value="quant_group_matrix" selected="true">Group Quantification Matrix Table</option>
		<option value="quant_group_long" selected="false">Group Quantification Long Table</option>
	    </param>
            <param name="logTrans" type="select" label="logarithm transformation of intensities with base 2 or 10." help="Intensities for original intensity between 0 and 1 will be replaced with zero value after normalization.">
                <option value="2" selected="true">2</option>
                <option value="10">10</option>
            </param>
            <conditional name="norm">
                <param name="normalization" type="select" label="Normalization to remove systematic bias between MS runs">
                    <option value="equalizeMedians" selected="true">equalizeMedians - represents constant normalization</option>
                    <option value="quantile">quantile - quantile normalization</option>
                    <option value="globalStandards">globalStandards - normalization with global standards proteins</option>
                    <option value="FALSE">false - no normalization is performed</option>
                </param>
                <when value="equalizeMedians"/>
                <when value="quantile"/>
                <when value="globalStandards">
                    <param name="nameStandards" type="text" value="" label="global standard peptide names" help="Peptide names should be double-quoted and separated by commas">
                        <validator type="empty_field" />
                        <validator type="regex" message="double-quoted names separated by commas"><![CDATA[^".+"(,".+")*$]]></validator>
                    </param>
                </when>
                <when value="FALSE"/>
            </conditional>
            <conditional name="features">
                <param name="featureSubset" type="select" label="Feature Subset">
                    <option value="all" selected="true">Use all features that the data set has</option>
                    <option value="top3">Use the top 3 features which have highest average of log2(intensity) across runs</option>
                    <option value="topN">Use the top N features which have highest average of log2(intensity) across runs</option>
                    <option value="highQuality">High quality: Flag uninformative feature and outliers</option>
                </param>
                <when value="all"/>
                <when value="top3"/>
                <when value="topN">
                    <param name="n_top_feature" type="integer" value="3" min="1" label="The number of top features for Feature Subset"/>
                </when>
                <when value="highQuality">
                    <param name="remove_uninformative_feature_outlier" type="boolean" truevalue="TRUE" falsevalue="FALSE" checked="false" label="Remove features flagged with uninformative feature quality"/>
                </when>
            </conditional>
            <conditional name="summarize">
                <param name="summaryMethod" type="select" label="Summary Method">
                    <option value="TMP" selected="true">TMP - Tukey's median polish</option>
                    <option value="linear">linear - linear mixed model</option>
                </param>
                <when value="TMP">
                    <param name="MBimpute" type="boolean" truevalue="TRUE" falsevalue="FALSE" checked="true" label="Impute Missing Values" help="Yes: imputes censored 'NA' or '0' intensities (depending on the censored intensity option). No: uses the values assigned by the cutoff value for censoring"/>
                    <param name="remove50missing" type="boolean" truevalue="TRUE" falsevalue="FALSE" checked="false" label="Remove runs which have more than 50% missing values"/>
                </when>
                <when value="linear">
                    <param name="equalFeatureVar" type="boolean" truevalue="TRUE" falsevalue="FALSE" checked="true" label="Assume equal variance among intensities from different features" help="Yes: assumes equal variance among intensities from different features. No: does not assume equal variance and accounts for heterogeneous variation between features"/>
                </when>
            </conditional>
            <param name="censoredInt" type="select" label="Censored intensity">
                <help>Processing tools report missing values differently. This option determines which value is considered missing, and whether it is censored or missing at random. Skyline and OpenSWATH input should use '0'. MaxQuant input should use 'NA'.</help>
                <option value="NA" selected="true">NA - Assume that all 'NA's in 'Intensity' column are censored</option>
                <option value="0">0 - Use zero intensities '0' as censored intensity</option>
                <!--option value="NULL">NULL - Assume all NA intensites are randomly missing</option-->
            </param>
            <param name="maxQuantileforCensored" type="float" optional="true" value="0.999" min="0" max="1.0" label="Maximum quantile for deciding censored missing values." help="If you don't want to apply the threshold of noise intensity in your data, remove the value (empty field)"/>


            <section name="out_plots_opt" title="DataProcess Plot Options" expanded="false">
            <param name="selected_vis_outputs" type="select" display="checkboxes" multiple="true" label="Select visualization outputs">
                <option value="QCPlot" selected="false">MSstats QCPlot</option>
                <option value="ProfilePlot" selected="false">MSstats ProfilePlot</option>
                <option value="profile_wsum_plot" selected="false">MSstats ProfilePlot_wSummarization</option>
                <option value="ConditionPlot" selected="false">MSstats ConditionPlot</option>
            </param>
            <conditional name="which_Protein">
                <param name="select" type="select" label="Select protein IDs to draw plots">
                    <option value="all" selected="true">generate all plots for each protein</option>
                    <option value="allonly">Option for QC plot: "allonly" will generate one QC plot with all proteins</option>
                    <option value="list">Protein IDs as tabular input</option>
                </param>
                <when value="all"/>
                <when value="allonly"/>
                <when value="list">
                    <param name="protein_list" type="data" format="tabular" label="List of proteins"/>
                </when>
            </conditional>
            <param name="width" type="integer" min="1" value="8" label="Width of the saved pdf file"/>
            <param name="height" type="integer" min="1" value="5" label="Height of the saved pdf file"/>

                <section name="proc_plots_advanced" title="Advanced visualization parameters" expanded="false">
            	<param name="featureName" type="select" display="radio" label="Feature name for Profile Plot" help="Transition prints the feature legend at transition level; Peptide prints the feature legend at peptide level; NA prints no feature legend.">
                    <option value="Transition" selected="true">Transition</option>
                    <option value="Peptide">Peptide</option>
                    <option value="NA">NA</option>
            	</param>
             	<param name="ylimUp" type="float" optional="true" label="For all three plots, upper limit for y-axis." help="Empty (default) for Profile Plot and QC Plot uses the upper limit as rounded off maximum of log2(intensities) after normalization + 3; for Condition Plot maximum of log ratio + SD or CI. Alternatively, insert specific value of y-axis limit."/>
            	<param name="ylimDown" type="float" optional="true" label="For all three plots, lower limit for y-axis in the log scale" help="Empty (default) for Profile Plot and QCPlot uses 0; for Condition Plot is minimum of log ratio - SD or CI. Alternatively, insert specific value of lower y-axis limit.  "/>

            	<param name="scale" type="boolean" truevalue="TRUE" falsevalue="FALSE" checked="false" label="Scale for Condition Plot" help="No (default) means the condition levels are not scaled on the x-axis according to their actual values (equal spacing on the x-axis). Yes means each condition level is scaled on the x-axis according to its actual value (unequal spacing on the x-axis)."/>
            	<param name="interval" type="select" display="radio" label="Interval for Condition Plot" help="CI (default) uses the 95% confidence interval for the width of the error bars. SD uses the standard deviation for the width of the error bars.">
                    <option value="CI" selected="true">CI - confidence interval</option>
                    <option value="SD">SD - standard deviation</option>
            	</param>
            	<param name="x_axis_size" type="integer" min="1" value="10" label="Size of x-axis labeling for 'Run' in Profile Plot and QC Plot, and 'Condition' in Condition Plot"/>
            	<param name="y_axis_size" type="integer" min="1" value="10" label="Size of y-axis labeling"/>
            	<param name="text_size" type="integer" min="1" value="4" label="Size of labels for feature names in normal QQPlots (drawn separately for each feature) and size of the labels representing each condition at the top of the graph in Profile Plot and QC Plot."/>
            	<param name="text_angle" type="integer" min="0" max="360" value="90" label="Angle of the labels representing each condition at the top of the graph in Profile Plot and QC Plot, or of the x-axis labels in Condition Plot."/>
            	<param name="legend_size" type="integer" min="1" value="7" label="Size of feature names in residual plots and of the feature legend (transition-level or peptide-level) above the graph in Profile Plot."/>
            	<param name="dot_size_profile" type="integer" min="1" value="2" label="Size of dots in Profile plot"/>
            	<param name="dot_size_condition" type="integer" min="1" value="3" label="Size of dots in Condition plot"/>

            	<param name="remove_uninformative_feature_outlier" type="boolean" truevalue="TRUE" falsevalue="FALSE" checked="false" label="Remove uninformative features and outliers in profile plots" help="Only works when the feature subset 'highQuality' was used in the dataProcess options. Yes removes 1) features flagged as feature_quality=Uninformative, which are features with bad quality, and 2) outliers flagged as is_outlier=TRUE from the profile plots. No (default) shows all features and intensities in profile plots."/>
            	</section>
            </section>
        </section>

        <conditional name="group">
            <param name="group_comparison" type="select" label="Compare Groups">
                <option value="no">No</option>
                <option value="yes">Yes</option>
            </param>
            <when value="no"/>
            <when value="yes">
                <param name="comparison_matrix" type="data" format="tabular,csv" label="Comparison Matrix"/>
                <param name="select_outputs" type="select" display="checkboxes" multiple="true" label="Select outputs">
                    <option value="comparison_result" selected="true">MSstats ComparisonResult.tsv</option>
                    <option value="fittedmodel" selected="false">MSstats ComparisonFittedModel.txt</option>
                    <option value="model_qc" selected="false">MSstats ModelQC.tsv</option>
                </param>

		<section name="comparison_plots_opt" title="Comparison Visualization Options" expanded="false">
                <param name="select_comparison_plots" type="select" display="checkboxes" multiple="true" label="Select visualization outputs">
                    <option value="VolcanoPlot" selected="false">MSstats VolcanoPlot</option>
                    <option value="ComparisonPlot" selected="false">MSstats ComparisonPlot</option>
                    <option value="QQPlots" selected="false">MSstats QQPlot</option>
                    <option value="ResidualPlots" selected="false">MSstats ResidualPlot</option>
                    <option value="Heatmap" selected="false">MSstats Heatmap (only possible for at least 2 comparisons)</option>
                </param>
                <param name="width" type="integer" min="1" value="8" label="Width of the saved pdf file"/>
		<param name="height" type="integer" min="1" value="5" label="Height of the saved pdf file"/>
		    <conditional name="which_Protein">
			<param name="select" type="select" label="Select protein IDs to draw plots">
			    <option value="all" selected="true">generate all plots for each protein</option>
			    <option value="list">Protein IDs as tabular input</option>
			</param>
			<when value="all"/>
			<when value="list">
			    <param name="protein_list" type="data" format="tabular" label="List of proteins"/>
			</when>
		    </conditional>

		<section name="comparison_vis_options" title="Advanced visualization parameters">

			<param name="sig" type="float" min="0" max="1" value="0.05" label="FDR cutoff for the adjusted p-values in heatmap and volcano plot" help="Level of significance for comparison plot. 100(1-sig)% confidence interval will be drawn."/>
			<param name="FCcutoff" type="float" optional="true" label="Fold change cutoff for volcano plot or heatmap." help="Empty (default) means no fold change cutoff is applied for significance analysis. A specific value means that fold change cutoff is applied."/>
			<param name="logBase_pvalue" type="select" label="For volcano plot or heatmap, logarithm transformation of adjusted p-value with base 2 or 10">
			    <option value="2">2</option>
			    <option value="10" selected="true">10</option>
			</param>
			<param name="ylimUp" type="float" optional="true" label="For all three plots, upper limit for y-axis." help="Empty (default) for volcano plot/heatmap use maximum of -log2 (adjusted p-value) or -log10 (adjusted p-value), for comparison plot uses maximum of log-fold change + CI. Alternatively, insert specific value of y-axis limit. "/>
			<param name="ylimDown" type="float" optional="true" label="For all three plots, lower limit for y-axis in the log scale" help="Empty (default) for volcano plot/heatmap uses minimum of -log2 (adjusted p-value) or -log10 (adjusted p-value), for comparison plot uses minimum of log-fold change - CI. Alternatively, insert specific value of y-axis limit."/>
			<param name="xlimUp" type="float" optional="true" label="For Volcano plot, the limit for x-axis" help="Empty (default) uses the maximum absolute value of log-fold change, or 3 if that maximum is less than 3. Alternatively, insert specific value of x-axis limit."/>
			<param name="axis_size" type="integer" min="1" value="10" label="Size of axes labels for Residual and QQ Plots"/>
			<param name="x_axis_size" type="integer" min="1" value="10" label="Size of x-axis labeling"/>
			<param name="y_axis_size" type="integer" min="1" value="10" label="Size of y-axis labeling"/>
			<param name="dot_size" type="integer" min="1" value="3" label="Size of dots in residual plots, QQPlots, volcano plot and comparison plot."/>
			<param name="text_size" type="integer" min="1" value="4" label="Size of Protein Name label in the graph for Volcano Plot."/>
			<param name="text_angle" type="integer" min="0" max="360" value="90" label="Angle of the x-axis labels representing each comparison at the bottom of the graph in comparison plot."/>
			<param name="legend_size" type="integer" min="1" value="7" label="Size of legend for color at the bottom of volcano plot. "/>
			<param name="ProteinName" type="boolean" truevalue="TRUE" falsevalue="FALSE" checked="true" label="Display protein names in Volcano Plot." help="Yes (default) means protein names, which are significant, are displayed next to the points. No means no protein names are displayed."/>
			<param name="colorkey" type="boolean" truevalue="TRUE" falsevalue="FALSE" checked="true" label="Show colour key"/>
			<param name="numProtein" type="integer" min="1" value="100" max="180" label="Number of proteins which will be presented in each heatmap."/>
			<param name="clustering" type="select" label="Determines how to order proteins and comparisons. Hierarchical cluster analysis with Ward method (minimum variance) is performed.">
			   	<help>'protein' means that the protein dendrogram is computed and reordered based on protein means (the order of rows is changed). 'comparison' means that the comparison dendrogram is computed and reordered based on comparison means (the order of comparisons is changed). 'both' means to reorder both proteins and comparisons.</help>
				<option value="protein" selected="true">protein</option>
				<option value="comparison">comparison</option>
				<option value="both">both</option>
			</param>
			    <conditional name="which_Comparison">
				<param name="select" type="select" label="Select comparisons to draw plots">
				    <option value="all" selected="true">Generate all plots for each comparison</option>
				    <option value="list">Comparison names as tabular input</option>
				</param>
				<when value="all"/>
				<when value="list">
				    <param name="comparison_list" type="data" format="tabular" label="List of comparisons"/>
				</when>
			    </conditional>
		    </section>
		</section>
            </when>
        </conditional>
    </inputs>
    <outputs>
        <data name="log" format="txt" label="${tool.name} on ${on_string}: log" from_work_dir="log.txt">
            <filter>'log' in dp_options['selected_outputs']</filter>
        </data>
        <data name="r_script" format="txt" label="${tool.name} on ${on_string}: Rscript">
            <filter>'r_script' in dp_options['selected_outputs']</filter>
        </data>
        <data name="raw_data" format="tabular" label="${tool.name} on ${on_string}: RawData" from_work_dir="raw.tsv">
            <filter>'raw_data' in dp_options['selected_outputs']</filter>
        </data>
        <data name="featurelevel_data" format="tabular" label="${tool.name} on ${on_string}: FeatureLevelData" from_work_dir="featurelevelData.tsv">
            <filter>'featurelevel_data' in dp_options['selected_outputs']</filter>
            <!--actions>
                <action name="column_names" type="metadata" default="PROTEIN,PEPTIDE,TRANSITION,FEATURE,LABEL,GROUP_ORIGINAL,SUBJECT_ORIGINAL,RUN,GROUP,SUBJECT,INTENSITY,SUBJECT_NESTED,ABUNDANCE,FRACTION,originalRUN,censored" />
            </actions-->
        </data>
        <data name="proteinlevel_data" format="tabular" label="${tool.name} on ${on_string}: ProteinLevelData" from_work_dir="proteinlevelData.tsv">
            <filter>'proteinlevel_data' in dp_options['selected_outputs']</filter>
            <!--actions>
               <action name="column_names" type="metadata" default="RUN,Protein,LogIntensities,NumMeasuredFeature,MissingPercentage,more50missing,NumImputedFeature,originalRUN,GROUP,GROUP_ORIGINAL,SUBJECT_ORIGINAL,SUBJECT_NESTED,SUBJECT" />
            </actions-->
        </data>
        <data name="QCPlot" format="pdf" label="${tool.name} on ${on_string}: QCPlot" from_work_dir="MSStats_only_QCPlot.pdf">
            <filter>dp_options['out_plots_opt']['selected_vis_outputs'] and 'QCPlot' in dp_options['out_plots_opt']['selected_vis_outputs']</filter>
        </data>
        <data name="ProfilePlot" format="pdf" label="${tool.name} on ${on_string}: Profile Plot" from_work_dir="MSStats_only_ProfilePlot.pdf">
            <filter>dp_options['out_plots_opt']['selected_vis_outputs'] and 'ProfilePlot' in dp_options['out_plots_opt']['selected_vis_outputs']</filter>
        </data>
        <data name="profile_wsum_plot" format="pdf" label="${tool.name} on ${on_string}: Profile Plot with Summarization" from_work_dir="MSStats_only_ProfilePlot_wSummarization.pdf">
            <filter>dp_options['out_plots_opt']['selected_vis_outputs'] and 'profile_wsum_plot' in dp_options['out_plots_opt']['selected_vis_outputs']</filter>
        </data>
        <data name="ConditionPlot" format="pdf" label="${tool.name} on ${on_string}: Condition Plot" from_work_dir="MSStats_only_ConditionPlot.pdf">
            <filter>dp_options['out_plots_opt']['selected_vis_outputs'] and 'ConditionPlot' in dp_options['out_plots_opt']['selected_vis_outputs']</filter>
        </data>
        <data name="quant_sample_matrix" format="tabular" label="${tool.name} on ${on_string}: Sample Quantification Matrix" from_work_dir="SampleQuantificationMatrix.tsv">
            <filter>'quant_sample_matrix' in dp_options['selected_outputs']</filter>
        </data>
        <data name="quant_sample_long" format="tabular" label="${tool.name} on ${on_string}: Sample Quantification Long" from_work_dir="SampleQuantificationLong.tsv">
            <filter>'quant_sample_long' in dp_options['selected_outputs']</filter>
            <!--actions>
                <action name="column_names" type="metadata" default="Protein,Group_Subject,LogIntensity" />
            </actions-->
        </data>
        <data name="quant_group_matrix" format="tabular" label="${tool.name} on ${on_string}: Group Quantification Matrix" from_work_dir="GroupQuantificationMatrix.tsv">
            <filter>'quant_group_matrix' in dp_options['selected_outputs']</filter>
        </data>
        <data name="quant_group_long" format="tabular" label="${tool.name} on ${on_string}: Group Quantification Long" from_work_dir="GroupQuantificationLong.tsv">
            <filter>'quant_group_long' in dp_options['selected_outputs']</filter>
            <!--actions>
                <action name="column_names" type="metadata" default="Protein,Group,LogIntensity" />
            </actions-->
        </data>
        <data name="comparison_result" format="tabular" label="${tool.name} on ${on_string}: Comparison Result" from_work_dir="ComparisonResult.tsv">
            <filter> group['group_comparison'] == 'yes' and 'comparison_result' in group['select_outputs']</filter>
            <!--actions>
                <action name="column_names" type="metadata" default="Protein,Label,log2FC,SE,Tvalue,DF,pvalue,adj.pvalue,issue,MissingPercentage,ImputationPercentage" />
            </actions-->
        </data>
        <data name="fittedmodel" format="txt" label="${tool.name} on ${on_string}: Comparison Fitted Model" from_work_dir="ComparisonFittedModel.txt">
            <filter> group['group_comparison'] == 'yes' and 'fittedmodel' in group['select_outputs']</filter>
        </data>
        <data name="model_qc" format="tabular" label="${tool.name} on ${on_string}: Model QC" from_work_dir="ModelQC.tsv">
            <filter> group['group_comparison'] == 'yes' and 'model_qc' in group['select_outputs']</filter>
            <!--actions>
                <action name="column_names" type="metadata" default="RUN,PROTEIN,ABUNDANCE,NumMeasuredFeature,MissingPercentage,more50missing,NumImputedFeature,originalRUN,GROUP,GROUP_ORIGINAL,SUBJECT_ORIGINAL,SUBJECT_NESTED,SUBJECT,residuals,fitted" />
            </actions-->
        </data>
        <data name="QQPlots" format="pdf" label="${tool.name} on ${on_string}: Model QQ" from_work_dir="MSStats_group_QQPlot.pdf">
            <filter> group['group_comparison'] == 'yes' and group['comparison_plots_opt']['select_comparison_plots'] and 'QQPlots' in group['comparison_plots_opt']['select_comparison_plots']</filter>
        </data>
        <data name="ResidualPlots" format="pdf" label="${tool.name} on ${on_string}: Residual Plot" from_work_dir="MSStats_group_ResidualPlot.pdf">
            <filter> group['group_comparison'] == 'yes' and group['comparison_plots_opt']['select_comparison_plots'] and 'ResidualPlots' in group['comparison_plots_opt']['select_comparison_plots']</filter>
        </data>
        <data name="VolcanoPlot" format="pdf" label="${tool.name} on ${on_string}: Volcano Plot" from_work_dir="MSStats_group_VolcanoPlot.pdf">
            <filter> group['group_comparison'] == 'yes' and group['comparison_plots_opt']['select_comparison_plots'] and 'VolcanoPlot' in group['comparison_plots_opt']['select_comparison_plots']</filter>
        </data>
        <data name="Heatmap" format="pdf" label="${tool.name} on ${on_string}: Heatmap" from_work_dir="MSStats_group_Heatmap.pdf">
            <filter> group['group_comparison'] == 'yes' and group['comparison_plots_opt']['select_comparison_plots'] and 'Heatmap' in group['comparison_plots_opt']['select_comparison_plots']</filter>
        </data>
        <data name="ComparisonPlot" format="pdf" label="${tool.name} on ${on_string}: Comparison Plot" from_work_dir="MSStats_group_ComparisonPlot.pdf">
            <filter> group['group_comparison'] == 'yes' and group['comparison_plots_opt']['select_comparison_plots'] and 'ComparisonPlot' in group['comparison_plots_opt']['select_comparison_plots']</filter>
        </data>
    </outputs>
    <tests>
        <test expect_num_outputs="6">
            <conditional name="input">
                <param name="input_src" value="MSstats"/>
                <param name="msstats_input" ftype="csv" value="msstats_testfile.txt"/>
            </conditional>
            <param name="selected_outputs" value="raw_data,featurelevel_data,quant_sample_matrix,quant_group_long"/>
            <param name="selected_vis_outputs" value="ProfilePlot,profile_wsum_plot"/>
            <output name="featurelevel_data">
                <assert_contents>
                    <has_text text="-.PHSHPALTPEQK_347_NA_347_NA" />
                    <has_n_columns n="15" />
                    <has_n_lines n="2071" />
                </assert_contents>
            </output>
            <output name="quant_sample_matrix">
                <assert_contents>
                    <has_text text="C2_1" />
                    <has_n_columns n="7" />
                    <has_n_lines n="7" />
                </assert_contents>
            </output>
            <output name="quant_group_long">
                <assert_contents>
                    <has_text text="LogIntensity" />
                    <has_n_columns n="3" />
                    <has_n_lines n="37" />
                </assert_contents>
            </output>
            <output name="ProfilePlot" file="MSstats ProfilePlot.pdf" compare="sim_size"/>
            <output name="profile_wsum_plot" file="profile_wsum_plot.pdf" compare="sim_size"/>
        </test>

        <test expect_num_outputs="6">
            <conditional name="input">
                <param name="input_src" value="MSstats"/>
                <param name="msstats_input" ftype="tabular" value="msstats_testfile.tsv"/>
            </conditional>
            <conditional name="group">
            <param name="group_comparison" value="yes"/>
            <param name="comparison_matrix" ftype="csv" value="comparison_matrix.csv"/>
            </conditional>
            <param name="select_outputs" value="model_qc"/>
            <param name="select_comparison_plots" value="ResidualPlots"/>
            <output name="featurelevel_data">
                <assert_contents>
                    <has_text text="D.GPLTGTYR" />
                    <has_n_columns n="15" />
                    <has_n_lines n="2071" />
                </assert_contents>
            </output>
            <output name="model_qc">
                <assert_contents>
                    <has_text text="MissingPercentage" />
                    <has_n_columns n="13" />
                    <has_n_lines n="108" />
                </assert_contents>
            </output>
            <output name="ResidualPlots" file="residual_plot.pdf" compare="sim_size"/>
        </test>

        <test expect_num_outputs="5">
            <conditional name="input">
                <param name="input_src" value="MaxQuant"/>
                <param name="evidence" ftype="tabular" value="test_MQ_evidence.tabular"/>
                <param name="annotation" ftype="tabular" value="test_MQ_annotation.txt"/>
                <param name="proteinGroups" ftype="tabular" value="test_MQ_proteingroups.tabular"/>
            </conditional>
            <param name="selected_outputs" value="featurelevel_data,proteinlevel_data"/>
            <param name="selected_vis_outputs" value="ConditionPlot"/>
            <conditional name="group">
                <param name="group_comparison" value="yes"/>
                <param name="comparison_matrix" ftype="csv" value="test_MQ_group12_comparison_matrix.csv"/>
            </conditional>
            <param name="select_outputs" value="comparison_result"/>
            <param name="select_comparison_plots" value="QQPlots"/>
            <output name="featurelevel_data">
                <assert_contents>
                    <has_text text="SPILVATAVAAR" />
                    <has_n_columns n="15" />
                    <has_n_lines n="61" />
                </assert_contents>
            </output>
            <output name="proteinlevel_data">
                <assert_contents>
                    <has_text text="qx017084rawthermo" />
                    <has_text text="sp|O75340|PDCD6_HUMANProgrammedcelldeathprotein6OS=HomosapiensOX=9606GN=PDCD6PE=1SV=1" />
                    <has_n_columns n="11" />
                    <has_n_lines n="13" />
                </assert_contents>
            </output>
            <output name="comparison_result">
                <assert_contents>
                    <has_text text="r2-r1" />
                    <has_n_columns n="11" />
                    <has_n_lines n="4" />
                </assert_contents>
            </output>
            <output name="ConditionPlot" file="condition_plot.pdf" compare="sim_size"/>
            <output name="QQPlots" file="qq_plot.pdf" compare="sim_size"/>
        </test>

        <test expect_num_outputs="5">
            <conditional name="input">
                <param name="input_src" value="OpenMS"/>
                <param name="openms_input" ftype="tabular" value="openms_input.tabular"/>
            </conditional>
            <param name="selected_outputs" value="featurelevel_data,proteinlevel_data"/>
            <param name="selected_vis_outputs" value="ConditionPlot"/>
            <conditional name="group">
                <param name="group_comparison" value="yes"/>
                <param name="comparison_matrix" ftype="tabular" value="openms_comparisonmatrix.tabular"/>
            </conditional>
            <param name="select_comparison_plots" value="Heatmap"/>
            <output name="featurelevel_data">
                <assert_contents>
                    <has_text text="AAAPGIQLVAGEGFQSPLEDR_2_NA_0" />
                    <has_text text="sp|P09938|RIR2_YEAST" />
                    <has_n_columns n="15" />
                    <has_n_lines n="121" />
                </assert_contents>
            </output>
            <output name="proteinlevel_data">
                <assert_contents>
                    <has_text text="sp|P09457|ATPO_YEAST" />
                    <has_n_columns n="11" />
                    <has_n_lines n="76" />
                </assert_contents>
            </output>
            <output name="ConditionPlot" file="condition_plot_openms.pdf" compare="sim_size"/>
            <output name="Heatmap" file="Heatmap_openms.pdf" compare="sim_size"/>
        </test>

        <test expect_num_outputs="7">
            <conditional name="input">
                <param name="input_src" value="Skyline"/>
                <param name="skyline_input" ftype="csv" value="skyline_input_first100.csv"/>
                <param name="annotation" ftype="csv" value="skyline_annotations.csv"/>
                <param name="removeProtein_with1Peptide" value="TRUE"/>
            </conditional>
            <conditional name="summarize">
                 <param name="MBimpute" value="FALSE"/>
            </conditional>
            <param name="censoredInt" value="NA"/>
            <param name="selected_outputs" value="log,featurelevel_data,quant_sample_long"/>
            <param name="selected_vis_outputs" value="ProfilePlot"/>
            <param name="width" value="10"/>
            <param name="height" value="7"/>
            <param name="featureName" value="Peptide"/>
            <conditional name="group">
                <param name="group_comparison" value="yes"/>
                <param name="comparison_matrix" ftype="tabular" value="comparison_matrix_skyline.tabular"/>
            </conditional>
            <section name="comparison_plots_opt">
		        <param name="select_outputs" value="comparison_result"/>
	        <param name="select_comparison_plots" value="VolcanoPlot,ComparisonPlot"/>
	    	<section name="comparison_vis_options">
	    		<param name="FCcutoff" value="2" />
	    		<conditional name="which_Comparison">
	       	 	<param name="select" value="list"/>
	        		<param name="comparison_list" ftype="tabular" value="comparison_list_skyline.tabular"/>
	    		</conditional>
	    	</section>
            </section>
            <output name="quant_sample_long">
                <assert_contents>
                    <has_text text="P32125" />
                    <has_text text="Condition5_5" />
                    <has_n_columns n="3" />
                    <has_n_lines n="6" />
                </assert_contents>
            </output>
            <output name="log">
                <assert_contents>
                    <has_text text="3-3" />
                    <has_text text="summaryforMultipleRows: sum" />
                    <has_text text="Shared peptides are removed" />
                </assert_contents>
            </output>
             <output name="featurelevel_data">
                <assert_contents>
                    <has_text text="ADVGFLC[+57]NMLER_2" />
                    <has_text text="319070944" />
                    <has_n_columns n="14" />
                    <has_n_lines n="46" />
                </assert_contents>
            </output>
            <output name="comparison_result">
                <assert_contents>
                    <has_text text="c1-c4" />
                    <has_text text="log2FC" />
                    <has_n_columns n="11" />
                    <has_n_lines n="4" />
                </assert_contents>
            </output>
            <output name="ProfilePlot" file="Profile_plot_skyline.pdf" compare="sim_size"/>
            <output name="VolcanoPlot" file="Volcano_plot_skyline.pdf" compare="sim_size"/>
            <output name="ComparisonPlot" file="Comparison_plot_skyline.pdf" compare="sim_size"/>
        </test>

        <test expect_num_outputs="3">
            <conditional name="input">
                <param name="input_src" value="Skyline"/>
                <param name="skyline_input" ftype="csv" value="skyline_input_first100.csv"/>
                <param name="annotation" ftype="csv" value="skyline_annotations.csv"/>
                <param name="removeProtein_with1Peptide" value="TRUE"/>
            </conditional>
            <conditional name="summarize">
                 <param name="MBimpute" value="TRUE"/>
                 <param name="featureSubset" value="highQuality"/>
                 <param name="remove_uninformative_feature_outlier" value="TRUE"/>
            </conditional>
                 <param name="censoredInt" value="0"/>
            <param name="selected_outputs" value="log,featurelevel_data,quant_sample_matrix"/>
            <output name="quant_sample_matrix">
                <assert_contents>
                    <has_text text="P32125" />
                    <has_text text="Condition5_5" />
                    <has_n_columns n="6" />
                    <has_n_lines n="2" />
                </assert_contents>
            </output>
            <output name="log">
                <assert_contents>
                    <has_text text="3-3" />
                    <has_text text="summaryforMultipleRows: sum" />
                    <has_text text="Shared peptides are removed" />
                </assert_contents>
            </output>
             <output name="featurelevel_data">
                <assert_contents>
                    <has_text text="AFAEAMANNSFNADEK_2" />
                    <has_text text="114949068" />
                    <has_n_columns n="15" />
                    <has_n_lines n="46" />
                </assert_contents>
            </output>
        </test>

        <test expect_num_outputs="5">
            <conditional name="input">
                <param name="input_src" value="OpenSWATH"/>
                <param name="openswath_input" ftype="tabular" value="test_swath_input_data.tabular"/>
                <param name="annotation" ftype="tabular" value="test_swath_annotations.tabular"/>
            </conditional>
            <param name="selected_vis_outputs" value="QCPlot"/>
            <output name="featurelevel_data">
                <assert_contents>
                    <has_text text="GETLGLIGFGR" />
                    <has_n_columns n="15" />
                    <has_n_lines n="253" />
                </assert_contents>
            </output>
            <output name="QCPlot" file="QC_plot.pdf" compare="sim_size"/>
        </test>

        <test expect_num_outputs="6">
            <conditional name="input">
                <param name="input_src" value="OpenSWATH"/>
                <param name="openswath_input" ftype="tabular" value="test_swath_input_data.tabular"/>
                <param name="annotation" ftype="tabular" value="test_swath_annotations.tabular"/>
            </conditional>
            <param name="selected_outputs" value="r_script,featurelevel_data,quant_sample_long"/>
            <conditional name="group">
                <param name="group_comparison" value="yes"/>
                <param name="comparison_matrix" ftype="csv" value="test_swath_group12_comparison_matrix.csv"/>
            </conditional>
            <param name="select_outputs" value="comparison_result"/>
            <param name="select_comparison_plots" value="VolcanoPlot,ResidualPlots"/>
            <output name="featurelevel_data">
                <assert_contents>
                    <has_text text="GETLGLIGFGR" />
                    <has_n_columns n="15" />
                    <has_n_lines n="253" />
                </assert_contents>
            </output>
            <output name="quant_sample_long">
                <assert_contents>
                    <has_text text="NPT_96" />
                    <has_n_columns n="3" />
                    <has_n_lines n="31" />
                </assert_contents>
            </output>
            <output name="comparison_result">
                <assert_contents>
                    <has_text text="Q5VYK3" />
                    <has_n_columns n="11" />
                    <has_n_lines n="6" />
                </assert_contents>
            </output>
            <output name="VolcanoPlot" file="volcanoplot.pdf" compare="sim_size"/>
            <output name="ResidualPlots" file="residualplot.pdf" compare="sim_size"/>
        </test>

    </tests>
    <help><![CDATA[
MSstats is an open-source R package for statistical relative quantification of proteins and peptides in global, targeted and data-independent proteomics. `More information on MSstats <http://msstats.org/>`_

The MSstats Galaxy tool (version @TOOL_VERSION@) allows the detection of differentially abundant proteins for label-free MS experiments with complex designs on data derived from open-source proteomics software available in Galaxy (e.g. MaxQuant, OpenMS, OpenSWATH). Processing functionalities such as log transformation, normalization, feature selection, missing value imputation and quantification are available as well.

-----

**Input data**

- Data in tabular or csv format, either in the 10-column MSstats format or the outputs of spectral processing tools such as `MaxQuant <https://cox-labs.github.io/coxdocs/maxquant_instructions.html>`_, `OpenSWATH <http://openswath.org/en/latest/>`_

    - MSstats format: tabular file with 10 columns, either manually curated or obtained from other sources such as the Swath2stats tool, which is implemented in PyProphet export in Galaxy. For manual curation, the header names are fixed but not case sensitive:

        - ProteinName: protein ID or peptide ID for peptide-level modeling and analysis; statistical analysis will be done separately for each unique label in this column
        - PeptideSequence: Amino acid sequence for each peptide. If the peptide sequences should be distinguished based on post-translational modifications, this column can be renamed to PeptideModifiedSequence.
        - PrecursorCharge: charge state of precursor.
        - FragmentIon: e.g. b4, y3, if unknown use a single value for all entries.
        - ProductCharge: charge state of product. If unknown use 0 for all entries.
        - IsotopeLabelType: This column indicates whether this measurement is based on the endogenous peptides (use “L”) or labeled reference peptides (use “H”).
        - Condition: For group comparison experiments, this column indicates groups of interest (such as “Disease” or “Control”). The name of the condition is not allowed to start with a number or contain any special characters. For time-course experiments, this column indicates time points (such as “T1”, “T2”, etc). If the experimental design contains both distinct groups of subjects and multiple time points per subject, this column should indicate a combination of these values (such as “Disease_T1”, “Disease_T2”, “Control_T1”, “Control_T2”, etc.).
        - BioReplicate: This column should contain a unique identifier for each biological replicate in the experiment. For example, in a clinical proteomic investigation this should be a unique patient id. Patients from distinct groups should have distinct ids. MSstats does not require the presence of technical replicates in the experiment. If technical replicates are present, all samples or runs from the same biological replicate should have the same id. MSstats automatically detects the presence of technical replicates and accounts for them in the model-based analysis.
        - Run: This column contains the identifier of a mass spectrometry run. Each mass spectrometry run should have a unique identifier, regardless of the origin of the biological sample. In SRM experiments, if all the transitions of a biological or a technical replicate are split into multiple “methods” due to the technical limitations, each method should have a separate identifier. When processed by Skyline, distinct values of runs correspond to distinct input file names. It is possible to use the actual input file names as values in the column Run.
        - Intensity: This column should contain the quantified signal of a feature in a run without any transformation (in particular, no logarithm transform). The signals can be quantified as the peak height or the peak area under the curve. Any other quantitative representation of abundance can also be used.
        - Example file header:
          ::

           proteinname    peptidesequence  precursorcharge  fragmention   productcharge
             P02768          DLGEENFK            3               y7             0
             P02768          DLGEENFK            3               y8             0
             P02768         ETYGEMADCCAK         2               b3             0
             P02768         ETYGEMADCCAK         2               b4             0
              ...              ...              ...              ...           ...

                 isotopelabeltype    condition     bioreplicate    run    intensity
                       L              disease          ReplA        1      4298.12
                       H              disease          ReplA        1      1974.59
                       L              disease          ReplA        1      7183.22
                       H              disease          ReplA        1      8467.58
                      ...               ...             ...        ...      ...

    - MaxQuant format: evidence.txt, proteinGroups.txt; plus externally generated annotation file
    - OpenSWATH format: pyprophet export file;  plus externally generated annotation file

- Annotations as tabular file are needed for all input options except MSstats format

    - 4 columns with exactly these headers: Raw.file, Condition, BioReplicate, Run; additional 5th column only for MaxQuant: IsotopeLabelType
    - Example file header:

          ::

           Raw.file         Condition      BioReplicate    Run   IsotopeLabelType
             **              disease           ReplA         1         L
             **              disease           ReplA         2         L
             **              disease           ReplB         3         L
             **              disease           ReplB         4         L
             ...               ...              ...         ...      ...


        - Raw.file:

            - OpenSWATH: The file name must match exactly how it is written in the OpenSWATH output (e.g. "in/AA12_mzML.mzML")
            - MaxQuant: The file name must match exactly how it is written in the evidence.txt "Raw file" column (e.g. "file1.raw.thermo")
        - Condition: The name of the condition is not allowed to start with a number or contain any special characters
        - All other columns: see description above for MSstats format columns

- Comparison matrix as tabular file

    - 1st column: name of comparison
    - Additionally one column for each condition that is present in the tabular file. Use 1 and -1 to indicate the conditions to compare and 0 for conditions that are not compared. Multiple groups can be combined by using 0.5.
    - The first row contains the names of the groups. They must exactly match the condition names used in the annotation file, and every condition must be present even if it is not used for any comparison, such as G4 in the example below. The order of the condition columns is irrelevant.
    - Each additional row represents one comparison
    - Example for a two group comparison

       ::

               names     groupA  groupB
          groupA-groupB    1      -1


    - Example for an experiment with 5 groups and 4 different comparisons

       ::

          names    G1   G2   G3   G4   G5
          G2-G1    -1    1    0    0    0
          G3-G5     0    0    1    0   -1
          G5-G3     0    0   -1    0    1
        G1+G2-G5    0.5  0.5  0    0   -1
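
The comparison matrix layout described above can also be generated programmatically. The following is a minimal sketch, not part of the tool itself: the helper names ``contrast_row`` and ``write_comparison_matrix`` are hypothetical, and the tab delimiter is an assumption that should be adapted to the file type you upload.

```python
import csv
import io


def contrast_row(name, conditions, weights):
    """Return one comparison row: the comparison name, then one weight per condition.

    Conditions that are absent from `weights` get 0 (not part of the comparison).
    """
    unknown = set(weights) - set(conditions)
    if unknown:
        raise ValueError(f"unknown condition(s): {sorted(unknown)}")
    return [name] + [weights.get(c, 0) for c in conditions]


def write_comparison_matrix(conditions, comparisons, delimiter="\t"):
    """Serialize the matrix as delimited text with 'names' as the first header."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    # Header row: every condition must be listed, even if it is never compared.
    writer.writerow(["names"] + list(conditions))
    for name, weights in comparisons.items():
        writer.writerow(contrast_row(name, conditions, weights))
    return buf.getvalue()


conditions = ["G1", "G2", "G3", "G4", "G5"]
comparisons = {
    "G2-G1": {"G2": 1, "G1": -1},
    "G1+G2-G5": {"G1": 0.5, "G2": 0.5, "G5": -1},
}
print(write_comparison_matrix(conditions, comparisons))
```

Listing every condition once and filling unused ones with 0 keeps the header consistent with the annotation file, which is the requirement stated above.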

**Options**

- Data conversion from MaxQuant and OpenSWATH to MSstats format:

    - MaxQuant input: Contaminants, reverse hits and proteins only identified by site are automatically removed during conversion

- Data processing options:

    - Log transformation: log2 or log10 transformation of intensities
    - Normalization of MS runs: If there are multiple fractionations or injections for one sample, normalization is performed by each fractionation or different m/z range from multiple injections.

        - equalizeMedians: The default option for normalization is equalizeMedians, where all intensities in a run are shifted by a constant to equalize the median of intensities across runs for label-free experiments. This normalization method is appropriate when we can assume that the majority of proteins do not change across runs. Be cautious when using the equalizeMedians option for a label-free DDA dataset with only a small number of proteins. For label-based experiments, equalizeMedians equalizes the median of reference intensities across runs and is generally appropriate even for a dataset with a small number of proteins.
        - globalStandards: If you have a spiked-in standard, you may set this option and define the standard with the nameStandards option.
        - quantile: The distribution of all the intensities in each run will become the same across runs for label-free experiments. For label-based experiments, the distribution of all the reference intensities will become the same across runs and all the endogenous intensities are shifted by a constant corresponding to the reference intensities.
        - FALSE: No normalization is performed. If you had your own normalization before MSstats use this option.

    - Feature selection

        - all: Use all features in the dataset.
        - top3: Use top 3 features which have highest average of log(intensity) across runs.
        - topN: Use top N (specify number) features which have highest average of log(intensity) across runs.
        - highQuality: Detect and flag uninformative features (as Uninformative in the feature_quality column) and outliers (as TRUE in the is_outlier column). This uninformative content may be excluded from run-level summarization by setting the remove features flagged with uninformative feature quality option to TRUE.

    - Summarizing intensities per MS run

        - TMP: Tukey’s median polish.  Robust parameter estimation method with median across rows and columns. Prerequisite for missing value imputation.
        - linear: Linear model (lm function). Average-based summarization.

            - Account for heterogeneous variation among intensities from different features: Yes assumes equal variance among intensities from features. No means that equal variance among intensities from features cannot be assumed, so heterogeneous variation from different features is accounted for.

    - Missing value imputation:

        - Impute Missing Values: Only possible for Summarization Method TMP. Censored missing values will be determined and imputed by Accelerated Failure Time model.

        - Remove runs which have more than 50% missing values: Yes or no.
        - Censored Intensity: The processing tools report missing values differently. This option determines which value should be considered as missing, and further whether it is censored or missing at random

            - NA - It assumes that all NAs in Intensity column are censored.
            - 0 - It assumes that all values between 0 and 1 in Intensity column are censored. If there are NAs in Intensity with this option, NAs will be considered as random missing.
            - Skyline and OpenSWATH input should use '0'. MaxQuant input should use 'NA'

- Group comparison: automatic detection of differentially abundant proteins between two conditions; the conditions have to be specified with the 'comparison matrix'
- Quantification per sample or group: choose the corresponding output option

    - Sample: relative protein abundance in each biological replicate. If there are technical replicates for biological replicates, sample quantification will be the median among technical replicates. If there is no technical replicate for biological replicate (sample), sample quantification will be the same as run-level summarization.
    - Group: relative protein abundance in each condition, summarized over the biological replicates (median among sample quantification). In presence of completely missing values in a condition, the estimates will be zero
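
To illustrate the equalizeMedians idea described under the normalization options, here is a minimal sketch on log2 intensities. It is not the MSstats implementation: the function name ``equalize_medians`` is hypothetical, and using the median of the per-run medians as the common target is an assumption of this sketch.

```python
import math
from statistics import median


def equalize_medians(log_intensities_by_run):
    """Shift each run by a constant so that all runs share the same median.

    `log_intensities_by_run` maps a run identifier to a list of log2 intensities.
    The common target (median of the per-run medians) is an assumption.
    """
    run_medians = {run: median(vals) for run, vals in log_intensities_by_run.items()}
    target = median(run_medians.values())
    return {
        run: [v - run_medians[run] + target for v in vals]
        for run, vals in log_intensities_by_run.items()
    }


# Hypothetical raw intensities for two runs, log2-transformed before normalization.
runs = {
    "run1": [math.log2(x) for x in (1024, 2048, 4096)],
    "run2": [math.log2(x) for x in (2048, 4096, 8192)],
}
normalized = equalize_medians(runs)
for run, values in normalized.items():
    print(run, values, "median:", median(values))
```

Because each run is shifted by a single constant, differences between features within a run are preserved; only the run-to-run offset is removed.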


**Output options**

- Different outputs available. Especially for studies with many proteins, it is suggested to select only the necessary pdf outputs as many of them generate one plot per protein.

    - MSstats log - check log file for warnings and information on the analysis steps (txt)
    - MSstats Rscript - can be used to re-run analysis outside Galaxy or to inspect the executed code (txt)
    - MSstats RawData - raw files combined into MSstats format (tabular)
    - MSstats FeatureLevelData - transformed, normalized, imputed intensities (tabular)

        - Intensity column:  includes original intensities values
        - Abundance column: contains the log2 transformed and normalized intensities and will be used for run-level summarization
        - Censored column: indicates whether a value is censored missing or not, based on the censored Intensity and maximum quantile for deciding censored missing values options. Abundances with TRUE in the censored column are considered censored missing and are imputed when Missing value imputation is set to Yes.

    - MSstats ProteinLevelData - run and protein level summarized data (tabular)

        - LogIntensities: log intensity summarized per run and protein, they will be used for the group comparison and summarized profile plot
        - NumMeasuredFeature: shows how many features were used for summarization of the corresponding run and protein
        - MissingPercentage: percentage of random and censored missing values in the corresponding run and protein out of the total number of features in the corresponding protein.
        - more50missing: whether MissingPercentage is greater than 50% or not
        - NumImputedFeatures: how many features were imputed in the corresponding run and protein

    - MSstats QCPlot - log2 intensity boxplot for all proteins and run on first page, followed by one boxplot per protein (pdf)
    - MSstats ProfilePlot - log2 intensity profiles one plot per protein and run (pdf)

        - The profile plot helps identify potential sources of variation (both variation of interest and nuisance variation) for each protein: it shows individual measurements for each peptide (peptide for DDA, transition for SRM or DIA) across runs, grouped per condition. Each peptide has a different color/type layout. Disconnected lines show that there are missing values (NA).

    - MSstats ProfilePlot_wSummarization  - log2 intensity profiles one plot per protein and run with run summarization (pdf)

        - Run-level summarized data per protein. The same peptides (or transition) in the first plot are presented in grey, with the summarized values overlaid in red.

    - MSstats ConditionPlot - log2 intensity range for each protein and condition (pdf)

        - Visualizes potential systematic differences in protein intensities between conditions. Dots indicate the mean of log2 intensities for each condition, error bars indicate the 95% confidence interval for each condition. The intervals are for descriptive purposes only.

    - Sample Quantification Matrix/Long Table - relative protein abundance in each biological replicate in matrix format (rows are proteins, and columns are combinations of biological replicate and group, filled with LogIntensities) or long format (rows correspond to relative protein abundances, and columns are Protein, Group, BioReplicate, LogIntensities) (tabular)

        - If there are technical replicates for biological replicates, sample quantification will be the median among technical replicates. If there is no technical replicate for biological replicate (sample), sample quantification will be the same as run-level summarization. In presence of completely missing values in a biological replicate, the estimates will be zero.

    - Group Quantification Matrix/Long Table - relative protein abundance in each condition in matrix format (rows are proteins, and columns are groups) or long format (rows correspond to relative protein abundances, and columns are Protein, Group and LogIntensities) (tabular)

        - Outputs the estimates of relative protein abundance in each condition, summarized over the biological replicates (median among sample quantification). In presence of completely missing values in a condition, the estimates will be zero.

    - MSstats ComparisonFittedModel (txt)
    - MSstats ComparisonResult - summary of statistical results per protein and comparison (tabular)

        - Label: name of the comparison (e.g. condition1 - condition2)
        - log2FC: log2 fold change for the given comparison name, e.g. condition1-condition2: positive values mean more abundant in condition1, negative values mean more abundant in condition2
        - SE: standard error of the log2 fold change
        - Tvalue: test statistic of the Student's t-test
        - DF: degrees of freedom of the Student's t-test
        - pvalue: raw p-values
        - adj. pvalue: adjusted p-values among all the proteins in the specific comparison
        - issue: shows if there is any issue for inference in the corresponding protein and comparison, for example OneConditionMissing or CompleteMissing. If one of the conditions for a comparison is completely missing, it is flagged as OneConditionMissing with adj.pvalue=0 and log2FC=Inf or -Inf, even though pvalue=NA. For example, if you want to compare ‘condition1-condition2’ but condition2 is completely missing, then log2FC=Inf and adj.pvalue=0; SE, Tvalue and pvalue will be NA. If you want to compare ‘condition1-condition2’ but condition1 is completely missing, then log2FC=-Inf and adj.pvalue=0. Please be careful when using this log2FC and adj.pvalue.

    - MSstats ModelQC - summary statistics per run and protein (tabular)

    - MSstats QQPlot - one QQplot per protein (pdf)

        - Normal quantile-quantile plots for each protein, taking as input the results of model fitting and testing in groupComparison. Only large deviations of transition intensities from the straight line are problematic and indicate that the assumption of the normal distribution of the measurement errors may not hold.

    - MSstats ResidualPlot - one residual plot per protein (pdf)

        - Residual plot shows variance of the residuals that is associated with the mean feature intensity. Any specific pattern, such as increasing or decreasing by predicted abundance, is problematic and indicates that the assumption of constant variance of the measurement error may not hold.

    - MSstats VolcanoPlot - one volcano plot per comparison (pdf)

        - Visualizes the outcome of one comparison between conditions for all the proteins, and combines the information on statistical and practical significance. The y-axis displays the FDR-adjusted p-values on the negative log10 scale, representing statistical significance. The horizontal dashed line shows the FDR cutoff. The points above the FDR cutoff line are statistically significant proteins that are differentially abundant across conditions. These points are colored in red and blue for upregulated and downregulated proteins, respectively. The x-axis is the model-based estimate of fold change on log scale and represents practical significance. It is possible to specify a practical significance cutoff based on the estimate of fold change in addition to the statistical significance cutoff. If the fold change cutoff is specified, the points above the horizontal cutoff line but within the vertical cutoff lines will be considered as not differentially abundant (and will be colored in black).

    - MSstats Heatmap - needs at least 2 comparisons, one heatmap for all proteins and comparisons (pdf)

        - Illustrates the patterns of up- and down-regulation of proteins in several comparisons. Columns in the heatmaps are comparison of conditions assigned in contrast matrix, and rows are proteins. The heatmaps display signed FDR-adjusted p-values of the tests, colored in red/blue for significantly up-/down-regulated proteins, while taking into account the specified FDR cutoff and the additional optional fold change cutoff. Brighter colors indicate stronger evidence in favor of differential abundance. Black color represents proteins that are not significantly differentially abundant.

    - MSstats ComparisonPlot - log2 intensity range for each protein and comparison (pdf)

        - Illustrates model-based estimates of log-fold changes, and the associated uncertainty, in several comparisons of conditions for one protein. X-axis is the comparison of interest. Y-axis is the log fold change. The dots are the model-based estimates of log-fold change, and the error bars are the model-based 95% confidence intervals. For simplicity, the confidence intervals are adjusted for multiple comparisons within protein only, using the Bonferroni approach. For proteins with N comparisons, the individual confidence intervals are at the level of 1-sig/N.
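
The volcano plot coloring and the Bonferroni-adjusted confidence level described above can be reproduced on the ComparisonResult table with a few lines of code. This is a minimal sketch under stated assumptions: the function names ``classify`` and ``bonferroni_level`` are hypothetical, the default FDR cutoff of 0.05 and the handling of values exactly at a cutoff are assumptions, and the inputs are the log2FC and adj. pvalue columns.

```python
import math


def classify(log2fc, adj_pvalue, fdr_cutoff=0.05, fold_change_cutoff=None):
    """Label one protein in one comparison as 'up', 'down' or 'not significant'."""
    # Missing or non-significant adjusted p-values are never called differential.
    if adj_pvalue is None or math.isnan(adj_pvalue) or adj_pvalue >= fdr_cutoff:
        return "not significant"
    # Optional practical significance: require |log2FC| of at least log2(cutoff).
    if fold_change_cutoff is not None and abs(log2fc) < math.log2(fold_change_cutoff):
        return "not significant"
    return "up" if log2fc > 0 else "down"


def bonferroni_level(sig, n_comparisons):
    """Confidence level of each interval when N comparisons share level 1 - sig."""
    return 1 - sig / n_comparisons


# Hypothetical (log2FC, adj. pvalue) pairs for illustration only.
rows = [(1.5, 0.01), (-2.0, 0.001), (0.3, 0.01), (3.0, 0.2)]
for log2fc, adj_p in rows:
    print(log2fc, adj_p, classify(log2fc, adj_p, fold_change_cutoff=1.5))
print(bonferroni_level(0.05, 5))
```

Rows flagged with OneConditionMissing (log2FC of Inf or -Inf and adj. pvalue of 0) would pass both cutoffs in this sketch, so they should be reviewed separately, as advised for the issue column.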

For additional help please visit the `MSstats documentation <http://msstats.org/msstats-2/>`_


    ]]></help>
    <citations>
        <citation type="doi">10.1093/bioinformatics/btu305</citation>
        <citation type="doi">10.1021/acs.jproteome.2c00051</citation>
    </citations>
</tool>
author	galaxyp
date	Tue, 12 Mar 2024 11:46:51 +0000
parents	b7034eff0db1
children