Mercurial > repos > malex > secimtools

<tool id="secimtools_partial_least_squares" name="Partial Least Squares Discriminant Analysis (PLS-DA)" version="@WRAPPER_VERSION@">
    <description></description>
    <macros>
        <import>macros.xml</import>
    </macros>
    <expand macro="requirements" />
    <command detect_errors="exit_code"><![CDATA[
partial_least_squares.py
--input $input
--design $design
--ID $uniqID
--group $group
--toCompare "$toCompare"
--cross_validation $cross_validation
--nComp $nComp
--outScores $outScores
--outWeights $outWeights
--outClassification $outClassification
--outClassificationAccuracy $outClassificationAccuracy
--figure $figures
    ]]></command>
    <inputs>
        <param name="input" type="data" format="tabular" label="Wide Dataset" help="Input your tab-separated wide format dataset. If file is not tab separated see TIP below."/>
        <param name="design" type="data" format="tabular" label="Design File" help="Input your design file (tab-separated). Note you need a 'sampleID' column. If not tab separated see TIP below."/>
        <param name="uniqID" type="text" size="30" value="" label="Unique Feature ID" help="Name of the column in your wide dataset that has unique identifiers."/>
        <param name="group" type="text" size="30" label="Group/Treatment" help="Name of the column in your design file that contains group classifications."/>
        <param name="toCompare" type="text" size="30" label="Names of the Groups to Compare" help="Names of the two groups to compare. The user should insure that group names do not contain commas. The separator for the two groups should only include commas (no spaces)."/>
        <param name="cross_validation" type="select" size="30" display="radio" value="double" label="Cross-Validation Options">
            <option value="none">None</option>
            <option value="single">Single</option>
            <option value="double">Double</option>
        </param>
        <param name="nComp" type="text" size="30" value="2" label="Number of Components" help="Number of components for the analysis to use (default = 2). This field is used only when the cross validation field is set to none."/>
    </inputs>
    <outputs>
        <data format="tabular" name="outScores" label="${tool.name} on ${on_string}: Scores"/>
        <data format="tabular" name="outWeights" label="${tool.name} on ${on_string}: Weights"/>
        <data format="tabular" name="outClassification" label="${tool.name} on ${on_string}: Classification of Samples"/>
        <data format='tabular' name="outClassificationAccuracy" label="${tool.name} on ${on_string}: Classification Accuracy of Samples"/>
        <data format="pdf" name="figures" label="${tool.name} on ${on_string}: Scatter Plots"/>
    </outputs>
    <tests>
        <test>
            <param name="input"             value="ST000006_data.tsv"/>
            <param name="design"            value="ST000006_design_group_name_underscore.tsv"/>
            <param name="uniqID"            value="Retention_Index" />
            <param name="group"             value="White_wine_type_and_source" />
            <param name="toCompare"         value="Chardonnay_ Napa_ CA 2003,Riesling_ CA 2004" />
            <param name="cross_validation"  value="none"/>
            <param name="nComp"             value="2"/>
            <output name="outScores"                 file="ST000006_partial_least_squares_none_scores.tsv" />
            <output name="outWeights"                file="ST000006_partial_least_squares_none_weights.tsv" />
            <output name="outClassification"         file="ST000006_partial_least_squares_none_classification.tsv" />
            <output name="outClassificationAccuracy" file="ST000006_partial_least_squares_none_classification_accuracy.tsv" />
            <output name="figures"                   file="ST000006_partial_least_squares_none_figure.pdf" compare="sim_size" delta="10000"/>
        </test>
    </tests>
    <help><![CDATA[

@TIP_AND_WARNING@

**Tool Description**

The tool performs partial least square discriminant analysis (PLS-DA) for two treatment groups selected by the user.

**NOTE: A minimum of 100 samples is required by the tool for single or double cross validation**

The subspace dimension defines the number of components that will be used to describe the variability within the data.
The user can specify subspace dimension in the range of two to the sample number.
The user has the option to specify the dimension of the subspace directly (Default =2) or to perform single or double cross-validation to determine the dimension of the subspace.

For single and double cross-validation: the data set is split differently when the model fit is performed.

For double cross-validation: the data set is split into pieces and the model fit is performed on one piece using cross-validation and evaluated on the other pieces.

For single cross-validation: the same data are used to fit the model and to evaluate the model using three-fold cross validation.

More details can be found in:

Geladi, Paul and Bruce R. Kowalski. "Partial least-squares regression: a tutorial." Analytica chimica acta 185 (1986): 1-17.


--------------------------------------------------------------------------------

**Note**

- This tool currently treats all variables as continuous numeric
  variables. Running the tool on categorical variables may result in
  incorrect results.
- Rows containing non-numeric (or missing) data in any
  of the chosen columns will be skipped from the analysis.

--------------------------------------------------------------------------------

**Input**

    - Two input datasets are required.

@WIDE@

**NOTE:** The sample IDs must match the sample IDs in the Design File (below).
Extra columns will automatically be ignored.


@METADATA@

@UNIQID@

@GROUP@


**Names of the Groups to Compare**

    - Comma separated names of the two groups in your Group/Treatment column that you want to compare. The user should ensure that group names do not contain commas. The separator for the two groups should only include commas (no spaces).

**Cross-Validation Options**

    - The choice of cross-validation options available for the user. None corresponds to no cross-validation when the user specifies the number of components manually.

**Number of Components**

    - The parameter is used only when the "None" cross-validation option is selected. If the field is left blank, the number of components is set to the default value (2).

--------------------------------------------------------------------------------

**Output**


Three different files are generated:

(1) a TSV file containing the scores produced by the model for each sample

(2) a TSV file containing the weights produced by the model for each feature.

(3) a TSV file containing the classification produced by the model for each sample.

(4) a TSV file containing the algorithm classification accuracy (by percentage).

(5) a PDF file containing the 2D plots for all pairwise comparisons of components between the two treatment groups.

**NOTE:** Regardless how many components are selected for the algorithm, pairwise 2D plots are produced for the pairs of components.
Increasing the number of components will increase the number of plots produced.

    ]]></help>
    <expand macro="citations"/>
</tool>
author	malex
date	Thu, 10 Jun 2021 15:41:17 +0000
parents	2e7d47c0b027
children