view linear_discriminant_analysis.xml @ 2:caba07f41453 draft default tip

"planemo upload for repository https://github.com/secimTools/SECIMTools/tree/main/galaxy commit 498abad641099412df56f04ff6e144e4193bbc34-dirty"
author malex
date Thu, 10 Jun 2021 15:41:17 +0000
parents 2e7d47c0b027

<tool id="secimtools_linear_discriminant_analysis" name="Linear Discriminant Analysis (LDA)" version="@WRAPPER_VERSION@">
    <description>.</description>
    <macros>
        <import>macros.xml</import>
    </macros>
    <expand macro="requirements" />
    <command detect_errors="exit_code"><![CDATA[
linear_discriminant_analysis.py
--input $input
--design $design
--ID $uniqID
--group $group
--cross_validation $cross_validation
--outClassification $outClassification
--outClassificationAccuracy $outClassificationAccuracy
--nComponents $nComponents
--out $out
--figure $figure
    ]]></command>
    <inputs>
        <param name="input" type="data" format="tabular" label="Wide Dataset" help="Input your tab-separated wide format dataset. If file is not tab separated see TIP below."/>
        <param name="design" type="data" format="tabular" label="Design File" help="Input your design file (tab-separated). Note you need a 'sampleID' column. If not tab separated see TIP below."/>
        <param name="uniqID" type="text" size="30" value="" label="Unique Feature ID" help="Name of the column in your wide dataset that has unique identifiers."/>
        <param name="group" type="text" size="30" value="" label="Group/Treatment" help="Name of the column in your design file that contains group classifications."/>
        <param name="cross_validation" type="select" size="30" display="radio" value="double" label="Cross-Validation Choice - NOTE: a minimum of 100 samples is required for single or double cross-validation">
            <option value="none">None</option>
            <option value="single">Single</option>
            <option value="double">Double</option>
        </param>
        <param name="nComponents" type="text" size="30" value="2" label="Number of Components" help="Enter the number of components to use in the analysis. This value must be less than the number of groups and is used only when Cross-Validation Choice is set to 'None'."/>
    </inputs>
    <outputs>
        <data format="tabular" name="out" label="${tool.name} on ${on_string}: Components"/>
        <data format="tabular" name="outClassification" label="${tool.name} on ${on_string}: Classification of Samples"/>
        <data format="tabular" name="outClassificationAccuracy" label="${tool.name} on ${on_string}: Classification Accuracy of Samples"/>
        <data format="pdf" name="figure" label="${tool.name} on ${on_string}: Scatter Plots"/>
    </outputs>
    <tests>
        <test>
            <param name="input"            value="ST000006_data.tsv"/>
            <param name="design"           value="ST000006_design.tsv"/>
            <param name="uniqID"           value="Retention_Index" />
            <param name="group"            value="White_wine_type_and_source" />
            <param name="cross_validation" value="none"/>
            <param name="nComponents"      value="2"/>
            <output name="out"                       file="ST000006_linear_discriminant_analysis_none_scores.tsv" />
            <output name="outClassification"         file="ST000006_linear_discriminant_analysis_none_classification.tsv" />
            <output name="outClassificationAccuracy" file="ST000006_linear_discriminant_analysis_none_classification_accuracy.tsv" />
            <output name="figure"                    file="ST000006_linear_discriminant_analysis_none_figure.pdf" compare="sim_size" delta="10000"/>
        </test>
    </tests>
    <help><![CDATA[

@TIP_AND_WARNING@

**Tool Description**


The tool performs linear discriminant analysis (LDA) on the data.

**NOTE: A minimum of 100 samples is required by the tool for single or double cross-validation.**

LDA is a supervised method that projects the data onto a linear subspace chosen to maximize the separation between samples in different groups and minimize the separation between samples within the same group. The dimension of the subspace defines the number of components used to describe the variability within the data.
By construction, the subspace dimension must be less than the number of treatment groups. The user can either specify the dimension of the subspace directly (default = 2) or let single or double cross-validation determine it. Both cross-validation options split the dataset when the model is fit. For double cross-validation, the dataset is split into pieces; the model is fit on one piece using cross-validation and evaluated on the other pieces. For single cross-validation, the same data are used to both fit and evaluate the model using three-fold cross-validation.
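
As a rough illustration of the component limit and the three-fold split described above, here is a minimal Python sketch using only the standard library. The function names ``max_lda_components`` and ``three_fold_indices`` are hypothetical and not part of this tool. The extra bound by the number of features is a general property of LDA rather than something stated here, and the tool's own splitting strategy may differ (for example, it may shuffle or stratify).

```python
def max_lda_components(group_labels, n_features):
    """Upper bound on LDA components: min(n_features, n_groups - 1)."""
    n_groups = len(set(group_labels))
    return min(n_features, n_groups - 1)


def three_fold_indices(n_samples):
    """Yield (train, test) index lists for a simple, unshuffled 3-fold split."""
    indices = list(range(n_samples))
    # Assign every third sample to the same fold (assumption: no shuffling).
    folds = [indices[i::3] for i in range(3)]
    for k in range(3):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test
```

With three treatment groups, for instance, at most two components are available, which matches the default of 2.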

Visual summaries are provided as 2D scatter plots in which samples are colored by group and plotted against each pair of subspace components.

More details about the method are available via:

Hastie, T. J., Tibshirani, R. J., and Friedman, J. H. (2011). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer. pp. 106-119.


--------------------------------------------------------------------------------

**Note**

- This tool currently treats all variables as continuous numeric variables. Running the tool on categorical variables may produce incorrect results.
- Rows containing non-numeric (or missing) data in any of the chosen columns are excluded from the analysis.
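
To see which rows would be affected by the second note before running the tool, a filter along the following lines may help. This is only a sketch: the helper names ``is_numeric`` and ``keep_numeric_rows`` are hypothetical, rows are assumed to be dictionaries keyed by column name, and treating NaN or infinite values as missing is an assumption rather than documented tool behavior.

```python
import math


def is_numeric(value):
    """Return True if value parses as a finite float."""
    try:
        return math.isfinite(float(value))
    except (TypeError, ValueError):
        # None, empty strings, and text such as "NA" end up here.
        return False


def keep_numeric_rows(rows, columns):
    """Keep only rows (dicts) whose values in `columns` are all numeric."""
    return [
        row for row in rows
        if all(is_numeric(row.get(col)) for col in columns)
    ]
```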

--------------------------------------------------------------------------------

**Input**

    - Two input datasets are required.


@WIDE@

**NOTE:** The sample IDs must match the sample IDs in the Design File
(below). Extra columns will automatically be ignored.

@METADATA@

@UNIQID@

@GROUP@

**Cross-Validation Choice**

    - Selects the cross-validation option. None means no cross-validation: the user specifies the number of components manually. **Single and double cross-validation require a minimum of 100 samples.**


**Number of Components**

    - This parameter is used only when the "None" cross-validation option is selected. If the field is left blank, the number of components is set to the default value (2).
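
The interaction between the two parameters can be sketched as follows. The function ``resolve_n_components`` is a hypothetical helper written for illustration, not the tool's actual code. It assumes the option values ``none``, ``single`` and ``double`` from the form and returns ``None`` when cross-validation is expected to choose the number of components.

```python
def resolve_n_components(cross_validation, n_components_text, default=2):
    """Return the manual component count, or None when cross-validation picks it."""
    if cross_validation != "none":
        # The field is ignored for single and double cross-validation.
        return None
    text = (n_components_text or "").strip()
    # A blank field falls back to the default value (2).
    return int(text) if text else default
```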


--------------------------------------------------------------------------------

**Output**

This tool outputs:

(1) TSV file containing the components produced by the model for each sample.
Component_{i}: contains the score values for each sample. The number of components {i} is set in the Number of Components text box or determined via cross-validation.

(2) TSV file containing the sample classifications produced by the model.
Group_Observed: Initial group labels.
Group_Predicted: Predicted group labels.

(3) TSV file containing the classification accuracy (in percent) of the algorithm with respect to the number of correctly classified samples.

(4) A PDF file containing 2D plots for all pairwise comparisons of components, with samples colored by treatment group.
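
The accuracy reported in output (3) can be reproduced from the Group_Observed and Group_Predicted columns of output (2). The sketch below assumes accuracy means the percentage of samples whose predicted label equals the observed label. ``classification_accuracy`` is a hypothetical name, and the tool's own calculation may differ in detail.

```python
def classification_accuracy(observed, predicted):
    """Percentage of samples whose predicted group equals the observed group."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one sample is required")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)
```

For example, three correct predictions out of four samples give an accuracy of 75 percent.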


    ]]></help>
    <expand macro="citations"/>
</tool>