comparison lasso_enet_var_select.xml @ 1:2e7d47c0b027 draft

"planemo upload for repository https://malex@toolshed.g2.bx.psu.edu/repos/malex/secimtools"
author malex
date Mon, 08 Mar 2021 22:04:06 +0000
0:b54326490b4d 1:2e7d47c0b027
<tool id="secimtools_lasso_enet_var_select" name="LASSO/Elastic Net Variable Selection," version="@WRAPPER_VERSION@">
    <description>for feature selection.</description>
    <macros>
        <import>macros.xml</import>
    </macros>
    <expand macro="requirements" />
    <stdio>
        <exit_code range="1:" level="warning" description="RuntimeWarning"/>
    </stdio>
    <command><![CDATA[
lasso_enet_var_select.py
    --input '$input'
    --design '$design'
    --ID '$uniqID'
    --group '$group'
    --alpha '$alpha'
    --coefficients '$coefficients'
    --flags '$flags'
    --plots '$plots'
    ]]></command>
    <inputs>
        <param name="input" type="data" format="tabular" label="Wide Dataset" help="Input your tab-separated wide format dataset. If the file is not tab-separated, see the TIP below."/>
        <param name="design" type="data" format="tabular" label="Design File" help="Input your tab-separated design file. It must contain a 'sampleID' column. If the file is not tab-separated, see the TIP below."/>
        <param name="uniqID" type="text" size="30" value="" label="Unique Feature ID" help="Name of the column in your wide dataset that has unique identifiers."/>
        <param name="group" type="text" size="30" label="Group/Treatment" help="Name of the column in your design file that contains group classifications."/>
        <param name="alpha" type="text" value=".5" size="30" label="Shrinkage parameter α" help="Shrinkage parameter α, in the range [0, 1], sets the Elastic Net penalty from Ridge (α = 0) to LASSO (α = 1). Default: 0.5."/>
    </inputs>
    <outputs>
        <data format="tabular" name="coefficients" label="${tool.name} on ${on_string}: Coefficients"/>
        <data format="tabular" name="flags" label="${tool.name} on ${on_string}: Flags"/>
        <data format="pdf" name="plots" label="${tool.name} on ${on_string}: Plots"/>
    </outputs>
    <tests>
        <test>
            <param name="input" value="ST000006_data.tsv"/>
            <param name="design" value="ST000006_design.tsv"/>
            <param name="uniqID" value="Retention_Index" />
            <param name="group" value="White_wine_type_and_source" />
            <param name="alpha" value="0.5" />
            <output name="coefficients" file="ST000006_lasso_enet_var_select_coefficients.tsv" />
            <output name="flags" file="ST000006_lasso_enet_var_select_flags.tsv" />
            <output name="plots" file="ST000006_lasso_enet_var_select_plots.pdf" compare="sim_size" delta="10000" />
        </test>
    </tests>
    <help><![CDATA[

@TIP_AND_WARNING@

**Tool Description**

The tool selects (identifies) features that differ between pairs of treatment groups.
The selection is based on logistic regression with Elastic Net shrinkage (with LASSO as a special case).
The selection method is defined by the shrinkage parameter α.
Variable selection can be performed for any value of α in the range [0, 1]. α = 1 (LASSO) selects the fewest variables and applies the strictest selection criterion. α = 0 (Ridge regression) shrinks the coefficients without selecting variables. The default value is α = 0.5.
For a given α, the best subset of variables is selected by cross-validation, which also determines the penalty parameter λ (lambda).
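
The Python sketch below illustrates these ideas under stated assumptions. It fits a squared-error (linear) Elastic Net by cyclic coordinate descent, not the logistic model the tool fits. It parametrizes the penalty as in Friedman et al. (2010), cited below, and picks λ from a user-supplied grid by K-fold cross-validation. The function names, the fold scheme, and the absence of an intercept are illustrative assumptions, not the tool's implementation.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def standardize(X):
    """Center each column and scale it to mean square 1 (population SD)."""
    n, p = len(X), len(X[0])
    out = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5  # assumes sd > 0
        for i in range(n):
            out[i][j] = (col[i] - mean) / sd
    return out


def enet_fit(X, y, lam, alpha, n_iter=200):
    """Minimize (1/2n)||y - Xb||^2 + lam * ((1 - alpha)/2 ||b||^2 + alpha ||b||_1)
    by cyclic coordinate descent. No intercept: X and y are assumed centered."""
    n, p = len(X), len(X[0])
    sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    b = [0.0] * p
    resid = list(y)  # residual y - Xb, with b = 0 at the start
    for _ in range(n_iter):
        for j in range(p):
            if sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # (1/n) x_j . r_(-j): fit of x_j to the residual that excludes feature j
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + sq[j] * b[j]
            new_bj = soft_threshold(rho, lam * alpha) / (sq[j] + lam * (1.0 - alpha))
            if new_bj != b[j]:
                for i in range(n):
                    resid[i] -= X[i][j] * (new_bj - b[j])
                b[j] = new_bj
    return b


def cv_select_lambda(X, y, lambdas, alpha, k=3, seed=0):
    """Return the lambda with the smallest K-fold squared prediction error.
    Training folds are not re-centered, a simplification of this sketch."""
    n = len(X)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[f::k] for f in range(k)]
    best_lam, best_err = None, float("inf")
    for lam in lambdas:
        err = 0.0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in range(n) if i not in held_out]
            b = enet_fit([X[i] for i in train], [y[i] for i in train], lam, alpha)
            for i in fold:
                pred = sum(X[i][j] * b[j] for j in range(len(b)))
                err += (y[i] - pred) ** 2
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam
```

For example, `enet_fit(standardize(X), y, lam, alpha=1.0)` returns exactly zero for every coefficient once λ is large enough, which is how LASSO drops variables. A production fitter such as glmnet also handles the intercept, the logistic loss, and warm starts along a λ path.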

More details about the Elastic Net and LASSO methods can be found in the references below:

Zou, H., and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301-320.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267-288.

Friedman, J., Hastie, T., and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1-22.

--------------------------------------------------------------------------------

**Note**

- This tool currently treats all variables as continuous numeric
  variables. Running the tool on categorical variables may produce
  incorrect results.
- Rows containing non-numeric (or missing) data will be excluded.
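
The exclusion rule above can be illustrated with the hypothetical helper below. It is a sketch, not the tool's own parsing code. It reads a tab-separated wide dataset and keeps only feature rows whose sample values are all finite numbers. It assumes the first row is a header and that the caller supplies the sample column names, so any other columns are skipped.

```python
import csv
import io
import math


def load_numeric_rows(tsv_text, id_col, sample_ids):
    """Return {feature_id: {sample_id: value}} for rows whose sample values are
    all finite numbers; rows with any missing or non-numeric value are dropped."""
    kept = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        values = {}
        for sample in sample_ids:
            try:
                value = float(row.get(sample))  # float(None) raises TypeError
            except (TypeError, ValueError):
                break  # empty cell, "NA", or other text: drop the row
            if not math.isfinite(value):
                break  # "nan" / "inf" parse as floats but are still excluded
            values[sample] = value
        else:  # no break: every sample value in this row is usable
            kept[row[id_col]] = values
    return kept
```

Columns not listed in `sample_ids`, such as annotation columns, are never parsed, so text in them does not cause a row to be dropped.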

--------------------------------------------------------------------------------

**Input**

- Two input datasets are required.

@WIDE@

**NOTE:** The sample IDs must match the sample IDs in the Design File (below).
Extra columns will automatically be ignored.
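
A minimal sketch of how the sample matching could work follows. It assumes a tab-separated Design File with a `sampleID` column and the group column named under **Group/Treatment**. `align_samples` is a hypothetical helper, not part of the tool.

```python
import csv
import io


def align_samples(wide_header, design_tsv, id_col, group_col):
    """Map each wide-dataset sample column that also appears in the design
    file's sampleID column to its group. Columns absent from the design file
    (such as the unique ID column or extra annotation columns) are ignored."""
    groups = {
        row["sampleID"]: row[group_col]
        for row in csv.DictReader(io.StringIO(design_tsv), delimiter="\t")
    }
    return {col: groups[col] for col in wide_header if col != id_col and col in groups}
```

Samples listed in the design file but absent from the wide dataset are likewise left out of the mapping.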

@METADATA@

@UNIQID@

@GROUP@

**Shrinkage parameter α**

- Sets the Elastic Net penalty, from Ridge (α = 0) to LASSO (α = 1). Must lie in the range [0, 1]. Default = 0.5.
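
In the parametrization of Friedman et al. (2010), the Elastic Net penalty is λ[(1 - α)/2 · ‖β‖₂² + α‖β‖₁]. The Python sketch below evaluates the bracketed term for a coefficient vector. It assumes the tool follows this glmnet-style convention. That convention matches the α = 1 (LASSO) and α = 0 (Ridge) endpoints described above, but this wrapper does not confirm it.

```python
def enet_penalty(beta, alpha):
    """Elastic Net penalty without the lambda factor:
    (1 - alpha) / 2 * sum(b^2) + alpha * sum(|b|)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in the range [0, 1]")
    ridge_term = sum(b * b for b in beta)
    lasso_term = sum(abs(b) for b in beta)
    return (1.0 - alpha) / 2.0 * ridge_term + alpha * lasso_term


# The Galaxy parameter is free text (default ".5"), so it is parsed first.
alpha = float(".5")
print(enet_penalty([3.0, -4.0], alpha))  # 9.75
```

With α = 1 only the absolute-value term remains, and that term can push coefficients to exactly zero. With α = 0 only the squared term remains, which shrinks coefficients but does not zero them.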

--------------------------------------------------------------------------------

**Output**

This tool outputs three files:

(1) A TSV file containing the coefficient values (including zeroes) that the tool estimates for each feature, with one column per pairwise group comparison. These coefficients are produced from the transformed data (as part of the LASSO/EN method) and should be interpreted with caution.

(2) A TSV file containing the corresponding flags for each feature, where a value of 1 marks a feature selected by the method.

(3) A PDF file containing graphs for each pairwise comparison between the groups. The first graph shows how the coefficients change with the penalty parameter λ (lambda) for the chosen shrinkage parameter α. The second graph shows the details of the cross-validation procedure used to find the optimal penalty and to select features.
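
The per-comparison layout of these outputs can be sketched as follows. The sketch assumes one comparison per unordered pair of distinct groups. It also assumes a feature is flagged 1 exactly when its coefficient is nonzero, which is the usual LASSO/Elastic Net selection rule but is not stated by this wrapper. Both helpers are hypothetical.

```python
from itertools import combinations


def comparison_pairs(groups):
    """Unordered pairs of distinct groups, one per output column."""
    return list(combinations(sorted(set(groups)), 2))


def flags_from_coefficients(coefficients):
    """Flag a feature with 1 when its coefficient is nonzero, otherwise 0."""
    return {feature: int(coef != 0.0) for feature, coef in coefficients.items()}


print(comparison_pairs(["B", "A", "C", "A"]))
# [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(flags_from_coefficients({"f1": 0.0, "f2": -0.31, "f3": 1.2}))
# {'f1': 0, 'f2': 1, 'f3': 1}
```

With k groups this gives k(k - 1)/2 comparisons, so three groups yield three coefficient columns and three flag columns.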

    ]]></help>
    <expand macro="citations"/>
</tool>