<tool id="BestSubsetsRegression1" name="Perform Best-subsets Regression" version="0.0.1">
  <description> </description>
  <requirements>
    <requirement type="package" version="1.7.1">numpy</requirement>
    <requirement type="package" version="1.0.3">rpy</requirement>
  </requirements>
  <command interpreter="python">
    best_regression_subsets.py
      $input1
      $response_col
      $predictor_cols
      $out_file1
      $out_file2
      1>/dev/null
      2>/dev/null
  </command>
  <inputs>
    <param format="tabular" name="input1" type="data" label="Select data" help="Dataset missing? See TIP below."/>
    <param name="response_col" label="Response column (Y)" type="data_column" data_ref="input1" />
    <param name="predictor_cols" label="Predictor columns (X)" type="data_column" data_ref="input1" multiple="true" >
      <validator type="no_options" message="Please select at least one column."/>
    </param>
  </inputs>
  <outputs>
    <data format="input" name="out_file1" metadata_source="input1" />
    <data format="pdf" name="out_file2" />
  </outputs>
  <tests>
    <!-- Testing this tool will not be possible because this tool produces a pdf output file.
    -->
  </tests>
  <help>

.. class:: infomark

**TIP:** If your data is not TAB delimited, use *Edit Datasets->Convert characters*

-----

.. class:: infomark

**What it does**

This tool uses the 'regsubsets' function from the R statistical package to perform regression subset selection. It outputs two files: a table listing the best subsets with their summary statistics, and a PDF containing a graphical representation of the results.

-----

.. class:: warningmark

**Note**

- This tool currently treats all predictor and response variables as continuous variables.

- Rows containing non-numeric (or missing) data in any of the chosen columns are excluded from the analysis.

- The 6 columns in the output are described below:

  - Column 1 (Vars): the number of variables in the model
  - Column 2 ([c2 c3 c4...]): the list of user-selected predictor variables (the full model). An asterisk marks each predictor variable that is present in the selected model.
  - Column 3 (R-sq): the fraction of variance explained by the model
  - Column 4 (Adj. R-sq): the R-squared statistic above, adjusted to penalize a larger number of predictors (p)
  - Column 5 (Cp): Mallows' Cp statistic
  - Column 6 (bic): Bayesian Information Criterion


  </help>
</tool>