annotate logistic_regression_vif.xml @ 0:bd196d7c1ca9 draft default tip

Imported from capsule None
author devteam
date Tue, 01 Apr 2014 10:51:26 -0400
<tool id="LogisticRegression" name="Perform Logistic Regression with vif" version="1.0.1">
  <description> </description>
  <requirements>
    <requirement type="package" version="1.7.1">numpy</requirement>
    <requirement type="package" version="1.0.3">rpy</requirement>
    <requirement type="package" version="2.11.0">R</requirement>
  </requirements>
  <command interpreter="python">
    logistic_regression_vif.py
      $input1
      $response_col
      $predictor_cols
      $out_file1
      1>/dev/null
  </command>
  <inputs>
    <param format="tabular" name="input1" type="data" label="Select data" help="Dataset missing? See TIP below."/>
    <param name="response_col" label="Response column (Y)" type="data_column" data_ref="input1" numerical="True"/>
    <param name="predictor_cols" label="Predictor columns (X)" type="data_column" data_ref="input1" numerical="True" multiple="true" >
      <validator type="no_options" message="Please select at least one column."/>
    </param>
  </inputs>
  <outputs>
    <data format="input" name="out_file1" metadata_source="input1" />
  </outputs>
  <requirements>
    <requirement type="python-module">rpy</requirement>
  </requirements>
  <tests>
    <test>
      <param name="input1" value="logreg_inp.tabular"/>
      <param name="response_col" value="4"/>
      <param name="predictor_cols" value="1,2,3"/>
      <output name="out_file1" file="logreg_out2.tabular"/>
    </test>
  </tests>
  <help>

.. class:: infomark

**TIP:** If your data is not TAB delimited, use *Edit Datasets-&gt;Convert characters*

-----

.. class:: infomark

**What it does**

This tool uses the **'glm'** function from the R statistical package to perform logistic regression on the input data. It outputs one file containing the summary statistics of the fitted regression. It also calculates the Variance Inflation Factor (VIF) with the **'vif'** function from the R library **car**.

*R Development Core Team (2010). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.*

-----

.. class:: warningmark

**Note**

- This tool currently treats all predictor variables as continuous numeric variables and the response variable as a categorical variable. The response variable can currently have only two classes, namely 0 and 1. The program takes 0 as the base class.

- Rows containing non-numeric (or missing) data in any of the chosen columns are excluded from the analysis.

- The summary statistics in the output are described below:

  - Pseudo R-squared: the proportional improvement of the fitted model over the null (intercept-only) model.
  - p-value: the p-value for the z-test of the null hypothesis that the corresponding slope is equal to zero, against the two-sided alternative.
  - Coefficient: the change in the log odds, log(probability of class 1 / probability of class 0), for a one-unit increase in the corresponding predictor.

- This tool also provides the **Variance Inflation Factor or VIF**, which quantifies the level of multicollinearity. The tool automatically generates the VIF if the model has more than one predictor. The higher the VIF, the higher the multicollinearity. Multicollinearity inflates the standard errors and reduces the significance level of the affected predictors. In the worst case, it can reverse the direction of the slope for highly correlated predictors if one of them is significant. A general rule of thumb is to use only those predictors with a VIF lower than 10 or 5.

  - **vif** is calculated by

    - First, regressing each predictor on all the other predictors and recording the R-squared of each regression.
    - Second, computing vif as 1/(1 - R_squared).

  </help>
</tool>
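
The two-step VIF recipe described in the help text can be sketched in plain Python. This is only an illustration under stated assumptions: the tool itself delegates to the **'vif'** function of the R library **car**, and the function names below (`solve_linear_system`, `r_squared`, `variance_inflation_factors`) are hypothetical helpers, not part of `logistic_regression_vif.py`. Each auxiliary regression is fitted by ordinary least squares through the normal equations, which is a choice made here for self-containment.

```python
# A minimal sketch of the VIF recipe from the help text, not the tool's R code.
# All function names are hypothetical. Assumes every column is numeric,
# non-constant, and of equal length.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations act on both sides at once.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: the other predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(y, predictors):
    """R-squared of a least-squares fit of y on the predictors plus an intercept."""
    n = len(y)
    # Design matrix rows: intercept term followed by the predictor values.
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(row[p] * row[q] for row in rows) for q in range(k)] for p in range(k)]
    xty = [sum(rows[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def variance_inflation_factors(columns):
    """Return one VIF per predictor column."""
    vifs = []
    for j, target in enumerate(columns):
        # Step 1: regress predictor j on all the other predictors.
        others = columns[:j] + columns[j + 1:]
        r2 = r_squared(target, others)
        # Step 2: VIF = 1 / (1 - R_squared); infinite if the fit is exact.
        vifs.append(float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return vifs


# Hypothetical data: two strongly correlated predictor columns.
columns = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]]
print(variance_inflation_factors(columns))
```

With only two predictors, both VIFs are equal, because each auxiliary regression has the same R-squared (the squared correlation between the two columns). Uncorrelated predictors give a VIF of 1, the lowest possible value.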