comparison logistic_regression_vif.xml @ 0:bd196d7c1ca9 draft default tip

Imported from capsule None
author devteam
date Tue, 01 Apr 2014 10:51:26 -0400
<tool id="LogisticRegression" name="Perform Logistic Regression with vif" version="1.0.1">
  <description> </description>
  <requirements>
    <requirement type="package" version="1.7.1">numpy</requirement>
    <requirement type="package" version="1.0.3">rpy</requirement>
    <requirement type="package" version="2.11.0">R</requirement>
  </requirements>
  <command interpreter="python">
    logistic_regression_vif.py
      $input1
      $response_col
      $predictor_cols
      $out_file1
      1>/dev/null
  </command>
  <inputs>
    <param format="tabular" name="input1" type="data" label="Select data" help="Dataset missing? See TIP below."/>
    <param name="response_col" label="Response column (Y)" type="data_column" data_ref="input1" numerical="True"/>
    <param name="predictor_cols" label="Predictor columns (X)" type="data_column" data_ref="input1" numerical="True" multiple="true" >
      <validator type="no_options" message="Please select at least one column."/>
    </param>
  </inputs>
  <outputs>
    <data format="input" name="out_file1" metadata_source="input1" />
  </outputs>
  <requirements>
    <requirement type="python-module">rpy</requirement>
  </requirements>
  <tests>
    <test>
      <param name="input1" value="logreg_inp.tabular"/>
      <param name="response_col" value="4"/>
      <param name="predictor_cols" value="1,2,3"/>
      <output name="out_file1" file="logreg_out2.tabular"/>
    </test>
  </tests>
  <help>


.. class:: infomark

**TIP:** If your data is not TAB delimited, use *Edit Datasets-&gt;Convert characters*

-----

.. class:: infomark

**What it does**

This tool uses the **'glm'** function from the R statistical package to perform logistic regression on the input data. It outputs one file containing the summary statistics of the regression. It also calculates the VIF (Variance Inflation Factor) with the **'vif'** function from the R library **car**.


*R Development Core Team (2010). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.*

-----

.. class:: warningmark

**Note**

- This tool currently treats all predictor variables as continuous numeric variables and the response variable as a categorical variable. The response variable can currently have only two classes, 0 and 1. The program takes 0 as the base class.

- Rows containing non-numeric (or missing) data in any of the chosen columns are skipped.

- The summary statistics in the output are described below:

  - Pseudo R-squared: the proportional improvement of the fitted model over the null model.
  - p-value: the p-value for the z-test of the null hypothesis that the corresponding slope is equal to zero, against the two-sided alternative.
  - Coefficient: the change in the log odds, log(probability of class 1 / probability of class 0), for a one-unit increase in the corresponding predictor.

- This tool also provides the **Variance Inflation Factor or VIF**, which quantifies the level of multicollinearity. The tool automatically generates the VIF if the model has more than one predictor. The higher the VIF, the stronger the multicollinearity. Multicollinearity inflates the standard errors and reduces the significance of the affected predictors. In the worst case, it can reverse the direction of the slope for highly correlated predictors if one of them is significant. A general rule of thumb is to use only predictors with a VIF lower than 10 or 5.

  - **vif** is calculated by

    - First, regressing each predictor on all other predictors and recording the R-squared of each regression.
    - Second, computing vif as 1/(1 - R_squared)
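
The two VIF steps above can be sketched in plain Python. This is only an illustration of the formula, not the code the tool runs (the tool calls **'vif'** from the R library car). The function names and the sample data are hypothetical, and the regression is an ordinary least-squares fit with an intercept, solved through the normal equations.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so each row operation is applied to both sides.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(y, columns):
    """R-squared of a least-squares fit of y on the given columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def variance_inflation_factors(predictors):
    """Step 1: regress each predictor on the others. Step 2: 1 / (1 - R_squared)."""
    vifs = []
    for j, target in enumerate(predictors):
        others = predictors[:j] + predictors[j + 1:]
        vifs.append(1.0 / (1.0 - r_squared(target, others)))
    return vifs


# Hypothetical predictor columns; R-squared between them is 0.6,
# so both VIF values are about 2.5.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 4.0, 5.0, 4.0, 5.0]
print(variance_inflation_factors([x1, x2]))
```

The sketch does not guard against perfectly collinear predictors, where R_squared equals 1 and the VIF is undefined.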
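
The coefficients in the summary are on the log-odds scale, which can be hard to read directly. The minimal sketch below, which assumes a hypothetical intercept and hypothetical coefficients rather than real tool output, shows how a coefficient converts to an odds ratio and how a fitted model yields the probability of class 1.

```python
import math


def odds_ratio(coefficient):
    """Multiplicative change in the odds of class 1 per one-unit increase in a predictor."""
    return math.exp(coefficient)


def predicted_probability(intercept, coefficients, values):
    """Probability of class 1 for one row of predictor values."""
    log_odds = intercept + sum(c * v for c, v in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical fitted model: an intercept and two slopes (not real tool output).
intercept = -1.0
coefficients = [0.5, -0.25]

print(odds_ratio(coefficients[0]))
# The log odds are -1.0 + 0.5 * 2.0 = 0, so the probability is 0.5.
print(predicted_probability(intercept, coefficients, [2.0, 0.0]))
```

A coefficient of 0 gives an odds ratio of 1, meaning the predictor does not change the odds of class 1 relative to the base class 0.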
  </help>
</tool>