view tools/regVariation/linear_regression.xml @ 1:cdcb0ce84a1b

Uploaded
author xuebing
date Fri, 09 Mar 2012 19:45:15 -0500
parents 9071e359b9a3
children

<tool id="LinearRegression1" name="Perform Linear Regression" version="1.0.1">
  <description> </description>
  <command interpreter="python">
    linear_regression.py 
      $input1
      $response_col
      $predictor_cols
      $out_file1
      $out_file2
      1>/dev/null
  </command>
  <inputs>
    <param format="tabular" name="input1" type="data" label="Select data" help="Dataset missing? See TIP below."/>
    <param name="response_col" label="Response column (Y)" type="data_column" data_ref="input1" numerical="True"/>
    <param name="predictor_cols" label="Predictor columns (X)" type="data_column" data_ref="input1" numerical="True" multiple="true" >
        <validator type="no_options" message="Please select at least one column."/>
    </param>
  </inputs>
  <outputs>
    <data format="input" name="out_file1" metadata_source="input1" />
    <data format="pdf" name="out_file2" />
  </outputs>
  <requirements>
    <requirement type="python-module">rpy</requirement>
  </requirements>
  <tests>
    <test>
        <param name="input1" value="regr_inp.tabular"/>
        <param name="response_col" value="3"/>
        <param name="predictor_cols" value="1,2"/>
        <output name="out_file1" file="regr_out.tabular"/>
        <output name="out_file2" file="regr_out.pdf"/>
    </test>
  </tests>
  <help>


.. class:: infomark

**TIP:** If your data is not TAB delimited, use *Edit Datasets-&gt;Convert characters*

-----

.. class:: infomark

**What it does**

This tool uses the 'lm' function from the R statistical package to perform linear regression on the input data. It outputs two files: one containing the summary statistics of the regression, and the other containing diagnostic plots for checking whether the model assumptions are satisfied.

*R Development Core Team (2009). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.*

-----

.. class:: warningmark

**Note**

- This tool currently treats all predictor and response variables as continuous numeric variables. Running it on categorical variables may produce incorrect results.

- Rows containing non-numeric (or missing) data in any of the chosen columns are excluded from the analysis.

- The summary statistics in the output are described below:

  - sigma: the square root of the estimated variance of the random error (the residual standard error)
  - R-squared: the fraction of the variance in the response that is explained by the model
  - Adjusted R-squared: the R-squared statistic adjusted to penalize for the number of predictors (p)
  - p-value: the p-value for the t-test of the null hypothesis that the corresponding slope is equal to zero, against the two-sided alternative
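
As a rough guide to how the first three statistics relate, and assuming the fitted model includes an intercept (the default for R's 'lm'; this is an assumption, not confirmed from the tool's script), they follow the standard definitions below. The symbols are introduced here for illustration only: n is the number of rows used in the fit, p is the number of predictors, RSS is the residual sum of squares, and TSS is the total sum of squares of the response about its mean::

    sigma              = sqrt( RSS / (n - p - 1) )
    R-squared          = 1 - RSS / TSS
    Adjusted R-squared = 1 - (1 - R-squared) * (n - 1) / (n - p - 1)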


  </help>
</tool>