Mercurial > repos > ynewton > matrix_normalization
changeset 0:a9d8d4b531f7 draft
Uploaded
author | ynewton |
---|---|
date | Thu, 13 Dec 2012 11:19:17 -0500 |
parents | |
children | 8389b0c211ae |
files | normalize.xml |
diffstat | 1 files changed, 56 insertions(+), 0 deletions(-) |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/normalize.xml Thu Dec 13 11:19:17 2012 -0500
@@ -0,0 +1,56 @@
+<tool id="matrix_normalize" name="Matrix Normalize" version="2.0.0">
+    <description>Matrix Normalize</description>
+    <command interpreter="Rscript">normalize.r $genomicMatrix $normType $normBy
+#if str($controlColumnLabelsList) != "None":
+    $controlColumnLabelsList
+#end if
+    > $outfile
+    </command>
+    <inputs>
+        <param name="genomicMatrix" type="data" label="Genomic Matrix"/>
+        <param name="normBy" type="select" label="normalize by (row or column)">
+            <option value="row">ROW</option>
+            <option value="column">COLUMN</option>
+        </param>
+        <param name="normType" type="select" label="type of normalization">
+            <option value="median_shift">Median Shift</option>
+            <option value="mean_shift">Mean Shift</option>
+            <option value="t_statistic">Student t-statistic (z-scores)</option>
+            <option value="exponential_fit">Exponential Distribution Normalization</option>
+            <option value="normal_fit">Normal Distribution Normalization</option>
+            <option value="weibull_0.5_fit">Weibull Distribution Normalization (scale=1,shape=0.5)</option>
+            <option value="weibull_1_fit">Weibull Distribution Normalization (scale=1,shape=1)</option>
+            <option value="weibull_1.5_fit">Weibull Distribution Normalization (scale=1,shape=1.5)</option>
+            <option value="weibull_5_fit">Weibull Distribution Normalization (scale=1,shape=5)</option>
+        </param>
+        <param name="controlColumnLabelsList" optional="true" type="data" label="Controls"/>
+    </inputs>
+    <outputs>
+        <data name="outfile" format="tabular"/>
+    </outputs>
+    <help>
+**What it does**
+
+This tool takes data in a matrix format and normalizes it using the chosen normalization options. The matrix is assumed to be annotated by both column and row: the first line of the file holds the column headers, and the first column of each row holds the row header.
+
+Data can be normalized either by row or by column. Note that the exponential, normal, and Weibull normalizations always operate by column, regardless of the user selection.
+
+The following normalizations are provided:
+
+1. Median shift: if no normals list is provided, computes the median of each row (of each column when normalizing by column) and subtracts it from every entry of that row. If normals are provided, computes the median over the normals and subtracts it from each non-normal value; only the non-normal samples are returned. If "Column" is selected in normalize by, normals are ignored.
+
+2. Mean shift: if no normals list is provided, computes the mean of each row (of each column when normalizing by column) and subtracts it from every entry of that row. If normals are provided, computes the mean over the normals and subtracts it from each non-normal value; only the non-normal samples are returned. If "Column" is selected in normalize by, normals are ignored.
+
+3. T-statistic (z-score): also called standardization. Each value in the row/column is replaced by its z-score, (value - mean) / standard deviation. If normals are specified, z-scores are computed separately within each class (normals and non-normals).
+
+4. Exponential normalization: performed by column (sample). All genes/probes in the column are ranked, the ranks are converted to probabilities in the open interval (0, 1), and the quantile function (inverse CDF) of the exponential distribution is applied, turning each rank into a real number on that distribution.
+
+5. Normal normalization: same as exponential normalization, but the quantile function of the normal distribution is applied.
+
+6. Weibull normalizations: same as exponential normalization, but the quantile function of the Weibull distribution is applied, with the scale and shape parameters shown in each option's label (scale = 1; shape = 0.5, 1, 1.5, or 5).
+
+
+The Normals/controls parameter is optional. It contains either a list of column headers from the input matrix that should be treated as normals/controls, or a matrix of normal/control samples. The program distinguishes between the two cases automatically and processes the normals/controls accordingly. When both the main expression matrix and a normals/controls matrix are supplied and column-wise normalization is selected, the program concatenates the two matrices into one combined matrix containing both tumor and normal/control samples, and normalizes the samples of that combined matrix.
+
+    </help>
+</tool>
\ No newline at end of file
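The help text's row-wise median shift and mean shift, including the optional normals, can be sketched in Python as below. This is a minimal illustration, not the code in `normalize.r`. Several pieces are hypothetical:

- the function name `shift_rows`;
- the in-memory layout (a list of column headers plus a dict mapping row header to values);
- passing normals as a list of column headers.

Column-wise shifting, where the help says normals are ignored, would apply the same centering to each column instead.

```python
from statistics import mean, median
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Center = Callable[[List[float]], float]


def shift_rows(
    columns: List[str],
    rows: Dict[str, List[float]],
    center: Center = median,
    normals: Optional[Sequence[str]] = None,
) -> Tuple[List[str], Dict[str, List[float]]]:
    """Row-wise median shift (pass center=mean for a mean shift).

    Without normals: subtract each row's center from every entry of that row.
    With normals: compute the center over the normal columns only, subtract it
    from the non-normal columns, and return only the non-normal columns.
    """
    all_idx = list(range(len(columns)))
    if normals:
        normal_set = set(normals)
        ref_idx = [i for i in all_idx if columns[i] in normal_set]
        keep_idx = [i for i in all_idx if columns[i] not in normal_set]
        if not ref_idx:
            raise ValueError("no column header matches the normals list")
    else:
        ref_idx = keep_idx = all_idx
    shifted = {}
    for name, values in rows.items():
        c = center([values[i] for i in ref_idx])
        shifted[name] = [values[i] - c for i in keep_idx]
    return [columns[i] for i in keep_idx], shifted
```

For example, `shift_rows(["t1", "t2", "n1", "n2"], {"geneA": [5.0, 7.0, 1.0, 3.0]}, normals=["n1", "n2"])` uses the normals' median 2.0 and returns `(["t1", "t2"], {"geneA": [3.0, 5.0]})`. Passing the centering function as a parameter keeps the median and mean variants in one code path, mirroring how the help text describes them in parallel.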
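The t-statistic (z-score) option, with per-class standardization when normals are given, might look like the following Python sketch. The function names `zscores` and `zscore_row` are hypothetical. It also assumes two things the help text leaves open:

- The sample standard deviation is used (n - 1 denominator, which is what R's `sd()` computes). Whether `normalize.r` uses that or the population version is not stated.
- Normal columns are kept in the output rather than dropped.

```python
from statistics import mean, stdev
from typing import List, Optional, Sequence


def zscores(values: Sequence[float]) -> List[float]:
    """(x - mean) / sample standard deviation; needs at least two values."""
    m = mean(values)
    s = stdev(values)  # n - 1 denominator (assumption, matches R's sd())
    if s == 0:
        raise ValueError("cannot standardize values with zero spread")
    return [(x - m) / s for x in values]


def zscore_row(
    columns: Sequence[str],
    values: Sequence[float],
    normals: Optional[Sequence[str]] = None,
) -> List[float]:
    """Standardize one row; with normals, standardize each class separately."""
    if not normals:
        return zscores(values)
    normal_set = set(normals)
    out = [0.0] * len(values)
    # Each class (normals, then non-normals) gets its own mean and deviation.
    for is_normal in (True, False):
        idx = [i for i, c in enumerate(columns) if (c in normal_set) == is_normal]
        for i, z in zip(idx, zscores([values[i] for i in idx])):
            out[i] = z
    return out
```

Standardizing each class separately means tumor values are not pulled toward the normals' scale. The flip side is that each class needs at least two samples for a standard deviation to exist.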
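The exponential, normal, and Weibull options all follow one pattern: rank the values in a column, map ranks to probabilities, and apply a distribution's quantile function. A Python sketch of that pattern is below. The dictionary keys reuse the `normType` option values, and the Weibull scale/shape pairs come from the option labels. Everything else is an assumption, not taken from `normalize.r`:

- ranks map to p = rank / (n + 1);
- ties share their average rank (R's `rank()` default);
- the exponential has rate 1, and the normal is standard (mean 0, sd 1).

```python
import math
from statistics import NormalDist
from typing import Callable, Dict, List

Quantile = Callable[[float], float]


def exponential_ppf(p: float) -> float:
    # Quantile function of the exponential distribution, rate 1 (assumed): -ln(1 - p)
    return -math.log1p(-p)


def weibull_ppf(shape: float, scale: float = 1.0) -> Quantile:
    # Quantile function of Weibull(shape, scale): scale * (-ln(1 - p)) ** (1 / shape)
    def ppf(p: float) -> float:
        return scale * (-math.log1p(-p)) ** (1.0 / shape)
    return ppf


# Keys mirror the normType option values in the tool XML.
QUANTILE_FUNCTIONS: Dict[str, Quantile] = {
    "exponential_fit": exponential_ppf,
    "normal_fit": NormalDist(mu=0.0, sigma=1.0).inv_cdf,
    "weibull_0.5_fit": weibull_ppf(0.5),
    "weibull_1_fit": weibull_ppf(1.0),
    "weibull_1.5_fit": weibull_ppf(1.5),
    "weibull_5_fit": weibull_ppf(5.0),
}


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the average of their ranks (like R's rank())."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the group while the next sorted value ties with the first one.
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def rank_transform_column(values: List[float], norm_type: str) -> List[float]:
    """Replace each value in one column by the chosen distribution's quantile at its rank."""
    ppf = QUANTILE_FUNCTIONS[norm_type]
    n = len(values)
    # rank / (n + 1) keeps every probability strictly inside (0, 1).
    return [ppf(r / (n + 1)) for r in average_ranks(values)]
```

For instance, `rank_transform_column([10.0, 30.0, 20.0], "normal_fit")` maps ranks 1, 3, 2 to probabilities 0.25, 0.75, 0.5 and returns roughly `[-0.674, 0.674, 0.0]`. Only the ordering of the input survives, which is why these options make every column follow the same target distribution.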