comparison tools/stats/gsummary.xml @ 0:9071e359b9a3

Uploaded
author xuebing
date Fri, 09 Mar 2012 19:37:19 -0500
-1:000000000000 0:9071e359b9a3
1 <tool id="Summary_Statistics1" name="Summary Statistics" version="1.1.0">
2 <description>for any numerical column</description>
3 <command interpreter="python">gsummary.py $input $out_file1 "$cond"</command>
4 <inputs>
5 <param format="tabular" name="input" type="data" label="Summary statistics on" help="Dataset missing? See TIP below"/>
6 <param name="cond" size="30" type="text" value="c5" label="Column or expression" help="See syntax below">
7 <validator type="empty_field" message="Enter a valid column or expression, see syntax below for examples"/>
8 </param>
9 </inputs>
10 <outputs>
11 <data format="tabular" name="out_file1" />
12 </outputs>
13 <requirements>
14 <requirement type="python-module">rpy</requirement>
15 </requirements>
16 <tests>
17 <test>
18 <param name="input" value="1.bed"/>
19 <output name="out_file1" file="gsummary_out1.tabular"/>
20 <param name="cond" value="c2"/>
21 </test>
22 </tests>
23 <help>
24
25 .. class:: warningmark
26
27 This tool expects input datasets consisting of tab-delimited columns. Blank lines and comment lines beginning with a # character are automatically skipped.
28
29 .. class:: infomark
30
31 **TIP:** If your data is not TAB delimited, use *Text Manipulation-&gt;Convert delimiters to TAB*
32
33 .. class:: infomark
34
35 **TIP:** Summary statistics can only be computed on numerical values. If a line is missing a value or contains a non-numerical value in the column being summarized, that line is skipped and its value is excluded from the statistical computation. The number of skipped invalid lines is reported in the resulting history item.
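
The skipping behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the code in gsummary.py: the helper name ``read_column`` is hypothetical, and it assumes that blank and comment lines are ignored without being counted as invalid.

```python
def read_column(lines, column):
    """Collect numeric values from a 1-based column of tab-delimited lines.

    Hypothetical helper for illustration only; returns (values, skipped).
    """
    values = []
    skipped = 0
    for line in lines:
        line = line.rstrip("\r\n")
        # Assumption: blank and comment lines are ignored, not counted as invalid.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        try:
            values.append(float(fields[column - 1]))
        except (IndexError, ValueError):
            # Missing column or non-numerical value: skip the line and count it.
            skipped += 1
    return values, skipped


example_lines = [
    "#id\tchrom\tscore",
    "586\tchrX\t16990",
    "",
    "73\tchrX\tNA",
    "595\tchrX",
    "74\tchrX\t8600",
]
values, skipped = read_column(example_lines, 3)
print(values, skipped)
```

Here two data lines are skipped, one with a non-numerical value and one with a missing column, so only 16990 and 8600 would enter the computation.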
36
37 .. class:: infomark
38
39 **USING R FUNCTIONS:** Most functions (like *abs*) take a single expression as their only argument. *log* can take one or two parameters, like *log(expression,base)*.
40
41 Currently, these R functions are supported: *abs, sign, sqrt, floor, ceiling, trunc, round, signif, exp, log, cos, sin, tan, acos, asin, atan, cosh, sinh, tanh, acosh, asinh, atanh, lgamma, gamma, gammaCody, digamma, trigamma, cumsum, cumprod, cummax, cummin*
42
43 -----
44
45 **Syntax**
46
47 This tool computes basic summary statistics on a given column, or on a valid expression containing one or more columns.
48
49 - Columns are referenced with **c** and a **number**. For example, **c1** refers to the first column of a tab-delimited file.
50
51 - Example expressions:
52
53 - **log(c5)** calculates summary statistics on the natural log of column 5
54 - **(c5 + c6 + c7) / 3** calculates summary statistics on the average of columns 5-7
55 - **log(c5,10)** calculates summary statistics on the base 10 log of column 5
56 - **sqrt(c5+c9)** calculates summary statistics on the square root of column 5 + column 9
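
To make the column syntax concrete, here is a minimal Python sketch of how such an expression could be evaluated for one row. It is an assumption-laden illustration: the tool itself delegates expressions to R, while this sketch uses Python's ``math`` functions, supports only a few function names, and introduces a hypothetical helper called ``evaluate``.

```python
import math

# Small, illustrative subset of the supported function names.
ALLOWED_FUNCTIONS = {"log": math.log, "sqrt": math.sqrt, "exp": math.exp, "abs": abs}


def evaluate(expression, fields):
    """Evaluate an expression such as "(c3 + c4) / 2" against one row.

    Hypothetical helper: c1, c2, ... are bound to the numeric fields of the row.
    """
    names = dict(ALLOWED_FUNCTIONS)
    for index, field in enumerate(fields, start=1):
        try:
            names["c%d" % index] = float(field)
        except ValueError:
            pass  # non-numerical columns are simply not made available
    # eval() is used for brevity only; it is not safe for untrusted input.
    return eval(expression, {"__builtins__": {}}, names)


row = "586\tchrX\t161416\t170887\t41108_at\t16990".split("\t")
print(evaluate("(c3 + c4) / 2", row))  # 166151.5
print(evaluate("log(c6, 10)", row))    # base 10 log of 16990
```

Referencing a non-numerical column (for example c2 in this row) raises a ``NameError`` in this sketch, which a caller could treat as an invalid line.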
57
58 -----
59
60 **Examples**
61
62 - Input Dataset::
63
64   c1    c2     c3        c4        c5           c6
65   586   chrX   161416    170887    41108_at     16990
66   73    chrX   505078    532318    35073_at     1700
67   595   chrX   1361578   1388460   33665_s_at   1960
68   74    chrX   1420620   1461919   1185_at      8600
69
70 - Summary Statistics on column c6 of the above input dataset::
71
72   #sum        mean       stdev      0%         25%        50%        75%         100%
73   29250.000   7312.500   7198.636   1700.000   1895.000   5280.000   10697.500   16990.000
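
The numbers in this example can be reproduced with a short Python sketch. It assumes the sample standard deviation (n - 1 in the denominator) and linearly interpolated quantiles, which is what R computes by default. The function names ``quantile`` and ``summarize`` are illustrative and are not taken from gsummary.py.

```python
import math


def quantile(sorted_values, q):
    """Quantile by linear interpolation between the two nearest order statistics."""
    position = (len(sorted_values) - 1) * q
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def summarize(values):
    """Return [sum, mean, stdev, 0%, 25%, 50%, 75%, 100%]; needs two or more values."""
    ordered = sorted(values)
    n = len(ordered)
    total = math.fsum(ordered)
    mean = total / n
    # Sample variance: squared deviations divided by n - 1.
    variance = math.fsum((v - mean) ** 2 for v in ordered) / (n - 1)
    stats = [total, mean, math.sqrt(variance)]
    stats += [quantile(ordered, q) for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
    return stats


c6 = [16990, 1700, 1960, 8600]
print("\t".join("%.3f" % value for value in summarize(c6)))
```

Run on column c6 of the input dataset, this prints the same eight values as the output row shown above.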
74
75 </help>
76 </tool>