annotate cor.xml @ 0:24e01abf9e34 draft default tip

Imported from capsule None
author devteam
date Mon, 28 Jul 2014 11:55:23 -0400
<tool id="cor2" name="Correlation" version="1.0.0">
  <description>for numeric columns</description>
  <requirements>
    <requirement type="package" version="1.0.3">rpy</requirement>
  </requirements>
  <command interpreter="python">cor.py $input1 $out_file1 $numeric_columns $method</command>
  <inputs>
    <param format="tabular" name="input1" type="data" label="Dataset" help="Dataset missing? See TIP below"/>
    <param name="numeric_columns" label="Numerical columns" type="data_column" numerical="True" multiple="True" data_ref="input1" help="Multi-select list - hold the appropriate key while clicking to select multiple columns" />
    <param name="method" type="select" label="Method">
      <option value="pearson">Pearson</option>
      <option value="kendall">Kendall rank</option>
      <option value="spearman">Spearman rank</option>
    </param>
  </inputs>
  <outputs>
    <data format="txt" name="out_file1" />
  </outputs>
  <tests>
    <!--
      Test a tabular input with the first line being a comment without a # character to start
    -->
    <test>
      <param name="input1" value="cor.tabular" />
      <param name="numeric_columns" value="2,3" />
      <param name="method" value="pearson" />
      <output name="out_file1" file="cor_out.txt" />
    </test>
  </tests>
  <help>

.. class:: infomark

**TIP:** If your data is not TAB delimited, use *Text Manipulation-&gt;Convert*

.. class:: warningmark

Missing data ("nan") are removed from each pairwise comparison.

-----

**Syntax**

This tool computes the matrix of correlation coefficients between numeric columns.

- All invalid, blank and comment lines are skipped when performing computations. The number of skipped lines is displayed in the resulting history item.

- **Pearson's Correlation** reflects the degree of linear relationship between two variables. It ranges from +1 to -1. A correlation of +1 means that there is a perfect positive linear relationship between the variables. The formula for Pearson's correlation is:

  .. image:: pearson.png

  where n is the number of items

- **Kendall's rank correlation** is used to measure the degree of correspondence between two rankings and to assess the significance of this correspondence. The formula for Kendall's rank correlation is:

  .. image:: kendall.png

  where n is the number of items, and P is the sum.

- **Spearman's rank correlation** assesses how well an arbitrary monotonic function could describe the relationship between two variables, without making any assumptions about the frequency distribution of the variables. The formula for Spearman's rank correlation is:

  .. image:: spearman.png

  where D is the difference between the ranks of corresponding values of X and Y, and N is the number of pairs of values.

-----

**Example**

- Input file::

    #Person  Height  Self Esteem
    1        68      4.1
    2        71      4.6
    3        62      3.8
    4        75      4.4
    5        58      3.2
    6        60      3.1
    7        67      3.8
    8        68      4.1
    9        71      4.3
    10       69      3.7
    11       68      3.5
    12       67      3.2
    13       63      3.7
    14       62      3.3
    15       60      3.4
    16       63      4.0
    17       65      4.1
    18       67      3.8
    19       63      3.4
    20       61      3.6

- Computing the correlation coefficients between columns 2 and 3 of the above file (using Pearson's Correlation), the output is::

    1.0             0.730635686279
    0.730635686279  1.0

So the correlation for our twenty cases is 0.73, which is a fairly strong positive relationship.
  </help>
</tool>
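
The Pearson example in the help text can be checked by hand with a short script. The following is a minimal sketch in plain Python, not the tool's own `cor.py` (which depends on `rpy`); the function names `pearson` and `correlation_matrix` are hypothetical and chosen only for illustration. It uses the sum-based form of Pearson's r and, as an assumption about how the tool behaves, drops any pair containing a NaN before computing each coefficient.

```python
import math


def pearson(xs, ys):
    """Pearson's r for two equal-length sequences (illustrative helper).

    Pairs in which either value is NaN are dropped first, mirroring the
    pairwise removal of missing data described in the help text.
    Raises ZeroDivisionError if either column is constant.
    """
    pairs = [(x, y) for x, y in zip(xs, ys)
             if not (math.isnan(x) or math.isnan(y))]
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    sum_x2 = sum(x * x for x, _ in pairs)
    sum_y2 = sum(y * y for _, y in pairs)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) *
                            (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def correlation_matrix(columns):
    """Square matrix of pairwise Pearson coefficients (illustrative helper)."""
    return [[pearson(a, b) for b in columns] for a in columns]


# Columns 2 and 3 of the example input file in the help text.
height = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
          68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
self_esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
               3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]

matrix = correlation_matrix([height, self_esteem])
for row in matrix:
    # One tab-separated line per row, like the tool's text output.
    print("\t".join(repr(value) for value in row))
```

The off-diagonal entries should agree with the 0.73 reported in the example, and the diagonal entries are the correlation of each column with itself.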