annotate rank_terms.xml @ 25:cba0d7a63b82

workaround for gd_genotype datatype admix shift int -> float
author Richard Burhans <burhans@bx.psu.edu>
date Wed, 29 May 2013 13:49:19 -0400
parents 95a05c1ef5d5
children 8997f2ca8c7a
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
22
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
1 <tool id="gd_rank_terms" name="Rank Terms" version="1.0.0">
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
2 <description>: Assess the enrichment/depletion of a gene set for GO terms</description>
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
3
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
4 <command interpreter="python">
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
5 #set $t_col1_0 = int(str($t_col1)) - 1
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
6 #set $t_col2_0 = int(str($t_col2)) - 1
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
7 #set $g_col2_0 = int(str($g_col2)) - 1
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
8 rank_terms.py --input "$input1" --columnENSEMBLT $t_col1_0 --inExtnddfile "$input2" --columnENSEMBLTExtndd $t_col2_0 --columnGOExtndd $g_col2_0 --output "$output"
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
9 </command>
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
10
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
11 <inputs>
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
12 <param name="input1" type="data" format="tabular" label="Query dataset" />
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
13 <param name="t_col1" type="data_column" data_ref="input1" label="Column with ENSEMBL transcript codes" />
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
14
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
15 <param name="input2" type="data" format="tabular" label="Background dataset" />
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
16 <param name="t_col2" type="data_column" data_ref="input2" label="Column with ENSEMBL transcript codes" />
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
17 <param name="g_col2" type="data_column" data_ref="input2" label="Column with GO terms" />
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
18 </inputs>
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
19
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
20 <outputs>
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
21 <data name="output" format="tabular" />
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
22 </outputs>
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
23
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
24 <help>
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
25
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
26 **Dataset formats**
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
27
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
28 All of the input and output datasets are in tabular_ format.
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
29 The query dataset has a column containing ENSEMBL transcript codes for
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
30 the gene set of interest, while the background dataset has one column
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
31 with ENSEMBL transcript codes and another with GO terms, for some
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
32 larger universe of genes.
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
33 The output dataset is described below.
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
34 (`Dataset missing?`_)
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
35
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
36 .. _tabular: ./static/formatHelp.html#tab
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
37 .. _Dataset missing?: ./static/formatHelp.html
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
38
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
39 -----
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
40
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
41 **What it does**
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
42
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
43 Given a query set of genes from a larger background dataset, this tool
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
44 evaluates the statistical over- or under-representation of Gene Ontology
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
45 terms in the query set, using a two-tailed Fisher's exact test.
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
46
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
47 The output contains a row for each GO term, with the following columns:
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
48
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
49 1. count: the number of genes in the query set that are in this GO category
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
50 2. representation: the percentage of this category's genes (from the background dataset) that appear in the query set
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
51 3. ranking of this term, based on its representation ("1" is highest)
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
52 4. Fisher probability of enrichment/depletion of this GO category in the query dataset
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
53 5. GO term
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
54
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
55 </help>
95a05c1ef5d5 update to devshed revision aaece207bd01
Richard Burhans <burhans@bx.psu.edu>
parents:
diff changeset
56 </tool>