comparison rank_terms.xml @ 22:95a05c1ef5d5

update to devshed revision aaece207bd01
author Richard Burhans <burhans@bx.psu.edu>
date Mon, 11 Mar 2013 11:28:06 -0400
parents
children 8997f2ca8c7a
21:d6b961721037 22:95a05c1ef5d5
<tool id="gd_rank_terms" name="Rank Terms" version="1.0.0">
  <description>: Assess the enrichment/depletion of a gene set for GO terms</description>

  <command interpreter="python">
    ## Galaxy column selectors are 1-based; the script expects 0-based indexes.
    #set $t_col1_0 = int(str($t_col1)) - 1
    #set $t_col2_0 = int(str($t_col2)) - 1
    #set $g_col2_0 = int(str($g_col2)) - 1
    rank_terms.py --input "$input1" --columnENSEMBLT $t_col1_0 --inExtnddfile "$input2" --columnENSEMBLTExtndd $t_col2_0 --columnGOExtndd $g_col2_0 --output "$output"
  </command>

  <inputs>
    <param name="input1" type="data" format="tabular" label="Query dataset" />
    <param name="t_col1" type="data_column" data_ref="input1" label="Column with ENSEMBL transcript codes" />

    <param name="input2" type="data" format="tabular" label="Background dataset" />
    <param name="t_col2" type="data_column" data_ref="input2" label="Column with ENSEMBL transcript codes" />
    <param name="g_col2" type="data_column" data_ref="input2" label="Column with GO terms" />
  </inputs>

  <outputs>
    <data name="output" format="tabular" />
  </outputs>

  <help>

**Dataset formats**

All of the input and output datasets are in tabular_ format.
The query dataset has a column containing ENSEMBL transcript codes for
the gene set of interest, while the background dataset has one column
with ENSEMBL transcript codes and another with GO terms, for some
larger universe of genes.
The output dataset is described below.
(`Dataset missing?`_)

.. _tabular: ./static/formatHelp.html#tab
.. _Dataset missing?: ./static/formatHelp.html

-----

**What it does**

Given a query set of genes from a larger background dataset, this tool
evaluates the statistical over- or under-representation of Gene Ontology
terms in the query set, using a two-tailed Fisher's exact test.

The output contains a row for each GO term, with the following columns:

1. count: the number of genes in the query set that are in this GO category
2. representation: the percentage of this category's genes (from the background dataset) that appear in the query set
3. rank: the ranking of this term, based on its representation ("1" is highest)
4. probability: the two-tailed Fisher's exact test probability of enrichment/depletion of this GO category in the query dataset
5. term: the GO term

  </help>
</tool>
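
The statistics described in the help text above can be illustrated with a short, self-contained Python sketch. It is not the code of `rank_terms.py`, whose source is not shown here. The function names `fisher_two_tailed` and `rank_terms`, the input layout (one `(gene, GO term)` pair per background row), the tie handling in the ranking, and the output order are all assumptions made for illustration. The two-tailed p-value uses the common convention of summing the probabilities of every table that is no more likely than the observed one.

```python
from collections import defaultdict
from math import comb


def fisher_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of seeing x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def rank_terms(query_genes, background):
    """Return rows of (count, representation, rank, p_value, term).

    Assumption: `background` is an iterable of (gene, go_term) pairs and
    `query_genes` is an iterable of gene IDs; query genes that are absent
    from the background are ignored.
    """
    term_genes = defaultdict(set)
    for gene, term in background:
        term_genes[term].add(gene)
    universe = set().union(*term_genes.values())
    query = set(query_genes) & universe

    rows = []
    for term, genes in term_genes.items():
        count = len(genes & query)                    # query genes in this category
        representation = 100.0 * count / len(genes)   # percent of the category in the query
        in_term_not_query = len(genes) - count
        p_value = fisher_two_tailed(
            count,
            len(query) - count,
            in_term_not_query,
            len(universe) - len(query) - in_term_not_query,
        )
        rows.append([count, representation, None, p_value, term])

    # Rank 1 is the highest representation; tied terms share a rank (an assumption).
    rows.sort(key=lambda r: -r[1])
    for i, row in enumerate(rows):
        tied = i > 0 and row[1] == rows[i - 1][1]
        row[2] = rows[i - 1][2] if tied else i + 1
    return [tuple(r) for r in rows]


if __name__ == "__main__":
    # Toy data with made-up gene and term names.
    background = [
        ("g1", "GO:A"), ("g2", "GO:A"), ("g3", "GO:A"),
        ("g3", "GO:B"), ("g4", "GO:B"), ("g5", "GO:B"), ("g6", "GO:B"),
    ]
    for row in rank_terms(["g1", "g2"], background):
        print("\t".join(str(field) for field in row))
```

Using only the standard library (`math.comb`, Python 3.8 or later) keeps the sketch dependency-free. With SciPy available, `scipy.stats.fisher_exact` computes the same two-sided p-value for a 2x2 table.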