comparison rank_pathways.xml @ 21:d6b961721037

Miller Lab Devshed version 4c04e35b18f6
author Richard Burhans <burhans@bx.psu.edu>
date Mon, 05 Nov 2012 12:44:17 -0500
parents 8ae67e9fb6ff
children 95a05c1ef5d5
comparison
20:8a4b8efbc82c 21:d6b961721037
5 #if str($output_format) == 'a' 5 #if str($output_format) == 'a'
6 calctfreq.py 6 calctfreq.py
7 #else if str($output_format) == 'b' 7 #else if str($output_format) == 'b'
8 calclenchange.py 8 calclenchange.py
9 #end if 9 #end if
10 "--loc_file=${GALAXY_DATA_INDEX_DIR}/gd.rank.loc" 10 "--loc_file=${GALAXY_DATA_INDEX_DIR}/gd.rank.loc"
11 "--species=${input.metadata.dbkey}" 11 "--species=${input.metadata.dbkey}"
12 "--input=${input}" 12 "--input=${input}"
13 "--output=${output}" 13 "--output=${output}"
14 "--posKEGGclmn=${input.metadata.kegg_path}" 14 "--posKEGGclmn=${kpath}"
15 "--KEGGgeneposcolmn=${input.metadata.kegg_gene}" 15 "--KEGGgeneposcolmn=${kgene}"
16 </command> 16 </command>
17 17
18 <inputs> 18 <inputs>
19 <param name="input" type="data" format="gd_sap" label="Table"> 19 <param name="input" type="data" format="tab" label="Dataset" />
20 <validator type="metadata" check="kegg_gene,kegg_path" message="Missing KEGG gene code column and/or KEGG pathway code/name column metadata. Click the pencil icon in the history item to edit/save the metadata attributes" /> 20 <param name="kgene" type="data_column" data_ref="input" label="Column with KEGG gene ID" />
21 </param> 21 <param name="kpath" type="data_column" data_ref="input" numerical="false" label="Column with KEGG pathways" />
22 <param name="output_format" type="select" label="Output format"> 22 <param name="output_format" type="select" label="Output">
23 <option value="a" selected="true">ranked by percentage of genes affected</option> 23 <option value="a" selected="true">ranked by percentage of genes affected</option>
24 <option value="b">ranked by change in length and number of paths</option> 24 <option value="b">ranked by change in length and number of paths</option>
25 </param> 25 </param>
26 </inputs> 26 </inputs>
27 27
30 </outputs> 30 </outputs>
31 31
32 <tests> 32 <tests>
33 <test> 33 <test>
34 <param name="input" value="test_in/sample.gd_sap" ftype="gd_sap" /> 34 <param name="input" value="test_in/sample.gd_sap" ftype="gd_sap" />
35 <param name="kgene" value="10" />
36 <param name="kpath" value="12" />
35 <param name="output_format" value="a" /> 37 <param name="output_format" value="a" />
36 <output name="output" file="test_out/rank_pathways/rank_pathways.tabular" /> 38 <output name="output" file="test_out/rank_pathways/rank_pathways.tabular" />
37 </test> 39 </test>
38 </tests> 40 </tests>
39 41
40 <help> 42 <help>
43
44 **Dataset formats**
45
46 The input and output datasets are in tabular_ format.
47 The input dataset must have columns with KEGG gene ID and pathways.
48 The output dataset is described below.
49 (`Dataset missing?`_)
50
51 .. _tabular: ./static/formatHelp.html#tab
52 .. _Dataset missing?: ./static/formatHelp.html
53
54 -----
41 55
42 **What it does** 56 **What it does**
43 57
44 This tool produces a table ranking the pathways based on the percentage 58 This tool produces a table ranking the pathways based on the percentage
45 of genes in an input dataset, out of the total in each pathway. 59 of genes in an input dataset, out of the total in each pathway.
52 the pathway. 66 the pathway.
53 67
54 If pathways are ranked by percentage of genes affected, the output is 68 If pathways are ranked by percentage of genes affected, the output is
55 a tabular dataset with the following columns: 69 a tabular dataset with the following columns:
56 70
57 1. number of genes in the pathway present in the input dataset 71 1. number of genes in the pathway present in the input dataset
58 2. percentage of the total genes in the pathway included in the input dataset 72 2. percentage of the total genes in the pathway included in the input dataset
59 3. rank of the frequency (from high freq to low freq) 73 3. rank of the frequency (from high freq to low freq)
60 4. name of the pathway 74 4. name of the pathway
61 75
62 If pathways are ranked by change in length and number of paths, the 76 If pathways are ranked by change in length and number of paths, the
63 output is a tabular dataset with the following columns: 77 output is a tabular dataset with the following columns:
64 78
65 1. change in the mean length of paths between sources and sinks 79 1. change in the mean length of paths between sources and sinks
66 2. mean length of paths between sources and sinks in the pathway including the genes in the input dataset. If the pathway does not have sources/sinks, the length is assumed to be infinite (I) 80 2. mean length of paths between sources and sinks in the pathway including the genes in the input dataset. If the pathway does not have sources/sinks, the length is assumed to be infinite (I)
67 3. mean length of paths between sources and sinks in the pathway excluding the genes in the input dataset. If the pathway does not have sources/sinks, the length is assumed to be infinite (I) 81 3. mean length of paths between sources and sinks in the pathway excluding the genes in the input dataset. If the pathway does not have sources/sinks, the length is assumed to be infinite (I)
68 4. rank of the change in the mean length of paths between sources and sinks (from high change to low change) 82 4. rank of the change in the mean length of paths between sources and sinks (from high change to low change)
69 5. change in the number of paths between sources and sinks 83 5. change in the number of paths between sources and sinks
70 6. number of paths between sources and sinks in the pathway including the genes in the input dataset. If the pathway does not have sources/sinks, it is assumed to be a circuit (C) 84 6. number of paths between sources and sinks in the pathway including the genes in the input dataset. If the pathway does not have sources/sinks, it is assumed to be a circuit (C)
71 7. number of paths between sources and sinks in the pathway excluding the genes in the input dataset. If the pathway does not have sources/sinks, it is assumed to be a circuit (C) 85 7. number of paths between sources and sinks in the pathway excluding the genes in the input dataset. If the pathway does not have sources/sinks, it is assumed to be a circuit (C)
72 8. rank of the change in the number of paths between sources and sinks (from high change to low change) 86 8. rank of the change in the number of paths between sources and sinks (from high change to low change)
73 9. name of the pathway 87 9. name of the pathway
88
89 -----
90
91 **Examples**
92
93 - input (column 10 for KEGG gene ID, column 12 for KEGG pathways)::
94
95 Contig39_chr1_3261104_3261850 414 chr1 3261546 ENSCAFT00000000001 ENSCAFP00000000001 S 667 F 476153 probably damaging cfa00230=Purine metabolism.cfa00500=Starch and sucrose metabolism.cfa00740=Riboflavin metabolism.cfa00760=Nicotinate and nicotinamide metabolism.cfa00770=Pantothenate and CoA biosynthesis.cfa01100=Metabolic pathways
96 Contig62_chr1_19011969_19012646 265 chr1 19012240 ENSCAFT00000000144 ENSCAFP00000000125 * 161 R 483960 probably damaging N
97 etc.
98
99 - output ranked by percentage of genes affected::
100
101 3 0.25 1 cfa03450=Non-homologous end-joining
102 1 0.25 1 cfa00750=Vitamin B6 metabolism
103 2 0.2 3 cfa00290=Valine, leucine and isoleucine biosynthesis
104 3 0.18 4 cfa00770=Pantothenate and CoA biosynthesis
105 etc.
106
107 - output ranked by change in length and number of paths::
108
109 3.64 8.44 4.8 2 4 9 5 1 cfa00260=Glycine, serine and threonine metabolism
110 7.6 9.6 2 1 3 5 2 2 cfa00240=Pyrimidine metabolism
111 0.05 2.67 2.62 6 1 30 29 3 cfa00982=Drug metabolism - cytochrome P450
112 -0.08 8.33 8.41 84 1 30 29 3 cfa00564=Glycerophospholipid metabolism
113 etc.
74 114
75 </help> 115 </help>
76 </tool> 116 </tool>
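
The four-column "ranked by percentage of genes affected" output described in the help text can be sketched in Python as follows. This is an illustrative reconstruction, not the code in calctfreq.py. It assumes that pathway entries in the KEGG pathway column are joined by "." and that "N" means no pathway (both as in the example rows), that each gene is counted once per pathway, and that tied fractions share a rank (as the 1, 1, 3, 4 ranks in the example output suggest). The names `split_pathways`, `rank_by_fraction`, and `pathway_sizes` are hypothetical; the real tool reads pathway totals via the `--loc_file` and `--species` options.

```python
import re
from collections import defaultdict

# Assumption: a "." separates entries only when it is followed by something
# that looks like a KEGG pathway code such as "cfa00230=".
_ENTRY_BOUNDARY = re.compile(r"\.(?=[a-z]{2,4}\d{5}=)")


def split_pathways(field):
    """Split a KEGG pathway column value into individual pathway entries."""
    field = field.strip()
    if field in ("", "N"):  # "N" marks a gene without pathway annotation
        return []
    return _ENTRY_BOUNDARY.split(field)


def rank_by_fraction(rows, kgene, kpath, pathway_sizes):
    """Return (gene count, fraction, rank, pathway) tuples, highest fraction first.

    rows          -- list of rows, each a list of column strings
    kgene, kpath  -- 1-based column numbers, like the tool's data_column params
    pathway_sizes -- hypothetical dict: pathway entry -> total genes in pathway
    """
    genes_in_pathway = defaultdict(set)
    for row in rows:
        for pathway in split_pathways(row[kpath - 1]):
            genes_in_pathway[pathway].add(row[kgene - 1])

    table = [
        (len(genes), len(genes) / pathway_sizes[pathway], pathway)
        for pathway, genes in genes_in_pathway.items()
    ]
    # Highest fraction first; pathway name breaks ties for a stable order.
    table.sort(key=lambda entry: (-entry[1], entry[2]))

    ranked = []
    for position, (count, fraction, pathway) in enumerate(table):
        if position > 0 and fraction == table[position - 1][1]:
            rank = ranked[-1][2]  # tied fractions share the earlier rank
        else:
            rank = position + 1
        ranked.append((count, fraction, rank, pathway))
    return ranked


if __name__ == "__main__":
    demo_rows = [
        ["geneA", "cfa00001=Pathway one.cfa00002=Pathway two"],
        ["geneB", "cfa00001=Pathway one"],
        ["geneC", "N"],
    ]
    demo_sizes = {"cfa00001=Pathway one": 4, "cfa00002=Pathway two": 2}
    for line in rank_by_fraction(demo_rows, 1, 2, demo_sizes):
        print("\t".join(str(value) for value in line))
```

For the test case in the diff, the equivalent call would pass `kgene=10` and `kpath=12`, matching the new `kgene` and `kpath` test parameters.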