Mercurial > repos > miller-lab > genome_diversity
comparison rank_pathways.xml @ 21:d6b961721037
Miller Lab Devshed version 4c04e35b18f6
author | Richard Burhans <burhans@bx.psu.edu> |
---|---|
date | Mon, 05 Nov 2012 12:44:17 -0500 |
parents | 8ae67e9fb6ff |
children | 95a05c1ef5d5 |
20:8a4b8efbc82c | 21:d6b961721037 |
---|---|
5 #if str($output_format) == 'a' | 5 #if str($output_format) == 'a' |
6 calctfreq.py | 6 calctfreq.py |
7 #else if str($output_format) == 'b' | 7 #else if str($output_format) == 'b' |
8 calclenchange.py | 8 calclenchange.py |
9 #end if | 9 #end if |
10 "--loc_file=${GALAXY_DATA_INDEX_DIR}/gd.rank.loc" | 10 "--loc_file=${GALAXY_DATA_INDEX_DIR}/gd.rank.loc" |
11 "--species=${input.metadata.dbkey}" | 11 "--species=${input.metadata.dbkey}" |
12 "--input=${input}" | 12 "--input=${input}" |
13 "--output=${output}" | 13 "--output=${output}" |
14 "--posKEGGclmn=${input.metadata.kegg_path}" | 14 "--posKEGGclmn=${kpath}" |
15 "--KEGGgeneposcolmn=${input.metadata.kegg_gene}" | 15 "--KEGGgeneposcolmn=${kgene}" |
16 </command> | 16 </command> |
17 | 17 |
18 <inputs> | 18 <inputs> |
19 <param name="input" type="data" format="gd_sap" label="Table"> | 19 <param name="input" type="data" format="tab" label="Dataset" /> |
20 <validator type="metadata" check="kegg_gene,kegg_path" message="Missing KEGG gene code column and/or KEGG pathway code/name column metadata. Click the pencil icon in the history item to edit/save the metadata attributes" /> | 20 <param name="kgene" type="data_column" data_ref="input" label="Column with KEGG gene ID" /> |
21 </param> | 21 <param name="kpath" type="data_column" data_ref="input" numerical="false" label="Column with KEGG pathways" /> |
22 <param name="output_format" type="select" label="Output format"> | 22 <param name="output_format" type="select" label="Output"> |
23 <option value="a" selected="true">ranked by percentage of genes affected</option> | 23 <option value="a" selected="true">ranked by percentage of genes affected</option> |
24 <option value="b">ranked by change in length and number of paths</option> | 24 <option value="b">ranked by change in length and number of paths</option> |
25 </param> | 25 </param> |
26 </inputs> | 26 </inputs> |
27 | 27 |
30 </outputs> | 30 </outputs> |
31 | 31 |
32 <tests> | 32 <tests> |
33 <test> | 33 <test> |
34 <param name="input" value="test_in/sample.gd_sap" ftype="gd_sap" /> | 34 <param name="input" value="test_in/sample.gd_sap" ftype="gd_sap" /> |
| 35 <param name="kgene" value="10" /> |
| 36 <param name="kpath" value="12" /> |
35 <param name="output_format" value="a" /> | 37 <param name="output_format" value="a" /> |
36 <output name="output" file="test_out/rank_pathways/rank_pathways.tabular" /> | 38 <output name="output" file="test_out/rank_pathways/rank_pathways.tabular" /> |
37 </test> | 39 </test> |
38 </tests> | 40 </tests> |
39 | 41 |
40 <help> | 42 <help> |
| 43 |
| 44 **Dataset formats** |
| 45 |
| 46 The input and output datasets are in tabular_ format. |
| 47 The input dataset must have columns with KEGG gene ID and pathways. |
| 48 The output dataset is described below. |
| 49 (`Dataset missing?`_) |
| 50 |
| 51 .. _tabular: ./static/formatHelp.html#tab |
| 52 .. _Dataset missing?: ./static/formatHelp.html |
| 53 |
| 54 ----- |
41 | 55 |
42 **What it does** | 56 **What it does** |
43 | 57 |
44 This tool produces a table ranking the pathways based on the percentage | 58 This tool produces a table ranking the pathways based on the percentage |
45 of genes in an input dataset, out of the total in each pathway. | 59 of genes in an input dataset, out of the total in each pathway. |
52 the pathway. | 66 the pathway. |
53 | 67 |
54 If pathways are ranked by percentage of genes affected, the output is | 68 If pathways are ranked by percentage of genes affected, the output is |
55 a tabular dataset with the following columns: | 69 a tabular dataset with the following columns: |
56 | 70 |
57 1. number of genes in the pathway present in the input dataset | 71 1. number of genes in the pathway present in the input dataset |
58 2. percentage of the total genes in the pathway included in the input dataset | 72 2. percentage of the total genes in the pathway included in the input dataset |
59 3. rank of the frequency (from high freq to low freq) | 73 3. rank of the frequency (from high freq to low freq) |
60 4. name of the pathway | 74 4. name of the pathway |
61 | 75 |
62 If pathways are ranked by change in length and number of paths, the | 76 If pathways are ranked by change in length and number of paths, the |
63 output is a tabular dataset with the following columns: | 77 output is a tabular dataset with the following columns: |
64 | 78 |
65 1. change in the mean length of paths between sources and sinks | 79 1. change in the mean length of paths between sources and sinks |
66 2. mean length of paths between sources and sinks in the pathway including the genes in the input dataset. If the pathway does not have sources/sinks, the length is assumed to be infinite (I) | 80 2. mean length of paths between sources and sinks in the pathway including the genes in the input dataset. If the pathway does not have sources/sinks, the length is assumed to be infinite (I) |
67 3. mean length of paths between sources and sinks in the pathway excluding the genes in the input dataset. If the pathway does not have sources/sinks, the length is assumed to be infinite (I) | 81 3. mean length of paths between sources and sinks in the pathway excluding the genes in the input dataset. If the pathway does not have sources/sinks, the length is assumed to be infinite (I) |
68 4. rank of the change in the mean length of paths between sources and sinks (from high change to low change) | 82 4. rank of the change in the mean length of paths between sources and sinks (from high change to low change) |
69 5. change in the number of paths between sources and sinks | 83 5. change in the number of paths between sources and sinks |
70 6. number of paths between sources and sinks in the pathway including the genes in the input dataset. If the pathway does not have sources/sinks, it is assumed to be a circuit (C) | 84 6. number of paths between sources and sinks in the pathway including the genes in the input dataset. If the pathway does not have sources/sinks, it is assumed to be a circuit (C) |
71 7. number of paths between sources and sinks in the pathway excluding the genes in the input dataset. If the pathway does not have sources/sinks, it is assumed to be a circuit (C) | 85 7. number of paths between sources and sinks in the pathway excluding the genes in the input dataset. If the pathway does not have sources/sinks, it is assumed to be a circuit (C) |
72 8. rank of the change in the number of paths between sources and sinks (from high change to low change) | 86 8. rank of the change in the number of paths between sources and sinks (from high change to low change) |
73 9. name of the pathway | 87 9. name of the pathway |
| 88 |
| 89 ----- |
| 90 |
| 91 **Examples** |
| 92 |
| 93 - input (column 10 for KEGG gene ID, column 12 for KEGG pathways):: |
| 94 |
| 95 Contig39_chr1_3261104_3261850 414 chr1 3261546 ENSCAFT00000000001 ENSCAFP00000000001 S 667 F 476153 probably damaging cfa00230=Purine metabolism.cfa00500=Starch and sucrose metabolism.cfa00740=Riboflavin metabolism.cfa00760=Nicotinate and nicotinamide metabolism.cfa00770=Pantothenate and CoA biosynthesis.cfa01100=Metabolic pathways |
| 96 Contig62_chr1_19011969_19012646 265 chr1 19012240 ENSCAFT00000000144 ENSCAFP00000000125 * 161 R 483960 probably damaging N |
| 97 etc. |
| 98 |
| 99 - output ranked by percentage of genes affected:: |
| 100 |
| 101 3 0.25 1 cfa03450=Non-homologous end-joining |
| 102 1 0.25 1 cfa00750=Vitamin B6 metabolism |
| 103 2 0.2 3 cfa00290=Valine, leucine and isoleucine biosynthesis |
| 104 3 0.18 4 cfa00770=Pantothenate and CoA biosynthesis |
| 105 etc. |
| 106 |
| 107 - output ranked by change in length and number of paths:: |
| 108 |
| 109 3.64 8.44 4.8 2 4 9 5 1 cfa00260=Glycine, serine and threonine metabolism |
| 110 7.6 9.6 2 1 3 5 2 2 cfa00240=Pyrimidine metabolism |
| 111 0.05 2.67 2.62 6 1 30 29 3 cfa00982=Drug metabolism - cytochrome P450 |
| 112 -0.08 8.33 8.41 84 1 30 29 3 cfa00564=Glycerophospholipid metabolism |
| 113 etc. |
74 | 114 |
75 </help> | 115 </help> |
76 </tool> | 116 </tool> |
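
The "ranked by percentage of genes affected" output described in the help text can be illustrated with a minimal Python sketch. This is not the code of `calctfreq.py`, which this changeset does not show. The function names, the `pathway_sizes` mapping (standing in for the per-pathway gene totals that the real tool presumably derives from the `--loc_file` data), the choice to count unique gene IDs, and the tie handling (equal rounded fractions share a rank and the next rank is skipped, as the example output suggests) are all assumptions made for illustration. Column numbers are 1-based, matching the `kgene` and `kpath` parameters.

```python
import re
from collections import defaultdict


def split_pathways(cell):
    """Split a pathway cell such as 'cfa00230=Purine metabolism.cfa00500=...'.

    Entries are assumed to be separated by '.', and 'N' is assumed to
    mean that the gene has no pathway annotation.
    """
    cell = cell.strip()
    if cell in ("", "N"):
        return []
    # Split only on a '.' that is directly followed by a new 'code=' prefix.
    return re.split(r"\.(?=[a-z]+\d+=)", cell)


def rank_by_fraction(rows, gene_col, path_col, pathway_sizes):
    """Return (gene count, fraction, rank, pathway) tuples, best rank first.

    rows          -- iterable of rows, each a list of column strings
    gene_col      -- 1-based column holding the KEGG gene ID
    path_col      -- 1-based column holding the KEGG pathways
    pathway_sizes -- hypothetical mapping: pathway code -> total genes in it
    """
    genes_in_pathway = defaultdict(set)
    for row in rows:
        gene = row[gene_col - 1]
        for pathway in split_pathways(row[path_col - 1]):
            # A set counts each gene once, even if it appears on several rows.
            genes_in_pathway[pathway].add(gene)

    scored = []
    for pathway, genes in genes_in_pathway.items():
        total = pathway_sizes.get(pathway.split("=", 1)[0])
        if not total:
            continue  # pathway size unknown: skipped in this sketch
        scored.append((len(genes), round(len(genes) / total, 2), pathway))
    scored.sort(key=lambda item: item[1], reverse=True)

    ranked = []
    for position, (count, fraction, pathway) in enumerate(scored, start=1):
        # Equal fractions share the rank of the first tied entry.
        if ranked and ranked[-1][1] == fraction:
            rank = ranked[-1][2]
        else:
            rank = position
        ranked.append((count, fraction, rank, pathway))
    return ranked
```

With the example input from the help text, the call would be `rank_by_fraction(rows, 10, 12, pathway_sizes)`, where `rows` holds the tab-split lines of the dataset. The second output ranking (change in length and number of paths) needs the pathway graph itself and is not covered by this sketch.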