comparison cluster_kegg.xml @ 31:a631c2f6d913
Update to Miller Lab devshed revision 3c4110ffacc3
author | Richard Burhans <burhans@bx.psu.edu> |
---|---|
date | Fri, 20 Sep 2013 13:25:27 -0400 |
parents | |
children |
30:4188853b940b | 31:a631c2f6d913 |
---|---|
1 <tool id="gd_cluster_kegg" name="Cluster KEGG" version="1.0.0"> | |
2 <description>: Group gene categories connected by shared genes</description> | |
3 | |
4 <command interpreter="python"> | |
5 #set $ensembltcolmn_arg = int(str($ensembltcolmn)) - 1 | |
6 cluster_onConnctdComps.py '--input=$input' '--input_columns=${input.dataset.metadata.columns}' '--outfile=$output' '--threshold=$threshold' '--ENSEMBLTcolmn=$ensembltcolmn_arg' '--classClmns=$classclmns' | |
7 </command> | |
8 | |
9 <inputs> | |
10 <param name="input" type="data" format="tabular" label="Input dataset" /> | |
11 <param name="ensembltcolmn" type="data_column" data_ref="input" numerical="false" label="Column with the ENSEMBL code in the Input dataset" /> | |
12 <param name="threshold" type="float" value="90" min="0" max="100" label="Threshold to disconnect the nodes" /> | |
13 <param name="classclmns" size="10" type="text" value="c1,c2" label="Gene category columns"/> | |
14 | |
15 </inputs> | |
16 | |
17 <outputs> | |
18 <data name="output" format="tabular" /> | |
19 </outputs> | |
20 | |
21 <requirements> | |
22 <requirement type="package" version="1.8.1">networkx</requirement> | |
23 </requirements> | |
24 | |
25 <help> | |
26 **What it does** | |
27 | |
28 The program builds a network of gene categories connected by shared | |
29 genes. Each edge is weighted by the number of genes shared by the | |
30 two categories it connects. The clustering coefficient, c\ :sub:`u`\ , is then calculated for each node using the formula: | |
31 | |
32 .. image:: $PATH_TO_IMAGES/cluster_kegg_formula.png | |
33 | |
34 | | |
35 | |
36 where deg(u) is the degree of u and edge weights, w\ :sub:`uv`\ , | |
37 are normalized by the maximum weight in the network. The clustering | |
38 coefficients are then filtered against a threshold (either a | |
39 percentile or a value chosen by the user), and all nodes whose | |
40 clustering coefficient is lower than this threshold are deleted from | |
41 the network. Finally, the program reports each connected component as | |
42 a cluster of gene categories. This yields fewer gene | |
43 categories, but the results are easier to interpret because | |
44 they exclude genes present in many gene groups. | |
45 </help> | |
46 </tool> |
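
The procedure described in the help text can be sketched in a few lines of plain Python. This is an illustration only, not the code of cluster_onConnctdComps.py, which is not shown in this revision. It assumes the formula in the missing image is the geometric-mean weighted clustering coefficient, which is how the networkx documentation describes `clustering` when a `weight` is given, in the same wording the help text uses. It also treats the threshold as an absolute value rather than a percentile. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def build_network(categories):
    """Return {node: {neighbor: weight}}; weight = number of shared genes."""
    graph = {name: {} for name in categories}
    for a, b in combinations(sorted(categories), 2):
        shared = len(categories[a] & categories[b])
        if shared:
            graph[a][b] = shared
            graph[b][a] = shared
    return graph


def weighted_clustering(graph):
    """Weighted clustering coefficient per node (assumed geometric-mean form)."""
    # Edge weights are normalized by the maximum weight in the network.
    max_w = max((w for nbrs in graph.values() for w in nbrs.values()), default=1)
    coeffs = {}
    for u, nbrs in graph.items():
        deg = len(nbrs)
        if deg < 2:
            coeffs[u] = 0.0  # no triangle is possible through u
            continue
        total = 0.0
        for v, x in combinations(nbrs, 2):
            if x in graph[v]:  # u, v, x form a triangle
                product = (nbrs[v] / max_w) * (nbrs[x] / max_w) * (graph[v][x] / max_w)
                total += product ** (1.0 / 3.0)
        # Each triangle is visited once above, hence the factor 2.
        coeffs[u] = 2.0 * total / (deg * (deg - 1))
    return coeffs


def clusters(graph, coeffs, threshold):
    """Delete nodes below the threshold; return the connected components."""
    kept = {u for u in graph if coeffs[u] >= threshold}
    seen, result = set(), []
    for start in sorted(kept):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:  # depth-first walk over the surviving nodes
            node = stack.pop()
            component.append(node)
            for nbr in graph[node]:
                if nbr in kept and nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        result.append(sorted(component))
    return result


# Toy input: gene category -> set of gene identifiers (made-up data).
example = {
    "A": {"g1", "g2", "g3"},
    "B": {"g2", "g3", "g4"},
    "C": {"g3", "g4", "g5"},
    "D": {"g5", "g6"},
}
network = build_network(example)
coefficients = weighted_clustering(network)
print(clusters(network, coefficients, threshold=0.5))
```

In this toy network, category D has a single neighbor and so gets a coefficient of zero, while C is pulled down by its link to D. Raising the threshold therefore removes the loosely attached categories first, which is the filtering effect the help text describes.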