comparison cluster_kegg.xml @ 31:a631c2f6d913

Update to Miller Lab devshed revision 3c4110ffacc3
author Richard Burhans <burhans@bx.psu.edu>
date Fri, 20 Sep 2013 13:25:27 -0400
30:4188853b940b 31:a631c2f6d913
1 <tool id="gd_cluster_kegg" name="Cluster KEGG" version="1.0.0">
2 <description>: Group gene categories connected by shared genes</description>
3
4 <command interpreter="python">
5 #set $ensembltcolmn_arg = int(str($ensembltcolmn)) - 1
6 cluster_onConnctdComps.py '--input=$input' '--input_columns=${input.dataset.metadata.columns}' '--outfile=$output' '--threshold=$threshold' '--ENSEMBLTcolmn=$ensembltcolmn_arg' '--classClmns=$classclmns'
7 </command>
8
9 <inputs>
10 <param name="input" type="data" format="tabular" label="Input dataset" />
11 <param name="ensembltcolmn" type="data_column" data_ref="input" numerical="false" label="Column with the ENSEMBL code in the Input dataset" />
12 <param name="threshold" type="float" value="90" min="0" max="100" label="Threshold to disconnect the nodes" />
13 <param name="classclmns" size="10" type="text" value="c1,c2" label="Gene category columns"/>
14
15 </inputs>
16
17 <outputs>
18 <data name="output" format="tabular" />
19 </outputs>
20
21 <requirements>
22 <requirement type="package" version="1.8.1">networkx</requirement>
23 </requirements>
24
25 <help>
26 **What it does**
27
28 The program builds a network of gene categories connected by shared
29 genes. Each edge of this network is weighted by the number of genes
30 that its two categories share. The clustering coefficient, c\ :sub:`u`\ , is then calculated for each node u using the formula:
31
32 .. image:: $PATH_TO_IMAGES/cluster_kegg_formula.png
33
34 |
35
36 where deg(u) is the degree of u and the edge weights, w\ :sub:`uv`\ ,
37 are normalized by the maximum weight in the network. The clustering
38 coefficients are then filtered against a threshold (either a
39 percentile or a value chosen by the user), and all nodes
40 with a clustering coefficient lower than this threshold are deleted from
41 the network. Finally, the program reports each connected component as
42 a cluster of gene classifications. This approach yields fewer
43 gene categories, but the results are easier to interpret because
44 they exclude genes present in many gene groups.
45 </help>
46 </tool>
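
The procedure described in the help text can be sketched in plain Python as follows. This is an illustration only, not the code of cluster_onConnctdComps.py. The formula image is not reproduced in this view, so the sketch assumes the geometric-mean weighted clustering coefficient documented by networkx (the package the tool requires), whose description matches the wording above: c_u = 1 / (deg(u) (deg(u) - 1)) * sum over neighbor pairs v, w of (w_uv * w_uw * w_vw)^(1/3). The function names and the example data are hypothetical, and the threshold is treated as a raw coefficient value rather than a percentile.

```python
from itertools import combinations


def build_network(category_genes):
    """Map each category to {neighbor: number of shared genes}."""
    weights = {category: {} for category in category_genes}
    for a, b in combinations(sorted(category_genes), 2):
        shared = len(category_genes[a] & category_genes[b])
        if shared:
            weights[a][b] = shared
            weights[b][a] = shared
    return weights


def clustering_coefficients(weights):
    """Weighted clustering coefficient per node (assumed geometric-mean form)."""
    # Edge weights are normalized by the maximum weight in the network.
    max_weight = max(
        (w for nbrs in weights.values() for w in nbrs.values()), default=1
    )
    coeffs = {}
    for u, nbrs in weights.items():
        degree = len(nbrs)
        if degree < 2:
            coeffs[u] = 0.0  # no pair of neighbors, so no triangle is possible
            continue
        total = 0.0
        for v, x in combinations(sorted(nbrs), 2):
            if x in weights[v]:  # u, v, x form a triangle
                product = nbrs[v] * nbrs[x] * weights[v][x] / max_weight ** 3
                total += product ** (1.0 / 3.0)
        # Each unordered neighbor pair counts twice in the ordered sum.
        coeffs[u] = 2.0 * total / (degree * (degree - 1))
    return coeffs


def clusters(weights, coeffs, threshold):
    """Drop nodes below the threshold, then return the connected components."""
    kept = {node for node in weights if coeffs[node] >= threshold}
    seen, result = set(), []
    for start in sorted(kept):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in weights[node]:
                if neighbor in kept and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        result.append(sorted(component))
    return result


# Hypothetical input: gene categories and the genes assigned to each.
category_genes = {
    "A": {"g1", "g2", "g9"},
    "B": {"g1", "g2"},
    "C": {"g1", "g2"},
    "D": {"g9"},
}
network = build_network(category_genes)
coefficients = clustering_coefficients(network)
print(clusters(network, coefficients, threshold=0.5))
```

In this toy input, category A links the tightly connected pair B and C to the otherwise unrelated category D, so its coefficient (1/3) falls below the 0.5 threshold and it is removed along with D. The call prints [['B', 'C']], which illustrates how the filtering step discards categories that bridge otherwise separate groups.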