genome_diversity: rank_pathways.xml comparison

comparison rank_pathways.xml @ 21:d6b961721037

Miller Lab Devshed version 4c04e35b18f6

author	Richard Burhans <burhans@bx.psu.edu>
date	Mon, 05 Nov 2012 12:44:17 -0500
parents	8ae67e9fb6ff
children	95a05c1ef5d5

comparison

equal deleted inserted replaced

-:8a4b8efbc82c
+:d6b961721037
 #if str($output_format) == 'a'
 calctfreq.py
 #else if str($output_format) == 'b'
 calclenchange.py
 #end if
 "--loc_file=${GALAXY_DATA_INDEX_DIR}/gd.rank.loc"
 "--species=${input.metadata.dbkey}"
 "--input=${input}"
 "--output=${output}"
-"--posKEGGclmn=${input.metadata.kegg_path}"
+"--posKEGGclmn=${kpath}"
-"--KEGGgeneposcolmn=${input.metadata.kegg_gene}"
+"--KEGGgeneposcolmn=${kgene}"
 </command>
 <inputs>
-<param name="input" type="data" format="gd_sap" label="Table">
+<param name="input" type="data" format="tab" label="Dataset" />
-<validator type="metadata" check="kegg_gene,kegg_path" message="Missing KEGG gene code column and/or KEGG pathway code/name column metadata.  Click the pencil icon in the history item to edit/save the metadata attributes" />
+<param name="kgene" type="data_column" data_ref="input" label="Column with KEGG gene ID"  />
-</param>
+<param name="kpath" type="data_column" data_ref="input" numerical="false" label="Column with KEGG pathways" />
-<param name="output_format" type="select" label="Output format">
+<param name="output_format" type="select" label="Output">
 <option value="a" selected="true">ranked by percentage of genes affected</option>
 <option value="b">ranked by change in length and number of paths</option>
 </param>
 </inputs>
 </outputs>
 <tests>
 <test>
 <param name="input" value="test_in/sample.gd_sap" ftype="gd_sap" />
+<param name="kgene" value="10" />
+<param name="kpath" value="12" />
 <param name="output_format" value="a" />
 <output name="output" file="test_out/rank_pathways/rank_pathways.tabular" />
 </test>
 </tests>
 <help>
+**Dataset formats**
+The input and output datasets are in tabular_ format.
+The input dataset must have columns with KEGG gene ID and pathways.
+The output dataset is described below.
+(`Dataset missing?`_)
+.. _tabular: ./static/formatHelp.html#tab
+.. _Dataset missing?: ./static/formatHelp.html
+-----
 **What it does**
 This tool produces a table ranking the pathways based on the percentage
 of genes in an input dataset, out of the total in each pathway.
 the pathway.
 If pathways are ranked by percentage of genes affected, the output is
 a tabular dataset with the following columns:
 1. number of genes in the pathway present in the input dataset
 2. percentage of the total genes in the pathway included in the input dataset
 3. rank of the frequency (from high freq to low freq)
 4. name of the pathway
 If pathways are ranked by change in length and number of paths, the
 output is a tabular dataset with the following columns:
 1. change in the mean length of paths between sources and sinks
 2. mean length of paths between sources and sinks in the pathway including the genes in the input dataset.  If the pathway do not have sources/sinks, the length is assumed to be infinite (I)
 3. mean length of paths between sources and sinks in the pathway excluding the genes in the input dataset.  If the pathway do not have sources/sinks, the length is assumed to be infinite (I)
 4. rank of the change in the mean length of paths between sources and sinks (from high change to low change)
 5. change in the number of paths between sources and sinks
 6. number of paths between sources and sinks in the pathway including the genes in the input dataset.  If the pathway do not have sources/sinks, it is assumed to be a circuit (C)
 7. number of paths between sources and sinks in the pathway excluding the genes in the input dataset.  If the pathway do not have sources/sinks, it is assumed to be a circuit (C)
 8. rank of the change in the number of paths between sources and sinks (from high change to low change)
 9. name of the pathway
+-----
+**Examples**
+- input (column 10 for KEGG gene ID, column 12 for KEGG pathways)::
+Contig39_chr1_3261104_3261850   414  chr1  3261546  ENSCAFT00000000001   ENSCAFP00000000001   S    667   F    476153  probably damaging    cfa00230=Purine metabolism.cfa00500=Starch and sucrose metabolism.cfa00740=Riboflavin metabolism.cfa00760=Nicotinate and nicotinamide metabolism.cfa00770=Pantothenate and CoA biosynthesis.cfa01100=Metabolic pathways
+Contig62_chr1_19011969_19012646 265  chr1  19012240 ENSCAFT00000000144   ENSCAFP00000000125   *    161   R    483960  probably damaging    N
+etc.
+- output ranked by percentage of genes affected::
+3       0.25    1       cfa03450=Non-homologous end-joining
+1       0.25    1       cfa00750=Vitamin B6 metabolism
+2       0.2     3       cfa00290=Valine, leucine and isoleucine biosynthesis
+3       0.18    4       cfa00770=Pantothenate and CoA biosynthesis
+etc.
+- output ranked by change in length and number of paths::
+3.64	  8.44	4.8	2	4	9	5	1	cfa00260=Glycine, serine and threonine metabolism
+7.6	  9.6	2	1	3	5	2	2	cfa00240=Pyrimidine metabolism
+0.05	  2.67	2.62	6	1	30	29	3	cfa00982=Drug metabolism - cytochrome P450
+-0.08  8.33	8.41	84	1	30	29	3	cfa00564=Glycerophospholipid metabolism
+etc.
 </help>
 </tool>

Mercurial > repos > miller-lab > genome_diversity

comparison rank_pathways.xml @ 21:d6b961721037