diff tools/taxonomy/find_diag_hits.xml @ 0:9071e359b9a3

Uploaded
author xuebing
date Fri, 09 Mar 2012 19:37:19 -0500
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/taxonomy/find_diag_hits.xml	Fri Mar 09 19:37:19 2012 -0500
@@ -0,0 +1,99 @@
+<tool id="find_diag_hits" name="Find diagnostic hits" version="1.0.0">
+    <description></description>
+    <requirements>
+        <requirement type="package">taxonomy</requirement>
+    </requirements>
+    <command interpreter="python">find_diag_hits.py $input1 $id_col $rank_list $out_format $out_file1</command>
+    <inputs>
+        <param format="taxonomy" name="input1" type="data" label="Find diagnostic hits in"/>
+        <param name="id_col" type="data_column" data_ref="input1" numerical="False" label="Select column with sequence id" />
+            <param name="rank_list" type="select" display="checkboxes" multiple="true" label="Select taxonomic ranks">
+            <option value="superkingdom">Superkingdom</option>
+            <option value="kingdom">Kingdom</option>
+            <option value="subkingdom">Subkingdom</option>
+            <option value="superphylum">Superphylum</option>
+            <option value="phylum">Phylum</option>
+            <option value="subphylum">Subphylum</option>
+            <option value="superclass">Superclass</option>
+            <option value="class">Class</option>
+            <option value="subclass">Subclass</option>
+            <option value="superorder">Superorder</option>
+            <option value="order">Order</option>
+            <option value="suborder">Suborder</option>
+            <option value="superfamily">Superfamily</option>
+            <option value="family">Family</option>
+            <option value="subfamily">Subfamily</option>
+            <option value="tribe">Tribe</option>
+            <option value="subtribe">Subtribe</option>
+            <option value="genus">Genus</option>
+            <option value="subgenus">Subgenus</option>
+            <option selected="true" value="species">Species</option>
+            <option value="subspecies">Subspecies</option>
+        </param>
+        <param name="out_format" type="select" label="Select output format">
+            <option value="reads">Diagnostic read list</option>
+            <option value="counts">Number of diagnostic reads per taxonomic rank</option>
+        </param>
+    </inputs>
+    <outputs>
+        <data format="tabular" name="out_file1" />
+    </outputs>
+    <tests>
+        <test>
+            <param name="input1" value="taxonomyGI.taxonomy" ftype="taxonomy"/>
+            <param name="id_col" value="1" />
+            <param name="rank_list" value="order,genus" />
+            <param name="out_format" value="counts" />
+            <output name="out_file1" file="find_diag_hits.tabular" />
+        </test>
+    </tests>
+
+    
+<help>
+
+**What it does**
+
+When performing metagenomic analyses it is often necessary to identify sequence reads corresponding to a particular taxonomic group, or, in other words, diagnostic of a particular taxonomic rank. This utility performs this analysis. It takes data generated by *Taxonomy manipulation->Fetch Taxonomic Ranks* as input and outputs either a list of sequence reads unique to a particular taxonomic rank, or a list of taxonomic ranks and the count of unique reads corresponding to each rank. 
+
+------
+
+**Example**
+
+Suppose *Taxonomy manipulation->Fetch Taxonomic Ranks* generated the following taxonomy representation::
+
+    read1 2      root Eukaryota Metazoa n n Chordata   Craniata Gnathostomata Mammalia n        Laurasiatheria   n           Ruminantia  n             Bovidae     Bovinae      n          n          Bos        n Bos taurus        n
+    read2 12585	 root Eukaryota Metazoa n n Chordata   Craniata Gnathostomata Mammalia n        Euarchontoglires Primates	 Haplorrhini Hominoidea    Hominidae   n            n          n          Homo       n Homo sapiens      n 
+    read1 58615  root Eukaryota Metazoa n n Arthropoda n        Hexapoda      Insecta  Neoptera Amphiesmenoptera Lepidoptera Glossata    Papilionoidea Nymphalidae Nymphalinae  Melitaeini Phyciodina Anthanassa n Anthanassa otanes n 
+    read3 56785	 root Eukaryota Metazoa n n Chordata   Craniata Gnathostomata Mammalia n        Euarchontoglires Primates	 Haplorrhini Hominoidea    Hominidae   n            n          n          Homo       n Homo sapiens      n   
+
+Running this tool with the following parameters:
+
+  * *Select column with sequence id* set to **c1**
+  * *Select taxonomic ranks* with **order** and **genus** checked
+  * *Select output format* set to **Diagnostic read list**
+  
+will return::
+
+    read2 Primates order
+    read3 Primates order
+    read2 Homo     genus
+    read3 Homo     genus
+    
+Changing *Select output format* to **Number of diagnostic reads per taxonomic rank** will produce::
+
+    Primates 2       order
+    Homo     2       genus
+    
+.. class:: infomark
+
+Note that **read1** is omitted because it is non-unique: it hits both a mammal (*Bos taurus*) and an insect (*Anthanassa otanes*).
+
+--------
+
+.. class:: warningmark
+
+This tool omits "**n**" entries, which correspond to ranks missing from the NCBI taxonomy. In the example above, the *Homo sapiens* lineage contains an order name (Primates) while the *Bos taurus* lineage does not.
+
+
+</help>
+</tool>
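
The selection rule described in the help text can be sketched in Python as below. This is not the contents of find_diag_hits.py, which is not part of this diff. The function names, the RANK_OFFSET constant, and the "|"-separated sample rows are hypothetical, and the rule that a read is dropped when any of its hits has "n" at the chosen rank is an assumption made so that the sketch reproduces the example output in the help.

```python
# Rank columns in the order of the tool's rank_list options.
RANKS = [
    "superkingdom", "kingdom", "subkingdom", "superphylum", "phylum",
    "subphylum", "superclass", "class", "subclass", "superorder", "order",
    "suborder", "superfamily", "family", "subfamily", "tribe", "subtribe",
    "genus", "subgenus", "species", "subspecies",
]
# Assumption: ranks follow the read id, taxonomy id and "root" columns.
RANK_OFFSET = 3

# The four rows of the help example, with "|" standing in for the tab.
ROWS = [line.split("|") for line in [
    "read1|2|root|Eukaryota|Metazoa|n|n|Chordata|Craniata|Gnathostomata|"
    "Mammalia|n|Laurasiatheria|n|Ruminantia|n|Bovidae|Bovinae|n|n|Bos|n|"
    "Bos taurus|n",
    "read2|12585|root|Eukaryota|Metazoa|n|n|Chordata|Craniata|Gnathostomata|"
    "Mammalia|n|Euarchontoglires|Primates|Haplorrhini|Hominoidea|Hominidae|"
    "n|n|n|Homo|n|Homo sapiens|n",
    "read1|58615|root|Eukaryota|Metazoa|n|n|Arthropoda|n|Hexapoda|Insecta|"
    "Neoptera|Amphiesmenoptera|Lepidoptera|Glossata|Papilionoidea|"
    "Nymphalidae|Nymphalinae|Melitaeini|Phyciodina|Anthanassa|n|"
    "Anthanassa otanes|n",
    "read3|56785|root|Eukaryota|Metazoa|n|n|Chordata|Craniata|Gnathostomata|"
    "Mammalia|n|Euarchontoglires|Primates|Haplorrhini|Hominoidea|Hominidae|"
    "n|n|n|Homo|n|Homo sapiens|n",
]]


def diagnostic_reads(rows, ranks, id_col=0):
    """Return (read_id, taxon, rank) for reads whose hits agree at a rank."""
    hits = {}  # read id -> all rows for that read, in first-seen order
    for row in rows:
        hits.setdefault(row[id_col], []).append(row)
    result = []
    for rank in ranks:
        col = RANK_OFFSET + RANKS.index(rank)
        for read_id, read_rows in hits.items():
            names = {r[col] for r in read_rows}
            # Diagnostic: exactly one value at this rank, and it is not "n".
            if len(names) == 1 and "n" not in names:
                result.append((read_id, names.pop(), rank))
    return result


def diagnostic_counts(rows, ranks, id_col=0):
    """Return (taxon, number_of_diagnostic_reads, rank) tuples."""
    counts = {}
    for _read_id, name, rank in diagnostic_reads(rows, ranks, id_col):
        counts[(name, rank)] = counts.get((name, rank), 0) + 1
    return [(name, n, rank) for (name, rank), n in counts.items()]


if __name__ == "__main__":
    for record in diagnostic_reads(ROWS, ["order", "genus"]):
        print("\t".join(record))
```

Called with the ranks order and genus, diagnostic_reads returns the four read2/read3 records from the help example, and diagnostic_counts returns the Primates and Homo totals. Note that id_col is zero-based here, whereas the tool's *Select column with sequence id* parameter is shown as **c1**.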