
Changeset 2:c96bef0643dc (2013-05-06)

Previous changeset 1:f1323a651777 (2013-05-06) Next changeset 3:6aae6bc0802d (2013-09-18)

Commit message:
Uploaded v0.0.3 again to revert upload of wrong file. Sigh.

added:
tools/plotting/venn_list.py
tools/plotting/venn_list.txt
tools/plotting/venn_list.xml

removed:
README.txt
repository_dependencies.xml
rxlr_venn_workflow.ga

diff -r f1323a651777 -r c96bef0643dc README.txt
--- a/README.txt Mon May 06 14:04:34 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000

@@ -1,37 +0,0 @@
-This Tool Shed Repository contains a workflow for comparing three RXLR prediction methods with a Venn Diagram, and creates a FASTA file of any proteins passing all three methods.
-
-
-References
-==========
-
-Stephen C. Whisson, Petra C. Boevink, Lucy Moleleki, Anna O. Avrova, Juan G. Morales, Eleanor M. Gilroy, Miles R. Armstrong, Severine Grouffaud, Pieter van West, Sean Chapman, Ingo Hein, Ian K. Toth, Leighton Pritchard and Paul R. J. Birch. A translocation signal for delivery of oomycete effector proteins into host plant cells. Nature 450:115-118, 2007.
-http://dx.doi.org/10.1038/nature06203
-
-Joe Win, William Morgan, Jorunn Bos, Ksenia V. Krasileva, Liliana M. Cano, Angela Chaparro-Garcia, Randa Ammar, Brian J. Staskawicz and Sophien Kamoun. Adaptive evolution has targeted the C-terminal domain of the RXLR effectors of plant pathogenic oomycetes. The Plant Cell 19:2349-2369, 2007.
-http://dx.doi.org/10.1105/tpc.107.051037
-
-Souvik Bhattacharjee, N. Luisa Hiller, Konstantinos Liolios, Joe Win, Thirumala-Devi Kanneganti, Carolyn Young, Sophien Kamoun and Kasturi Haldar. The malarial host-targeting signal is conserved in the Irish potato famine pathogen. PLoS Pathogens, 2(5):e50, 2006.
-http://dx.doi.org/10.1371/journal.ppat.0020050
-
-
-Availability
-============
-
-This workflow is available on the main Galaxy Tool Shed:
-http://toolshed.g2.bx.psu.edu/view/peterjc/rxlr_venn_workflow
-
-Development is being done on github here:
-https://github.com/peterjc/picobio/tree/master/galaxy_workflows/rxlr_venn_workflow
-
-
-Dependencies
-============
-
-These dependencies should be resolved automatically via the Galaxy Tool Shed:
- * http://toolshed.g2.bx.psu.edu/view/peterjc/tmhmm_and_signalp
- * http://toolshed.g2.bx.psu.edu/view/peterjc/seq_filter_by_id
- * http://toolshed.g2.bx.psu.edu/view/peterjc/venn_list
-
-However, at the time of writing those Galaxy tools have their own dependencies
-required for this workflow which require manual installation (SignalP v3.0,
-HMMER v2.0, and the R/Bioconductor package limma).

diff -r f1323a651777 -r c96bef0643dc repository_dependencies.xml
--- a/repository_dependencies.xml Mon May 06 14:04:34 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000

@@ -1,9 +0,0 @@
-<?xml version="1.0"?>
-<repositories description="This requires my SignalP and TMHMM wrappers, and my FASTA filtering tool.">
-    
-    <repository toolshed="http://toolshed.g2.bx.psu.edu" name="tmhmm_and_signalp" owner="peterjc" changeset_revision="6abd809cefdd" />
-    
-    <repository toolshed="http://toolshed.g2.bx.psu.edu" name="seq_filter_by_id" owner="peterjc" changeset_revision="abdd608c869b" />
-    
-    <repository toolshed="http://toolshed.g2.bx.psu.edu" name="venn_list" owner="peterjc" changeset_revision="baf7031d470e" />
-</repositories>

diff -r f1323a651777 -r c96bef0643dc rxlr_venn_workflow.ga
--- a/rxlr_venn_workflow.ga Mon May 06 14:04:34 2013 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000

@@ -1,471 +0,0 @@
-{
-    "a_galaxy_workflow": "true",
-    "annotation": "",
-    "format-version": "0.1",
-    "name": "3 RXLR methods, Venn Diagram, overlap FASTA",
-    "steps": {
-        "0": {
-            "annotation": "",
-            "id": 0,
-            "input_connections": {},
-            "inputs": [
-                {
-                    "description": "",
-                    "name": "Input Dataset"
-                }
-            ],
-            "name": "Input dataset",
-            "outputs": [],
-            "position": {
-                "left": 200,
-                "top": 363
-            },
-            "tool_errors": null,
-            "tool_id": null,
-            "tool_state": "{\"name\": \"Input Dataset\"}",
-            "tool_version": null,
-            "type": "data_input",
-            "user_outputs": []
-        },
-        "1": {
-            "annotation": "",
-            "id": 1,
-            "input_connections": {
-                "fasta_file": {
-                    "id": 0,
-                    "output_name": "output"
-                }
-            },
-            "inputs": [],
-            "name": "RXLR Motifs",
-            "outputs": [
-                {
-                    "name": "tabular_file",
-                    "type": "tabular"
-                }
-            ],
-            "position": {
-                "left": 420,
-                "top": 363
-            },
-            "post_job_actions": {
-                "HideDatasetActiontabular_file": {
-                    "action_arguments": {},
-                    "action_type": "HideDatasetAction",
-                    "output_name": "tabular_file"
-                }
-            },
-            "tool_errors": null,
-            "tool_id": "rxlr_motifs",
-            "tool_state": "{\"__page__\": 0, \"fasta_file\": \"null\", \"chromInfo\": \"\\\"/opt/galaxy-dist/tool-data/shared/ucsc/chrom/?.len\\\"\", \"model\": \"\\\"Whisson2007\\\"\"}",
-            "tool_version": "0.0.6",
-            "type": "tool",
-            "user_outputs": []
-        },
-        "2": {
-            "annotation": "",
-            "id": 2,
-            "input_connections": {
-                "fasta_file": {
-                    "id": 0,
-                    "output_name": "output"
-                }
-            },
-            "inputs": [],
-            "name": "RXLR Motifs",
-            "outputs": [
-                {
-                    "name": "tabular_file",
-                    "type": "tabular"
-                }
-            ],
-            "position": {
-                "left": 420,
-                "top": 483
-            },
-            "post_job_actions": {
-                "HideDatasetActiontabular_file": {
-                    "action_arguments": {},
-                    "action_type": "HideDatasetAction",
-                    "output_name": "tabular_file"
-                }
-            },
-            "tool_errors": null,
-            "tool_id": "rxlr_motifs",
-            "tool_state": "{\"__page__\": 0, \"fasta_file\": \"null\", \"chromInfo\": \"\\\"/opt/galaxy-dist/tool-data/shared/ucsc/chrom/?.len\\\"\", \"model\": \"\\\"Win2007\\\"\"}",
-            "tool_version": "0.0.6",
-            "type": "tool",
-            "user_outputs": []
-        },
-        "3": {
-            "annotation": "",
-            "id": 3,
-            "input_connections": {
-                "fasta_file": {
-                    "id": 0,
-                    "output_name": "output"
-                }
-            },
-            "inputs": [],
-            "name": "RXLR Motifs",
-            "outputs": [
-                {
-                    "name": "tabular_file",
-                    "type": "tabular"
-                }
-            ],
-            "position": {
-                "left": 420,
-                "top": 603
-            },
-            "post_job_actions": {
..
-                    "output_name": "out_file1"
-                }
-            },
-            "inputs": [],
-            "name": "Filter sequences by ID",
-            "outputs": [
-                {
-                    "name": "output_pos",
-                    "type": "fasta"
-                },
-                {
-                    "name": "output_neg",
-                    "type": "fasta"
-                }
-            ],
-            "position": {
-                "left": 1321,
-                "top": 621
-            },
-            "post_job_actions": {
-                "HideDatasetActionoutput_neg": {
-                    "action_arguments": {},
-                    "action_type": "HideDatasetAction",
-                    "output_name": "output_neg"
-                },
-                "HideDatasetActionoutput_pos": {
-                    "action_arguments": {},
-                    "action_type": "HideDatasetAction",
-                    "output_name": "output_pos"
-                },
-                "RenameDatasetActionoutput_pos": {
-                    "action_arguments": {
-                        "newname": "Positive Whisson et al. (2007) and Win et al. (2007) results"
-                    },
-                    "action_type": "RenameDatasetAction",
-                    "output_name": "output_pos"
-                }
-            },
-            "tool_errors": null,
-            "tool_id": "seq_filter_by_id",
-            "tool_state": "{\"__page__\": 0, \"output_choice_cond\": \"{\\\"output_choice\\\": \\\"pos\\\", \\\"__current_case__\\\": 1}\", \"input_file\": \"null\", \"input_tabular\": \"null\", \"chromInfo\": \"\\\"/opt/galaxy-dist/tool-data/shared/ucsc/chrom/?.len\\\"\", \"columns\": \"{\\\"__class__\\\": \\\"UnvalidatedValue\\\", \\\"value\\\": [\\\"1\\\"]}\"}",
-            "tool_version": "0.0.5",
-            "type": "tool",
-            "user_outputs": []
-        },
-        "10": {
-            "annotation": "",
-            "id": 10,
-            "input_connections": {
-                "input_file": {
-                    "id": 9,
-                    "output_name": "output_pos"
-                },
-                "input_tabular": {
-                    "id": 6,
-                    "output_name": "out_file1"
-                }
-            },
-            "inputs": [],
-            "name": "Filter sequences by ID",
-            "outputs": [
-                {
-                    "name": "output_pos",
-                    "type": "fasta"
-                },
-                {
-                    "name": "output_neg",
-                    "type": "fasta"
-                }
-            ],
-            "position": {
-                "left": 1641,
-                "top": 391
-            },
-            "post_job_actions": {
-                "HideDatasetActionoutput_neg": {
-                    "action_arguments": {},
-                    "action_type": "HideDatasetAction",
-                    "output_name": "output_neg"
-                },
-                "RenameDatasetActionoutput_pos": {
-                    "action_arguments": {
-                        "newname": "RXLR by all 3 methods"
-                    },
-                    "action_type": "RenameDatasetAction",
-                    "output_name": "output_pos"
-                }
-            },
-            "tool_errors": null,
-            "tool_id": "seq_filter_by_id",
-            "tool_state": "{\"__page__\": 0, \"output_choice_cond\": \"{\\\"output_choice\\\": \\\"pos\\\", \\\"__current_case__\\\": 1}\", \"input_file\": \"null\", \"input_tabular\": \"null\", \"chromInfo\": \"\\\"/opt/galaxy-dist/tool-data/shared/ucsc/chrom/?.len\\\"\", \"columns\": \"{\\\"__class__\\\": \\\"UnvalidatedValue\\\", \\\"value\\\": [\\\"1\\\"]}\"}",
-            "tool_version": "0.0.5",
-            "type": "tool",
-            "user_outputs": []
-        }
-    }
-}
\ No newline at end of file

diff -r f1323a651777 -r c96bef0643dc tools/plotting/venn_list.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/plotting/venn_list.py Mon May 06 14:05:13 2013 -0400


@@ -0,0 +1,135 @@
+#!/usr/bin/env python
+"""Plot up to 3-way Venn Diagram using R limma vennDiagram (via rpy)
+
+This script is copyright 2010 by Peter Cock, The James Hutton Institute
+(formerly SCRI), UK. All rights reserved.
+See accompanying text file for licence details (MIT/BSD style).
+
+This is version 0.0.3 of the script.
+"""
+
+
+import sys
+#rpy is imported below, inside a try/except, so a missing install gives a clear error
+
+def stop_err(msg, error_level=1):
+    """Print error message to stderr and quit with given error level."""
+    sys.stderr.write("%s\n" % msg)
+    sys.exit(error_level)
+
+try:
+    import rpy
+except ImportError:
+    stop_err("Requires the Python library rpy (to call R)")
+
+try:
+    rpy.r.library("limma")
+except:
+    stop_err("Requires the R library limma (for vennDiagram function)")
+
+
+if len(sys.argv)-1 not in [7, 10, 13]:
+    stop_err("Expected 7, 10 or 13 arguments (for 1, 2 or 3 sets), not %i" % (len(sys.argv)-1))
+
+all_file, all_type, all_label = sys.argv[1:4]
+set_data = []
+if len(sys.argv)-1 >= 7:
+    set_data.append(tuple(sys.argv[4:7]))
+if len(sys.argv)-1 >= 10:
+    set_data.append(tuple(sys.argv[7:10]))
+if len(sys.argv)-1 >= 13:
+    set_data.append(tuple(sys.argv[10:13]))
+pdf_file = sys.argv[-1]
+n = len(set_data)
+print "Doing %i-way Venn Diagram" % n
+
+def load_ids(filename, filetype):
+    if filetype=="tabular":
+        for line in open(filename):
+            if not line.startswith("#"):
+                yield line.rstrip("\n").split("\t",1)[0]
+    elif filetype=="fasta":
+        for line in open(filename):
+            if line.startswith(">"):
+                yield line[1:].rstrip("\n").split(None,1)[0]
+    elif filetype.startswith("fastq"):
+        #Use the Galaxy library not Biopython to cope with CS
+        from galaxy_utils.sequence.fastq import fastqReader
+        handle = open(filename, "rU")
+        for record in fastqReader(handle):
+            #The [1:] is because the fastqReader leaves the @ on the identifier.
+            yield record.identifier.split()[0][1:]
+        handle.close()
+    elif filetype=="sff":
+        try:
+            from Bio.SeqIO import index
+        except ImportError:
+            stop_err("Require Biopython 1.54 or later (to read SFF files)")
+        #This will read the SFF index block if present (very fast)
+        for name in index(filename, "sff"):
+            yield name
+    else:
+        stop_err("Unexpected file type %s" % filetype)
+
+def load_ids_whitelist(filename, filetype, whitelist):
+    for name in load_ids(filename, filetype):
+        if name in whitelist:
+            yield name
+        else:
+            stop_err("Unexpected ID %s in %s file %s" % (name, filetype, filename))
+
+if all_file in ["", "-", '""', '"-"']:
+    #Load without white list
+    sets = [set(load_ids(f,t)) for (f,t,c) in set_data]
+    #Take union
+    all = set()
+    for s in sets:
+        all.update(s)
+    print "Inferred total of %i IDs" % len(all)
+else:
+    all = set(load_ids(all_file, all_type))
+    print "Total of %i IDs" % len(all)
+    sets = [set(load_ids_whitelist(f,t,all)) for (f,t,c) in set_data]
+
+for s, (f,t,c) in zip(sets, set_data):
+    print "%i in %s" % (len(s), c)
+
+#Now call R library to draw simple Venn diagram
+try:
+    #Create dummy Venn diagram counts object for n groups (n is 1, 2 or 3)
+    cols = 'c("%s")' % '","'.join("Set%i" % (i+1) for i in range(n))
+    rpy.r('groups <- cbind(%s)' % ','.join(['1']*n))
+    rpy.r('colnames(groups) <- %s' % cols)
+    rpy.r('vc <- vennCounts(groups)')
+    #Populate the 2^n classes with real counts
+    #Don't make any assumptions about the class order
+    #print rpy.r('vc')
+    for index, row in enumerate(rpy.r('vc[,%s]' % cols)):
+        if isinstance(row, int) or isinstance(row, float):
+            #Hack for rpy being too clever for single element row
+            row = [row]
+        names = all
+        for wanted, s in zip(row, sets):
+            if wanted:
+                names = names.intersection(s)
+            else:
+                names = names.difference(s)
+        rpy.r('vc[%i,"Counts"] <- %i' % (index+1, len(names)))
+    #print rpy.r('vc')
+    if n == 1:
+        #Single circle, don't need to add (Total XXX) line
+        names = [c for (f,t,c) in set_data]
+    else:
+        names = ["%s\n(Total %i)" % (c, len(s)) for s, (f,t,c) in zip(sets, set_data)]
+    rpy.r.assign("names", names)
+    rpy.r.assign("colors", ["red","green","blue"][:n])
+    rpy.r.pdf(pdf_file, 8, 8)
+    rpy.r("""vennDiagram(vc, include="both", names=names,
+                         main="%s", sub="(Total %i)",
+                         circle.col=colors)
+                         """ % (all_label, len(all)))
+    rpy.r.dev_off()
+except Exception, exc:
+    stop_err( "%s" %str( exc ) )
+rpy.r.quit( save="no" )
+print "Done"
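
A note on the counting loop in venn_list.py above: for each of the 2^n rows of the vennCounts matrix, the script starts from the full ID set, intersects it with every set flagged 1 and subtracts every set flagged 0. The following is a minimal pure-Python sketch of that step, written for Python 3 and without rpy or limma. The function name region_counts and the sample IDs are hypothetical and are not part of the tool.

```python
from itertools import product


def region_counts(universe, sets):
    """Count IDs in each of the 2**n membership classes.

    Keys are tuples of 0/1 flags, one flag per set, mirroring the rows
    of limma's vennCounts matrix (this sketch does not call R).
    """
    counts = {}
    for flags in product((0, 1), repeat=len(sets)):
        names = set(universe)
        for wanted, s in zip(flags, sets):
            if wanted:
                names &= s  # ID must be in this set
            else:
                names -= s  # ID must not be in this set
        counts[flags] = len(names)
    return counts


# Hypothetical example data: five IDs, two overlapping sets
universe = {"a", "b", "c", "d", "e"}
set1 = {"a", "b", "c"}
set2 = {"b", "c", "d"}
counts = region_counts(universe, [set1, set2])
print(counts[(1, 1)])  # number of IDs in both sets
print(counts[(0, 0)])  # number of IDs outside both circles
```

Because every ID falls in exactly one class, the counts always sum to the size of the full ID set. In implicit mode the full set is the union of the sets, so the all-zero class is empty.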

diff -r f1323a651777 -r c96bef0643dc tools/plotting/venn_list.txt
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/plotting/venn_list.txt Mon May 06 14:05:13 2013 -0400

@@ -0,0 +1,75 @@
+Galaxy tool to draw a Venn Diagram with up to 3 sets
+====================================================
+
+This tool is copyright 2011 by Peter Cock, The James Hutton Institute
+(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
+See the licence text below.
+
+This tool is a short Python script (using both the Galaxy and Biopython library
+functions) to extract ID lists from tabular, FASTA, FASTQ or SFF files to build
+sets, which are then drawn using the R limma package function vennDiagram
+(called from Python using rpy).
+
+There are just two files to install:
+
+* venn_list.py (the Python script)
+* venn_list.xml (the Galaxy tool definition)
+
+The suggested location is in the Galaxy folder tools/plotting next to other
+graph drawing tools.
+
+You will also need to modify the tool_conf.xml file to tell Galaxy to offer the
+tool. The suggested location is in the "Graph/Display Data" section. Simply add
+the line:
+
+<tool file="plotting/venn_list.xml" />
+
+You will also need to install Biopython 1.54 or later, and the R/Bioconductor
+package limma. You should already have rpy installed for other Galaxy tools.
+
+
+History
+=======
+
+v0.0.3 - Initial public release.
+
+
+Developers
+==========
+
+This script and related tools are being developed on the following hg branch:
+http://bitbucket.org/peterjc/galaxy-central/src/tools
+
+For making the "Galaxy Tool Shed" http://community.g2.bx.psu.edu/ tarball use
+the following command from the Galaxy root folder:
+
+tar -czf venn_list.tar.gz tools/plotting/venn_list.*
+
+Check this worked:
+
+$ tar -tzf venn_list.tar.gz
+tools/plotting/venn_list.py
+tools/plotting/venn_list.txt
+tools/plotting/venn_list.xml
+
+
+Licence (MIT/BSD style)
+=======================
+
+Permission to use, copy, modify, and distribute this software and its
+documentation with or without modifications and for any purpose and
+without fee is hereby granted, provided that any copyright notices
+appear in all copies and that both those copyright notices and this
+permission notice appear in supporting documentation, and that the
+names of the contributors or copyright holders not be used in
+advertising or publicity pertaining to distribution of the software
+without specific prior permission.
+
+THE CONTRIBUTORS AND COPYRIGHT HOLDERS OF THIS SOFTWARE DISCLAIM ALL
+WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED
+WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL THE
+CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL, INDIRECT
+OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS
+OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE
+OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE
+OR PERFORMANCE OF THIS SOFTWARE.
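
As venn_list.txt above describes, the script builds its sets from ID lists: column one of a tabular file, or the first word of each FASTA header line. A minimal sketch of that extraction follows, assuming Python 3 and in-memory text handles. The name iter_ids and the sample records are hypothetical, and FASTQ and SFF input (which the tool reads via the Galaxy and Biopython libraries) are left out.

```python
import io


def iter_ids(handle, filetype):
    """Yield one identifier per record from an open text handle."""
    if filetype == "tabular":
        for line in handle:
            if not line.startswith("#"):
                # Identifier is everything before the first tab
                yield line.rstrip("\n").split("\t", 1)[0]
    elif filetype == "fasta":
        for line in handle:
            if line.startswith(">"):
                # Drop the ">" and keep the first whitespace-separated word
                yield line[1:].split(None, 1)[0]
    else:
        raise ValueError("Unexpected file type %s" % filetype)


# Hypothetical example data
tabular = io.StringIO("#ID\tscore\nseq1\t0.9\nseq3\t0.4\n")
fasta = io.StringIO(">seq1 first protein\nMKV\n>seq2\nMRL\n")
tab_ids = set(iter_ids(tabular, "tabular"))
fasta_ids = set(iter_ids(fasta, "fasta"))
print(sorted(tab_ids & fasta_ids))  # IDs present in both files
```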

diff -r f1323a651777 -r c96bef0643dc tools/plotting/venn_list.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/plotting/venn_list.xml Mon May 06 14:05:13 2013 -0400

@@ -0,0 +1,113 @@
+<tool id="venn_list" name="Venn Diagram" version="0.0.3">
+  <description>from lists</description>
+  <command interpreter="python">
+venn_list.py
+#if $universe.type_select=="implicit":
+  - -
+#else:
+  $main $main.ext
+#end if
+"$main_lab"
+#for $s in $sets:
+  $s.set $s.set.ext "$s.lab"
+#end for
+$PDF</command>
+  <inputs>
+    <param name="main_lab" size="30" type="text" value="Venn Diagram" label="Plot title"/>
+    <conditional name="universe">
+       <param name="type_select" type="select" label="Implicit or explicit full ID list?">
+         <option value="explicit">Explicit</option>
+         <option value="implicit">Implicit (use union of sets below)</option>
+       </param>
+       <when value="explicit">
+           <param name="main" type="data" format="tabular,fasta,fastq,sff" label="Full dataset (with all identifiers)" help="Tabular file (uses column one), FASTA, FASTQ or SFF file."/>
+       </when>
+       <when value="implicit"/>
+    </conditional>
+    <repeat name="sets" min="1" max="3" title="Sets">
+      <param name="set" type="data" format="tabular,fasta,fastq,sff" label="Members of set" help="Tabular file (uses column one), FASTA, FASTQ or SFF file."/>
+      <param name="lab" size="30" type="text" value="Group" label="Caption for set"/>
+    </repeat>
+  </inputs>
+  <outputs>
+    <data format="pdf" name="PDF" />
+  </outputs>
+  <requirements>
+    <requirement type="python-module">rpy</requirement>
+    <requirement type="python-module">Bio</requirement>
+  </requirements>
+  <tests>
+    
+    
+  </tests>
+  <help>
+
+.. class:: infomark
+
+**TIP:** If your data is in tabular files, the identifier is assumed to be in column one.
+
+**What it does**
+
+Draws a Venn Diagram for one, two or three sets (as a PDF file).
+
+You must supply one, two or three sets of identifiers -- corresponding
+to one, two or three circles on the Venn Diagram.
+
+In general you should also give the full list of all the identifiers
+explicitly. This is used to calculate the number of identifiers outside
+the circles (and check the identifiers in the other files match up).
+The full list can be omitted by implicitly taking the union of the
+category sets. In this case, the count outside the categories (circles)
+will always be zero.
+
+The identifiers can be taken from the first column of a tabular file
+(e.g. query names in BLAST tabular output, or signal peptide predictions
+after filtering, etc), or from a sequence file (FASTA, FASTQ, SFF).
+
+For example, you may have a set of NGS reads (as a FASTA, FASTQ or SFF
+file), and the results of several different read mappings (e.g. to
+different references) as tabular files (filtered to have just the mapped
+reads). You could then show the different mappings (and their overlaps)
+as a Venn Diagram, and the outside count would be the unmapped reads.
+
+**Citations**
+
+The Venn Diagrams are drawn using Gordon Smyth's limma package from
+R/Bioconductor, http://www.bioconductor.org/
+
+The R library is called from Python via rpy, http://rpy.sourceforge.net/
+
+This tool uses Biopython to read SFF files. If you use this tool with
+SFF files in scientific work leading to a publication, please cite the
+Biopython application note:
+
+Cock et al 2009. Biopython: freely available Python tools for computational
+molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3.
+http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878.
+
+  </help>
+</tool>
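
The command template in venn_list.xml above expands to 7, 10 or 13 positional arguments, matching the check in venn_list.py: the full-list file, its type and the plot title, then file, type and caption for each set, and finally the PDF path. A minimal sketch of that layout follows, assuming Python 3. The helper build_venn_args and the file names are hypothetical.

```python
def build_venn_args(title, sets, pdf_file, universe=None):
    """Return the positional arguments in the order venn_list.py reads them.

    universe is a (filename, filetype) pair, or None for the implicit
    mode, which the tool definition passes as "- -".
    sets is a list of (filename, filetype, caption) triples.
    """
    if not 1 <= len(sets) <= 3:
        raise ValueError("Expected 1, 2 or 3 sets, not %i" % len(sets))
    all_file, all_type = universe if universe else ("-", "-")
    args = [all_file, all_type, title]
    for filename, filetype, caption in sets:
        args.extend([filename, filetype, caption])
    args.append(pdf_file)
    return args


# Hypothetical two-set call in implicit mode
args = build_venn_args(
    "RXLR methods",
    [("whisson.tabular", "tabular", "Whisson"),
     ("win.tabular", "tabular", "Win")],
    "venn.pdf",
)
print(len(args))  # 4 fixed arguments plus 3 per set
```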