--- a/tools/protein_analysis/README	Tue Jun 07 18:06:27 2011 -0400
+++ b/tools/protein_analysis/README	Tue Jun 07 18:07:09 2011 -0400
@@ -7,6 +7,8 @@

 * WoLF PSORT v0.2 from http://wolfpsort.org/

+Also, the RXLR motif tool uses SignalP 3.0 and HMMER 2.3.2 internally.
+
 To use these Galaxy wrappers you must first install the command line tools.
 At the time of writing they are all free for academic use.

@@ -30,6 +32,10 @@
    directory, run runWolfPsortSummary, and then change back to the original
    directory), see: http://wolfpsort.org/WoLFPSORT_package/version0.2/

+4. Install hmmsearch from HMMER 2.3.2 (the last stable release of HMMER 2)
+   but put it on the path under the name hmmsearch2 (allowing it to co-exist
+   with HMMER 3), or edit rxlr_motifs.py accordingly.
+
 Verify each of the tools is installed and working from the command line
 (when logged in at the Galaxy user if appropriate).

@@ -49,6 +55,9 @@
 wolf_psort.xml (Galaxy tool definition)
 wolf_psort.py (Python wrapper script)

+rxlr_motifs.xml (Galaxy tool definition)
+rxlr_motifs.py (Python script)
+
 seq_analysis_utils.py (shared Python code)
 README (optional)

@@ -60,6 +69,7 @@
     <tool file="protein_analysis/tmhmm2.xml" />
     <tool file="protein_analysis/signalp3.xml" />
     <tool file="protein_analysis/wolf_psort.xml" />
+    <tool file="protein_analysis/rxlr_motifs.xml" />
   </section>

    Leave out the lines for any tools you do not wish to use in Galaxy.
@@ -101,6 +111,7 @@
 v0.0.7 - Change SignalP default truncation from 60 to 70 to match the
          SignalP webservice.
 v0.0.8 - Added WoLF PSORT wrapper to the suite.
+v0.0.9 - Added our RXLR motifs tool to the suite.


 Developers
@@ -115,11 +126,11 @@
 For making the "Galaxy Tool Shed" http://community.g2.bx.psu.edu/ tarball use
 the following command from the Galaxy root folder:

-tar -czf tmhmm_signalp_wolfpsort.tar.gz tools/protein_analysis/LICENSE tools/protein_analysis/README tools/protein_analysis/suite_config.xml tools/protein_analysis/seq_analysis_utils.py tools/protein_analysis/signalp3.xml tools/protein_analysis/signalp3.py tools/protein_analysis/tmhmm2.xml tools/protein_analysis/tmhmm2.py tools/protein_analysis/wolf_psort.xml tools/protein_analysis/wolf_psort.py test-data/four_human_proteins.* test-data/empty.fasta test-data/empty_tmhmm2.tabular test-data/empty_signalp3.tabular
+tar -czf tmhmm_signalp_etc.tar.gz tools/protein_analysis/LICENSE tools/protein_analysis/README tools/protein_analysis/suite_config.xml tools/protein_analysis/seq_analysis_utils.py tools/protein_analysis/signalp3.xml tools/protein_analysis/signalp3.py tools/protein_analysis/tmhmm2.xml tools/protein_analysis/tmhmm2.py tools/protein_analysis/wolf_psort.xml tools/protein_analysis/wolf_psort.py tools/protein_analysis/rxlr_motifs.xml tools/protein_analysis/rxlr_motifs.py  test-data/four_human_proteins.* test-data/empty.fasta test-data/empty_tmhmm2.tabular test-data/empty_signalp3.tabular

 Check this worked:

-$ tar -tzf tmhmm_signalp_wolfpsort.tar.gz
+$ tar -tzf tmhmm_signalp_etc.tar.gz
 tools/protein_analysis/LICENSE
 tools/protein_analysis/README
 tools/protein_analysis/suite_config.xml
@@ -130,6 +141,8 @@
 tools/protein_analysis/tmhmm2.py
 tools/protein_analysis/wolf_psort.xml
 tools/protein_analysis/wolf_psort.py
+tools/protein_analysis/rxlr_motifs.xml
+tools/protein_analysis/rxlr_motifs.py
 test-data/four_human_proteins.fasta
 test-data/four_human_proteins.signalp3.tabular
 test-data/four_human_proteins.tmhmm2.tabular
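README step 4 expects hmmsearch from HMMER 2.3.2 to be reachable under the name hmmsearch2. A minimal sketch of how that could be checked from Python follows. It assumes, as rxlr_motifs.py does, that running the binary with -h prints a banner containing "HMMER 2" or "HMMER 3". The function names are illustrative and not part of the suite, and the code is written to run on Python 3 rather than the Python 2 used in this changeset.

```python
import subprocess


def hmmer_major_version(help_text):
    """Return 2 or 3 based on the banner text (assumed to name the release)."""
    if "HMMER 2" in help_text:
        return 2
    if "HMMER 3" in help_text:
        return 3
    raise ValueError("Could not determine HMMER version")


def read_help(exe="hmmsearch2"):
    """Run `exe -h` and return its stdout as text."""
    try:
        child = subprocess.Popen([exe, "-h"],
                                 stdout=subprocess.PIPE,
                                 stderr=subprocess.PIPE)
    except OSError:
        # Raised when the binary is missing or not executable
        raise ValueError("Could not run %s" % exe)
    stdout, _ = child.communicate()
    # Popen returns bytes on Python 3, so decode before searching
    return stdout.decode("ascii", "replace")
```

Calling hmmer_major_version(read_help("hmmsearch2")) should then return 2 on a machine set up as the README describes.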
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/protein_analysis/rxlr_motifs.py	Tue Jun 07 18:07:09 2011 -0400
@@ -0,0 +1,268 @@
+#!/usr/bin/env python
+"""Implements assorted RXLR motif methods from the literature
+
+This script takes exactly four command line arguments:
+ * Protein FASTA filename
+ * Number of threads
+ * Model name (Bhattacharjee2006, Win2007, Whisson2007)
+ * Output tabular filename
+
+The model names are:
+
+Bhattacharjee2006: Simple regular expression search for RXLR
+with additional requirements for positioning and signal peptide.
+
+Win2007: Simple regular expression search for RXLR, but with
+different positional requirements.
+
+Whisson2007: As Bhattacharjee2006 but with a more complex regular
+expression to look for RXLR-EER domain, and additionally calls HMMER.
+
+See the help text in the accompanying Galaxy tool XML file for more
+details including the full references.
+
+Note:
+
+Bhattacharjee et al. (2006) and Win et al. (2007) used SignalP v2.0,
+which is no longer available. The current release is SignalP v3.0
+(Mar 5, 2007). We have therefore opted to use the NN Ymax position for
+the predicted cleavage site, as this is expected to be more accurate.
+Also note that the HMM score values have changed from v2.0 to v3.0.
+Whisson et al. (2007) used SignalP v3.0 anyway.
+
+Whisson et al. (2007) used HMMER 2.3.2, and although their HMM model
+can still be used with hmmsearch from HMMER 3 this does give
+slightly different results. We expect the hmmsearch from HMMER 2.3.2
+(the last stable release of HMMER 2) to be present on the path under
+the name hmmsearch2 (allowing it to co-exist with HMMER 3).
+"""
+import os
+import sys
+import re
+import subprocess
+from seq_analysis_utils import stop_err, fasta_iterator
+
+if len(sys.argv) != 5:
+   stop_err("Requires four arguments: protein FASTA filename, threads, model, and output filename")
+
+fasta_file, threads, model, tabular_file = sys.argv[1:]
+hmm_output_file = tabular_file + ".hmm.tmp"
+signalp_input_file = tabular_file + ".fasta.tmp"
+signalp_output_file = tabular_file + ".tabular.tmp"
+min_signalp_hmm = 0.9
+hmmer_search = "hmmsearch2"
+
+if model == "Bhattacharjee2006":
+   signalp_trunc = 70
+   re_rxlr = re.compile("R.LR")
+   min_sp = 10
+   max_sp = 40
+   max_sp_rxlr = 100
+   min_rxlr_start = 1
+   #Allow signal peptide to be at most 40aa, and want RXLR to be
+   #within 100aa, therefore for the prescreen the max start is 140:
+   max_rxlr_start = max_sp + max_sp_rxlr
+elif model == "Win2007":
+   signalp_trunc = 70
+   re_rxlr = re.compile("R.LR")
+   min_sp = 10
+   max_sp = 40
+   min_rxlr_start = 30
+   max_rxlr_start = 60
+   #No explicit limit on separation of signal peptide cleavage
+   #and RXLR, but shortest signal peptide is 10, and furthest
+   #away RXLR is 60, so effectively limit is 50.
+   max_sp_rxlr = max_rxlr_start - min_sp + 1
+elif model == "Whisson2007":
+   signalp_trunc = 0 #zero for no truncation
+   re_rxlr = re.compile("R.LR.{,40}[ED][ED][KR]")
+   min_sp = 10
+   max_sp = 40
+   max_sp_rxlr = 100
+   min_rxlr_start = 1
+   max_rxlr_start = max_sp + max_sp_rxlr
+else:
+   stop_err("Did not recognise the model name %r\n"
+            "Use Bhattacharjee2006, Win2007, or Whisson2007" % model)
+
+
+def get_hmmer_version(exe, required=None):
+    cmd = "%s -h" % exe
+    try:
+        child = subprocess.Popen([exe, "-h"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
+    except OSError:
+        raise ValueError("Could not run %s" % exe)
+    stdout, stderr = child.communicate()
+    if required:
+        return required in stdout
+    elif "HMMER 2" in stdout:
+        return 2
+    elif "HMMER 3" in stdout:
+        return 3
+    else:
+        raise ValueError("Could not determine version of %s" % exe)
+
+
+#Run hmmsearch for Whisson et al. (2007)
+if model == "Whisson2007":
+    hmm_file = os.path.join(os.path.split(sys.argv[0])[0],
+                       "whisson_et_al_rxlr_eer_cropped.hmm")
+    if not os.path.isfile(hmm_file):
+        stop_err("Missing HMM file for Whisson et al. (2007)")
+    if not get_hmmer_version(hmmer_search, "HMMER 2.3.2 (Oct 2003)"):
+        stop_err("Missing HMMER 2.3.2 (Oct 2003) binary, %s" % hmmer_search)
+    #I've left the code to handle HMMER 3 in situ, in case
+    #we revisit the choice to insist on HMMER 2.
+    hmmer3 = (3 == get_hmmer_version(hmmer_search))
+    #Using zero (or 5.6?) for bitscore threshold
+    if hmmer3:
+        #The HMMER3 table output is easy to parse
+        #In HMMER3 can't use both -T and -E
+        cmd = "%s -T 0 --tblout %s --noali %s %s > /dev/null" \
+              % (hmmer_search, hmm_output_file, hmm_file, fasta_file)
+    else:
+        #For HMMER2 we are stuck with parsing stdout
+        #Put 1e6 to effectively have no expectation threshold (otherwise
+        #HMMER defaults to 10 and the calculated e-value depends on the
+        #input FASTA file, and we can lose hits of interest).
+        cmd = "%s -T 0 -E 1e6 %s %s > %s" \
+              % (hmmer_search, hmm_file, fasta_file, hmm_output_file)
+    return_code = os.system(cmd)
+    if return_code:
+        stop_err("Error %i from hmmsearch:\n%s" % (return_code, cmd))
+    hmm_hits = set()
+    valid_ids = set()
+    for title, seq in fasta_iterator(fasta_file):
+        name = title.split(None,1)[0]
+        if name in valid_ids:
+            stop_err("Duplicated identifier %r" % name)
+        else:
+            valid_ids.add(name)
+    handle = open(hmm_output_file)
+    for line in handle:
+        if not line.strip():
+            #We expect blank lines in the HMMER2 stdout
+            continue
+        elif line.startswith("#"):
+            #Header
+            continue
+        else:
+            name = line.split(None,1)[0]
+            #Should be a sequence name in the HMMER3 table output.
+            #Could be anything in the HMMER2 stdout.
+            if name in valid_ids:
+                hmm_hits.add(name)
+            elif hmmer3:
+                stop_err("Unexpected identifier %r in hmmsearch output" % name)
+    handle.close()
+    #if hmmer3:
+    #    print "HMMER3 hits for %i/%i" % (len(hmm_hits), len(valid_ids))
+    #else:
+    #    print "HMMER2 hits for %i/%i" % (len(hmm_hits), len(valid_ids))
+    #print "%i/%i matched HMM" % (len(hmm_hits), len(valid_ids))
+    os.remove(hmm_output_file)
+    del valid_ids
+
+
+#Prepare short list of candidates containing RXLR to pass to SignalP
+assert min_rxlr_start > 0, "Minimum value is one, since positions are one-based"
+count = 0
+total = 0
+handle = open(signalp_input_file, "w")
+for title, seq in fasta_iterator(fasta_file):
+    total += 1
+    name = title.split(None,1)[0]
+    match = re_rxlr.search(seq[min_rxlr_start-1:].upper())
+    if match and min_rxlr_start - 1 + match.start() + 1 <= max_rxlr_start:
+        #This is a potential RXLR, depending on the SignalP results.
+        #Might as well truncate the sequence now, makes the temp file smaller
+        if signalp_trunc:
+            handle.write(">%s (truncated)\n%s\n" % (name, seq[:signalp_trunc]))
+        else:
+            #Does it matter we don't line wrap?
+            handle.write(">%s\n%s\n" % (name, seq))
+        count += 1
+handle.close()
+#print "Running SignalP on %i/%i potentials." % (count, total)
+
+
+#Run SignalP (using our wrapper script to get multi-core support etc)
+signalp_script = os.path.join(os.path.split(sys.argv[0])[0], "signalp3.py")
+if not os.path.isfile(signalp_script):
+    stop_err("Error - missing signalp3.py script")
+cmd = "python %s euk %i %s %s %s" % (signalp_script, signalp_trunc, threads, signalp_input_file, signalp_output_file)
+return_code = os.system(cmd)
+if return_code:
+    stop_err("Error %i from SignalP:\n%s" % (return_code, cmd))
+#print "SignalP done"
+
+def parse_signalp(filename):
+    """Parse SignalP output, yield tuples of ID, HMM_Sprob_score and NN predicted signal peptide length.
+
+    For signal peptide length we use NN_Ymax_pos (minus one).
+    """
+    handle = open(filename)
+    line = handle.readline()
+    assert line.startswith("#ID\t"), line
+    for line in handle:
+        parts = line.rstrip("\t").split("\t")
+        assert len(parts)==20, repr(line)
+        yield parts[0], float(parts[18]), int(parts[5])-1
+    handle.close()
+
+
+#Parse SignalP results and apply the strict RXLR criteria
+total = 0
+tally = dict()
+handle = open(tabular_file, "w")
+handle.write("#ID\t%s\n" % model)
+signalp_results = parse_signalp(signalp_output_file)
+for title, seq in fasta_iterator(fasta_file):
+    total += 1
+    rxlr = "N"
+    name = title.split(None,1)[0]
+    match = re_rxlr.search(seq[min_rxlr_start-1:].upper())
+    if match and min_rxlr_start - 1 + match.start() + 1 <= max_rxlr_start:
+        del match
+        #This was the criteria for calling SignalP,
+        #so it will be in the SignalP results.
+        sp_id, sp_hmm_score, sp_nn_len = signalp_results.next()
+        assert name == sp_id, "%s vs %s" % (name, sp_id)
+        if sp_hmm_score >= min_signalp_hmm and min_sp <= sp_nn_len <= max_sp:
+            match = re_rxlr.search(seq[sp_nn_len:].upper())
+            if match and match.start() + 1 <= max_sp_rxlr: #1-based counting
+                rxlr_start = sp_nn_len + match.start() + 1
+                if min_rxlr_start <= rxlr_start <= max_rxlr_start:
+                    rxlr = "Y"
+    if model == "Whisson2007":
+        #Combine the signalp with regular expression heuristic and the HMM
+        if name in hmm_hits and rxlr == "N":
+            rxlr = "hmm" #HMM only
+        elif rxlr == "N":
+            rxlr = "neither" #Don't use N (no)
+        elif name not in hmm_hits and rxlr == "Y":
+            rxlr = "re" #Heuristic only
+        #Now have a four way classifier: Y, hmm, re, neither
+        #and count is the number of Y results (both HMM and heuristic)
+    handle.write("%s\t%s\n" % (name, rxlr))
+    try:
+        tally[rxlr] += 1
+    except KeyError:
+        tally[rxlr] = 1
+handle.close()
+assert sum(tally.values()) == total
+
+#Check the iterator is finished
+try:
+    signalp_results.next()
+    assert False, "Unexpected data in SignalP output"
+except StopIteration:
+    pass
+
+#Cleanup
+os.remove(signalp_input_file)
+os.remove(signalp_output_file)
+
+#Short summary to stdout for Galaxy's info display
+print "%s for %i sequences:" % (model, total)
+print ", ".join("%s = %i" % kv for kv in sorted(tally.iteritems()))
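The strict check in rxlr_motifs.py mixes zero-based slicing with one-based positions, which is easy to misread in diff form. The sketch below pulls that arithmetic and the Whisson2007 four-way labelling out into standalone functions. The function names are hypothetical and exist only for illustration (the script does all of this inline), and the code is written to run on Python 3.

```python
import re

RE_RXLR = re.compile("R.LR")


def rxlr_start_after_sp(seq, sp_len, max_sp_rxlr=100, pattern=RE_RXLR):
    """Return the one-based RXLR start position, or None.

    sp_len is the predicted signal peptide length, so the search starts
    at the first residue after the cleavage site.
    """
    match = pattern.search(seq[sp_len:].upper())
    # match.start() is zero-based within the slice; add one for the
    # one-based distance from the cleavage site.
    if match and match.start() + 1 <= max_sp_rxlr:
        return sp_len + match.start() + 1
    return None


def classify_whisson(heuristic_hit, hmm_hit):
    """Combine the heuristic and HMM results into the four classes."""
    if heuristic_hit and hmm_hit:
        return "Y"
    if heuristic_hit:
        return "re"  # heuristic (SignalP plus regular expression) only
    if hmm_hit:
        return "hmm"  # HMM only
    return "neither"
```

For example, with a 20 residue signal peptide followed by AAAA and then RSLR, the motif starts at one-based position 25, which is 5 residues after the cleavage site.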
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/protein_analysis/rxlr_motifs.xml	Tue Jun 07 18:07:09 2011 -0400
@@ -0,0 +1,149 @@
+<tool id="rxlr_motifs" name="RXLR Motifs" version="0.0.5">
+    <description>Find RXLR Effectors of Plant Pathogenic Oomycetes</description>
+    <command interpreter="python">
+      rxlr_motifs.py $fasta_file 8 $model $tabular_file
+      ##I want the number of threads to be a Galaxy config option...
+    </command>
+    <inputs>
+        <param name="fasta_file" type="data" format="fasta" label="FASTA file of protein sequences" />
+        <param name="model" type="select" label="Which RXLR model?">
+            <option value="Bhattacharjee2006">Bhattacharjee et al. (2006) RXLR</option>
+            <option value="Win2007">Win et al. (2007) RXLR</option>
+            <option value="Whisson2007" selected="True">Whisson et al. (2007) RXLR-EER with HMM</option>
+        </param>
+    </inputs>
+    <outputs>
+        <data name="tabular_file" format="tabular" label="$model.value_label" />
+    </outputs>
+    <requirements>
+        <!-- Need SignalP for all the models -->
+        <requirement type="binary">signalp</requirement>
+        <!-- Need HMMER for Whisson et al. (2007) -->
+        <requirement type="binary">hmmsearch</requirement>
+    </requirements>
+    <tests>
+        <test>
+            <param name="fasta_file" value="rxlr_win_et_al_2007.fasta" ftype="fasta" />
+            <param name="model" value="Win2007" />
+            <output name="tabular_file" file="rxlr_win_et_al_2007.tabular" ftype="tabular" />
+        </test>
+    </tests>
+    <help>
+
+**Background**
+
+Many effector proteins from Oomycete plant pathogens for manipulating the host
+have been found to contain a signal peptide followed by a conserved RXLR motif
+(Arg, any amino acid, Leu, Arg), and then sometimes EER (Glu, Glu, Arg). There
+are striking parallels with the malarial host-targeting signal (Plasmodium
+export element, or "Pexel" for short).
+
+-----
+
+**What it does**
+
+Takes a protein sequence FASTA file as input, and produces a simple tabular
+file as output with one line per protein, and two columns giving the sequence
+ID and the predicted class. This is typically just whether or not it had the
+selected RXLR motif (Y or N).
+
+-----
+
+**Bhattacharjee et al. (2006) RXLR Model**
+
+Looks for the oomycete motif RXLR as described in Bhattacharjee et al. (2006).
+
+Matches must have a SignalP Hidden Markov Model (HMM) score of at least 0.9,
+a SignalP Neural Network (NN) predicted cleavage site giving a signal peptide
+length between 10 and 40 amino acids inclusive, and the RXLR pattern must be
+after but within 100 amino acids of the cleavage site.
+SignalP is run truncating the sequences to the first 70 amino acids, which was
+the default on the SignalP webservice used in Bhattacharjee et al. (2006).
+
+
+**Win et al. (2007) RXLR Model**
+
+Looks for the protein motif RXLR as described in Win et al. (2007).
+
+Matches must have a SignalP Hidden Markov Model (HMM) score of at least 0.9,
+a SignalP Neural Network (NN) predicted cleavage site giving a signal peptide
+length between 10 and 40 amino acids inclusive, and the RXLR pattern must be
+after the cleavage site and start between amino acids 30 and 60.
+SignalP is run truncating the sequences to the first 70 amino acids, to match
+the methodology of Torto et al. (2003) followed in Win et al. (2007).
+
+
+**Whisson et al. (2007) RXLR-EER with HMM**
+
+Looks for the protein motif RXLR-EER using the heuristic regular expression
+methodology, which was an extension of the Bhattacharjee et al. (2006) model,
+and a HMM as described in Whisson et al. (2007).
+
+All the requirements described above for Bhattacharjee et al. (2006) apply,
+but rather than just looking for RXLR with the regular expression R.LR the
+more complicated regular expression R.LR.{,40}[ED][ED][KR] is used. This means
+RXLR (Arg, any amino acid, Leu, Arg), then a stretch of up to forty amino
+acids before Glu/Asp, Glu/Asp, Lys/Arg. The EER part of the name is perhaps
+misleading as it also allows for DDR, EEK, and so on.
+
+Unlike Bhattacharjee et al. (2006) which used the SignalP webservice which
+defaults to truncating the sequences at 70 amino acids, Whisson et al. (2007)
+used the SignalP 3.0 command line tool with its default of not truncating the
+sequences. This does alter some of the scores, and also takes a little longer.
+
+Additionally HMMER 2.3.2 is run to look for a cross validated HMM for the
+RXLR-EER domain based on known positive examples. There are no restrictions
+on where within the protein the HMM match must be found.
+
+The output of this model has four classes:
+ * Y = Yes, both the heuristic motif and HMM were found.
+ * re = Only the heuristic SignalP with regular expression motif was found.
+ * hmm = Only the HMM was found.
+ * neither = Neither the heuristic motif nor HMM was found.
+
+-----
+
+**Note**
+
+Both Bhattacharjee et al. (2006) and Win et al. (2007) used SignalP v2.0, which
+is no longer available. The current release is SignalP v3.0 (Mar 5, 2007), so
+this is used instead. SignalP is called with the Eukaryote model and the short
+output (one line per protein). Any sequence truncation (e.g. to 70 amino acids)
+is handled via the intermediate sequence files.
+
+-----
+
+**References**
+
+Stephen C. Whisson, Petra C. Boevink, Lucy Moleleki, Anna O. Avrova, Juan G. Morales, Eleanor M. Gilroy, Miles R. Armstrong, Severine Grouffaud, Pieter van West, Sean Chapman, Ingo Hein, Ian K. Toth, Leighton Pritchard and Paul R. J. Birch.
+A translocation signal for delivery of oomycete effector proteins into host plant cells.
+Nature 450:115-118, 2007.
+http://dx.doi.org/10.1038/nature06203
+
+Joe Win, William Morgan, Jorunn Bos, Ksenia V. Krasileva, Liliana M. Cano, Angela Chaparro-Garcia, Randa Ammar, Brian J. Staskawicz and Sophien Kamoun.
+Adaptive evolution has targeted the C-terminal domain of the RXLR effectors of plant pathogenic oomycetes.
+The Plant Cell 19:2349-2369, 2007.
+http://dx.doi.org/10.1105/tpc.107.051037
+
+Souvik Bhattacharjee, N. Luisa Hiller, Konstantinos Liolios, Joe Win, Thirumala-Devi Kanneganti, Carolyn Young, Sophien Kamoun and Kasturi Haldar.
+The malarial host-targeting signal is conserved in the Irish potato famine pathogen.
+PLoS Pathogens, 2(5):e50, 2006.
+http://dx.doi.org/10.1371/journal.ppat.0020050
+
+Trudy A. Torto, Shuang Li, Allison Styer, Edgar Huitema, Antonino Testa, Neil A.R. Gow, Pieter van West and Sophien Kamoun.
+EST mining and functional expression assays identify extracellular effector proteins from the plant pathogen *Phytophthora*.
+Genome Research, 13:1675-1685, 2003.
+http://dx.doi.org/10.1101/gr.910003
+
+Sean R. Eddy.
+Profile hidden Markov models.
+Bioinformatics, 14(9):755-763, 1998.
+http://dx.doi.org/10.1093/bioinformatics/14.9.755
+
+Nielsen, Engelbrecht, Brunak and von Heijne.
+Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites.
+Protein Engineering, 10:1-6, 1997.
+http://dx.doi.org/10.1093/protein/10.1.1
+
+    </help>
+</tool>
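The help text describes the Whisson et al. (2007) regular expression in words. A short sketch of how it behaves in Python's re module may make the "up to forty amino acids" and "DDR, EEK, and so on" remarks concrete. The helper name has_rxlr_eer is made up for this illustration, and the test sequences are synthetic rather than real effectors.

```python
import re

# Same pattern as used for the Whisson2007 model in rxlr_motifs.py.
# In Python, {,40} means "between 0 and 40 repeats".
re_rxlr_eer = re.compile("R.LR.{,40}[ED][ED][KR]")


def has_rxlr_eer(seq):
    """True if the sequence contains RXLR followed by an EER-like triplet."""
    return re_rxlr_eer.search(seq.upper()) is not None


# A gap of exactly 40 residues is still accepted...
within_limit = has_rxlr_eer("RSLR" + "A" * 40 + "EER")
# ...but 41 residues between RXLR and EER is too far.
beyond_limit = has_rxlr_eer("RSLR" + "A" * 41 + "EER")
# The triplet need not be literally EER, and the gap may be empty.
ddk_variant = has_rxlr_eer("MRSLRDDK")
```

Here within_limit and ddk_variant are True while beyond_limit is False. Note this covers only the regular expression part of the heuristic. The SignalP and positional criteria are applied separately.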
--- a/tools/protein_analysis/seq_analysis_utils.py	Tue Jun 07 18:06:27 2011 -0400
+++ b/tools/protein_analysis/seq_analysis_utils.py	Tue Jun 07 18:07:09 2011 -0400
@@ -102,7 +102,7 @@
         raise err
     return files

-def run_jobs(jobs, threads, verbose=False):
+def run_jobs(jobs, threads, pause=10, verbose=False):
     """Takes list of cmd strings, returns dict with error levels."""
     pending = jobs[:]
     running = []
@@ -126,7 +126,7 @@
             process = subprocess.Popen(cmd, shell=True)
             running.append((cmd, process))
         #Loop...
-        sleep(10)
+        sleep(pause)
     if verbose:
         print "%i jobs completed" % len(results)
     assert set(jobs) == set(results)
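Only fragments of run_jobs are visible in the hunks above, so the following is a guess at the overall shape of such a polling loop rather than the actual function. It shows why a configurable pause is useful: short jobs (or tests) can poll more often than every ten seconds. The name run_jobs_sketch and its internals are assumptions, and the code is written to run on Python 3.

```python
import subprocess
from time import sleep


def run_jobs_sketch(jobs, threads, pause=10):
    """Run shell command strings, at most `threads` at once.

    Returns a dict mapping each command string to its return code,
    so the command strings are assumed to be unique.
    """
    pending = list(jobs)
    running = []
    results = {}
    while pending or running:
        # Record any jobs which have finished since the last check
        still_running = []
        for cmd, process in running:
            return_code = process.poll()
            if return_code is None:
                still_running.append((cmd, process))
            else:
                results[cmd] = return_code
        running = still_running
        # Start more jobs while there are free slots
        while pending and len(running) < threads:
            cmd = pending.pop(0)
            running.append((cmd, subprocess.Popen(cmd, shell=True)))
        # Wait before polling again, unless everything is done
        if pending or running:
            sleep(pause)
    return results
```

A caller wanting a responsive loop could pass something like pause=0.05, while long SignalP or TMHMM runs can keep the default.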
--- a/tools/protein_analysis/signalp3.xml	Tue Jun 07 18:06:27 2011 -0400
+++ b/tools/protein_analysis/signalp3.xml	Tue Jun 07 18:07:09 2011 -0400
@@ -1,4 +1,4 @@
-<tool id="signalp3" name="SignalP 3.0" version="0.0.7">
+<tool id="signalp3" name="SignalP 3.0" version="0.0.8">
     <description>Find signal peptides in protein sequences</description>
     <command interpreter="python">
       signalp3.py $organism $truncate 8 $fasta_file $tabular_file
@@ -124,15 +124,18 @@
 Bendtsen, Nielsen, von Heijne, and Brunak.
 Improved prediction of signal peptides: SignalP 3.0.
 J. Mol. Biol., 340:783-795, 2004.
+http://dx.doi.org/10.1016/j.jmb.2004.05.028

 Nielsen, Engelbrecht, Brunak and von Heijne.
 Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites.
 Protein Engineering, 10:1-6, 1997.
+http://dx.doi.org/10.1093/protein/10.1.1

 Nielsen and Krogh.
 Prediction of signal peptides and signal anchors by a hidden Markov model.
 Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology (ISMB 6),
 AAAI Press, Menlo Park, California, pp. 122-130, 1998.
+http://www.ncbi.nlm.nih.gov/pubmed/9783217

 http://www.cbs.dtu.dk/services/SignalP-3.0/output.php
--- a/tools/protein_analysis/suite_config.xml	Tue Jun 07 18:06:27 2011 -0400
+++ b/tools/protein_analysis/suite_config.xml	Tue Jun 07 18:07:09 2011 -0400
@@ -1,12 +1,15 @@
-    <suite id="tmhmm_and_signalp" name="TMHMM, SignalP, WoLF PSORT" version="0.0.8">
-        <description>Wrappers for TMHMM, SignalP and WoLF PSORT</description>
-        <tool id="tmhmm2" name="TMHMM 2.0" version="0.0.6">
+    <suite id="tmhmm_and_signalp" name="TMHMM, SignalP, WoLF PSORT" version="0.0.9">
+        <description>TMHMM, SignalP, RXLR motifs, WoLF PSORT</description>
+        <tool id="tmhmm2" name="TMHMM 2.0" version="0.0.7">
             <description>Find transmembrane domains in protein sequences</description>
         </tool>
-        <tool id="signalp3" name="SignalP 3.0" version="0.0.7">
+        <tool id="signalp3" name="SignalP 3.0" version="0.0.8">
             <description>Find signal peptides in protein sequences</description>
         </tool>
         <tool id="wolf_psort" name="WoLF PSORT" version="0.0.1">
             <description>Eukaryote protein subcellular localization prediction</description>
         </tool>
+        <tool id="rxlr_motifs" name="RXLR Motifs" version="0.0.5">
+            <description>Find RXLR Effectors of Plant Pathogenic Oomycetes</description>
+        </tool>
     </suite>
--- a/tools/protein_analysis/tmhmm2.xml	Tue Jun 07 18:06:27 2011 -0400
+++ b/tools/protein_analysis/tmhmm2.xml	Tue Jun 07 18:07:09 2011 -0400
@@ -1,4 +1,4 @@
-<tool id="tmhmm2" name="TMHMM 2.0" version="0.0.6">
+<tool id="tmhmm2" name="TMHMM 2.0" version="0.0.7">
     <description>Find transmembrane domains in protein sequences</description>
     <command interpreter="python">
       tmhmm2.py 8 $fasta_file $tabular_file
@@ -76,10 +76,14 @@
 Krogh, Larsson, von Heijne, and Sonnhammer.
 Predicting Transmembrane Protein Topology with a Hidden Markov Model: Application to Complete Genomes.
 J. Mol. Biol. 305:567-580, 2001.
+http://dx.doi.org/10.1006/jmbi.2000.4315

 Sonnhammer, von Heijne, and Krogh.
 A hidden Markov model for predicting transmembrane helices in protein sequences.
 In J. Glasgow et al., eds.: Proc. Sixth Int. Conf. on Intelligent Systems for Molecular Biology, pages 175-182. AAAI Press, 1998.
+http://www.ncbi.nlm.nih.gov/pubmed/9783223
+
+http://www.cbs.dtu.dk/services/TMHMM/

     </help>
 </tool>
--- a/tools/protein_analysis/wolf_psort.py	Tue Jun 07 18:06:27 2011 -0400
+++ b/tools/protein_analysis/wolf_psort.py	Tue Jun 07 18:07:09 2011 -0400
@@ -41,7 +41,8 @@
 exe = "runWolfPsortSummary"

 """
-Note: I had trouble getting runWolfPsortSummary on the path, so used a wrapper
+Note: I had trouble getting runWolfPsortSummary on the path (via a link, other
+than by including all of /opt/WoLFPSORT_package_v0.2/bin), so used a wrapper
 python script called runWolfPsortSummary as follows:

 #!/usr/bin/env python
@@ -69,7 +70,7 @@
 except:
    num_threads = 0
 if num_threads < 1:
-   stop_err("Threads argument %s is not a positive integer" % sys.argv[3])
+   stop_err("Threads argument %s is not a positive integer" % sys.argv[2])

 fasta_file = sys.argv[3]
--- a/tools/protein_analysis/wolf_psort.xml	Tue Jun 07 18:06:27 2011 -0400
+++ b/tools/protein_analysis/wolf_psort.xml	Tue Jun 07 18:07:09 2011 -0400
@@ -90,7 +90,8 @@

 Paul Horton, Keun-Joon Park, Takeshi Obayashi, Naoya Fujita, Hajime Harada, C.J. Adams-Collier, and Kenta Nakai,
 WoLF PSORT: Protein Localization Predictor.
-Nucleic Acids Research, 35(S2), W585-W587, doi:10.1093/nar/gkm259, 2007.
+Nucleic Acids Research, 35(S2), W585-W587, 2007.
+http://dx.doi.org/10.1093/nar/gkm259

 Paul Horton, Keun-Joon Park, Takeshi Obayashi and Kenta Nakai.
 Protein Subcellular Localization Prediction with WoLF PSORT.