annotate tools/protein_analysis/rxlr_motifs.py @ 14:6365217cd3de draft

Uploaded v0.2.3, adds unit tests for WoLF PSORT
author peterjc
date Thu, 25 Apr 2013 11:54:47 -0400
parents a290c6d4e658
children 7de64c8b258d
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
1 #!/usr/bin/env python
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
2 """Implements assorted RXLR motif methods from the literature
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
3
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
4 This script takes exactly four command line arguments:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
5 * Protein FASTA filename
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
6 * Number of threads
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
7 * Model name (Bhattacharjee2006, Win2007, Whisson2007)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
8 * Output tabular filename
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
9
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
10 The model names are:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
11
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
12 Bhattacharjee2006: Simple regular expression search for RXLR
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
13 with additional requirements for positioning and signal peptide.
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
14
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
15 Win2007: Simple regular expression search for RXLR, but with
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
16 different positional requirements.
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
17
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
18 Whisson2007: As Bhattacharjee2006 but with a more complex regular
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
19 expression to look for RXLR-EER domain, and additionally calls HMMER.
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
20
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
21 See the help text in the accompanying Galaxy tool XML file for more
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
22 details including the full references.
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
23
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
24 Note:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
25
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
26 Bhattacharjee et al. (2006) and Win et al. (2007) used SignalP v2.0,
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
27 which is no longer available. The current release is SignalP v3.0
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
28 (Mar 5, 2007). We have therefore opted to use the NN Ymax position for
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
29 the predicted cleavage site, as this is expected to be more accurate.
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
30 Also note that the HMM score values have changed from v2.0 to v3.0.
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
31 Whisson et al. (2007) used SignalP v3.0 anyway.
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
32
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
33 Whisson et al. (2007) used HMMER 2.3.2, and althought their HMM model
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
34 can still be used with hmmsearch from HMMER 3 this this does give
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
35 slightly different results. We expect the hmmsearch from HMMER 2.3.2
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
36 (the last stable release of HMMER 2) to be present on the path under
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
37 the name hmmsearch2 (allowing it to co-exist with HMMER 3).
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
38 """
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
39 import os
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
40 import sys
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
41 import re
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
42 import subprocess
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
43 from seq_analysis_utils import stop_err, fasta_iterator
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
44
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
45 if len(sys.argv) != 5:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
46 stop_err("Requires four arguments: protein FASTA filename, threads, model, and output filename")
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
47
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
48 fasta_file, threads, model, tabular_file = sys.argv[1:]
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
49 hmm_output_file = tabular_file + ".hmm.tmp"
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
50 signalp_input_file = tabular_file + ".fasta.tmp"
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
51 signalp_output_file = tabular_file + ".tabular.tmp"
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
52 min_signalp_hmm = 0.9
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
53 hmmer_search = "hmmsearch2"
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
54
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
55 if model == "Bhattacharjee2006":
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
56 signalp_trunc = 70
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
57 re_rxlr = re.compile("R.LR")
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
58 min_sp = 10
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
59 max_sp = 40
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
60 max_sp_rxlr = 100
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
61 min_rxlr_start = 1
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
62 #Allow signal peptide to be at most 40aa, and want RXLR to be
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
63 #within 100aa, therefore for the prescreen the max start is 140:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
64 max_rxlr_start = max_sp + max_sp_rxlr
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
65 elif model == "Win2007":
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
66 signalp_trunc = 70
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
67 re_rxlr = re.compile("R.LR")
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
68 min_sp = 10
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
69 max_sp = 40
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
70 min_rxlr_start = 30
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
71 max_rxlr_start = 60
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
72 #No explicit limit on separation of signal peptide clevage
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
73 #and RXLR, but shortest signal peptide is 10, and furthest
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
74 #away RXLR is 60, so effectively limit is 50.
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
75 max_sp_rxlr = max_rxlr_start - min_sp + 1
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
76 elif model == "Whisson2007":
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
77 signalp_trunc = 0 #zero for no truncation
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
78 re_rxlr = re.compile("R.LR.{,40}[ED][ED][KR]")
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
79 min_sp = 10
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
80 max_sp = 40
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
81 max_sp_rxlr = 100
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
82 min_rxlr_start = 1
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
83 max_rxlr_start = max_sp + max_sp_rxlr
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
84 else:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
85 stop_err("Did not recognise the model name %r\n"
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
86 "Use Bhattacharjee2006, Win2007, or Whisson2007" % model)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
87
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
88
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
89 def get_hmmer_version(exe, required=None):
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
90 cmd = "%s -h" % exe
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
91 try:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
92 child = subprocess.Popen([exe, "-h"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
93 except OSError:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
94 raise ValueError("Could not run %s" % exe)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
95 stdout, stderr = child.communicate()
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
96 if required:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
97 return required in stdout
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
98 elif "HMMER 2" in stdout:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
99 return 2
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
100 elif "HMMER 3" in stdout:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
101 return 3
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
102 else:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
103 raise ValueError("Could not determine version of %s" % exe)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
104
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
105
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
106 #Run hmmsearch for Whisson et al. (2007)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
107 if model == "Whisson2007":
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
108 hmm_file = os.path.join(os.path.split(sys.argv[0])[0],
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
109 "whisson_et_al_rxlr_eer_cropped.hmm")
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
110 if not os.path.isfile(hmm_file):
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
111 stop_err("Missing HMM file for Whisson et al. (2007)")
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
112 if not get_hmmer_version(hmmer_search, "HMMER 2.3.2 (Oct 2003)"):
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
113 stop_err("Missing HMMER 2.3.2 (Oct 2003) binary, %s" % hmmer_searcher)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
114 #I've left the code to handle HMMER 3 in situ, in case
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
115 #we revisit the choice to insist on HMMER 2.
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
116 hmmer3 = (3 == get_hmmer_version(hmmer_search))
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
117 #Using zero (or 5.6?) for bitscore threshold
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
118 if hmmer3:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
119 #The HMMER3 table output is easy to parse
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
120 #In HMMER3 can't use both -T and -E
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
121 cmd = "%s -T 0 --tblout %s --noali %s %s > /dev/null" \
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
122 % (hmmer_search, hmm_output_file, hmm_file, fasta_file)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
123 else:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
124 #For HMMER2 we are stuck with parsing stdout
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
125 #Put 1e6 to effectively have no expectation threshold (otherwise
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
126 #HMMER defaults to 10 and the calculated e-value depends on the
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
127 #input FASTA file, and we can loose hits of interest).
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
128 cmd = "%s -T 0 -E 1e6 %s %s > %s" \
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
129 % (hmmer_search, hmm_file, fasta_file, hmm_output_file)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
130 return_code = os.system(cmd)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
131 if return_code:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
132 stop_err("Error %i from hmmsearch:\n%s" % (return_code, cmd))
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
133 hmm_hits = set()
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
134 valid_ids = set()
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
135 for title, seq in fasta_iterator(fasta_file):
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
136 name = title.split(None,1)[0]
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
137 if name in valid_ids:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
138 stop_err("Duplicated identifier %r" % name)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
139 else:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
140 valid_ids.add(name)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
141 handle = open(hmm_output_file)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
142 for line in handle:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
143 if not line.strip():
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
144 #We expect blank lines in the HMMER2 stdout
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
145 continue
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
146 elif line.startswith("#"):
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
147 #Header
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
148 continue
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
149 else:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
150 name = line.split(None,1)[0]
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
151 #Should be a sequence name in the HMMER3 table output.
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
152 #Could be anything in the HMMER2 stdout.
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
153 if name in valid_ids:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
154 hmm_hits.add(name)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
155 elif hmmer3:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
156 stop_err("Unexpected identifer %r in hmmsearch output" % name)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
157 handle.close()
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
158 #if hmmer3:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
159 # print "HMMER3 hits for %i/%i" % (len(hmm_hits), len(valid_ids))
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
160 #else:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
161 # print "HMMER2 hits for %i/%i" % (len(hmm_hits), len(valid_ids))
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
162 #print "%i/%i matched HMM" % (len(hmm_hits), len(valid_ids))
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
163 os.remove(hmm_output_file)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
164 del valid_ids
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
165
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
166
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
167 #Prepare short list of candidates containing RXLR to pass to SignalP
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
168 assert min_rxlr_start > 0, "Min value one, since zero based counting"
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
169 count = 0
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
170 total = 0
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
171 handle = open(signalp_input_file, "w")
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
172 for title, seq in fasta_iterator(fasta_file):
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
173 total += 1
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
174 name = title.split(None,1)[0]
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
175 match = re_rxlr.search(seq[min_rxlr_start-1:].upper())
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
176 if match and min_rxlr_start - 1 + match.start() + 1 <= max_rxlr_start:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
177 #This is a potential RXLR, depending on the SignalP results.
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
178 #Might as well truncate the sequence now, makes the temp file smaller
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
179 if signalp_trunc:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
180 handle.write(">%s (truncated)\n%s\n" % (name, seq[:signalp_trunc]))
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
181 else:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
182 #Does it matter we don't line wrap?
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
183 handle.write(">%s\n%s\n" % (name, seq))
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
184 count += 1
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
185 handle.close()
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
186 #print "Running SignalP on %i/%i potentials." % (count, total)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
187
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
188
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
189 #Run SignalP (using our wrapper script to get multi-core support etc)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
190 signalp_script = os.path.join(os.path.split(sys.argv[0])[0], "signalp3.py")
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
191 if not os.path.isfile(signalp_script):
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
192 stop_err("Error - missing signalp3.py script")
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
193 cmd = "python %s euk %i %s %s %s" % (signalp_script, signalp_trunc, threads, signalp_input_file, signalp_output_file)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
194 return_code = os.system(cmd)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
195 if return_code:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
196 stop_err("Error %i from SignalP:\n%s" % (return_code, cmd))
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
197 #print "SignalP done"
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
198
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
199 def parse_signalp(filename):
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
200 """Parse SignalP output, yield tuples of ID, HMM_Sprob_score and NN predicted signal peptide length.
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
201
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
202 For signal peptide length we use NN_Ymax_pos (minus one).
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
203 """
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
204 handle = open(filename)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
205 line = handle.readline()
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
206 assert line.startswith("#ID\t"), line
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
207 for line in handle:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
208 parts = line.rstrip("\t").split("\t")
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
209 assert len(parts)==20, repr(line)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
210 yield parts[0], float(parts[18]), int(parts[5])-1
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
211 handle.close()
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
212
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
213
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
214 #Parse SignalP results and apply the strict RXLR criteria
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
215 total = 0
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
216 tally = dict()
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
217 handle = open(tabular_file, "w")
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
218 handle.write("#ID\t%s\n" % model)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
219 signalp_results = parse_signalp(signalp_output_file)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
220 for title, seq in fasta_iterator(fasta_file):
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
221 total += 1
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
222 rxlr = "N"
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
223 name = title.split(None,1)[0]
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
224 match = re_rxlr.search(seq[min_rxlr_start-1:].upper())
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
225 if match and min_rxlr_start - 1 + match.start() + 1 <= max_rxlr_start:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
226 del match
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
227 #This was the criteria for calling SignalP,
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
228 #so it will be in the SignalP results.
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
229 sp_id, sp_hmm_score, sp_nn_len = signalp_results.next()
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
230 assert name == sp_id, "%s vs %s" % (name, sp_id)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
231 if sp_hmm_score >= min_signalp_hmm and min_sp <= sp_nn_len <= max_sp:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
232 match = re_rxlr.search(seq[sp_nn_len:].upper())
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
233 if match and match.start() + 1 <= max_sp_rxlr: #1-based counting
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
234 rxlr_start = sp_nn_len + match.start() + 1
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
235 if min_rxlr_start <= rxlr_start <= max_rxlr_start:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
236 rxlr = "Y"
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
237 if model == "Whisson2007":
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
238 #Combine the signalp with regular expression heuristic and the HMM
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
239 if name in hmm_hits and rxlr == "N":
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
240 rxlr = "hmm" #HMM only
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
241 elif rxlr == "N":
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
242 rxlr = "neither" #Don't use N (no)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
243 elif name not in hmm_hits and rxlr == "Y":
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
244 rxlr = "re" #Heuristic only
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
245 #Now have a four way classifier: Y, hmm, re, neither
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
246 #and count is the number of Y results (both HMM and heuristic)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
247 handle.write("%s\t%s\n" % (name, rxlr))
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
248 try:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
249 tally[rxlr] += 1
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
250 except KeyError:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
251 tally[rxlr] = 1
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
252 handle.close()
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
253 assert sum(tally.values()) == total
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
254
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
255 #Check the iterator is finished
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
256 try:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
257 signalp_results.next()
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
258 assert False, "Unexpected data in SignalP output"
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
259 except StopIteration:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
260 pass
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
261
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
262 #Cleanup
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
263 os.remove(signalp_input_file)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
264 os.remove(signalp_output_file)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
265
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
266 #Short summary to stdout for Galaxy's info display
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
267 print "%s for %i sequences:" % (model, total)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
268 print ", ".join("%s = %i" % kv for kv in sorted(tally.iteritems()))