annotate tools/protein_analysis/rxlr_motifs.py @ 23:e1996f0f4e85 draft default tip

"v0.2.13 - Python 3 fix for raising StopIteration"
author peterjc
date Thu, 17 Jun 2021 17:59:33 +0000
parents 238eae32483c
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
1 #!/usr/bin/env python
21
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
2 """Implements assorted RXLR motif methods from the literature.
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
3
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
4 This script takes exactly four command line arguments:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
5 * Protein FASTA filename
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
6 * Number of threads
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
7 * Model name (Bhattacharjee2006, Win2007, Whisson2007)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
8 * Output tabular filename
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
9
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
10 The model names are:
21
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
11 * Bhattacharjee2006: Simple regular expression search for RXLR
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
12 with additional requirements for positioning and signal peptide.
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
13 * Win2007: Simple regular expression search for RXLR, but with
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
14 different positional requirements.
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
15 * Whisson2007: As Bhattacharjee2006 but with a more complex regular
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
16 expression to look for RXLR-EER domain, and additionally calls HMMER.
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
17
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
18 See the help text in the accompanying Galaxy tool XML file for more
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
19 details including the full references.
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
20
21
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
21 Note
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
22 ----
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
23 Bhattacharjee et al. (2006) and Win et al. (2007) used SignalP v2.0,
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
24 which is no longer available. The current release is SignalP v3.0
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
25 (Mar 5, 2007). We have therefore opted to use the NN Ymax position for
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
26 the predicted cleavage site, as this is expected to be more accurate.
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
27 Also note that the HMM score values have changed from v2.0 to v3.0.
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
28 Whisson et al. (2007) used SignalP v3.0 anyway.
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
29
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
30 Whisson et al. (2007) used HMMER 2.3.2, and althought their HMM model
20
a19b3ded8f33 v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 19
diff changeset
31 can still be used with hmmsearch from HMMER 3, sadly this does give
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
32 slightly different results. We expect the hmmsearch from HMMER 2.3.2
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
33 (the last stable release of HMMER 2) to be present on the path under
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
34 the name hmmsearch2 (allowing it to co-exist with HMMER 3).
20
a19b3ded8f33 v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 19
diff changeset
35
a19b3ded8f33 v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 19
diff changeset
36 If using Conda, you should therefore install the special "hmmer2"
a19b3ded8f33 v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 19
diff changeset
37 package from BioConda which provides "hmmsearch2" etc::
a19b3ded8f33 v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 19
diff changeset
38
a19b3ded8f33 v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 19
diff changeset
39 conda install -c bioconda hmmer2
a19b3ded8f33 v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 19
diff changeset
40
a19b3ded8f33 v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 19
diff changeset
41 See https://bioconda.github.io/recipes/hmmer2/README.html and
a19b3ded8f33 v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 19
diff changeset
42 https://anaconda.org/bioconda/hmmer2
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
43 """
20
a19b3ded8f33 v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 19
diff changeset
44
a19b3ded8f33 v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 19
diff changeset
45 from __future__ import print_function
a19b3ded8f33 v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 19
diff changeset
46
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
47 import os
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
48 import re
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
49 import subprocess
20
a19b3ded8f33 v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 19
diff changeset
50 import sys
a19b3ded8f33 v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 19
diff changeset
51
19
f3ecd80850e2 v0.2.9 Python style improvements
peterjc
parents: 18
diff changeset
52 from seq_analysis_utils import fasta_iterator
18
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
53
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
54 if "-v" in sys.argv:
23
e1996f0f4e85 "v0.2.13 - Python 3 fix for raising StopIteration"
peterjc
parents: 21
diff changeset
55 print("RXLR Motifs v0.0.17")
18
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
56 sys.exit(0)
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
57
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
58 if len(sys.argv) != 5:
21
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
59 sys.exit(
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
60 "Requires four arguments: protein FASTA filename, threads, "
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
61 "model, and output filename"
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
62 )
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
63
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
64 fasta_file, threads, model, tabular_file = sys.argv[1:]
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
65 hmm_output_file = tabular_file + ".hmm.tmp"
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
66 signalp_input_file = tabular_file + ".fasta.tmp"
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
67 signalp_output_file = tabular_file + ".tabular.tmp"
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
68 min_signalp_hmm = 0.9
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
69 hmmer_search = "hmmsearch2"
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
70
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
71 if model == "Bhattacharjee2006":
18
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
72 signalp_trunc = 70
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
73 re_rxlr = re.compile("R.LR")
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
74 min_sp = 10
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
75 max_sp = 40
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
76 max_sp_rxlr = 100
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
77 min_rxlr_start = 1
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
78 # Allow signal peptide to be at most 40aa, and want RXLR to be
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
79 # within 100aa, therefore for the prescreen the max start is 140:
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
80 max_rxlr_start = max_sp + max_sp_rxlr
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
81 elif model == "Win2007":
18
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
82 signalp_trunc = 70
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
83 re_rxlr = re.compile("R.LR")
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
84 min_sp = 10
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
85 max_sp = 40
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
86 min_rxlr_start = 30
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
87 max_rxlr_start = 60
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
88 # No explicit limit on separation of signal peptide clevage
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
89 # and RXLR, but shortest signal peptide is 10, and furthest
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
90 # away RXLR is 60, so effectively limit is 50.
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
91 max_sp_rxlr = max_rxlr_start - min_sp + 1
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
92 elif model == "Whisson2007":
18
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
93 signalp_trunc = 0 # zero for no truncation
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
94 re_rxlr = re.compile("R.LR.{,40}[ED][ED][KR]")
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
95 min_sp = 10
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
96 max_sp = 40
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
97 max_sp_rxlr = 100
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
98 min_rxlr_start = 1
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
99 max_rxlr_start = max_sp + max_sp_rxlr
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
100 else:
21
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
101 sys.exit(
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
102 "Did not recognise the model name %r\n"
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
103 "Use Bhattacharjee2006, Win2007, or Whisson2007" % model
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
104 )
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
105
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
106
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
107 def get_hmmer_version(exe, required=None):
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
108 try:
21
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
109 child = subprocess.Popen(
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
110 [exe, "-h"],
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
111 universal_newlines=True,
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
112 stdout=subprocess.PIPE,
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
113 stderr=subprocess.PIPE,
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
114 )
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
115 except OSError:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
116 raise ValueError("Could not run %s" % exe)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
117 stdout, stderr = child.communicate()
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
118 if required:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
119 return required in stdout
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
120 elif "HMMER 2" in stdout:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
121 return 2
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
122 elif "HMMER 3" in stdout:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
123 return 3
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
124 else:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
125 raise ValueError("Could not determine version of %s" % exe)
19
f3ecd80850e2 v0.2.9 Python style improvements
peterjc
parents: 18
diff changeset
126
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
127
19
f3ecd80850e2 v0.2.9 Python style improvements
peterjc
parents: 18
diff changeset
128 # Run hmmsearch for Whisson et al. (2007)
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
129 if model == "Whisson2007":
21
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
130 hmm_file = os.path.join(
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
131 os.path.split(sys.argv[0])[0], "whisson_et_al_rxlr_eer_cropped.hmm"
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
132 )
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
133 if not os.path.isfile(hmm_file):
19
f3ecd80850e2 v0.2.9 Python style improvements
peterjc
parents: 18
diff changeset
134 sys.exit("Missing HMM file for Whisson et al. (2007)")
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
135 if not get_hmmer_version(hmmer_search, "HMMER 2.3.2 (Oct 2003)"):
19
f3ecd80850e2 v0.2.9 Python style improvements
peterjc
parents: 18
diff changeset
136 sys.exit("Missing HMMER 2.3.2 (Oct 2003) binary, %s" % hmmer_search)
16
7de64c8b258d Uploaded v0.2.5, MIT licence, RST for README, citation information, development moved to GitHub
peterjc
parents: 6
diff changeset
137
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
138 hmm_hits = set()
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
139 valid_ids = set()
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
140 for title, seq in fasta_iterator(fasta_file):
19
f3ecd80850e2 v0.2.9 Python style improvements
peterjc
parents: 18
diff changeset
141 name = title.split(None, 1)[0]
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
142 if name in valid_ids:
19
f3ecd80850e2 v0.2.9 Python style improvements
peterjc
parents: 18
diff changeset
143 sys.exit("Duplicated identifier %r" % name)
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
144 else:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
145 valid_ids.add(name)
16
7de64c8b258d Uploaded v0.2.5, MIT licence, RST for README, citation information, development moved to GitHub
peterjc
parents: 6
diff changeset
146 if not valid_ids:
18
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
147 # Special case, don't need to run HMMER if there are no sequences
16
7de64c8b258d Uploaded v0.2.5, MIT licence, RST for README, citation information, development moved to GitHub
peterjc
parents: 6
diff changeset
148 pass
7de64c8b258d Uploaded v0.2.5, MIT licence, RST for README, citation information, development moved to GitHub
peterjc
parents: 6
diff changeset
149 else:
18
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
150 # I've left the code to handle HMMER 3 in situ, in case
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
151 # we revisit the choice to insist on HMMER 2.
21
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
152 hmmer3 = 3 == get_hmmer_version(hmmer_search)
18
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
153 # Using zero (or 5.6?) for bitscore threshold
16
7de64c8b258d Uploaded v0.2.5, MIT licence, RST for README, citation information, development moved to GitHub
peterjc
parents: 6
diff changeset
154 if hmmer3:
18
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
155 # The HMMER3 table output is easy to parse
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
156 # In HMMER3 can't use both -T and -E
21
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
157 cmd = "%s -T 0 --tblout %s --noali %s %s > /dev/null" % (
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
158 hmmer_search,
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
159 hmm_output_file,
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
160 hmm_file,
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
161 fasta_file,
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
162 )
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
163 else:
18
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
164 # For HMMER2 we are stuck with parsing stdout
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
165 # Put 1e6 to effectively have no expectation threshold (otherwise
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
166 # HMMER defaults to 10 and the calculated e-value depends on the
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
167 # input FASTA file, and we can loose hits of interest).
21
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
168 cmd = "%s -T 0 -E 1e6 %s %s > %s" % (
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
169 hmmer_search,
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
170 hmm_file,
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
171 fasta_file,
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
172 hmm_output_file,
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
173 )
16
7de64c8b258d Uploaded v0.2.5, MIT licence, RST for README, citation information, development moved to GitHub
peterjc
parents: 6
diff changeset
174 return_code = os.system(cmd)
7de64c8b258d Uploaded v0.2.5, MIT licence, RST for README, citation information, development moved to GitHub
peterjc
parents: 6
diff changeset
175 if return_code:
21
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
176 sys.stderr.write("Error %i from hmmsearch:\n%s\n" % (return_code, cmd))
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
177 sys.exit(return_code)
16
7de64c8b258d Uploaded v0.2.5, MIT licence, RST for README, citation information, development moved to GitHub
peterjc
parents: 6
diff changeset
178
7de64c8b258d Uploaded v0.2.5, MIT licence, RST for README, citation information, development moved to GitHub
peterjc
parents: 6
diff changeset
179 handle = open(hmm_output_file)
7de64c8b258d Uploaded v0.2.5, MIT licence, RST for README, citation information, development moved to GitHub
peterjc
parents: 6
diff changeset
180 for line in handle:
7de64c8b258d Uploaded v0.2.5, MIT licence, RST for README, citation information, development moved to GitHub
peterjc
parents: 6
diff changeset
181 if not line.strip():
18
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
182 # We expect blank lines in the HMMER2 stdout
16
7de64c8b258d Uploaded v0.2.5, MIT licence, RST for README, citation information, development moved to GitHub
peterjc
parents: 6
diff changeset
183 continue
7de64c8b258d Uploaded v0.2.5, MIT licence, RST for README, citation information, development moved to GitHub
peterjc
parents: 6
diff changeset
184 elif line.startswith("#"):
18
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
185 # Header
16
7de64c8b258d Uploaded v0.2.5, MIT licence, RST for README, citation information, development moved to GitHub
peterjc
parents: 6
diff changeset
186 continue
7de64c8b258d Uploaded v0.2.5, MIT licence, RST for README, citation information, development moved to GitHub
peterjc
parents: 6
diff changeset
187 else:
19
f3ecd80850e2 v0.2.9 Python style improvements
peterjc
parents: 18
diff changeset
188 name = line.split(None, 1)[0]
f3ecd80850e2 v0.2.9 Python style improvements
peterjc
parents: 18
diff changeset
189 # Should be a sequence name in the HMMER3 table output.
f3ecd80850e2 v0.2.9 Python style improvements
peterjc
parents: 18
diff changeset
190 # Could be anything in the HMMER2 stdout.
16
7de64c8b258d Uploaded v0.2.5, MIT licence, RST for README, citation information, development moved to GitHub
peterjc
parents: 6
diff changeset
191 if name in valid_ids:
7de64c8b258d Uploaded v0.2.5, MIT licence, RST for README, citation information, development moved to GitHub
peterjc
parents: 6
diff changeset
192 hmm_hits.add(name)
7de64c8b258d Uploaded v0.2.5, MIT licence, RST for README, citation information, development moved to GitHub
peterjc
parents: 6
diff changeset
193 elif hmmer3:
19
f3ecd80850e2 v0.2.9 Python style improvements
peterjc
parents: 18
diff changeset
194 sys.exit("Unexpected identifer %r in hmmsearch output" % name)
16
7de64c8b258d Uploaded v0.2.5, MIT licence, RST for README, citation information, development moved to GitHub
peterjc
parents: 6
diff changeset
195 handle.close()
18
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
196 # if hmmer3:
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
197 # print "HMMER3 hits for %i/%i" % (len(hmm_hits), len(valid_ids))
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
198 # else:
19
f3ecd80850e2 v0.2.9 Python style improvements
peterjc
parents: 18
diff changeset
199 # print "HMMER2 hits for %i/%i" % (len(hmm_hits), len(valid_ids))
18
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
200 # print "%i/%i matched HMM" % (len(hmm_hits), len(valid_ids))
16
7de64c8b258d Uploaded v0.2.5, MIT licence, RST for README, citation information, development moved to GitHub
peterjc
parents: 6
diff changeset
201 os.remove(hmm_output_file)
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
202 del valid_ids
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
203
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
204
18
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
205 # Prepare short list of candidates containing RXLR to pass to SignalP
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
206 assert min_rxlr_start > 0, "Min value one, since zero based counting"
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
207 count = 0
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
208 total = 0
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
209 handle = open(signalp_input_file, "w")
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
210 for title, seq in fasta_iterator(fasta_file):
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
211 total += 1
19
f3ecd80850e2 v0.2.9 Python style improvements
peterjc
parents: 18
diff changeset
212 name = title.split(None, 1)[0]
21
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
213 match = re_rxlr.search(seq[min_rxlr_start - 1 :].upper())
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
214 if match and min_rxlr_start - 1 + match.start() + 1 <= max_rxlr_start:
18
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
215 # This is a potential RXLR, depending on the SignalP results.
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
216 # Might as well truncate the sequence now, makes the temp file smaller
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
217 if signalp_trunc:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
218 handle.write(">%s (truncated)\n%s\n" % (name, seq[:signalp_trunc]))
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
219 else:
18
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
220 # Does it matter we don't line wrap?
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
221 handle.write(">%s\n%s\n" % (name, seq))
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
222 count += 1
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
223 handle.close()
18
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
224 # print "Running SignalP on %i/%i potentials." % (count, total)
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
225
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
226
18
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
227 # Run SignalP (using our wrapper script to get multi-core support etc)
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
228 signalp_script = os.path.join(os.path.split(sys.argv[0])[0], "signalp3.py")
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
229 if not os.path.isfile(signalp_script):
19
f3ecd80850e2 v0.2.9 Python style improvements
peterjc
parents: 18
diff changeset
230 sys.exit("Error - missing signalp3.py script")
21
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
231 cmd = "python '%s' 'euk' '%i' '%s' '%s' '%s'" % (
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
232 signalp_script,
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
233 signalp_trunc,
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
234 threads,
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
235 signalp_input_file,
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
236 signalp_output_file,
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
237 )
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
238 return_code = os.system(cmd)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
239 if return_code:
19
f3ecd80850e2 v0.2.9 Python style improvements
peterjc
parents: 18
diff changeset
240 sys.exit("Error %i from SignalP:\n%s" % (return_code, cmd))
18
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
241 # print "SignalP done"
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
242
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
243
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
244 def parse_signalp(filename):
21
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
245 """Parse SignalP output, yield tuples of values.
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
246
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
247 Returns tuples of ID, HMM_Sprob_score and NN predicted signal
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
248 peptide length.
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
249
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
250 For signal peptide length we use NN_Ymax_pos (minus one).
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
251 """
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
252 handle = open(filename)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
253 line = handle.readline()
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
254 assert line.startswith("#ID\t"), line
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
255 for line in handle:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
256 parts = line.rstrip("\t").split("\t")
19
f3ecd80850e2 v0.2.9 Python style improvements
peterjc
parents: 18
diff changeset
257 assert len(parts) == 20, repr(line)
f3ecd80850e2 v0.2.9 Python style improvements
peterjc
parents: 18
diff changeset
258 yield parts[0], float(parts[18]), int(parts[5]) - 1
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
259 handle.close()
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
260
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
261
18
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
262 # Parse SignalP results and apply the strict RXLR criteria
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
263 total = 0
21
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
264 tally = {}
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
265 handle = open(tabular_file, "w")
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
266 handle.write("#ID\t%s\n" % model)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
267 signalp_results = parse_signalp(signalp_output_file)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
268 for title, seq in fasta_iterator(fasta_file):
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
269 total += 1
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
270 rxlr = "N"
19
f3ecd80850e2 v0.2.9 Python style improvements
peterjc
parents: 18
diff changeset
271 name = title.split(None, 1)[0]
21
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
272 match = re_rxlr.search(seq[min_rxlr_start - 1 :].upper())
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
273 if match and min_rxlr_start - 1 + match.start() + 1 <= max_rxlr_start:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
274 del match
18
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
275 # This was the criteria for calling SignalP,
19
f3ecd80850e2 v0.2.9 Python style improvements
peterjc
parents: 18
diff changeset
276 # so it will be in the SignalP results.
21
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
277 sp_id, sp_hmm_score, sp_nn_len = next(signalp_results)
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
278 assert name == sp_id, "%s vs %s" % (name, sp_id)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
279 if sp_hmm_score >= min_signalp_hmm and min_sp <= sp_nn_len <= max_sp:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
280 match = re_rxlr.search(seq[sp_nn_len:].upper())
18
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
281 if match and match.start() + 1 <= max_sp_rxlr: # 1-based counting
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
282 rxlr_start = sp_nn_len + match.start() + 1
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
283 if min_rxlr_start <= rxlr_start <= max_rxlr_start:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
284 rxlr = "Y"
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
285 if model == "Whisson2007":
18
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
286 # Combine the signalp with regular expression heuristic and the HMM
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
287 if name in hmm_hits and rxlr == "N":
18
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
288 rxlr = "hmm" # HMM only
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
289 elif rxlr == "N":
18
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
290 rxlr = "neither" # Don't use N (no)
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
291 elif name not in hmm_hits and rxlr == "Y":
18
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
292 rxlr = "re" # Heuristic only
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
293 # Now have a four way classifier: Y, hmm, re, neither
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
294 # and count is the number of Y results (both HMM and heuristic)
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
295 handle.write("%s\t%s\n" % (name, rxlr))
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
296 try:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
297 tally[rxlr] += 1
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
298 except KeyError:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
299 tally[rxlr] = 1
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
300 handle.close()
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
301 assert sum(tally.values()) == total
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
302
18
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
303 # Check the iterator is finished
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
304 try:
21
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
305 next(signalp_results)
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
306 assert False, "Unexpected data in SignalP output"
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
307 except StopIteration:
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
308 pass
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
309
18
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
310 # Cleanup
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
311 os.remove(signalp_input_file)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
312 os.remove(signalp_output_file)
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
313
18
eb6ac44d4b8e Suite v0.2.8, record Promoter 2 verion + misc internal updates
peterjc
parents: 16
diff changeset
314 # Short summary to stdout for Galaxy's info display
20
a19b3ded8f33 v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
peterjc
parents: 19
diff changeset
315 print("%s for %i sequences:" % (model, total))
21
238eae32483c "Check this is up to date with all 2020 changes (black etc)"
peterjc
parents: 20
diff changeset
316 print(", ".join("%s = %i" % kv for kv in sorted(tally.items())))