peterjc/tmhmm_and_signalp (Mercurial): tools/protein_analysis/wolf_psort.py @ 20:a19b3ded8f33 (draft)

v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes

author: peterjc
date: Thu, 21 Sep 2017 11:35:20 -0400
parents: f3ecd80850e2
children: 238eae32483c

#!/usr/bin/env python
"""Wrapper for WoLF PSORT v0.2 for use in Galaxy.

This script takes exactly four command line arguments:
 * the organism type (animal, plant or fungi)
 * number of threads to use (integer)
 * an input protein FASTA filename
 * an output tabular filename.

It then calls the standalone WoLF PSORT v0.2 program runWolfPsortSummary
(not the webservice), and converts the output from something like this:

# k used for kNN is: 27
gi|301087619|ref|XP_002894699.1| extr 12, mito 4, E.R. 3, golg 3, mito_nucl 3
gi|301087623|ref|XP_002894700.1| extr 21, mito 2, cyto 2, cyto_mito 2

In order to make it easier to use in Galaxy, this wrapper script reformats
this to use tab separators, with one line per compartment prediction:

#ID	Compartment	Score	Rank
gi|301087619|ref|XP_002894699.1|	extr	12	1
gi|301087619|ref|XP_002894699.1|	mito	4	2
gi|301087619|ref|XP_002894699.1|	E.R.	3	3
gi|301087619|ref|XP_002894699.1|	golg	3	4
gi|301087619|ref|XP_002894699.1|	mito_nucl	3	5
gi|301087623|ref|XP_002894700.1|	extr	21	1
gi|301087623|ref|XP_002894700.1|	mito	2	2
gi|301087623|ref|XP_002894700.1|	cyto	2	3
gi|301087623|ref|XP_002894700.1|	cyto_mito	2	4

Additionally, to take full advantage of multiple cores, the input FASTA file
is subdivided and multiple copies of WoLF PSORT are run in parallel. I would
normally use Python's multiprocessing library in this situation, but it
requires at least Python 2.6, and at the time of writing Galaxy still supports
Python 2.4.
"""

from __future__ import print_function

import os
import sys

from seq_analysis_utils import run_jobs, split_fasta, thread_count

FASTA_CHUNK = 500
exe = "runWolfPsortSummary"

"""
Note: I had trouble getting runWolfPsortSummary on the path (via a link) other
than by adding all of /opt/WoLFPSORT_package_v0.2/bin to it, so I used a
wrapper Python script, also called runWolfPsortSummary, as follows:

#!/usr/bin/env python
# Wrapper script to call WoLF PSORT from its own directory.
import os
import sys
import subprocess
saved_dir = os.path.abspath(os.curdir)
os.chdir("/opt/WoLFPSORT_package_v0.2/bin")
args = ["./runWolfPsortSummary"] + sys.argv[1:]
return_code = subprocess.call(args)
os.chdir(saved_dir)
sys.exit(return_code)

For more details on this workaround, see:
https://lists.galaxyproject.org/pipermail/galaxy-dev/2015-December/023386.html
"""

if "-v" in sys.argv or "--version" in sys.argv:
    sys.exit("WoLF-PSORT wrapper version 0.0.11")

if len(sys.argv) != 5:
    sys.exit("Require four arguments: organism, threads, input protein FASTA file & output tabular file")

organism = sys.argv[1]
if organism not in ["animal", "plant", "fungi"]:
    sys.exit("Organism argument %s is not one of animal, plant, fungi" % organism)

num_threads = thread_count(sys.argv[2], default=4)
fasta_file = sys.argv[3]
tabular_file = sys.argv[4]


def clean_tabular(raw_handle, out_handle):
    """Clean up WoLF PSORT output to make it tabular."""
    for line in raw_handle:
        # Skip blank lines (which would otherwise fail to unpack below)
        # and comment lines such as "# k used for kNN is: 27".
        if not line.strip() or line.startswith("#"):
            continue
        name, data = line.rstrip("\r\n").split(None, 1)
        for rank, comp_data in enumerate(data.split(",")):
            comp, score = comp_data.split()
            out_handle.write("%s\t%s\t%s\t%i\n"
                             % (name, comp, score, rank + 1))


fasta_files = split_fasta(fasta_file, tabular_file, n=FASTA_CHUNK)
temp_files = [f + ".out" for f in fasta_files]
assert len(fasta_files) == len(temp_files)
jobs = ["%s %s < %s > %s" % (exe, organism, fasta, temp)
        for (fasta, temp) in zip(fasta_files, temp_files)]
assert len(fasta_files) == len(temp_files) == len(jobs)


def clean_up(file_list):
    """Delete any of the listed files which exist."""
    for f in file_list:
        if os.path.isfile(f):
            os.remove(f)


if len(jobs) > 1 and num_threads > 1:
    # A small "info" message for Galaxy to show the user.
    print("Using %i threads for %i tasks" % (min(num_threads, len(jobs)), len(jobs)))
results = run_jobs(jobs, num_threads)
assert len(fasta_files) == len(temp_files) == len(jobs)
for fasta, temp, cmd in zip(fasta_files, temp_files, jobs):
    error_level = results[cmd]
    try:
        with open(temp) as handle:
            output = handle.readline()
    except IOError:
        output = ""
    if error_level or output.lower().startswith("error running"):
        clean_up(fasta_files)
        clean_up(temp_files)
        # sys.exit() takes a single argument, so write the message to stderr
        # and exit with the job's own status, or 1 if that status was zero.
        sys.stderr.write("One or more tasks failed, e.g. %i from %r gave:\n%s\n"
                         % (error_level, cmd, output))
        sys.exit(error_level or 1)
del results

with open(tabular_file, "w") as out_handle:
    out_handle.write("#ID\tCompartment\tScore\tRank\n")
    for temp in temp_files:
        with open(temp) as data_handle:
            clean_tabular(data_handle, out_handle)

clean_up(fasta_files)
clean_up(temp_files)
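The reformatting done by `clean_tabular` can be checked in isolation. The wrapper itself cannot simply be imported for this, because it parses `sys.argv` and runs its jobs at module level, so the following is a minimal, self-contained sketch that copies the same parsing logic into a hypothetical function `to_tabular` and feeds it the two example lines from the module docstring via `io.StringIO`.

```python
import io


def to_tabular(raw_handle, out_handle):
    """Write one tab-separated row per compartment prediction.

    Hypothetical stand-alone copy of the wrapper's clean_tabular() logic.
    """
    for line in raw_handle:
        # Skip blank lines and comments such as "# k used for kNN is: 27".
        if not line.strip() or line.startswith("#"):
            continue
        # First whitespace-delimited field is the sequence ID; the rest is a
        # comma-separated list of "compartment score" pairs, best first.
        name, data = line.rstrip("\r\n").split(None, 1)
        for rank, comp_data in enumerate(data.split(","), start=1):
            comp, score = comp_data.split()
            out_handle.write("%s\t%s\t%s\t%i\n" % (name, comp, score, rank))


raw = io.StringIO(
    "# k used for kNN is: 27\n"
    "gi|301087619|ref|XP_002894699.1| extr 12, mito 4, E.R. 3, golg 3, mito_nucl 3\n"
    "gi|301087623|ref|XP_002894700.1| extr 21, mito 2, cyto 2, cyto_mito 2\n"
)
out = io.StringIO()
to_tabular(raw, out)
rows = [row.split("\t") for row in out.getvalue().splitlines()]
print(len(rows))  # 9
print(rows[2])  # ['gi|301087619|ref|XP_002894699.1|', 'E.R.', '3', '3']
```

The nine rows match the tabular example in the module docstring: five predictions for the first protein and four for the second, ranked in the order WoLF PSORT lists them.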
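The module docstring mentions that Python's `multiprocessing` library would normally be used for the parallel step; the script instead calls `run_jobs` from `seq_analysis_utils`, whose code is not shown here. The sketch below is a hypothetical stand-in (`run_jobs_sketch` is not part of the repository). It assumes, as the `results[cmd]` lookup in the main loop suggests, that the helper returns a dict mapping each shell command to its exit status, and it assumes a POSIX shell. It uses `multiprocessing.pool.ThreadPool` because each worker only waits on an external process, and it does not attempt the early abort ("fast-fail") mentioned in the commit message; it simply waits for every command.

```python
import subprocess
from multiprocessing.pool import ThreadPool


def _run_one(cmd):
    """Run one shell command and return (command, exit status)."""
    return cmd, subprocess.call(cmd, shell=True)


def run_jobs_sketch(jobs, num_threads):
    """Run shell commands in parallel and return {command: exit status}.

    Hypothetical stand-in for seq_analysis_utils.run_jobs; threads suffice
    because each worker just blocks on an external process.
    """
    pool = ThreadPool(max(1, min(num_threads, len(jobs))))
    try:
        return dict(pool.map(_run_one, jobs))
    finally:
        pool.close()
        pool.join()


results = run_jobs_sketch(["exit 0", "exit 3", "echo hello > /dev/null"], 2)
print(results["exit 3"])  # 3
```

With `num_threads=2` and three commands, at most two shells run at once. A non-zero status such as the `3` above is what the wrapper's main loop treats as a failed chunk.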