annotate tools/protein_analysis/wolf_psort.py @ 20:a19b3ded8f33 draft

v0.2.11 Job splitting fast-fail; RXLR tools supports HMMER2 from BioConda; Capture more version information; misc internal changes
author peterjc
date Thu, 21 Sep 2017 11:35:20 -0400
parents f3ecd80850e2
children 238eae32483c
#!/usr/bin/env python
"""Wrapper for WoLF PSORT v0.2 for use in Galaxy.

This script takes exactly four command line arguments:
 * the organism type (animal, plant or fungi)
 * number of threads to use (integer)
 * an input protein FASTA filename
 * output tabular filename.

It then calls the standalone WoLF PSORT v0.2 program runWolfPsortSummary
(not the webservice), and converts the output from something like this:

# k used for kNN is: 27
gi|301087619|ref|XP_002894699.1| extr 12, mito 4, E.R. 3, golg 3, mito_nucl 3
gi|301087623|ref|XP_002894700.1| extr 21, mito 2, cyto 2, cyto_mito 2

In order to make it easier to use in Galaxy, this wrapper script reformats
this to use tab separators, with one line per compartment prediction:

#ID Compartment Score Rank
gi|301087619|ref|XP_002894699.1| extr 12 1
gi|301087619|ref|XP_002894699.1| mito 4 2
gi|301087619|ref|XP_002894699.1| E.R. 3 3
gi|301087619|ref|XP_002894699.1| golg 3 4
gi|301087619|ref|XP_002894699.1| mito_nucl 3 5
gi|301087623|ref|XP_002894700.1| extr 21 1
gi|301087623|ref|XP_002894700.1| mito 2 2
gi|301087623|ref|XP_002894700.1| cyto 2 3
gi|301087623|ref|XP_002894700.1| cyto_mito 2 4

Additionally, in order to take full advantage of multiple cores, the input
FASTA file is subdivided and multiple copies of WoLF PSORT are run in parallel.
I would normally use Python's multiprocessing library in this situation but it
requires at least Python 2.6 and at the time of writing Galaxy still supports
Python 2.4.
"""

from __future__ import print_function

import os
import sys

from seq_analysis_utils import run_jobs, split_fasta, thread_count

FASTA_CHUNK = 500
exe = "runWolfPsortSummary"

"""
Note: I had trouble getting runWolfPsortSummary on the path (via a link), other
than by including all of /opt/WoLFPSORT_package_v0.2/bin , so used a wrapper
python script called runWolfPsortSummary as follows:

#!/usr/bin/env python
#Wrapper script to call WoLF PSORT from its own directory.
import os
import sys
import subprocess
saved_dir = os.path.abspath(os.curdir)
os.chdir("/opt/WoLFPSORT_package_v0.2/bin")
args = ["./runWolfPsortSummary"] + sys.argv[1:]
return_code = subprocess.call(args)
os.chdir(saved_dir)
sys.exit(return_code)

For more details on this workaround, see:
https://lists.galaxyproject.org/pipermail/galaxy-dev/2015-December/023386.html
"""

if "-v" in sys.argv or "--version" in sys.argv:
    sys.exit("WoLF-PSORT wrapper version 0.0.11")

if len(sys.argv) != 5:
    sys.exit("Require four arguments, organism, threads, input protein FASTA file & output tabular file")

organism = sys.argv[1]
if organism not in ["animal", "plant", "fungi"]:
    sys.exit("Organism argument %s is not one of animal, plant, fungi" % organism)

num_threads = thread_count(sys.argv[2], default=4)
fasta_file = sys.argv[3]
tabular_file = sys.argv[4]


def clean_tabular(raw_handle, out_handle):
    """Clean up WoLF PSORT output to make it tabular."""
    for line in raw_handle:
        # Skip blank lines (which still carry a newline) and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        # First field is the sequence name, the rest is the prediction list.
        name, data = line.rstrip("\r\n").split(None, 1)
        for rank, comp_data in enumerate(data.split(",")):
            comp, score = comp_data.split()
            out_handle.write("%s\t%s\t%s\t%i\n"
                             % (name, comp, score, rank + 1))


fasta_files = split_fasta(fasta_file, tabular_file, n=FASTA_CHUNK)
temp_files = [f + ".out" for f in fasta_files]
assert len(fasta_files) == len(temp_files)
jobs = ["%s %s < %s > %s" % (exe, organism, fasta, temp)
        for (fasta, temp) in zip(fasta_files, temp_files)]
assert len(fasta_files) == len(temp_files) == len(jobs)


def clean_up(file_list):
    for f in file_list:
        if os.path.isfile(f):
            os.remove(f)


if len(jobs) > 1 and num_threads > 1:
    # A small "info" message for Galaxy to show the user.
    print("Using %i threads for %i tasks" % (min(num_threads, len(jobs)), len(jobs)))
results = run_jobs(jobs, num_threads)
assert len(fasta_files) == len(temp_files) == len(jobs)
for fasta, temp, cmd in zip(fasta_files, temp_files, jobs):
    error_level = results[cmd]
    try:
        # Only the first line is needed to spot an error message.
        with open(temp) as handle:
            output = handle.readline()
    except IOError:
        output = ""
    if error_level or output.lower().startswith("error running"):
        clean_up(fasta_files)
        clean_up(temp_files)
        # sys.exit() accepts a single argument; a string message is
        # written to stderr and gives exit status 1.
        sys.exit("One or more tasks failed, e.g. %i from %r gave:\n%s"
                 % (error_level, cmd, output))
del results

out_handle = open(tabular_file, "w")
out_handle.write("#ID\tCompartment\tScore\tRank\n")
for temp in temp_files:
    data_handle = open(temp)
    clean_tabular(data_handle, out_handle)
    data_handle.close()
out_handle.close()

clean_up(fasta_files)
clean_up(temp_files)
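
The reformatting described in the module docstring can be tried in isolation, without WoLF PSORT or Galaxy installed. The following is a minimal sketch only: `reformat_wolf_psort` is a hypothetical helper name (it is not part of the wrapper), and it applies the same parsing logic as `clean_tabular` to one of the sample lines quoted in the docstring.

```python
import io


def reformat_wolf_psort(raw_lines):
    """Yield (name, compartment, score, rank) tuples from summary lines.

    Hypothetical stand-alone helper mirroring the wrapper's clean_tabular.
    """
    for line in raw_lines:
        line = line.strip()
        # Skip blank lines and comment lines such as "# k used for kNN is: 27".
        if not line or line.startswith("#"):
            continue
        # The sequence name comes first, then comma separated predictions.
        name, data = line.split(None, 1)
        for rank, comp_data in enumerate(data.split(","), start=1):
            comp, score = comp_data.split()
            yield name, comp, score, rank


# Sample input copied from the module docstring.
sample = """# k used for kNN is: 27
gi|301087623|ref|XP_002894700.1| extr 21, mito 2, cyto 2, cyto_mito 2
"""

rows = list(reformat_wolf_psort(io.StringIO(sample)))
for row in rows:
    # One tab separated line per compartment prediction.
    print("%s\t%s\t%s\t%i" % row)
```

Yielding tuples rather than writing to a handle keeps the sketch easy to inspect; the wrapper itself writes each row directly to the output file.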