annotate tools/protein_analysis/wolf_psort.py @ 16:7de64c8b258d draft

Uploaded v0.2.5, MIT licence, RST for README, citation information, development moved to GitHub
author peterjc
date Wed, 18 Sep 2013 06:16:58 -0400
parents 99b82a2b1272
children eb6ac44d4b8e
#!/usr/bin/env python
"""Wrapper for WoLF PSORT v0.2 for use in Galaxy.

This script takes exactly four command line arguments:
* the organism type (animal, plant or fungi)
* number of threads to use (integer)
* an input protein FASTA filename
* output tabular filename.

It then calls the standalone WoLF PSORT v0.2 program runWolfPsortSummary
(not the webservice), and converts the output from something like this:

# k used for kNN is: 27
gi|301087619|ref|XP_002894699.1| extr 12, mito 4, E.R. 3, golg 3, mito_nucl 3
gi|301087623|ref|XP_002894700.1| extr 21, mito 2, cyto 2, cyto_mito 2

In order to make it easier to use in Galaxy, this wrapper script reformats
this to use tab separators, with one line per compartment prediction:

#ID Compartment Score Rank
gi|301087619|ref|XP_002894699.1| extr 12 1
gi|301087619|ref|XP_002894699.1| mito 4 2
gi|301087619|ref|XP_002894699.1| E.R. 3 3
gi|301087619|ref|XP_002894699.1| golg 3 4
gi|301087619|ref|XP_002894699.1| mito_nucl 3 5
gi|301087623|ref|XP_002894700.1| extr 21 1
gi|301087623|ref|XP_002894700.1| mito 2 2
gi|301087623|ref|XP_002894700.1| cyto 2 3
gi|301087623|ref|XP_002894700.1| cyto_mito 2 4

Additionally, to take full advantage of multiple cores, the input FASTA file is
subdivided and multiple copies of WoLF PSORT are run in parallel. I would
normally use Python's multiprocessing library in this situation but it requires
at least Python 2.6 and at the time of writing Galaxy still supports Python 2.4.
"""
import sys
import os
from seq_analysis_utils import stop_err, split_fasta, run_jobs, thread_count

FASTA_CHUNK = 500
exe = "runWolfPsortSummary"

"""
Note: I had trouble getting runWolfPsortSummary on the path (via a link), other
than by including all of /opt/WoLFPSORT_package_v0.2/bin, so I used a wrapper
Python script called runWolfPsortSummary as follows:

#!/usr/bin/env python
#Wrapper script to call WoLF PSORT from its own directory.
import os
import sys
import subprocess
saved_dir = os.path.abspath(os.curdir)
os.chdir("/opt/WoLFPSORT_package_v0.2/bin")
args = ["./runWolfPsortSummary"] + sys.argv[1:]
return_code = subprocess.call(args)
os.chdir(saved_dir)
sys.exit(return_code)
"""
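
The wrapper in the note above changes directory, runs the program, and changes back. As a possible alternative, sketched here for Python 3, `subprocess.call` accepts a `cwd` argument, so only the child process runs in the target directory. The helper name `call_in_dir` is hypothetical and not part of the original tool, and whether this is enough for WoLF PSORT itself is an untested assumption. The demonstration uses the running Python interpreter as a stand-in executable.

```python
import os
import subprocess
import sys


def call_in_dir(directory, program, args):
    """Run `program` (a file name inside `directory`) with that directory as cwd.

    Hypothetical helper: the caller's own working directory is left unchanged.
    """
    exe_path = os.path.join(os.path.abspath(directory), program)
    return subprocess.call([exe_path] + list(args), cwd=directory)


# Demonstration with the Python interpreter standing in for
# runWolfPsortSummary, so the sketch can be run without WoLF PSORT installed.
demo_dir = os.path.dirname(os.path.abspath(sys.executable))
before = os.getcwd()
demo_code = call_in_dir(demo_dir, os.path.basename(sys.executable),
                        ["-c", "import sys; sys.exit(7)"])
after = os.getcwd()
print(demo_code, before == after)
```

For the real tool the call would presumably look like `call_in_dir("/opt/WoLFPSORT_package_v0.2/bin", "runWolfPsortSummary", sys.argv[1:])`, using the install path quoted in the note.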

if len(sys.argv) != 5:
    stop_err("Require four arguments, organism, threads, input protein FASTA file & output tabular file")

organism = sys.argv[1]
if organism not in ["animal", "plant", "fungi"]:
    stop_err("Organism argument %s is not one of animal, plant, fungi" % organism)

num_threads = thread_count(sys.argv[2], default=4)
fasta_file = sys.argv[3]
tabular_file = sys.argv[4]

def clean_tabular(raw_handle, out_handle):
    """Clean up WoLF PSORT output to make it tabular."""
    for line in raw_handle:
        #Skip comment lines and blank lines (a line holding only a
        #newline is not empty, so test the stripped text).
        if not line.strip() or line.startswith("#"):
            continue
        name, data = line.rstrip("\r\n").split(None,1)
        for rank, comp_data in enumerate(data.split(",")):
            comp, score = comp_data.split()
            out_handle.write("%s\t%s\t%s\t%i\n" \
                             % (name, comp, score, rank+1))

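
To illustrate what clean_tabular produces, here is a small self-contained Python 3 sketch that feeds it one of the sample lines from the module docstring through in-memory handles. The function body is a copy that skips whitespace-only lines. The use of `io.StringIO` in place of real files is an assumption made only for the demonstration.

```python
import io


def clean_tabular(raw_handle, out_handle):
    """Write one tab separated line per compartment prediction."""
    for line in raw_handle:
        # Skip comment lines and whitespace-only lines.
        if not line.strip() or line.startswith("#"):
            continue
        # Split the identifier from the comma separated predictions.
        name, data = line.rstrip("\r\n").split(None, 1)
        for rank, comp_data in enumerate(data.split(",")):
            comp, score = comp_data.split()
            out_handle.write("%s\t%s\t%s\t%i\n" % (name, comp, score, rank + 1))


raw = io.StringIO(
    "# k used for kNN is: 27\n"
    "\n"
    "gi|301087623|ref|XP_002894700.1| extr 21, mito 2, cyto 2, cyto_mito 2\n"
)
out = io.StringIO()
clean_tabular(raw, out)
rows = out.getvalue().splitlines()
for row in rows:
    print(row.split("\t"))
```

Each prediction becomes its own row, and the rank is simply the position of the prediction in the original comma separated list, starting at 1.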
fasta_files = split_fasta(fasta_file, tabular_file, n=FASTA_CHUNK)
temp_files = [f+".out" for f in fasta_files]
assert len(fasta_files) == len(temp_files)
jobs = ["%s %s < %s > %s" % (exe, organism, fasta, temp)
        for (fasta, temp) in zip(fasta_files, temp_files)]
assert len(fasta_files) == len(temp_files) == len(jobs)

def clean_up(file_list):
    for f in file_list:
        if os.path.isfile(f):
            os.remove(f)

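
The split_fasta helper used above is imported from seq_analysis_utils, which is not shown here. All that can be read from the call is that it takes the input FASTA filename, a second filename, and a chunk size `n`, and returns a list of filenames. The Python 3 sketch below is one guess at how such a helper could work. The name `split_fasta_sketch`, the treatment of the second argument as a filename prefix, and the `.tmp` naming scheme are all assumptions, not the real implementation.

```python
import os
import tempfile


def split_fasta_sketch(input_filename, output_filename_base, n=500):
    """Split a FASTA file into files of at most n records each.

    Hypothetical stand-in for split_fasta; returns the chunk filenames.
    """
    filenames = []
    out_handle = None
    count = 0
    with open(input_filename) as in_handle:
        for line in in_handle:
            if line.startswith(">"):
                # Start a new chunk file every n records.
                if count % n == 0:
                    if out_handle is not None:
                        out_handle.close()
                    name = "%s.%i.tmp" % (output_filename_base, len(filenames))
                    filenames.append(name)
                    out_handle = open(name, "w")
                count += 1
            # Text before the first record header is dropped.
            if out_handle is not None:
                out_handle.write(line)
    if out_handle is not None:
        out_handle.close()
    return filenames


# Demonstration: five tiny records split into chunks of two.
work_dir = tempfile.mkdtemp()
demo_fasta = os.path.join(work_dir, "demo.fasta")
with open(demo_fasta, "w") as handle:
    for i in range(5):
        handle.write(">seq%i\nMKT\n" % i)
chunks = split_fasta_sketch(demo_fasta, os.path.join(work_dir, "demo"), n=2)
record_counts = []
for chunk in chunks:
    with open(chunk) as handle:
        record_counts.append(handle.read().count(">"))
print(record_counts)
```

With FASTA_CHUNK set to 500, an input of 1200 proteins would give three chunk files under this scheme, and therefore three WoLF PSORT jobs.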
if len(jobs) > 1 and num_threads > 1:
    #A small "info" message for Galaxy to show the user.
    print("Using %i threads for %i tasks" % (min(num_threads, len(jobs)), len(jobs)))
results = run_jobs(jobs, num_threads)
assert len(fasta_files) == len(temp_files) == len(jobs)
for fasta, temp, cmd in zip(fasta_files, temp_files, jobs):
    error_level = results[cmd]
    try:
        output = open(temp).readline()
    except IOError:
        output = ""
    if error_level or output.lower().startswith("error running"):
        clean_up(fasta_files)
        clean_up(temp_files)
        stop_err("One or more tasks failed, e.g. %i from %r gave:\n%s" % (error_level, cmd, output),
                 error_level)
del results

out_handle = open(tabular_file, "w")
out_handle.write("#ID\tCompartment\tScore\tRank\n")
for temp in temp_files:
    data_handle = open(temp)
    clean_tabular(data_handle, out_handle)
    data_handle.close()
out_handle.close()

clean_up(fasta_files)
clean_up(temp_files)
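
The run_jobs helper is also imported from seq_analysis_utils and is not shown here. The script only reveals that it takes a list of shell command strings (they use `<` and `>` redirection, so a shell is needed) plus a thread count, and that its result can be indexed by command string to get an error level. The Python 3 sketch below is one way such a helper might be written today. The name `run_jobs_sketch` and the use of `concurrent.futures` are assumptions. The original targeted Python 2.4, where neither `multiprocessing` nor `concurrent.futures` was available, so its implementation must differ.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_jobs_sketch(jobs, threads):
    """Run shell command strings in parallel; return {command: return code}.

    Hypothetical stand-in for run_jobs. Threads are sufficient here because
    each worker just waits for an external process to finish.
    """
    if not jobs:
        return {}
    workers = max(1, min(threads, len(jobs)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        codes = list(pool.map(lambda cmd: subprocess.call(cmd, shell=True), jobs))
    return dict(zip(jobs, codes))


# Demonstration with trivial shell commands instead of WoLF PSORT.
results = run_jobs_sketch(["exit 0", "exit 3"], 2)
print(results["exit 0"], results["exit 3"])
```

A non-zero value in the returned mapping plays the role of `error_level` in the loop above, which is what triggers the clean-up and the call to stop_err.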