annotate tools/protein_analysis/wolf_psort.py @ 8:976a5f2833cd draft

Uploaded v0.1.1 of the bundle, which fixes an error in the header of the tabular output produced for Promoter 2.0
author peterjc
date Mon, 30 Jul 2012 12:56:54 -0400
parents a290c6d4e658
children e52220a9ddad
rev   line source
5
0f1c61998b22 Migrated tool version 0.0.8 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
#!/usr/bin/env python
"""Wrapper for WoLF PSORT v0.2 for use in Galaxy.

This script takes exactly four command line arguments:
 * the organism type (animal, plant or fungi)
 * number of threads to use (integer)
 * an input protein FASTA filename
 * output tabular filename.

It then calls the standalone WoLF PSORT v0.2 program runWolfPsortSummary
(not the webservice), and converts the output from something like this:

# k used for kNN is: 27
gi|301087619|ref|XP_002894699.1| extr 12, mito 4, E.R. 3, golg 3, mito_nucl 3
gi|301087623|ref|XP_002894700.1| extr 21, mito 2, cyto 2, cyto_mito 2

In order to make it easier to use in Galaxy, this wrapper script reformats
this to use tab separators, with one line per compartment prediction:

#ID Compartment Score Rank
gi|301087619|ref|XP_002894699.1| extr 12 1
gi|301087619|ref|XP_002894699.1| mito 4 2
gi|301087619|ref|XP_002894699.1| E.R. 3 3
gi|301087619|ref|XP_002894699.1| golg 3 4
gi|301087619|ref|XP_002894699.1| mito_nucl 3 5
gi|301087623|ref|XP_002894700.1| extr 21 1
gi|301087623|ref|XP_002894700.1| mito 2 2
gi|301087623|ref|XP_002894700.1| cyto 2 3
gi|301087623|ref|XP_002894700.1| cyto_mito 2 4

Additionally, in order to take full advantage of multiple cores, multiple
copies of WoLF PSORT are run in parallel by subdividing the input FASTA file.
I would normally use Python's multiprocessing library in this situation, but
it requires at least Python 2.6 and at the time of writing Galaxy still
supports Python 2.4.
"""
import sys
import os
from seq_analysis_utils import stop_err, split_fasta, run_jobs

FASTA_CHUNK = 500
exe = "runWolfPsortSummary"

"""
6
a290c6d4e658 Migrated tool version 0.0.9 from old tool shed archive to new tool shed repository
peterjc
parents: 5
diff changeset
Note: I had trouble getting runWolfPsortSummary on the path via a link (other
than by including all of /opt/WoLFPSORT_package_v0.2/bin), so I used a wrapper
Python script called runWolfPsortSummary as follows:

#!/usr/bin/env python
#Wrapper script to call WoLF PSORT from its own directory.
import os
import sys
import subprocess
saved_dir = os.path.abspath(os.curdir)
os.chdir("/opt/WoLFPSORT_package_v0.2/bin")
args = ["./runWolfPsortSummary"] + sys.argv[1:]
return_code = subprocess.call(args)
os.chdir(saved_dir)
sys.exit(return_code)
"""

if len(sys.argv) != 5:
    stop_err("Require four arguments, organism, threads, input protein FASTA file & output tabular file")

organism = sys.argv[1]
if organism not in ["animal", "plant", "fungi"]:
    stop_err("Organism argument %s is not one of animal, plant, fungi" % organism)

try:
    num_threads = int(sys.argv[2])
except ValueError:
    #Not an integer, reported by the check below.
    num_threads = 0
if num_threads < 1:
    stop_err("Threads argument %s is not a positive integer" % sys.argv[2])

fasta_file = sys.argv[3]

tabular_file = sys.argv[4]

def clean_tabular(raw_handle, out_handle):
    """Clean up WoLF PSORT output to make it tabular."""
    for line in raw_handle:
        #Skip blank lines and comment lines such as "# k used for kNN is: 27"
        if not line.strip() or line.startswith("#"):
            continue
        #Split the sequence name from the comma separated predictions
        name, data = line.rstrip("\r\n").split(None,1)
        for rank, comp_data in enumerate(data.split(",")):
            comp, score = comp_data.split()
            out_handle.write("%s\t%s\t%s\t%i\n" \
                             % (name, comp, score, rank+1))

fasta_files = split_fasta(fasta_file, tabular_file, n=FASTA_CHUNK)
temp_files = [f+".out" for f in fasta_files]
assert len(fasta_files) == len(temp_files)
jobs = ["%s %s < %s > %s" % (exe, organism, fasta, temp)
        for (fasta, temp) in zip(fasta_files, temp_files)]
assert len(fasta_files) == len(temp_files) == len(jobs)

def clean_up(file_list):
    for f in file_list:
        if os.path.isfile(f):
            os.remove(f)

if len(jobs) > 1 and num_threads > 1:
    #A small "info" message for Galaxy to show the user.
    print("Using %i threads for %i tasks" % (min(num_threads, len(jobs)), len(jobs)))
results = run_jobs(jobs, num_threads)
assert len(fasta_files) == len(temp_files) == len(jobs)
for fasta, temp, cmd in zip(fasta_files, temp_files, jobs):
    error_level = results[cmd]
    try:
        output = open(temp).readline()
    except IOError:
        output = ""
    if error_level or output.lower().startswith("error running"):
        clean_up(fasta_files)
        clean_up(temp_files)
        stop_err("One or more tasks failed, e.g. %i from %r gave:\n%s" % (error_level, cmd, output),
                 error_level)
del results

out_handle = open(tabular_file, "w")
out_handle.write("#ID\tCompartment\tScore\tRank\n")
for temp in temp_files:
    data_handle = open(temp)
    clean_tabular(data_handle, out_handle)
    data_handle.close()
out_handle.close()

clean_up(fasta_files)
clean_up(temp_files)
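
The reformatting step described in the module docstring can be tried in isolation. What follows is a minimal standalone sketch, not part of the script itself: it is written for Python 3 (the wrapper targets Python 2), it uses in-memory io.StringIO handles in place of the temporary files, and it skips whitespace-only lines as a defensive measure. The sample input is the second record from the docstring.

```python
import io


def clean_tabular(raw_handle, out_handle):
    """Reformat WoLF PSORT summary lines as tab-separated rows."""
    for line in raw_handle:
        # Skip blank lines and comments such as "# k used for kNN is: 27".
        if not line.strip() or line.startswith("#"):
            continue
        # The first whitespace-delimited token is the sequence name; the
        # rest is a comma-separated list of "compartment score" pairs.
        name, data = line.rstrip("\r\n").split(None, 1)
        for rank, comp_data in enumerate(data.split(",")):
            comp, score = comp_data.split()
            # Rank is 1-based and follows the order WoLF PSORT reported.
            out_handle.write("%s\t%s\t%s\t%i\n" % (name, comp, score, rank + 1))


# In-memory stand-ins for the temporary output file and the tabular file.
raw = io.StringIO(
    "# k used for kNN is: 27\n"
    "gi|301087623|ref|XP_002894700.1| extr 21, mito 2, cyto 2, cyto_mito 2\n"
)
out = io.StringIO()
out.write("#ID\tCompartment\tScore\tRank\n")
clean_tabular(raw, out)

rows = out.getvalue().splitlines()
print("\n".join(rows))
```

Each input record expands to one row per predicted compartment, so the single sample record above yields a header line followed by four rows.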