annotate tools/protein_analysis/wolf_psort.py @ 5:0f1c61998b22

Migrated tool version 0.0.8 from old tool shed archive to new tool shed repository
author peterjc
date Tue, 07 Jun 2011 18:06:27 -0400
parents
children a290c6d4e658
#!/usr/bin/env python
"""Wrapper for WoLF PSORT v0.2 for use in Galaxy.

This script takes exactly four command line arguments:
 * the organism type (animal, plant or fungi)
 * number of threads to use (integer)
 * an input protein FASTA filename
 * output tabular filename.

It then calls the standalone WoLF PSORT v0.2 program runWolfPsortSummary
(not the webservice), and converts the output from something like this:

# k used for kNN is: 27
gi|301087619|ref|XP_002894699.1| extr 12, mito 4, E.R. 3, golg 3, mito_nucl 3
gi|301087623|ref|XP_002894700.1| extr 21, mito 2, cyto 2, cyto_mito 2

In order to make it easier to use in Galaxy, this wrapper script reformats
this to use tab separators, with one line per compartment prediction:

#ID Compartment Score Rank
gi|301087619|ref|XP_002894699.1| extr 12 1
gi|301087619|ref|XP_002894699.1| mito 4 2
gi|301087619|ref|XP_002894699.1| E.R. 3 3
gi|301087619|ref|XP_002894699.1| golg 3 4
gi|301087619|ref|XP_002894699.1| mito_nucl 3 5
gi|301087623|ref|XP_002894700.1| extr 21 1
gi|301087623|ref|XP_002894700.1| mito 2 2
gi|301087623|ref|XP_002894700.1| cyto 2 3
gi|301087623|ref|XP_002894700.1| cyto_mito 2 4

Additionally, to take full advantage of multiple cores, the input FASTA file is
subdivided and multiple copies of WoLF PSORT are run in parallel. I would
normally use Python's multiprocessing library in this situation but it requires
at least Python 2.6 and at the time of writing Galaxy still supports Python 2.4.
"""
import sys
import os
from seq_analysis_utils import stop_err, split_fasta, run_jobs

FASTA_CHUNK = 500
exe = "runWolfPsortSummary"

"""
Note: I had trouble getting runWolfPsortSummary on the path, so used a wrapper
python script called runWolfPsortSummary as follows:

#!/usr/bin/env python
#Wrapper script to call WoLF PSORT from its own directory.
import os
import sys
import subprocess
saved_dir = os.path.abspath(os.curdir)
os.chdir("/opt/WoLFPSORT_package_v0.2/bin")
args = ["./runWolfPsortSummary"] + sys.argv[1:]
return_code = subprocess.call(args)
os.chdir(saved_dir)
sys.exit(return_code)
"""

if len(sys.argv) != 5:
    stop_err("Require four arguments, organism, threads, input protein FASTA file & output tabular file")

organism = sys.argv[1]
if organism not in ["animal", "plant", "fungi"]:
    stop_err("Organism argument %s is not one of animal, plant, fungi" % organism)

try:
    num_threads = int(sys.argv[2])
except ValueError:
    #A non-integer thread count is rejected by the check below.
    num_threads = 0
if num_threads < 1:
    #Report the threads argument itself (argv[2]); argv[3] is the FASTA filename.
    stop_err("Threads argument %s is not a positive integer" % sys.argv[2])

fasta_file = sys.argv[3]

tabular_file = sys.argv[4]

def clean_tabular(raw_handle, out_handle):
    """Clean up WoLF PSORT output to make it tabular."""
    for line in raw_handle:
        #Skip blank lines (which still carry a newline) and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        name, data = line.rstrip("\r\n").split(None,1)
        for rank, comp_data in enumerate(data.split(",")):
            comp, score = comp_data.split()
            out_handle.write("%s\t%s\t%s\t%i\n" \
                             % (name, comp, score, rank+1))

fasta_files = split_fasta(fasta_file, tabular_file, n=FASTA_CHUNK)
temp_files = [f+".out" for f in fasta_files]
assert len(fasta_files) == len(temp_files)
jobs = ["%s %s < %s > %s" % (exe, organism, fasta, temp)
        for (fasta, temp) in zip(fasta_files, temp_files)]
assert len(fasta_files) == len(temp_files) == len(jobs)

def clean_up(file_list):
    for f in file_list:
        if os.path.isfile(f):
            os.remove(f)

if len(jobs) > 1 and num_threads > 1:
    #A small "info" message for Galaxy to show the user.
    print "Using %i threads for %i tasks" % (min(num_threads, len(jobs)), len(jobs))
results = run_jobs(jobs, num_threads)
assert len(fasta_files) == len(temp_files) == len(jobs)
for fasta, temp, cmd in zip(fasta_files, temp_files, jobs):
    error_level = results[cmd]
    try:
        output = open(temp).readline()
    except IOError:
        output = ""
    if error_level or output.lower().startswith("error running"):
        clean_up(fasta_files)
        clean_up(temp_files)
        stop_err("One or more tasks failed, e.g. %i from %r gave:\n%s" % (error_level, cmd, output),
                 error_level)
del results

out_handle = open(tabular_file, "w")
out_handle.write("#ID\tCompartment\tScore\tRank\n")
for temp in temp_files:
    data_handle = open(temp)
    clean_tabular(data_handle, out_handle)
    data_handle.close()
out_handle.close()

clean_up(fasta_files)
clean_up(temp_files)
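
The reformatting step described in the module docstring can be tried in isolation, without WoLF PSORT or Galaxy installed. The following is only a minimal sketch of that step, not part of the wrapper: the generator name `iter_predictions` and the `sample` text are hypothetical, introduced here for illustration, and the sample reuses one record from the docstring's example output. Scores are left as strings, as in the wrapper.

```python
def iter_predictions(raw_lines):
    """Yield (identifier, compartment, score, rank) for each prediction.

    Hypothetical helper mirroring the parsing done in clean_tabular.
    """
    for line in raw_lines:
        line = line.strip()
        # Skip blank lines and the "# k used for kNN is: ..." comment line.
        if not line or line.startswith("#"):
            continue
        # The identifier is the first whitespace-delimited field.
        name, data = line.split(None, 1)
        # Predictions are comma separated and already ordered by score.
        for rank, comp_data in enumerate(data.split(",")):
            comp, score = comp_data.split()
            yield name, comp, score, rank + 1


# Sample text copied from the docstring example (one record only).
sample = """# k used for kNN is: 27
gi|301087623|ref|XP_002894700.1| extr 21, mito 2, cyto 2, cyto_mito 2
"""

rows = list(iter_predictions(sample.splitlines()))
for row in rows:
    print("%s\t%s\t%s\t%i" % row)
```

Yielding tuples rather than writing to a handle keeps the parsing separate from file handling, which makes the sketch easy to check on its own. The wrapper itself writes each row directly to the output file.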