comparison tools/protein_analysis/tmhmm2.py @ 2:6901298ac16c

Migrated tool version 0.0.5 from old tool shed archive to new tool shed repository
author peterjc
date Tue, 07 Jun 2011 18:04:39 -0400
parents 3ff1dcbb9440
children 9b45a8743100
--- tools/protein_analysis/tmhmm2.py 1:3ff1dcbb9440
+++ tools/protein_analysis/tmhmm2.py 2:6901298ac16c
@@ -4,17 +4,20 @@
 This script takes exactly two command line arguments - an input protein FASTA
 filename and an output tabular filename. It then calls the standalone TMHMM
 v2.0 program (not the webservice) requesting the short output (one line per
 protein).
 
-First major feature is cleaning up the tabular output. The raw output from
-TMHMM v2.0 looks like this (six columns tab separated):
+The first major feature is cleaning up the tabular output. The short form raw
+output from TMHMM v2.0 looks like this (six columns tab separated):
 
 gi|2781234|pdb|1JLY|B len=304 ExpAA=0.01 First60=0.00 PredHel=0 Topology=o
 gi|4959044|gb|AAD34209.1|AF069992_1 len=600 ExpAA=0.00 First60=0.00 PredHel=0 Topology=o
 gi|671626|emb|CAA85685.1| len=473 ExpAA=0.19 First60=0.00 PredHel=0 Topology=o
 gi|3298468|dbj|BAA31520.1| len=107 ExpAA=59.37 First60=31.17 PredHel=3 Topology=o23-45i52-74o89-106i
+
+If there are any additional 'comment' lines starting with the hash (#)
+character these are ignored by this script.
 
 In order to make it easier to use in Galaxy, this wrapper script simplifies
 this to remove the redundant tags, and instead adds a comment line at the
 top with the column names:
 
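The tag removal described in the docstring can be sketched for a single line as follows. This is an illustrative sketch, not the wrapper's own `clean_tabular`: the helper name `strip_tags` is hypothetical, and it assumes the six fields are tab separated (as the docstring states) with every field after the identifier in the `tag=value` form shown in the example lines.

```python
def strip_tags(raw_line):
    """Return the six TMHMM short-format fields without their 'tag=' prefixes.

    Hypothetical helper (not part of tmhmm2.py). Assumes a tab separated line
    such as: identifier, len=..., ExpAA=..., First60=..., PredHel=..., Topology=...
    """
    parts = raw_line.rstrip("\r\n").split("\t")
    if len(parts) != 6:
        raise ValueError("Expected 6 tab separated fields, got %i" % len(parts))
    identifier = parts[0]
    expected_tags = ["len", "ExpAA", "First60", "PredHel", "Topology"]
    values = []
    for tag, field in zip(expected_tags, parts[1:]):
        prefix = tag + "="
        if not field.startswith(prefix):
            raise ValueError("Expected %r prefix in %r" % (prefix, field))
        # Keep only the value after the redundant tag
        values.append(field[len(prefix):])
    return [identifier] + values


line = ("gi|3298468|dbj|BAA31520.1|\tlen=107\tExpAA=59.37\tFirst60=31.17"
        "\tPredHel=3\tTopology=o23-45i52-74o89-106i\n")
print("\t".join(strip_tags(line)))
```

Checking each tag explicitly, rather than just splitting on "=", means an unexpected change in TMHMM's output layout raises an error instead of silently producing misaligned columns.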
@@ -53,11 +56,12 @@
 
 def clean_tabular(raw_handle, out_handle):
     """Clean up tabular TMHMM output, returns output line count."""
     count = 0
     for line in raw_handle:
-        if not line:
+        if not line.strip() or line.startswith("#"):
+            #Ignore any blank lines or comment lines
             continue
         parts = line.rstrip("\r\n").split("\t")
         try:
             identifier, length, expAA, first60, predhel, topology = parts
         except:
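The revised blank-line test matters because iterating over a file yields each line with its trailing newline, so a blank line arrives as "\n", which is truthy; the old `if not line:` check therefore never skipped it. A minimal stand-alone demonstration, using `io.StringIO` in place of the real TMHMM output file and a hypothetical `data_lines` generator that applies the new condition:

```python
import io

raw = io.StringIO(
    "# a comment line\n"
    "\n"
    "gi|2781234|pdb|1JLY|B\tlen=304\tExpAA=0.01\tFirst60=0.00\tPredHel=0\tTopology=o\n"
)


def data_lines(handle):
    """Yield only non-blank, non-comment lines (hypothetical helper)."""
    for line in handle:
        # A blank line is "\n" here, so 'not line' alone would be False;
        # line.strip() reduces it to "", which is falsy.
        if not line.strip() or line.startswith("#"):
            continue
        yield line


# The old test only drops lines that are completely empty strings
raw.seek(0)
old_kept = [line for line in raw if line]
raw.seek(0)
new_kept = list(data_lines(raw))
print(len(old_kept), len(new_kept))  # 3 1
```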
@@ -80,11 +84,11 @@
 
 #Note that if the input FASTA file contains no sequences,
 #split_fasta returns an empty list (i.e. zero temp files).
 fasta_files = split_fasta(fasta_file, tabular_file, FASTA_CHUNK)
 temp_files = [f+".out" for f in fasta_files]
-jobs = ["tmhmm %s > %s" % (fasta, temp)
+jobs = ["tmhmm -short %s > %s" % (fasta, temp)
         for fasta, temp in zip(fasta_files, temp_files)]
 
 def clean_up(file_list):
     for f in file_list:
         if os.path.isfile(f):
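The job list change adds the `-short` flag so that TMHMM emits the one-line-per-protein format the cleaning code expects. The same construction can be sketched as a small function; the name `build_jobs` and the chunk file names are hypothetical, and quoting the paths with `shlex.quote` is an addition not present in the wrapper, guarding against spaces or shell metacharacters in temporary file names.

```python
import shlex


def build_jobs(fasta_files):
    """Return (temp_files, jobs) for a list of FASTA chunk filenames.

    Hypothetical helper mirroring the list comprehensions in the diff:
    one '.out' file per chunk and one shell command per chunk that
    redirects TMHMM's short output into that file.
    """
    temp_files = [f + ".out" for f in fasta_files]
    jobs = ["tmhmm -short %s > %s" % (shlex.quote(fasta), shlex.quote(temp))
            for fasta, temp in zip(fasta_files, temp_files)]
    return temp_files, jobs


temps, jobs = build_jobs(["chunk.0.fasta", "my chunk.1.fasta"])
for job in jobs:
    print(job)
```

An empty `fasta_files` list gives empty `temp_files` and `jobs`, which matches the comment in the diff that an input file with no sequences produces zero temporary files.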