annotate corebio/resource/astral.py @ 15:981eb8c3a756 default tip

Uploaded
author davidmurphy
date Sat, 31 Mar 2012 16:07:07 -0400
parents c55bdc2fb9fa
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
1
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
2 # Copyright 2000 by Jeffrey Chang. All rights reserved.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
3 # Copyright 2001 by Gavin E. Crooks. All rights reserved.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
4 # Modifications Copyright 2004/2005 James Casbon.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
5 # Copyright 2005 by Regents of the University of California. All rights reserved
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
6 # (Major rewrite for conformance to corebio. Gavin Crooks)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
7 #
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
8 # This code is derived from the Biopython distribution and is governed by it's
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
9 # license. Please see the LICENSE file that should have been included
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
10 # as part of this package.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
11
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
12 """ ASTRAL dataset IO.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
13
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
14 From http://astral.berkeley.edu/ :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
15
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
16 The ASTRAL Compendium for Sequence and Structure Analysis
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
17
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
18 The ASTRAL compendium provides databases and tools useful for analyzing protein structures and their sequences. It is partially derived from, and augments the SCOP: Structural Classification of Proteins database. Most of the resources depend upon the coordinate files maintained and distributed by the Protein Data Bank.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
19
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
20 * Classes :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
21 - Raf -- A file ofASTRAL RAF (Rapid Access Format) Sequence Maps.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
22 - RafSeqMap -- A sequence map, a RAF record.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
23 - Res -- A single residue mapping from a RAF record.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
24
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
25 * Functions :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
26 - parse_domain -- Convert an ASTRAL fasta header string into a Scop domain.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
27 - normalize_letters -- Bormalize RAF amino acid codes.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
28
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
29 """
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
30
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
31 # TODO : Need to pull more of James Casbon's Astral code.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
32
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
33 import re
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
34 from copy import copy
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
35
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
36 from corebio.resource.scop import Domain, Residues
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
37 from corebio.data import extended_three_to_one as to_one_letter_code
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
38 from corebio.utils import FileIndex
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
39
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
40 __all__ = ('astral_evalues', 'astral_percent_identities',
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
41 'astral_evalues_filenames', 'normalize_letters', 'parse_domain',
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
42 'Raf', 'RafSeqMap', 'Res')
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
43
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
44 # Percentage identity filtered ASTRAL SCOP genetic domain sequence subset
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
45 astral_percent_identities = [10,20,25,30,35,40,50,70,90,95,100]
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
46
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
47 # E-value filtered ASTRAL SCOP genetic domain sequence subsets, based on PDB SEQRES records.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
48 astral_evalues = [10, 5, 1, 0.5, 0.1, 0.05, 0.01, 0.005, 0.001, 1e-4, 1e-5, 1e-10, 1e-15,1e-20, 1e-25, 1e-50]
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
49
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
50 # A map between evalues and astral filename suffixes.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
51 astral_evalues_filenames = {
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
52 10: 'e+1', 5: 'e+0,7', 1: 'e+0', 0.5: 'e-0,3', 0.1: 'e-1',
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
53 0.05: 'e-1,3', 0.01: 'e-2', 0.005: 'e-2,3', 0.001: 'e-3',
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
54 1e-4: 'e-4', 1e-5: 'e-5', 1e-10: 'e-10', 1e-15: 'e-15',
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
55 1e-20: 'e-20', 1e-25: 'e-25', 1e-50: 'e-50' }
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
56
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
57
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
58
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
59 def normalize_letters(one_letter_code) :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
60 """Convert RAF one-letter amino acid codes into IUPAC standard codes.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
61 Letters are uppercased, and "." ("Unknown") is converted to "X".
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
62 """
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
63 if one_letter_code == '.' :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
64 return 'X'
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
65 else :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
66 return one_letter_code.upper()
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
67
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
68 _domain_re = re.compile(r">?([\w_\.]*)\s+([\w\.]*)\s+\(([^)]*)\) (.*)")
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
69 def parse_domain(str) :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
70 """Convert an ASTRAL fasta header string into a Scop domain.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
71
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
72 An ASTRAL (http://astral.stanford.edu/) header contains a concise
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
73 description of a SCOP domain. A very similar format is used when a
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
74 Domain object is converted into a string. The Domain returned by this
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
75 method contains most of the SCOP information, but it will not be located
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
76 within the SCOP hierarchy (i.e. The parent node will be None). The
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
77 description is composed of the SCOP protein and species descriptions.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
78
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
79 A typical ASTRAL header looks like --
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
80 >d1tpt_1 a.46.2.1 (1-70) Thymidine phosphorylase {Escherichia coli}
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
81 """
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
82
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
83 m = _domain_re.match(str)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
84 if (not m) : raise ValueError("Domain: "+ str)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
85
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
86 dom = Domain()
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
87 dom.sid = m.group(1)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
88 dom.sccs = m.group(2)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
89 dom.residues = Residues(m.group(3))
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
90 if not dom.residues.pdbid :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
91 dom.residues.pdbid= dom.sid[1:5]
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
92 dom.description = m.group(4).strip()
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
93
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
94 return dom
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
95
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
96
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
97 class Raf(FileIndex) :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
98 """ASTRAL RAF (Rapid Access Format) Sequence Maps.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
99
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
100 The ASTRAL RAF Sequence Maps record the relationship between the PDB SEQRES
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
101 records (representing the sequence of the molecule used in an experiment)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
102 and the ATOM records (representing the atoms experimentally observed).
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
103
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
104 This data is derived from the Protein Data Bank CIF files. Known errors in
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
105 the CIF files are corrected manually, with the original PDB file serving as
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
106 the final arbiter in case of discrepancies.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
107
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
108 Residues are referenced by residue ID. This consists of a the PDB residue
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
109 sequence number (up to 4 digits) and an optional PDB insertion code (an
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
110 ascii alphabetic character, a-z, A-Z). e.g. "1", "10A", "1010b", "-1"
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
111
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
112 See "ASTRAL RAF Sequence Maps":http://astral.stanford.edu/raf.html
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
113
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
114 The RAF file itself is about 50 MB. Each line consists of a sequence map of
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
115 a different protein chain. This index provides rapid, random
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
116 access of RAF records without having to load the entire file into memory.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
117
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
118 This class does not load the entire RAF file into memory. Instead, it
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
119 reads the file once, noting the location and content of each RafSeqMap.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
120 The index key is a concatenation of the PDB ID and chain ID. e.g
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
121 "2drcA", "155c_". RAF uses an underscore to indicate blank
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
122 chain IDs. Custom maps of subsequences or spanning multiple chains, can
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
123 be constructed with the get_seqmap method.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
124
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
125 """
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
126 def __init__(self, raf_file) :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
127 def linekey(line) :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
128 if not line or len(line)<5 or line.isspace() or line[0]=='#':
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
129 return None
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
130 return line[0:5]
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
131 def parser( f) : return RafSeqMap(f.readline())
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
132
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
133 FileIndex.__init__(self, raf_file, linekey, parser)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
134
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
135
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
136 def get_seqmap(self, residues) :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
137 """Get the sequence map for a collection of residues.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
138
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
139 residues -- A SCOP style description of a collection of residues from a
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
140 PDB strucure, (e.g. '(1bba A:10-20,B:)'), as a string or a
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
141 scop.Residues instance.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
142 """
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
143 if type(residues)== str :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
144 residues = Residues(residues)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
145
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
146 pdbid = residues.pdbid
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
147 frags = residues.fragments
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
148 if not frags: frags =(('_','',''),) # All residues of unnamed chain
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
149
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
150 seqMap = None
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
151 for frag in frags :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
152 chainid = frag[0]
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
153 if chainid=='' or chainid=='-' or chainid==' ' or chainid=='_':
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
154 chainid = '_'
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
155 sid = pdbid + chainid
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
156
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
157 sm = self[sid]
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
158
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
159 # Cut out fragment of interest
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
160 start = 0
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
161 end = len(sm.res)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
162 if frag[1] : start = int(sm.index(frag[1], chainid))
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
163 if frag[2] : end = int(sm.index(frag[2], chainid)+1)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
164
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
165 sm = sm[start:end]
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
166
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
167 if seqMap is None :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
168 seqMap = sm
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
169 else :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
170 seqMap += sm
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
171
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
172 return seqMap
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
173 # End Raf
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
174
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
175 class RafSeqMap(object) :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
176 """ASTRAL RAF (Rapid Access Format) Sequence Maps.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
177
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
178 RafSeqMap is a list like object; You can find the location of particular
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
179 residues with index(), slice this RafSeqMap into fragments, and glue
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
180 fragments back together with extend().
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
181
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
182 - pdbid -- The PDB 4 character ID
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
183 - pdb_datestamp -- From the PDB file
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
184 - version -- The RAF format version. e.g. 0.01
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
185 - flags -- RAF flags. (See release notes for more information.)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
186 - res -- A list of Res objects, one for each residue in this sequence map
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
187 """
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
188
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
189 def __init__(self, raf_record=None) :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
190 """Parses a RAF record into a RafSeqMap object."""
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
191
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
192 self.pdbid = ''
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
193 self.pdb_datestamp = ''
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
194 self.version = ''
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
195 self.flags = ''
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
196 self.res = []
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
197
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
198 if not raf_record : return
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
199
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
200 header_len = 38
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
201 line = raf_record.rstrip() # no trailing whitespace
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
202
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
203 if len(line)<header_len:
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
204 raise ValueError("Incomplete header: "+line)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
205
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
206 self.pdbid = line[0:4]
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
207 chainid = line[4:5]
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
208
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
209 self.version = line[6:10]
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
210
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
211 # Raf format versions 0.01 and 0.02 are identical for practical purposes
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
212 if(self.version != "0.01" and self.version !="0.02") :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
213 raise ValueError("Incompatible RAF version: "+self.version)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
214
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
215 self.pdb_datestamp = line[14:20]
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
216 self.flags = line[21:27]
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
217
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
218 for i in range(header_len, len(line), 7) :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
219 f = line[i : i+7]
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
220 if len(f)!=7:
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
221 raise ValueError("Corrupt Field: ("+f+")" )
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
222 r = Res()
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
223 r.chainid = chainid
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
224 r.resid = f[0:5].strip()
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
225 r.atom = normalize_letters(f[5:6])
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
226 r.seqres = normalize_letters(f[6:7])
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
227
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
228 self.res.append(r)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
229 # end __init__
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
230
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
231 #@staticmethod
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
232 def records(raf_file) :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
233 """Iterates over a Raf file, generating RafSeqMaps """
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
234 for line in raf_file:
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
235 if line[0] =='#': continue # A comment
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
236 if line.isspace() : continue
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
237 yield RafSeqMap(line)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
238 records = staticmethod(records)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
239
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
240 def index(self, resid, chainid="_") :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
241 for i in range(0, len(self.res)) :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
242 if self.res[i].resid == resid and self.res[i].chainid == chainid :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
243 return i
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
244 raise KeyError("No such residue "+chainid+resid)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
245
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
246 def __getslice__(self, i, j) :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
247 s = copy(self)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
248 s.res = s.res[i:j]
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
249 return s
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
250
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
251 def append(self, res) :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
252 """Append another Res object onto the list of residue mappings."""
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
253 self.res.append(res)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
254
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
255 def extend(self, other) :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
256 """Append another RafSeqMap onto the end of self.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
257
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
258 Both RafSeqMaps must have the same PDB ID, PDB datestamp and
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
259 RAF version. The RAF flags are erased if they are inconsistent. This
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
260 may happen when fragments are taken from different chains.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
261 """
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
262 if not isinstance(other, RafSeqMap):
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
263 raise TypeError("Can only extend a RafSeqMap with a RafSeqMap.")
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
264 if self.pdbid != other.pdbid :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
265 raise TypeError("Cannot add fragments from different proteins.")
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
266 if self.version != other.version :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
267 raise TypeError("Incompatible rafs.")
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
268 if self.pdb_datestamp != other.pdb_datestamp :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
269 raise TypeError("Different pdb dates!")
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
270 if self.flags != other.flags :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
271 self.flags = ''
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
272 self.res += other.res
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
273
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
274 def __iadd__(self, other) :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
275 self.extend(other)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
276 return self
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
277
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
278 def __add__(self, other) :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
279 s = copy(self)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
280 s.extend(other)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
281 return s
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
282
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
283 def extract_atoms(self, pdb_handle, out_handle) :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
284 """Extract all relevant ATOM and HETATOM records from a PDB file.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
285
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
286 The PDB file is scanned for ATOM and HETATOM records. If the
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
287 chain ID, residue ID (seqNum and iCode), and residue type match
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
288 a residue in this sequence map, then the record is echoed to the
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
289 output handle.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
290
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
291 This is typically used to find the coordinates of a domain, or other
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
292 residue subset.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
293
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
294 pdb_file -- A handle to the relevant PDB file.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
295 out_file -- All output is written to this stream.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
296 """
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
297 resSet = {}
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
298 for r in self.res :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
299 if r.atom=='X' : # Unknown residue type
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
300 continue
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
301 chainid = r.chainid
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
302 if chainid == '_':
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
303 chainid = ' '
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
304 resid = r.resid
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
305 resSet[(chainid,resid)] = r
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
306
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
307 resFound = {}
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
308 for line in pdb_handle :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
309 if line.startswith("ATOM ") or line.startswith("HETATM") :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
310 chainid = line[21:22]
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
311 resid = line[22:27].strip()
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
312 key = (chainid, resid)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
313 if key in resSet:
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
314 res = resSet[key]
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
315 atom_aa = res.atom
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
316 resName = line[17:20].capitilize()
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
317 if resName in to_one_letter_code :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
318 if to_one_letter_code[resName] == atom_aa :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
319 out_handle.write(line)
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
320 resFound[key] = res
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
321
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
322 if len(resSet) != len(resFound) :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
323 raise RuntimeError('I could not find at least one ATOM or '
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
324 'HETATM record for each and every residue in this sequence map.')
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
325
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
326
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
327 class Res(object) :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
328 """ A single residue mapping from a RAF record.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
329
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
330 - chainid -- A single character chain ID.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
331 - resid -- The residue ID.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
332 - atom -- amino acid one-letter code from ATOM records.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
333 - seqres -- amino acid one-letter code from SEQRES records.
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
334 """
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
335 def __init__(self) :
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
336 self.chainid = ''
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
337 self.resid = ''
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
338 self.atom = ''
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
339 self.seqres = ''
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
340
c55bdc2fb9fa Uploaded
davidmurphy
parents:
diff changeset
341