annotate smart_toolShed/commons/core/parsing/BamParser.py @ 0:e0f8dcca02ed

Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
author yufei-luo
date Thu, 17 Jan 2013 10:52:14 -0500
#
# Copyright INRA-URGI 2009-2012
#
# This software is governed by the CeCILL license under French law and
# abiding by the rules of distribution of free software. You can use,
# modify and/ or redistribute the software under the terms of the CeCILL
# license as circulated by CEA, CNRS and INRIA at the following URL
# "http://www.cecill.info".
#
# As a counterpart to the access to the source code and rights to copy,
# modify and redistribute granted by the license, users are provided only
# with a limited warranty and the software's author, the holder of the
# economic rights, and the successive licensors have only limited
# liability.
#
# In this respect, the user's attention is drawn to the risks associated
# with loading, using, modifying and/or developing or reproducing the
# software by the user in light of its specific status of free software,
# that may mean that it is complicated to manipulate, and that also
# therefore means that it is reserved for developers and experienced
# professionals having in-depth computer knowledge. Users are therefore
# encouraged to load and test the software's suitability as regards their
# requirements in conditions enabling the security of their systems and/or
# data to be ensured and, more generally, to use and operate it in the
# same conditions as regards security.
#
# The fact that you are presently reading this means that you have had
# knowledge of the CeCILL license and that you accept its terms.
#
import re, sys, gzip, struct
from commons.core.parsing.MapperParser import MapperParser
from SMART.Java.Python.structure.Mapping import Mapping
from SMART.Java.Python.structure.SubMapping import SubMapping
from SMART.Java.Python.structure.Interval import Interval


# 4-bit base codes used by BAM packed sequences, indexed by code value.
BAM_DNA_LOOKUP = "=ACMGRSVTWYHKDBN"

# CIGAR operations, indexed by the low 4 bits of each packed CIGAR value.
BAM_CIGAR_LOOKUP = "MIDNSHP=X"
BAM_CIGAR_SHIFT = 4
BAM_CIGAR_MASK = ((1 << BAM_CIGAR_SHIFT) - 1)

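# A minimal sketch of how BAM_DNA_LOOKUP is typically applied, assuming the
# standard BAM layout in which each byte stores two bases (high nibble
# first). decode_sequence is a hypothetical helper written for Python 3; it
# is not part of this module.

```python
BAM_DNA_LOOKUP = "=ACMGRSVTWYHKDBN"


def decode_sequence(packed, length):
    """Decode `length` bases from 4-bit packed sequence bytes (hypothetical helper)."""
    bases = []
    for i in range(length):
        byte = packed[i // 2]
        # Even positions live in the high nibble, odd positions in the low nibble.
        code = (byte >> 4) if i % 2 == 0 else (byte & 0x0F)
        bases.append(BAM_DNA_LOOKUP[code])
    return "".join(bases)


# Five bases packed into three bytes; the last low nibble is padding.
print(decode_sequence(bytes([0x12, 0x48, 0xF0]), 5))  # ACGTN
```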
# Little-endian pack/unpack helpers ('<' forces little-endian byte order).

def pack_int32(x):
    return struct.pack('<i', x)

def pack_uint32(x):
    return struct.pack('<I', x)

def unpack_int8(x):
    return struct.unpack('<b', x)[0]

def unpack_int16(x):
    return struct.unpack('<h', x)[0]

def unpack_int32(x):
    return struct.unpack('<i', x)[0]

def unpack_int64(x):
    return struct.unpack('<q', x)[0]

def unpack_uint8(x):
    return struct.unpack('<B', x)[0]

def unpack_uint16(x):
    return struct.unpack('<H', x)[0]

def unpack_uint32(x):
    return struct.unpack('<I', x)[0]

def unpack_uint64(x):
    return struct.unpack('<Q', x)[0]

def unpack_float(x):
    return struct.unpack('<f', x)[0]

def unpack_string(x):
    length = len(x)
    format_string = "<{0}s".format(length)
    string = struct.unpack(format_string, x)[0]
    # Strip one trailing NUL; slicing avoids an IndexError on empty input.
    if string[-1:] == b'\0':
        return string[:-1]
    else:
        return string

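# A short illustration of what the '<' prefix and the signed/unsigned format
# codes do in the helpers above. reinterpret_int32_as_uint32 is a
# hypothetical name used only for this sketch.

```python
import struct


def reinterpret_int32_as_uint32(value):
    """Pack a signed 32-bit value and read the same bytes back as unsigned."""
    raw = struct.pack('<i', value)        # same call as pack_int32(value)
    return struct.unpack('<I', raw)[0]    # same call as unpack_uint32(raw)


# Little-endian: the least significant byte comes first.
print(struct.pack('<i', 1))               # b'\x01\x00\x00\x00'
print(reinterpret_int32_as_uint32(-2))    # 4294967294
```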
BAM_TAG_CODE = {"c": unpack_int8,
                "C": unpack_uint8,
                "s": unpack_int16,
                "S": unpack_uint16,
                "i": unpack_int32,
                "I": unpack_uint32,
                "f": unpack_float,
                #"A": unpack_int8,
                "A": lambda x: x,
                "Z": unpack_int8,
                "H": unpack_int8}

BAM_TAG_VALUE = {"c": int,
                 "C": int,
                 "s": int,
                 "S": int,
                 "i": int,
                 "I": int,
                 "f": float,
                 "A": lambda x: x}

BAM_TAG_SIZE = {"c": 1,
                "C": 1,
                "s": 2,
                "S": 2,
                "i": 4,
                "I": 4,
                "f": 4,
                "A": 1}


class CigarOp(object):
    def __init__(self, data):
        # High 28 bits: operation length; low 4 bits: operation code.
        self._length = data >> BAM_CIGAR_SHIFT
        self._type = BAM_CIGAR_LOOKUP[ data & BAM_CIGAR_MASK ]


class CigarData(object):
    def __init__(self, data, num_ops):
        self._ops = []
        # Each CIGAR operation is stored as one little-endian uint32.
        for i in range(num_ops):
            cigar_data = unpack_uint32(data[i*4: (i+1)*4])
            self._ops.append(CigarOp(cigar_data))

    def getCigarData(self):
        return self._ops

    def __str__(self):
        return "".join(["%d%s" % (op._length, op._type) for op in self._ops])


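# The CIGAR decoding above can be sketched as a standalone function. This is
# an illustration only: decode_cigar is a hypothetical name, and the
# constants are repeated so the block runs on its own.

```python
import struct

BAM_CIGAR_LOOKUP = "MIDNSHP=X"
BAM_CIGAR_SHIFT = 4
BAM_CIGAR_MASK = (1 << BAM_CIGAR_SHIFT) - 1


def decode_cigar(data, num_ops):
    """Turn `num_ops` packed little-endian uint32 values into a CIGAR string."""
    ops = []
    for i in range(num_ops):
        (packed,) = struct.unpack('<I', data[i * 4:(i + 1) * 4])
        length = packed >> BAM_CIGAR_SHIFT                   # high 28 bits
        op = BAM_CIGAR_LOOKUP[packed & BAM_CIGAR_MASK]       # low 4 bits
        ops.append("%d%s" % (length, op))
    return "".join(ops)


# 36 matches (code 0 = M) followed by a 2-base insertion (code 1 = I).
raw = struct.pack('<II', (36 << 4) | 0, (2 << 4) | 1)
print(decode_cigar(raw, 2))  # 36M2I
```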
class TagsData(object):
    def __init__(self):
        self._tags = {}

    def add(self, tag):
        self._tags[tag._tag] = tag

    def getTags(self):
        return self._tags

    def __str__(self):
        # The dictionary values are TagData objects; convert each to a string before joining.
        return " ".join([str(self._tags[tag]) for tag in sorted(self._tags.keys())])


class TagData(object):
    def __init__(self, tag, type, value):
        self._tag = tag
        self._type = type
        self._value = value

    def __str__(self):
        # Character, string, hex and array types print as-is; integers use %d; floats use %f.
        if self._type in "AZHB":
            return "%s:%s:%s" % (self._tag, self._type, self._value)
        if self._type in "cCsSiI":
            return "%s:%s:%d" % (self._tag, self._type, self._value)
        return "%s:%s:%f" % (self._tag, self._type, self._value)


167 class TagParser(object):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
168 def __init__(self, data):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
169 self._data = data
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
170 self._tags = TagsData()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
171 self._parse()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
172
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
173 def _parse(self):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
174 while self._data:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
175 tag = "%s%s" % (chr(unpack_int8(self._data[0])), chr(unpack_int8(self._data[1])))
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
176 type = chr(unpack_int8(self._data[2]))
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
177 self._data = self._data[3:]
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
178 if type in BAM_TAG_VALUE:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
179 value = self._parseUnique(type)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
180 elif type == "Z":
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
181 value = self._parseString()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
182 elif type == "H":
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
183 size = unpack_int8(self._data[0])
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
184 self._data = self._data[1:]
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
185 value = self._parseSeveral("C", size)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
186 elif type == "B":
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
187 secondType = unpack_int8(self._data[0])
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
188 size = unpack_int8(self._data[1]) + unpack_int8(self._data[2]) * 16 + unpack_int8(self._data[3]) * 16 * 16 + unpack_int8(self._data[4]) * 16 * 16 * 16
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
189 self._data = self._data[5:]
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
190 value = self._parseSeveral(secondType, size)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
191 else:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
192 raise Exception("Cannot parse type '%s'." % (type))
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
193 fullTag = TagData(tag, type, value)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
194 self._tags.add(fullTag)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
195
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
196 def _parseUnique(self, type):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
197 value = BAM_TAG_CODE[type](self._data[:BAM_TAG_SIZE[type]])
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
198 self._data = self._data[BAM_TAG_SIZE[type]:]
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
199 return value
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
200
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
201 def _parseSeveral(self, type, size):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
202 value = []
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
203 for i in range(size):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
204 value.append(self._parseUnique(type))
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
205 return value
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
206
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
207 def _parseString(self):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
208 value = ""
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
209 char = self._data[0]
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
210 self._data = self._data[1:]
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
211 while unpack_int8(char) != 0:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
212 value += char
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
213 char = self._data[0]
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
214 self._data = self._data[1:]
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
215 return value
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
216
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
217 def getTags(self):
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
218 return self._tags.getTags()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
219
    def __str__(self):
        # __str__ must return a string, so convert the tag container explicitly.
        return str(self._tags)


class AlignedRead(object):
    """One BAM alignment record, decoded from its raw bytes."""

    def __init__(self, data, refs):
        self._data = data
        self._refs = refs

    def parse(self):
        # The fixed-size fields must be read first: the later parsers rely on
        # the name length and the number of CIGAR operations to find offsets.
        self._parse_common()
        self._parse_flag_nc()
        self._parse_bin_mq_nl()
        self._parse_name()
        self._parse_cigar()
        self._parse_sequence()
        self._parse_quality()
        self._parse_tags()

    def _parse_common(self):
        ref_id = unpack_int32(self._data[0:4])
        # NOTE: ref_id is -1 for reads without a reference; indexing with -1
        # then silently picks the last reference name.
        self._chromosome = self._refs[ref_id]
        # BAM positions are 0-based; convert to 1-based coordinates.
        self._pos = unpack_int32(self._data[4:8]) + 1
        mate_ref_id = unpack_int32(self._data[20:24])
        if mate_ref_id == -1:
            self._rnext = "*"
        else:
            self._rnext = self._refs[mate_ref_id]
            if self._rnext == self._chromosome:
                self._rnext = "="
        self._pnext = unpack_int32(self._data[24:28]) + 1
        self._tlen = unpack_int32(self._data[28:32])

    def _parse_bin_mq_nl(self):
        # One 32-bit word packs bin (16 bits), mapping quality (8 bits) and
        # read name length (8 bits).
        bin_mq_nl = unpack_uint32(self._data[8:12])
        self._bin = bin_mq_nl >> 16
        self._mappingQuality = (bin_mq_nl >> 8) & 0xff
        self._query_name_length = bin_mq_nl & 0xff

    def _parse_flag_nc(self):
        # One 32-bit word packs the flag (high 16 bits) and the number of
        # CIGAR operations (low 16 bits).
        flag_nc = unpack_uint32(self._data[12:16])
        self._flag = flag_nc >> 16
        self._num_cigar_ops = flag_nc & 0xffff

    def _parse_name(self):
        # The read name starts right after the 32 bytes of fixed-size fields;
        # its stored length counts the terminating NUL byte.
        start = 32
        stop = start + self._query_name_length
        self._name = unpack_string(self._data[start:stop])

    def _parse_cigar(self):
        # Each CIGAR operation is stored as one 32-bit word.
        start = 32 + self._query_name_length
        stop = start + (self._num_cigar_ops * 4)
        _buffer = self._data[start:stop]
        cigar = CigarData(_buffer, self._num_cigar_ops)
        self._cigar = cigar.getCigarData()

    def _parse_sequence(self):
        seq_length = unpack_int32(self._data[16:20])
        start = 32 + self._query_name_length + (self._num_cigar_ops * 4)
        # Bases are packed two per byte; use explicit floor division.
        stop = start + (seq_length + 1) // 2
        _buffer = self._data[start:stop]
        self._sequence = ""
        for i in range(seq_length):
            x = unpack_uint8(_buffer[i // 2])
            # Even positions sit in the high nibble, odd ones in the low nibble.
            index = (x >> (4 * (1 - (i % 2)))) & 0xf
            base = BAM_DNA_LOOKUP[index]
            self._sequence += base

    def _parse_quality(self):
        seq_length = unpack_int32(self._data[16:20])
        start = 32 + self._query_name_length + (self._num_cigar_ops * 4) + (seq_length + 1) // 2
        stop = start + seq_length
        _buffer = self._data[start:stop]
        # Shift raw Phred scores by 33 to get printable quality characters.
        self._quality = "".join(chr(unpack_int8(x) + 33) for x in _buffer)

    def _parse_tags(self):
        seq_length = unpack_int32(self._data[16:20])
        # Tags follow the packed sequence ((seq_length + 1) // 2 bytes) and the
        # qualities (seq_length bytes), and run to the end of the record.
        start = 32 + self._query_name_length + (self._num_cigar_ops * 4) + (seq_length + 1) // 2 + seq_length
        _buffer = self._data[start:]
        tagParser = TagParser(_buffer)
        self._tags = tagParser.getTags()


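The bit manipulation used by `_parse_bin_mq_nl` and `_parse_sequence` can be tried in isolation. The following is a minimal sketch, not the module's own code: the helper names `split_bin_mq_nl` and `decode_sequence` are hypothetical, the `struct` calls stand in for the `unpack_*` helpers defined earlier in the file, and the lookup string is the base table from the BAM specification, which is assumed to match the module's `BAM_DNA_LOOKUP`.

```python
import struct

# Base table from the BAM specification (assumed to match BAM_DNA_LOOKUP).
BAM_DNA_LOOKUP = "=ACMGRSVTWYHKDBN"


def split_bin_mq_nl(raw):
    """Split the 4-byte little-endian word into (bin, mapq, name_length)."""
    (value,) = struct.unpack("<I", raw)
    return value >> 16, (value >> 8) & 0xff, value & 0xff


def decode_sequence(packed, seq_length):
    """Decode seq_length bases from 4-bit packed bytes, high nibble first."""
    data = bytearray(packed)  # yields integers on both Python 2 and 3
    bases = []
    for i in range(seq_length):
        shift = 4 * (1 - (i % 2))  # even index: high nibble; odd: low nibble
        bases.append(BAM_DNA_LOOKUP[(data[i // 2] >> shift) & 0xf])
    return "".join(bases)


word = struct.pack("<I", (4681 << 16) | (37 << 8) | 10)
print(split_bin_mq_nl(word))
print(decode_sequence(b"\x12\x48", 4))
```

An odd-length read leaves the low nibble of the last byte unused, which is why the byte count is `(seq_length + 1) // 2`.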
class FileReader(object):
    """Reads the BAM header, then hands out alignment records one by one."""

    def __init__(self, handle):
        self._handle = handle
        self._readHeader()

    def _readHeader(self):
        magic = unpack_string(self._handle.read(4))
        if magic != "BAM\1":
            # Use %r so that the control character stays readable in the message.
            raise Exception("File should start with 'BAM\\1', starting with %r instead." % (magic,))
        tlen = unpack_int32(self._handle.read(4))
        text = unpack_string(self._handle.read(tlen))
        nrefs = unpack_int32(self._handle.read(4))
        self._refs = []
        for i in range(nrefs):
            sizeName = unpack_int32(self._handle.read(4))
            name = unpack_string(self._handle.read(sizeName))
            size = unpack_int32(self._handle.read(4))
            self._refs.append(name)
        # Remember where the alignments begin so that reset() can rewind.
        self._startPos = self._handle.tell()

    def reset(self):
        self._handle.seek(self._startPos)

    def getNextAlignment(self):
        try:
            blockSize = unpack_int32(self._handle.read(4))
        except struct.error:
            # A short read means the end of the file was reached.
            return False
        block = self._handle.read(blockSize)
        # The returned read is not decoded yet: no call to parse() is made here.
        currentRead = AlignedRead(block, self._refs)
        return currentRead



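The header layout that `_readHeader` walks through can be exercised without a real BAM file. The sketch below is an illustration under stated assumptions, not the module's implementation: `read_bam_header` and `build_bam_header` are hypothetical names, plain `struct` calls replace the module's `unpack_*` helpers, and the stream is an already decompressed in-memory buffer.

```python
import io
import struct


def read_bam_header(handle):
    """Return [(name, length), ...] from an uncompressed BAM header stream."""
    magic = handle.read(4)
    if magic != b"BAM\x01":
        raise ValueError("not a BAM stream: %r" % (magic,))
    (text_length,) = struct.unpack("<i", handle.read(4))
    handle.read(text_length)  # skip the plain-text SAM header
    (n_refs,) = struct.unpack("<i", handle.read(4))
    refs = []
    for _ in range(n_refs):
        (name_length,) = struct.unpack("<i", handle.read(4))
        name = handle.read(name_length).rstrip(b"\x00").decode("ascii")
        (ref_length,) = struct.unpack("<i", handle.read(4))
        refs.append((name, ref_length))
    return refs


def build_bam_header(refs, text=b""):
    """Hypothetical helper: serialize a header so the reader can be tested."""
    out = b"BAM\x01" + struct.pack("<i", len(text)) + text
    out += struct.pack("<i", len(refs))
    for name, ref_length in refs:
        encoded = name.encode("ascii") + b"\x00"  # stored length counts the NUL
        out += struct.pack("<i", len(encoded)) + encoded
        out += struct.pack("<i", ref_length)
    return out


stream = io.BytesIO(build_bam_header([("chr1", 1000), ("chr2", 500)]))
print(read_bam_header(stream))
```

Unlike the listing, this sketch keeps the reference lengths; the listing reads them only to advance the stream.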
def parseAlignedRead(read):
    # Flag bit 0x4 marks an unmapped read: there is nothing to convert.
    if (read._flag & 0x4) == 0x4:
        return None

    mapping = Mapping()
    # Flag bit 0x10 marks a read aligned to the reverse strand.
    direction = 1 if (read._flag & 0x10) == 0x0 else -1
    genomeStart = read._pos
    nbOccurrences = 1
    nbMismatches = 0
    nbMatches = 0
    nbGaps = 0
    subMapping = None
    queryOffset = 0
    targetOffset = 0
    readStart = None

    for tag, value in read._tags.iteritems():
        if tag == "X0":
            nbOccurrences = value._value
        elif tag == "X1":
            nbOccurrences += value._value
        elif tag == "XM":
            nbMismatches = value._value
    mapping.setTagValue("nbOccurrences", nbOccurrences)
    mapping.setTagValue("quality", read._mappingQuality)

    for operation in read._cigar:
        if operation._type == "M":
            if readStart is None:
                readStart = queryOffset
            # Open a new sub-mapping at the first match after a gap or intron.
            if subMapping is None:
                subMapping = SubMapping()
                subMapping.setSize(operation._length)
                subMapping.setDirection(direction)
                subMapping.queryInterval.setName(read._name)
                subMapping.queryInterval.setStart(queryOffset)
                subMapping.queryInterval.setDirection(direction)
                subMapping.targetInterval.setChromosome(read._chromosome)
                subMapping.targetInterval.setStart(genomeStart + targetOffset)
                subMapping.targetInterval.setDirection(1)
            nbMatches += operation._length
            targetOffset += operation._length
            queryOffset += operation._length
            continue
        if operation._type == "I":
            # An insertion consumes bases of the read only.
            nbGaps += 1
            queryOffset += operation._length
            continue
        if operation._type == "D":
            # A deletion closes the current sub-mapping and skips reference bases.
            if subMapping is not None:
                subMapping.queryInterval.setEnd(queryOffset - 1)
                subMapping.targetInterval.setEnd(genomeStart + targetOffset - 1)
                mapping.addSubMapping(subMapping)
            subMapping = None
            nbGaps += 1
            targetOffset += operation._length
            continue
        if operation._type == "N":
            # A skipped region (intron) closes the sub-mapping without counting a gap.
            if subMapping is not None:
                subMapping.queryInterval.setEnd(queryOffset - 1)
                subMapping.targetInterval.setEnd(genomeStart + targetOffset - 1)
                mapping.addSubMapping(subMapping)
            subMapping = None
            targetOffset += operation._length
            continue
        if operation._type == "S":
            # NOTE: in the SAM specification, soft clipping consumes read bases
            # only; this code also advances the target offset.
            nbMismatches += operation._length
            targetOffset += operation._length
            queryOffset += operation._length
            continue
        if operation._type == "H":
            # NOTE: in the SAM specification, hard clipping consumes neither read
            # nor reference bases; this code advances both offsets.
            targetOffset += operation._length
            queryOffset += operation._length
            continue
        if operation._type == "P":
            continue
        raise Exception("Do not understand parameter '%s'" % (operation._type))

    # Close the sub-mapping that is still open at the end of the CIGAR string.
    if subMapping is not None:
        subMapping.queryInterval.setEnd(queryOffset - 1)
        subMapping.targetInterval.setEnd(genomeStart + targetOffset - 1)
        mapping.addSubMapping(subMapping)
    mapping.queryInterval.setStart(readStart)
    mapping.queryInterval.setEnd(queryOffset - 1)
    mapping.targetInterval.setEnd(genomeStart + targetOffset - 1)
    mapping.setNbMismatches(nbMismatches)
    mapping.setNbGaps(nbGaps)
    mapping.queryInterval.setName(read._name)
    mapping.queryInterval.setDirection(direction)
    mapping.targetInterval.setChromosome(read._chromosome)
    mapping.targetInterval.setStart(genomeStart)
    mapping.targetInterval.setDirection(direction)
    mapping.setSize(len(read._sequence))
    mapping.setDirection(direction)
    return mapping


e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
class BamParser(MapperParser):
    """A class that parses BAM format"""

    def __init__(self, fileName, verbosity = 0):
        self.verbosity = verbosity
        self.handle = gzip.open(fileName, "rb")
        self.reader = FileReader(self.handle)
        self.nbMappings = None
        self.fileName = fileName


    def __del__(self):
        self.handle.close()


    @staticmethod
    def getFileFormats():
        return ["bam"]


    def reset(self):
        self.reader.reset()


    def getNextMapping(self):
        # Return the next mapping, or False once the input is exhausted.
        self.currentMapping = None
        while self.currentMapping is None:
            read = self.reader.getNextAlignment()
            if not read:
                self.currentMapping = False
                return False
            read.parse()
            # Loop again if parseAlignedRead() produced None for this read.
            self.currentMapping = parseAlignedRead(read)
        return self.currentMapping


    def setDefaultTagValue(self, name, value):
        pass


    def skipFirstLines(self):
        pass
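
The getNextMapping() method above returns a mapping on each call and False at end of input, silently skipping reads for which no mapping is produced. A minimal sketch of how a caller might drain such a parser is shown below. FakeParser and iterMappings are hypothetical names introduced only for illustration; they are not part of S-MART, and the stand-in reads from an in-memory list rather than a BAM file.

```python
class FakeParser(object):
    """Hypothetical stand-in that mimics the getNextMapping() contract."""

    def __init__(self, reads):
        self._reads = iter(reads)
        self.currentMapping = None

    def getNextMapping(self):
        self.currentMapping = None
        while self.currentMapping is None:
            read = next(self._reads, None)
            if read is None:
                # End of input: signal it with False, as BamParser does.
                self.currentMapping = False
                return False
            # Assumption for this sketch: the string "unmapped" models a
            # read for which no mapping is produced (None), so it is skipped.
            self.currentMapping = None if read == "unmapped" else read
        return self.currentMapping


def iterMappings(parser):
    """Yield mappings until getNextMapping() returns False."""
    while True:
        mapping = parser.getNextMapping()
        if mapping is False:
            return
        yield mapping


mappings = list(iterMappings(FakeParser(["read1", "unmapped", "read2"])))
```

Wrapping the loop in a generator keeps the False sentinel check in one place, so calling code can use an ordinary for loop.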