annotate srf2fastq/io_lib-1.12.2/docs/ZTR_format @ 0:d901c9f41a6a default tip

Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
author dawe
date Tue, 07 Jun 2011 17:48:05 -0400
parents
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1 Notes: 28th May 2008
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
2
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
3 For version 2.0 consider the following:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
4
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
5 1) Remove defunct or useless chunk types and compression formats.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
6 2) Rationalise inconsitent behaviour (eg endianness on zlib chunk).
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
7 3) Support split header/data formats for SRF
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
8 4) Formalise meta-data use better.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
9 5) More pie-in-the-sky ideas?
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
10
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
11 What we've described so far could easily be said to be v1.4. It's
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
12 backwards compatible and fairly minor is change. If we truely want to
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
13 go for version 2 then taking the chance to remove all those niggles
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
14 that we've kept purely for backwards compatibility would be good.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
15
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
16 In more detail:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
17
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
18 1) Removal of RLE and floating point chebyshev polynomials. Mark XRLE
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
19 as deprecated?
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
20
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
21 We may wish to add an extra option to XRLE2 to indicate the repeat
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
22 count before specifying the remaining run-length. This breaks the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
23 format though. (Or add XRLE3 to allow such control?)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
24
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
25 2) Strange things I can see are:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
26
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
27 2.1) All chunks use big-endian data except for zlib which has a
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
28 little-endian length.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
29
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
30 2.2) The order that data is stored in differs per chunk type. For
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
31 trace data we store all As, then all Cs, all Gs and finally
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
32 all Ts. For confidence values we store called first followed
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
33 by remaining. Both SMP4 and CNF4 essentially hold 1 piece of
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
34 data per base type per base position, it's just the word size
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
35 and packing order that differs.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
36
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
37 This means TSHIFT and QSHIFT compression types are tied very
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
38 much to trace and quality value chunks, rather than being
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
39 generic transforms. Maybe we should always have the same
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
40 encoding order and some standard compression/transformations
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
41 to reorder as desired.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
42
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
43 An example:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
44 All data related per call is stored in the natural order
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
45 produced. (eg as utilised in CNF1, BPOS).
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
46
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
47 All data related per base-type per call is stored in the order
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
48 produced: A, C, G, T for the first base position, A, C, G, T
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
49 for the second position, and so on.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
50
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
51 Then we have standard filters that can swap between
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
52 ACGTACGTACGT... and AAA...CCC...GGG...TTT... or to
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
53 <called><non-called * 3>... order (which requires a BASE chunk
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
54 present to encode/decode). We'd have 1, 2 and 4 byte variants
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
55 of such filters. They do not need to understand the nature of
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
56 the data they're manipulating, just the word size and a
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
57 predetermined order to shuffle the data around in.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
58
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
59 For CNF4 a combination of {ACGT}* to {<called><non-called*3>}*
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
60 followed by {ACGT}* to A*C*G*T* ordering would end up with all
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
61 <called> followed by all 3 remaining non-called. Ie as it is
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
62 now (which we then promptly "undo" in solexa data by using
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
63 TSHIFT).a
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
64
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
65 3) I'm wondering if there's mileage here in having negative lengths to
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
66 indicate constant data + variable data further on.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
67
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
68 Eg length -10 means the next 10 bytes are the start of the data for
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
69 this chunk. Some stage later we'll read a 4-byte length followed by
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
70 the remaining data for this chunk.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
71
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
72 Rationale: often we end up with many identical bytes at the start
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
73 of a chunk. For example, we take a solexa trace (0 0 value...), run
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
74 it through TSHIFT (80 0 0 0 previous data => 80 0 0 0 0 0 value ...)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
75 and then through STHUFF (77 80(eg) data), but data is the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
76 compressed stream always starting with 80 0 0 0 0 0 so typically it's
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
77 always the same starting string.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
78
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
79 Tested on an SRF file I see SMP4 always starting with the same 9
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
80 bytes of data, BASE starting with the same 3 bytes and CNF4 always
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
81 starting with the same 7 bytes. Hence we'd have lengths -9, -3 and
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
82 -7 in the chunk headers and move that common data to the header
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
83 block too. That's approx 3% of the size of our SRF file.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
84
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
85 4) I propose *all* chunks have some standard meta-data fields
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
86 available for use. These can be:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
87
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
88 4.1) GROUP - all chunks sharing the same GROUP value are considered
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
89 as being related to one another. This provides a mechanism for
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
90 multiple base-call, base position and confidence value chunks
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
91 while still knowing which confidence values belong to which
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
92 call. It also allows for multiple SAMP chunks (instead of the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
93 SMP4 chunk) to be collated together if desired.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
94
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
95 I don't expect many ZTR files to contain calls from multiple
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
96 base-callers, but it's maybe a nice extension and seems quite
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
97 a simple/clean use of meta-data.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
98
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
99 4.2) ENCODING - the default encoding for the chunk data is as
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
100 described in the chunk. We may however wish to override this
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
101 and, for example, store SMP4 data as 32-bit floating point
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
102 values instead of 16-bit integers. This specifies that.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
103
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
104 Question: do we want this available universally everywhere? If
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
105 not, we should at least use the same meta-data keyword for all
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
106 occurrences.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
107
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
108 4.3) TRANSFORM - a simple transformation description. This is
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
109 essentially a mini-formula. It replaces the OFFS meta-data
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
110 used in SMP4 which is simply a transform of X+value.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
111
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
112 5) There are more generic ways to save storage by removing redundancy.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
113
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
114 Most probably they're not worth it, but I list them here for
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
115 discussion still.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
116
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
117 5.1) Use 7-bit variable sized encodings for values instead of fixed
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
118 32-bit sizes.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
119
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
120 Eg instead of storing 1000 as 0x3*0x100 + 0xe8 (00 00 03 e8)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
121 we could store it as 0x7*0x80 + 0x68 (80|07 68). The logic
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
122 here being setting the top bit implies this isn't the final
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
123 value and more data follows. It allows for variable sized
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
124 fields so that small numbers take up fewer bytes. The same can
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
125 be applied to data in SRF structs too.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
126
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
127 Realistically it saves 2 bytes per record in SRF and an
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
128 unknown amount for ZTR - estimated 8 or so (3 for cnf4/base
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
129 and 2 for smp4). It's only 1.5% saving though in total.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
130
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
131 5.2) A general purpose dictionary system. Instead of attempting to
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
132 move headers to one area and data somewhere else, possibly
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
133 also taking common portions of data and putting that somewhere
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
134 too, we could provide a dictionary system whereby we
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
135 previously remove redundancy by replacing all occurrences of a
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
136 particular byte pattern with a new shorter code. (We'd need an
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
137 escape mechanism for when it occurs by chance.) The dictionary
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
138 can then be specified in it's own chunk which is stored in the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
139 header portion.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
140
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
141 This then works for portions of chunk header (eg if the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
142 meta-data changes) rather than full headers, where the data
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
143 blocks always start with the same text, or where we want to
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
144 have sensible names in text fields but don't like them taking
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
145 up too much space.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
146
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
147 It's maybe a bit messy though and complex to implement, plus
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
148 it's unknown how big an impact having to escape accidental
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
149 dictionary codes from appearing in real data. The more formal
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
150 way of removing redundancy is probably better.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
151
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
152 5.3) Lossy compression. I believe there's still room for this,
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
153 although it needs careful thought.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
154
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
155 The floating point format really isn't an ideal way to do it
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
156 though, so I'd much rather have an encoding system that uses
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
157 N*log(signal/M+1) plus a sign bit, stored in integers.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
158
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
159 As we store data in integers the value of N combined with the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
160 maximum value for log(signal/M+1) gives us the number of bits
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
161 we wish to encode to. Essentially we're storing the log value
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
162 to a fixed point precision.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
163
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
164 The value of M dictates the slope of the errors we get from
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
165 logging. It's hard to describe, but basically as signal gets
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
166 larger our average error in storing the signal also gets
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
167 larger. That's true for floating point values too as there's
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
168 a fixed number of bits and they're being used to represent
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
169 larger and larger values, meaning the resolution drops.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
170
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
171 I have various test code and graphs showing error profiles
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
172 for logs vs fixed point vs floating point. Logs or fixed
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
173 point are nearly always preferable to a floating point format
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
174 for size vs accuracy.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
175
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
176 -----------------------------------------------------------------------------
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
177
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
178 CHANGE (since 1.2):
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
179 SAMP and SMP4 now has meta data fields indicating the zero base-line.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
180
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
181 CLARIFICATION
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
182 The specification now explicitly states that trace samples are
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
183 unsigned, although the new OFFS meta-data can be used to turn these
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
184 into signed values.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
185
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
186 CLARIFICATION
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
187 We explicitly state that multiple TEXT chunks maybe present in the ZTR
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
188 file and will be concatenated together. Also the trailing (nul) byte
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
189 is now optional.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
190
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
191 CHANGE
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
192 Added CSET (character set) meta-data for BASEs so ABI SOLID encoding
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
193 can be used. This removes the requirement of IUPAC characters only.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
194
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
195 CHANGE
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
196 Added XRLE2, QSHIFT, TSHIFT and STHUFF compression types.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
197
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
198 INCOMPATIBLE CHANGE:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
199 I propose for this version to make all meta-data adhere to a specific
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
200 format rather than adhoc. It'll consist of zero or more copies of
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
201 'identifier nul value nul'. See the format below for details.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
202
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
203 The only use of meta-data in 1.2 was for SAMP (not SMP4) chunks to
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
204 indicate the channel the data came from. From now on file readers will
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
205 need to check the version number in the header to determine how to
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
206 parse the SAMP meta-data.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
207
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
208
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
209 [Search for "FIXME" for my comments / questions to be answered. They
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
210 elaborate on the summary below and provide more context.]
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
211
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
212
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
213 QUESTION1:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
214 Should we adapt ZTR to not be so inefficient with regards
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
215 to tiny chunks. Specifically a 5 byte chunk size, 4 byte meta-data
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
216 size (normally zero anyway) and 4 byte data length is all
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
217 wasteful. These combined comprise 5-10% of the total SRF size. Note
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
218 that changing this would break backwards compatibility.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
219
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
220 QUESTION2:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
221 Do I need a means to specify the "default meta-data". Specifically if
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
222 we have lots of SAMP chunks (for example) and every single one is
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
223 stating that the zero "offset" value is 32768 then we may want a
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
224 mechanism of specifying that the default OFFS value is 32768 for all
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
225 subsequent SAMP chunks.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
226
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
227 One possible way to do this is to have a new chunk type which sets the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
228 default. Eg for the SAMP chunk we could define a SaMP chunk to modify
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
229 the default for SAMP. This seems oddly named, but it's utilising the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
230 bit5 of the 2nd byte which so far has been reserved as zero. (In the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
231 first byte bit 5 set => private namespace and not part of the public spec.)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
232
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
233 For now I'm just ignoring this issue though.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
234
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
235 QUESTION3:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
236 I've defined new transforms named TSHIFT and QSHIFT specifically
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
237 designed for adjusting the layout of CND4 and SMP4 chunk types to an
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
238 order more amenable for compression by interlaced deflate. They do the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
239 job, but I'm wondering if it's better to simply redefine the input
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
240 data to be a more consistent ordering so that we can define more
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
241 general purpose transforms rather than one dedicated to the original
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
242 trace layout and one for the quality layout.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
243
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
244 I'm ignoring this for now as it would break backwards compatibility.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
245
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
246 QUESTION4:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
247 For the OFFS meta-data in SMP4 and SAMP chunks I have a 16-bit offset
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
248 to specify the zero position. Ie OFFS of 10000 means a sample of 9000
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
249 becomes -1000 after processing.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
250
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
251 Should it be a signed or unsigned 16-bit value. Signed means we could
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
252 encode values ranging from 10000 to 70000 by specify OFFS as -10000.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
253
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
254 Should it be 32-bit instead? Should we have OFFI and OFFF for integer
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
255 and floating point equivalents?
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
256
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
257 QUESTION5:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
258 For region encoding where should the region name belong - the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
259 meta-data section or the REGION_LIST TEXT identifier? It's currently
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
260 in both places. My gut instinct tells me it belongs in the meta-data
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
261 for the REGION_LIST chunk itself.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
262
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
263 QUESTION6:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
264 Can we have clarification on what the region code types mean,
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
265 specifically "tech read".
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
266
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
267 QUESTION7:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
268 Should we add SAMP/SMP4 meta-data indicating a down-scale factor? For
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
269 454 data this could be 100, so we know value 123 is really 1.23. Note
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
270 this is maybe better implemented below using fixed-point precision.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
271
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
272 QUESTION8:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
273 How do we deal with floating point values?
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
274
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
275 I think the chunk meta-data should detail the format of the data block
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
276 itself (as it is strictly speaking data about the data so it fits
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
277 there well). A lack of meta data should imply the usual unsigned
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
278 16-bit quantities.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
279
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
280 There's two main ways to encode fractions:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
281
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
282 Floating point where we have a mantissa and an exponent.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
283 - See http://en.wikipedia.org/wiki/IEEE_floating-point_standard
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
284 - large dynamic range
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
285 - fixed number of significant bits
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
286 - varying "resolution". Ie can represent tiny differences
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
287 between two very small floating point numbers, but not
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
288 between two very large floating point numbers.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
289
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
290 Fixed point where we have a fixed number of bits for the component
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
291 before and after the decimal point.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
292 - See http://en.wikipedia.org/wiki/Q_%28number_format%29
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
293 - constant resolution
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
294 - effectively used by SFF (specified to 2 decimal places)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
295 - easy to treat as integers so can be fast and dealt with by
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
296 small embedded CPUs without FPUs.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
297
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
298
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
299 Floating point maybe appropriate as effectively it's the same as
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
300 logging your signals and storing those. It offers large dynamic range
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
301 so can cope with abnormally large values (at the expense of precision)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
302 while retaining lots of variation at the low end to distinguish small
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
303 values. However it's CPU intensive to cope with anything other than
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
304 the CPU provided 32-bit and 64-bit floating point formats.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
305
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
306 Single precision 32-bit floats in IEEE-754 have:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
307 1 bit (31): Sign
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
308 8 bits (23-30): Exponent (bias 127, so stroring 100 => -27)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
309 23 bits (0-22): Mantissa
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
310
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
311 Effectively we store any binary value as a normalised expression:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
312
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
313 <exponent>
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
314 1.<mantissa> * 2
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
315
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
316
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
317 Eg 1732.5:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
318
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
319 => 11011000100.1 (binary)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
320 => 1.10110001001 (binary) * 2^10
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
321
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
322 Exponent+127 => 137 => 10001001 (binary)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
323
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
324 sign exponent mantissa
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
325 0 10001001 10110001001000000000000
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
326
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
327 (17325 => 0x43ad => 0x0010001110101101
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
328
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
329 However we probably want 16-bit and 24-bit floating point types for
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
330 efficiencies sake. Do we go with some fixed predefined floating point
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
331 formats for 8-bit, 16-bit, 24-bit and 32-bit layouts (with 32-bit
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
332 being identical to IEEE754) or do we allow for specification of the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
333 mantissa and exponent Eg FLOAT=23.8, FLOAT=17.6 or FLOAT=5.2 in the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
334 meta-data block?
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
335
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
336 FLOAT=17.6 (24-bit) gives ranges +/- 8.6*10^9
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
337 FLOAT=5.2 (8-bit) gives ranges +/- 64 (I think).
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
338
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
339 Alternatively if we restrict ourselves to only using the most
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
340 significant 14 bits of the mantissa then storing as standard 32-bit
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
341 floats implies 1 in every 4 bytes is zero. This may provide for a
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
342 very crude, but fast way to implement reduced size floating point
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
343 values - ie FLOAT=15.8 (24-bit signed).
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
344
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
345 For fixed point (as in SFF values) there's already a draft standard
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
346 for implementation in C (ISO/IEC TR 18037:2004).
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
347
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
348 One benefit of fixed point over floating point is speed of
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
349 implementation. Fixed point numbers can just be dealt with as
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
350 integers. Eg subtracting two fixed point 16-bit values can be done in
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
351 integers using a-b and the result is the same as if we'd done all the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
352 bit twiddling and maths directly simulating a real fixed-point unit.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
353
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
354 My gut feeling is that we'd want to explicitly declare the number of
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
355 bits for integral and fractional components in the meta-data block.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
356
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
357 Comments?
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
358
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
359 James
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
360
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
361 PS. The latest (only minor tweaks from before) ZTR draft spec
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
362 follows.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
363
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
364
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
365
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
366
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
367 1.3 draft 3 (19 Oct 2007)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
368
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
369 ZTR SPEC v1.3
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
370 =============
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
371
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
372 Header
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
373 ======
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
374
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
375 The header consists of an 8 byte magic number (see below), followed by
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
376 a 1-byte major version number and 1-byte minor version number.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
377
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
378 Changes in minor numbers should not cause problems for parsers. It indicates
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
379 a change in chunk types (different contents), but the file format is the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
380 same.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
381
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
382 The major number is reserved for any incompatible file format changes (which
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
383 hopefully should be never).
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
384
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
385 /* The header */
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
386 typedef struct {
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
387 unsigned char magic[8]; /* 0xae5a54520d0a1a0a (b.e.) */
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
388 unsigned char version_major; /* 1 */
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
389 unsigned char version_minor; /* 3 */
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
390 } ztr_header_t;
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
391
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
392 /* The ZTR magic numbers */
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
393 #define ZTR_MAGIC "\256ZTR\r\n\032\n"
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
394 #define ZTR_VERSION_MAJOR 1
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
395 #define ZTR_VERSION_MINOR 3
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
396
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
397 So the total header will consist of:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
398
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
399 Byte number 0 1 2 3 4 5 6 7 8 9
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
400 +--+--+--+--+--+--+--+--+--+--+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
401 Hex values |ae 5a 54 52 0d 0a 1a 0a|01 03|
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
402 +--+--+--+--+--+--+--+--+--+--+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
403
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
404 Chunk format
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
405 ============
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
406
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
407 The basic structure of a ZTR file is (header,chunk*) - ie header followed by
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
408 zero or more chunks. Each chunk consists of a type, some meta-data and some
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
409 data, along with the lengths of both the meta-data and data.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
410
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
411 Byte number 0 1 2 3 4 5 6 7 8 9
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
412 +--+--+--+--+----+----+----+---+--+ - +--+--+--+--+--+-- - --+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
413 Hex values | type |meta-data length | meta-data |data length| data .. |
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
414 +--+--+--+--+----+----+----+---+--+ - +--+--+--+--+--+-- - --+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
415
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
416 FIXME: For very short reads this is a large overhead. We have 8 bytes
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
417 of length information (of which typically only 1-2 are non-zero) and 4
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
418 bytes for type (which typically only has one of 4-5 values). This
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
419 means about 10 bytes wasted per chunk, or maybe 5-10% of the total
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
420 file size. Changing this would be a radical departure from ZTR; is it
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
421 justified given the savings? (est. 4.8% for 74bp reads, 8.4% for 27bp
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
422 reads).
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
423 One idea if to consider a ZTR file (the non "block" components at
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
424 least) to be a series of huffman codes, by default all 8-bit long and
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
425 matching their ASCII codes. Then a dedicated chunk could be used to
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
426 adjust these default codes. It's therefore backwards compatible, but
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
427 is that also overkill? (NB, this looks like it'd save 6% on the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
428 overall file size.)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
429
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
430 Ie in C:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
431
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
432 typedef struct {
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
433 uint4 type; /* chunk type (b.e.) */
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
434 uint4 mdlength; /* length of meta-data field (b.e.) */
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
435 char *mdata; /* meta data */
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
436 uint4 dlength; /* length of data field (b.e.) */
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
437 char *data; /* a format byte and the data itself */
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
438 } ztr_chunk_t;
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
439
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
440 All 2 and 4-byte integer values are stored in big endian format.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
441
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
442 The meta-data is uncompressed (and so it does not start with a format
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
443 byte). From version 1.3 onwards meta-data is defined to be in key
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
444 value pairs adhering to the same structure defined in the TEXT chunk
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
445 ("key\0value\0"). Exceptions are made for this only for purposes of
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
446 backwards compatibility in the SAMP chunk type. The contents of the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
447 meta-data is chunk specific, and many chunk types will have no
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
448 meta-data. In this case the meta-data length field will be zero and
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
449 this will be followed immediately by the data-length field.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
450
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
451 Ie all meta-data adheres to the following structure:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
452
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
453 Meta-data: (version 1.3 onwards only)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
454 +- - -+--+- - -+--+- -+- - -+--+- - -+--+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
455 Hex values | ident | 0| value | 0| - | ident | 0| value | 0|
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
456 +- - -+--+- - -+--+- -+- - -+--+- - -+--+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
457
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
458 FIXME: Can we have specify the meta-data once per ZTR file and omit it
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
459 in subsequent chunks? Eg a blank chunk with meta-data only in the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
460 header. Chunks in the body then specify meta-data length as 0xFFFFFFFF
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
461 as an indicator meaning "use the last meta-data defined for this chunk
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
462 type". Useful when split in two, as in SRF?
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
463
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
464 Note that this means both ident and values must not themselves contain
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
465 the zero byte (a nul character), hence we generally store ident-value
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
466 pairs in ASCII string forms.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
467
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
468 The data length ("dlength") is the length in bytes of the entire
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
469 'data' block, including the format information held within it.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
470
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
471 The first byte of the data consists of a format byte. The most basic format is
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
472 zero - indicating that the data is "as is"; it's the real thing. Other formats
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
473 exist in order to encode various filtering and compression techniques. The
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
474 information encoded in the next bytes will depend on the format byte.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
475
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
476
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
477 RAW (#0) - no formatting
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
478 --------
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
479
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
480 Byte number 0 1 2 N
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
481 +--+--+-- - --+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
482 Hex values | 0| raw data |
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
483 +--+--+-- - --+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
484
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
485 Raw data has no compression or filtering. It just contains the unprocessed
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
486 data. It consists of a one byte header (0) indicating raw format followed by N
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
487 bytes of data.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
488
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
489
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
490 RLE (#1) - simple run-length encoding
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
491 -------
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
492
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
493 Byte number 0 1 2 3 4 5 6 7 8 N
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
494 +--+----+----+-----+-----+-------+--+--+--+-- - --+--+--+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
495 Hex values | 1| Uncompressed length | guard | run length encoded data|
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
496 +--+----+----+-----+-----+-------+--+--+--+-- - --+--+--+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
497
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
498 Run length encoding replaces stretches of N identical bytes (with value V)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
499 with the guard byte G followed by N and V. All other byte values are stored
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
500 as normal, except for occurrences of the guard byte, which is stored as G 0.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
501 For example with a guard value of 8:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
502
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
503 Input data:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
504 20 9 9 9 9 9 10 9 8 7
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
505
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
506 Output data:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
507 1 (rle format)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
508 0 0 0 10 (original length)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
509 8 (guard)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
510 20 8 5 9 10 9 8 0 7 (rle data)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
511
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
512
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
513 ZLIB (#2) - see RFC 1950
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
514 ---------
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
515
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
516 Byte number 0 1 2 3 4 5 6 7 N
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
517 +--+----+----+-----+-----+--+--+--+-- - --+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
518 Hex values | 2| Uncompressed length | Zlib encoded data|
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
519 +--+----+----+-----+-----+--+--+--+-- - --+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
520
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
521 This uses the zlib code to compress a data stream. The ZLIB data may itself be
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
522 encoded using a variety of methods (LZ77, Huffman), but zlib will
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
523 automatically determine the format itself. Often using zlib mode
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
524 Z_HUFFMAN_ONLY will provide best compression when combined with other
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
525 filtering techniques.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
526
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
527
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
528 XRLE (#3) - multi-byte run-length encoding
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
529 ---------
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
530
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
531 Byte number 0 1 2 3 4 5 N
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
532 +--+------+-------+--+--+--+-- - --+--+--+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
533 Hex values | 3| size | guard | run length encoded data|
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
534 +--+------+-------+--+--+--+-- - --+--+--+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
535
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
536 Much standard RLE, but this mechanism has a byte to specify the length
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
537 of the data item we compare to check for runs. It is not restricted to
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
538 spotted runs aligned on 'size' byte boundaries either.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
539
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
540 No uncompressed length is encoded here as technically this is not
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
541 required (although it does make decoding a bit slower). The compressed
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
542 length alone is sufficient to work out the uncompressed length after
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
543 decompressing.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
544
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
545 Guard bytes in the input stream are 'escaped' by the replacing the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
546 guard byte followed by zero. Guard bytes in a parameterised run (ie X
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
547 copies of Y where Y contains the guard) do not need to be 'escaped'
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
548
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
549 Input data:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
550 10 12 12 13 12 13 12 13 12 13 14
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
551
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
552 Output data:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
553 3 (xrle format)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
554 2 (size of blocks to compare)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
555 12 (guard, 12 is a bad choice but illustrative)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
556 10 12 0 12 4 12 13 14 (rle data)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
557
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
558
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
559 XRLE2 (#4) - word aligned multi-byte run-length encoding
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
560 ----------
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
561 Version 1.3 onwards
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
562
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
563 Byte number 0 1 RSZ multiple of RSZ
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
564 +--+-----+---------+-- - - - - - - - - - ---+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
565 Hex values | 4| RSZ | padding | run length encoded data|
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
566 +--+-----+---------+-- - - - - - - - - - ---+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
567
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
568 This achieves the same goal as XRLE, but is designed to maintain data
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
569 aligned to specific 'record size' boundaries. This sometimes has
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
570 benefits over XRLE in that subsequent a interlaced deflate entropy
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
571 encoding may work better on record-aligned data streams.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
572
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
573 The first byte holds the format (#4) while the record size (RSZ) is
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
574 held in the second byte. In order to ensure the entire block of data
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
575 is aligned on 'RSZ' bounaries RSZ-2 padding bytes are written out
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
576 before the data itself starts. The contents of these bytes can be
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
577 anything.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
578
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
579 Unlike XRLE it also does not use an explicit guard byte. If we term a
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
580 'word' to be a block of data of size RSZ, then whenever we read a word
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
581 which is identical to the last word written then we write out that
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
582 word (so we have two consecutive words in the output data) followed by
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
583 a counter of how many additional copies of that word are found, up to
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
584 255. This counter consists of 1 byte indicating the number of
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
585 additional copies of the word followed by RSZ-1 padding bytes to
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
586 maintain word alignment. While the contents of these padding bytes may
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
587 be anything, it is suggested that they adhere to same value
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
588 distribution as observed elsewhere in the data block in order to keep
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
589 the data entropy low. (For example repeating the previous bytes from
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
590 'word' will do.)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
591
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
592 Example:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
593
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
594 Input data: taken in pairs:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
595 1 0 2 2 2 2 3 1 3 1 3 1 2 4 2 4 2 4 2 3
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
596
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
597 Output data:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
598 4 2 (xrle2 format, rec size 2)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
599 1 0 ("1 0" from input)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
600 2 2 2 2 0 2 ("2 2" x 2)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
601 3 1 3 1 1 1 ("3 1" x 3)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
602 2 4 2 4 1 4 ("2 4" x 3)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
603 2 3 ("2 3")
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
604
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
605
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
606 DELTA1 (#64) - 8-bit delta
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
607 ------------
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
608
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
609 Byte number 0 1 2 N
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
610 +--+-------------+-- - --+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
611 Hex values |40| Delta level | data |
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
612 +--+-------------+-- - --+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
613
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
614 This technique replaces successive bytes with their differences. The level
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
615 indicates how many rounds of differencing to apply, which should be between 1
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
616 and 3. For determining the first difference we compare against zero. All
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
617 differences are internally performed using unsigned values with automatic an
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
618 wrap-around (taking the bottom 8-bits). Hence 2-1 is 1 and 1-2 is 255.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
619
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
620 For example, with level set to 1:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
621
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
622 Input data:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
623 10 20 10 200 190 5
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
624
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
625 Output data:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
626 1 (delta1 format)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
627 1 (level)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
628 10 10 246 190 246 71 (delta data)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
629
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
630 For level set to 2:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
631
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
632 Input data:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
633 10 20 10 200 190 5
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
634
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
635 Output data:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
636 1 (delta1 format)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
637 2 (level)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
638 10 0 236 200 56 81 (delta data)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
639
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
640
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
641 DELTA2 (#65) - 16-bit delta
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
642 ------------
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
643
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
644 Byte number 0 1 2 N
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
645 +--+-------------+-- - --+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
646 Hex values |41| Delta level | data |
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
647 +--+-------------+-- - --+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
648
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
649 This format is as data format 64 except that the input data is read in 2-byte
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
650 values, so we take the difference between successive 16-bit numbers. For
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
651 example "0x10 0x20 0x30 0x10" (4 8-bit numbers; 2 16-bit numbers) yields "0x10
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
652 0x20 0x1f 0xf0". All 16-bit input data is assumed to be aligned to the start
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
653 of the buffer and is assumed to be in big-endian format.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
654
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
655
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
656 DELTA2 (#66) - 32-bit delta
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
657 ------------
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
658
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
659 Byte number 0 1 2 3 4 N
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
660 +--+-------------+--+--+-- - --+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
661 Hex values |42| Delta level | 0| 0| data |
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
662 +--+-------------+--+--+-- - --+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
663
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
664
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
665 This format is as data formats 64 and 65 except that the input data is read in
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
666 4-byte values, so we take the difference between successive 32-bit numbers.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
667
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
668 Two padding bytes (2 and 3) should always be set to zero. Their purpose is to
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
669 make sure that the compressed block is still aligned on a 4-byte boundary
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
670 (hence making it easy to pass straight into the 32to8 filter).
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
671
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
672
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
673 Data format 67-69/0x43-0x45 - reserved
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
674 ---------------------------
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
675
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
676 At present these are reserved for dynamic differencing where the 'level' field
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
677 varies - applying the appropriate level for each section of data. Experimental
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
678 at present...
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
679
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
680
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
681 16TO8 (#70) - 16 to 8 bit conversion
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
682 -----------
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
683
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
684 Byte number 0
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
685 +--+-- - --+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
686 Hex values |46| data |
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
687 +--+-- - --+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
688
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
689 This method assumes that the input data is a series of big endian 2-byte
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
690 signed integer values. If the value is in the range of -127 to +127 inclusive
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
691 then it is written as a single signed byte in the output stream, otherwise we
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
692 write out -128 followed by the 2-byte value (in big endian format). This
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
693 method works well following one of the delta techniques as most of the 16-bit
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
694 values are typically then small enough to fit in one byte.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
695
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
696 Example input data:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
697 0 10 0 5 -1 -5 0 200 -4 -32 (bytes)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
698 (As 16-bit big-endian values: 10 5 -5 200 -800)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
699
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
700 Output data:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
701 70 (16-to-8 format)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
702 10 5 -5 -128 0 200 -128 -4 -32
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
703
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
704
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
705 32TO8 (#71) - 32 to 8 bit conversion
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
706 -----------
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
707
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
708 Byte number 0
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
709 +--+-- - --+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
710 Hex values |47| data |
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
711 +--+-- - --+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
712
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
713 This format is similar to format 16TO8, but we are reducing 32-bit numbers (big
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
714 endian) to 8-bit numbers.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
715
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
716
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
717 FOLLOW1 (#72) - "follow" predictor
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
718 -------------
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
719
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
720 Byte number 0 1 FF 100 101 N
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
721 +--+-- - - - --+-- - --+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
722 Hex values |48| follow bytes | data |
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
723 +--+-- - - - --+-- - --+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
724
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
725 For each symbol we compute the most frequent symbol following it. This is
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
726 stored in the "follow bytes" block (256 bytes). The first character in the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
727 data block is stored as-is. Then for each subsequent character we store the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
728 difference between the predicted character value (obtained by using
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
729 follow[previous_character]) and the real value. This is a very crude, but
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
730 fast, method of removing some residual non-randomness in the input data and so
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
731 will reduce the data entropy. It is best to use this prior to entropy encoding
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
732 (such as huffman encoding).
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
733
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
734
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
735 CHEB445 (#73) - floating point 16-bit chebyshev polynomial predictor
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
736 -------------
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
737 Version 1.1 only.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
738 Deprecated: replaced by format 74 in Version 1.2.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
739
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
740 WARNING: This method was experimental and have been replaced with an
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
741 integer equivalent. The floating point method may give system specific
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
742 results.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
743
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
744 Byte number 0 1 2 N
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
745 +--+--+-- - --+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
746 Hex values |49| 0| data |
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
747 +--+--+-- - --+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
748
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
749 This method takes big-endian 16-bit data and attempts to curve-fit it using
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
750 chebyshev polynomials. The exact method employed uses the 4 preceeding values
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
751 to calculate chebyshev polynomials with 5 coefficents. Of these 5 coefficients
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
752 only 4 are used to predict the next value. Then we store the difference
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
753 between the predicted value and the real value. This procedure is repeated
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
754 throughout each 16-bit value in the data. The first four 16-bit values are
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
755 stored with a simple 1-level 16-bit delta function. Reversing the predictor
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
756 follows the same procedure, except now adding the differences between stored
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
757 value and predicted value to get the real value.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
758
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
759
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
760 ICHEB (#74) - integer based 16-bit chebyshev polynomial predictor
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
761 -----------
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
762 Version 1.2 onwards
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
763 This replaces the floating point CHEB445 format in ZTR v1.1.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
764
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
765
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
766 Byte number 0 1 2 N
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
767 +--+--+-- - --+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
768 Hex values |4A| 0| data |
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
769 +--+--+-- - --+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
770
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
771 This method takes big-endian 16-bit data and attempts to curve-fit it using
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
772 chebyshev polynomials. The exact method employed uses the 4 preceeding values
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
773 to calculate chebyshev polynomials with 5 coefficents. Of these 5 coefficients
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
774 only 4 are used to predict the next value. Then we store the difference
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
775 between the predicted value and the real value. This procedure is repeated
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
776 throughout each 16-bit value in the data. The first four 16-bit values are
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
777 stored with a simple 1-level 16-bit delta function. Reversing the predictor
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
778 follows the same procedure, except now adding the differences between stored
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
779 value and predicted value to get the real value.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
780
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
781 STHUFF (#77) - Interlaced Deflate
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
782 ------------
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
783 Version 1.3 onwards
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
784
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
785 Byte number 0 1 2 N
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
786 +--+--+-- - - - - - --+-- - - --+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
787 Hex values |4D| C| huffman codes | data |
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
788 +--+--+-- - - - - - --+-- - - --+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
789
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
790 This compresses data using huffman encoding using the Deflate
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
791 algorithm for storing the codes and data. It is analogous to using
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
792 zlib with the Z_HUFFMAN_ONLY strategy and a negative window
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
793 size. However it has a few tweaks for optimal compression of very
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
794 small data sets. See RFC 1951 for details of Deflate. If the following
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
795 text is in decrepancy with RFC 1951 then the RFC takes priority. The
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
796 following is included as additional explanatory material only.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
797
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
798 Huffman compression works by replacing each character (or 'symbol')
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
799 with a string of bits. Common symbols have are encoded using few bits
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
800 and rare symbols need a longer string of bits. The net effect is that
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
801 the overall number of bits needed to store a message is reduced.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
802
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
803 To uncompress a compressed data stream it is necessary to know which
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
804 symbols are present and what their bit-strings are. For brevity this
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
805 is achieved by storing only the lengths of the bit-string for each
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
806 symbol and generating bit-strings from the lengths. As long as the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
807 same canonical algorithm is used in both the encoder and decoder then
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
808 knowing the lengths alone is sufficient. Knowledge of this algorithm
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
809 is required for uncompressing the data, so it is defined as follows:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
810
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
811 1. Sort symbols by the length of their bit-strings, smallest first.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
812
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
813 The collating order for symbols sharing the same length is defined
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
814 as ASCII values 0 to 255 inclusive followed by the EOF symbol.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
815
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
816 2. X = 0
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
817
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
818 3. For all bit lengths 'L' from 1 to 24 inclusive:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
819
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
820 For all Symbols of bit length 'L', sorted as above:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
821 Code(Symbol) = least significant 'L' bits of X
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
822 X = X + 1
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
823 End loop
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
824
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
825 X = X * 2
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
826
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
827 End loop
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
828
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
829 This is the same algorithm utilised in the Deflate algorithm (RFC 1951).
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
830
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
831
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
832 For example compressing "abracadabra" gives: /\
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
833 0 1
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
834 Symbol bit-length Code(X) / \
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
835 ------------------------------- a /\
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
836 a 1 0 0 / \
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
837 b 3 4 100 0 1
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
838 c 3 5 101 / \
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
839 r 3 6 110 / \
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
840 d 4 14 1110 /\ /\
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
841 EOF 4 15 1111 0 1 0 1
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
842 / \ / \
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
843 which in turn leads to 28 bits b c r /\
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
844 of output: 0 1
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
845 / \
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
846 0100110010101110010011001111 d EOF
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
847 (ab r ac ad ab r aEOF)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
848
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
849
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
850 In the data format defined above, 'C' is a code-set number. If it is
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
851 zero the the huffman codes to uncompress 'data' are stored in the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
852 following bytes using the same format describe in the DFLH chunk type
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
853 below, otherwise no huffman codes are stored and a predefined
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
854 set of huffman codes are used being either defined in a preceeding
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
855 DFLH chunk (for 128 <= 'C' <= 255) or statically defined in this
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
856 document (for 1 <= 'C' <= 127). Immediately following this is the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
857 compressed bit-stream itself.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
858
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
859 The statically defined huffman code-sets are as follows. The symbols
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
860 are listed below as their printable ASCII character or hash followed
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
861 by a number, so A and #65 are the same symbol. We use the algorithm
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
862 described above to turn these bit-lengths into actual huffman codes.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
863
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
864 C=1: CODE_DNA
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
865
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
866 Length Symbols
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
867 ----------------
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
868 2 A C T
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
869 3 G
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
870 4 N
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
871 5 #0
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
872 6 EOF
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
873 13 #1 to #6 inclusive
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
874 14 #7 to #255 except where already listed above
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
875
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
876 C=2: CODE_DNA_AMBIG (DNA with IUPAC ambiguity codes)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
877
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
878 Length Symbols
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
879 ----------------
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
880 2 A C T
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
881 3 G
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
882 4 N
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
883 7 #0 #45
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
884 8 B D H K M R S V W Y
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
885 11 EOF
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
886 14 #226
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
887 15 #1 to #255 except where already listed above
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
888
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
889 C=3: CODE_ENGLISH (English text)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
890
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
891 Length Symbols
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
892 ----------------
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
893 3 #32 e
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
894 4 a i n o s t
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
895 5 d h l r u
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
896 6 #10 #13 #44 c f g m p w y
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
897 7 #46 b v
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
898 8 #34 I k
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
899 9 #45 A N T
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
900 10 #39 #59 #63 B C E H M S W x
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
901 11 #33 0 1 F G
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
902 15 #0 to #255 except where already listed above
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
903
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
904
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
905 It is recommended that this compression format is used only for small
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
906 data sizes and ZLIB is used for larger (a few K and above) data.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
907
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
908
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
909 QSHIFT (#79) - 4-byte quality reorder
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
910 ------------
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
911 Version 1.3 onwards
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
912
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
913 This reorders the quality signal to be 4-tuples of the quality for the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
914 called base followed by the quality of the other 3 base types in the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
915 order they appear in a,c,g,t (minus the called base).
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
916
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
917 The purpose is to allow a 4-byte interlaced deflate algorithm to
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
918 operate efficiently.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
919
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
920
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
921 TSHIFT (#70) - 8-byte trace reorder
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
922 ------------
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
923 Version 1.3 onwards
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
924
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
925 This reorders the trace signal to be 4-tuples of the 16-bit trace
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
926 signals for the called base followed by the signal from the other 3
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
927 base types in the order they appear in a,c,g,t (minus the called
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
928 base).
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
929
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
930 The purpose is to allow a 8-byte interlaced deflate algorithm to
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
931 operate efficiently.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
932
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
933 FIXME: QSHIFT and TSHIFT could be general purpose byte rearrangements
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
934 without any knowledge of the data type they're holding. They need the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
935 input data to be consistently ordered and not the large differences we
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
936 see between quality and trace right now.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
937
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
938
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
939 Version 1.3 onwards
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
940 Chunk types
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
941 ===========
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
942
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
943 As described above, each chunk has a type. The format of the data contained in
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
944 the chunk data field (when written in format 0) is described below.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
945 Note that no chunks are mandatory. It is valid to have no chunks at all.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
946 However some chunk types may depend on the existance of others. This will be
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
947 indicated below, where applicable.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
948
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
949 Each chunk type is stored as a 4-byte value. Bit 5 of the first byte is used
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
950 to indicate whether the chunk type is part of the public ZTR spec (bit 5 of
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
951 first byte == 0) or is a private/custom type (bit 5 of first byte == 1). Bit
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
952 5 of the remaining 3 bytes is reserved - they must always be set to zero.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
953
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
954 Practically speaking this means that public chunk types consist entirely of
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
955 upper case letters (eg TEXT) whereas private chunk types start with a
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
956 lowercase letter (eg tEXT). Note that in this example TEXT and tEXT are
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
957 completely independent types and they may have no more relationship with each
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
958 other than (for example) TEXT and BPOS types.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
959
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
960 It is valid to have multiples of some chunks (eg text chunks), but not for
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
961 others (such as base calls). The order of chunks does not matter unless
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
962 explicitly specified.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
963
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
964 A chunk may have meta-data associated with it. This is data about the data
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
965 chunk. For example the data chunk could be a series of 16-bit trace samples,
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
966 while the meta-data could be a label attached to that trace (to distinguish
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
967 trace A from traces C, G and T). Meta-data is typically very small and so it
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
968 is never need be compressed in any of the public chunk types (although
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
969 meta-data is specific to each chunk type and so it would be valid to have
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
970 private chunks with compressed meta-data if desirable).
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
971
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
972 The first byte of each chunk data when uncompressed must be zero, indicating
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
973 raw format. If, having read the chunk data, this is not the case then the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
974 chunk needs decompressing or reverse filtering until the first byte is
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
975 zero. There may be a few padding bytes between the format byte and the first
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
976 element of real data in the chunk. This is to make file processing simpler
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
977 when the chunk data consists of 16 or 32-bit words; the padding bytes ensure
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
978 that the data is aligned to the appropriate word size. Any padding bytes
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
979 required will be listed in the appopriate chunk definition below.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
980
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
981
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
982 The following lists the chunk types available in 32-bit big-endian format.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
983 In all cases the data is presented in the uncompressed form, starting with the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
984 raw format byte and any appropriate padding.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
985
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
986 SAMP
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
987 ----
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
988
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
989 Or Meta-data: (version 1.2 and before)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
990 Byte number 0 1 2 3
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
991 +--+--+--+--+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
992 Hex values | data name |
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
993 +--+--+--+--+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
994
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
995 Data:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
996 Byte number 0 1 2 3 4 5 6 7 N
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
997 +--+--+--+--+--+--+--+--+- -+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
998 Hex values | 0| 0| data| data| data| - |
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
999 +--+--+--+--+--+--+--+--+- -+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1000
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1001 This encodes a series of 16-bit unsigned trace samples. The first data
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1002 byte is the format (raw); the second data byte is present for padding
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1003 purposes only. After that comes a series of 16-bit big-endian
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1004 values. Although stored as unsigned, a baseline value can be
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1005 specified which is should then be subtracted from all values to
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1006 generated signed data if required. By default the baseline is zero.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1007
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1008 Valid identifiers for the meta-data (version 1.3 onwards) are:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1009
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1010 Ident Value(s)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1011 ---------------------------------------------------------------------
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1012 TYPE "A", "C", "G", "T", "PYNO" or "PYRW"
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1013 OFFS 16-bit signed integer representing the 'zero' position,
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1014 in ASCII.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1015
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1016 [ FIXME: signed or unsigned? Signed means we couldn't store data in
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1017 the range from -48K to +16K. Unsigned means we couldn't store data in
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1018 the range 10K to 70K. What's most useful? Or should OFFS be 32-bit
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1019 instead? ]
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1020
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1021 Versions prior to 1.3 specified meta-data consisted of a single 4-byte
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1022 block containing a 4-byte name associated with the trace. If a
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1023 type-name is shorter than 4 bytes then it should be right padded with
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1024 nul characters to 4 bytes. For sequencing traces the four lanes
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1025 representig A, C, G and T signals have names "A\0\0\0", "C\0\0\0",
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1026 "G\0\0\0" and "T\0\0\0". PYNO and PYRW refer to normalised and raw
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1027 pyrogram data (eg from 454 instruments). At present other names are
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1028 not reserved, but it is recommended that (for consistency with
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1029 elsewhere) you label private trace arrays with names starting in a
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1030 lowercase letter (specifically, bit 5 is 1).
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1031
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1032 For the purposes of backwards compatibility, readers should check the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1033 version number in the ZTR header to determine whether the old or new
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1034 style meta-data formatting is in use.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1035
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1036 For sequencing traces it is expected that there will be four SAMP chunks,
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1037 although the order is not specified.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1038
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1039
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1040 SMP4
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1041 ----
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1042
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1043 Meta-data: optional - see below
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1044
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1045 Data:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1046 Byte number 0 1 2 3 4 5 6 7 N
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1047 +--+--+--+--+--+--+--+--+- -+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1048 Hex values | 0| 0| data| data| data| - |
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1049 +--+--+--+--+--+--+--+--+- -+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1050
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1051
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1052 As per SAMP, this encodes a series of unsigned 16-bit trace values, to
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1053 be base-line corrected by the OFFS meta-data value as appropriate.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1054
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1055 The first byte is 0 (raw format). Next is a single padding byte (also 0).
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1056 Then follows a series of 2-byte big-endian trace samples for the "A" trace,
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1057 followed by a series of 2-byte big-endian traces samples for the "C" trace,
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1058 also followed by the "G" and "T" traces (in that order). The assumption is
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1059 made that there is the same number of data points for all traces and hence the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1060 length of each trace is simply the number of data elements divided by four.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1061
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1062 Experimentation has shown that this gives around 3% saving over 4
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1063 separate SAMP chunks, but it lacks in flexibility.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1064
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1065 Valid identifiers for the meta-data are:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1066
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1067 Ident Value(s)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1068 ---------------------------------------------------------------------
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1069 OFFS 16-bit signed integer representing the 'zero' position
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1070 TYPE The type of data-set encoded. Values can be:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1071 "PROC" - processed data for viewing, also the default
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1072 when no type field is found.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1073 "SLXI" - Illumina GA raw intensities (.int.txt files)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1074 "SLXN" - Illumina GA noise intensities (.nse.txt files)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1075
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1076
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1077 BASE
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1078 ----
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1079
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1080 Meta-data: optional - see below
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1081
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1082 Data:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1083 Byte number 0 1 2 3 N
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1084 +--+--+--+-- - --+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1085 Hex values | 0| base calls |
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1086 +--+--+--+-- - --+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1087
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1088 The first byte is 0 (raw format). This is followed by the base calls in ASCII
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1089 format (one base per byte). By default it is assumed that all base
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1090 calls are stored using the IUPAC characters[1].
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1091
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1092 Valid identifiers for the meta-data are:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1093
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1094 Ident Meaning Value(s)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1095 ---------------------------------------------------------------------
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1096 CSET Character-set 'I' (ASCII #73) => IUPAC ("ACGTUMRWSYKVHDBN")
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1097 '0' (ASCII #49) => ABI SOLiD ("0123N")
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1098
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1099 BPOS
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1100 ----
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1101
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1102 Meta-data: none present
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1103
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1104 Data:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1105 Byte number 0 1 2 3 4 5 6 7
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1106 +--+--+--+--+--+--+--+--+- -+--+--+--+--+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1107 Hex values | 0| padding| data | - | data |
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1108 +--+--+--+--+--+--+--+--+- -+--+--+--+--+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1109
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1110 This chunk contains the mapping of base call (BASE) numbers to sample (SAMP)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1111 numbers; it defines the position of each base call in the trace data. The
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1112 position here is defined as the numbering of the 16-bit positions held in the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1113 SAMP array, counting zero as the first value.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1114
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1115 The format is 0 (raw format) followed by three padding bytes (all 0). Next
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1116 follows a series of 4-byte big-endian numbers specifying the position of each
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1117 base call as an index into the sample arrays (when considered as a 2-byte
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1118 array with the format header stripped off).
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1119
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1120 Excluding the format and padding bytes, the number of 4-byte elements should
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1121 be identical to the number of base calls. All sample numbers are counted from
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1122 zero. No sample number in BPOS should be beyond the end of the SAMP arrays
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1123 (although it should not be assumed that the SAMP chunks will be before this
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1124 chunk). Note that the BPOS elements may not be totally in sorted order as
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1125 the base calls may be shifted relative to one another due to compressions.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1126
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1127
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1128 CNF1
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1129 ----
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1130
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1131 Meta-data: optional - see below
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1132
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1133 Data:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1134 Byte number 0 1 N
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1135 +--+--+-- - --+--+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1136 Hex values | 0| call confidence |
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1137 +--+--+-- - --+--+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1138
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1139 (N == number of bases in BASE chunk)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1140
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1141 Valid identifiers for the meta-data are:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1142
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1143 Ident Value(s) Meaning
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1144 ---------------------------------------------------------------------
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1145 SCALE PH Phred-scaled confidence values. (Default). i.e. for
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1146 a call with probability p: -10*log10(1-p)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1147 LO Log-odds scaled values. ie: 10*log10(p/(1-p))
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1148
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1149
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1150 The first byte of this chunk is 0 (raw format). This is then followed by a
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1151 series signed 8-bit confidence values for the called bases.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1152
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1153 Either phred or log-odds (as used by the Illumina GA) scale ranges are
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1154 appropriate.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1155
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1156
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1157 CNF4
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1158 ----
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1159
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1160 Meta-data: optional - see below
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1161
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1162 Data:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1163 Byte number 0 1 N 4N
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1164 +--+--+-- - --+--+----- - -----+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1165 Hex values | 0| call confidence | A/C/G/T conf |
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1166 +--+--+-- - --+--+----- - -----+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1167
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1168 (N == number of bases in BASE chunk)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1169
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1170 Valid identifiers for the meta-data are:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1171
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1172 Ident Value(s) Meaning
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1173 ---------------------------------------------------------------------
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1174 SCALE PH Phred-scaled confidence values. i.e. for a call
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1175 with probability p: -10*log10(1-p)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1176 (NB: default, but often inappropriate.)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1177 LO Log-odds scaled values. ie: 10*log10(p/(1-p))
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1178
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1179
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1180 The first byte of this chunk is 0 (raw format). This is then followed by a
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1181 series signed 8-bit confidence values for the called base. Next comes
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1182 all the remaining confidence values for A, C, G and T excluding those
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1183 that have already been written (ie the called base). So for a sequence
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1184 AGT we would store confidences A1 G2 T3 C1 G1 T1 A2 C2 T2 A3 C3 G3.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1185
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1186 The purpose of this is to group the (likely) highest confidence value (those
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1187 for the called base) at the start of the chunk followed by the remaining
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1188 values. Hence if phred confidence values are written in a CNF4 chunk the first
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1189 quarter of chunk will consist of phred confidence values and the last three
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1190 quarters will (assuming no ambiguous base calls) consist entirely of zeros.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1191
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1192 For the purposes of storage the confidence value for a base call that is not
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1193 A, C, G or T (in any case) is stored as if the base call was T.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1194
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1195 If only one confidence value exists per base then either the phred or
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1196 log-odds scales work well. The first N bytes will be the called bases
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1197 and the remaining 3*N will be zero (optimal for run-length-encoding),
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1198 but consider using the CNF1 chunk type instead in this situation.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1199
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1200 If all 4 base types have their own confidence value then the log-odds
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1201 scale will work well. In this case the phred scale is an inappropriate
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1202 choice as it cannot encode both very likely and very unlikely events.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1203
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1204 Note: if this chunk exists it must exist after a BASE chunk.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1205
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1206 TEXT
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1207 ----
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1208
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1209 Meta-data: none present
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1210
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1211 Data: 0
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1212 +--+- - -+--+- - -+--+- -+- - -+--+- - -+--+-----+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1213 Hex values | 0| ident | 0| value | 0| - | ident | 0| value | 0| (0) |
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1214 +--+- - -+--+- - -+--+- -+- - -+--+- - -+--+-----+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1215
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1216 This contains a series of "identifier\0value\0" pairs.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1217
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1218 The identifiers and values may be any length and may contain any data
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1219 except the nul character. The nul character marks the end of the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1220 identifier or the end of the value. Multiple identifier-value pairs
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1221 are allowable. Prior to version 1.3 a double nul character marked the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1222 end of the list (labeled "(0)" above), but from version 1.3 the end
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1223 of the list may also be marked by the end of chunk.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1224
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1225 Identifiers starting with bit 5 clear (uppercase) are part of the public ZTR
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1226 spec. Any public identifier not listed as part of this spec should be
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1227 considered as reserved. Identifiers that have bit 6 set (lowercase) are for
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1228 private use and no restriction is placed on these.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1229
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1230 Multiple TEXT chunks may exist within the ZTR file. If so they are
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1231 considered to be concatenated together.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1232
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1233 See below for the text identifier list.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1234
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1235 CLIP
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1236 ----
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1237
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1238 Meta-data: none present
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1239
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1240 Data:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1241 Byte number 0 1 2 3 4 5 6 7 8
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1242 +--+--+--+--+--+--+--+--+--+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1243 Hex values | 0| left clip | right clip|
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1244 +--+--+--+--+--+--+--+--+--+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1245
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1246 This contains suggested quality clip points. These are stored as zero (raw
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1247 data) followed by a 4-byte big endian value for the left clip point and a
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1248 4-byte big endian value for the right clip point. Clip points are defined in
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1249 units of base calls, starting from 0. (Q: is that correct!?)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1250
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1251
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1252
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1253 CR32
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1254 ----
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1255
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1256 Meta-data: none present
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1257
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1258 Data:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1259 Byte number 0 1 2 3 4
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1260 +--+--+--+--+--+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1261 Hex values | 0| CRC-32 |
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1262 +--+--+--+--+--+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1263
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1264 This chunk is always just 4 bytes of data containing a CRC-32 checksum,
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1265 computed according to the widely used ANSI X3.66 standard. If present, the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1266 checksum will be a check of all of the data since the last CR32 chunk.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1267 This will include checking the header if this is the first CR32 chunk, and
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1268 including the previous CRC32 chunk if it is not. Obviously the checksum will
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1269 not include checks on this CR32 chunk.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1270
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1271
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1272 COMM
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1273 ----
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1274
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1275 Meta-data: none present
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1276
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1277 Data:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1278 Byte number 0 1 N
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1279 +--+-- - --+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1280 Hex values | 0| free text |
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1281 +--+-- - --+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1282
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1283 This allows arbitrary textual data to be added. It does not require a
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1284 identifier-value pairing or any nul termination.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1285
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1286
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1287 DFLH
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1288 ----
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1289
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1290 Meta-data: none present
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1291
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1292 Data:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1293 Byte number 0 1 N
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1294 +--+--+-- - - - - - - - - - - --+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1295 Hex values | 0| C| Deflate format data ... |
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1296 +--+--+-- - - - - - - - - - - --+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1297
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1298 'C' is the code-set number referred to within that compression method.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1299 It should be 128 onwards and is used to distinguish between multiple
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1300 huffman tables. It is used in conjunction with the data compression
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1301 format 77 ("Deflate").
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1302
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1303 Following this is data in the Deflate format (RFC 1951). This should
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1304 consist of the header for a single block using dynamic huffman with
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1305 the BFINAL (last block) flag set.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1306
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1307 In Deflate streams the end of the huffman codes and the start of
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1308 the compressed data stream itself may occur part way through a
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1309 byte. Therefore the last byte of the this block is bitwise ORed
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1310 with the first byte of the data stream compressed referring back to
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1311 this code-set number. Therefore all unused bits in the last byte of
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1312 this block should be set to zero. Likewise if the data bit-stream in
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1313 this block ends on an exact byte boundary then an additional blank
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1314 byte must be added to ensure the ORing method above still works.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1315
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1316
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1317 DFLC
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1318 ----
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1319
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1320 Meta-data: none present
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1321
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1322 Data:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1323 Byte number 0
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1324 +--+---+- - - - ---+--+-- - - - - - - - - - - - --+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1325 Hex values | 0| C |code-order |FF| Deflate dynamic codes ... |
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1326 +--+---+- - - - ---+--+-- - - - - - - - - - - - --+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1327
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1328 Multi-context Deflate compression codes defined for use by data format
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1329 78 (HUFF_MULTI).
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1330
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1331 This is like the DFLH format, except it encodes multiple huffman trees
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1332 instead of a single tree along with the order in which the multiple
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1333 trees should be used (the "code-order").
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1334
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1335 'C' is the code-set number referred to within that compression method.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1336 It should be 128 onwards and is used to distinguish between multiple
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1337 huffman tables.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1338
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1339 The code-order is a run-length encoded series of 8-bit numbers
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1340 indicating which huffman code set should be used for which byte. For
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1341 each byte in the input stream the HUFF_MULTI method selects the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1342 appropriate huffman code by using indexing code-order with the input
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1343 data position modulo the number of values in code-order.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1344
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1345 Following this is data in the Deflate format (RFC 1951). This should
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1346 consist of the header component for a single block using dynamic
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1347 huffman with the BFINAL (last block) flag set, up to and including
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1348 the HDIST+1 code lengths for the distance alphabet. This will then be
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1349 immediately followed by the next set of huffman codes, and so on until
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1350 all index values containing within the code-order have been accounted
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1351 for.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1352
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1353 In Deflate streams the end of the huffman codes and the start of
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1354 the compressed data stream itself may occur part way through a
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1355 byte. Therefore the last byte of the this block is bitwise ORed
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1356 with the first byte of the data stream compressed referring back to
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1357 this code-set number. Therefore all unused bits in the last byte of
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1358 this block should be set to zero. Likewise if the data bit-stream in
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1359 this block ends on an exact byte boundary then an additional blank
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1360 byte must be added to ensure the ORing method above still works.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1361
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1362
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1363 For example, compression of 16-bit data is sometimes best achieved by
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1364 producing one set of huffman codes for the top 8 bits and another set
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1365 for the bottom 8 bits, rather than mixing these together by treating
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1366 the 16-bit data as a series of 8-bit quantities. In this case our
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1367 code-order would consist of just two entries; (0, 1).
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1368
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1369 Alternatively we may have 4 1-byte confidence values stored per base
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1370 in the order of the confidence of the base-called base type first
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1371 followed by the 3 remaining confidence values. We observe that
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1372 compressing byte 0, 4, 8, 12, ... as one set and bytes 1,2,3, 5,6,7,
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1373 ... as another set yields higher compression ratios. In this case the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1374 code-order would consist of 4 entries; (0, 1, 1, 1).
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1375
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1376
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1377 REGN
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1378 ----
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1379
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1380 Meta-data: optional - see below
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1381
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1382 Data:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1383 Byte number 0 1 2 3 4 5 6 7 8
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1384 +--+---+---+---+---+---+---+---+---+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1385 Hex values | 0| 1st boundary | 2nd boundary | ...
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1386 +--+---+---+---+---+---+---+---+---+
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1387
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1388 This chunk is used to break a trace down into a series of segments. We
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1389 store the boundary between segments, so the list above will contain
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1390 one less boundary than there are segments with the first segment
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1391 implicitly starting from the first base and the last segment implictly
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1392 extending to the last base.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1393
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1394 Each 4-byte unsigned value indicates a position within the sequence or
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1395 trace counting from 0 as the first element and marking the first base
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1396 of the next region. For example three regions of DNA may be:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1397
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1398 0 1 2 3 4 5 6 7 8 9 10 11 12
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1399 T A C G G A T T C G A A C
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1400 |<-reg. 1->| |<--reg. 2--->| |<-reg. 3->|
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1401
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1402 This would give the 1st boundary as 4 and the 2nd boundary as 9.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1403
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1404 The lack of a REGN chunk implies one single region extending from the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1405 first to last base in the sequence.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1406
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1407 Valid identifiers for the meta-data are:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1408
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1409 Ident Meaning Value(s)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1410 ---------------------------------------------------------------------
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1411 COORD Coordinate system 'T' = trace coordinates
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1412 'B' = base coordinations (default)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1413
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1414 NAME Region names A semicolon separated list of
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1415 "name:code" pairs. Eg
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1416 primer1:T;read1:P;primer2:T;read2:P
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1417
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1418 [FIXME: NAME identifier here is the same as the REGION_LIST TEXT
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1419 identifier. We need to decide where it belongs and pick one. If we can
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1420 get a way to specify the default meta-data contents then logically
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1421 speaking the best place to store this is in the meta-data along side
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1422 the chunk data itself.]
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1423
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1424 The NAME identifier is used to attach a meaning to the regions
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1425 described in the data chunk. It consists of a semi-colon separated
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1426 list of names or name:code pairs. The codes, if present are a single
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1427 character from the predefined list below and are separated from the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1428 name by a colon.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1429
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1430 Code Meaning
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1431 ---------------------------------------
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1432 T Tech read (e.g. primer, linker)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1433 B Bio read
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1434 I Inverted read
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1435 D Duplicate read
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1436 P Paired read
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1437
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1438 FIXME: I don't like the above meanings. They don't, well, "mean" much
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1439 to me! What's a tech read?
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1440
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1441
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1442
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1443 Text Identifiers
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1444 ================
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1445
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1446 These are for use in the TEXT segments. None are required, but if any of these
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1447 identifiers are present they must confirm to the description below. Much
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1448 (currently all) of this list has been taken from the NCBI Trace Archive [2]
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1449 documentation. It is duplicated here as the ZTR spec is not tied to the same
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1450 revision schedules as the NCBI trace archive (although it is intended that any
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1451 suitable updates to the trace archive should be mirrored in this ZTR spec).
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1452
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1453 The Trace Archive specifies a maximum length of values. The ZTR spec does not
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1454 have length limitations, but for compatibility these sizes should still be
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1455 observed.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1456
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1457 The Trace Archive also states some identifiers are mandatory; these are marked
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1458 by asterisks below. These identifiers are not mandatory in the ZTR spec (but
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1459 clearly they need to exist if the data is to be submitted to the NCBI).
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1460
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1461 Finally, some fields are not appropriate for use in the ZTR spec, such as
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1462 BASE_FILE (the name of a file containing the base calls). Such fields are
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1463 included only for compatibility with the Trace Arhive. It is not expected that
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1464 use of ZTR would allow for the base calls to be read from an external file
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1465 instead of the ZTR BASE chunk.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1466
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1467 [ Quoted from TraceArchiveRFC v1.17 ]
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1468
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1469 Identifier Size Meaning Example value(s)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1470 ---------- ----- ---------------------------- -----------------
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1471 TRACE_NAME * 250 name of the trace HBBBA1U2211
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1472 as used at the center
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1473 unique within the center
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1474 but not among centers.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1475
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1476 SUBMISSION_TYPE * - type of submission
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1477
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1478 CENTER_NAME * 100 name of center BCM
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1479 CENTER_PROJECT 200 internal project name HBBB
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1480 used within the center
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1481
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1482 TRACE_FILE * 200 file name of the trace ./traces/TRACE001.scf
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1483 relative to the top of
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1484 the volume.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1485
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1486 TRACE_FORMAT * 20 format of the tracefile
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1487
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1488 SOURCE_TYPE * - source of the read
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1489
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1490 INFO_FILE 200 file name of the info file
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1491 INFO_FILE_FORMAT 20
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1492
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1493 BASE_FILE 200 file name of the base calls
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1494 QUAL_FILE 200 file name of the base calls
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1495
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1496
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1497 TRACE_DIRECTION - direction of the read
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1498 TRACE_END - end of the template
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1499 PRIMER 200 primer sequence
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1500 PRIMER_CODE which primer was used
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1501
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1502 STRATEGY - sequencing strategy
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1503 TRACE_TYPE_CODE - purpose of trace
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1504
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1505 PROGRAM_ID 100 creator of trace file phred-0.990722.h
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1506 program-version
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1507
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1508 TEMPLATE_ID 20 used for read pairing HBBBA2211
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1509
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1510 CHEMISTRY_CODE - code of the chemistry (see below)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1511 ITERATION - attempt/redo 1
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1512 (int 1 to 255)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1513
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1514 CLIP_QUALITY_LEFT left clip of the read in bp due to quality
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1515 CLIP_QUALITY_RIGHT right " " " " "
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1516 CLIP_VECTOR_LEFT left clip of the read in bp due to vector
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1517 CLIP_VECTOR_RIGHT right " " " " "
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1518
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1519
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1520 SVECTOR_CODE 40 sequencing vector used (in table)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1521 SVECTOR_ACCESSION 40 sequencing vector used (in table)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1522 CVECTOR_CODE 40 clone vector used (in table)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1523 CVECTOR_ACCESSION 40 clone vector used (in table)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1524
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1525 INSERT_SIZE - expected size of insert 2000,10000
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1526 in base pairs (bp)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1527 (int 1 to 2^32)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1528
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1529 PLATE_ID 32 plate id at the center
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1530 WELL_ID well 1-384
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1531
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1532
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1533 SPECIES_CODE * - code for species
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1534 SUBSPECIES_ID 40 name of the subspecies
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1535 Is this the same as strain
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1536
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1537 CHROMOSOME 8 name of the chromosome ChrX, Chr01, Chr09
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1538
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1539
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1540 LIBRARY_ID 30 the source library of the clone
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1541 CLONE_ID 30 clone id RPCI11-1234
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1542
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1543 ACCESSION 30 NCBI accession number AC00001
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1544
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1545 PICK_GROUP_ID 30 an id to group traces picked
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1546 at the same time.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1547 PREP_GROUP_ID 30 an id to group traces prepared
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1548 at the same time
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1549
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1550
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1551 RUN_MACHINE_ID 30 id of sequencing machine
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1552 RUN_MACHINE_TYPE 30 type/model of machine
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1553 RUN_LANE 30 lane or capillary of the trace
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1554 RUN_DATE - date of run
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1555 RUN_GROUP_ID 30 an identifier to group traces
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1556 run on the same machine
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1557
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1558 [ End of quote from TraceArchiveRFC ]
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1559
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1560 More detailed information on the format of these values should be obtained
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1561 from the Trace Archive RFC [2].
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1562
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1563 In addition to the above the following TEXT identifiers have meaning
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1564 specific to the ZTR format:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1565
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1566 Identifier Meaning Example value(s)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1567 ---------- ---------------------------- -------------------------------
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1568 REGION_LIST A semi-colon separated list primer1:T;read1:P
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1569 identifying regions of a
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1570 trace. See the REGN chunk Region 1;Region 2;Region 3
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1571 definition for details.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1572
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1573
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1574 FIXME: Should this simply be the meta-data associated with the REGN
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1575 chunk?
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1576
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1577
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1578
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1579 References
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1580 ==========
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1581 [1] IUPAC: http://www.chem.qmw.ac.uk/iubmb/misc/naseq.html
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1582
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1583 [2] http://www.ncbi.nlm.nih.gov/Traces/TraceArchiveRFC.html
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1584
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1585 [3] J.Bonfield and R.Staden, "ZTR: a new format for DNA sequence trace
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1586 data". Bioinformatics Vol. 18 no. 1 2002.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1587
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1588
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1589 FIXME: As an aside, not doing the final entropy encoding steps (zlib,
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1590 deflate, etc) and just using bzip2 on an entire SRF archive yields a
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1591 considerable saving. On tests it varied between 23% (27bp reads) and
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1592 13% (74bp reads) smaller than the Deflate compressed
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1593 data. Unfortunately it pretty much removes all chance of random access
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1594 in the data unless I can get a working FM-Index implementation
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1595 (which is very unlikely in a short time). This makes it appropriate
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1596 for transmission perhaps, but not for indexing and querying random
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1597 sequences.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1598
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1599 A substantial chunk (5-9%) of this saving comes from the repeated ZTR
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1600 block types (names like "BASE", "CNF4" and common components like 0x00000000
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1601 for the meta-data size). The remainder probably comes from
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1602 similarities between one ZTR file and another.