Mercurial > repos > dawe > srf2fastq
annotate srf2fastq/io_lib-1.12.2/docs/ZTR_format @ 0:d901c9f41a6a default tip
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
author | dawe |
---|---|
date | Tue, 07 Jun 2011 17:48:05 -0400 |
parents | |
children |
rev | line source |
---|---|
0
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1 Notes: 28th May 2008 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
2 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
3 For version 2.0 consider the following: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
4 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
5 1) Remove defunct or useless chunk types and compression formats. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
6 2) Rationalise inconsitent behaviour (eg endianness on zlib chunk). |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
7 3) Support split header/data formats for SRF |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
8 4) Formalise meta-data use better. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
9 5) More pie-in-the-sky ideas? |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
10 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
11 What we've described so far could easily be said to be v1.4. It's |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
12 backwards compatible and fairly minor is change. If we truely want to |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
13 go for version 2 then taking the chance to remove all those niggles |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
14 that we've kept purely for backwards compatibility would be good. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
15 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
16 In more detail: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
17 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
18 1) Removal of RLE and floating point chebyshev polynomials. Mark XRLE |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
19 as deprecated? |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
20 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
21 We may wish to add an extra option to XRLE2 to indicate the repeat |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
22 count before specifying the remaining run-length. This breaks the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
23 format though. (Or add XRLE3 to allow such control?) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
24 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
25 2) Strange things I can see are: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
26 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
27 2.1) All chunks use big-endian data except for zlib which has a |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
28 little-endian length. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
29 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
30 2.2) The order that data is stored in differs per chunk type. For |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
31 trace data we store all As, then all Cs, all Gs and finally |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
32 all Ts. For confidence values we store called first followed |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
33 by remaining. Both SMP4 and CNF4 essentially hold 1 piece of |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
34 data per base type per base position, it's just the word size |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
35 and packing order that differs. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
36 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
37 This means TSHIFT and QSHIFT compression types are tied very |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
38 much to trace and quality value chunks, rather than being |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
39 generic transforms. Maybe we should always have the same |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
40 encoding order and some standard compression/transformations |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
41 to reorder as desired. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
42 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
43 An example: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
44 All data related per call is stored in the natural order |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
45 produced. (eg as utilised in CNF1, BPOS). |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
46 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
47 All data related per base-type per call is stored in the order |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
48 produced: A, C, G, T for the first base position, A, C, G, T |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
49 for the second position, and so on. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
50 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
51 Then we have standard filters that can swap between |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
52 ACGTACGTACGT... and AAA...CCC...GGG...TTT... or to |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
53 <called><non-called * 3>... order (which requires a BASE chunk |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
54 present to encode/decode). We'd have 1, 2 and 4 byte variants |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
55 of such filters. They do not need to understand the nature of |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
56 the data they're manipulating, just the word size and a |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
57 predetermined order to shuffle the data around in. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
58 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
59 For CNF4 a combination of {ACGT}* to {<called><non-called*3>}* |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
60 followed by {ACGT}* to A*C*G*T* ordering would end up with all |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
61 <called> followed by all 3 remaining non-called. Ie as it is |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
62 now (which we then promptly "undo" in solexa data by using |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
63 TSHIFT).a |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
64 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
65 3) I'm wondering if there's mileage here in having negative lengths to |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
66 indicate constant data + variable data further on. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
67 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
68 Eg length -10 means the next 10 bytes are the start of the data for |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
69 this chunk. Some stage later we'll read a 4-byte length followed by |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
70 the remaining data for this chunk. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
71 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
72 Rationale: often we end up with many identical bytes at the start |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
73 of a chunk. For example, we take a solexa trace (0 0 value...), run |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
74 it through TSHIFT (80 0 0 0 previous data => 80 0 0 0 0 0 value ...) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
75 and then through STHUFF (77 80(eg) data), but data is the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
76 compressed stream always starting with 80 0 0 0 0 0 so typically it's |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
77 always the same starting string. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
78 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
79 Tested on an SRF file I see SMP4 always starting with the same 9 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
80 bytes of data, BASE starting with the same 3 bytes and CNF4 always |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
81 starting with the same 7 bytes. Hence we'd have lengths -9, -3 and |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
82 -7 in the chunk headers and move that common data to the header |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
83 block too. That's approx 3% of the size of our SRF file. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
84 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
85 4) I propose *all* chunks have some standard meta-data fields |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
86 available for use. These can be: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
87 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
88 4.1) GROUP - all chunks sharing the same GROUP value are considered |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
89 as being related to one another. This provides a mechanism for |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
90 multiple base-call, base position and confidence value chunks |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
91 while still knowing which confidence values belong to which |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
92 call. It also allows for multiple SAMP chunks (instead of the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
93 SMP4 chunk) to be collated together if desired. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
94 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
95 I don't expect many ZTR files to contain calls from multiple |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
96 base-callers, but it's maybe a nice extension and seems quite |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
97 a simple/clean use of meta-data. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
98 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
99 4.2) ENCODING - the default encoding for the chunk data is as |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
100 described in the chunk. We may however wish to override this |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
101 and, for example, store SMP4 data as 32-bit floating point |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
102 values instead of 16-bit integers. This specifies that. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
103 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
104 Question: do we want this available universally everywhere? If |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
105 not, we should at least use the same meta-data keyword for all |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
106 occurrences. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
107 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
108 4.3) TRANSFORM - a simple transformation description. This is |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
109 essentially a mini-formula. It replaces the OFFS meta-data |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
110 used in SMP4 which is simply a transform of X+value. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
111 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
112 5) There are more generic ways to save storage by removing redundancy. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
113 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
114 Most probably they're not worth it, but I list them here for |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
115 discussion still. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
116 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
117 5.1) Use 7-bit variable sized encodings for values instead of fixed |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
118 32-bit sizes. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
119 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
120 Eg instead of storing 1000 as 0x3*0x100 + 0xe8 (00 00 03 e8) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
121 we could store it as 0x7*0x80 + 0x68 (80|07 68). The logic |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
122 here being setting the top bit implies this isn't the final |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
123 value and more data follows. It allows for variable sized |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
124 fields so that small numbers take up fewer bytes. The same can |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
125 be applied to data in SRF structs too. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
126 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
127 Realistically it saves 2 bytes per record in SRF and an |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
128 unknown amount for ZTR - estimated 8 or so (3 for cnf4/base |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
129 and 2 for smp4). It's only 1.5% saving though in total. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
130 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
131 5.2) A general purpose dictionary system. Instead of attempting to |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
132 move headers to one area and data somewhere else, possibly |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
133 also taking common portions of data and putting that somewhere |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
134 too, we could provide a dictionary system whereby we |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
135 previously remove redundancy by replacing all occurrences of a |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
136 particular byte pattern with a new shorter code. (We'd need an |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
137 escape mechanism for when it occurs by chance.) The dictionary |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
138 can then be specified in it's own chunk which is stored in the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
139 header portion. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
140 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
141 This then works for portions of chunk header (eg if the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
142 meta-data changes) rather than full headers, where the data |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
143 blocks always start with the same text, or where we want to |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
144 have sensible names in text fields but don't like them taking |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
145 up too much space. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
146 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
147 It's maybe a bit messy though and complex to implement, plus |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
148 it's unknown how big an impact having to escape accidental |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
149 dictionary codes from appearing in real data. The more formal |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
150 way of removing redundancy is probably better. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
151 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
152 5.3) Lossy compression. I believe there's still room for this, |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
153 although it needs careful thought. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
154 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
155 The floating point format really isn't an ideal way to do it |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
156 though, so I'd much rather have an encoding system that uses |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
157 N*log(signal/M+1) plus a sign bit, stored in integers. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
158 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
159 As we store data in integers the value of N combined with the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
160 maximum value for log(signal/M+1) gives us the number of bits |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
161 we wish to encode to. Essentially we're storing the log value |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
162 to a fixed point precision. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
163 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
164 The value of M dictates the slope of the errors we get from |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
165 logging. It's hard to describe, but basically as signal gets |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
166 larger our average error in storing the signal also gets |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
167 larger. That's true for floating point values too as there's |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
168 a fixed number of bits and they're being used to represent |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
169 larger and larger values, meaning the resolution drops. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
170 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
171 I have various test code and graphs showing error profiles |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
172 for logs vs fixed point vs floating point. Logs or fixed |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
173 point are nearly always preferable to a floating point format |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
174 for size vs accuracy. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
175 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
176 ----------------------------------------------------------------------------- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
177 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
178 CHANGE (since 1.2): |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
179 SAMP and SMP4 now has meta data fields indicating the zero base-line. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
180 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
181 CLARIFICATION |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
182 The specification now explicitly states that trace samples are |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
183 unsigned, although the new OFFS meta-data can be used to turn these |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
184 into signed values. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
185 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
186 CLARIFICATION |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
187 We explicitly state that multiple TEXT chunks maybe present in the ZTR |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
188 file and will be concatenated together. Also the trailing (nul) byte |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
189 is now optional. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
190 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
191 CHANGE |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
192 Added CSET (character set) meta-data for BASEs so ABI SOLID encoding |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
193 can be used. This removes the requirement of IUPAC characters only. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
194 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
195 CHANGE |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
196 Added XRLE2, QSHIFT, TSHIFT and STHUFF compression types. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
197 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
198 INCOMPATIBLE CHANGE: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
199 I propose for this version to make all meta-data adhere to a specific |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
200 format rather than adhoc. It'll consist of zero or more copies of |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
201 'identifier nul value nul'. See the format below for details. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
202 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
203 The only use of meta-data in 1.2 was for SAMP (not SMP4) chunks to |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
204 indicate the channel the data came from. From now on file readers will |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
205 need to check the version number in the header to determine how to |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
206 parse the SAMP meta-data. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
207 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
208 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
209 [Search for "FIXME" for my comments / questions to be answered. They |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
210 elaborate on the summary below and provide more context.] |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
211 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
212 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
213 QUESTION1: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
214 Should we adapt ZTR to not be so inefficient with regards |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
215 to tiny chunks. Specifically a 5 byte chunk size, 4 byte meta-data |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
216 size (normally zero anyway) and 4 byte data length is all |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
217 wasteful. These combined comprise 5-10% of the total SRF size. Note |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
218 that changing this would break backwards compatibility. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
219 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
220 QUESTION2: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
221 Do I need a means to specify the "default meta-data". Specifically if |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
222 we have lots of SAMP chunks (for example) and every single one is |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
223 stating that the zero "offset" value is 32768 then we may want a |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
224 mechanism of specifying that the default OFFS value is 32768 for all |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
225 subsequent SAMP chunks. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
226 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
227 One possible way to do this is to have a new chunk type which sets the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
228 default. Eg for the SAMP chunk we could define a SaMP chunk to modify |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
229 the default for SAMP. This seems oddly named, but it's utilising the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
230 bit5 of the 2nd byte which so far has been reserved as zero. (In the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
231 first byte bit 5 set => private namespace and not part of the public spec.) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
232 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
233 For now I'm just ignoring this issue though. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
234 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
235 QUESTION3: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
236 I've defined new transforms named TSHIFT and QSHIFT specifically |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
237 designed for adjusting the layout of CND4 and SMP4 chunk types to an |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
238 order more amenable for compression by interlaced deflate. They do the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
239 job, but I'm wondering if it's better to simply redefine the input |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
240 data to be a more consistent ordering so that we can define more |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
241 general purpose transforms rather than one dedicated to the original |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
242 trace layout and one for the quality layout. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
243 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
244 I'm ignoring this for now as it would break backwards compatibility. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
245 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
246 QUESTION4: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
247 For the OFFS meta-data in SMP4 and SAMP chunks I have a 16-bit offset |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
248 to specify the zero position. Ie OFFS of 10000 means a sample of 9000 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
249 becomes -1000 after processing. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
250 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
251 Should it be a signed or unsigned 16-bit value. Signed means we could |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
252 encode values ranging from 10000 to 70000 by specify OFFS as -10000. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
253 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
254 Should it be 32-bit instead? Should we have OFFI and OFFF for integer |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
255 and floating point equivalents? |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
256 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
257 QUESTION5: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
258 For region encoding where should the region name belong - the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
259 meta-data section or the REGION_LIST TEXT identifier? It's currently |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
260 in both places. My gut instinct tells me it belongs in the meta-data |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
261 for the REGION_LIST chunk itself. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
262 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
263 QUESTION6: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
264 Can we have clarification on what the region code types mean, |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
265 specifically "tech read". |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
266 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
267 QUESTION7: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
268 Should we add SAMP/SMP4 meta-data indicating a down-scale factor? For |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
269 454 data this could be 100, so we know value 123 is really 1.23. Note |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
270 this is maybe better implemented below using fixed-point precision. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
271 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
272 QUESTION8: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
273 How do we deal with floating point values? |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
274 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
275 I think the chunk meta-data should detail the format of the data block |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
276 itself (as it is strictly speaking data about the data so it fits |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
277 there well). A lack of meta data should imply the usual unsigned |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
278 16-bit quantities. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
279 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
280 There's two main ways to encode fractions: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
281 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
282 Floating point where we have a mantissa and an exponent. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
283 - See http://en.wikipedia.org/wiki/IEEE_floating-point_standard |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
284 - large dynamic range |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
285 - fixed number of significant bits |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
286 - varying "resolution". Ie can represent tiny differences |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
287 between two very small floating point numbers, but not |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
288 between two very large floating point numbers. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
289 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
290 Fixed point where we have a fixed number of bits for the component |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
291 before and after the decimal point. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
292 - See http://en.wikipedia.org/wiki/Q_%28number_format%29 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
293 - constant resolution |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
294 - effectively used by SFF (specified to 2 decimal places) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
295 - easy to treat as integers so can be fast and dealt with by |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
296 small embedded CPUs without FPUs. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
297 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
298 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
299 Floating point maybe appropriate as effectively it's the same as |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
300 logging your signals and storing those. It offers large dynamic range |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
301 so can cope with abnormally large values (at the expense of precision) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
302 while retaining lots of variation at the low end to distinguish small |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
303 values. However it's CPU intensive to cope with anything other than |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
304 the CPU provided 32-bit and 64-bit floating point formats. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
305 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
306 Single precision 32-bit floats in IEEE-754 have: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
307 1 bit (31): Sign |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
308 8 bits (23-30): Exponent (bias 127, so stroring 100 => -27) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
309 23 bits (0-22): Mantissa |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
310 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
311 Effectively we store any binary value as a normalised expression: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
312 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
313 <exponent> |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
314 1.<mantissa> * 2 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
315 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
316 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
317 Eg 1732.5: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
318 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
319 => 11011000100.1 (binary) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
320 => 1.10110001001 (binary) * 2^10 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
321 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
322 Exponent+127 => 137 => 10001001 (binary) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
323 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
324 sign exponent mantissa |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
325 0 10001001 10110001001000000000000 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
326 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
327 (17325 => 0x43ad => 0x0010001110101101 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
328 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
329 However we probably want 16-bit and 24-bit floating point types for |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
330 efficiencies sake. Do we go with some fixed predefined floating point |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
331 formats for 8-bit, 16-bit, 24-bit and 32-bit layouts (with 32-bit |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
332 being identical to IEEE754) or do we allow for specification of the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
333 mantissa and exponent Eg FLOAT=23.8, FLOAT=17.6 or FLOAT=5.2 in the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
334 meta-data block? |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
335 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
336 FLOAT=17.6 (24-bit) gives ranges +/- 8.6*10^9 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
337 FLOAT=5.2 (8-bit) gives ranges +/- 64 (I think). |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
338 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
339 Alternatively if we restrict ourselves to only using the most |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
340 significant 14 bits of the mantissa then storing as standard 32-bit |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
341 floats implies 1 in every 4 bytes is zero. This may provide for a |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
342 very crude, but fast way to implement reduced size floating point |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
343 values - ie FLOAT=15.8 (24-bit signed). |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
344 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
345 For fixed point (as in SFF values) there's already a draft standard |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
346 for implementation in C (ISO/IEC TR 18037:2004). |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
347 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
348 One benefit of fixed point over floating point is speed of |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
349 implementation. Fixed point numbers can just be dealt with as |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
350 integers. Eg subtracting two fixed point 16-bit values can be done in |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
351 integers using a-b and the result is the same as if we'd done all the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
352 bit twiddling and maths directly simulating a real fixed-point unit. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
353 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
354 My gut feeling is that we'd want to explicitly declare the number of |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
355 bits for integral and fractional components in the meta-data block. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
356 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
357 Comments? |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
358 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
359 James |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
360 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
361 PS. The latest (only minor tweaks from before) ZTR draft spec |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
362 follows. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
363 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
364 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
365 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
366 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
367 1.3 draft 3 (19 Oct 2007) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
368 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
369 ZTR SPEC v1.3 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
370 ============= |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
371 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
372 Header |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
373 ====== |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
374 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
375 The header consists of an 8 byte magic number (see below), followed by |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
376 a 1-byte major version number and 1-byte minor version number. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
377 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
378 Changes in minor numbers should not cause problems for parsers. It indicates |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
379 a change in chunk types (different contents), but the file format is the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
380 same. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
381 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
382 The major number is reserved for any incompatible file format changes (which |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
383 hopefully should be never). |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
384 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
385 /* The header */ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
386 typedef struct { |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
387 unsigned char magic[8]; /* 0xae5a54520d0a1a0a (b.e.) */ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
388 unsigned char version_major; /* 1 */ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
389 unsigned char version_minor; /* 3 */ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
390 } ztr_header_t; |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
391 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
392 /* The ZTR magic numbers */ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
393 #define ZTR_MAGIC "\256ZTR\r\n\032\n" |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
394 #define ZTR_VERSION_MAJOR 1 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
395 #define ZTR_VERSION_MINOR 3 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
396 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
397 So the total header will consist of: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
398 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
399 Byte number 0 1 2 3 4 5 6 7 8 9 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
400 +--+--+--+--+--+--+--+--+--+--+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
401 Hex values |ae 5a 54 52 0d 0a 1a 0a|01 03| |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
402 +--+--+--+--+--+--+--+--+--+--+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
403 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
404 Chunk format |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
405 ============ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
406 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
407 The basic structure of a ZTR file is (header,chunk*) - ie header followed by |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
408 zero or more chunks. Each chunk consists of a type, some meta-data and some |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
409 data, along with the lengths of both the meta-data and data. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
410 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
411 Byte number 0 1 2 3 4 5 6 7 8 9 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
412 +--+--+--+--+----+----+----+---+--+ - +--+--+--+--+--+-- - --+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
413 Hex values | type |meta-data length | meta-data |data length| data .. | |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
414 +--+--+--+--+----+----+----+---+--+ - +--+--+--+--+--+-- - --+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
415 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
416 FIXME: For very short reads this is a large overhead. We have 8 bytes |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
417 of length information (of which typically only 1-2 are non-zero) and 4 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
418 bytes for type (which typically only has one of 4-5 values). This |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
419 means about 10 bytes wasted per chunk, or maybe 5-10% of the total |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
420 file size. Changing this would be a radical departure from ZTR; is it |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
421 justified given the savings? (est. 4.8% for 74bp reads, 8.4% for 27bp |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
422 reads). |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
423 One idea if to consider a ZTR file (the non "block" components at |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
424 least) to be a series of huffman codes, by default all 8-bit long and |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
425 matching their ASCII codes. Then a dedicated chunk could be used to |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
426 adjust these default codes. It's therefore backwards compatible, but |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
427 is that also overkill? (NB, this looks like it'd save 6% on the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
428 overall file size.) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
429 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
430 Ie in C: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
431 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
432 typedef struct { |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
433 uint4 type; /* chunk type (b.e.) */ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
434 uint4 mdlength; /* length of meta-data field (b.e.) */ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
435 char *mdata; /* meta data */ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
436 uint4 dlength; /* length of data field (b.e.) */ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
437 char *data; /* a format byte and the data itself */ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
438 } ztr_chunk_t; |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
439 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
440 All 2 and 4-byte integer values are stored in big endian format. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
441 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
442 The meta-data is uncompressed (and so it does not start with a format |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
443 byte). From version 1.3 onwards meta-data is defined to be in key |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
444 value pairs adhering to the same structure defined in the TEXT chunk |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
445 ("key\0value\0"). Exceptions are made for this only for purposes of |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
446 backwards compatibility in the SAMP chunk type. The contents of the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
447 meta-data is chunk specific, and many chunk types will have no |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
448 meta-data. In this case the meta-data length field will be zero and |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
449 this will be followed immediately by the data-length field. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
450 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
451 Ie all meta-data adheres to the following structure: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
452 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
453 Meta-data: (version 1.3 onwards only) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
454 +- - -+--+- - -+--+- -+- - -+--+- - -+--+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
455 Hex values | ident | 0| value | 0| - | ident | 0| value | 0| |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
456 +- - -+--+- - -+--+- -+- - -+--+- - -+--+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
457 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
458 FIXME: Can we have specify the meta-data once per ZTR file and omit it |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
459 in subsequent chunks? Eg a blank chunk with meta-data only in the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
460 header. Chunks in the body then specify meta-data length as 0xFFFFFFFF |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
461 as an indicator meaning "use the last meta-data defined for this chunk |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
462 type". Useful when split in two, as in SRF? |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
463 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
464 Note that this means both ident and values must not themselves contain |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
465 the zero byte (a nul character), hence we generally store ident-value |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
466 pairs in ASCII string forms. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
467 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
468 The data length ("dlength") is the length in bytes of the entire |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
469 'data' block, including the format information held within it. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
470 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
471 The first byte of the data consists of a format byte. The most basic format is |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
472 zero - indicating that the data is "as is"; it's the real thing. Other formats |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
473 exist in order to encode various filtering and compression techniques. The |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
474 information encoded in the next bytes will depend on the format byte. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
475 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
476 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
477 RAW (#0) - no formatting |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
478 -------- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
479 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
480 Byte number 0 1 2 N |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
481 +--+--+-- - --+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
482 Hex values | 0| raw data | |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
483 +--+--+-- - --+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
484 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
485 Raw data has no compression or filtering. It just contains the unprocessed |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
486 data. It consists of a one byte header (0) indicating raw format followed by N |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
487 bytes of data. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
488 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
489 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
490 RLE (#1) - simple run-length encoding |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
491 ------- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
492 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
493 Byte number 0 1 2 3 4 5 6 7 8 N |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
494 +--+----+----+-----+-----+-------+--+--+--+-- - --+--+--+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
495 Hex values | 1| Uncompressed length | guard | run length encoded data| |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
496 +--+----+----+-----+-----+-------+--+--+--+-- - --+--+--+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
497 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
498 Run length encoding replaces stretches of N identical bytes (with value V) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
499 with the guard byte G followed by N and V. All other byte values are stored |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
500 as normal, except for occurrences of the guard byte, which is stored as G 0. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
501 For example with a guard value of 8: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
502 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
503 Input data: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
504 20 9 9 9 9 9 10 9 8 7 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
505 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
506 Output data: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
507 1 (rle format) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
508 0 0 0 10 (original length) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
509 8 (guard) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
510 20 8 5 9 10 9 8 0 7 (rle data) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
511 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
512 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
513 ZLIB (#2) - see RFC 1950 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
514 --------- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
515 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
516 Byte number 0 1 2 3 4 5 6 7 N |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
517 +--+----+----+-----+-----+--+--+--+-- - --+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
518 Hex values | 2| Uncompressed length | Zlib encoded data| |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
519 +--+----+----+-----+-----+--+--+--+-- - --+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
520 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
521 This uses the zlib code to compress a data stream. The ZLIB data may itself be |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
522 encoded using a variety of methods (LZ77, Huffman), but zlib will |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
523 automatically determine the format itself. Often using zlib mode |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
524 Z_HUFFMAN_ONLY will provide best compression when combined with other |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
525 filtering techniques. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
526 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
527 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
528 XRLE (#3) - multi-byte run-length encoding |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
529 --------- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
530 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
531 Byte number 0 1 2 3 4 5 N |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
532 +--+------+-------+--+--+--+-- - --+--+--+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
533 Hex values | 3| size | guard | run length encoded data| |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
534 +--+------+-------+--+--+--+-- - --+--+--+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
535 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
536 Much standard RLE, but this mechanism has a byte to specify the length |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
537 of the data item we compare to check for runs. It is not restricted to |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
538 spotted runs aligned on 'size' byte boundaries either. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
539 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
540 No uncompressed length is encoded here as technically this is not |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
541 required (although it does make decoding a bit slower). The compressed |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
542 length alone is sufficient to work out the uncompressed length after |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
543 decompressing. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
544 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
545 Guard bytes in the input stream are 'escaped' by the replacing the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
546 guard byte followed by zero. Guard bytes in a parameterised run (ie X |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
547 copies of Y where Y contains the guard) do not need to be 'escaped' |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
548 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
549 Input data: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
550 10 12 12 13 12 13 12 13 12 13 14 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
551 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
552 Output data: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
553 3 (xrle format) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
554 2 (size of blocks to compare) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
555 12 (guard, 12 is a bad choice but illustrative) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
556 10 12 0 12 4 12 13 14 (rle data) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
557 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
558 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
559 XRLE2 (#4) - word aligned multi-byte run-length encoding |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
560 ---------- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
561 Version 1.3 onwards |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
562 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
563 Byte number 0 1 RSZ multiple of RSZ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
564 +--+-----+---------+-- - - - - - - - - - ---+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
565 Hex values | 4| RSZ | padding | run length encoded data| |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
566 +--+-----+---------+-- - - - - - - - - - ---+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
567 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
568 This achieves the same goal as XRLE, but is designed to maintain data |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
569 aligned to specific 'record size' boundaries. This sometimes has |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
570 benefits over XRLE in that subsequent a interlaced deflate entropy |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
571 encoding may work better on record-aligned data streams. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
572 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
573 The first byte holds the format (#4) while the record size (RSZ) is |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
574 held in the second byte. In order to ensure the entire block of data |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
575 is aligned on 'RSZ' bounaries RSZ-2 padding bytes are written out |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
576 before the data itself starts. The contents of these bytes can be |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
577 anything. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
578 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
579 Unlike XRLE it also does not use an explicit guard byte. If we term a |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
580 'word' to be a block of data of size RSZ, then whenever we read a word |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
581 which is identical to the last word written then we write out that |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
582 word (so we have two consecutive words in the output data) followed by |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
583 a counter of how many additional copies of that word are found, up to |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
584 255. This counter consists of 1 byte indicating the number of |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
585 additional copies of the word followed by RSZ-1 padding bytes to |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
586 maintain word alignment. While the contents of these padding bytes may |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
587 be anything, it is suggested that they adhere to same value |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
588 distribution as observed elsewhere in the data block in order to keep |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
589 the data entropy low. (For example repeating the previous bytes from |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
590 'word' will do.) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
591 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
592 Example: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
593 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
594 Input data: taken in pairs: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
595 1 0 2 2 2 2 3 1 3 1 3 1 2 4 2 4 2 4 2 3 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
596 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
597 Output data: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
598 4 2 (xrle2 format, rec size 2) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
599 1 0 ("1 0" from input) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
600 2 2 2 2 0 2 ("2 2" x 2) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
601 3 1 3 1 1 1 ("3 1" x 3) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
602 2 4 2 4 1 4 ("2 4" x 3) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
603 2 3 ("2 3") |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
604 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
605 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
606 DELTA1 (#64) - 8-bit delta |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
607 ------------ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
608 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
609 Byte number 0 1 2 N |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
610 +--+-------------+-- - --+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
611 Hex values |40| Delta level | data | |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
612 +--+-------------+-- - --+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
613 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
614 This technique replaces successive bytes with their differences. The level |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
615 indicates how many rounds of differencing to apply, which should be between 1 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
616 and 3. For determining the first difference we compare against zero. All |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
617 differences are internally performed using unsigned values with automatic an |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
618 wrap-around (taking the bottom 8-bits). Hence 2-1 is 1 and 1-2 is 255. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
619 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
620 For example, with level set to 1: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
621 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
622 Input data: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
623 10 20 10 200 190 5 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
624 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
625 Output data: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
626 1 (delta1 format) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
627 1 (level) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
628 10 10 246 190 246 71 (delta data) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
629 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
630 For level set to 2: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
631 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
632 Input data: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
633 10 20 10 200 190 5 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
634 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
635 Output data: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
636 1 (delta1 format) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
637 2 (level) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
638 10 0 236 200 56 81 (delta data) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
639 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
640 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
641 DELTA2 (#65) - 16-bit delta |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
642 ------------ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
643 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
644 Byte number 0 1 2 N |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
645 +--+-------------+-- - --+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
646 Hex values |41| Delta level | data | |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
647 +--+-------------+-- - --+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
648 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
649 This format is as data format 64 except that the input data is read in 2-byte |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
650 values, so we take the difference between successive 16-bit numbers. For |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
651 example "0x10 0x20 0x30 0x10" (4 8-bit numbers; 2 16-bit numbers) yields "0x10 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
652 0x20 0x1f 0xf0". All 16-bit input data is assumed to be aligned to the start |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
653 of the buffer and is assumed to be in big-endian format. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
654 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
655 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
656 DELTA2 (#66) - 32-bit delta |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
657 ------------ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
658 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
659 Byte number 0 1 2 3 4 N |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
660 +--+-------------+--+--+-- - --+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
661 Hex values |42| Delta level | 0| 0| data | |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
662 +--+-------------+--+--+-- - --+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
663 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
664 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
665 This format is as data formats 64 and 65 except that the input data is read in |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
666 4-byte values, so we take the difference between successive 32-bit numbers. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
667 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
668 Two padding bytes (2 and 3) should always be set to zero. Their purpose is to |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
669 make sure that the compressed block is still aligned on a 4-byte boundary |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
670 (hence making it easy to pass straight into the 32to8 filter). |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
671 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
672 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
673 Data format 67-69/0x43-0x45 - reserved |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
674 --------------------------- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
675 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
676 At present these are reserved for dynamic differencing where the 'level' field |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
677 varies - applying the appropriate level for each section of data. Experimental |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
678 at present... |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
679 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
680 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
681 16TO8 (#70) - 16 to 8 bit conversion |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
682 ----------- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
683 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
684 Byte number 0 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
685 +--+-- - --+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
686 Hex values |46| data | |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
687 +--+-- - --+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
688 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
689 This method assumes that the input data is a series of big endian 2-byte |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
690 signed integer values. If the value is in the range of -127 to +127 inclusive |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
691 then it is written as a single signed byte in the output stream, otherwise we |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
692 write out -128 followed by the 2-byte value (in big endian format). This |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
693 method works well following one of the delta techniques as most of the 16-bit |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
694 values are typically then small enough to fit in one byte. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
695 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
696 Example input data: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
697 0 10 0 5 -1 -5 0 200 -4 -32 (bytes) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
698 (As 16-bit big-endian values: 10 5 -5 200 -800) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
699 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
700 Output data: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
701 70 (16-to-8 format) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
702 10 5 -5 -128 0 200 -128 -4 -32 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
703 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
704 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
705 32TO8 (#71) - 32 to 8 bit conversion |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
706 ----------- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
707 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
708 Byte number 0 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
709 +--+-- - --+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
710 Hex values |47| data | |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
711 +--+-- - --+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
712 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
713 This format is similar to format 16TO8, but we are reducing 32-bit numbers (big |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
714 endian) to 8-bit numbers. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
715 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
716 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
717 FOLLOW1 (#72) - "follow" predictor |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
718 ------------- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
719 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
720 Byte number 0 1 FF 100 101 N |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
721 +--+-- - - - --+-- - --+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
722 Hex values |48| follow bytes | data | |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
723 +--+-- - - - --+-- - --+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
724 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
725 For each symbol we compute the most frequent symbol following it. This is |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
726 stored in the "follow bytes" block (256 bytes). The first character in the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
727 data block is stored as-is. Then for each subsequent character we store the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
728 difference between the predicted character value (obtained by using |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
729 follow[previous_character]) and the real value. This is a very crude, but |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
730 fast, method of removing some residual non-randomness in the input data and so |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
731 will reduce the data entropy. It is best to use this prior to entropy encoding |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
732 (such as huffman encoding). |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
733 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
734 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
735 CHEB445 (#73) - floating point 16-bit chebyshev polynomial predictor |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
736 ------------- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
737 Version 1.1 only. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
738 Deprecated: replaced by format 74 in Version 1.2. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
739 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
740 WARNING: This method was experimental and have been replaced with an |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
741 integer equivalent. The floating point method may give system specific |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
742 results. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
743 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
744 Byte number 0 1 2 N |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
745 +--+--+-- - --+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
746 Hex values |49| 0| data | |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
747 +--+--+-- - --+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
748 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
749 This method takes big-endian 16-bit data and attempts to curve-fit it using |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
750 chebyshev polynomials. The exact method employed uses the 4 preceeding values |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
751 to calculate chebyshev polynomials with 5 coefficents. Of these 5 coefficients |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
752 only 4 are used to predict the next value. Then we store the difference |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
753 between the predicted value and the real value. This procedure is repeated |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
754 throughout each 16-bit value in the data. The first four 16-bit values are |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
755 stored with a simple 1-level 16-bit delta function. Reversing the predictor |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
756 follows the same procedure, except now adding the differences between stored |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
757 value and predicted value to get the real value. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
758 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
759 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
760 ICHEB (#74) - integer based 16-bit chebyshev polynomial predictor |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
761 ----------- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
762 Version 1.2 onwards |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
763 This replaces the floating point CHEB445 format in ZTR v1.1. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
764 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
765 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
766 Byte number 0 1 2 N |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
767 +--+--+-- - --+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
768 Hex values |4A| 0| data | |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
769 +--+--+-- - --+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
770 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
771 This method takes big-endian 16-bit data and attempts to curve-fit it using |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
772 chebyshev polynomials. The exact method employed uses the 4 preceeding values |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
773 to calculate chebyshev polynomials with 5 coefficents. Of these 5 coefficients |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
774 only 4 are used to predict the next value. Then we store the difference |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
775 between the predicted value and the real value. This procedure is repeated |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
776 throughout each 16-bit value in the data. The first four 16-bit values are |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
777 stored with a simple 1-level 16-bit delta function. Reversing the predictor |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
778 follows the same procedure, except now adding the differences between stored |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
779 value and predicted value to get the real value. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
780 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
781 STHUFF (#77) - Interlaced Deflate |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
782 ------------ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
783 Version 1.3 onwards |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
784 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
785 Byte number 0 1 2 N |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
786 +--+--+-- - - - - - --+-- - - --+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
787 Hex values |4D| C| huffman codes | data | |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
788 +--+--+-- - - - - - --+-- - - --+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
789 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
790 This compresses data using huffman encoding using the Deflate |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
791 algorithm for storing the codes and data. It is analogous to using |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
792 zlib with the Z_HUFFMAN_ONLY strategy and a negative window |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
793 size. However it has a few tweaks for optimal compression of very |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
794 small data sets. See RFC 1951 for details of Deflate. If the following |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
795 text is in decrepancy with RFC 1951 then the RFC takes priority. The |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
796 following is included as additional explanatory material only. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
797 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
798 Huffman compression works by replacing each character (or 'symbol') |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
799 with a string of bits. Common symbols have are encoded using few bits |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
800 and rare symbols need a longer string of bits. The net effect is that |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
801 the overall number of bits needed to store a message is reduced. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
802 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
803 To uncompress a compressed data stream it is necessary to know which |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
804 symbols are present and what their bit-strings are. For brevity this |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
805 is achieved by storing only the lengths of the bit-string for each |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
806 symbol and generating bit-strings from the lengths. As long as the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
807 same canonical algorithm is used in both the encoder and decoder then |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
808 knowing the lengths alone is sufficient. Knowledge of this algorithm |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
809 is required for uncompressing the data, so it is defined as follows: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
810 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
811 1. Sort symbols by the length of their bit-strings, smallest first. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
812 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
813 The collating order for symbols sharing the same length is defined |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
814 as ASCII values 0 to 255 inclusive followed by the EOF symbol. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
815 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
816 2. X = 0 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
817 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
818 3. For all bit lengths 'L' from 1 to 24 inclusive: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
819 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
820 For all Symbols of bit length 'L', sorted as above: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
821 Code(Symbol) = least significant 'L' bits of X |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
822 X = X + 1 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
823 End loop |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
824 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
825 X = X * 2 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
826 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
827 End loop |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
828 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
829 This is the same algorithm utilised in the Deflate algorithm (RFC 1951). |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
830 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
831 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
832 For example compressing "abracadabra" gives: /\ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
833 0 1 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
834 Symbol bit-length Code(X) / \ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
835 ------------------------------- a /\ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
836 a 1 0 0 / \ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
837 b 3 4 100 0 1 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
838 c 3 5 101 / \ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
839 r 3 6 110 / \ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
840 d 4 14 1110 /\ /\ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
841 EOF 4 15 1111 0 1 0 1 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
842 / \ / \ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
843 which in turn leads to 28 bits b c r /\ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
844 of output: 0 1 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
845 / \ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
846 0100110010101110010011001111 d EOF |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
847 (ab r ac ad ab r aEOF) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
848 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
849 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
850 In the data format defined above, 'C' is a code-set number. If it is |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
851 zero the the huffman codes to uncompress 'data' are stored in the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
852 following bytes using the same format describe in the DFLH chunk type |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
853 below, otherwise no huffman codes are stored and a predefined |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
854 set of huffman codes are used being either defined in a preceeding |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
855 DFLH chunk (for 128 <= 'C' <= 255) or statically defined in this |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
856 document (for 1 <= 'C' <= 127). Immediately following this is the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
857 compressed bit-stream itself. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
858 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
859 The statically defined huffman code-sets are as follows. The symbols |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
860 are listed below as their printable ASCII character or hash followed |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
861 by a number, so A and #65 are the same symbol. We use the algorithm |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
862 described above to turn these bit-lengths into actual huffman codes. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
863 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
864 C=1: CODE_DNA |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
865 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
866 Length Symbols |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
867 ---------------- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
868 2 A C T |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
869 3 G |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
870 4 N |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
871 5 #0 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
872 6 EOF |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
873 13 #1 to #6 inclusive |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
874 14 #7 to #255 except where already listed above |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
875 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
876 C=2: CODE_DNA_AMBIG (DNA with IUPAC ambiguity codes) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
877 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
878 Length Symbols |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
879 ---------------- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
880 2 A C T |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
881 3 G |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
882 4 N |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
883 7 #0 #45 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
884 8 B D H K M R S V W Y |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
885 11 EOF |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
886 14 #226 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
887 15 #1 to #255 except where already listed above |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
888 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
889 C=3: CODE_ENGLISH (English text) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
890 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
891 Length Symbols |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
892 ---------------- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
893 3 #32 e |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
894 4 a i n o s t |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
895 5 d h l r u |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
896 6 #10 #13 #44 c f g m p w y |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
897 7 #46 b v |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
898 8 #34 I k |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
899 9 #45 A N T |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
900 10 #39 #59 #63 B C E H M S W x |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
901 11 #33 0 1 F G |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
902 15 #0 to #255 except where already listed above |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
903 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
904 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
905 It is recommended that this compression format is used only for small |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
906 data sizes and ZLIB is used for larger (a few K and above) data. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
907 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
908 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
909 QSHIFT (#79) - 4-byte quality reorder |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
910 ------------ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
911 Version 1.3 onwards |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
912 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
913 This reorders the quality signal to be 4-tuples of the quality for the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
914 called base followed by the quality of the other 3 base types in the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
915 order they appear in a,c,g,t (minus the called base). |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
916 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
917 The purpose is to allow a 4-byte interlaced deflate algorithm to |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
918 operate efficiently. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
919 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
920 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
921 TSHIFT (#70) - 8-byte trace reorder |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
922 ------------ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
923 Version 1.3 onwards |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
924 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
925 This reorders the trace signal to be 4-tuples of the 16-bit trace |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
926 signals for the called base followed by the signal from the other 3 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
927 base types in the order they appear in a,c,g,t (minus the called |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
928 base). |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
929 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
930 The purpose is to allow a 8-byte interlaced deflate algorithm to |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
931 operate efficiently. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
932 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
933 FIXME: QSHIFT and TSHIFT could be general purpose byte rearrangements |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
934 without any knowledge of the data type they're holding. They need the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
935 input data to be consistently ordered and not the large differences we |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
936 see between quality and trace right now. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
937 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
938 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
939 Version 1.3 onwards |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
940 Chunk types |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
941 =========== |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
942 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
943 As described above, each chunk has a type. The format of the data contained in |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
944 the chunk data field (when written in format 0) is described below. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
945 Note that no chunks are mandatory. It is valid to have no chunks at all. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
946 However some chunk types may depend on the existance of others. This will be |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
947 indicated below, where applicable. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
948 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
949 Each chunk type is stored as a 4-byte value. Bit 5 of the first byte is used |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
950 to indicate whether the chunk type is part of the public ZTR spec (bit 5 of |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
951 first byte == 0) or is a private/custom type (bit 5 of first byte == 1). Bit |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
952 5 of the remaining 3 bytes is reserved - they must always be set to zero. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
953 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
954 Practically speaking this means that public chunk types consist entirely of |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
955 upper case letters (eg TEXT) whereas private chunk types start with a |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
956 lowercase letter (eg tEXT). Note that in this example TEXT and tEXT are |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
957 completely independent types and they may have no more relationship with each |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
958 other than (for example) TEXT and BPOS types. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
959 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
960 It is valid to have multiples of some chunks (eg text chunks), but not for |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
961 others (such as base calls). The order of chunks does not matter unless |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
962 explicitly specified. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
963 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
964 A chunk may have meta-data associated with it. This is data about the data |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
965 chunk. For example the data chunk could be a series of 16-bit trace samples, |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
966 while the meta-data could be a label attached to that trace (to distinguish |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
967 trace A from traces C, G and T). Meta-data is typically very small and so it |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
968 is never need be compressed in any of the public chunk types (although |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
969 meta-data is specific to each chunk type and so it would be valid to have |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
970 private chunks with compressed meta-data if desirable). |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
971 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
972 The first byte of each chunk data when uncompressed must be zero, indicating |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
973 raw format. If, having read the chunk data, this is not the case then the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
974 chunk needs decompressing or reverse filtering until the first byte is |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
975 zero. There may be a few padding bytes between the format byte and the first |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
976 element of real data in the chunk. This is to make file processing simpler |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
977 when the chunk data consists of 16 or 32-bit words; the padding bytes ensure |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
978 that the data is aligned to the appropriate word size. Any padding bytes |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
979 required will be listed in the appopriate chunk definition below. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
980 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
981 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
982 The following lists the chunk types available in 32-bit big-endian format. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
983 In all cases the data is presented in the uncompressed form, starting with the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
984 raw format byte and any appropriate padding. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
985 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
986 SAMP |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
987 ---- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
988 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
989 Or Meta-data: (version 1.2 and before) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
990 Byte number 0 1 2 3 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
991 +--+--+--+--+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
992 Hex values | data name | |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
993 +--+--+--+--+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
994 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
995 Data: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
996 Byte number 0 1 2 3 4 5 6 7 N |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
997 +--+--+--+--+--+--+--+--+- -+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
998 Hex values | 0| 0| data| data| data| - | |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
999 +--+--+--+--+--+--+--+--+- -+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1000 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1001 This encodes a series of 16-bit unsigned trace samples. The first data |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1002 byte is the format (raw); the second data byte is present for padding |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1003 purposes only. After that comes a series of 16-bit big-endian |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1004 values. Although stored as unsigned, a baseline value can be |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1005 specified which is should then be subtracted from all values to |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1006 generated signed data if required. By default the baseline is zero. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1007 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1008 Valid identifiers for the meta-data (version 1.3 onwards) are: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1009 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1010 Ident Value(s) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1011 --------------------------------------------------------------------- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1012 TYPE "A", "C", "G", "T", "PYNO" or "PYRW" |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1013 OFFS 16-bit signed integer representing the 'zero' position, |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1014 in ASCII. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1015 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1016 [ FIXME: signed or unsigned? Signed means we couldn't store data in |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1017 the range from -48K to +16K. Unsigned means we couldn't store data in |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1018 the range 10K to 70K. What's most useful? Or should OFFS be 32-bit |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1019 instead? ] |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1020 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1021 Versions prior to 1.3 specified meta-data consisted of a single 4-byte |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1022 block containing a 4-byte name associated with the trace. If a |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1023 type-name is shorter than 4 bytes then it should be right padded with |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1024 nul characters to 4 bytes. For sequencing traces the four lanes |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1025 representig A, C, G and T signals have names "A\0\0\0", "C\0\0\0", |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1026 "G\0\0\0" and "T\0\0\0". PYNO and PYRW refer to normalised and raw |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1027 pyrogram data (eg from 454 instruments). At present other names are |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1028 not reserved, but it is recommended that (for consistency with |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1029 elsewhere) you label private trace arrays with names starting in a |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1030 lowercase letter (specifically, bit 5 is 1). |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1031 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1032 For the purposes of backwards compatibility, readers should check the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1033 version number in the ZTR header to determine whether the old or new |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1034 style meta-data formatting is in use. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1035 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1036 For sequencing traces it is expected that there will be four SAMP chunks, |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1037 although the order is not specified. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1038 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1039 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1040 SMP4 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1041 ---- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1042 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1043 Meta-data: optional - see below |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1044 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1045 Data: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1046 Byte number 0 1 2 3 4 5 6 7 N |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1047 +--+--+--+--+--+--+--+--+- -+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1048 Hex values | 0| 0| data| data| data| - | |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1049 +--+--+--+--+--+--+--+--+- -+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1050 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1051 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1052 As per SAMP, this encodes a series of unsigned 16-bit trace values, to |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1053 be base-line corrected by the OFFS meta-data value as appropriate. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1054 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1055 The first byte is 0 (raw format). Next is a single padding byte (also 0). |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1056 Then follows a series of 2-byte big-endian trace samples for the "A" trace, |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1057 followed by a series of 2-byte big-endian traces samples for the "C" trace, |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1058 also followed by the "G" and "T" traces (in that order). The assumption is |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1059 made that there is the same number of data points for all traces and hence the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1060 length of each trace is simply the number of data elements divided by four. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1061 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1062 Experimentation has shown that this gives around 3% saving over 4 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1063 separate SAMP chunks, but it lacks in flexibility. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1064 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1065 Valid identifiers for the meta-data are: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1066 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1067 Ident Value(s) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1068 --------------------------------------------------------------------- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1069 OFFS 16-bit signed integer representing the 'zero' position |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1070 TYPE The type of data-set encoded. Values can be: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1071 "PROC" - processed data for viewing, also the default |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1072 when no type field is found. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1073 "SLXI" - Illumina GA raw intensities (.int.txt files) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1074 "SLXN" - Illumina GA noise intensities (.nse.txt files) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1075 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1076 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1077 BASE |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1078 ---- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1079 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1080 Meta-data: optional - see below |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1081 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1082 Data: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1083 Byte number 0 1 2 3 N |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1084 +--+--+--+-- - --+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1085 Hex values | 0| base calls | |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1086 +--+--+--+-- - --+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1087 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1088 The first byte is 0 (raw format). This is followed by the base calls in ASCII |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1089 format (one base per byte). By default it is assumed that all base |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1090 calls are stored using the IUPAC characters[1]. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1091 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1092 Valid identifiers for the meta-data are: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1093 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1094 Ident Meaning Value(s) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1095 --------------------------------------------------------------------- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1096 CSET Character-set 'I' (ASCII #73) => IUPAC ("ACGTUMRWSYKVHDBN") |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1097 '0' (ASCII #49) => ABI SOLiD ("0123N") |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1098 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1099 BPOS |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1100 ---- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1101 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1102 Meta-data: none present |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1103 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1104 Data: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1105 Byte number 0 1 2 3 4 5 6 7 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1106 +--+--+--+--+--+--+--+--+- -+--+--+--+--+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1107 Hex values | 0| padding| data | - | data | |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1108 +--+--+--+--+--+--+--+--+- -+--+--+--+--+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1109 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1110 This chunk contains the mapping of base call (BASE) numbers to sample (SAMP) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1111 numbers; it defines the position of each base call in the trace data. The |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1112 position here is defined as the numbering of the 16-bit positions held in the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1113 SAMP array, counting zero as the first value. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1114 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1115 The format is 0 (raw format) followed by three padding bytes (all 0). Next |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1116 follows a series of 4-byte big-endian numbers specifying the position of each |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1117 base call as an index into the sample arrays (when considered as a 2-byte |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1118 array with the format header stripped off). |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1119 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1120 Excluding the format and padding bytes, the number of 4-byte elements should |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1121 be identical to the number of base calls. All sample numbers are counted from |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1122 zero. No sample number in BPOS should be beyond the end of the SAMP arrays |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1123 (although it should not be assumed that the SAMP chunks will be before this |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1124 chunk). Note that the BPOS elements may not be totally in sorted order as |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1125 the base calls may be shifted relative to one another due to compressions. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1126 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1127 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1128 CNF1 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1129 ---- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1130 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1131 Meta-data: optional - see below |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1132 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1133 Data: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1134 Byte number 0 1 N |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1135 +--+--+-- - --+--+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1136 Hex values | 0| call confidence | |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1137 +--+--+-- - --+--+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1138 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1139 (N == number of bases in BASE chunk) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1140 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1141 Valid identifiers for the meta-data are: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1142 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1143 Ident Value(s) Meaning |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1144 --------------------------------------------------------------------- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1145 SCALE PH Phred-scaled confidence values. (Default). i.e. for |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1146 a call with probability p: -10*log10(1-p) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1147 LO Log-odds scaled values. ie: 10*log10(p/(1-p)) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1148 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1149 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1150 The first byte of this chunk is 0 (raw format). This is then followed by a |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1151 series signed 8-bit confidence values for the called bases. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1152 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1153 Either phred or log-odds (as used by the Illumina GA) scale ranges are |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1154 appropriate. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1155 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1156 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1157 CNF4 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1158 ---- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1159 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1160 Meta-data: optional - see below |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1161 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1162 Data: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1163 Byte number 0 1 N 4N |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1164 +--+--+-- - --+--+----- - -----+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1165 Hex values | 0| call confidence | A/C/G/T conf | |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1166 +--+--+-- - --+--+----- - -----+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1167 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1168 (N == number of bases in BASE chunk) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1169 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1170 Valid identifiers for the meta-data are: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1171 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1172 Ident Value(s) Meaning |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1173 --------------------------------------------------------------------- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1174 SCALE PH Phred-scaled confidence values. i.e. for a call |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1175 with probability p: -10*log10(1-p) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1176 (NB: default, but often inappropriate.) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1177 LO Log-odds scaled values. ie: 10*log10(p/(1-p)) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1178 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1179 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1180 The first byte of this chunk is 0 (raw format). This is then followed by a |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1181 series signed 8-bit confidence values for the called base. Next comes |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1182 all the remaining confidence values for A, C, G and T excluding those |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1183 that have already been written (ie the called base). So for a sequence |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1184 AGT we would store confidences A1 G2 T3 C1 G1 T1 A2 C2 T2 A3 C3 G3. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1185 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1186 The purpose of this is to group the (likely) highest confidence value (those |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1187 for the called base) at the start of the chunk followed by the remaining |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1188 values. Hence if phred confidence values are written in a CNF4 chunk the first |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1189 quarter of chunk will consist of phred confidence values and the last three |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1190 quarters will (assuming no ambiguous base calls) consist entirely of zeros. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1191 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1192 For the purposes of storage the confidence value for a base call that is not |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1193 A, C, G or T (in any case) is stored as if the base call was T. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1194 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1195 If only one confidence value exists per base then either the phred or |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1196 log-odds scales work well. The first N bytes will be the called bases |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1197 and the remaining 3*N will be zero (optimal for run-length-encoding), |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1198 but consider using the CNF1 chunk type instead in this situation. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1199 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1200 If all 4 base types have their own confidence value then the log-odds |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1201 scale will work well. In this case the phred scale is an inappropriate |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1202 choice as it cannot encode both very likely and very unlikely events. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1203 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1204 Note: if this chunk exists it must exist after a BASE chunk. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1205 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1206 TEXT |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1207 ---- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1208 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1209 Meta-data: none present |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1210 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1211 Data: 0 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1212 +--+- - -+--+- - -+--+- -+- - -+--+- - -+--+-----+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1213 Hex values | 0| ident | 0| value | 0| - | ident | 0| value | 0| (0) | |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1214 +--+- - -+--+- - -+--+- -+- - -+--+- - -+--+-----+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1215 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1216 This contains a series of "identifier\0value\0" pairs. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1217 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1218 The identifiers and values may be any length and may contain any data |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1219 except the nul character. The nul character marks the end of the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1220 identifier or the end of the value. Multiple identifier-value pairs |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1221 are allowable. Prior to version 1.3 a double nul character marked the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1222 end of the list (labeled "(0)" above), but from version 1.3 the end |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1223 of the list may also be marked by the end of chunk. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1224 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1225 Identifiers starting with bit 5 clear (uppercase) are part of the public ZTR |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1226 spec. Any public identifier not listed as part of this spec should be |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1227 considered as reserved. Identifiers that have bit 6 set (lowercase) are for |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1228 private use and no restriction is placed on these. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1229 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1230 Multiple TEXT chunks may exist within the ZTR file. If so they are |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1231 considered to be concatenated together. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1232 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1233 See below for the text identifier list. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1234 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1235 CLIP |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1236 ---- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1237 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1238 Meta-data: none present |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1239 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1240 Data: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1241 Byte number 0 1 2 3 4 5 6 7 8 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1242 +--+--+--+--+--+--+--+--+--+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1243 Hex values | 0| left clip | right clip| |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1244 +--+--+--+--+--+--+--+--+--+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1245 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1246 This contains suggested quality clip points. These are stored as zero (raw |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1247 data) followed by a 4-byte big endian value for the left clip point and a |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1248 4-byte big endian value for the right clip point. Clip points are defined in |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1249 units of base calls, starting from 0. (Q: is that correct!?) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1250 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1251 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1252 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1253 CR32 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1254 ---- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1255 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1256 Meta-data: none present |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1257 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1258 Data: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1259 Byte number 0 1 2 3 4 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1260 +--+--+--+--+--+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1261 Hex values | 0| CRC-32 | |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1262 +--+--+--+--+--+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1263 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1264 This chunk is always just 4 bytes of data containing a CRC-32 checksum, |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1265 computed according to the widely used ANSI X3.66 standard. If present, the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1266 checksum will be a check of all of the data since the last CR32 chunk. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1267 This will include checking the header if this is the first CR32 chunk, and |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1268 including the previous CRC32 chunk if it is not. Obviously the checksum will |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1269 not include checks on this CR32 chunk. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1270 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1271 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1272 COMM |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1273 ---- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1274 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1275 Meta-data: none present |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1276 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1277 Data: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1278 Byte number 0 1 N |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1279 +--+-- - --+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1280 Hex values | 0| free text | |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1281 +--+-- - --+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1282 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1283 This allows arbitrary textual data to be added. It does not require a |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1284 identifier-value pairing or any nul termination. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1285 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1286 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1287 DFLH |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1288 ---- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1289 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1290 Meta-data: none present |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1291 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1292 Data: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1293 Byte number 0 1 N |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1294 +--+--+-- - - - - - - - - - - --+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1295 Hex values | 0| C| Deflate format data ... | |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1296 +--+--+-- - - - - - - - - - - --+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1297 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1298 'C' is the code-set number referred to within that compression method. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1299 It should be 128 onwards and is used to distinguish between multiple |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1300 huffman tables. It is used in conjunction with the data compression |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1301 format 77 ("Deflate"). |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1302 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1303 Following this is data in the Deflate format (RFC 1951). This should |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1304 consist of the header for a single block using dynamic huffman with |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1305 the BFINAL (last block) flag set. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1306 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1307 In Deflate streams the end of the huffman codes and the start of |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1308 the compressed data stream itself may occur part way through a |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1309 byte. Therefore the last byte of the this block is bitwise ORed |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1310 with the first byte of the data stream compressed referring back to |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1311 this code-set number. Therefore all unused bits in the last byte of |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1312 this block should be set to zero. Likewise if the data bit-stream in |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1313 this block ends on an exact byte boundary then an additional blank |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1314 byte must be added to ensure the ORing method above still works. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1315 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1316 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1317 DFLC |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1318 ---- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1319 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1320 Meta-data: none present |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1321 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1322 Data: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1323 Byte number 0 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1324 +--+---+- - - - ---+--+-- - - - - - - - - - - - --+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1325 Hex values | 0| C |code-order |FF| Deflate dynamic codes ... | |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1326 +--+---+- - - - ---+--+-- - - - - - - - - - - - --+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1327 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1328 Multi-context Deflate compression codes defined for use by data format |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1329 78 (HUFF_MULTI). |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1330 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1331 This is like the DFLH format, except it encodes multiple huffman trees |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1332 instead of a single tree along with the order in which the multiple |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1333 trees should be used (the "code-order"). |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1334 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1335 'C' is the code-set number referred to within that compression method. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1336 It should be 128 onwards and is used to distinguish between multiple |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1337 huffman tables. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1338 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1339 The code-order is a run-length encoded series of 8-bit numbers |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1340 indicating which huffman code set should be used for which byte. For |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1341 each byte in the input stream the HUFF_MULTI method selects the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1342 appropriate huffman code by using indexing code-order with the input |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1343 data position modulo the number of values in code-order. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1344 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1345 Following this is data in the Deflate format (RFC 1951). This should |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1346 consist of the header component for a single block using dynamic |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1347 huffman with the BFINAL (last block) flag set, up to and including |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1348 the HDIST+1 code lengths for the distance alphabet. This will then be |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1349 immediately followed by the next set of huffman codes, and so on until |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1350 all index values containing within the code-order have been accounted |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1351 for. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1352 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1353 In Deflate streams the end of the huffman codes and the start of |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1354 the compressed data stream itself may occur part way through a |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1355 byte. Therefore the last byte of the this block is bitwise ORed |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1356 with the first byte of the data stream compressed referring back to |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1357 this code-set number. Therefore all unused bits in the last byte of |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1358 this block should be set to zero. Likewise if the data bit-stream in |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1359 this block ends on an exact byte boundary then an additional blank |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1360 byte must be added to ensure the ORing method above still works. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1361 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1362 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1363 For example, compression of 16-bit data is sometimes best achieved by |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1364 producing one set of huffman codes for the top 8 bits and another set |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1365 for the bottom 8 bits, rather than mixing these together by treating |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1366 the 16-bit data as a series of 8-bit quantities. In this case our |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1367 code-order would consist of just two entries; (0, 1). |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1368 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1369 Alternatively we may have 4 1-byte confidence values stored per base |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1370 in the order of the confidence of the base-called base type first |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1371 followed by the 3 remaining confidence values. We observe that |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1372 compressing byte 0, 4, 8, 12, ... as one set and bytes 1,2,3, 5,6,7, |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1373 ... as another set yields higher compression ratios. In this case the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1374 code-order would consist of 4 entries; (0, 1, 1, 1). |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1375 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1376 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1377 REGN |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1378 ---- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1379 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1380 Meta-data: optional - see below |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1381 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1382 Data: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1383 Byte number 0 1 2 3 4 5 6 7 8 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1384 +--+---+---+---+---+---+---+---+---+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1385 Hex values | 0| 1st boundary | 2nd boundary | ... |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1386 +--+---+---+---+---+---+---+---+---+ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1387 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1388 This chunk is used to break a trace down into a series of segments. We |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1389 store the boundary between segments, so the list above will contain |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1390 one less boundary than there are segments with the first segment |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1391 implicitly starting from the first base and the last segment implictly |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1392 extending to the last base. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1393 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1394 Each 4-byte unsigned value indicates a position within the sequence or |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1395 trace counting from 0 as the first element and marking the first base |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1396 of the next region. For example three regions of DNA may be: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1397 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1398 0 1 2 3 4 5 6 7 8 9 10 11 12 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1399 T A C G G A T T C G A A C |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1400 |<-reg. 1->| |<--reg. 2--->| |<-reg. 3->| |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1401 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1402 This would give the 1st boundary as 4 and the 2nd boundary as 9. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1403 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1404 The lack of a REGN chunk implies one single region extending from the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1405 first to last base in the sequence. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1406 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1407 Valid identifiers for the meta-data are: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1408 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1409 Ident Meaning Value(s) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1410 --------------------------------------------------------------------- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1411 COORD Coordinate system 'T' = trace coordinates |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1412 'B' = base coordinations (default) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1413 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1414 NAME Region names A semicolon separated list of |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1415 "name:code" pairs. Eg |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1416 primer1:T;read1:P;primer2:T;read2:P |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1417 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1418 [FIXME: NAME identifier here is the same as the REGION_LIST TEXT |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1419 identifier. We need to decide where it belongs and pick one. If we can |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1420 get a way to specify the default meta-data contents then logically |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1421 speaking the best place to store this is in the meta-data along side |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1422 the chunk data itself.] |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1423 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1424 The NAME identifier is used to attach a meaning to the regions |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1425 described in the data chunk. It consists of a semi-colon separated |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1426 list of names or name:code pairs. The codes, if present are a single |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1427 character from the predefined list below and are separated from the |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1428 name by a colon. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1429 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1430 Code Meaning |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1431 --------------------------------------- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1432 T Tech read (e.g. primer, linker) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1433 B Bio read |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1434 I Inverted read |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1435 D Duplicate read |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1436 P Paired read |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1437 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1438 FIXME: I don't like the above meanings. They don't, well, "mean" much |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1439 to me! What's a tech read? |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1440 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1441 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1442 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1443 Text Identifiers |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1444 ================ |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1445 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1446 These are for use in the TEXT segments. None are required, but if any of these |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1447 identifiers are present they must confirm to the description below. Much |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1448 (currently all) of this list has been taken from the NCBI Trace Archive [2] |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1449 documentation. It is duplicated here as the ZTR spec is not tied to the same |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1450 revision schedules as the NCBI trace archive (although it is intended that any |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1451 suitable updates to the trace archive should be mirrored in this ZTR spec). |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1452 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1453 The Trace Archive specifies a maximum length of values. The ZTR spec does not |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1454 have length limitations, but for compatibility these sizes should still be |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1455 observed. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1456 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1457 The Trace Archive also states some identifiers are mandatory; these are marked |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1458 by asterisks below. These identifiers are not mandatory in the ZTR spec (but |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1459 clearly they need to exist if the data is to be submitted to the NCBI). |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1460 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1461 Finally, some fields are not appropriate for use in the ZTR spec, such as |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1462 BASE_FILE (the name of a file containing the base calls). Such fields are |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1463 included only for compatibility with the Trace Arhive. It is not expected that |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1464 use of ZTR would allow for the base calls to be read from an external file |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1465 instead of the ZTR BASE chunk. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1466 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1467 [ Quoted from TraceArchiveRFC v1.17 ] |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1468 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1469 Identifier Size Meaning Example value(s) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1470 ---------- ----- ---------------------------- ----------------- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1471 TRACE_NAME * 250 name of the trace HBBBA1U2211 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1472 as used at the center |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1473 unique within the center |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1474 but not among centers. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1475 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1476 SUBMISSION_TYPE * - type of submission |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1477 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1478 CENTER_NAME * 100 name of center BCM |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1479 CENTER_PROJECT 200 internal project name HBBB |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1480 used within the center |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1481 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1482 TRACE_FILE * 200 file name of the trace ./traces/TRACE001.scf |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1483 relative to the top of |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1484 the volume. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1485 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1486 TRACE_FORMAT * 20 format of the tracefile |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1487 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1488 SOURCE_TYPE * - source of the read |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1489 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1490 INFO_FILE 200 file name of the info file |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1491 INFO_FILE_FORMAT 20 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1492 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1493 BASE_FILE 200 file name of the base calls |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1494 QUAL_FILE 200 file name of the base calls |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1495 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1496 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1497 TRACE_DIRECTION - direction of the read |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1498 TRACE_END - end of the template |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1499 PRIMER 200 primer sequence |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1500 PRIMER_CODE which primer was used |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1501 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1502 STRATEGY - sequencing strategy |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1503 TRACE_TYPE_CODE - purpose of trace |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1504 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1505 PROGRAM_ID 100 creator of trace file phred-0.990722.h |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1506 program-version |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1507 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1508 TEMPLATE_ID 20 used for read pairing HBBBA2211 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1509 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1510 CHEMISTRY_CODE - code of the chemistry (see below) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1511 ITERATION - attempt/redo 1 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1512 (int 1 to 255) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1513 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1514 CLIP_QUALITY_LEFT left clip of the read in bp due to quality |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1515 CLIP_QUALITY_RIGHT right " " " " " |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1516 CLIP_VECTOR_LEFT left clip of the read in bp due to vector |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1517 CLIP_VECTOR_RIGHT right " " " " " |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1518 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1519 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1520 SVECTOR_CODE 40 sequencing vector used (in table) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1521 SVECTOR_ACCESSION 40 sequencing vector used (in table) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1522 CVECTOR_CODE 40 clone vector used (in table) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1523 CVECTOR_ACCESSION 40 clone vector used (in table) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1524 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1525 INSERT_SIZE - expected size of insert 2000,10000 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1526 in base pairs (bp) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1527 (int 1 to 2^32) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1528 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1529 PLATE_ID 32 plate id at the center |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1530 WELL_ID well 1-384 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1531 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1532 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1533 SPECIES_CODE * - code for species |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1534 SUBSPECIES_ID 40 name of the subspecies |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1535 Is this the same as strain |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1536 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1537 CHROMOSOME 8 name of the chromosome ChrX, Chr01, Chr09 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1538 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1539 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1540 LIBRARY_ID 30 the source library of the clone |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1541 CLONE_ID 30 clone id RPCI11-1234 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1542 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1543 ACCESSION 30 NCBI accession number AC00001 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1544 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1545 PICK_GROUP_ID 30 an id to group traces picked |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1546 at the same time. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1547 PREP_GROUP_ID 30 an id to group traces prepared |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1548 at the same time |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1549 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1550 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1551 RUN_MACHINE_ID 30 id of sequencing machine |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1552 RUN_MACHINE_TYPE 30 type/model of machine |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1553 RUN_LANE 30 lane or capillary of the trace |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1554 RUN_DATE - date of run |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1555 RUN_GROUP_ID 30 an identifier to group traces |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1556 run on the same machine |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1557 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1558 [ End of quote from TraceArchiveRFC ] |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1559 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1560 More detailed information on the format of these values should be obtained |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1561 from the Trace Archive RFC [2]. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1562 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1563 In addition to the above the following TEXT identifiers have meaning |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1564 specific to the ZTR format: |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1565 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1566 Identifier Meaning Example value(s) |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1567 ---------- ---------------------------- ------------------------------- |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1568 REGION_LIST A semi-colon separated list primer1:T;read1:P |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1569 identifying regions of a |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1570 trace. See the REGN chunk Region 1;Region 2;Region 3 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1571 definition for details. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1572 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1573 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1574 FIXME: Should this simply be the meta-data associated with the REGN |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1575 chunk? |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1576 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1577 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1578 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1579 References |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1580 ========== |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1581 [1] IUPAC: http://www.chem.qmw.ac.uk/iubmb/misc/naseq.html |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1582 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1583 [2] http://www.ncbi.nlm.nih.gov/Traces/TraceArchiveRFC.html |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1584 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1585 [3] J.Bonfield and R.Staden, "ZTR: a new format for DNA sequence trace |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1586 data". Bioinformatics Vol. 18 no. 1 2002. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1587 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1588 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1589 FIXME: As an aside, not doing the final entropy encoding steps (zlib, |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1590 deflate, etc) and just using bzip2 on an entire SRF archive yields a |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1591 considerable saving. On tests it varied between 23% (27bp reads) and |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1592 13% (74bp reads) smaller than the Deflate compressed |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1593 data. Unfortunately it pretty much removes all chance of random access |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1594 in the data unless I can get a working FM-Index implementation |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1595 (which is very unlikely in a short time). This makes it appropriate |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1596 for transmission perhaps, but not for indexing and querying random |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1597 sequences. |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1598 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1599 A substantial chunk (5-9%) of this saving comes from the repeated ZTR |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1600 block types (names like "BASE", "CNF4" and common components like 0x00000000 |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1601 for the meta-data size). The remainder probably comes from |
d901c9f41a6a
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff
changeset
|
1602 similarities between one ZTR file and another. |