annotate srf2fastq/io_lib-1.12.2/docs/Hash_File_Format @ 0:d901c9f41a6a default tip
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
author | dawe |
---|---|
date | Tue, 07 Jun 2011 17:48:05 -0400 |
A Hash File is an on-disk copy of a Hash Table keyed by filenames and
with data containing a file size and position within an archive. It's
designed to be a general purpose indexing tool for most archive
formats or for "solid" (concatenated) file archives.

Basic operations need to be performed on hash files and there are
tools to do this:

Listing the contents
    hash_list [-l]

Extraction
    hash_extract

Concatenation
    hash_cat


The Hash File format is:

Header, archive file name, file headers/footers, hash buckets, hash
linked list items, footer.

In more detail:

Header:
    ".hsh" (magic number)
    x4     (4 bytes of version code, e.g. "1.00")
    x1     (HASH_FUNC_? function used)
    x1     (number of file headers: FH. These count from 1 to FH inclusive)
    x1     (number of file footers: FF. These count from 1 to FF inclusive)
    x1     (reserved - zero for now)
    x4     (4-bytes big-endian; number of hash buckets)
    x8     (offset to add to item positions, e.g. size of this index)
    x4     (total size of hashfile, including header, ..., index, footer)
Archive name:
    x1     (length 'L', zero => no name)
    xL     (archive filename)
File headers (FH copies of):
    x8     (position)
    x4     (size)
File footers (FF copies of):
    x8     (position)
    x4     (size)
Buckets (multiples of):
    x4     (4-byte offset of linked list pos, rel. to the start of the hdr)
Items (per bucket chain, not written if Bucket[?]==0):
    x1     (key length 'K', zero => end of chain)
    xK     (key)
    x0.5   (File header to use. zero => none) top 4 bits
    x0.5   (File footer to use. zero => none) bottom 4 bits
    x8     (position)
    x4     (size)
Index footer:
    ".hsh" (magic number)
    x8     (offset to Hash Header. >=0 = absolute, negative = relative to end)
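
As an illustration of the layout above, the fixed-size header and the
archive name can be decoded with a few lines of Python. This is only a
sketch, not the io_lib implementation: the names HEADER and parse_header
are made up for this example, and it assumes that every multi-byte field
is big-endian (the layout states this only for the bucket count) and
that the 8-byte offset is a signed integer.

```python
import struct

# magic, version, hash func, FH, FF, reserved, bucket count, offset, total size.
# ">" means big-endian with no padding; the byte order is an assumption for
# every field except the bucket count, and "q" (signed) is assumed for the offset.
HEADER = struct.Struct(">4s4sBBBBIqI")


def parse_header(buf):
    """Decode the fixed header and the archive name that follows it."""
    (magic, version, hash_func, nheaders, nfooters,
     _reserved, nbuckets, offset, size) = HEADER.unpack_from(buf, 0)
    if magic != b".hsh":
        raise ValueError("not a hash file: bad magic number")
    pos = HEADER.size                 # archive name section starts here
    name_len = buf[pos]               # length 'L', zero => no name
    archive = bytes(buf[pos + 1:pos + 1 + name_len])
    return {
        "version": version.decode("ascii"),
        "hash_func": hash_func,
        "nheaders": nheaders,
        "nfooters": nfooters,
        "nbuckets": nbuckets,
        "offset": offset,
        "size": size,
        "archive": archive,
        "next": pos + 1 + name_len,   # where the file headers begin
    }
```

Summing the field widths (4+4+1+1+1+1+4+8+4) gives a 28-byte fixed
header, so the archive name length byte sits at offset 28.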

The HashFile index may either be a separate file to the archive, in
which case the "Archive name" section references the archive itself,
or part of the archive itself, in which case the archive name is zero
length. Additionally, if the archive name length is non-zero but the
first byte of the archive filename is zero then it is also considered
to be part of the same archive. This allows for an index previously
generated as a separate file to simply be appended to the archive with
a minimum of binary editing (i.e. zeroing 1 byte).
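
The rule above can be sketched in Python as follows. The function names
are hypothetical, and the byte offsets (name length at 28, first
filename byte at 29) are derived from the field widths in the layout
rather than taken from the io_lib sources.

```python
NAME_LEN_POS = 28   # fixed header is 4+4+1+1+1+1+4+8+4 = 28 bytes (derived, assumed)


def index_is_embedded(archive_name: bytes) -> bool:
    """True if the index is considered part of the archive it describes."""
    return len(archive_name) == 0 or archive_name[0] == 0


def mark_embedded(index: bytearray) -> None:
    """Zero the first filename byte of a standalone index, in place."""
    if index[NAME_LEN_POS] != 0:          # only if a name is actually stored
        index[NAME_LEN_POS + 1] = 0
```

Because only one byte changes, the length of the index and every offset
stored inside it stay the same.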

The HashFile index may also be at the start (preferred and searched
for first) or the end of the file. This is the rationale behind having
an index footer. It allows us to simply append a hash of a tar file to
the end of the tar file itself and it'll work just fine without
breaking the format of the tar file. (Tar files end with a blank
block, so additional data is not read by tar.) Appending the hashfile
requires an extra 2 seeks and 1 read (if opening from scratch) to fetch
a file compared to prepending the hashfile.
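
A possible search order, following the description above, is sketched
below in Python. The name find_hash_header is invented for this example.
The sketch assumes the footer offset is a big-endian signed 64-bit
integer and that a negative value is measured from the end of the file;
the text does not spell out either detail.

```python
import io
import struct

MAGIC = b".hsh"
# Footer: magic number + 8-byte offset (byte order and signedness assumed).
FOOTER = struct.Struct(">4sq")


def find_hash_header(f):
    """Return the offset of the hash header in a binary file object, or None."""
    f.seek(0)
    if f.read(4) == MAGIC:            # index at the start: preferred, tried first
        return 0
    end = f.seek(0, io.SEEK_END)      # extra seek 1: find the end of the file
    if end < FOOTER.size:
        return None
    f.seek(end - FOOTER.size)         # extra seek 2: jump to the index footer
    magic, offset = FOOTER.unpack(f.read(FOOTER.size))   # the extra read
    if magic != MAGIC:
        return None
    # >= 0 is absolute; negative is taken relative to end of file (assumption).
    return offset if offset >= 0 else end + offset
```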

If the hash file was originally stored as a separate file from the
archive but is now being merged, then zero the first byte of the
archive filename and either prepend or append as desired. If you
prepend the hash file then note that all the absolute offsets in the
Item structures will now be incorrect. A correction factor may be
applied, of the size of the HashFile itself, and this is the purpose
of the offset field in the header.
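
To make the correction concrete, the Python sketch below walks one
bucket chain of Items and adds the header offset to each stored
position. The names ITEM_TAIL and read_chain are hypothetical, and
big-endian byte order is assumed for the position and size fields.

```python
import struct

# After the key: one byte holding two 4-bit indices, then position and size.
# Byte order and the signed 8-byte position are assumptions.
ITEM_TAIL = struct.Struct(">BqI")


def read_chain(buf, pos, header_offset):
    """Yield (key, file_header, file_footer, absolute_position, size)."""
    while True:
        key_len = buf[pos]
        if key_len == 0:                      # zero key length => end of chain
            return
        key = bytes(buf[pos + 1:pos + 1 + key_len])
        pos += 1 + key_len
        hf, position, size = ITEM_TAIL.unpack_from(buf, pos)
        pos += ITEM_TAIL.size
        # Top 4 bits select the file header, bottom 4 bits the file footer.
        yield key, hf >> 4, hf & 0x0F, position + header_offset, size
```

With a separate or appended index the header offset would be zero, so
the stored positions are used unchanged; with a prepended index it
would be the size of the HashFile.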