annotate srf2fastq/io_lib-1.12.2/docs/Hash_File_Format @ 0:d901c9f41a6a default tip

Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
author dawe
date Tue, 07 Jun 2011 17:48:05 -0400
parents
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
1 A Hash File is an on-disk copy of a Hash Table keyed by filenames and
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
2 with data containing a file size and position within an archive. It's
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
3 designed to be a general purpose indexing tool for most archive
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
4 formats or for "solid" (concatenated) file archives.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
5
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
6 Basic operations need to be performed on hash files and there are
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
7 tools to do this:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
8
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
9 Listing the contents
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
10 hash_list [-l]
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
11
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
12 Extraction
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
13 hash_extract
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
14
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
15 Concatenation
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
16 hash_cat
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
17
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
18
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
19 The Hash File format is:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
20
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
21 Header, archive file name, file headers/footers, hash buckets, hash
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
22 linked list items, footer.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
23
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
24 In more detail:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
25
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
26 Header:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
27 ".hsh" (magic numebr)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
28 x4 (1-bytes of version code, eg "1.00")
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
29 x1 (HASH_FUNC_? function used)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
30 x1 (number of file headers: FH. These count from 1 to FH inclusive)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
31 x1 (number of file footers: FF. These count from 1 to FF inclusive)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
32 x1 (reserved - zero for now)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
33 x4 (4-bytes big-endian; number of hash buckets)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
34 x8 (offset to add to item positions. eg size of this index)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
35 x4 (total size of hashfile, includingf header, ..., index, footer)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
36 Archive name:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
37 x1 (length 'L', zero => no name)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
38 xL (archive filename)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
39 File headers (FH copies of):
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
40 x8 (position)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
41 x4 (size)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
42 File footers (FH copies of):
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
43 x8 (position)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
44 x4 (size)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
45 Buckets (multiples of)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
46 x4 (4-byte offset of linked list pos, rel. to the start of the hdr)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
47 Items (per bucket chain, not written if Bucket[?]==0)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
48 x1 (key length 'K', zero => end of chain)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
49 xK (key)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
50 x0.5 (File header to use. zero => none) top 4 bits
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
51 x0.5 (File footer to use. zero => none) bottom 4 bits
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
52 x8 (position)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
53 x4 (size)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
54 Index footer:
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
55 ".hsh" (magic number)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
56 x8 (offset to Hash Header. >=0 = absolute, -ve = relative to end)
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
57
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
58 The HashFile index may either be a separate file to the archive, in
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
59 which case the "Archive name" section references the archive itself,
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
60 or part of the archive itself in which case archive name is zero
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
61 length. Additionally if the archive name length is non-zero but the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
62 first byte of the archive filename is zero then it is also considered
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
63 to be part of the same archive. This allows for an index previously
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
64 generated as a separate file to simply be appended to the archive with
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
65 a minimal of binary editing (ie zeroing 1 byte).
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
66
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
67 The HashFile index may also be at the start (preferred and searched
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
68 for first) or the end of the file. This is the rationale behind having
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
69 an index footer. It allows us to simply append a hash of a tar file to
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
70 the end of the tar file itself and it'll work just fine without
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
71 breaking the format of the tar file. (Tar files end with a blank
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
72 block, so additional data is not read by tar.) Appending the hashfile
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
73 requires an extra 2 seeks and 1 read (if opening from scratch) to fetch
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
74 a file compared to prepending the hashfile.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
75
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
76 If the hash file was originally stored as a separate file from the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
77 archive but is now being merged then zero the first byte of the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
78 archive filename and either prepend or append as desired. If you
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
79 prepend the hash file then note that all the absolute offsets in the
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
80 Item structures will now be incorrect. A correction factor may be
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
81 applied, of the size of the HashFile itself, and this is the purpose
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
82 of the offset field in the header.
d901c9f41a6a Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
dawe
parents:
diff changeset
83