diff srf2fastq/io_lib-1.12.2/docs/Hash_File_Format @ 0:d901c9f41a6a default tip

Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
author dawe
date Tue, 07 Jun 2011 17:48:05 -0400
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/srf2fastq/io_lib-1.12.2/docs/Hash_File_Format	Tue Jun 07 17:48:05 2011 -0400
@@ -0,0 +1,83 @@
+A Hash File is an on-disk copy of a Hash Table keyed by filenames and
+with data containing a file size and position within an archive. It's
+designed to be a general purpose indexing tool for most archive
+formats or for "solid" (concatenated) file archives.
+
+The basic operations on hash files are each provided by a
+dedicated tool:
+
+Listing the contents
+	hash_list [-l]
+
+Extraction
+	hash_extract
+
+Concatenation
+	hash_cat
+
+
+The Hash File format is:
+
+Header, archive file name, file headers/footers, hash buckets, hash
+linked list items, footer.
+
+In more detail:
+
+Header:
+   ".hsh" (magic number)
+   x4     (4 single-byte characters of version code, eg "1.00")
+   x1     (HASH_FUNC_? function used)
+   x1     (number of file headers: FH. These count from 1 to FH inclusive)
+   x1     (number of file footers: FF. These count from 1 to FF inclusive)
+   x1     (reserved - zero for now)
+   x4     (4-bytes big-endian; number of hash buckets)
+   x8     (offset to add to item positions. eg size of this index)
+   x4     (total size of hashfile, including header, ..., index, footer)
+Archive name:
+   x1     (length 'L', zero => no name)
+   xL     (archive filename)
+File headers (FH copies of):
+   x8     (position)
+   x4     (size)
+File footers (FF copies of):
+   x8     (position)
+   x4     (size)
+Buckets (multiples of):
+   x4     (4-byte offset of linked list pos, rel. to the start of the hdr)
+Items (per bucket chain, not written if Bucket[?]==0):
+   x1     (key length 'K', zero => end of chain)
+   xK     (key)
+   x0.5   (File header to use. zero => none) top 4 bits
+   x0.5   (File footer to use. zero => none) bottom 4 bits
+   x8     (position)
+   x4     (size)
+Index footer:
+   ".hsh" (magic number)
+   x8     (offset to Hash Header. >=0 = absolute, -ve = relative to end)
+
+The HashFile index may either be a separate file from the archive, in
+which case the "Archive name" section references the archive itself,
+or part of the archive itself in which case archive name is zero
+length. Additionally if the archive name length is non-zero but the
+first byte of the archive filename is zero then it is also considered
+to be part of the same archive. This allows for an index previously
+generated as a separate file to simply be appended to the archive with
+a minimum of binary editing (ie zeroing 1 byte).
+
+The HashFile index may also be at the start (preferred and searched
+for first) or the end of the file. This is the rationale behind having
+an index footer. It allows us to simply append a hash of a tar file to
+the end of the tar file itself and it'll work just fine without
+breaking the format of the tar file. (Tar files end with a blank
+block, so additional data is not read by tar.) Compared to prepending,
+appending the hashfile costs an extra 2 seeks and 1 read (when opening
+from scratch) to fetch a file.
+
+If the hash file was originally stored as a separate file from the
+archive but is now being merged then zero the first byte of the
+archive filename and either prepend or append as desired. If you
+prepend the hash file then note that all the absolute offsets in the
+Item structures will now be incorrect. A correction factor equal to the
+size of the HashFile itself may be applied, and this is the purpose
+of the offset field in the header.
+
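As an illustration of the layout described in the file above, the fixed-size header and the archive name section could be decoded roughly as follows. This is a minimal sketch, not io_lib's own code: the names HEADER_FMT, HashHeader, parse_header, parse_archive_name and absolute_position are hypothetical, and big-endian byte order for every multi-byte field is an assumption (the format text states it only for the bucket count).

```python
import struct
from collections import namedtuple

# Field order follows the "Header:" table. Big-endian for all multi-byte
# fields is an ASSUMPTION; the text only states it for the bucket count.
HEADER_FMT = ">4s4sBBBBIQI"
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 4+4+1+1+1+1+4+8+4 bytes

HashHeader = namedtuple(
    "HashHeader",
    "magic version hash_func nheaders nfooters reserved nbuckets offset size",
)


def parse_header(buf):
    """Decode the fixed-size header found at the start of buf."""
    hdr = HashHeader._make(struct.unpack_from(HEADER_FMT, buf, 0))
    if hdr.magic != b".hsh":
        raise ValueError("bad magic number, not a hash file")
    return hdr


def parse_archive_name(buf, pos=HEADER_SIZE):
    """Return (archive name or None, position of the next section)."""
    length = buf[pos]
    name = buf[pos + 1 : pos + 1 + length]
    # A zero length, or a zeroed first byte, means the index lives
    # inside the archive itself, so there is no external name.
    if length == 0 or name[0] == 0:
        name = None
    return name, pos + 1 + length


def absolute_position(hdr, item_pos):
    """Apply the header's correction offset to an item position."""
    return hdr.offset + item_pos
```

With a header whose offset field is 100, absolute_position(hdr, 5) gives 105, which is the correction that the final paragraph of the format description calls for when the index is prepended to the archive.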