diff srf2fastq/io_lib-1.12.2/CHANGES @ 0:d901c9f41a6a default tip
Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
author | dawe |
---|---|
date | Tue, 07 Jun 2011 17:48:05 -0400 |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/srf2fastq/io_lib-1.12.2/CHANGES Tue Jun 07 17:48:05 2011 -0400
@@ -0,0 +1,1342 @@
+Version 1.12.2 (15th Jan 2010)
+--------------
+
+* Extra options in srf2fastq: -S to output split regions sequentially
+  to stdout. -r to request a region to be reverse complemented before
+  output.
+
+* API addition
+  - Added pooled_alloc.h. This is a general purpose mechanism of
+    pooling multiple fixed size memory allocations into fewer malloc()
+    library calls.
+  - HashTables now have a HASH_POOL_ITEMS option to use the above
+    pooling system. This reduces memory wasted and speeds them up.
+
+* Bug fix: Fixed ztr_add_text() so that it leaves two nul bytes on the
+  end of TEXT chunks instead of one, as documented in the ZTR
+  specification.
+
+* Bug fix: Fixed buffer overrun in parse region chunks; srf2fastq and
+  srf2fasta.
+
+* Bug fix: API read_sff_read_data() did not skip ahead to the next
+  8-byte boundary.
+
+
+Version 1.12.1 (7th August 2009)
+--------------
+
+* Fixed the endianness detection in io_lib/os.h when used in
+  conjunction with auto-conf. This fix allows for "fat" binaries to be
+  built on MacOS X.
+
+* Fixed io_lib-config program to use -lstaden-read instead of -lread.
+
+
+Version 1.12.0 (29th July 2009)
+--------------
+
+* Renamed the library from libread.so to libstaden-read.so. This was
+  already the case for the Fedora bundled RPM.
+
+* Switched to using libtool to allow building of dynamic libraries.
+  Note that this is tweaked to not use -rpath though. Proper library
+  versioning has been added too.
+
+* Removed deprecated platform specific tools: illumina2srf,
+  srf2illumina.
+
+* Srf_info now reports the compressed size of chunks, sorted by type,
+  in addition to their counts. It also correctly sums to over 2Gb now
+  for base-call counting.
+
+* Various SRF tools have had the maximum sequence length changed from
+  1024 to 10000. This allows for even the most gifting capillary traces.
+
+* API
+  - The Array functions now take size_t instead of int for the
+    array dimensions. (API CHANGE)
+
+  - Removed the (unused?) pipe2 function from compress.h. This was
+    intended to be internal only, and it now clashes with a new linux
+    kernel function. (API CHANGE)
+
+  - Added iterators to the HashTable* api.
+
+* Bug fixes
+
+  - Fixed a memory allocation bug in the codes2codeset() function.
+
+  - ztr2read() should now work better on ZTR structs with no BPOS
+    chunk.
+
+  - Fixed various srf tools when facing an SRF file containing zero
+    chunks in the data block header.
+
+  - index_tar handles some GNU tar extensions better (LongLink).
+
+
+Version 1.11.6.1 (9th December 2008)
+----------------
+
+* Identical except removal of a debugging printf statement in solexa2srf.
+
+
+Version 1.11.6 (9th December 2008)
+--------------
+
+* illumina2srf, srf2illumina, srf2fastq
+  - We no longer change from log-odds to phred when storing data in
+    SRF, instead preferring to just mark it in correct input
+    scale. srf2fastq now honours this scale information and so the
+    conversion from log-odds to phred is done at the export stage
+    instead. (Chris Saunders)
+
+  - Bug fix to srf2illumina qcal conversion. Combined with above
+    changes the qcal output should now be 100% identical to the
+    original data input via illumina2srf.
+
+* API
+  - New function srf_next_ztr_flags. This is like srf_next_ztr but
+    also returns the SRF flags value (good/bad read, etc).
+
+* srf_filter, srf2fastq, srf_info (Steven Leonard)
+  - Improved support for multiple index blocks in SRF files, eg from
+    manually concatenated files.
+
+  - srf2fastq now sports options for splitting the output into
+    multiple fastq files when the input data is a paired-end run.
+
+
+Version 1.11.5 (3rd December 2008)
+--------------
+
+* Illumina2srf
+  - Fixed major bug with using *both* -qf and -qr together. The
+    quality values for the reverse strand were shifted by one
+    character.
+
+  - Fixed qcal quality values so they're not shifted down by 64
+    (illumina format fastq).
+
+  - Fixed bugs in parsing directory names if not matching the expected
+    format.
+
+* Removed major memory leaks from srf_filter.
+
+* hash_sff now has support for outputting the table of contents to a
+  new file rather than appending to an existing sff file or copying
+  the entire contents to a new file.
+
+* Various man pages have been added. The list is still incomplete
+  though. Additions are most welcome.
+
+* New program: srf_list. This lists and/or counts the number of
+  sequences within an SRF file.
+
+
+Version 1.11.4 (11th September 2008)
+--------------
+
+* New "make check" build target to perform some automated testing.
+  Currently limited to testing the SRF tools.
+
+* Fixed machine endianness issues. Specifically this resolves known Intel
+  MacOS-X problems.
+
+* New SRF tools
+
+  - srf_info: reports simple metrics on the contents of an SRF file.
+
+  - srf_filter: slices and dices the SRF file to produce a new one
+    with various types of data removed.
+
+* illumina2srf
+
+  - Minor float/int rounding change when storing int/nse/sig2 data.
+
+  - Improved error detection such that it returns a failure code more
+    often given a parsing issue.
+
+  - Added -pf/pr parameters for storing Phasing files.
+
+  - Reduced memory usage, especially on large numbers of clusters per
+    tile. We may now produce multiple DBH blocks per tile. Also major
+    reduction to memory when handling the .params files.
+
+  - Added storage of 2nd .params file (firecrest).
+
+  - Fixed bug in the automatic base-call version identification.
+
+  - Fixed a bug with using -qf/qr when not providing all tiles (ie not
+    starting from tile number 1).
+
+  - Bug fix with storing the reverse matrix file in paired-end runs; a
+    duplicate of the forward one was being used instead.
+
+* General SRF
+
+  - Improved error checking in srf_index_hash.
+    It now spots duplicate reads and also has a -c option to check an
+    existing SRF file without writing the index.
+
+  - Fixed a memory leak in srf_next_ztr(), triggered in srf2fastq -C.
+
+Version 1.11.3 (9th July 2008)
+--------------
+
+* illumina2srf change:
+
+  - IMPORTANT bug fix to illumina2srf when using the "-r" flag to
+    store raw (.int and .nse) data. This could often result in
+    corrupting the data ZTR meta-data for the SMP4 chunks resulting in
+    confusion over which trace channels are raw and which are
+    processed.
+
+    Fortunately the corruption is reversible. For more details and a
+    fix see the ssrformat announcement of the issue:
+
+    http://www.bcgsc.ca/pipermail/ssrformat/2008-July/000531.html
+
+* General SRF changes:
+
+  - Removed a memory leak in ztr_find_chunks().
+
+  - Added SRFB_NULL_INDEX as an SRF block type. This provides a more
+    transparent way to skip over the 8 zero value bytes that may exist
+    at the end of an SRF file missing an index block.
+
+* Other changes
+
+  - Fixed a bug in extract_seq when operating on multiple files and
+    outputting to a file rather than a pipe. An erroneous seek in the
+    mFILE code led to it repeatedly truncating the output, resulting
+    in one sequence file at the end instead of multiple files.
+
+
+Version 1.11.2 (4th June 2008)
+--------------
+
+* solexa2srf/srf2solexa changes:
+
+  - Renamed to illumina2srf/srf2illumina.
+
+  - Incorporated support for the IPAR format (Come Raczy, Illumina).
+
+  - Added support for qcal format data (Come Raczy).
+
+  - Added -C option to tag data as failing the chastity filter, but it
+    is still included in the SRF output (Camil Toma).
+
+  - Many more additional features added to srf_dump_all provided by
+    Camil Toma. It somewhat overlaps srf2solexa now, but may still
+    have its own use.
+
+  - Ztr TEXT chunks now output in srf2solexa.
+
+  - Improved ways to specify matrices (-mf/-mr) in solexa2srf.
+
+  - solexa2srf is substantially faster when reading gzipped files.
+
+  - The -N/-n naming scheme options for solexa2srf now default to the
+    same conventions used by GERALD. Added additional %d, %m and %r
+    format rules too.
+
+  - Calibrated confidence values are now output if -qf or -qr
+    parameters are used, in addition to uncalibrated ones. These are
+    stored in phred scale in a CNF1 ZTR chunk.
+
+* srf2fastq now has a -c option to output calibrated confidence values
+  (if present). It also supports multiple archives on the command line.
+
+* SRF fixes:
+
+  - Better handling of full pathnames in solexa2srf.
+
+  - Use binary IO mode; fixes bugs on Windows.
+
+  - Fixed an error where some chunks were not compressed properly
+    (valid still, just not compressed).
+
+  - Removed memory corruption in solexa2srf (in rare cases).
+
+  - Fixed bug with binary formatted read_id suffixes (fixed by
+    Cristian Goina).
+
+  - Initialised memory in hash table code (used in indexing amongst
+    other things).
+
+  - Indexes very occasionally failed to find a trace that did in fact
+    exist.
+
+  - Removed memory leak in construct_trace_name (patch from John
+    Emhoff, Helicos).
+
+  - Fixed reading of XML block in srf_read_xml(). From John Emhoff.
+
+
+* Added SRF= format string to TRACE_PATH to facilitate on-the-fly
+  extraction from indexed SRF files. This means io_lib can now
+  transparently pull traces from an archive or treat it as if it was a
+  directory - eg "foo.srf/IL15_..._123:456".
+
+* Bug fix (SF-1898427) - now builds on Fedora.
+
+* Better handling of 64-bit file size sensing in autoconf.
+
+
+Version 1.11.1 (not officially released - internal testing only)
+--------------
+
+Version 1.11.0 (20th February 2008)
+--------------
+
+First official release of v1.11.0 and SRF support.
+
+* Further speed improvements to solexa2srf.
+
+* Added extract_qual program (analogous to extract_seq).
+
+* Added new srf2fasta program and also sped up srf2fastq by 25%.
+
+* Solexa2srf now supports storing the raw .int/.nse trace data instead
+  of or in addition to the processed .sig2 data.
+
+* Solexa2srf now stores enough to reproduce sufficient firecrest
+  output to rerun the solexa basecaller. Specifically that's a couple
+  of matrix files and 'region' data for paired end runs.
+
+* Minor changes / bug fixes:
+
+  - extract_seq no longer attempts to gzip the output by default if
+    the input was gzipped.
+
+  - ztr2read conversion (eg visible in trace_dump) now correctly
+    handles ZTR files with multiple SMP4 chunks.
+
+  - Fixed memory leaks in various bits of SRF code (srf_extract_linear
+    mainly and srf_index_hash).
+
+
+Version 1.11.0b8 (25th January 2008)
+----------------
+
+(Hopefully final beta test of SRF code before official 1.11.0 release.)
+
+* Fixed bugs in the index format. We incorrectly handled null dbhFile
+  and containerFile elements and incorrectly computed the index size.
+
+* Improvements for solexa2srf code.
+  - Can store raw vs processed data.
+  - Stores matrix and .params contents.
+  - Optional chastity filtering.
+  - Input data may now be gzipped.
+
+* Minor fixes to output of trace_dump and ztr_dump.
+
+* Minor srf_index_hash bug fixes (when dealing with concatenated
+  indexed files).
+
+
+Version 1.11.0b7 (11th January 2008)
+----------------
+
+* IMPORTANT bug fix to the SRF format. The Data Block Header had the
+  blocksize field 4 bytes too large. Now fixed. Old SRF files will not
+  be readable by this new code (as they were in error).
+
+
+Version 1.11.0b6 (2nd January 2008)
+----------------
+
+* Changes to adhere to SRF v1.3:
+
+* Removal of the readID counter.
+
+* Added support for printf style name formatting.
+
+* Minor index format tweaks (64-bit data, dch/container filenames).
+  Index format is therefore now 1.01.
+
+
+Version 1.11.0b5 (8th November 2007)
+----------------
+
+* Major reorganisation of directories. All library code is in subdir
+  "io_lib".
+  The code now uses "io_lib/xxx.h" in all include statements
+  too.
+
+* Fixed memory leaks in ZTR code
+
+* Various SRF bug fixes and better support for sample OFFS metadata in
+  both ZTR/ZTR.
+
+* Added srf_extract_hash program to perform random-access on a hash
+  indexed SRF archive.
+
+
+Version 1.11.0b4 (26th October 2007)
+----------------
+
+* The SRF format now supported adheres to version 1.2.
+
+* More speedups, in particular focusing on uncompression this time, so
+  srf2solexa is an order of magnitude faster.
+
+* ztr2read() now honours the read_sections() options and so is much
+  faster when only decoding (say) base and quality values.
+
+* New program srf2fastq.
+
+* Internal changes to various ztr data structures. If you use these
+  yourself take note of the new ztr_owns fields to avoid memory leaks.
+
+
+Version 1.11.0b3 (16th October 2007)
+----------------
+
+* Major speed improvements for compression. solexa2srf is now 30-35x faster.
+
+* Fixed various buffer overruns and memory leaks reported by valgrind
+  in the new deflate interlaced and SRF code.
+
+
+Version 1.11.0b2 (2nd October 2007)
+----------------
+
+* Minor version change to fix typos in Makefile system.
+
+
+Version 1.11.0b1 (28th September 2007)
+----------------
+
+Beta release 1.
+
+* Added preliminary SRF support. This consists of a new subdirectory
+  'srf' (yes these all really need merging into a single directory,
+  but that's a later task), a substantial update to ZTR and a variety
+  of SRF tools in progs.
+
+  The old huffman_static.[ch] files were renamed and substantially
+  worked upon to create deflate_interlaced.[ch].
+
+  Added new compression types: xrle2, tshift and qshift. The latter two
+  of these are very specific to trace and quality packings. May need to
+  rename to be more generic.
+
+
+Version 1.10.3 (???)
+--------------
+
+* The HashTable interface now also allows for Bob Jenkins' lookup3
+  64-bit hash function. This allows for substantially larger hash
+  tables.
+
+* Replaced tempnam() with tmpfile(). On systems without tmpfile
+  (Windows) this is simply a wrapper to use the old tempnam calls.
+
+* hash_extract bug fix for windows: now operates in binary mode.
+
+* INCOMPATIBLE CHANGE: On windows we now use semi-colon as the path
+  separator. The reason is that the MinGW getenv() seems to do
+  "clever things" with PATH variables and consequently ends up
+  corrupting our clumsy attempt of escaping colons in paths.
+
+* Fasta format is semi-supported in "plain" format. It returns the
+  first entry when reading.
+
+* Experimental support for static huffman (STHUFF) compression type.
+
+
+Version 1.10.2 (30th May 2007)
+--------------
+
+Primarily this is a bug fix release.
+
+* Convert_trace now has -signed and -noneg options to control signed
+  vs unsigned issues when shifting trace data about.
+
+* Include files now have C++ extern "C" style guards around them.
+
+* Various programs now accept -ztr command line arguments to force ZTR
+  format reading. This is for consistency's sake only and it is
+  recommended that users simply let the programs automatically detect
+  the file formats.
+
+* Hash_exp now outputs to the same file containing the experiment
+  files (in appended hash-table mode). It also has better Windows
+  handling (stripping ^M and using binary mode).
+
+* hash_extract bug fix: now only needs at least 1 filename specified
+  when fofn mode is not in use.
+
+* mFILE emulation: bug fixes when dealing with ftruncate, append mode,
+  checking for read/write flags, new mfcreate_from() function.
+
+* ZTR: added an experimental ZTR_FORM_STHUFF compression scheme. This
+  uses static huffman encoding on a predefined hard-coded set of
+  huffman tables. The purpose (as yet not put into action) is to allow
+  efficient compression of very small data sets for Illumina, AB
+  SOLiD, etc style traces.
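The 1.10.2 entry on C++ extern "C" style guards can be illustrated with a
minimal sketch. This is not io_lib code: the header name, the include guard
macro and the function example_base_count() are hypothetical and exist only
to show the general pattern, in which the guard is visible to C++ compilers
only and gives the enclosed declarations C linkage.

```c
#include <assert.h>
#include <string.h>

/* ---- What a guarded header might look like (hypothetical
 *      "example_trace.h"; none of these names come from io_lib). ---- */
#ifndef EXAMPLE_TRACE_H
#define EXAMPLE_TRACE_H

#ifdef __cplusplus
extern "C" {    /* C++ only: declare the functions with C linkage */
#endif

/* Returns the number of bases in a NUL-terminated sequence string,
 * or 0 if seq is NULL. */
int example_base_count(const char *seq);

#ifdef __cplusplus
}               /* closes the extern "C" block */
#endif

#endif /* EXAMPLE_TRACE_H */

/* ---- Implementation, normally kept in a separate .c file. ---- */
int example_base_count(const char *seq)
{
    return seq ? (int)strlen(seq) : 0;
}

/* Optional driver, compiled only when EXAMPLE_STANDALONE is defined. */
#ifdef EXAMPLE_STANDALONE
int main(void)
{
    assert(example_base_count("ACGT") == 4);
    return 0;
}
#endif
```

A C compiler never sees the extern "C" lines because __cplusplus is not
defined, so the same header serves both languages. Without the guard, a C++
caller would look for a name-mangled symbol and fail to link against a
library built as C.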
+
+
+Version 1.10.1 (20th June 2006)
+--------------
+
+* Trace files are now opened in read-only mode by default
+  (open_trace_file func).
+
+
+Version 1.10.0 (15th June 2006)
+--------------
+
+* Two new environment variables are used, EXP_PATH and TRACE_PATH, to
+  replace RAWDATA. EXP_PATH is used when the new open_exp_mfile()
+  function is called and TRACE_PATH is used when open_trace_mfile() is
+  called. Both default to using RAWDATA when EXP or TRACE env is not
+  found. Also defined a trace type TT_ANYTR which is analogous to the
+  existing TT_ANY except it will not look for experiment or plain
+  format files.
+
+  Modified the various example programs to use the appropriate open
+  call. This allows for traces and experiment files to have identical
+  names, such as is usually the case when querying named trace objects
+  from a trace server.
+
+* New program: extract_fastq to generate FASTQ output format.
+
+* New program: hash_exp. This allows multiple experiment files to be
+  concatenated together and then indexed so io_lib can still treat
+  them as single files.
+
+* The URL based search path mechanism now by default uses libcurl
+  instead of wget. This makes it considerably faster.
+
+* If an element in RAWDATA, EXP_PATH or TRACE_PATH now starts with the
+  pipe symbol ("|") then the compressed file extension code is negated
+  for that search element. (This prevents looking for foo.gz, foo.Z,
+  foo.bz2, etc if it fails to find foo.)
+
+* Added HashTableDel() and HashTableRemove() functions to take items
+  out of a hash table.
+
+* ZTR's compress_chunk() and uncompress_chunk() functions are now
+  externally callable.
+
+* New program io_lib-config. This has --version, --cflags and --libs
+  options to query the appropriate configuration when compiling and
+  linking against io_lib. There's also a new io_lib.m4 file which
+  provides an AC_CHECK_IO_LIB autoconf macro to use io_lib-config and
+  generate appropriate Makefile substitutions.
+
+* Updated the autoconf code to support libcurl searching.
+
+* Renamed SCF's delta_samples[12] functions to be
+  scf_delta_samples[12]. (From Saul Kravitz)
+
+* Added a '-error filename' option to convert_trace. (From Saul Kravitz)
+
+* Bug fix: HashTableAdd() now works properly with non-string keys.
+
+* Bug fix to read_dup().
+
+* Bug fix to xrle which could read past the array bounds. It also now
+  handles run-lengths of 256 or more.
+
+* Bug fix: the fwrite_* functions no longer close the FILE pointer
+  given to them.
+
+* Bug fix to fdetermine_trace_type(); it now rewinds the file back.
+
+* Bug fix to mfseek and mrewind; they both now clear the EOF flag.
+
+* Bug fix to find_file_dir().
+
+
+Version 1.9.2 (14th December 2005)
+-------------
+
+* Added AC_CHECK_LIB calls for the nsl and socket libraries
+  (gethostbyname / socket functions). Needed for Solaris compilations.
+
+* In extract_seq, used open_trace_mfile instead of
+  open_trace_file. Functionally this is the same, but it is faster.
+
+* fwrite_reading() now frees the temporary mFILE it created.
+
+* mfreopen_compressed() no longer closes the original FILE
+  pointer. This brings it back into line with the original
+  functionality provided in 1.8.x. It also cures a bug where the old
+  file pointer was often left open, meaning operations on many files
+  could cause a resource leak ending in the inability to open
+  more trace files.
+
+* Added private_data and private_size to the Read struct. Populate
+  these when reading SCF files.
+
+* Hash_extract now returns an error code to the calling process upon
+  failure.
+
+* Major overhaul of hash_sff. It no longer loads the entire file into
+  memory. It can now cope with adding a hash index to an archive that
+  already contains an index.
+
+* Added support for 454's "sorted index" code. NB this is based on the
+  extraction code from their getsff.c code and has not been tested
+  with a genuine indexed SFF file yet.
+
+* Fixed an uninitialised memory access in mfload().
+
+* Fixed a bug where hash query searches for items that do not exist
+  and map to an empty bucket could cause hangs or crashes.
+
+* Fixed a hang in mfload() when reading a zero length file.
+
+
+Version 1.9.1
+-------------
+
+* Implemented the SFF (454) file structure, currently as read-only.
+  This is supported both as an archive containing multiple files and
+  also as a single SFF entry.
+
+* Allow for SFF=? components in RAWDATA search path.
+
+* Tar files, SFF archives and hashed archives (eg hashed tar, sff, or
+  "solid" archives) may now be used as part of a pathname. Eg if a
+  tar file foo.tar contains entry xyzzy.ztr then we can ask to fetch
+  trace foo.tar/xyzzy.ztr instead of requiring setting of the
+  RAWDATA environment variable.
+
+* Changed the HashFile format slightly. It's now format 1.00.
+
+  The key difference is that it has a file footer pointing back to the
+  hashfile header (so the hashfile can be appended to an archive) and
+  it also has an offset in the header to apply to all seeks within the
+  archive itself, so it can be prepended to an archive that's already
+  been indexed without breaking the offsets.
+
+  Extended the hash_tar program to allow control over these header options.
+
+* Fixed divide-by-zero bug when calling mfread for zero
+
+* Removed the warning for unknown ZTR chunk types. It now just
+  silently stores them in memory.
+
+* mfopen now honours binary versus ascii differences (and so updated
+  Read.c calls accordingly) so that Windows works better.
+
+* Removed file descriptor 'leak' in write_reading().
+
+* Unset compression_used when opening uncompressed files instead of
+  leaving as the last value.
+
+* Fixed a file descriptor (and some memory) leak in
+  freopen_compressed. (Bug ID #1289095)
+
+* Fixed the hash file saving and loading so that it works on all
+  platforms instead of just x86 linux. There were bugs in assuming the
+  size of structures.
+  The assumptions are still there in that I assume
+  they pad the same internally (for ease of coding - we can change it
+  when we finally see a system which operates differently), but the
+  final "boundary" padding has been resolved.
+
+
+Version 1.9.0
+-------------
+
+* ***INCOMPATIBILITIES*** to 1.8.12
+
+  - The Exp_info structure now internally contains an "mFILE *" member
+    instead of "FILE *" member. If you use the experiment file functions
+    for I/O then hopefully it'll still work. However if you directly
+    manipulated the Exp_info yourself using fprintf etc then you will
+    need to modify your code.
+
+  - Some functions no longer have external scope. Most of these did not
+    previously have external function prototypes. If you have a burning
+    need to use one of these, please contact me directly via sourceforge.
+    The full list is:
+
+    ctfType (global variable)   ztr_encode_samples_C
+    replace_nl                  ztr_encode_samples_G
+    ctfDecorrelate              ztr_encode_samples_T
+    exp_print_line_             ztr_decode_samples
+    find_file_tar               ztr_encode_bases
+    find_file_archive           ztr_decode_bases
+    find_file_url               ztr_encode_positions
+    ztr_write_header            ztr_decode_positions
+    ztr_write_chunk             ztr_encode_confidence_1
+    ztr_read_header             ztr_decode_confidence_1
+    ztr_read_chunk_hdr          ztr_encode_confidence_4
+    compress_chunk              ztr_decode_confidence_4
+    uncompress_chunk            ztr_encode_text
+    ztr_encode_samples_4        ztr_decode_text
+    ztr_decode_samples_4        ztr_encode_clips
+    ztr_encode_samples_common   ztr_decode_clips
+    ztr_encode_samples_A
+
+  - Some external functions have changed prototypes to use mFILE instead
+    of FILE. Most cases of these I've put in place a wrapper function
+    with the old name, but not yet all.
+    Functions changed are:
+
+    ctfFRead                    write_scf_samples32
+    ctfFWrite                   write_scf_base
+    exp_print_line              write_scf_bases
+    exp_print_mline             write_scf_bases3
+    exp_print_seq               write_scf_comment
+    read_scf_header             fcompress_file
+    read_scf_sample1            fopen_compressed
+    read_scf_samples1           freopen_compressed
+    read_scf_samples31          be_write_int_1
+    read_scf_sample2            be_write_int_2
+    read_scf_samples2           be_write_int_4
+    read_scf_samples32          be_read_int_1
+    read_scf_base               be_read_int_2
+    read_scf_bases              be_read_int_4
+    read_scf_bases3             le_write_int_1
+    read_scf_comment            le_write_int_2
+    write_scf_header            le_write_int_4
+    write_scf_sample1           le_read_int_1
+    write_scf_samples1          le_read_int_2
+    write_scf_samples31         le_read_int_4
+    write_scf_samples2          fdetermine_trace_type
+
+  - Removed support for the OLD unix "pack" program as a valid trace
+    compression algorithm.
+
+  - Removed CORBA support. (It wasn't enabled and I've no idea if it
+    even worked as I cannot test it.)
+
+  - The default search order for RAWDATA now has the current working
+    directory at the end of RAWDATA instead of the start.
+
+* Significant speed ups, particularly when dealing with reading
+  gzipped files or when extracting data from tar files.
+
+* New external functions for faster access via mFILE (memory-file)
+  structs. These mimic the fread/fwrite calls, but with mfread/mfwrite
+  etc.
+
+* Numerous minor tweaks and updates to fix compiler warnings on
+  stricter modes of the Intel C Compiler.
+
+* Preliminary support for storing pyrosequencing style traces. This
+  has been modeled on the flowgram data from 454, but should be
+  applicable to other platforms. ZTR has been updated to incorporate
+  this too.
+
+  The Read structure now has flow, flow_order, nflows and flow_raw
+  elements too. Code to convert these into the more usual traceA/C/G/T
+  arrays exists currently as part of Trev (in tk_utils in the Staden
+  Package), but this may move into io_lib for the next official release.
+
+* New hash_tar and hash_extract programs. These replace the index_tar
+  program for fast random access. For RAWDATA include "HASH=hashfile"
+  as an element to get io_lib to use the archive hash. It's possible
+  to create hash files of most archive formats as the hash itself
+  contains the offset and size of each item in the archive. This means
+  that extracting an item does not need to know the format of the
+  original archive.
+
+  Some benchmarks show that on ext3 it's actually faster to extract
+  files from the hash than directly via the directory. This was
+  tested with ~200,000 files, whereupon directory lookups become
+  slow. I'd imagine ReiserFS or similar to be faster.
+
+* Added an XRLE encoding for ZTR. This is similar to the existing RLE
+  mechanism but it copes with run length encoding of items larger than
+  a single byte. Its current use is for storing the 4-base repeating
+  flow order in 454 data.
+
+
+Version 1.8.12
+--------------
+
+* The ABI format code now reads the confidence values from KB (via
+  PCON field).
+
+* New program: trace_dump. Like scf_dump, but deals with generic input
+  formats.
+
+* Slightly more sensible average spacing calculation in the ABI
+  reading code. It's still not perfect, but is only used when the real
+  spacing value is negative or zero.
+
+* Disabled the base-reordering fix for ABI files. We believe the bug
+  causing this no longer exists.
+
+* Experiment file format: added FT (EMBL feature table) and LF
+  (LiGation; a combination of LI and LE) records.
+
+* Experiment files: strip out digits from the sequence we read
+  (for better support of EMBL files).
+
+* Experiment files: fixed a potential buffer overrun in the conversion
+  of binary confidence values to ascii values.
+
+* Minor improvements to portability (INT_MAX vs MAXINT2) and removal
+  of some compilation warnings.
+
+* Extract_seq now accepts a -fofn argument.
+
+* New functions: read_update_base_positions() and
+  read_update_confidence_values() to replace read_update_opos().
+  These apply an edit buffer to the sequence details and are used (for
+  example) within Trev for saving edits back to a trace file.
+
+* Better error handling in fcompress_file().
+
+* New specifiers in RAWDATA. Added a generic URL format (eg
+  "URL=http://some/where/trace=%s") implemented via use of wget. There
+  is also an ARC= format to make use of the Sanger Trace Archive,
+  although currently this will not work externally.
+
+* Zero memory used in read_alloc(). Fixes to read_dup().
+
+
+Version 1.8.11
+--------------
+
+* Rewrote the background subtraction in convert_trace to deal with each
+  channel independently.
+
+* Make install now installs the include files (all of them, although not all
+  are strictly required) in $prefix/include/io_lib/.
+
+* Moved the ABI filter wheel order (FWO) reading from outside the sample
+  reading code into the general reading bit as this is needed for reading the
+  comments too (it also applies to the order of the signal strengths). Hence
+  when the READ_COMMENTS section only is defined it now works correctly.
+
+* Moved the DataCount #defines into static values and added an
+  abi_set_data_counts function to change these. This allows reading of the raw
+  data from ABI files. This is used within the new convert_trace -abi_data
+  option.
+
+* Removed a one-byte write buffer overflow in the CTF writing code.
+
+* New Experiment file records WL and WR for indicating clip points within a WT
+  trace.
+
+* Removed the saved copy of fp for exp_fread_info in 'e' structure as it
+  doesn't belong to us. (If we do store it there then the exp_destroy_info
+  function will free it and this causes bugs.) POTENTIAL INCOMPATIBILITY:
+  if you assumed that exp_destroy_info closed the files that you opened and
+  passed into exp_fread_info, then this is no longer true.
+
+* New function read_dup() to copy a Read structure.
+
+* get_read_conf() now deals with loading confidence values from any suitable
+  format and not just SCF.
+
+* Fixed memory leak in ztr (ztr->text_segments).
+
+
+Version 1.8.10
+--------------
+
+* Added Steven Leonard's changes to index_tar. It no longer adds index entries
+  for directories, unless -d is specified. It also now supports longer names
+  using the @LongLink tar extension.
+
+* Fixed a bug in exp2read where the base positions were random if experiment
+  files are loaded without referencing a trace and without having ON lines.
+
+* New program get_comment. This queries and extracts text fields held within
+  the Read 'info' section.
+
+* Overhaul of convert_trace to support the makeSCF options (normalise etc).
+
+
+Version 1.8.9
+-------------
+
+Sorry this isn't a proper changes-by-source listing. Any suggestions for how I
+collate the 'cvs log' output into something more concise? The below text is
+simply a list of changes, but more complete than in the NEWS file.
+
+* ZTR spec updated to v1.2. The chebyshev predictor has been rewritten in
+  integer format. The old chebyshev still has a format type allocated to it
+  (73), but the new ICHEB format (74) is now the default. The old floating
+  point method was potentially unstable (eg when running on non IEEE fp
+  systems). The new method also seems to save a bit more space.
+
+* The docs and code disagreed for CNF4 storage. Changed the docs to reflect
+  the code (which does as intended).
+
+* ZTR speed increase. Follow1 is substantially faster, improving write
+  times by about 10%.
+
+* New named format types: ZTR1, ZTR2 and ZTR3. ZTR defaults to ZTR2, but we
+  can explicitly ask for another compression level if desired. Also explicit
+  statement of format (TT_ZTR instead of TT_ANY) removes the need for
+  a rewind() call and so ZTR can now work through a pipe.
+
+* General tidy up to remove a few compilation warnings (missing include files,
+  signed vs unsigned issues, etc).
+
+* Initial support is included for BioLIMS integration, but this is not
+  complete. (Unfortunately it requires access to a non-public library.)
+
+* New function compress_str2int - opposite of existing compress_int2str.
+
+* (Steven Leonard). Uses zlib for gzip compression and decompression.
+
+
+
+
+
+These are extracts from the full Staden Package change log. They may not be
+immediately obvious when taken out of context, but we feel this information
+may still be useful to the users of io_lib.
+
+23rd August 2000, James
+-----------------------
+1. Removed find_trace_file and added an open_trace_file function.
+The idea is that searching for a file's existence is better done by attempting
+to open it. This in turn allows for more possibilities of file searching.
+        Makefile
+        utils/open_trace_file.c
+        read/Read.c
+        read/scf_extras.c
+        read/translate.[ch]
+        progs/extract_seq.c
+
+2. Added a TAR option to RAWDATA. We can now read trace files directly from
+tar files (although they cannot be written to directly).
+        utils/open_trace_file.c
+        utils/tar_format.h
+
+3. Created an index_tar program to optimise tar reading, although it is not
+mandatory.
+        progs/index_tar.c
+        progs/Makefile
+
+4. Fixed a bug when dealing with plain text files containing spaces.
+        plain/seqIOPlain.c
+
+
+31st July 2000, James
+---------------------
+1. Renamed TTFF to be ZTR.
+        read/Read.[ch]
+        utils/traceType.c
+        utils/compress.c
+        ttff/* -> ztr/*
+        README
+
+2. ZTR reading will now stop when it spots a ZTR magic number. This allows
+concatenation of ZTR files.
+        ztr/ztr.[ch]
+
+
+15th June 2000, James
+---------------------
+1. Added a TTFF_FOLLOW filter type to TTFF. This is enabled with compression
+level 2 for the chromatogram data.
+        io_lib/ttff/ttff.[ch]
+        io_lib/ttff/compression.[ch]
+
+9th June 2000, James
+--------------------
+* RELEASED 1.8.4 */
+
+1. Added zlib bits to windows compilation.
+        io_lib/mk/windows.mk
+
+2. Updated convert_trace.
It can now reduce sample-size to 8-bit (with the
+"-8" option) and the formats may now be specified as either integer or text
+format. The text format is case insensitive.
+    io_lib/progs/convert_trace.c
+    io_lib/utils/traceType.c
+
+3. More Windows binary vs ASCII fixes. When reading we switch to binary mode
+before attempting fdetermine_trace_type, otherwise it fails to auto-detect
+TTFF (which includes a newline as part of the magic number). Also added a
+_setmode() call to the fwrite_reading code.
+    io_lib/read/Read.c
+
+4. Changed the default compression technique of TTFF to that used in 1.8.2. I
+accidentally left it set to the experimental dynamic-delta method in 1.8.3,
+which currently doesn't have the uncompression function! Also removed lots of
+debugging output.
+    io_lib/ttff/ttff.c
+    io_lib/ttff/ttff_translate.c
+
+5. Bug fix to exp2read - when no right hand quality cutoff is specified we
+were defaulting to the left end of the trace, instead of the right end. (This
+only happens when opening experiment files which do not have clip points.)
+    io_lib/read/translate.c
+
+6. Changed the strftime() format in ABI reading code to use %H:%M:%S instead
+of %T, as %T doesn't appear to be part of ANSI (I think it's probably
+XPG4-UNIX). It worked on Unix machines, but not on MS Windows.
+    io_lib/abi/seqIOABI.c
+
+
+8th June 2000, James
+--------------------
+* RELEASED 1.8.3 */
+
+1. Updated the CTF support so that it includes a couple of new block
+types. This allows for base positions being non-sequentially ordered, as is
+possible in severe compressions.
+    io_lib/ctf/ctfCompress.c
+
+2. Overhaul of TTFF format - now more PNG based in style. Still highly
+experimental.
+    io_lib/ttff/*
+
+
+16th May 2000, James
+--------------------
+* RELEASED 1.8.0 */
+
+1. Added szip support. Szip generally gives better compression ratios than
+gzip and often marginally better than bzip2, but is generally considerably
+slower at decompression.
+    io_lib/utils/compress.[ch]
+
+2. Merged in Jean Thierre-Mieg's CTF code. This is a compressed trace format
+which holds the same data as SCF, but in reduced space.
+    io_lib/read/Read.[ch]
+    io_lib/utils/traceType.c
+    io_lib/ctf/*
+
+3. Added my own highly experimental TTFF format. (Thanks to Jean Thierre-Mieg
+for re-awakening my interest in this.) TTFF files are typically equivalent in
+size to bzip2'ed SCF files, but are much quicker to write than any of the
+currently supported compressed formats. Depends on zlib.
+    io_lib/read/Read.[ch]
+    io_lib/utils/traceType.c
+    io_lib/ttff/*
+
+4. Reorganised the Makefiles for easier building.
+    */Makefile
+
+5. New program "convert_trace". Primarily a test tool at present as it needs
+a friendlier interface.
+    progs/convert_trace.c
+
+
+20th April 2000, James
+----------------------
+1. Removed a file-descriptor leak in extract_seq.
+    io_lib/progs/extract_seq.c
+
+22nd March 2000, James
+----------------------
+1. Fixed bug in time formatting from ABI files. We used strftime code
+%a without setting tm.tm_wday (number of days since Sunday). It's not
+easy to work that out, so we convert from struct tm to time_t, which
+resets any erroneous elements of struct tm. Also fixed a silly error
+where the end time was set to the start time (incorrectly).
+    io_lib/abi/seqIOABI.c
+
+25th February 2000, James
+-------------------------
+2. Added checks for QR <= QL in the exp2read conversion function. This caused
+trev to display incorrectly (blanking incorrect screen portions) when dealing
+with inconsistent experiment files. Also changed qclip so that it doesn't
+create this inconsistent case.
+    io_lib/read/translate.c
+
+1st February 2000, Kathryn
+--------------------------
+1. Fixed bug which caused init_exp to crash when QL was more than 5 digits.
+Increased it to handle 15 digits.
+    io_lib/read/translate.c
+
+27th January 2000, James
+------------------------
+1.
Moved Gap4's copy of scf_extras into io_lib, and renamed io_lib's
+scf_bits to be scf_extras (to avoid editing too many #include statements).
+Without this we were getting errors due to dynamic linking using odd
+copies. Eg loading libread.so and then libgap.so meant that
+find_trace_file called from edUtils2.c (libgap.so) would pick up the first
+copy from libread.so, despite the fact that there's also a copy in the
+same libgap.so.
+    gap4/scf_extras.[ch]
+    io_lib/scf_bits.[ch]
+
+25th January 2000, Kathryn
+--------------------------
+1. Fixed crash in qclip due to insufficient arguments being passed to
+find_trace_file and also fixed an array bounds error in scan_right of qclip.c
+    io_lib/read/scf_bits.c
+
+19th January 2000, James
+------------------------
+4. Copied bits of the fakii and cap2/3 scf/expFile reading code into
+io_lib. Not all of this is in there, just the things which seem to be
+common and sensibly fit there. This also helps qclip to build on Windows.
+FIXME: We should now remove some of this code from Gap4.
+Also fixed a small memory leak in fopen_compressed() - it wasn't freeing
+the result of tempnam().
+    io_lib/read/translate.c
+    io_lib/read/scf_bits.[ch]
+    io_lib/read/seqInfo.[ch]
+    io_lib/utils/files.c
+    io_lib/utils/compress.c
+
+31st August 1999, James
+-----------------------
+1. -fasta_out mode of extract_seq now changes - to N.
+    io_lib/progs/extract_seq.c
+
+27th August 1999, James
+-----------------------
+1. The order of information items added by the abi to scf code has
+changed, to make it more sensible. Also fixed a bug in the textual (rather
+than numerical) date output, and wrote this to the DATE field.
+    io_lib/abi/seqIOABI.c
+
+2. makeSCF no longer adds a MACH field, as this was redundant.
+    io_lib/abi/makeSCF.c
+
+3. Extract_seq now has proper use of CL and CR when using -cosmid_only. It
+was assuming they were the same as QL/QR and SL/SR, which is not the case
+(rather it's like having a CS line of `CL`..`CR`).
Extract_seq also now
+has a -fasta_out format option and can handle multiple files, which makes
+it easier to produce a fasta file from multiple experiment files.
+    io_lib/progs/extract_seq.c
+
+4th August 1999, James
+----------------------
+1. The exp2read() function in io_lib now initialises the confidence arrays
+(eg r->prob_A) to zero, or to the experiment file AV line.
+    io_lib/read/translate.c
+
+2nd June 1999, James
+--------------------
+1. The MegaBACE sequencer creates ABI files. However it does so in an odd way.
+Sometimes the samples arrays are truncated such that bases are positioned
+above samples which are not stored in the ABI file. We now realloc the samples
+array in such cases and fill out the remainder with blank data. This removes a
+crash in trev when viewing such data.
+    io_lib/abi/seqIOABI.c
+
+2. Fixed a memory corruption in io_lib compression. The switch to use tempnam
+(for Windows) implies that the filename returned is no longer allocated by us.
+Unfortunately we forgot to remove the xfree(fname) calls.
+    src/io_lib/utils/compress.c
+
+18th May 1999, James
+--------------------
+1. Fixed the trace rescaling option of makeSCF. We now go through the rescale
+function twice. Once to work out the maximum value, and again to do the
+rescaling. This fixes a bug where the maximum value after rescaling was
+sometimes above 65536 and hence caused "trace wraparound" effects.
+    io_lib/progs/makeSCF.c
+
+26th April 1999, JohnT
+----------------------
+1. Allow : to be entered in RAW_DATA by using ::
+    Misc/find.c
+    io_lib/utils/find.c
+
+2.
Support for fetching trace files using Corba
+    Modified:
+        Misc/find.c
+        mk/misc.mk
+        io_lib/utils/find.c
+        init_exp/init_exp.c
+        io_lib/read/Makefile
+        io_lib/utils/find.c
+        io_lib/utils/compress.c
+        io_lib/utils/Makefile
+        mk/global.mk
+    Added:
+        io_lib/utils/corba.cpp
+        io_lib/utils/stcorba.h
+    Generated from IDL:
+        io_lib/utils/trace.h
+        io_lib/utils/trace.cpp
+        io_lib/utils/basicServer.h
+        io_lib/utils/basicServer.cpp
+
+
+3. Added ABI utility progs to NT port
+    mk/abi.mk
+
+4. Added Windows 95 support
+    io_lib/utils/compress.c
+    mk/WINNT.mk
+
+5th March 1999, JohnT
+---------------------
+Various changes for WINNT support as follows:
+io_lib/utils - Don't redirect to /dev/null on WINNT
+
+3rd February 1999, James
+------------------------
+1. Fixed problems reported by Insure on Windows NT.
+These are mainly lack of prototypes (malloc/memcpy) and not returning properly
+from 'int' functions. However one fix to seqed_translate.c (find_line_start3)
+was an array read overflow.
+    io_lib/progs/makeSCF.c
+
+18th January 1999, James
+------------------------
+1. Changed the read2exp io_lib translation function so that it can accept
+lowercase a,c,g,t. Oddly enough it was already coded to accept lowercase IUB
+codes, but we missed out a,c,g and t!
+    io_lib/read/translate.c
+
+15th January 1999, JohnT
+-----------------------
+Modified files throughout for Windows NT Compatibility as follows:
+
+8. need to explicitly set text or binary file mode under WINNT
+    io_lib/exp_file/expFileIO.c
+
+18. need to include stddef.h for size_t with Visual C++
+    io_lib/utils/array.h
+
+19. need to have target LIBS (not LIB) and correct ordering for correct make
+    on WINNT. Also need additional abstractions to allow for different compile
+    and link calling conventions with Visual C++, and have rules for building
+    Windows .def files.
+    io_lib/abi/Makefile
+    io_lib/alf/Makefile
+    io_lib/exp_file/Makefile
+    io_lib/plain/Makefile
+    io_lib/progs/Makefile
+    io_lib/read/Makefile
+    io_lib/scf/Makefile
+    io_lib/utils/Makefile
+
+18th December 1998, James
+-------------------------
+1. Added bzip2 recognition to the (de)compression code of io_lib. This is now
+the latest bzip, and is recognised by phred (unlike bzip version 1). Bzip2 is
+approx the same as bzip1, but more or less twice as fast for decompression.
+    io_lib/utils/compress.c
+
+27th November 1998, James
+-------------------------
+1. Fixed the trace file searching mechanism in io_lib. When loading an
+experiment file with LN/LT lines, we now first search for the trace file
+relative to the location of the experiment file.
+    io_lib/read/Read.c
+    io_lib/read/translate.[ch]
+
+16th November 1998, James
+-------------------------
+4. Added NT (NoTe) and GD (Gap4 Database) line types to the experiment file.
+    io_lib/exp_file/expFile.[ch]
+
+24th September 1998, James
+--------------------------
+1. The scf reading and writing code now handles traces with zero bases.
+Previously this failed after a malloc(0).
+    io_lib/scf/read_scf.c
+    io_lib/scf/write_scf.c
+
+2. The ABI file reading code has been tidied up. It now also supports
+conversion of more ABI fields, including RUND, RUNT, SPAC(2), CMNT, LANE and
+MTXF.
+    io_lib/abi/seqIOABI.c
+
+17th July 1998, James
+---------------------
+1. Extract_seq now copes with sequences containing no SQ line (instead of just
+SEGV).
+    io_lib/progs/extract_seq
+
+9th July 1998, James
+--------------------
+1. Enforce IUBC code set in io_lib when converting from trace (any format) to
+experiment file. We leave the IUBC 'N' intact.
+    io_lib/read/translate.c
+
+28th May 1998, James
+--------------------
+1. Added a read_sections() function to io_lib so that programs can state
+which bits of a trace file they are interested in. The loading code only
+then parses those bits.
This can give big speed increases to things like init_exp
+which only wants bases and does not care about the delta-delta format of SCF
+trace data.
+    io_lib/read/Read.h
+    io_lib/read/translate.c
+    io_lib/scf/scf.h
+    io_lib/scf/read_scf.c
+    io_lib/abi/seqIOABI.c
+    io_lib/alf/seqIOALF.c
+    init_exp/init_exp.c
+
+3. Extract GELN (gel name) from ABI file when converting to SCF.
+    io_lib/abi/seqIOABI.[ch]
+
+2. Improved the makeSCF -normalise option. Background subtraction is now
+cleaner (and simpler) and it also now scales the heights. Moved it to io_lib
+as it's now freely available.
+    io_lib/progs/makeSCF.c
+
+23rd March 1998, James
+----------------------
+1. Removed the change made on 7th May 1997 to seqIOPlain.c. This code is used
+by extract_seq, and so clipping in seqIOPlain causes double clipping (and
+hence wrong sections).
+    io_lib/plain/seqIOPlain.c
+
+11th March 1998, James
+----------------------
+2. Removed the requirement of EXP_FILE_LINE_LENGTH in exp_fread_info().
+This allows for (eg) tags with very long comments to be read in without
+being truncated.
+    io_lib/exp_file/expFileIO.c
+
+4th March 1998, James
+---------------------
+1. Following advice from Leif Hansson <leif.hansson@mbox4.swipnet.se>, the ALF
+reading code now reads the "Raw data" subfile when the "Processed data"
+subfile is not present, as "Processed data" is apparently an optional output
+of the Pharmacia software. Raw data is in the same format, although I do not
+know what processing takes place to convert it to Processed data. (Looking at
+some real traces, apparently none!)
+    io_lib/alf/seqIOALF.c
+
+24th February 1998, James
+-------------------------
+1. Added an ABI in MacBinary format file type detector so that these are
+now autodetected.
+    io_lib/utils/traceType.c
+
+15th January 1998, James
+------------------------
+1. Rewrote the delta_samples1/2 functions to be faster. Times vary between 0.55
+and 0.7 of the original time.
+    io_lib/scf/misc_scf.c
+
+4th December 1997, James
+------------------------
+1. First post-release bug fix.
+Io_lib incorrectly sets read->trace_name when reading anything except SCF files.
+This means that when outputting to an experiment file no LN line is present.
+    io_lib/read/Read.c
+
+1st October 1997, James
+-----------------------
+1. Allow for SCF files to contain 0 bases. This mainly affects memory
+allocation, but also the display widget.
+    io_lib/scf/read_scf.c
+    io_lib/utils/read_alloc.c
+
+28/29th August 1997, James
+--------------------------
+2. Added a few changes to make the code more portable for the Mac. Not really
+used at present.
+    Misc/os.h
+    Misc/files.c
+    io_lib/utils/traceType.c
+    io_lib/read/translate.c
+    io_lib/utils/compress.c
+
+30th June 1997, James
+---------------------
+1. The exp2read function produced invalid rightCutoff values (INT_MAX) when no
+QR line is present. It now correctly sets it to 0.
+    io_lib/read/translate.c
+