comparison srf2fastq/io_lib-1.12.2/CHANGES @ 0:d901c9f41a6a default tip

Migrated tool version 1.0.1 from old tool shed archive to new tool shed repository
author dawe
date Tue, 07 Jun 2011 17:48:05 -0400
-1:000000000000 0:d901c9f41a6a
1 Version 1.12.2 (15th Jan 2010)
2 --------------
3
4 * Extra options in srf2fastq: -S to output split regions sequentially
5 to stdout. -r to request a region to be reverse complemented before
6 output.
7
8 * API addition
9 - Added pooled_alloc.h. This is a general purpose mechanism of
10 pooling multiple fixed size memory allocations into fewer malloc()
11 library calls.
12 - HashTables now have a HASH_POOL_ITEMS option to use the above
13 pooling system. This reduces memory wasted and speeds them up.
14
15 * Bug fix: Fixed ztr_add_text() so that it leaves two nul bytes on the
16 end of TEXT chunks instead of one, as documented in the ZTR
17 specification.
18
19 * Bug fix: Fixed buffer overrun in parse region chunks; srf2fastq and
20 srf2fasta.
21
22 * Bug fix: API read_sff_read_data() did not skip ahead to the next
23 8-byte boundary.
24
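The pooling idea behind the pooled_alloc.h entry above can be sketched as follows. This is a minimal illustration only: every demo_* name and the block size of 1024 items are assumptions made for this example, not the pooled_alloc.h API. Fixed-size items are carved out of large malloc() blocks, and freed items are kept on a free list for reuse.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names for illustration; NOT the pooled_alloc.h interface. */
#define DEMO_ITEMS_PER_BLOCK 1024

/* Used only to round item sizes up to a conservatively aligned size. */
typedef union { void *p; long double ld; long long ll; } demo_align;

typedef struct {
    size_t item_size;  /* rounded-up size of every item */
    size_t used;       /* items handed out from the newest block */
    void **blocks;     /* every block obtained from malloc() */
    size_t nblocks;
    void *free_list;   /* returned items, linked through their first bytes */
} demo_pool;

demo_pool *demo_pool_create(size_t item_size) {
    demo_pool *p;
    assert(item_size > 0);
    p = malloc(sizeof(*p));
    if (!p) return NULL;
    p->item_size = ((item_size + sizeof(demo_align) - 1) / sizeof(demo_align))
                   * sizeof(demo_align);
    p->used = DEMO_ITEMS_PER_BLOCK;  /* forces a block on the first alloc */
    p->blocks = NULL;
    p->nblocks = 0;
    p->free_list = NULL;
    return p;
}

void *demo_pool_alloc(demo_pool *p) {
    if (p->free_list) {              /* reuse a returned item first */
        void *item = p->free_list;
        p->free_list = *(void **)item;
        return item;
    }
    if (p->used == DEMO_ITEMS_PER_BLOCK) {  /* newest block is full */
        void **nb = realloc(p->blocks, (p->nblocks + 1) * sizeof(*nb));
        if (!nb) return NULL;
        p->blocks = nb;
        nb[p->nblocks] = malloc(p->item_size * DEMO_ITEMS_PER_BLOCK);
        if (!nb[p->nblocks]) return NULL;
        p->nblocks++;
        p->used = 0;
    }
    return (char *)p->blocks[p->nblocks - 1] + p->item_size * p->used++;
}

void demo_pool_free(demo_pool *p, void *item) {
    *(void **)item = p->free_list;   /* push onto the free list */
    p->free_list = item;
}

void demo_pool_destroy(demo_pool *p) {
    size_t i;
    for (i = 0; i < p->nblocks; i++) free(p->blocks[i]);
    free(p->blocks);
    free(p);
}
```

In this sketch, 2002 items cost two malloc() calls for item storage rather than 2002, which is the kind of saving the entry describes for hash table items.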
25
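The "skip ahead to the next 8-byte boundary" step in the read_sff_read_data() fix is plain alignment arithmetic. A minimal sketch, using hypothetical helper names that are not part of io_lib:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers for illustration; not io_lib functions. */

/* Padding bytes needed to move `offset` up to the next multiple of `align`.
 * `align` must be a power of two. Returns 0 when already aligned. */
size_t demo_padding(size_t offset, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (align - (offset & (align - 1))) & (align - 1);
}

/* The offset rounded up to the next multiple of `align`. */
size_t demo_align_up(size_t offset, size_t align) {
    return offset + demo_padding(offset, align);
}
```

A reader that has consumed 13 bytes of a section would skip demo_padding(13, 8) = 3 further bytes before parsing the next one.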
26 Version 1.12.1 (7th August 2009)
27 --------------
28
29 * Fixed the endianness detection in io_lib/os.h when used in
30 conjunction with auto-conf. This fix allows for "fat" binaries to be
31 built on MacOS X.
32
33 * Fixed io_lib-config program to use -lstaden-read instead of -lread.
34
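A fat binary carries code for more than one architecture, so a byte order fixed once at configure time cannot be right for every slice; that is presumably why the detection in io_lib/os.h needed care. The sketch below is not the os.h mechanism. It shows two generic alternatives under hypothetical names: a run-time probe, and decoding by shifts, which gives the same result on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers for illustration; not taken from io_lib/os.h. */

/* Run-time probe: returns 1 on a little-endian host, 0 otherwise. */
int demo_host_is_little_endian(void) {
    uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* inspect the lowest-addressed byte */
    return first == 1;
}

/* Decode four big-endian bytes using shifts only, so no knowledge of the
 * host byte order is required. */
uint32_t demo_be_to_u32(const unsigned char b[4]) {
    assert(b != NULL);
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}
```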
35
36 Version 1.12.0 (29th July 2009)
37 --------------
38
39 * Renamed the library from libread.so to libstaden-read.so. This was
40 already the case for the Fedora bundled RPM.
41
42 * Switched to using libtool to allow building of dynamic libraries.
43 Note that this is tweaked to not use -rpath though. Proper library
44 versioning has been added too.
45
46 * Removed deprecated platform specific tools: illumina2srf,
47 srf2illumina.
48
49 * Srf_info now reports the compressed size of chunks, sorted by type,
50 in addition to their counts. It also correctly sums to over 2Gb now
51 for base-call counting.
52
53 * Various SRF tools have had the maximum sequence length changed from
54 1024 to 10000. This allows for even the most gifting capillary traces.
55
56 * API
57 - The Array functions now take size_t instead of int for the
58 array dimensions. (API CHANGE)
59
60 - Removed the (unused?) pipe2 function from compress.h. This was
61 intended to be internal only, and it now clashes with a new linux
62 kernel function. (API CHANGE)
63
64 - Added iterators to the HashTable* api.
65
66 * Bug fixes
67
68 - Fixed a memory allocation bug in the codes2codeset() function.
69
70 - ztr2read() should now work better on ZTR structs with no BPOS
71 chunk.
72
73 - Fixed various srf tools when facing an SRF file containing zero
74 chunks in the data block header.
75
76 - index_tar handles some GNU tar extensions better (LongLink).
77
78
79 Version 1.11.6.1 (9th December 2008)
80 ----------------
81
82 * Identical except removal of a debugging printf statement in solexa2srf.
83
84
85 Version 1.11.6 (9th December 2008)
86 --------------
87
88 * illumina2srf, srf2illumina, srf2fastq
89 - We no longer change from log-odds to phred when storing data in
90 SRF, instead preferring to just mark it in correct input
91 scale. srf2fastq now honours this scale information and so the
92 conversion from log-odd to phred is done at the export stage
93 instead. (Chris Saunders)
94
95 - Bug fix to srf2illumina qcal conversion. Combined with above
96 changes the qcal output should now be 100% identical to the
97 original data input via illumina2srf.
98
99 * API
100 - New function srf_next_ztr_flags. This is like srf_next_ztr but
101 also returns the SRF flags value (good/bad read, etc).
102
103 * srf_filter, srf2fastq, srf_info (Steven Leonard)
104 - Improved support for multiple index blocks in SRF files, eg from
105 manually concatenated files.
106
107 - srf2fastq now sports options for splitting the output into
108 multiple fastq files when the input data is a paired-end run.
109
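For reference, the log-odds to phred conversion mentioned above follows from the two scale definitions, where p is the estimated error probability. This is the standard relationship between the scales; whether srf2fastq evaluates it directly, rounds it, or uses a lookup table is not stated here.

```latex
Q_{\text{log-odds}} = -10 \log_{10}\!\left(\frac{p}{1-p}\right), \qquad
Q_{\text{phred}} = -10 \log_{10} p
\quad\Longrightarrow\quad
Q_{\text{phred}} = 10 \log_{10}\!\left(1 + 10^{\,Q_{\text{log-odds}}/10}\right)
```

As a check, a log-odds score of 0 (p = 0.5) corresponds to a phred score of 10 log10(2), roughly 3.01. The two scales agree closely for high-quality bases and diverge for low-quality ones.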
110
111 Version 1.11.5 (3rd December 2008)
112 --------------
113
114 * Illumina2srf
115 - Fixed major bug with using *both* -qf and -qr together. The
116 quality values for the reverse strand were shifted by one
117 character.
118
119 - Fixed qcal quality values so they're not shifted down by 64
120 (illumina format fastq).
121
122 - Fixed bugs in parsing directory names if not matching the expected
123 format.
124
125 * Removed major memory leaks from srf_filter.
126
127 * hash_sff now has support for outputting the table of contents to a
128 new file rather than appending to an existing sff file or copying
129 the entire contents to a new file.
130
131 * Various man pages have been added. The list is still incomplete
132 though. Additions are most welcome.
133
134 * New program: srf_list. This lists and/or counts the number of
135 sequences within an SRF file.
136
137
138 Version 1.11.4 (11th September 2008)
139 --------------
140
141 * New "make check" build target to perform some automated testing.
142 Currently limited to testing the SRF tools.
143
144 * Fixed machine endianness issues. Specifically this resolves known Intel
145 MacOS-X problems.
146
147 * New SRF tools
148
149 - srf_info: reports simple metrics on the contents of an SRF file.
150
151 - srf_filter: slices and dices the SRF file to produce a new one
152 with various types of data removed.
153
154 * illumina2srf
155
156 - Minor float/int rounding change when storing int/nse/sig2 data.
157
158 - Improved error detection such that it returns a failure code more
159 often given a parsing issue.
160
161 - Added -pf/pr parameters for storing Phasing files.
162
163 - Reduced memory usage, especially on large numbers of clusters per
164 tile. We may now produce multiple DBH blocks per tile. Also major
165 reduction to memory when handling the .params files.
166
167 - Added storage of 2nd .params file (firecrest).
168
169 - Fixed bug in the automatic base-call version identification.
170
171 - Fixed a bug with using -qf/qr when not providing all tiles (ie not
172 starting from tile number 1).
173
174 - Bug fix with storing the reverse matrix file in paired-end runs; a
175 duplicate of the forward one was being used instead.
176
177 * General SRF
178
179 - Improved error checking in srf_index_hash. It now spots duplicate
180 reads and also has a -c option to check an existing SRF file
181 without writing the index.
182
183 - Fixed a memory leak in srf_next_ztr(), triggered in srf2fastq -C.
184
185 Version 1.11.3 (9th July 2008)
186 --------------
187
188 * illumina2srf change:
189
190 - IMPORTANT bug fix to illumina2srf when using the "-r" flag to
191 store raw (.int and .nse) data. This could often result in
192 corrupting the data ZTR meta-data for the SMP4 chunks resulting in
193 confusion over which trace channels are raw and which are
194 processed.
195
196 Fortunately the corruption is reversible. For more details and a
197 fix see the ssrformat announcement of the issue:
198
199 http://www.bcgsc.ca/pipermail/ssrformat/2008-July/000531.html
200
201 * General SRF changes:
202
203 - Removed a memory leak in ztr_find_chunks().
204
205 - Added SRFB_NULL_INDEX as an SRF block type. This provides a more
206 transparent way to skip over the 8 zero value bytes that may exist
207 at the end of an SRF file missing an index block.
208
209 * Other changes
210
211 - Fixed a bug in extract_seq when operating on multiple files and
212 outputting to a file rather than a pipe. An erroneous seek in the
213 mFILE code led to it repeatedly truncating the output, resulting
214 in one sequence file at the end instead of multiple files.
215
216
217 Version 1.11.2 (4th June 2008)
218 --------------
219
220 * solexa2srf/srf2solexa changes:
221
222 - Renamed to illumina2srf/srf2illumina.
223
224 - Incorporated support for the IPAR format (Come Raczy, Illumina).
225
226 - Added support for qcal format data (Come Raczy).
227
228 - Added -C option to tag data as failing the chastity filter, but it
229 is still included in the SRF output (Camil Toma).
230
231 - Many more additional features added to srf_dump_all provided by
232 Camil Toma. It somewhat overlaps srf2solexa now, but may still
233 have its own use.
234
235 - Ztr TEXT chunks now output in srf2solexa.
236
237 - Improved ways to specify matrices (-mf/-mr) in solexa2srf.
238
239 - solexa2srf is substantially faster when reading gzipped files.
240
241 - The -N/-n naming scheme options for solexa2srf now default to the
242 same conventions used by GERALD. Added additional %d, %m and %r
243 format rules too.
244
245 - Calibrated confidence values are now output if -qf or -qr
246 parameters are used, in addition to uncalibrated ones. These are
247 stored in phred scale in a CNF1 ZTR chunk.
248
249 * srf2fastq now has a -c option to output calibrated confidence values
250 (if present). It also supports multiple archives on the command line.
251
252 * SRF fixes:
253
254 - Better handling of full pathnames in solexa2srf.
255
256 - Use binary IO mode; fixes bugs on Windows.
257
258 - Fixed an error where some chunks were not compressed properly
259 (valid still, just not compressed).
260
261 - Removed memory corruption in solexa2srf (in rare cases).
262
263 - Fixed bug with binary formatted read_id suffixes (fixed by
264 Cristian Goina).
265
266 - Initialised memory in hash table code (used in indexing amongst
267 other things).
268
269 - Indexes very occasionally failed to find a trace that did in fact
270 exist.
271
272 - Removed memory leak in construct_trace_name (patch from John
273 Emhoff, Helicos).
274
275 - Fixed reading of XML block in srf_read_xml(). From John Emhoff.
276
277
278 * Added SRF= format string to TRACE_PATH to facilitate on-the-fly
279 extraction from indexed SRF files. This means io_lib can now
280 transparently pull traces from an archive or treat it as if it was a
281 directory - eg "foo.srf/IL15_..._123:456".
282
283 * Bug fix (SF-1898427) - now builds on Fedora.
284
285 * Better handling of 64-bit file size sensing in autoconf.
286
287
288 Version 1.11.1 (not officially released - internal testing only)
289 --------------
290
291 Version 1.11.0 (20th February 2008)
292 --------------
293
294 First official release of v1.11.0 and SRF support.
295
296 * Further speed improvements to solexa2srf.
297
298 * Added extract_qual program (analogous to extract_seq).
299
300 * Added new srf2fasta program and also sped up srf2fastq by 25%.
301
302 * Solexa2srf now supports storing the raw .int/.nse trace data instead
303 of or in addition to the processed .sig2 data.
304
305 * Solexa2srf now stores enough to reproduce sufficient firecrest
306 output to rerun the solexa basecaller. Specifically that's a couple
307 matrix files and 'region' data for paired end runs.
308
309 * Minor changes / bug fixes:
310
311 - extract_seq no longer attempts to gzip the output by default if
312 the input was gzipped
313
314 - ztr2read conversion (eg visible in trace_dump) now correctly
315 handles ZTR files with multiple SMP4 chunks.
316
317 - Fixed memory leaks in various bits of SRF code (srf_extract_linear
318 mainly and srf_index_hash).
319
320
321 Version 1.11.0b8 (25th January 2008)
322 ----------------
323
324 (Hopefully final beta test of SRF code before official 1.11.0 release.)
325
326 * Bug fixed the index format. We incorrectly handled null dbhFile and
327 containerFile elements plus incorrectly computing the index size.
328
329 * Improvements for solexa2srf code.
330 - Can store raw vs processed data
331 - Stores matrix and .params contents.
332 - Optional chastity filtering.
333 - Input data may now be gzipped.
334
335 * Minor fixes to output of trace_dump and ztr_dump.
336
337 * Minor srf_index_hash bug fixes (when dealing with concatenated
338 indexed files).
339
340
341 Version 1.11.0b7 (11th January 2008)
342 ----------------
343
344 * IMPORTANT bug fix to the SRF format. The Data Block Header had the
345 blocksize field 4 bytes too large. Now fixed. Old SRF files will not
346 be readable by this new code (as they were in error).
347
348
349 Version 1.11.0b6 (2nd January 2008)
350 ----------------
351
352 * Changes to adhere to SRF v1.3:
353
354 * Removal of the readID counter.
355
356 * Added support for printf style name formatting.
357
358 * Minor index format tweaks (64-bit data, dch/container filenames).
359 Index format is therefore now 1.01.
360
361
362 Version 1.11.0b5 (8th November 2007)
363 ----------------
364
365 * Major reorganisation of directories. All library code is in subdir
366 "io_lib". The code now uses "io_lib/xxx.h" in all include statements
367 too.
368
369 * Fixed memory leaks in ZTR code
370
371 * Various SRF bug fixes and better support for sample OFFS metadata in
372 both ZTR/ZTR.
373
374 * Added srf_extract_hash program to perform random-access on a hash
375 indexed SRF archive.
376
377
378 Version 1.11.0b4 (26th October 2007)
379 ----------------
380
381 * The SRF format now supported adheres to version 1.2.
382
383 * More speedups, in particular focusing on uncompression this time, so
384 srf2solexa is an order of magnitude faster.
385
386 * ztr2read() now honours the read_sections() options and so is much
387 faster when only decoding (say) base and quality values.
388
389 * New program srf2fastq.
390
391 * Internal changes to various ztr data structures. If you use these
392 yourself take note of the new ztr_owns fields to avoid memory leaks.
393
394
395 Version 1.11.0b3 (16th October 2007)
396 ----------------
397
398 * Major speed improvements for compression. solexa2srf is now 30-35x faster.
399
400 * Fixed various buffer overruns and memory leaks reported by valgrind
401 in the new deflate interlaced and SRF code.
402
403
404 Version 1.11.0b2 (2nd October 2007)
405 ----------------
406
407 * Minor version change to fix typos in Makefile system.
408
409
410 Version 1.11.0b1 (28th September 2007)
411 ----------------
412
413 Beta release 1.
414
415 * Added preliminary SRF support. This consists of a new subdirectory
416 'srf' (yes these all really need merging into a single directory,
417 but that's a later task), a substantial update to ZTR and a variety
418 of SRF tools in progs.
419
420 The old huffman_static.[ch] files were renamed and substantially
421 worked upon to create deflate_interlaced.[ch].
422
423 Added new compression types. xrle2, tshift and qshift. The latter two
424 of these are very specific to trace and quality packings. May need to
425 rename to be more generic.
426
427
428 Version 1.10.3 (???)
429 --------------
430
431 * The HashTable interface now also allows for Bob Jenkins' lookup3
432 64-bit hash function. This allows for substantially larger hash
433 tables.
434
435 * Replaced tempnam() with tmpfile(). On systems without tmpfile
436 (Windows) this is simply a wrapper to use the old tempnam calls.
437
438 * hash_extract bug fix for windows: now operates in binary mode.
439
440 * INCOMPATIBLE CHANGE: On windows we now use semi-colon as the path
441 separator. The reason is that with the MinGW getenv() seems to do
442 "clever things" with PATH variables and consequently ends up
443 corrupting our clumsy attempt of escaping colons in paths.
444
445 * Fasta format is semi-supported in "plain" format. It returns the
446 first entry when reading.
447
448 * Experimental support for static huffman (STHUFF) compression type.
449
450
451 Version 1.10.2 (30th May 2007)
452 --------------
453
454 Primarily this is a bug fix release.
455
456 * Convert_trace now has -signed and -noneg options to control signed
457 vs unsigned issues when shifting trace data about.
458
459 * Include files now have C++ extern "C" style guards around them.
460
461 * Various programs now accept -ztr command line arguments to force ZTR
462 format reading. This is for consistency's sake only and it is
463 recommended that users simply let the programs automatically detect
464 the file formats.
465
466 * Hash_exp now outputs to the same file containing the experiment
467 files (in appended hash-table mode). It also has better Windows
468 handling (stripping ^M and using binary mode).
469
470 * hash_extract bug fix: now only needs at least 1 filename specified
471 when fofn mode is not in use.
472
473 * mFILE emulation: bug fixes when dealing with ftruncate, append mode,
474 checking for read/write flags, new mfcreate_from() function.
475
476 * ZTR: added an experimental ZTR_FORM_STHUFF compression scheme. This
477 uses static huffman encoding on a predefined hard-coded set of
478 huffman tables. The purpose (as yet not put into action) is to allow
479 efficient compression of very small data sets for Illumina, AB
480 SOLiD, etc style traces.
481
482
483 Version 1.10.1 (20th June 2006)
484 --------------
485
486 * Trace files are now opened in read-only mode by default
487 (open_trace_file func).
488
489
490 Version 1.10.0 (15th June 2006)
491 --------------
492
493 * Two new environment variables are used, EXP_PATH and TRACE_PATH, to
494 replace RAWDATA. EXP_PATH is used when the new open_exp_mfile()
495 function is called and TRACE_PATH is used when open_trace_mfile() is
496 called. Both default to using RAWDATA when EXP or TRACE env is not
497 found. Also defined a trace type TT_ANYTR which is analogous to the
498 existing TT_ANY except it will not look for experiment or plain
499 format files.
500
501 Modified the various example programs to use the appropriate open
502 call. This allows for traces and experiment files to have identical
503 names, such as is usually the case when querying named trace objects
504 from a trace server.
505
506 * New program: extract_fastq to generate FASTQ output format.
507
508 * New program: hash_exp. This allows multiple experiment files to be
509 concatenated together and then indexed so io_lib can still treat
510 them as single files.
511
512 * The URL based search path mechanism now by default uses libcurl
513 instead of wget. This makes it considerably faster.
514
515 * If an element in RAWDATA, EXP_PATH or TRACE_PATH now starts with the
516 pipe symbol ("|") then the compressed file extension code is negated
517 for that search element. (This prevents looking for foo.gz, foo.Z,
518 foo.bz2, etc if it fails to find foo.)
519
520 * Added HashTableDel() and HashTableRemove() functions to take items
521 out of a hash table.
522
523 * ZTR's compress_chunk() and uncompress_chunk() functions are now
524 externally callable.
525
526 * New program io_lib-config. This has --version, --cflags and --libs
527 options to query the appropriate configuration when compiling and
528 linking against io_lib. There's also a new io_lib.m4 file which
529 provides an AC_CHECK_IO_LIB autoconf macro to use io_lib-config and
530 generate appropriate Makefile substitutions.
531
532 * Updated the autoconf code to support libcurl searching.
533
534 * Renamed SCF's delta_samples[12] functions to be
535 scf_delta_samples[12]. (From Saul Kravitz)
536
537 * Added a '-error filename' option to convert_trace. (From Saul Kravitz)
538
539 * Bug fix: HashTableAdd() now works properly with non-string keys.
540
541 * Bug fix to read_dup().
542
543 * Bug fix to xrle which could read past the array bounds. It also now
544 handles run-lengths of 256 or more.
545
546 * Bug fix: the fwrite_* functions no longer close the FILE pointer
547 given to them.
548
549 * Bug fix to fdetermine_trace_type(); it now rewinds the file back.
550
551 * Bug fix to mfseek and mrewind; they both now clear the EOF flag.
552
553 * Bug fix to find_file_dir().
554
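The "|" prefix rule for RAWDATA, EXP_PATH and TRACE_PATH elements can be sketched as below. All demo_* names are hypothetical, and the extension list holds only the three suffixes named in the entry above (the real list is described there with "etc"). The sketch assumes each element is a plain directory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical types and helpers for illustration; not io_lib code. */
typedef struct {
    const char *dir;     /* the element with any leading '|' removed */
    int try_compressed;  /* 0 when the element began with '|' */
} demo_search_elem;

/* Only the suffixes named in the changelog entry; "" is the plain name. */
static const char *demo_exts[] = { "", ".gz", ".Z", ".bz2" };

demo_search_elem demo_parse_elem(const char *elem) {
    demo_search_elem e;
    assert(elem != NULL);
    e.try_compressed = (elem[0] != '|');
    e.dir = e.try_compressed ? elem : elem + 1;
    return e;
}

/* Write the idx-th candidate path for `name` into `out`.
 * Returns 1 on success, 0 when there are no more candidates or `out` is
 * too small to hold the path. */
int demo_candidate(const demo_search_elem *e, const char *name, size_t idx,
                   char *out, size_t outlen) {
    size_t n = e->try_compressed ? sizeof(demo_exts) / sizeof(demo_exts[0]) : 1;
    int len;
    if (idx >= n) return 0;
    len = snprintf(out, outlen, "%s/%s%s", e->dir, name, demo_exts[idx]);
    return len >= 0 && (size_t)len < outlen;
}
```

For the element "|/data/traces" only "/data/traces/foo" is tried; without the pipe symbol, "foo.gz", "foo.Z" and "foo.bz2" would follow.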
555
556 Version 1.9.2 (14th December 2005)
557 -------------
558
559 * Added AC_CHECK_LIB calls for the nsl and socket libraries
560 (gethostbyname / socket functions). Needed for Solaris compilations.
561
562 * In extract_seq, used open_trace_mfile instead of
563 open_trace_file. Functionally this is the same, but it is faster.
564
565 * fwrite_reading() now frees the temporary mFILE it created.
566
567 * mfreopen_compressed() no longer closes the original FILE
568 pointer. This brings it back into line with the original
569 functionality provided in 1.8.x. It also cures a bug where the old
570 file pointer was often left open, meaning operations on many files
571 could cause a resource leak ending in the inability to open
572 more trace files.
573
574 * Added private_data and private_size to the Read struct. Populate
575 these when reading SCF files.
576
577 * Hash_extract now returns an error code to the calling process upon
578 failure.
579
580 * Major overhaul of hash_sff. It no longer loads the entire file into
581 memory. It can now cope with adding a hash index to an archive that
582 already contains an index.
583
584 * Added support for 454's "sorted index" code. NB this is based on the
585 extraction code from their getsff.c code and has not been tested
586 with a genuine indexed SFF file yet.
587
588 * Fixed an uninitialised memory access in mfload().
589
590 * Fixed a bug where hash query searches for items that do not exist
591 and map to an empty bucket could cause hangs or crashes.
592
593 * Fixed a hang in mfload() when reading a zero length file.
594
595
596 Version 1.9.1
597 -------------
598
599 * Implemented the SFF (454) file structure, currently as read-only.
600 This is supported both as an archive containing multiple files and
601 also as a single SFF entry.
602
603 * Allow for SFF=? components in RAWDATA search path.
604
605 * Tar files, SFF archives and hashed archives (eg hashed tar, sff, or
606 "solid" archives) may now be used as part of a pathname. Eg if a
607 tar file foo.tar contains entry xyzzy.ztr then we can ask to fetch
608 trace foo.tar/xyzzy.ztr instead of requiring setting of the
609 RAWDATA environment variable.
610
611 * Changed the HashFile format slightly. It's now format 1.00.
612
613 The key difference is that it has a file footer pointing back to the
614 hashfile header (so the hashfile can be appended to an archive) and
615 it also has an offset in the header to apply to all seeks within the
616 archive itself, so it can be prepended to an archive that's already
617 been indexed without breaking the offsets.
618
619 Extended the hash_tar program to allow control over these header options.
620
621 * Fixed divide-by-zero bug when calling mfread for zero
622
623 * Removed the warning for unknown ZTR chunk types. It now just
624 silently stores them in memory.
625
626 * mfopen now honours binary versus ascii differences (and so updated
627 Read.c calls accordingly) so that Windows works better.
628
629 * Removed file descriptor 'leak' in write_reading().
630
631 * Unset compression_used when opening uncompressed files instead of
632 leaving as the last value.
633
634 * Fixed a file descriptor (and some memory) leak in
635 freopen_compressed. (Bug ID #1289095)
636
637 * Fixed the hash file saving and loading so that it works on all
638 platforms instead of just x86 linux. There were bugs in assuming the
639 size of structures. The assumptions are still there in that I assume
640 they pad the same internally (for ease of coding - we can change it
641 when we finally see a system which operates differently), but the
642 final "boundary" padding has been resolved.
643
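The archive-as-pathname form (foo.tar/xyzzy.ztr) implies splitting a path into an archive name and a member name. The sketch below shows one simple way to do that for the tar case only, under a hypothetical function name. io_lib may well decide differently, for example by probing the filesystem for each path component.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper for illustration; not how io_lib is known to work.
 * Splits "archive.tar/member" in place at the '/' that follows ".tar".
 * On success `path` holds the archive name and the returned pointer is the
 * member name; returns NULL if no ".tar/" component is present. */
char *demo_split_archive_path(char *path) {
    char *hit;
    assert(path != NULL);
    hit = strstr(path, ".tar/");
    if (!hit) return NULL;
    hit[4] = '\0';   /* overwrite the '/' to end the archive name */
    return hit + 5;  /* member name starts just after that '/' */
}
```

Given "foo.tar/xyzzy.ztr" this yields the archive "foo.tar" and the member "xyzzy.ztr", so no RAWDATA setting is needed to name the archive.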
644
645 Version 1.9.0
646 -------------
647
648 * ***INCOMPATIBILITIES*** to 1.8.12
649
650 - The Exp_info structure now internally contains an "mFILE *" member
651 instead of "FILE *" member. If you use the experiment file functions
652 for I/O then hopefully it'll still work. However if you directly
653 manipulated the Exp_info yourself using fprintf etc then you will
654 need to modify your code.
655
656 - Some functions no longer have external scope. Most of these did not
657 previously have external function prototypes. If you have a burning
658 need to use one of these, please contact me directly via sourceforge.
659 The full list is:
660
661 ctfType (global variable) ztr_encode_samples_C
662 replace_nl ztr_encode_samples_G
663 ctfDecorrelate ztr_encode_samples_T
664 exp_print_line_ ztr_decode_samples
665 find_file_tar ztr_encode_bases
666 find_file_archive ztr_decode_bases
667 find_file_url ztr_encode_positions
668 ztr_write_header ztr_decode_positions
669 ztr_write_chunk ztr_encode_confidence_1
670 ztr_read_header ztr_decode_confidence_1
671 ztr_read_chunk_hdr ztr_encode_confidence_4
672 compress_chunk ztr_decode_confidence_4
673 uncompress_chunk ztr_encode_text
674 ztr_encode_samples_4 ztr_decode_text
675 ztr_decode_samples_4 ztr_encode_clips
676 ztr_encode_samples_common ztr_decode_clips
677 ztr_encode_samples_A
678
679 - Some external functions have changed prototypes to use mFILE instead
680 of FILE. Most cases of these I've put in place a wrapper function
681 with the old name, but not yet all. Functions changed are:
682
683 ctfFRead write_scf_samples32
684 ctfFWrite write_scf_base
685 exp_print_line write_scf_bases
686 exp_print_mline write_scf_bases3
687 exp_print_seq write_scf_comment
688 read_scf_header fcompress_file
689 read_scf_sample1 fopen_compressed
690 read_scf_samples1 freopen_compressed
691 read_scf_samples31 be_write_int_1
692 read_scf_sample2 be_write_int_2
693 read_scf_samples2 be_write_int_4
694 read_scf_samples32 be_read_int_1
695 read_scf_base be_read_int_2
696 read_scf_bases be_read_int_4
697 read_scf_bases3 le_write_int_1
698 read_scf_comment le_write_int_2
699 write_scf_header le_write_int_4
700 write_scf_sample1 le_read_int_1
701 write_scf_samples1 le_read_int_2
702 write_scf_samples31 le_read_int_4
703 write_scf_samples2 fdetermine_trace_type
704
705 - Removed support for the OLD unix "pack" program as a valid trace
706 compression algorithm.
707
708 - Removed CORBA support. (It wasn't enabled and I've no idea if it
709 even worked as I cannot test it.)
710
711 - The default search order for RAWDATA now has the current working
712 directory at the end of RAWDATA instead of the start.
713
714 * Significant speed ups, particularly when dealing with reading
715 gzipped files or when extracting data from tar files.
716
717 * New external functions for faster access via mFILE (memory-file)
718 structs. These mimic the fread/fwrite calls, but with mfread/mfwrite
719 etc.
720
721 * Numerous minor tweaks and updates to fix compiler warnings on the
722 stricter modes of the Intel C Compiler.
723
724 * Preliminary support for storing pyrosequencing style traces. This
725 has been modeled on the flowgram data from 454, but should be
726 applicable to other platforms. ZTR has been updated to incorporate
727 this too.
728
729 The Read structure also has flow, flow_order, nflows and flow_raw
730 elements too. Code to convert these into the more usual traceA/C/G/T
731 arrays exists currently as part of Trev (in tk_utils in the Staden
732 Package), but this may move into io_lib for the next official release.
733
734 * New hash_tar and hash_extract programs. These replace the index_tar
735 program for fast random access. For RAWDATA include "HASH=hashfile"
736 as an element to get io_lib to use the archive hash. It's possible
737 to create hash files of most archive formats as the hash itself
738 contains the offset and size of each item in the archive. This means
739 that extracting an item does not need to know the format of the
740 original archive.
741
742 Some benchmarks show that on ext3 it's actually faster to extract
743 files from the hash than directly via the directory. This was
744 testing with ~200,000 files, whereupon directory lookups become
745 slow. I'd imagine ReiserFS or similar to be faster.
746
747 * Added an XRLE encoding for ZTR. This is similar to the existing RLE
748 mechanism but it copes with run length encoding of items larger than
749 a single byte. Its current use is for storing the 4-base repeating
750 flow order in 454 data.
751
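The idea behind XRLE, run-length encoding of items wider than one byte, can be sketched as follows. This shows only the concept: the demo_* names and the (start, count) output are assumptions for the example and do not reflect the ZTR XRLE byte layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration; NOT the ZTR XRLE on-disk format. */
typedef struct {
    size_t start;  /* byte offset of the record's first occurrence */
    size_t count;  /* number of consecutive repeats of that record */
} demo_run;

/* Collapse consecutive identical records of `rec` bytes into runs.
 * Returns the number of runs written, or (size_t)-1 if `runs` is too small. */
size_t demo_rle_records(const unsigned char *in, size_t len, size_t rec,
                        demo_run *runs, size_t max_runs) {
    size_t i, n = 0;
    assert(rec > 0 && len % rec == 0);
    for (i = 0; i < len; i += rec) {
        if (n > 0 && memcmp(in + runs[n - 1].start, in + i, rec) == 0) {
            runs[n - 1].count++;        /* same record again: extend the run */
        } else {
            if (n == max_runs) return (size_t)-1;
            runs[n].start = i;          /* a different record: start a run */
            runs[n].count = 1;
            n++;
        }
    }
    return n;
}
```

On the 16 bytes "TACGTACGTACGAAAA", a record width of 4 gives two runs (TACG three times, then AAAA once), whereas byte-wise run-length encoding of the same data finds 13 runs. That difference is why a repeating 4-base flow order suits a multi-byte scheme.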
752
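The memory-file idea in this release, calls that mimic fread but operate on a buffer held in memory, can be sketched like this. The demo_mem type and both functions are hypothetical and far simpler than the real mFILE structures; the read copies complete items only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory "file" for illustration; NOT the mFILE struct. */
typedef struct {
    const unsigned char *data;  /* whole file contents held in memory */
    size_t size;                /* number of bytes in data */
    size_t pos;                 /* current read offset */
    int eof;                    /* set once a read comes up short */
} demo_mem;

/* fread-like call: copies up to `nitems` complete items of `item` bytes
 * and returns how many were copied. */
size_t demo_mem_read(void *dst, size_t item, size_t nitems, demo_mem *m) {
    size_t avail, n;
    assert(m != NULL && m->pos <= m->size);
    if (item == 0 || nitems == 0) return 0;  /* also avoids dividing by zero */
    avail = (m->size - m->pos) / item;
    n = nitems < avail ? nitems : avail;
    memcpy(dst, m->data + m->pos, n * item);
    m->pos += n * item;
    if (n < nitems) m->eof = 1;
    return n;
}

/* rewind-like call: resets the offset and clears the EOF flag. */
void demo_mem_rewind(demo_mem *m) {
    m->pos = 0;
    m->eof = 0;
}
```

Because a read is a bounds check plus a memcpy(), repeated small reads avoid per-call stdio overhead, which fits the speed gains described for this release.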
753 Version 1.8.12
754 --------------
755
756 * The ABI format code now reads the confidence values from KB (via
757 PCON field).
758
759 * New program: trace_dump. Like scf_dump, but deals with generic input
760 formats.
761
762 * Slightly more sensible average spacing calculation in the ABI
763 reading code. It's still not perfect, but is only used when the real
764 spacing value is negative or zero.
765
766 * Disabled the base-reordering fix for ABI files. We believe the bug
767 causing this no longer exists.
768
769 * Experiment file format: added FT (EMBL feature table) and LF
770 (LiGation; a combination of LI and LE) records.
771
772 * Experiment files: strip out digits from the sequence we read
773 (for better support of EMBL files).
774
775 * Experiment files: fixed a potential buffer overrun in the conversion
776 of binary confidence values to ascii values.
777
778 * Minor improvements to portability (INT_MAX vs MAXINT2) and removal
779 of some compilation warnings.
780
781 * Extract_seq now accepts a -fofn argument.
782
783 * New functions: read_update_base_positions() and
784 read_update_confidence_values() to replace read_update_opos().
785 These apply an edit buffer to the sequence details and are used (for
786 example) within Trev for saving edits back to a trace file.
787
788 * Better error handling in fcompress_file().
789
790 * New specifiers in RAWDATA. Added a generic URL format (eg
791 "URL=http://some/where/trace=%s") implemented via use of wget. There
792 is also an ARC= format to make use of the Sanger Trace Archive,
793 although currently this will not work externally.
794
795 * Zero memory used in read_alloc(). Fixes to read_dup().
796
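The URL= specifier carries a single %s slot for the trace name. A minimal sketch of filling that slot is shown below, with a hypothetical function name. It takes the template without the "URL=" prefix, performs no URL escaping, and says nothing about how io_lib passes the result to wget. The slot is filled by hand so that a user-supplied template is never used as a printf format string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for illustration; not an io_lib function.
 * Replaces the first "%s" in `tmpl` with `name`, writing to `out`.
 * Returns 0 on success, -1 if there is no "%s" or `out` is too small. */
int demo_expand_url(const char *tmpl, const char *name,
                    char *out, size_t outlen) {
    const char *slot;
    size_t pre, nlen, post;
    assert(tmpl != NULL && name != NULL && out != NULL);
    slot = strstr(tmpl, "%s");
    if (!slot) return -1;
    pre = (size_t)(slot - tmpl);   /* bytes before the slot */
    nlen = strlen(name);
    post = strlen(slot + 2);       /* bytes after the slot */
    if (pre + nlen + post + 1 > outlen) return -1;
    memcpy(out, tmpl, pre);
    memcpy(out + pre, name, nlen);
    memcpy(out + pre + nlen, slot + 2, post + 1);  /* copies the NUL too */
    return 0;
}
```

With the template from the entry above and the trace name "xyzzy", the result is "http://some/where/trace=xyzzy".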
797
798 Version 1.8.11
799 --------------
800
801 * Rewrote the background subtraction in convert_trace to deal with each
802 channel independently.
803
804 * Make install now installs the include files (all of them, although not all
805 are strictly required) in $prefix/include/io_lib/.
806
807 * Moved the ABI filter wheel order (FWO) reading from outside the sample
808 reading code into the general reading bit as this is needed for reading the
809 comments too (it also applies to the order of the signal strengths). Hence
810 when the READ_COMMENTS section only is defined it now works correctly.
811
812 * Moved the DataCount #defines into static values and added a
813 abi_set_data_counts function to change these. This allows reading of the raw
814 data from ABI files. This is used within the new convert_trace -abi_data
815 option.
816
817 * Removed a one-byte write buffer overflow in the CTF writing code.
818
819 * New Experiment file records WL and WR for indicating clip points within a WT
820 trace.
821
822 * Removed the saved copy of fp for exp_fread_info in 'e' structure as it
823 doesn't belong to us. (If we do store it there then the exp_destroy_info
824 function will free it and this causes bugs.). POTENTIAL INCOMPATIBILITY:
825 if you assumed that exp_destroy_info closed the files that you opened and
826 passed into exp_fread_info, then this is no longer true.
827
828 * New function read_dup() to copy a Read structure.
829
830 * get_read_conf() now deals with loading confidence values from any suitable
831 format and not just SCF.
832
833 * Fixed memory leak in ztr (ztr->text_segments).
834
835
836 Version 1.8.10
837 --------------
838
839 * Added Steven Leonard's changes to index_tar. It no longer adds index entries
840 for directories, unless -d is specified. It also now supports longer names
841 using the @LongLink tar extension.
842
843 * Fixed a bug in exp2read where the base positions were random if experiment
844 files are loaded without referencing a trace and without having ON lines.
845
846 * New program get_comment. This queries and extracts text fields held within
847 the Read 'info' section
848
849 * Overhaul of convert_trace to support the makeSCF options (normalise etc).
850
851
852 Version 1.8.9
853 -------------
854
855 Sorry this isn't a proper changes-by-source listing. Any suggestions for how I
856 collate the 'cvs log' output into something more concise? The below text is
857 simply a list of changes, but more complete than in the NEWS file.
858
859 * ZTR spec updated to v1.2. The chebyshev predictor has been rewritten in
860 integer format. The old chebyshev still has a format type allocated to it
861 (73), but the new ICHEB format (74) is now the default. The old floating
862 point method was potentially unstable (eg when running on non IEEE fp
863 systems). The new method also seems to save a bit more space.
864
865 * The docs and code disagreed for CNF4 storage. Changed the docs to reflect
866 the code (which does as intended).
867
868 * ZTR speed increase. Follow1 is substantially faster, reducing write
869 times by about 10%.
870
871 * New named format types: ZTR1, ZTR2 and ZTR3. ZTR defaults to ZTR2, but we
872 can explicitly ask for another compression level if desired. Also explicit
873 statement of format (TT_ZTR instead of TT_ANY) removes the need for
874 a rewind() call and so ZTR can now work through a pipe.
875
876 * General tidy up to remove a few compilation warnings (missing include files,
877 signed vs unsigned issues, etc).
878
879 * Initial support is included for BioLIMS integration, but this is not
880 complete. (Unfortunately it requires access to a non-public library.)
881
882 * New function compress_str2int - opposite of existing compress_int2str.
883
884 * (Steven Leonard). Uses zlib for gzip compression and decompression.
885
886
887
888
889
890 These are extracts from the full Staden Package change log. They may not be
891 immediately obvious when taken out of context, but we feel this information
892 may still be useful to the users of io_lib.
893
894 23rd August 2000, James
895 -----------------------
896 1. Removed find_trace_file and added an open_trace_file function.
897 The idea is that searching for a file's existence is better done by attempting
898 to open it. This in turn allows for more possibilities of file searching.
899 Makefile
900 utils/open_trace_file.c
901 read/Read.c
902 read/scf_extras.c
903 read/translate.[ch]
904 progs/extract_seq.c
905
906 2. Added a TAR option to RAWDATA. We can now read trace files directly from
907 tar files (although they cannot be written to directly).
908 utils/open_trace_file.c
909 utils/tar_format.h
910
911 3. Created an index_tar program to optimise tar reading, although it is not
912 mandatory.
913 progs/index_tar.c
914 progs/Makefile
915
916 4. Fixed a bug when dealing with plain text files containing spaces.
917 plain/seqIOPlain.c
918
919
920 31st July 2000, James
921 ---------------------
922 1. Renamed TTFF to be ZTR.
923 read/Read.[ch]
924 utils/traceType.c
925 utils/compress.c
926 ttff/* -> ztr/*
927 README
928
929 2. ZTR reading will now stop when it spots a ZTR magic number. This allows
930 concatenation of ZTR files.
931 ztr/ztr.[ch]
932
933
934 15th June 2000, James
935 ---------------------
936 1. Added a TTFF_FOLLOW filter type to TTFF. This is enabled with compression
937 level 2 for the chromatogram data.
938 io_lib/ttff/ttff.[ch]
939 io_lib/ttff/compression.[ch]
940
941 9th June 2000, James
942 --------------------
943 * RELEASED 1.8.4 */
944
945 1. Added zlib bits to windows compilation.
946 io_lib/mk/windows.mk
947
948 2. Updated convert_trace. It can now reduce sample-size to 8-bit (with the
949 "-8" option) and the formats may now be specified as either integer or text
950 format. The text format is case insensitive.
951 io_lib/progs/convert_trace.c
952 io_lib/utils/traceType.c
953
954 3. More windows binary vs ascii fixes. When reading we switch to binary mode
955 before attempting fdetermine_trace_type, otherwise it fails to auto-detect
956 TTFF (which includes a newline as part of the magic number). Also added a
957 _setmode() call to the fwrite_reading code too.
958 io_lib/read/Read.c
959
960 4. Changed the default compression technique of TTFF to that used in 1.8.2. I
961 accidentally left it set to the experimental dynamic-delta method in 1.8.3,
962 which currently doesn't have the uncompression function! Also removed lots of
963 debugging output.
964 io_lib/ttff/ttff.c
965 io_lib/ttff/ttff_translate.c
966
967 5. Bug fix to exp2read - when no right hand quality cutoff is specified we
968 were defaulting to the left end of the trace, instead of the right end. (This
969 only happens when opening experiment files which do not have clip points.)
970 io_lib/read/translate.c
971
972 6. Changed the strftime() format in ABI reading code to use %H:%M:%S instead
973 of %T, as %T doesn't appear to be part of ANSI (I think it's probably
974 XPG4-UNIX). It worked on Unix machines, but not on MS Windows.
975 io_lib/abi/seqIOABI.c
976
977
978 8th June 2000, James
979 --------------------
980 * RELEASED 1.8.3 */
981
982 1. Updated the CTF support so that it includes a couple of new block
983 types. This allows for base positions being non-sequentially ordered, as is
984 possible in severe compressions.
985 io_lib/ctf/ctfCompress.c
986
987 2. Overhaul of TTFF format - now more PNG based in style. Still highly
988 experimental.
989 io_lib/ttff/*
990
991
992 16th May 2000, James
993 --------------------
994 * RELEASED 1.8.0 */
995
996 1. Added szip support. Szip generally gives better compression ratios than
997 gzip and often marginally better than bzip2, but is generally considerably
998 slower at decompression.
999 io_lib/utils/compress.[ch]
1000
1001 2. Merged in Jean Thierre-Mieg's CTF code. This is a compressed trace format
1002 which holds the same data as SCF, but in reduced space.
1003 io_lib/read/Read.[ch]
1004 io_lib/utils/traceType.c
1005 io_lib/ctf/*
1006
1007 3. Added my own highly experimental TTFF format. (Thanks to Jean Thierre-Mieg
1008 for re-awakening my interest in this.) TTFF files are typically equivalent in
1009 size to bzip2'ed SCF files, but are much quicker to write than any of the
1010 currently supported compressed formats. Depends on zlib.
1011 io_lib/read/Read.[ch]
1012 io_lib/utils/traceType.c
1013 io_lib/ttff/*
1014
1015 4. Reorganised the Makefiles for easier building.
1016 */Makefile
1017
1018 5. New program "convert_trace". Primarily a test tool at present as it needs
1019 a friendlier interface.
1020 progs/convert_trace.c
1021
1022
1023 20th April 2000, James
1024 ----------------------
1025 1. Removed a file-descriptor leak in extract_seq.
1026 io_lib/progs/extract_seq.c
1027
1028 22nd March 2000, James
1029 ----------------------
1030 1. Fixed bug in time formatting from ABI files. We used strftime code
1031 %a without setting tm.tm_wday (number of days since Sunday). It's not
1032 easy to work that out, so we convert from struct tm to time_t, which
1033 resets any erroneous elements of struct tm. Also fixed a silly error
1034 where the end time was set to the start time (incorrectly).
1035 io_lib/abi/seqIOABI.c
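The fix above relies on standard C behaviour: mktime() ignores tm_wday and
tm_yday on input and fills them in on return. A minimal sketch follows. The
helper names normalise_tm and format_date are hypothetical and are not taken
from seqIOABI.c.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper (an assumption, not io_lib code): build a struct tm
 * from calendar fields only, then let mktime() compute tm_wday and tm_yday.
 * Returns 0 on success, -1 if the time cannot be represented.
 */
static int normalise_tm(struct tm *t, int year, int mon, int mday,
                        int hour, int min, int sec)
{
    assert(t != NULL);
    memset(t, 0, sizeof(*t));
    t->tm_year  = year - 1900;  /* years since 1900 */
    t->tm_mon   = mon - 1;      /* months since January */
    t->tm_mday  = mday;
    t->tm_hour  = hour;
    t->tm_min   = min;
    t->tm_sec   = sec;
    t->tm_isdst = -1;           /* let the library decide about DST */

    /* mktime() normalises the structure, setting tm_wday and tm_yday */
    return mktime(t) == (time_t)-1 ? -1 : 0;
}

/* Format using %a, which is only meaningful once tm_wday has been set. */
static size_t format_date(char *buf, size_t len, const struct tm *t)
{
    return strftime(buf, len, "%a %b %d %Y", t);
}
```

Calling format_date() on a struct tm that has not been through mktime() would
print whatever weekday happened to be in tm_wday, which is the bug described.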
1036
1037 25th February 2000, James
1038 -------------------------
1039 2. Added checks for QR <= QL in the exp2read conversion function. This caused
1040 trev to display incorrectly (blanking incorrect screen portions) when dealing
1041 with inconsistent experiment files. Also changed qclip so that it doesn't
1042 create this inconsistent case.
1043 io_lib/read/translate.c
1044
1045 1st February 2000, Kathryn
1046 --------------------------
1047 1. Fixed bug which caused init_exp to crash when QL was more than 5 digits.
1048 Increased it to handle 15 digits.
1049 io_lib/read/translate.c
1050
1051 27th January 2000, James
1052 ------------------------
1053 1. Moved Gap4's copy of scf_extras into io_lib, and renamed io_lib's
1054 scf_bits to be scf_extras (to avoid editing too many #include statements).
1055 Without this we were getting errors due to dynamic linking using odd
1056 copies. Eg loading libread.so and then libgap.so meant that
1057 find_trace_file called from edUtils2.c (libgap.so) would pick up the first
1058 copy from libread.so, despite the fact that there's also a copy in the
1059 same libgap.so.
1060 gap4/scf_extras.[ch]
1061 io_lib/scf_bits.[ch]
1062
1063 25th January 2000, Kathryn
1064 --------------------------
1065 1. Fixed crash in qclip due to insufficient arguments being passed to
1066 find_trace_file and also fixed an array bounds error in scan_right of qclip.c
1067 io_lib/read/scf_bits.c
1068
1069 19th January 2000, James
1070 ------------------------
1071 4. Copied bits of the fakii and cap2/3 scf/expFile reading code into
1072 io_lib. Not all of this is in there, just the things which seem to be
1073 common and sensibly fit there. This also helps qclip to build on Windows.
1074 FIXME: We should now remove some of this code from Gap4.
1075 Also fixed a small memory leak in fopen_compressed() - it wasn't freeing
1076 the result of tempnam().
1077 io_lib/read/translate.c
1078 io_lib/read/scf_bits.[ch]
1079 io_lib/read/seqInfo.[ch]
1080 io_lib/utils/files.c
1081 io_lib/utils/compress.c
1082
1083 31st August 1999, James
1084 -----------------------
1085 1. -fasta_out mode of extract_seq now changes - to N.
1086 io_lib/progs/extract_seq.c
1087
1088 27th August 1999, James
1089 -----------------------
1090 1. The order of information items added by the abi to scf code has
1091 changed, to make it more sensible. Also fixed a bug in the textual (rather
1092 than numerical) date output, and wrote this to the DATE field.
1093 io_lib/abi/seqIOABI.c
1094
1095 2. makeSCF no longer adds a MACH field, as this was redundant.
1096 io_lib/abi/makeSCF.c
1097
1098 3. Extract_seq now has proper use of CL and CR when using -cosmid_only. It
1099 was assuming they were the same as QL/QR and SL/SR, which is not the case
1100 (rather it's like having a CS line of `CL`..`CR`). Extract_seq also now
1101 has a -fasta_out format option and can handle multiple files, which makes
1102 it easier to produce a fasta file from multiple experiment files.
1103 io_lib/progs/extract_seq.c
1104
1105 4th August 1999, James
1106 ----------------------
1107 1. The exp2read() function in io_lib now initialises the confidence arrays
1108 (eg r->prob_A) to zero, or to the experiment file AV line.
1109 io_lib/read/translate.c
1110
1111 2nd June 1999, James
1112 --------------------
1113 1. The MegaBACE sequencer creates ABI files. However it does so in an odd way.
1114 Sometimes the samples arrays are truncated such that bases are positioned
1115 above samples which are not stored in the ABI file. We now realloc the samples
1116 array in such cases and fill out the remainder with blank data. This removes a
1117 crash in trev when viewing such data.
1118 io_lib/abi/seqIOABI.c
1119
1120 2. Fixed a memory corruption of io-lib compression. The switch to use tempnam
1121 (for Windows) implies that the filename returned is no longer allocated by us.
1122 Unfortunately we forgot to remove the xfree(fname) calls.
1123 src/io_lib/utils/compress.c
1124
1125 18th May 1999, James
1126 --------------------
1127 1. Fixed the trace rescaling option of makeSCF. We now go through the rescale
1128 function twice. Once to work out the maximum value, and again to do the
1129 rescaling. This fixes a bug where the maximum value after rescaling was
1130 sometimes above 65536 and hence caused "trace wraparound" effects.
1131 io_lib/progs/makeSCF.c
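The two-pass idea above can be sketched as follows. This is an illustration
under stated assumptions, not the makeSCF code: the function name
rescale_trace, the use of double input samples and the single gain parameter
are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_MAX 65535.0   /* largest value a 16-bit sample can hold */

/*
 * Hypothetical sketch: apply 'gain' to every sample, shrinking the result
 * uniformly if the largest scaled value would not fit in 16 bits.
 */
static void rescale_trace(const double *in, uint16_t *out, size_t n,
                          double gain)
{
    double max = 0.0, scale = 1.0;
    size_t i;

    assert((in != NULL && out != NULL) || n == 0);

    /* Pass 1: work out the maximum value the requested gain would give */
    for (i = 0; i < n; i++) {
        double v = in[i] * gain;
        if (v > max)
            max = v;
    }

    /* Only shrink when the result would otherwise wrap around */
    if (max > TRACE_MAX)
        scale = TRACE_MAX / max;

    /* Pass 2: do the rescaling with the corrected factor */
    for (i = 0; i < n; i++) {
        double v = in[i] * gain * scale;
        if (v < 0.0)
            v = 0.0;
        if (v > TRACE_MAX)
            v = TRACE_MAX;          /* guard against rounding error */
        out[i] = (uint16_t)(v + 0.5);
    }
}
```

A single pass cannot know the final maximum in advance, which is why values
could previously exceed the 16-bit range and wrap.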
1132
1133 26th April 1999, JohnT
1134 ----------------------
1135 1. Allow : to be entered in RAW_DATA by using ::
1136 Misc/find.c
1137 io_lib/utils/find.c
1138
1139 2. Support for fetching trace files using Corba
1140 Modified:
1141 Misc/find.c
1142 mk/misc.mk
1143 io_lib/utils/find.c
1144 init_exp/init_exp.c
1145 io_lib/read/Makefile
1146 io_lib/utils/find.c
1147 io_lib/utils/compress.c
1148 io_lib/utils/Makefile
1149 mk/global.mk
1150 Added:
1151 io_lib/utils/corba.cpp
1152 io_lib/utils/stcorba.h
1153 Generated from IDL:
1154 io_lib/utils/trace.h
1155 io_lib/utils/trace.cpp
1156 io_lib/utils/basicServer.h
1157 io_lib/utils/basicServer.cpp
1158
1159
1160 3. Added ABI utility progs to NT port
1161 mk/abi.mk
1162
1163 4. Added Windows 95 support
1164 io_lib/utils/compress.c
1165 mk/WINNT.mk
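
The "::" escape from item 1 above can be handled while splitting a
colon-separated search path. The sketch below is an assumption about how such
parsing might look and is not the code in find.c. The function name
next_element and the example path are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: copy the next element of a colon-separated path into
 * 'out', treating "::" as an escaped literal ':'. Returns a pointer to the
 * start of the following element, or NULL when the path is exhausted.
 * Elements longer than the buffer are silently truncated.
 */
static const char *next_element(const char *path, char *out, size_t outlen)
{
    size_t n = 0;

    assert(out != NULL && outlen > 0);
    if (path == NULL || *path == '\0')
        return NULL;

    while (*path) {
        if (*path == ':') {
            if (path[1] != ':') {   /* single colon: element separator */
                path++;
                break;
            }
            path++;                 /* "::" collapses to one ':' */
        }
        if (n + 1 < outlen)
            out[n++] = *path;
        path++;
    }
    out[n] = '\0';
    return path;
}
```

With the illustrative path "/data/traces:C::/traces:/tmp", successive calls
yield "/data/traces", "C:/traces" and "/tmp".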
1166
1167 5th March 1999, JohnT
1168 ---------------------
1169 Various changes for WINNT support as follows:
1170 io_lib/utils - Don't redirect to /dev/null on WINNT
1171
1172 3rd February 1999, James
1173 ------------------------
1174 1. Fixed problems reported by Insure on Windows NT.
1175 These are mainly lack of prototypes (malloc/memcpy) and not returning properly
1176 from 'int' functions. However one fix to seqed_translate.c (find_line_start3)
1177 was an array read overflow.
1178 io_lib/progs/makeSCF.c
1179
1180 18th January 1999, James
1181 ------------------------
1182 1. Changed the read2exp io_lib translation function so that it can accept
1183 lowercase a,c,g,t. Oddly enough it was already coded to accept lowercase IUB
1184 codes, but we missed out a,c,g and t!
1185 io_lib/read/translate.c
1186
1187 15th January 1999, JohnT
1188 -----------------------
1189 Modified files throughout for Windows NT Compatibility as follows:
1190
1191 8. need to explicitly set text or binary file mode under WINNT
1192 io_lib/exp_file/expFileIO.c
1193
1194 18. need to include stddef.h for size_t with Visual C++
1195 io_lib/utils/array.h
1196
1197 19. need to have target LIBS (not LIB) and correct ordering for correct make
1198 on WINNT. Also need additional abstractions to allow for different compile
1199 and link calling conventions with Visual C++, and have rules for building
1200 Windows .def files.
1201 io_lib/abi/Makefile
1202 io_lib/alf/Makefile
1203 io_lib/exp_file/Makefile
1204 io_lib/plain/Makefile
1205 io_lib/progs/Makefile
1206 io_lib/read/Makefile
1207 io_lib/scf/Makefile
1208 io_lib/utils/Makefile
1209
1210 18th December 1998, James
1211 -------------------------
1212 1. Added bzip2 recognition to the (de)compression code of io_lib. This is now
1213 the latest bzip, and is recognised by phred (unlike bzip version 1). Bzip2 is
1214 approx the same as bzip1, but more or less twice as fast for decompression.
1215 io_lib/utils/compress.c
1216
1217 27th November 1998, James
1218 -------------------------
1219 1. Fixed the trace file searching mechanism in io_lib. When loading an
1220 experiment file with LN/LT lines, we now first search for the trace file
1221 relative to the location of the experiment file.
1222 io_lib/read/Read.c
1223 io_lib/read/translate.[ch]
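Searching relative to the experiment file amounts to joining the experiment
file's directory with the trace name from the LN line. A minimal sketch,
assuming '/' as the only directory separator. The function name
relative_trace_path is hypothetical and the real logic in Read.c may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: write "<directory of exp_path>/<trace_name>" into
 * 'out'. If exp_path has no directory part, trace_name is copied unchanged.
 * Returns 0 on success, -1 if the result does not fit in 'outlen' bytes.
 */
static int relative_trace_path(const char *exp_path, const char *trace_name,
                               char *out, size_t outlen)
{
    const char *slash;
    size_t dirlen, namelen;

    assert(exp_path != NULL && trace_name != NULL && out != NULL);

    slash   = strrchr(exp_path, '/');
    dirlen  = slash ? (size_t)(slash - exp_path) + 1 : 0; /* keep the '/' */
    namelen = strlen(trace_name);

    if (dirlen + namelen + 1 > outlen)
        return -1;

    memcpy(out, exp_path, dirlen);
    memcpy(out + dirlen, trace_name, namelen + 1);  /* copies the NUL too */
    return 0;
}
```

A caller would try to open the resulting path first and fall back to the
usual search path only if that fails.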
1224
1225 16th November 1998, James
1226 -------------------------
1227 4. Added NT (NoTe) and GD (Gap4 Database) line types to the experiment file.
1228 io_lib/exp_file/expFile.[ch]
1229
1230 24th September 1998, James
1231 --------------------------
1232 1. The scf reading and writing code now handles traces with zero bases.
1233 Previously this failed after a malloc(0).
1234 io_lib/scf/read_scf.c
1235 io_lib/scf/write_scf.c
1236
1237 2. The ABI file reading code has been tidied up. It now also supports
1238 conversion of more ABI fields, including RUND, RUNT, SPAC(2), CMNT, LANE and
1239 MTXF.
1240 io_lib/abi/seqIOABI.c
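
Item 1 above concerns malloc(0), which the C standard allows to return either
NULL or a unique pointer, so a NULL result cannot be told apart from an
out-of-memory error. One common guard is sketched below. It is not
necessarily the fix used in read_scf.c, and alloc_items is a hypothetical
helper.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper (an assumption, not io_lib code): allocate room for
 * 'nitems' zero-filled items of 'size' bytes each. At least one item is
 * always requested, so a NULL return can only mean "out of memory".
 */
static void *alloc_items(size_t nitems, size_t size)
{
    assert(size > 0);
    return calloc(nitems ? nitems : 1, size);
}
```

The caller still records the true count (zero) and frees the pointer as
usual.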
1241
1242 17th July 1998, James
1243 ---------------------
1244 1. Extract_seq now copes with sequences containing no SQ line (instead of just
1245 SEGV).
1246 io_lib/progs/extract_seq
1247
1248 9th July 1998, James
1249 --------------------
1250 1. Enforce IUBC code set in io_lib when converting from trace (any format) to
1251 experiment file. We leave the IUBC 'N' intact.
1252 io_lib/read/translate.c
1253
1254 28th May 1998, James
1255 --------------------
1256 1. Added a read_sections() function to io_lib so that programs can state
1257 which bits of a trace file they are interested in. The loading code only
1258 then parses those bits. This can give big speed increases to things like init_exp
1259 which only wants bases and does not care about the delta-delta format of SCF
1260 trace data.
1261 io_lib/read/Read.h
1262 io_lib/read/translate.c
1263 io_lib/scf/scf.h
1264 io_lib/scf/read_scf.c
1265 io_lib/abi/seqIOABI.c
1266 io_lib/alf/seqIOALF.c
1267 init_exp/init_exp.c
1268
1269 3. Extract GELN (gel name) from ABI file when converting to SCF.
1270 io_lib/abi/seqIOABI.[ch]
1271
1272 2. Improved the makeSCF -normalise option. Background subtraction is now
1273 cleaner (and simpler) and it also now scales the heights. Moved it to io_lib
1274 as it's now freely available.
1275 io_lib/progs/makeSCF.c
1276
1277 23rd March 1998, James
1278 ----------------------
1279 1. Removed the change made on 7th May 1997 to seqIOPlain.c. This code is used
1280 by extract_seq, and so clipping in seqIOPlain causes double clipping (and
1281 hence wrong sections).
1282 io_lib/plain/seqIOPlain.c
1283
1284 11th March 1998, James
1285 ----------------------
1286 2. Removed the requirement of EXP_FILE_LINE_LENGTH in exp_fread_info().
1287 This allows for (eg) tags with very long comments to be read in without
1288 being truncated.
1289 io_lib/exp_file/expFileIO.c
1290
1291 4th March 1998, James
1292 ---------------------
1293 1. Following advice from Leif Hansson <leif.hansson@mbox4.swipnet.se>, the ALF
1294 reading code now reads the "Raw data" subfile when the "Processed data"
1295 subfile is not present, as "Processed data" is apparently an optional output
1296 of the pharmacia software. Raw data is in the same format, although I do not
1297 know what processing takes place to convert it to Processed data. (Looking at
1298 some real traces, apparently none!)
1299 io_lib/alf/seqIOALF.c
1300
1301 24th February 1998, James
1302 -------------------------
1303 1. Added an ABI in MacBinary format file type detector so that these are
1304 now autodetected.
1305 io_lib/utils/traceType.c
1306
1307 15th January 1998, James
1308 ------------------------
1309 1. Rewrote the delta_samples1/2 functions to be faster. They now take between
1310 0.55 and 0.7 of the original time.
1311 io_lib/scf/misc_scf.c
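For context, the delta-delta (second-order difference) transform that SCF
trace data uses can be done in a single pass per direction. The sketch below
shows the technique for 16-bit samples. It is not the misc_scf.c
implementation, and the names delta2_encode and delta2_decode are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Replace each sample with x[i] - 2*x[i-1] + x[i-2], in place.
 * Samples before the start of the array are taken to be zero. */
static void delta2_encode(uint16_t *s, size_t n)
{
    uint16_t p1 = 0, p2 = 0;    /* previous two original samples */
    size_t i;

    assert(s != NULL || n == 0);
    for (i = 0; i < n; i++) {
        uint16_t cur = s[i];
        s[i] = (uint16_t)(cur - 2 * p1 + p2);   /* wraps modulo 65536 */
        p2 = p1;
        p1 = cur;
    }
}

/* Inverse of delta2_encode: rebuild x[i] = d[i] + 2*x[i-1] - x[i-2]. */
static void delta2_decode(uint16_t *s, size_t n)
{
    uint16_t p1 = 0, p2 = 0;    /* previous two decoded samples */
    size_t i;

    assert(s != NULL || n == 0);
    for (i = 0; i < n; i++) {
        uint16_t cur = (uint16_t)(s[i] + 2 * p1 - p2);
        s[i] = cur;
        p2 = p1;
        p1 = cur;
    }
}
```

Because the arithmetic wraps modulo 65536 in both directions, decoding
restores the original samples exactly even when a difference is negative.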
1312
1313 4th December 1997, James
1314 ------------------------
1315 1. First post-release bug fix.
1316 Io_lib incorrectly sets read->trace_name when reading anything except SCF files.
1317 This means that when outputting to an experiment file no LN line is present.
1318 io_lib/read/Read.c
1319
1320 1st October 1997, James
1321 -----------------------
1322 1. Allow for SCF files to contain 0 bases. This mainly affects memory
1323 allocation, but also the display widget.
1324 io_lib/scf/read_scf.c
1325 io_lib/utils/read_alloc.c
1326
1327 28/29th August 1997, James
1328 --------------------------
1329 2. Added a few changes to make the code more portable for the Mac. Not really
1330 used at present.
1331 Misc/os.h
1332 Misc/files.c
1333 io_lib/utils/traceType.c
1334 io_lib/read/translate.c
1335 io_lib/utils/compress.c
1336
1337 30th June 1997, James
1338 ---------------------
1339 1. The exp2read function produced invalid rightCutoff values (INT_MAX) when no
1340 QR line is present. It now correctly sets it to 0.
1341 io_lib/read/translate.c
1342