bamUtil diff
The diff option of the bamUtil executable prints the differences between two coordinate-sorted SAM/BAM files. It can be used to compare the outputs of running a SAM/BAM file through different tools or different versions of a tool. diff compares records that have the same read name and fragment (taken from the flag). If no record with a matching read name and fragment is found, the record is considered to be different. diff assumes the files are coordinate sorted and relies on this to decide how long to store a record before concluding that the other file does not contain a matching read name/fragment. If the files are not coordinate sorted, this logic does not work. By default, only the chromosome/position and CIGAR are compared for each record. Note: The headers are not compared.
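
The matching rule above can be illustrated with a minimal Python sketch. It assumes the fragment is derived from the SAM flag bits 0x40 (first segment) and 0x80 (last segment) as defined in the SAM specification; how bamUtil encodes the fragment internally is not stated here, and `match_key` is a hypothetical helper name, not part of bamUtil.

```python
# SAM flag bits, as defined in the SAM specification.
FIRST_SEGMENT = 0x40  # read is the first segment of the template
LAST_SEGMENT = 0x80   # read is the last segment of the template


def match_key(read_name: str, flag: int) -> tuple:
    """Build a hypothetical (read name, fragment) key for pairing records.

    Assumption: only the first/last segment bits of the flag identify the
    fragment; all other flag bits (strand, duplicate, ...) are ignored.
    """
    fragment = flag & (FIRST_SEGMENT | LAST_SEGMENT)
    return (read_name, fragment)


# Two records for the same first-in-pair read, aligned to different strands,
# share a key; the mate of that read gets a different key.
same = match_key("read1", 99) == match_key("read1", 83)
different = match_key("read1", 99) != match_key("read1", 147)
```

Under this assumption, two records pair up whenever their keys are equal, regardless of where each one aligned.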
Options are available to compare:
- all fields
- flags
- mapping quality
- mate chromosome/position
- insert size
- sequence
- base quality
- specified tags
- all tags
- turn off position comparison
- turn off cigar comparison
Choice of 2 Output Formats:
**Diff Format**

There are 2 types of differences:

- ReadName/Fragment combo is in one file, but not in the other file within the window set by recPoolSize & posDiff
- ReadName/Fragment combo is in both files, but at least one of the specified fields to diff is different

Each difference output consists of 2 or 3 lines. If the record only appears in one of the files, the diff is 2 lines; if it appears in both files, the diff is 3 lines.

The first line of the difference output is just the read name. The 2nd and 3rd line (if present) begin with either a '<' or a '>'. If the record is from the first file (--in1), it begins with a '<'. If the record is from the 2nd file (--in2), it begins with a '>'.

The 2nd line is the flag followed by the diff'd fields from one of the records. The 3rd line (if a matching record was found) is the flag followed by the diff'd fields from the matching record.

The diff'd record lines are tab separated, and are in the following order if --onlyDiffs is not specified:

- '<' or '>'
- flag
- chrom:pos (chromosome name ':' 1-based position) - if --noPos is not specified
- cigar - if --noCigar is not specified
- mapping quality - if --mapq or --all is specified
- mate chrom:pos (chromosome name ':' 1-based position) - if --mate or --all is specified
- insert size - if --isize or --all is specified
- sequence - if --seq or --all is specified
- base quality - if --baseQual or --all is specified
- tag:type:value - for each tag:type specified in --tags or for every tag if --all or --everyTag specified

**BAM Format**

In SAM/BAM format there will be 3 output files:

1. the specified name with record diffs
2. specified name with _only_<in1>.sam/bam with records only in the in1 file
3. specified name with _only_<in2>.sam/bam with records only in the in2 file

Records that are identical in the two files are not written in any of these output files.
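
Returning to the text Diff Format described above, the following Python sketch shows one way its 2- and 3-line groups could be read back into structured entries. It assumes, per the description, that a read name line never starts with a '<' or '>' column and that the marker is its own tab-separated column. `DiffEntry` and `parse_diff` are hypothetical names, and the sample lines are made up for illustration rather than real bamUtil output.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class DiffEntry:
    read_name: str
    in1: Optional[List[str]] = None  # columns after '<' (flag first)
    in2: Optional[List[str]] = None  # columns after '>' (flag first)

    @property
    def in_both(self) -> bool:
        """True for a 3-line diff, i.e. the record was found in both files."""
        return self.in1 is not None and self.in2 is not None


def parse_diff(lines: Iterable[str]) -> List[DiffEntry]:
    """Group Diff Format lines into one DiffEntry per read name line."""
    entries: List[DiffEntry] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        cols = line.split("\t")
        if cols[0] in ("<", ">"):
            if not entries:
                raise ValueError("record line found before any read name")
            if cols[0] == "<":
                entries[-1].in1 = cols[1:]
            else:
                entries[-1].in2 = cols[1:]
        else:
            # Any other line is taken to be a read name starting a new entry.
            entries.append(DiffEntry(read_name=line))
    return entries


# Made-up sample: read1 differs in position, read2 is only in the first file.
sample = [
    "read1\n",
    "<\t99\t1:1000\t76M\n",
    ">\t99\t1:1002\t76M\n",
    "read2\n",
    "<\t0\t2:500\t50M\n",
]
entries = parse_diff(sample)
```

The parser keeps the columns as strings because which fields are present depends on the comparison options that were used for the run.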
When a record is found in both input files, but a difference is found, the record from the first file is written with additional tags to indicate the values from the second file, using the following tags:

- ZF - Flag
- ZP - Chromosome:1-based Position
- ZC - Cigar
- ZM - Mapping Quality
- ZN - Chromosome:1-based Mate Position
- ZI - Insert Size
- ZS - Sequence
- ZQ - Base Quality
- ZT - Tags

If --onlyDiffs is not specified, all fields that were compared will be printed in the tags. If --onlyDiffs is specified, then only the differing compared fields will be printed in the tags.
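
As a rough sketch of how the tags listed above might be read back from a SAM-format diff record, the Python below pulls the second-file values out of the optional fields of one SAM line. It relies only on the standard SAM layout (11 mandatory columns followed by TAG:TYPE:VALUE fields). The helper name `second_file_values` is hypothetical, and the sample record, including the tag types shown in it, is invented for illustration.

```python
# Tag names taken from the list above, mapped to the field they describe.
DIFF_TAGS = {
    "ZF": "flag",
    "ZP": "chrom:pos",
    "ZC": "cigar",
    "ZM": "mapping quality",
    "ZN": "mate chrom:pos",
    "ZI": "insert size",
    "ZS": "sequence",
    "ZQ": "base quality",
    "ZT": "tags",
}


def second_file_values(sam_line: str) -> dict:
    """Return {field description: value from the second file} for one record."""
    cols = sam_line.rstrip("\n").split("\t")
    values = {}
    # Optional fields follow the 11 mandatory SAM columns.
    for optional in cols[11:]:
        # Split into TAG, TYPE, VALUE; the value itself may contain ':'.
        parts = optional.split(":", 2)
        if len(parts) == 3 and parts[0] in DIFF_TAGS:
            values[DIFF_TAGS[parts[0]]] = parts[2]
    return values


# Invented record: the first file's alignment plus ZP/ZC from the second file.
record = (
    "read1\t99\t1\t1000\t60\t76M\t=\t1200\t276\t*\t*"
    "\tNM:i:0\tZP:Z:1:1002\tZC:Z:76M"
)
values = second_file_values(record)
```

Tags that are not in the list above, such as NM in the sample, are ignored, so the result contains only the values reported for the second file.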