BamUtil diff (version 1.0.15+galaxy1)

bamUtil diff

The diff option of the bamUtil executable prints the differences between two coordinate-sorted SAM/BAM files. It can be used to compare the outputs of running a SAM/BAM file through different tools or different versions of a tool.

diff compares records that have the same read name and fragment (taken from the flag). If no record with a matching read name and fragment is found, the record is considered different. diff assumes that both files are coordinate sorted and relies on this to decide how long to store a record before concluding that the other file contains no matching read name/fragment. If the files are not coordinate sorted, this logic does not work.

By default, only the chromosome/position and cigar are compared for each record. Note: the headers are not compared.
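
The matching rule described above can be illustrated with a minimal Python sketch. It is not bamUtil's implementation: the helper names are hypothetical, the fragment is derived from the standard SAM flag bits 0x40 (first segment) and 0x80 (last segment), and the position window that diff uses on sorted input is ignored entirely.

```python
# Illustrative only: hypothetical helpers, not part of bamUtil.
FIRST_SEGMENT = 0x40  # SAM flag bit: first segment in the template
LAST_SEGMENT = 0x80   # SAM flag bit: last segment in the template


def fragment_from_flag(flag):
    """Return 1 for the first fragment, 2 for the last, 0 if neither bit is set."""
    if flag & FIRST_SEGMENT:
        return 1
    if flag & LAST_SEGMENT:
        return 2
    return 0


def match_key(read_name, flag):
    """Records are paired across files by (read name, fragment)."""
    return (read_name, fragment_from_flag(flag))


def only_in_first(records1, records2):
    """Return the (name, flag) records of records1 with no matching key in records2.

    Unlike diff, this loads every key of records2 at once instead of
    using a bounded window over coordinate-sorted input.
    """
    keys2 = {match_key(name, flag) for name, flag in records2}
    return [(name, flag) for name, flag in records1
            if match_key(name, flag) not in keys2]
```

For example, a record named ``r1`` with flag 99 and another named ``r1`` with flag 147 get different keys, so they are treated as different fragments of the same read.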

Options are available to compare:

- all fields
- flags
- mapping quality
- mate chromosome/position
- insert size
- sequence
- base quality
- specified tags
- all tags
- position (compared by default; the comparison can be turned off)
- cigar (compared by default; the comparison can be turned off)
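
As a rough illustration of how these options combine, the following Python sketch assembles an argument list. The flag spellings are copied from this help text and have not been checked against the tool's own usage output; the executable name ``bam``, the ``diff`` subcommand, and the helper ``build_diff_args`` are assumptions made for the example.

```python
# Flag spellings are taken from this help page; verify them against the
# usage message of your bamUtil installation before relying on them.
FIELD_FLAGS = {
    "all": "--all",
    "mapq": "--mapq",
    "mate": "--mate",
    "isize": "--isize",
    "seq": "--seq",
    "baseQual": "--baseQual",
    "everyTag": "--everyTag",
}


def build_diff_args(in1, in2, fields=(), tags=None,
                    no_pos=False, no_cigar=False, only_diffs=False):
    """Build an argument list for a diff run (hypothetical helper).

    The executable name "bam" and subcommand "diff" are assumptions.
    """
    args = ["bam", "diff", "--in1", in1, "--in2", in2]
    for name in fields:
        args.append(FIELD_FLAGS[name])  # raises KeyError for unknown names
    if tags:
        args += ["--tags", tags]  # passed through verbatim
    if no_pos:
        args.append("--noPos")
    if no_cigar:
        args.append("--noCigar")
    if only_diffs:
        args.append("--onlyDiffs")
    return args
```

The resulting list could be passed to ``subprocess.run``; building it as a list rather than a single string avoids shell quoting problems with file names.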

**Inputs**

Two BAM or SAM alignment files

**Outputs**

Choice of 2 Output Formats:

**Diff Format**

There are 2 types of differences:

- The ReadName/Fragment combo is in one file, but not in the other file within the window set by recPoolSize & posDiff.
- The ReadName/Fragment combo is in both files, but at least one of the specified fields to diff is different.

Each difference output consists of 2 or 3 lines: 2 lines if the record appears in only one of the files, 3 lines if it appears in both.

- The first line is just the read name.
- The 2nd line is the flag followed by the diff'd fields from one of the records.
- The 3rd line (if a matching record was found) is the flag followed by the diff'd fields from the matching record.

The 2nd and 3rd lines begin with either a '<' or a '>': '<' if the record is from the first file (--in1), '>' if it is from the 2nd file (--in2).

The diff'd record lines are tab separated, and are in the following order if --onlyDiffs is not specified::

  - '<' or '>'
  - flag
  - chrom:pos (chromosome name ':' 1 based position) - if --noPos is not specified
  - cigar - if --noCigar is not specified
  - mapping quality - if --mapq or --all is specified
  - mate chrom:pos (chromosome name ':' 1 based position) - if --mate or --all is specified
  - insert size - if --isize or --all is specified
  - sequence - if --seq or --all is specified
  - base quality - if --baseQual or --all is specified
  - tag:type:value - for each tag:type specified in --tags or for every tag if --all or --everyTag specified
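
The layout above lends itself to a small parser. The following Python sketch groups the output into one entry per read name; it assumes that the '<' or '>' marker is the first tab-separated field, as listed above, and that read names never begin with '<' or '>'. The function name ``parse_diff`` and the sample lines are made up for illustration.

```python
def parse_diff(lines):
    """Group diff-format lines into one entry per read name.

    Each entry is {"read_name": str, "records": [...]}, where every
    record holds its source file, its flag (kept as text), and the
    remaining diff'd fields in the order they were printed.
    """
    entries = []
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line[0] in "<>":
            if current is None:
                raise ValueError("record line before any read name")
            marker, flag, *fields = line.split("\t")
            current["records"].append({
                "source": "in1" if marker == "<" else "in2",
                "flag": flag,
                "fields": fields,
            })
        else:
            current = {"read_name": line, "records": []}
            entries.append(current)
    return entries


# Made-up sample using the default fields (position and cigar).
sample = [
    "read1\n",
    "<\t99\tchr1:100\t50M\n",
    ">\t99\tchr1:101\t50M\n",
    "read2\n",
    "<\t147\tchr2:500\t50M\n",
]
entries = parse_diff(sample)
```

An entry with two records is a read found in both files with differing fields; an entry with one record is a read found in only one file.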


**BAM Format**

In SAM/BAM format there will be 3 output files::

  1. the specified name with record diffs
  2. specified name with _only_<in1>.sam/bam with records only in the in1 file
  3. specified name with _only_<in2>.sam/bam with records only in the in2 file

Records that are identical in the two files are not written in any of these output files.
When a record is found in both input files, but a difference is found, the record from the first file is written with additional tags to indicate the values from the second file, using the following tags::

  - ZF - Flag
  - ZP - Chromosome:1-based Position
  - ZC - Cigar
  - ZM - Mapping Quality
  - ZN - Chromosome:1-based Mate Position
  - ZI - Insert Size
  - ZS - Sequence
  - ZQ - Base Quality
  - ZT - Tags

If --onlyDiffs is not specified, all fields that were compared will be printed in the tags. If --onlyDiffs is specified, then only the differing compared fields will be printed in the tags.
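
To read the second file's values back out of a diff record, the extra tags can be pulled from the optional fields of a SAM line. The following Python sketch assumes the standard SAM layout of 11 mandatory columns followed by ``TAG:TYPE:VALUE`` fields; the function name, the tag types, and the sample record are assumptions for illustration, not taken from real output.

```python
# Tag meanings as listed above.
DIFF_TAGS = {
    "ZF": "flag",
    "ZP": "chromosome:position",
    "ZC": "cigar",
    "ZM": "mapping quality",
    "ZN": "mate chromosome:position",
    "ZI": "insert size",
    "ZS": "sequence",
    "ZQ": "base quality",
    "ZT": "tags",
}


def second_file_values(sam_line):
    """Return {field description: value} for the diff tags on one SAM line."""
    optional = sam_line.rstrip("\n").split("\t")[11:]  # skip mandatory columns
    values = {}
    for field in optional:
        # Split at most twice so values containing ':' stay intact.
        tag, _type, value = field.split(":", 2)
        if tag in DIFF_TAGS:
            values[DIFF_TAGS[tag]] = value
    return values


# Made-up record: first-file values in the columns, second-file values in tags.
line = "read1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*\tNM:i:0\tZP:Z:chr1:101\tZC:Z:50M"
values = second_file_values(line)
```

Splitting each optional field at most twice matters for ZP and ZN, whose values contain a ':' between chromosome and position.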

https://genome.sph.umich.edu/wiki/BamUtil:_diff