HISAT is a fast and sensitive spliced alignment program. As part of HISAT, we have developed a new indexing scheme based on the Burrows-Wheeler transform (BWT) and the FM index, called hierarchical indexing, that employs two types of indexes: (1) one global FM index representing the whole genome, and (2) many separate local FM indexes for small regions collectively covering the genome. Our hierarchical index for the human genome (about 3 billion bp) includes ~48,000 local FM indexes, each representing a genomic region of ~64,000bp. As the basis for non-gapped alignment, the FM index is extremely fast with a low memory footprint, as demonstrated by Bowtie. In addition, HISAT provides several alignment strategies specifically designed for mapping different types of RNA-seq reads. All these together, HISAT enables extremely fast and sensitive alignment of reads, in particular those spanning two exons or more. As a result, HISAT is much faster >50 times than TopHat2 with better alignment quality. Although it uses a large number of indexes, the memory requirement of HISAT is still modest, approximately 4.3 GB for human. HISAT uses the Bowtie2 implementation to handle most of the operations on the FM index. In addition to spliced alignment, HISAT handles reads involving indels and supports a paired-end alignment mode. Multiple processors can be used simultaneously to achieve greater alignment speed. HISAT outputs alignments in SAM format, enabling interoperation with a large number of other tools (e.g. SAMtools, GATK) that use SAM. HISAT is distributed under the GPLv3 license, and it runs on the command line under Linux, Mac OS X and Windows.
The reporting mode governs how many alignments HISAT looks for, and how to report them.
In general, when we say that a read has an alignment, we mean that it has a valid alignment. When we say that a read has multiple alignments, we mean that it has multiple alignments that are valid and distinct from one another.
Two alignments for the same individual read are "distinct" if they map the same read to different places. Specifically, we say that two alignments are distinct if there are no alignment positions where a particular read offset is aligned opposite a particular reference offset in both alignments with the same orientation. E.g. if the first alignment is in the forward orientation and aligns the read character at read offset 10 to the reference character at chromosome 3, offset 3,445,245, and the second alignment is also in the forward orientation and also aligns the read character at read offset 10 to the reference character at chromosome 3, offset 3,445,245, they are not distinct alignments.
Two alignments for the same pair are distinct if either the mate 1s in the two paired-end alignments are distinct or the mate 2s in the two alignments are distinct or both.
HISAT searches for up to N distinct, primary alignments for each read, where N equals the integer specified with the -k parameter. Primary alignments mean alignments whose alignment score is equal or higher than any other alignments. It is possible that multiple distinct alignments whave the same score. That is, if -k 2 is specified, HISAT will search for at most 2 distinct alignments. The alignment score for a paired-end alignment equals the sum of the alignment scores of the individual mates. Each reported read or pair alignment beyond the first has the SAM 'secondary' bit (which equals 256) set in its FLAGS field. See the SAM specification for details.
HISAT does not "find" alignments in any specific order, so for reads that have more than N distinct, valid alignments, HISAT does not gaurantee that the N alignments reported are the best possible in terms of alignment score. Still, this mode can be effective and fast in situations where the user cares more about whether a read aligns (or aligns a certain number of times) than where exactly it originated.
When HISAT finishes running, it prints messages summarizing what happened. These messages are printed to the "standard error" ("stderr") filehandle. For datasets consisting of unpaired reads, the summary might look like this:
20000 reads; of these: 20000 (100.00%) were unpaired; of these: 1247 (6.24%) aligned 0 times 18739 (93.69%) aligned exactly 1 time 14 (0.07%) aligned >1 times 93.77% overall alignment rate
For datasets consisting of pairs, the summary might look like this:
10000 reads; of these: 10000 (100.00%) were paired; of these: 650 (6.50%) aligned concordantly 0 times 8823 (88.23%) aligned concordantly exactly 1 time 527 (5.27%) aligned concordantly >1 times ---- 650 pairs aligned concordantly 0 times; of these: 34 (5.23%) aligned discordantly 1 time ---- 616 pairs aligned 0 times concordantly or discordantly; of these: 1232 mates make up the pairs; of these: 660 (53.57%) aligned 0 times 571 (46.35%) aligned exactly 1 time 1 (0.08%) aligned >1 times 96.70% overall alignment rate
The indentation indicates how subtotals relate to totals.