Documentation: https://gitlab.com/mcfrith/last
LAST finds similar regions between sequences.
The main technical innovation is that LAST finds initial matches based on their multiplicity, instead of using a fixed length (e.g. BLAST uses 11-mers): each initial match is extended until it occurs only a limited number of times in the reference. To find these variable-length matches, it uses a suffix array (inspired by Vmatch). To achieve high sensitivity, it uses a spaced suffix array (or subset suffix array), analogous to spaced seeds (or subset seeds).
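The idea of multiplicity-based matches can be illustrated with a small Python sketch. This is not LAST's actual implementation: it builds a naive, unspaced suffix array by sorting, and the names `build_suffix_array`, `narrow`, `adaptive_seed` and `max_hits` are hypothetical, chosen only for this example. Starting at one query position, the match is lengthened one letter at a time until it occurs at most `max_hits` times in the reference.

```python
from typing import List, Tuple


def build_suffix_array(text: str) -> List[int]:
    """Naive suffix array: suffix start positions in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def narrow(text: str, sa: List[int], lo: int, hi: int,
           depth: int, c: str) -> Tuple[int, int]:
    """Shrink the half-open range [lo, hi) of suffixes, which already share
    a prefix of length `depth`, to those whose next letter equals `c`."""
    def key(k: int) -> str:
        p = sa[k] + depth
        # A suffix that ends here sorts before any real letter.
        return text[p] if p < len(text) else ""

    a, b = lo, hi
    while a < b:                      # first index with key >= c
        mid = (a + b) // 2
        if key(mid) < c:
            a = mid + 1
        else:
            b = mid
    new_lo, b = a, hi
    while a < b:                      # first index with key > c
        mid = (a + b) // 2
        if key(mid) <= c:
            a = mid + 1
        else:
            b = mid
    return new_lo, a


def adaptive_seed(text: str, sa: List[int], query: str,
                  start: int, max_hits: int) -> Tuple[int, List[int]]:
    """Return (length, reference positions) of the shortest match starting
    at query[start] that occurs at most `max_hits` times in `text`.
    Returns (0, []) if no such match exists."""
    lo, hi = 0, len(sa)
    depth = 0
    while start + depth < len(query):
        lo, hi = narrow(text, sa, lo, hi, depth, query[start + depth])
        depth += 1
        if lo == hi:                  # the match no longer occurs at all
            return 0, []
        if hi - lo <= max_hits:       # rare enough: stop extending
            return depth, sorted(sa[lo:hi])
    return 0, []                      # query ended while still too frequent


reference = "ACGTACGTTACG"
sa = build_suffix_array(reference)
print(adaptive_seed(reference, sa, "ACGTT", 0, max_hits=2))
```

With a larger `max_hits` the seeds are shorter and more numerous; with a smaller one they are longer and rarer, which is the trade-off a fixed seed length cannot adapt to repetitive versus unique regions.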
LAST can:
- Handle big sequence data, e.g.:
  - Compare two vertebrate genomes.
  - Align billions of DNA reads to a genome.
- Indicate the reliability of each aligned column.
- Use sequence quality data properly.
- Compare DNA to proteins, with frameshifts.
- Compare PSSMs to sequences.
- Calculate the likelihood of chance similarities between random sequences.
- Do split and spliced alignment.
- Train alignment parameters for unusual kinds of sequence (e.g. nanopore).