What it does
LASTZ is designed to preprocess one sequence or set of sequences (which we collectively call the TARGET) and then align several QUERY sequences to it. It was developed by Bob Harris in the lab of Webb Miller at Penn State.
Read the documentation before proceeding. LASTZ is a complex tool with many parameters. Fortunately, its author maintains a detailed manual. The default parameters may be sufficient for a first impression of how similar your sequences are, but producing reliable alignments may require tuning them. Read the manual.
The Galaxy version of LASTZ sets --ambiguous=iupac by default (see the Scoring section). This prevents LASTZ from exiting with an error if one of the DNA inputs contains "non-standard" nucleotides.
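To check whether an input contains such characters at all, one can scan it for anything other than A, C, G and T. The following is a minimal Python sketch and is not part of LASTZ or Galaxy; the function name and the sample FASTA text are made up for illustration.

```python
from collections import Counter

STANDARD = set("ACGT")


def non_standard_bases(fasta_text):
    """Count characters other than A, C, G, T in the sequence lines of FASTA text."""
    counts = Counter()
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA header lines
        for base in line.upper():
            if base not in STANDARD:
                counts[base] += 1
    return dict(counts)


# Illustrative input only (not real data).
example = ">seq1 illustrative\nACGTNNRYacgt\nACGT\n"
print(non_standard_bases(example))
```

An empty result means the sequence uses only the four standard nucleotides.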
About LASTZ parameters
Galaxy's version of LASTZ has nine parameter sections (Where to look, Scoring, Seeding, HSPs, Chaining, Gapped extension, Filtering, Interpolation, and Output). These sections closely follow the parameter descriptions in the manual.
Defaults
Here are the defaults for some of the most important parameters:
  --seed=<pattern>        set seed pattern (12of19, 14of22, or general pattern)
                          (default is 1110100110010101111)
                          SEE "Seeding" SECTION -> "Select seed type"
  --[no]transition        allow (or don't) one transition in a seed hit
                          (by default a transition is allowed)
                          SEE "Seeding" SECTION -> "Allow transitions"
  --[no]chain             perform chaining
                          (by default no chaining is performed)
                          SEE "Chaining" SECTION
  --[no]gapped            perform gapped alignment (instead of gap-free)
                          (by default gapped alignment is performed)
                          SEE "Gapped extension" SECTION
  --strand=both           search both strands
  --strand=plus           search + strand only (matching strand of query spec)
                          (by default both strands are searched)
                          SEE "Where to look" SECTION
  --scores=<file>         read substitution and gap scores from a file
                          SEE "Scoring" SECTION
  --xdrop=<score>         set x-drop threshold
                          (default is 10*sub[A][A])
                          SEE "HSPs" SECTION
  --ydrop=<score>         set y-drop threshold
                          (default is open+300*extend)
                          SEE "Gapped extension" SECTION
  --hspthresh=<score>     set threshold for high scoring pairs
                          (default is 3000)
                          ungapped extensions scoring lower are discarded
                          <score> can also be a percentage or base count
                          SEE "HSPs" SECTION
  --gappedthresh=<score>  set threshold for gapped alignments
                          gapped extensions scoring lower are discarded
                          <score> can also be a percentage or base count
                          (default is to use same value as --hspthresh)
                          SEE "Gapped extension" SECTION
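For readers who also run LASTZ outside Galaxy, the sketch below shows one way to assemble a command line that spells out several of these defaults explicitly. It is a hedged illustration: the function name, its keyword arguments, and the file names target.fa and query.fa are hypothetical, and it assumes the usual invocation order of target first, then query, then options. It only builds the argument list and does not run anything.

```python
def lastz_args(target, query, chain=False, gapped=True, transition=True,
               strand="both", hspthresh=3000, scores=None):
    """Build a LASTZ argument list; keyword defaults mirror the defaults listed above."""
    if strand not in ("both", "plus"):
        # Only the two values described in this help text are accepted here.
        raise ValueError("strand must be 'both' or 'plus'")
    args = ["lastz", target, query]
    args.append("--transition" if transition else "--notransition")
    args.append("--chain" if chain else "--nochain")
    args.append("--gapped" if gapped else "--nogapped")
    args.append(f"--strand={strand}")
    args.append(f"--hspthresh={hspthresh}")
    if scores is not None:
        args.append(f"--scores={scores}")  # path to a substitution score file
    return args


# Hypothetical file names; chaining switched on, everything else left at its default.
print(" ".join(lastz_args("target.fa", "query.fa", chain=True)))
```

Writing the defaults out explicitly makes a run easier to reproduce, since the command no longer depends on what a particular LASTZ build assumes.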
Substitution matrix
By default the HOXD70 substitution scores are used (from Chiaromonte et al. 2002):
  bad_score          = X:-1000  # used for sub['X'][*] and sub[*]['X']
  fill_score         = -100     # used when sub[*][*] is not defined
  gap_open_penalty   =  400
  gap_extend_penalty =   30

         A     C     G     T
    A   91  -114   -31  -123
    C -114   100  -125   -31
    G  -31  -125   100  -114
    T -123   -31  -114    91

A matrix can be supplied as input to the "Read the substitution scores" parameter in the Scoring section. A substitution matrix can also be inferred from your data using another LASTZ-based tool (LASTZ_D: Infer substitution scores).
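To make the numbers concrete, the sketch below scores a gap-free pairing of two equal-length sequences with the HOXD70 values and derives the default x-drop and y-drop thresholds, reading the defaults listed earlier as 10 times sub[A][A] and gap open plus 300 times gap extend. The function and variable names are illustrative assumptions, and the code is not how LASTZ itself is implemented.

```python
# HOXD70 substitution scores and gap penalties, copied from the listing above.
HOXD70 = {
    "A": {"A": 91, "C": -114, "G": -31, "T": -123},
    "C": {"A": -114, "C": 100, "G": -125, "T": -31},
    "G": {"A": -31, "C": -125, "G": 100, "T": -114},
    "T": {"A": -123, "C": -31, "G": -114, "T": 91},
}
GAP_OPEN = 400
GAP_EXTEND = 30


def ungapped_score(a, b, sub=HOXD70):
    """Sum substitution scores over two equal-length sequences (no gaps)."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(sub[x][y] for x, y in zip(a.upper(), b.upper()))


xdrop_default = 10 * HOXD70["A"]["A"]          # 10 * 91 = 910
ydrop_default = GAP_OPEN + 300 * GAP_EXTEND    # 400 + 300 * 30 = 9400

print(ungapped_score("ACGT", "ACGT"))  # 91 + 100 + 100 + 91 = 382
print(ungapped_score("ACGT", "ACGA"))  # 91 + 100 + 100 - 123 = 168
print(xdrop_default, ydrop_default)
```

With these scores, a perfect match of 33 A's totals 33 * 91 = 3003, just over the default --hspthresh of 3000, which gives a rough feel for how long a clean ungapped match must be to survive that filter.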
Output
This version of LASTZ produces one output by default: a BAM alignment file. Other formats, as well as a dot plot, can be configured in the Output section. This version of LASTZ writes its outputs without the comment lines that start with '#'. To learn the identity of each column, consult the formats section of the LASTZ manual.
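Because the tabular outputs carry no header line, column names have to be supplied when the file is read. The sketch below shows one way to do that in Python, assuming a tab-separated output. The column names and the two sample rows are placeholders invented for illustration; the real names and their order depend on the chosen format and should be taken from the manual.

```python
import csv
import io


def read_headerless_table(text, columns):
    """Parse tab-separated text that has no header line into a list of dicts."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields:
            continue  # skip empty lines
        if len(fields) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(fields)}")
        rows.append(dict(zip(columns, fields)))
    return rows


# Placeholder column names and made-up rows, for illustration only.
columns = ["score", "name1", "start1", "end1"]
sample = "3500\tchr1\t100\t250\n4200\tchr1\t900\t1100\n"

for row in read_headerless_table(sample, columns):
    print(row["name1"], row["start1"], row["end1"], row["score"])
```

All values are returned as strings; convert numeric columns such as the score with int() as needed.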