
NextAlign (version 2.7.0+galaxy0)
Select a genome from the list.
Outputs the stripped insertions relative to the reference as a CSV file.
If a sequence is shorter than this value, alignment will not be attempted and a warning will be emitted. When adjusting this parameter, note that alignment of short sequences can be unreliable.
If zero, all gaps regardless of length incur the same penalty.
A higher penalty results in fewer gaps and more mismatches. Should be less than the penalty for opening a gap in frame, to avoid gaps in genes.
Should be greater than the penalty for opening a gap and less than the penalty for opening a gap out of frame, to avoid gaps in genes but favor gaps that align with codons.
Should be greater than the penalty for opening a gap in frame, to favor gaps that align with codons.
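To make the relationship between these gap penalties concrete, here is a minimal Python sketch of codon-aware affine gap costs. It is an illustration, not Nextalign's implementation: it assumes gene ranges are 0-based, half-open (start, end) reference coordinates, that a gap opening on the first base of a codon counts as "in frame", and that a gap of length L costs its opening penalty plus L times the extension penalty. The function names and the penalty values 6, 7 and 8 are hypothetical, not the tool's documented defaults.

```python
from typing import List, Tuple


def gap_open_penalties(
    ref_length: int,
    genes: List[Tuple[int, int]],
    open_penalty: int,
    open_in_frame: int,
    open_out_of_frame: int,
) -> List[int]:
    """Return the penalty for opening a gap at each reference position.

    `genes` holds hypothetical 0-based, half-open (start, end) ranges.
    """
    # Ordering recommended by the parameter help: open < in frame < out of frame.
    if not open_penalty < open_in_frame < open_out_of_frame:
        raise ValueError("expected open < in-frame < out-of-frame penalties")
    penalties = [open_penalty] * ref_length  # positions outside genes
    for start, end in genes:
        for pos in range(start, end):
            # Assumption: a gap opening on the first base of a codon is "in frame".
            if (pos - start) % 3 == 0:
                penalties[pos] = open_in_frame
            else:
                penalties[pos] = open_out_of_frame
    return penalties


def gap_cost(open_at: List[int], start: int, length: int, extend_penalty: int) -> int:
    """Affine cost of a gap of `length` bases opening at reference position `start`."""
    # With extend_penalty == 0, every gap costs just its opening penalty.
    return open_at[start] + extend_penalty * length


open_at = gap_open_penalties(12, [(3, 9)], open_penalty=6, open_in_frame=7, open_out_of_frame=8)
print(open_at)  # [6, 6, 6, 7, 8, 8, 7, 8, 8, 6, 6, 6]
print(gap_cost(open_at, 4, 2, extend_penalty=1))  # 10
```

With an extension penalty of zero, `gap_cost` returns the same value for a gap of any length, which is the behaviour described for that parameter.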
Note that this is redundantly parameterized with score match.
Note that this is redundantly parameterized with mismatch penalty.
Alignments with long indels are slow to compute and require substantial memory in the current implementation. Alignment of sequences with indels that are longer than this value will not be attempted and a warning will be emitted.
Seeds should be long enough to be unique, but short enough to match with high probability.
Relevant for short sequences. In long sequences, the number of seeds is determined by the nucleotide seed spacing. This should be a positive integer.
Maximum number of mismatching nucleotides allowed for a seed to be considered a match.
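The seed parameters can be illustrated with a naive Python sketch of seed selection and approximate seed matching. This is an assumption-laden illustration, not Nextalign's seed algorithm: seeds are spread evenly over the query, `seed_spacing` sets their density in long sequences, `min_seeds` sets a floor for short ones, and each seed is located by a linear scan that tolerates up to `mismatches_allowed` differences. All function and parameter names here are hypothetical.

```python
from typing import List, Optional, Tuple


def count_mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_seed(ref: str, seed: str, mismatches_allowed: int) -> Optional[int]:
    """First reference position matching `seed` with at most `mismatches_allowed` differences.

    Naive O(len(ref) * len(seed)) scan, for illustration only.
    """
    k = len(seed)
    for pos in range(len(ref) - k + 1):
        if count_mismatches(ref[pos:pos + k], seed) <= mismatches_allowed:
            return pos
    return None


def seed_matches(
    query: str,
    ref: str,
    seed_length: int,
    seed_spacing: int,
    min_seeds: int,
    mismatches_allowed: int,
) -> List[Tuple[int, int]]:
    """Return (query_pos, ref_pos) pairs for seeds spread evenly over the query."""
    last_start = len(query) - seed_length
    if last_start < 0:
        return []  # query shorter than a single seed
    # Long queries: about one seed per `seed_spacing` nt; short queries: at least `min_seeds`.
    n_seeds = max(min_seeds, last_start // seed_spacing + 1)
    step = last_start / max(n_seeds - 1, 1)
    starts = sorted({round(i * step) for i in range(n_seeds)})
    matches = []
    for q in starts:
        r = find_seed(ref, query[q:q + seed_length], mismatches_allowed)
        if r is not None:
            matches.append((q, r))
    return matches


query = "ACGTTGCAAGGCTTAC"
ref = "GGGG" + query + "CCCC"
matches = seed_matches(query, ref, seed_length=4, seed_spacing=4, min_seeds=2, mismatches_allowed=0)
print(matches)  # [(0, 4), (4, 8), (8, 12), (12, 16)]
print([r - q for q, r in matches])  # [4, 4, 4, 4]
```

The differences `ref_pos - query_pos` between matched positions give the rough relative offset of the query, and their spread is the kind of information that can bound an alignment band.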

What it does

Nextalign is a viral genome sequence alignment algorithm used in Nextclade.

It performs a pairwise alignment of the provided sequences against a given reference sequence using a banded local alignment algorithm with affine gap costs. The band width and rough relative positions are determined through seed matching.
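The core idea of banded local alignment with affine gap costs can be sketched as a Gotoh-style dynamic program that only fills cells inside a diagonal band. This Python sketch returns the best local score only (no traceback). It assumes the band centre (`shift`, the expected offset of the reference relative to the query) and half-width (`band_width`) were already estimated, for example from seed matches, and it uses a gap cost of open + extend * length. For simplicity it allocates full matrices, so the band reduces computation but not memory. Names and score values are illustrative, not Nextalign's.

```python
from typing import List

NEG_INF = float("-inf")


def banded_local_affine(
    query: str,
    ref: str,
    shift: int,
    band_width: int,
    match: int,
    mismatch: int,
    gap_open: int,
    gap_extend: int,
) -> int:
    """Best local alignment score restricted to the band |(j - i) - shift| <= band_width.

    i indexes the query and j the reference; a gap of length L costs
    gap_open + gap_extend * L (three-matrix Gotoh recursion).
    """
    n, m = len(query), len(ref)
    # H: best score ending at (i, j); E: ending in a gap in the query; F: a gap in the reference.
    H: List[List[float]] = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    E: List[List[float]] = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    F: List[List[float]] = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if abs((j - i) - shift) > band_width:
                continue  # outside the band: never computed, stays -inf
            if i == 0 or j == 0:
                H[i][j] = 0  # a local alignment may start at any boundary cell
                continue
            E[i][j] = max(H[i][j - 1] - gap_open - gap_extend, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open - gap_extend, F[i - 1][j] - gap_extend)
            sub = match if query[i - 1] == ref[j - 1] else -mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return int(best)


args = dict(match=3, mismatch=1, gap_open=6, gap_extend=0)
print(banded_local_affine("ACGTTGCA", "GGACGTTGCAGG", shift=2, band_width=1, **args))  # 24
print(banded_local_affine("ACGTTGCA", "GGACGTTGCAGG", shift=0, band_width=0, **args))  # 0
```

The second call shows why the band position matters: with the band centred on the wrong diagonal, the matching region is never reached and the score drops to 0.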

Nextalign will strip insertions relative to the reference and output them in a separate CSV file.
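Stripping insertions amounts to dropping every alignment column in which the reference has a gap, while recording what was removed. A minimal Python sketch, assuming a pairwise alignment that uses '-' as the gap character; the position convention (index of the preceding reference base) and the CSV column names are hypothetical and may differ from Nextalign's actual output.

```python
import csv
import io
from typing import List, Tuple


def strip_insertions(aligned_ref: str, aligned_query: str) -> Tuple[str, List[Tuple[int, str]]]:
    """Remove columns where the reference has a gap and record them as insertions.

    Returns the query in reference coordinates plus (ref_pos, bases) pairs, where
    ref_pos is the 0-based index of the reference base the insertion follows
    (-1 means before the first reference base). This convention is an assumption.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have the same length")
    stripped: List[str] = []
    insertions: List[Tuple[int, str]] = []
    pending: List[str] = []  # bases of the insertion currently being collected
    ref_pos = -1
    for r, q in zip(aligned_ref, aligned_query):
        if r == "-":
            pending.append(q)
            continue
        if pending:
            insertions.append((ref_pos, "".join(pending)))
            pending = []
        ref_pos += 1
        stripped.append(q)
    if pending:  # insertion after the last reference base
        insertions.append((ref_pos, "".join(pending)))
    return "".join(stripped), insertions


def insertions_csv(records: List[Tuple[str, List[Tuple[int, str]]]]) -> str:
    """Write one CSV row per insertion; the column names are hypothetical."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["seqName", "position", "insertion"])
    for name, insertions in records:
        for pos, bases in insertions:
            writer.writerow([name, pos, bases])
    return buf.getvalue()


seq, ins = strip_insertions("AC--GT", "ACTTGT")
print(seq, ins)  # ACGT [(1, 'TT')]
print(insertions_csv([("sample1", ins)]), end="")
```

Keeping the stripped query in reference coordinates is what makes every output sequence the same length as the reference.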

Optionally, when provided with a gene map and a list of genes, Nextalign can translate these genes.
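Translating an aligned gene can be sketched with the standard genetic code. This Python sketch assumes the gene has already been cut out of the reference-aligned, insertion-stripped sequence using coordinates from the gene map, and that it starts in frame; mapping '---' to '-' and unknown codons (for example ones containing N) to 'X' are illustrative choices, not necessarily Nextalign's.

```python
from itertools import product
from typing import Dict

BASES = "TCAG"
# Standard genetic code listed in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides: str) -> str:
    """Translate an in-frame gene sequence codon by codon.

    '---' becomes '-', any codon not in the table becomes 'X',
    and a trailing partial codon is ignored.
    """
    peptide = []
    for i in range(0, len(nucleotides) - 2, 3):
        codon = nucleotides[i:i + 3].upper()
        if codon == "---":
            peptide.append("-")
        else:
            peptide.append(CODON_TABLE.get(codon, "X"))
    return "".join(peptide)


print(translate("atggcc---taa"))  # MA-*
```

Because insertions were stripped beforehand, a deletion that spans a whole codon shows up as a single '-' in the peptide and the reading frame is preserved.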

Currently, Nextalign primarily focuses on the SARS-CoV-2 genome, but it can be used on any virus given a sufficiently similar reference sequence (less than 5% divergence).
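As a rough check of the similarity guideline, the divergence between an aligned query and its reference can be estimated as the fraction of differing positions. A minimal Python sketch, assuming columns with a gap or an ambiguous 'N' are simply skipped; this is only a heuristic for the 5% figure, not a test performed by the tool.

```python
def divergence(aligned_ref: str, aligned_query: str) -> float:
    """Fraction of comparable columns where the two aligned sequences differ.

    Assumption: columns with a gap ('-') or an ambiguous 'N' in either sequence are skipped.
    """
    compared = 0
    differing = 0
    for r, q in zip(aligned_ref.upper(), aligned_query.upper()):
        if r in "-N" or q in "-N":
            continue
        compared += 1
        if r != q:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return differing / compared


d = divergence("ACGTACGTAC", "ACGTACGTAA")
print(d, d < 0.05)  # 0.1 False
```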