Galaxy

NextAlign (version 2.7.0+galaxy0)

Choose the source for the reference genome:

Using reference genome:

Select genome from the list

FASTA file with input sequences:

Output insertion sequences?:

Outputs stripped insertions relative to reference as CSV

Translate annotated genes based on GFF and gene list?:

Comma-separated list of genes to translate:

GFF3 file containing custom gene map:

Minimum length of nucleotide sequence to consider for alignment:

If a sequence is shorter than this value, alignment will not be attempted and a warning will be emitted. When adjusting this parameter, note that alignment of short sequences can be unreliable.

Penalty for extending a gap:

If zero, all gaps regardless of length incur the same penalty.
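The effect of a zero extension penalty can be illustrated with a small affine gap cost function. This is a minimal sketch, not Nextalign's code: it assumes the opening penalty is charged once per gap and the extension penalty once for every position after the first, and the numeric values are illustrative only.

```python
def affine_gap_cost(length: int, gap_open: int, gap_extend: int) -> int:
    """Cost of a single gap spanning `length` positions.

    Assumption: `gap_open` is charged once, `gap_extend` once per
    position after the first. Nextalign's exact bookkeeping may differ.
    """
    if length <= 0:
        return 0
    return gap_open + gap_extend * (length - 1)


if __name__ == "__main__":
    # With gap_extend=0 every gap costs the same; with gap_extend=1
    # longer gaps become progressively more expensive.
    for n in (1, 3, 30):
        flat = affine_gap_cost(n, gap_open=6, gap_extend=0)
        sloped = affine_gap_cost(n, gap_open=6, gap_extend=1)
        print(n, flat, sloped)
```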

Penalty for opening a gap:

A higher penalty results in fewer gaps and more mismatches. Should be less than the penalty for opening a gap in frame, to avoid gaps in genes.

Penalty for opening gaps at the beginning of a codon:

Should be greater than the penalty for opening a gap and less than the penalty for opening a gap out of frame, to avoid gaps in genes but favor gaps that align with codons.

Penalty for opening gaps in the body of a codon:

Should be greater than the penalty for opening gaps in-frame to favor gaps that align with codons.
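Taken together, the three gap-opening penalties are expected to increase in the order: general, in frame, out of frame. The following sketch checks that ordering before a job is submitted. The function and argument names are hypothetical and are not part of Nextalign or Galaxy.

```python
def check_gap_open_penalties(gap_open: int, in_frame: int, out_of_frame: int) -> list:
    """Return a list of human-readable problems; an empty list means the
    recommended ordering gap_open < in_frame < out_of_frame holds."""
    problems = []
    if not gap_open < in_frame:
        problems.append(
            "gap-open penalty should be less than the in-frame penalty"
        )
    if not in_frame < out_of_frame:
        problems.append(
            "in-frame penalty should be less than the out-of-frame penalty"
        )
    return problems


if __name__ == "__main__":
    # Illustrative values only.
    print(check_gap_open_penalties(6, 7, 8))
    print(check_gap_open_penalties(8, 7, 6))
```

The check is deliberately advisory: it reports problems instead of raising, because the form describes these relationships as recommendations.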

Penalty for aligned nucleotides or amino acids that differ in state during alignment:

Note that this is redundantly parameterized with the match score.

Score for encouraging aligned nucleotides or amino acids with matching state:

Note that this is redundantly parameterized with the mismatch penalty.

Maximum length of insertions or deletions allowed to proceed with alignment:

Alignments with long indels are slow to compute and require substantial memory in the current implementation. Alignment of sequences with indels that are longer than this value will not be attempted and a warning will be emitted.

Seed length for nucleotide alignment:

Seeds should be long enough to be unique, but short enough to match with high probability.

Minimum number of seeds to search for during nucleotide alignment:

Relevant for short sequences. In long sequences, the number of seeds is determined by nucleotide seed spacing. This should be a positive integer.

Spacing between seeds during nucleotide alignment:

Maximum number of mismatching nucleotides:

Maximum number of mismatching nucleotides allowed for a seed to be considered a match.
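How the four seed parameters interact can be sketched as follows. This is an illustration under stated assumptions, not Nextalign's implementation: `seed_count` assumes the number of seeds is the larger of the minimum and the query length divided by the spacing, and `find_seed` uses a brute-force scan that accepts a placement when the mismatch count stays within the allowed maximum. Both function names are hypothetical.

```python
def seed_count(query_length: int, seed_spacing: int, min_seeds: int) -> int:
    """Assumed rule: one seed per `seed_spacing` nucleotides, but never
    fewer than `min_seeds` (which matters for short sequences)."""
    return max(min_seeds, query_length // seed_spacing)


def find_seed(seed: str, reference: str, max_mismatches: int):
    """Return (position, mismatches) for the best placement of `seed` in
    `reference`, or None if every placement exceeds `max_mismatches`."""
    best = None
    for start in range(len(reference) - len(seed) + 1):
        mismatches = 0
        for a, b in zip(seed, reference[start:start + len(seed)]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many mismatches, try the next position
        else:
            # Loop finished without break: placement is acceptable.
            if best is None or mismatches < best[1]:
                best = (start, mismatches)
    return best


if __name__ == "__main__":
    reference = "ACGTACGTTTGACCA"
    print(find_seed("TTGACC", reference, max_mismatches=0))
    print(find_seed("TTGTCC", reference, max_mismatches=1))
    print(seed_count(30000, seed_spacing=100, min_seeds=10))
```

A longer seed makes an accidental match elsewhere in the reference less likely, while each additional nucleotide raises the chance that a mutation falls inside the seed and prevents a match; allowing a few mismatches relaxes that trade-off.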

What it does

Nextalign is a viral genome sequence alignment algorithm used in Nextclade.

It performs a pairwise alignment of the provided sequences against a given reference sequence using a banded local alignment algorithm with affine gap cost. The band width and rough relative positions are determined through seed matching.

Nextalign will strip insertions relative to the reference and output them in a separate CSV file.
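Stripping insertions can be pictured on a single aligned pair: every alignment column where the reference has a gap is removed from the query and recorded separately. The sketch below assumes gaps are written as "-" and reports each insertion with the number of reference bases that precede it; this position convention and the function name are assumptions and do not describe Nextalign's CSV format.

```python
def strip_insertions(aligned_ref: str, aligned_query: str):
    """Return (stripped_query, insertions).

    `insertions` is a list of (ref_pos, inserted_seq), where ref_pos is
    the count of reference bases before the insertion (an assumed
    convention for this sketch).
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")

    stripped = []
    insertions = []
    current = []   # bases of the insertion being collected
    ref_pos = 0    # reference bases consumed so far

    for r, q in zip(aligned_ref, aligned_query):
        if r == "-":
            # Column exists only in the query: part of an insertion.
            if q != "-":
                current.append(q)
            continue
        if current:
            insertions.append((ref_pos, "".join(current)))
            current = []
        stripped.append(q)
        ref_pos += 1

    if current:  # insertion at the very end of the alignment
        insertions.append((ref_pos, "".join(current)))
    return "".join(stripped), insertions


if __name__ == "__main__":
    print(strip_insertions("ACG--TAC", "ACGTTT-C"))
```

After stripping, the query has the same length as the ungapped reference, so every output sequence shares the reference coordinate system.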

Optionally, when provided with a gene map and a list of genes, Nextalign can perform translation of these genes.

Currently, Nextalign focuses primarily on the SARS-CoV-2 genome, but it can be used on any virus, given a sufficiently similar reference sequence (less than 5% divergence).