Galaxy | Tool Preview

Kc-Align (version 1.0.2+galaxy1)
Single FASTA reference sequence to be aligned
Multi-FASTA of sequences to be aligned with the reference
The 1-indexed start position of the gene of interest in the reference sequence (for multi-segment genes, enter each start site separated by a comma, e.g. 12562,12591)
The 1-indexed end position of the gene of interest in the reference sequence (for multi-segment genes, enter each end site separated by a comma, e.g. 12592,13905)
Sequences that are more than an empirically determined distance from the reference are discarded before the final alignment
Compress identical sequences

Kc-Align

Kc-Align is a codon-aware multiple aligner that uses Kalign2 to produce in-frame gapped codon alignments for selection analysis of small genomes (mostly viral and some smaller bacterial genomes). It takes nucleotide sequences as input, translates them to their in-frame amino acid sequences, performs multiple alignment with Kalign, and then converts the alignment back to the original codon sequences while preserving the gaps. It produces two outputs: the gapped nucleotide alignment in FASTA format and in CLUSTAL format.
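The last step, converting an aligned protein row back to codons while keeping the gaps, can be illustrated with a minimal Python sketch. This is not Kc-Align's own code; the function name back_translate is hypothetical, and the sketch assumes the nucleotide sequence is already in frame and that gaps are written as "-".

```python
def back_translate(aligned_protein: str, nucleotides: str) -> str:
    """Map one gapped protein alignment row back onto its original codons.

    Assumption: `nucleotides` is in frame and encodes exactly the
    non-gap residues of `aligned_protein`, three bases per residue.
    """
    if len(nucleotides) % 3 != 0:
        raise ValueError("nucleotide sequence length is not a multiple of 3")

    # Split the ungapped nucleotide sequence into codons.
    codons = [nucleotides[i:i + 3] for i in range(0, len(nucleotides), 3)]

    residues = [r for r in aligned_protein if r != "-"]
    if len(residues) != len(codons):
        raise ValueError("protein row and nucleotide sequence disagree in length")

    out = []
    next_codon = 0
    for residue in aligned_protein:
        if residue == "-":
            # One amino acid gap becomes a three-base codon gap.
            out.append("---")
        else:
            out.append(codons[next_codon])
            next_codon += 1
    return "".join(out)


if __name__ == "__main__":
    print(back_translate("M-K", "ATGAAA"))
```

Because every protein gap expands to exactly three gap characters, the resulting nucleotide alignment stays in frame, which is the property that selection analyses depend on.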

Kc-Align will also attempt to detect any frameshift mutations in the input reads. If a frameshift is detected, that sequence will not be included in the multiple alignment and its ID will be printed to stdout.

Kc-Align also supports genes that are composed of more than one continuous sequence (currently only two segments are supported). To use this, enter each segment's start coordinate in the Start Position parameter, separated by a comma, and do the same for each segment's end coordinate in the End Position parameter (e.g. Start Position: 12562,12591 End Position: 12592,13905).
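How comma-separated coordinates could be turned into a single gene sequence is sketched below in Python. This is an illustration only, not the tool's implementation: the helper names parse_positions and extract_gene are hypothetical, and the sketch assumes both coordinates are 1-indexed with the end position included in the segment.

```python
from typing import List


def parse_positions(text: str) -> List[int]:
    """Parse a comma-separated position string such as "12562,12591"."""
    return [int(part) for part in text.split(",") if part.strip()]


def extract_gene(reference: str, starts: str, ends: str) -> str:
    """Join the reference segments given by paired start and end positions.

    Assumptions (not confirmed by the tool documentation): positions are
    1-indexed and the end position is inclusive.
    """
    start_list = parse_positions(starts)
    end_list = parse_positions(ends)

    if len(start_list) != len(end_list):
        raise ValueError("each start position needs a matching end position")
    if not 1 <= len(start_list) <= 2:
        raise ValueError("only one or two segments are supported")

    segments = []
    for start, end in zip(start_list, end_list):
        if not 1 <= start <= end <= len(reference):
            raise ValueError(f"segment {start}-{end} is outside the reference")
        # Convert the 1-indexed inclusive range to a Python slice.
        segments.append(reference[start - 1:end])
    return "".join(segments)


if __name__ == "__main__":
    toy_reference = "AAACCCGGGTTT"
    print(extract_gene(toy_reference, "1,7", "3,9"))
```

The toy reference above stands in for a real genome; with real data the same call would take the values entered in the Start Position and End Position fields.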

Modes:

Kc-Align can be run in three different modes, depending on your input data.

  • In genome mode, the "reference" and "reads" inputs are both full-genome FASTA files. This mode also requires the 1-based start and end positions of the gene you want to align, given relative to the reference input.
  • If both the "reference" and "reads" inputs are already in-frame genes, the gene mode should be used. This mode does not require start and end position parameters as the reference is already in-frame.
  • For the case when your "reference" is an in-frame gene while the "reads" are whole genomes, the mixed mode can be used. Like gene mode, this mode does not require the start and end point position parameters.
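The relationship between mode and required parameters described above can be summarized in a short Python sketch. The mode names come from the text; the functions positions_required and check_parameters are hypothetical and only illustrate the rule, they are not part of Kc-Align.

```python
from typing import Optional

VALID_MODES = {"genome", "gene", "mixed"}


def positions_required(mode: str) -> bool:
    """Return True if the mode needs start and end positions."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    # Only genome mode has a whole-genome reference, so only it needs
    # coordinates to locate the gene of interest.
    return mode == "genome"


def check_parameters(mode: str, start: Optional[str], end: Optional[str]) -> None:
    """Raise ValueError if positions are missing for a mode that needs them."""
    if positions_required(mode) and (not start or not end):
        raise ValueError("genome mode requires both start and end positions")


if __name__ == "__main__":
    for mode in sorted(VALID_MODES):
        print(mode, positions_required(mode))
```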