What it does
Takes an input file of aligned protein sequences (typically in FASTA or Clustal format) and a matching file of unaligned nucleotide sequences (in FASTA format, using the same identifiers), and threads the nucleotide sequences onto the protein alignment to produce a codon-aware nucleotide alignment, which can be viewed as a back translation.
If you specify one of the standard NCBI genetic codes (recommended), then the translation is verified. Verification allows fuzzy matching where stop codons in the protein sequence have been represented as X, and tolerates a trailing stop codon that is present in the nucleotide sequence but not in the protein.
Note - the protein and nucleotide sequences must use the same identifiers.
Note - if no translation table is specified, each provided nucleotide sequence should be exactly three times the length of the corresponding protein sequence (excluding the gaps).
Note - the nucleotide FASTA file may contain extra sequences that are not in the protein alignment; they will be ignored. This can be useful if, for example, you have a nucleotide FASTA file containing all the genes in an organism, while the protein alignment is for a specific gene family.
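The three notes above can be illustrated with a minimal Python sketch. This is not the tool's actual code: the function name `pair_and_check` and the use of plain dictionaries (identifier mapped to sequence string) are assumptions made for illustration, and it only covers the case where no translation table is given.

```python
def pair_and_check(protein_alignment, nucleotides, gap="-"):
    """Pair aligned proteins with nucleotide sequences by identifier.

    Both arguments are dicts mapping identifier -> sequence string
    (a hypothetical representation chosen for this sketch).
    """
    paired = {}
    for ident, gapped_protein in protein_alignment.items():
        # The identifiers must match between the two files.
        if ident not in nucleotides:
            raise ValueError(f"No nucleotide sequence found for {ident!r}")
        nucleotide = nucleotides[ident]
        # Without a translation table, only the length can be checked:
        # three nucleotides per residue, ignoring alignment gaps.
        residues = len(gapped_protein.replace(gap, ""))
        if len(nucleotide) != 3 * residues:
            raise ValueError(
                f"{ident!r}: expected {3 * residues} nucleotides, "
                f"got {len(nucleotide)}"
            )
        paired[ident] = nucleotide
    # Extra nucleotide entries that are not in the alignment are never
    # looked up, so they are silently ignored.
    return paired
```

Looping over the protein alignment rather than the nucleotide file is what makes the extra nucleotide sequences harmless: only identifiers present in the alignment are ever requested.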
Example
Given this protein alignment in FASTA format:
>Alpha
DEER
>Beta
DE-R
>Gamma
D--R
and this matching unaligned nucleotide FASTA file:
>Alpha
GATGAGGAACGA
>Beta
GATGAGCGU
>Gamma
GATCGG
the tool would return this nucleotide alignment:
>Alpha
GATGAGGAACGA
>Beta
GATGAG---CGU
>Gamma
GAT------CGG
Notice that all the gaps are multiples of three in length.
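The threading step in this example can be sketched in a few lines of Python. This is a simplified illustration rather than the tool's implementation: the function name `thread_nucleotides` is hypothetical, and the sketch performs no translation check, so it assumes the nucleotide sequence is exactly three times the ungapped protein length.

```python
def thread_nucleotides(gapped_protein, nucleotide, gap="-"):
    """Thread an unaligned nucleotide sequence onto a gapped protein."""
    codons = []
    position = 0  # index of the next unused nucleotide
    for residue in gapped_protein:
        if residue == gap:
            # One protein gap becomes three nucleotide gap characters.
            codons.append(gap * 3)
        else:
            # One residue consumes the next codon (three nucleotides).
            codons.append(nucleotide[position:position + 3])
            position += 3
    if position != len(nucleotide):
        raise ValueError(
            f"Used {position} nucleotides but sequence has {len(nucleotide)}"
        )
    return "".join(codons)


# The Beta sequence from the example above.
print(thread_nucleotides("DE-R", "GATGAGCGU"))  # GATGAG---CGU
```

Because each protein gap is expanded to exactly three characters and each residue consumes exactly one codon, every gap run in the output is necessarily a multiple of three in length.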
Citation
This tool uses Biopython, so if you use this Galaxy tool in work leading to a scientific publication, please cite the following paper:
Cock et al. (2009). Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3. https://doi.org/10.1093/bioinformatics/btp163 pmid:19304878.
This tool is available to install into other Galaxy Instances via the Galaxy Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/align_back_trans