
What does this do?

This tool screens an alignment of sequences for evidence of recombination in one or more sequences. The main idea is that if sufficient recombination has occurred, no single phylogenetic tree will properly fit the entire length of the alignment; instead, a separate tree will be preferred for each non-recombinant segment.

Brief description

This analysis implements a heuristic approach to screening sequence alignments for recombination, using the CHC genetic algorithm (GA) to search for phylogenetic incongruence among different partitions of the data. The number of partitions is determined using a step-up procedure, while the placement of breakpoints is searched for with the GA. The best-fitting model (based on c-AIC) is returned, and additional post-hoc tests are run to distinguish topological incongruence from rate variation.
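The model comparison step can be illustrated with a minimal sketch of the small-sample corrected AIC. The formula is the standard one; the function names, the example log-likelihoods and parameter counts, and the use of 1000 as the sample size are hypothetical, and this is not taken from the tool's own code.

```python
def aicc(log_likelihood, n_params, sample_size):
    """Small-sample corrected AIC: 2k - 2 ln L + 2k(k + 1) / (n - k - 1)."""
    if sample_size - n_params - 1 <= 0:
        raise ValueError("sample_size must be larger than n_params + 1")
    aic = 2 * n_params - 2 * log_likelihood
    correction = (2 * n_params * (n_params + 1)) / (sample_size - n_params - 1)
    return aic + correction


def best_model(candidates, sample_size):
    """Return the name of the candidate with the lowest corrected AIC.

    candidates maps a model name to (log_likelihood, n_params).
    """
    return min(candidates, key=lambda name: aicc(*candidates[name], sample_size))


# Hypothetical fits: adding a breakpoint adds parameters (a second tree),
# so the gain in log-likelihood has to outweigh the penalty.
candidates = {
    "no breakpoints": (-5230.0, 20),
    "1 breakpoint": (-5190.0, 38),
}
print(best_model(candidates, sample_size=1000))
```

A lower score is better, so a model with more breakpoints is preferred only when its likelihood gain exceeds the penalty for its extra parameters.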

For each identified breakpoint, the support for its placement is calculated, and for each non-recombinant fragment, a phylogenetic tree is inferred (using neighbor joining) and returned.
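As a rough illustration of how a set of breakpoints divides an alignment into non-recombinant fragments, the sketch below converts breakpoint positions into site ranges. The function name and the coordinate convention (1-based sites, with a breakpoint marking the last site of the fragment on its left) are assumptions made for this example; the convention used in the tool's output may differ.

```python
def split_by_breakpoints(alignment_length, breakpoints):
    """Return 1-based inclusive (start, end) site ranges, one per fragment.

    Assumed convention: a breakpoint b closes a fragment at site b,
    and the next fragment starts at site b + 1.
    """
    ordered = sorted(set(breakpoints))
    if any(b < 1 or b >= alignment_length for b in ordered):
        raise ValueError("breakpoints must fall inside the alignment")
    starts = [1] + [b + 1 for b in ordered]
    ends = ordered + [alignment_length]
    return list(zip(starts, ends))


# Two breakpoints give three fragments, each of which would get its own tree.
print(split_by_breakpoints(1000, [350, 720]))
```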

Input

A FASTA sequence alignment
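Because the input must be an alignment rather than a set of unaligned sequences, it can help to check beforehand that every record has the same length. The following is a minimal sketch of such a check; the function names are hypothetical, and the parser handles only plain FASTA text.

```python
def read_fasta(text):
    """Parse FASTA text into a dict of {name: sequence}."""
    records = {}
    name = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            if not header:
                raise ValueError("empty FASTA header")
            name = header[0]
            if name in records:
                raise ValueError("duplicate sequence name: " + name)
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def is_aligned(records):
    """True when there is at least one record and all have equal length."""
    return len({len(seq) for seq in records.values()}) == 1


example = ">seq1\nACGT\nAC\n>seq2\nACG-TA\n"
print(is_aligned(read_fasta(example)))
```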

Output

A JSON file with analysis results (http://hyphy.org/resources/json-fields.pdf).

A custom visualization module for viewing these results is available (see http://vision.hyphy.org/GARD for an example).
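For a first look at the results outside the visualization module, the JSON file can be opened with any JSON parser. The sketch below lists the top-level fields and their types without assuming any particular field names; the function name and the file name in the usage comment are hypothetical, and the field definitions are in the document linked above.

```python
import json
from pathlib import Path


def summarize_results(path):
    """Map each top-level field of a results file to the name of its JSON type."""
    results = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(results, dict):
        raise ValueError("expected a JSON object at the top level")
    return {key: type(value).__name__ for key, value in results.items()}


# Usage (hypothetical file name):
# for field, kind in summarize_results("gard_results.json").items():
#     print(field, kind)
```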

Tool options

--type Type of alignment to screen.
    Nucleotide [default]
        Assumes aligned nucleotide data and screens the alignment using
        the general time reversible model of sequence evolution.
        This is the fastest option.
    Protein
        Assumes aligned amino acid sequences. One of several protein
        substitution models may be used to screen the alignment.
    Codon
        Assumes an in-frame coding sequence alignment.
        The Muse-Gaut 94 (GTR) model will be used to screen the alignment.
        Selecting this option will dramatically increase run times.

--code Genetic code/translation table to use (for codon alignments).
    Default value: Universal

--model The substitution model to use (for protein alignments).
    Default value: JTT

--rv The discrete distribution to use for modeling site-to-site rate variation.
    None [default]
        No rate variation. This is the fastest option in terms of run time, but
        using it can result in false positives if there is significant site-to-site
        rate variation.
    GDD
        Use the general discrete distribution on N bins.
    Beta-Gamma
        Use a discretized gamma with weights partitioned by a discretized beta
        (see doi.org/10.1093/molbev/msi009).

--rate-classes How many site rate classes to use (if GDD or Beta-Gamma is selected).
    Default value: 4
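The way these options depend on one another can be summarized in a short sketch that assembles an argument list from the settings described above. The function name is hypothetical, the value spellings are copied from this help text and may not match what the underlying program expects, and the program name and the flag for the input file are left out because they are not covered here.

```python
ALIGNMENT_TYPES = ("Nucleotide", "Protein", "Codon")
RATE_VARIATION = ("None", "GDD", "Beta-Gamma")


def build_options(type_="Nucleotide", code="Universal", model="JTT",
                  rv="None", rate_classes=4):
    """Return option arguments, skipping those that do not apply."""
    if type_ not in ALIGNMENT_TYPES:
        raise ValueError("unknown alignment type: " + type_)
    if rv not in RATE_VARIATION:
        raise ValueError("unknown rate variation setting: " + rv)
    if rate_classes < 1:
        # Assumption: at least one rate class is required.
        raise ValueError("rate_classes must be at least 1")

    args = ["--type", type_]
    if type_ == "Codon":
        args += ["--code", code]      # only used for codon alignments
    if type_ == "Protein":
        args += ["--model", model]    # only used for protein alignments
    args += ["--rv", rv]
    if rv != "None":
        args += ["--rate-classes", str(rate_classes)]  # only used with GDD or Beta-Gamma
    return args


print(build_options("Protein", rv="GDD"))
```

With the defaults, only --type and --rv are emitted, which corresponds to the fastest configuration described above.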

GARD : Genetic Algorithms for Recombination Detection.
