This tool screens an alignment of sequences for evidence of recombination in one or more sequences. The main idea is that if sufficient recombination has occurred, no single phylogenetic tree will properly fit the entire length of the alignment; instead, a separate tree will be preferred for each non-recombinant segment.
This analysis implements a heuristic approach to screening sequence alignments for recombination. It uses the CHC genetic algorithm (GA) to search for phylogenetic incongruence among different partitions of the data. The number of partitions is determined using a step-up procedure, while the placement of breakpoints is searched for with the GA. The best-fitting model (based on c-AIC, the small-sample corrected Akaike information criterion) is returned, and additional post-hoc tests are run to distinguish topological incongruence from rate variation.
For each identified breakpoint, the support for its placement is calculated, and for each non-recombinant fragment, a phylogenetic tree is inferred (using neighbor joining) and returned.
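The model-selection logic described above can be illustrated with a minimal sketch. It assumes the standard small-sample AIC formula and a hypothetical callback, best_fit_for, that returns the best log-likelihood and parameter count found for a given number of breakpoints (in the real analysis that search is done by the GA). The function names, the stopping rule as written, and the choice of sample size are illustrative assumptions, not the tool's actual implementation.

```python
from typing import Callable, Tuple


def aicc(log_likelihood: float, n_params: int, n_samples: int) -> float:
    """Small-sample corrected AIC: 2k - 2 lnL + 2k(k + 1) / (n - k - 1)."""
    if n_samples - n_params - 1 <= 0:
        raise ValueError("AICc is undefined unless n_samples > n_params + 1")
    aic = 2 * n_params - 2 * log_likelihood
    return aic + (2 * n_params * (n_params + 1)) / (n_samples - n_params - 1)


def step_up(
    best_fit_for: Callable[[int], Tuple[float, int]],
    n_samples: int,
    max_breakpoints: int,
) -> Tuple[int, float]:
    """Add breakpoints one at a time and stop when AICc stops improving.

    best_fit_for(b) is a hypothetical stand-in for the breakpoint search:
    it returns (log_likelihood, n_params) for the best model with b breakpoints.
    """
    lnl, k = best_fit_for(0)
    best_b, best_score = 0, aicc(lnl, k, n_samples)
    for b in range(1, max_breakpoints + 1):
        lnl, k = best_fit_for(b)
        score = aicc(lnl, k, n_samples)
        if score >= best_score:  # no improvement: keep the previous model
            break
        best_b, best_score = b, score
    return best_b, best_score


# Toy numbers only, not real results: one breakpoint helps, a second does not.
toy_fits = {0: (-1000.0, 10), 1: (-980.0, 20), 2: (-975.0, 30)}
chosen_breakpoints, chosen_score = step_up(lambda b: toy_fits[b], 500, 2)
```

With these toy values the second breakpoint raises the log-likelihood by too little to pay for its extra parameters, so the sketch settles on a single breakpoint.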
A FASTA sequence alignment.
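Because the input must be an alignment rather than a set of unaligned sequences, a quick pre-flight check can catch obvious problems. The following is a minimal sketch, not part of the tool: read_fasta and is_aligned are hypothetical helper names, and the only property checked is that every sequence has the same length.

```python
from typing import Dict


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped lines."""
    records: Dict[str, str] = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            if name in records:
                raise ValueError(f"duplicate sequence name: {name!r}")
            records[name] = ""
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[name] += line
    return records


def is_aligned(records: Dict[str, str]) -> bool:
    """True when there is at least one sequence and all lengths are equal."""
    return len({len(seq) for seq in records.values()}) == 1
```

For example, is_aligned(read_fasta(text)) returns False as soon as one sequence is shorter or longer than the others.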
A JSON file with analysis results (http://hyphy.org/resources/json-fields.pdf).
A custom visualization module for viewing these results is available (see http://vision.hyphy.org/GARD for an example).
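For a first look at the JSON output outside the visualization module, the sketch below loads the file and reports the type of each top-level entry. It deliberately assumes no field names; consult the JSON fields document linked above for what each entry means. The function name summarize_top_level is hypothetical.

```python
import json
from typing import Dict


def summarize_top_level(path: str) -> Dict[str, str]:
    """Map each top-level key of a JSON results file to its Python type name."""
    with open(path, encoding="utf-8") as handle:
        results = json.load(handle)
    if not isinstance(results, dict):
        raise ValueError("expected a JSON object at the top level")
    return {key: type(value).__name__ for key, value in results.items()}
```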
--type
    Type of alignment to screen.
    Nucleotide [default]
        Assumes aligned nucleotide data and screens the alignment using the general time reversible model of sequence evolution. This is the fastest option.
    Protein
        Assumes aligned amino acid sequences. One of several protein substitution models may be used to screen the alignment.
    Codon
        Assumes an in-frame coding sequence alignment. The Muse-Gaut 94 (GTR) model will be used to screen the alignment. Selecting this option will dramatically increase run times.

--code
    Genetic code/translation table to use (for codon alignments).
    Default value: Universal

--model
    The substitution model to use (for protein alignments).
    Default value: JTT

--rv
    The discrete distribution to use for modeling site-to-site rate variation.
    None [default]
        No rate variation. This is the fastest option in terms of run time, but using it can result in false positives if there is significant site-to-site rate variation.
    GDD
        Use the general discrete distribution on N bins.
    Beta-Gamma
        Use a discretized gamma with weights partitioned by a discretized beta (see doi.org/10.1093/molbev/msi009).

--rate-classes
    How many site rate classes to use (if GDD or Beta-Gamma is selected).
    Default value: 4
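The options above can be assembled into an argument list programmatically. The sketch below is illustrative only: the leading "hyphy gard" command and the --alignment flag are assumptions not stated in this text, and build_gard_args is a hypothetical helper. The remaining flags, accepted values, and defaults are taken from the option list.

```python
from typing import List

ALIGNMENT_TYPES = ("Nucleotide", "Protein", "Codon")
RATE_VARIATION = ("None", "GDD", "Beta-Gamma")


def build_gard_args(
    alignment: str,
    alignment_type: str = "Nucleotide",
    code: str = "Universal",
    model: str = "JTT",
    rv: str = "None",
    rate_classes: int = 4,
) -> List[str]:
    """Build an argument list from the documented options."""
    if alignment_type not in ALIGNMENT_TYPES:
        raise ValueError(f"--type must be one of {ALIGNMENT_TYPES}")
    if rv not in RATE_VARIATION:
        raise ValueError(f"--rv must be one of {RATE_VARIATION}")
    if rate_classes < 1:
        raise ValueError("--rate-classes must be a positive integer")

    # Assumption: executable, analysis name, and --alignment flag.
    args = ["hyphy", "gard", "--alignment", alignment, "--type", alignment_type]
    if alignment_type == "Codon":
        args += ["--code", code]  # genetic code applies to codon alignments
    if alignment_type == "Protein":
        args += ["--model", model]  # protein substitution model
    args += ["--rv", rv]
    if rv != "None":
        args += ["--rate-classes", str(rate_classes)]  # used by GDD or Beta-Gamma
    return args
```

The returned list can be passed to subprocess.run once the command name and input flag have been checked against the installed version.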