Is there evidence that some sites in the alignment have been subject to positive diversifying selection, either pervasive (throughout the evolutionary tree) or episodic (only on some lineages)? In other words, BUSTED asks whether a given gene has been subject to positive, diversifying selection at any site, at any time. If a priori information about lineages of interest is available (e.g., due to migration, change in the environment, etc.), then BUSTED can be restricted to test for selection only on a subset of tree lineages, potentially boosting power.
BUSTED (Branch-site Unrestricted Statistical Test for Episodic Diversification) tests for gene-wide evidence of episodic positive selection. It fits a codon model to the data and compares a null model, which does not allow positive selection, with an alternative model that does. If the alternative model fits the data significantly better, we conclude that there is evidence for positive selection.
The core of BUSTED is a random effects branch-site model. This model allows the selection pressure (represented by the omega ratio, dN/dS) to vary both among sites in the alignment and across branches in the phylogenetic tree. By default the model uses three rate classes for omega (the number can be changed with --rates): two classes constrained to omega <= 1, covering negative/purifying selection through neutral evolution, and one class with omega >= 1, which captures positive/diversifying selection. Each class has an estimated weight, the proportion of branch-site combinations assigned to it.
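To make the idea of omega rate classes concrete, here is a minimal Python sketch of a three-class distribution. The class values and weights are made up for illustration, and the names `OmegaClass`, `mean_omega`, `allows_positive_selection`, and `constrain_to_null` are hypothetical helpers, not part of HyPhy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OmegaClass:
    omega: float   # dN/dS value for this rate class
    weight: float  # proportion of branch-site combinations in this class


def mean_omega(classes):
    """Weighted average of omega over all rate classes."""
    return sum(c.omega * c.weight for c in classes)


def allows_positive_selection(classes):
    """True if some class with non-zero weight has omega > 1."""
    return any(c.omega > 1.0 and c.weight > 0.0 for c in classes)


def constrain_to_null(classes):
    """Cap every omega at 1, mimicking a model without positive selection."""
    return [OmegaClass(min(c.omega, 1.0), c.weight) for c in classes]


# Illustrative values only (not estimates from any real data set).
example = [
    OmegaClass(omega=0.1, weight=0.70),  # purifying selection
    OmegaClass(omega=1.0, weight=0.25),  # neutral evolution
    OmegaClass(omega=4.0, weight=0.05),  # diversifying selection
]
```

With these numbers a small fraction of branch-site combinations (5%) evolves with omega > 1 even though the average omega is well below 1, which is the kind of signal a gene-wide average would miss.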
BUSTED tests for positive selection by comparing a constrained model (where omega is not allowed to be greater than 1) to an unconstrained model (where omega can be greater than 1). A likelihood ratio test determines whether the unconstrained model fits the data significantly better. If it does, there is evidence for positive selection acting on the gene.
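The likelihood ratio test can be sketched in a few lines of Python. This is an illustration under stated assumptions, not HyPhy's implementation: it assumes the test statistic is compared against a chi-square distribution with 2 degrees of freedom (for which the tail probability has the closed form exp(-x/2)), and the function name `likelihood_ratio_test` is hypothetical. The p-value reported in the HyPhy output should be treated as authoritative.

```python
import math


def likelihood_ratio_test(lnl_constrained, lnl_unconstrained):
    """Return (LRT statistic, p-value) for two nested model fits.

    Assumption: the reference distribution is chi-square with 2 degrees
    of freedom, whose survival function is exp(-x / 2).
    """
    # The unconstrained model can never fit worse; clamp numerical noise at 0.
    statistic = max(0.0, 2.0 * (lnl_unconstrained - lnl_constrained))
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


# Hypothetical log-likelihoods, for illustration only.
stat, p = likelihood_ratio_test(lnl_constrained=-1000.0, lnl_unconstrained=-995.0)
```

A small p-value means the model that permits omega > 1 explains the alignment significantly better than the model that does not.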
BUSTED can also incorporate models of selection on synonymous substitutions (MSS models), a comparative framework for estimating selection on synonymous substitutions. MSS models partition synonymous substitutions into multiple classes and estimate a relative substitution rate for each class, while accounting for confounders such as mutation bias. This makes it possible to study selection on synonymous substitutions in diverse taxa without prior assumptions about the driving forces. For more information, please see the source publication: http://pubmed.ncbi.nlm.nih.gov/40129111/
Note: the names of sequences in the alignment must match the names of the sequences in the tree.
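A quick pre-flight check for the name-matching requirement might look like the following Python sketch. It assumes a FASTA alignment and a simple Newick tree with unquoted leaf names; the helper names (`fasta_names`, `newick_leaf_names`, `name_mismatches`) are hypothetical, and HyPhy's own name handling may differ in details.

```python
import re


def fasta_names(fasta_text):
    """Collect sequence names: the first word after '>' on each header line."""
    names = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                names.add(header.split()[0])
    return names


def newick_leaf_names(newick):
    """Collect leaf names: tokens that directly follow '(' or ','.

    Assumes unquoted names; {} branch labels and branch lengths are skipped.
    """
    return set(re.findall(r"[(,]\s*([^\s():,;{}\[\]]+)", newick))


def name_mismatches(fasta_text, newick):
    """Return (names only in the alignment, names only in the tree)."""
    aln, tree = fasta_names(fasta_text), newick_leaf_names(newick)
    return aln - tree, tree - aln


# Toy inputs for illustration.
alignment = ">A\nATGAAA\n>B\nATGAAG\n>D\nATGAAC\n"
tree = "((A:0.1,B{FG}:0.2):0.05,C:0.3);"
only_in_alignment, only_in_tree = name_mismatches(alignment, tree)
```

Both returned sets should be empty before the analysis is run; here `D` appears only in the alignment and `C` only in the tree.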
A JSON file with analysis results (http://hyphy.org/resources/json-fields.pdf).
The analysis reports whether or not there is statistical evidence of episodic positive selection at any site on the tested branches (a gene-wide test), together with the inferred omega rate classes and their weights.
A custom visualization module for viewing these results is available (see http://vision.hyphy.org/BUSTED for an example).
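The JSON output can also be inspected programmatically. The sketch below assumes the file stores the gene-wide test under a top-level "test results" object with a "p-value" entry; these key names are an assumption and should be checked against the JSON fields document linked above. The function name `read_busted_p_value` and the 0.05 threshold are illustrative choices.

```python
import json


def read_busted_p_value(json_path, alpha=0.05):
    """Return (p-value, significant?) from a BUSTED results file.

    Assumption: the p-value lives at results["test results"]["p-value"].
    """
    with open(json_path, encoding="utf-8") as handle:
        results = json.load(handle)
    p_value = float(results["test results"]["p-value"])
    return p_value, p_value <= alpha
```

Calling `read_busted_p_value("results.json")` on an output file (the file name here is a placeholder) returns the p-value and a flag indicating whether it falls at or below the chosen threshold.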
--code Which genetic code to use.
--alignment An in-frame codon alignment in one of the formats supported by HyPhy.
--tree A phylogenetic tree (optionally annotated with {}).
--branches Which branches should be tested for selection?
    All [default]: test all branches
    Internal: test only internal branches (suitable for intra-host pathogen evolution for example, where terminal branches may contain polymorphism data)
    Leaves: test only terminal (leaf) branches
    Unlabeled: if the Newick string is labeled using the {} notation, test only branches without explicit labels (see http://hyphy.org/tutorials/phylotree/)
--kill-zero-lengths Automatically delete internal zero-length branches for computational efficiency.
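As an illustration of how the options above combine, the following Python sketch assembles a command line without running it. It assumes the analysis is launched as `hyphy busted` with the `hyphy` executable on the PATH; the helper name `busted_command` and the file names are placeholders.

```python
def busted_command(alignment, tree, branches="All", code=None, extra_args=()):
    """Build an argument list for a BUSTED run (nothing is executed).

    Assumption: the analysis is invoked as `hyphy busted ...`.
    """
    allowed = {"All", "Internal", "Leaves", "Unlabeled"}
    if branches not in allowed:
        raise ValueError(f"--branches must be one of {sorted(allowed)}")
    cmd = ["hyphy", "busted",
           "--alignment", alignment,
           "--tree", tree,
           "--branches", branches]
    if code is not None:
        # Only pass --code when the caller names a genetic code explicitly.
        cmd += ["--code", code]
    cmd += list(extra_args)
    return cmd


# Placeholder file names, for illustration only.
cmd = busted_command("gene.fasta", "gene.nwk", branches="Internal")
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`; building it as a list avoids shell quoting problems with file paths.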
Advanced parameters
...................
--srv Include synonymous rate variation in the model.
--grid-size The number of points in the initial distributional guess for likelihood fitting.
--starting-points The number of initial random guesses to seed rate values optimization.
--syn-rates The number of synonymous rate classes to include in the model [1-10, default 3].
--rates The number of non-synonymous rate classes to include in the model [1-10, default 3].
--multiple-hits Include support for multiple nucleotide substitutions.
    None: No correction.
    Double: Allow double substitutions.
    Double+Triple: Allow double and triple substitutions.
--error-sink [Advanced experimental setting] Include a rate class to capture misalignment artifacts.
--mss Include support for multiple classes of synonymous substitution rates (MSS models).
--mss-type How to partition synonymous codons into classes.
    Full: Each set of codons mapping to the same amino-acid class has a separate substitution rate (Valine == neutral)
    SynREV: Each set of codons mapping to the same amino-acid class has a separate substitution rate (mean = 1)
    SynREV2: Each pair of synonymous codons mapping to the same amino-acid class and separated by a transition has a separate substitution rate (no rate scaling)
    SynREV2g: Each pair of synonymous codons mapping to the same amino-acid class and separated by a transition has a separate substitution rate (Valine == neutral). All between-class synonymous substitutions share a rate.
    SynREVCodon: Each codon pair that is exchangeable gets its own substitution rate (fully estimated, mean = 1)
    Random: Random partition (specify how many classes; largest class = neutral)
    Empirical: Load a TSV file with an empirical rate estimate for each codon pair
    File: Load a TSV partition from file (prompted for neutral class)
    Codon-file: Load a TSV partition for pairs of codons from a file (prompted for neutral class)
--mss-file File defining the model partition.
--mss-reference-rate Normalize relative to these rates.
--mss-classes How many codon rate classes.
--mss-neutral Designation for the neutral substitution rate.