
HyPhy-BUSTED (version 2.5.83+galaxy0)

BUSTED: Branch-site Unrestricted Statistical Test for Episodic Diversification

What question does this method answer?

Is there evidence that some sites in the alignment have been subject to positive diversifying selection, either pervasive (throughout the evolutionary tree) or episodic (only on some lineages)? In other words, BUSTED asks whether a given gene has been subject to positive, diversifying selection at any site, at any time. If a priori information about lineages of interest is available (e.g., due to migration, change in the environment, etc.), then BUSTED can be restricted to test for selection only on a subset of tree lineages, potentially boosting power.

Brief description

BUSTED (Branch-site Unrestricted Statistical Test for Episodic Diversification) tests for gene-wide evidence of episodic positive selection. It fits a codon model to the data and compares a null model, which does not allow positive selection, to an alternative model that does. If the alternative model fits the data significantly better, there is evidence of positive selection at one or more sites on one or more of the tested branches.

The core of BUSTED is a random effects branch-site model. This model allows the selection pressure (the omega ratio, dN/dS) to vary both among sites in the alignment and across branches in the phylogenetic tree. By default (--rates 3) the model has three omega rate classes: the first two are constrained to omega <= 1 (purifying selection or neutral evolution), and the last may exceed 1 (positive/diversifying selection).
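
To make the constraints on the omega distribution concrete, the sketch below encodes a K-class distribution under the ordering omega_1 <= ... <= omega_(K-1) <= 1 <= omega_K, with the null model fixing the last class at 1. The class name, its methods, and the exact ordering rule are illustrative assumptions, not HyPhy's internal representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OmegaDistribution:
    """Hypothetical K-class omega distribution (illustration only)."""
    omegas: tuple[float, ...]
    weights: tuple[float, ...]

    def check(self, null: bool = False, tol: float = 1e-9) -> None:
        # Weights must form a probability distribution over the classes.
        if len(self.omegas) != len(self.weights):
            raise ValueError("omegas and weights must have the same length")
        if min(self.weights) < 0 or abs(sum(self.weights) - 1.0) > tol:
            raise ValueError("weights must be non-negative and sum to 1")
        # Assumed ordering: classes increase, all but the last are <= 1.
        if any(a > b for a, b in zip(self.omegas, self.omegas[1:])):
            raise ValueError("omega classes must be in increasing order")
        *lower, top = self.omegas
        if any(w > 1.0 + tol for w in lower):
            raise ValueError("all but the last omega class must be <= 1")
        if top < 1.0 - tol:
            raise ValueError("the last omega class must be >= 1")
        # Null (constrained) model: the last class is fixed at exactly 1.
        if null and abs(top - 1.0) > tol:
            raise ValueError("null model fixes the last omega class at 1")

    def mean(self) -> float:
        # Expected omega across all branch-site combinations.
        return sum(w * o for w, o in zip(self.weights, self.omegas))
```

For example, `OmegaDistribution((0.1, 0.8, 5.0), (0.7, 0.25, 0.05))` passes `check()` but fails `check(null=True)`, because its last class exceeds 1.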

BUSTED tests for positive selection by comparing a constrained model (where omega is not allowed to be greater than 1) to an unconstrained model (where omega can be greater than 1). A likelihood ratio test is used to determine if the unconstrained model is a significantly better fit to the data. If it is, then there is evidence for positive selection acting on the gene.
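
The likelihood ratio test reduces to simple arithmetic on the two fitted log-likelihoods. The sketch below computes the statistic and a chi-square p-value. The reference distribution HyPhy actually uses (its degrees of freedom, or a chi-square mixture) is not stated here, so the default `df=2` is an assumption to check against the HyPhy documentation. The function names are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Chi-square survival function, closed forms for df == 1 and even df."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df > 0 and df % 2 == 0:
        # sf(x; 2k) = exp(-x/2) * sum_{i<k} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2.0) / i
            total += term
        return math.exp(-x / 2.0) * total
    raise NotImplementedError("only df == 1 or even df are implemented")


def lrt(loglik_null: float, loglik_alt: float, df: int = 2) -> tuple[float, float]:
    """Return (LRT statistic, p-value); df=2 is an assumption, not HyPhy's rule."""
    # The unconstrained model nests the constrained one, so the statistic
    # should be >= 0; clamp small negatives caused by imperfect optimization.
    stat = max(0.0, 2.0 * (loglik_alt - loglik_null))
    return stat, chi2_sf(stat, df)
```

With `df=2`, `lrt(-1000.0, -995.0)` gives a statistic of 10 and a p-value of exp(-5), roughly 0.0067; reaching p <= 0.05 requires a statistic of about 5.99 or more.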

MSS Methodology

BUSTED can also incorporate models of selection on synonymous substitutions (MSS models). This is a new comparative framework for estimating selection on synonymous substitutions. These models account for selection by partitioning synonymous substitutions into multiple classes and estimating relative substitution rates for each, while also considering confounders like mutation bias. This framework allows for the study of selection on synonymous substitutions in diverse taxa without prior assumptions about the driving forces. For more information, please see the source publication: http://pubmed.ncbi.nlm.nih.gov/40129111/

Input

  1. A FASTA sequence alignment.
  2. A phylogenetic tree in Newick format.

Note: the names of sequences in the alignment must match the names of the sequences in the tree.
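
A quick pre-flight check can catch name mismatches before a job is submitted. The sketch below compares FASTA header names with Newick leaf names. It assumes unquoted Newick names without spaces and treats the whole header line as the sequence name. Both are simplifications that may not match how HyPhy parses names, and the function names are hypothetical.

```python
import re


def fasta_names(fasta_text: str) -> set[str]:
    # A sequence name is its header line minus the leading '>'.
    return {line[1:].strip() for line in fasta_text.splitlines() if line.startswith(">")}


def newick_leaf_names(newick: str) -> set[str]:
    # Leaf names follow '(' or ',' and end at ':', '{', ',', ')' or ';'.
    # Internal node labels follow ')' and are therefore not captured.
    return set(re.findall(r"[(,]\s*([^(),:;{}\s]+)", newick))


def check_names(fasta_text: str, newick: str) -> tuple[set[str], set[str]]:
    """Return (names only in the alignment, names only in the tree)."""
    seqs, leaves = fasta_names(fasta_text), newick_leaf_names(newick)
    return seqs - leaves, leaves - seqs
```

Both returned sets should be empty before running the analysis.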

Output

A JSON file with analysis results (http://hyphy.org/resources/json-fields.pdf).

The analysis reports whether there is statistical evidence of episodic diversifying selection at one or more sites on the tested branches, taken together. BUSTED is a gene-wide test: it does not identify which individual branches are under selection.
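
A minimal sketch of pulling the gene-wide result out of the JSON file. The key layout `result["test results"]["p-value"]` is an assumption about HyPhy's output and should be confirmed against json-fields.pdf; the 0.05 threshold is a conventional choice, not a HyPhy default, and the function names are hypothetical.

```python
import json


def load_busted(path: str) -> dict:
    # Read the JSON results file produced by the analysis.
    with open(path) as fh:
        return json.load(fh)


def summarize_busted(result: dict, alpha: float = 0.05) -> str:
    # Assumed layout: result["test results"]["p-value"] (verify for your version).
    p = result["test results"]["p-value"]
    verdict = "evidence" if p <= alpha else "no evidence"
    return f"{verdict} of episodic diversifying selection on the tested branches (p = {p:.4g})"
```

Usage, with a hypothetical file name: `print(summarize_busted(load_busted("busted_results.json")))`.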

A custom visualization module for viewing these results is available (see http://vision.hyphy.org/BUSTED for an example).

Tool options

--code              Which genetic code to use.

--alignment         An in-frame codon alignment in one of the formats supported by HyPhy.

--tree              A phylogenetic tree (optionally annotated with {}).

--branches          Which branches should be tested for selection?
                    All [default]: test all branches.
                    Internal: test only internal branches (suitable, for example, for intra-host
                    pathogen evolution, where terminal branches may contain polymorphism data).
                    Leaves: test only terminal (leaf) branches.
                    Unlabeled: if the Newick string is labeled using the {} notation, test only
                    branches without explicit labels (see http://hyphy.org/tutorials/phylotree/).
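
The four --branches choices amount to a filter over the tree's branches. The sketch below applies that filter to a hand-built branch list for the tree ((A,B{Foreground})n1,C). The `Branch` type and `select_branches` function are hypothetical and do not parse Newick; they only illustrate which branches each choice keeps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Branch:
    name: str
    is_leaf: bool
    label: Optional[str] = None  # e.g. "Foreground" from B{Foreground}


def select_branches(branches: list[Branch], mode: str) -> list[str]:
    # One predicate per --branches choice (illustrative only).
    keep = {
        "All": lambda b: True,
        "Internal": lambda b: not b.is_leaf,
        "Leaves": lambda b: b.is_leaf,
        "Unlabeled": lambda b: b.label is None,
    }
    if mode not in keep:
        raise ValueError(f"unknown mode: {mode}")
    return [b.name for b in branches if keep[mode](b)]


tree = [Branch("A", True), Branch("B", True, "Foreground"), Branch("n1", False), Branch("C", True)]
```

Here `select_branches(tree, "Unlabeled")` keeps A, n1 and C, leaving out only the explicitly labeled branch B.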

--kill-zero-lengths Automatically delete internal zero-length branches for computational efficiency.
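
Deleting a zero-length internal branch merges its child clades into the parent as a polytomy. Every root-to-tip distance is unchanged, and one branch no longer has to be handled during fitting. A minimal sketch on a hypothetical `Node` type follows; HyPhy's own procedure may differ in details, for example in how it treats branches that are only approximately zero.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    length: float = 0.0
    children: list["Node"] = field(default_factory=list)


def collapse_zero_internal(node: Node) -> Node:
    """Return a copy of `node` with zero-length internal branches removed."""
    new_children = []
    for child in node.children:
        child = collapse_zero_internal(child)  # process the subtree first
        if child.children and child.length == 0.0:
            # Internal branch of length 0: splice its children onto this node.
            new_children.extend(child.children)
        else:
            # Leaves (even zero-length ones) and non-zero internal branches stay.
            new_children.append(child)
    return Node(node.name, node.length, new_children)
```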

Advanced parameters
...................

--srv               Include synonymous rate variation in the model.

--grid-size         The number of points in the initial distributional guess for likelihood fitting.

--starting-points   The number of initial random guesses to seed rate values optimization.
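
--grid-size and --starting-points describe a common multi-start strategy: score many candidate parameter vectors cheaply, then run full optimization only from the best few. The sketch below shows that generic idea with uniform random sampling and a caller-supplied objective. It is not HyPhy's optimizer, and the function name and sampling scheme are assumptions.

```python
import random
from typing import Callable, Sequence


def pick_starting_points(
    loglik: Callable[[tuple[float, ...]], float],
    bounds: Sequence[tuple[float, float]],
    grid_size: int,
    starting_points: int,
    seed: int = 0,
) -> list[tuple[float, ...]]:
    """Score `grid_size` random candidates and keep the best `starting_points`."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    grid = [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in range(grid_size)]
    # Highest log-likelihood first; each survivor would seed a local optimizer.
    grid.sort(key=loglik, reverse=True)
    return grid[:starting_points]
```

A larger grid makes it more likely that a good region is found before the expensive optimization starts, at the cost of more likelihood evaluations.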

--syn-rates         The number of synonymous rate classes to include in the model [1-10, default 3].

--rates             The number of non-synonymous rate classes to include in the model [1-10, default 3].
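
With --srv, synonymous rates vary across sites according to a discrete distribution with --syn-rates classes. A common convention, assumed here rather than taken from HyPhy's source, is to rescale such a distribution so that its weighted mean is 1, which makes each class rate a relative multiplier. A minimal sketch with a hypothetical function name:

```python
def mean_one_rates(rates: list[float], weights: list[float]) -> tuple[list[float], list[float]]:
    """Rescale rate classes so that their weighted mean is 1."""
    if not rates or len(rates) != len(weights):
        raise ValueError("need one weight per rate class")
    total = sum(weights)
    probs = [w / total for w in weights]  # normalize weights to probabilities
    mean = sum(p * r for p, r in zip(probs, rates))
    return [r / mean for r in rates], probs
```

Rescaling preserves the ratios between classes; only the overall scale, which is confounded with branch lengths, is removed.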

--multiple-hits     Include support for multiple nucleotide substitutions.
                    None: No correction.
                    Double: Allow double substitutions.
                    Double+Triple: Allow double and triple substitutions.
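
In standard codon models, only codon pairs that differ at a single nucleotide have a nonzero instantaneous substitution rate. The Double and Double+Triple settings also allow changes at two or three positions at once. The sketch below expresses that rule; the function names and the mode strings, which mirror the option values, are illustrative and not HyPhy code.

```python
MAX_HITS = {"None": 1, "Double": 2, "Double+Triple": 3}


def codon_differences(c1: str, c2: str) -> int:
    # Number of nucleotide positions at which two codons differ.
    if len(c1) != 3 or len(c2) != 3:
        raise ValueError("codons must have length 3")
    return sum(a != b for a, b in zip(c1.upper(), c2.upper()))


def instantaneous_change_allowed(c1: str, c2: str, mode: str = "None") -> bool:
    # A direct change is allowed if 1 <= differences <= the mode's maximum.
    return 0 < codon_differences(c1, c2) <= MAX_HITS[mode]
```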

--error-sink        [Advanced experimental setting] Include a rate class to capture misalignment artifacts.

--mss               Include multiple classes of synonymous substitution rates (MSS models).

--mss-type          How to partition synonymous codons into classes.
                    Full: Each set of codons mapping to the same amino-acid class has a separate substitution rate (Valine == neutral)
                    SynREV: Each set of codons mapping to the same amino-acid class has a separate substitution rate (mean = 1)
                    SynREV2: Each pair of synonymous codons mapping to the same amino-acid class and separated by a transition has a separate substitution rate (no rate scaling)
                    SynREV2g: Each pair of synonymous codons mapping to the same amino-acid class and separated by a transition has a separate substitution rate (Valine == neutral). All between-class synonymous substitutions share a rate.
                    SynREVCodon: Each codon pair that is exchangeable gets its own substitution rate (fully estimated, mean = 1)
                    Random: Random partition (specify how many classes; largest class = neutral)
                    Empirical: Load a TSV file with an empirical rate estimate for each codon pair
                    File: Load a TSV partition from file (prompted for neutral class)
                    Codon-file: Load a TSV partition for pairs of codons from a file (prompted for neutral class)
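
The simplest partitions above assign one rate class per amino acid. As a rough sketch (not HyPhy's implementation; all names are hypothetical), the code below builds that grouping for the standard genetic code by collecting synonymous codon pairs that differ at exactly one nucleotide.

```python
from itertools import combinations

BASES = "TCAG"
# NCBI standard genetic code (table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_partition(code: dict[str, str] = CODE) -> dict[str, list[tuple[str, str]]]:
    """Group one-nucleotide synonymous codon pairs by amino acid."""
    classes: dict[str, list[tuple[str, str]]] = {}
    for c1, c2 in combinations(sorted(code), 2):
        aa = code[c1]
        if aa == "*" or aa != code[c2]:
            continue  # skip stop codons and non-synonymous pairs
        if sum(x != y for x, y in zip(c1, c2)) == 1:
            classes.setdefault(aa, []).append((c1, c2))
    return classes
```

Under the standard code this yields 67 single-nucleotide synonymous pairs spread over 18 amino acids; methionine and tryptophan have a single codon each and get no class. Options such as SynREV2 would further split or filter these pairs, for example by whether the change is a transition.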

--mss-file          File defining the model partition.

--mss-reference-rate Normalize relative to these rates.

--mss-classes       How many codon rate classes.

--mss-neutral       Designation for the neutral substitution rate.