Galaxy integration: Victor Mataigne and ABIMS team.
Contact support.abims@sb-roscoff.fr for any questions or concerns about the Galaxy implementation of this tool.
CompCodeML (from paml package)
Brief help is given below; the full codeml documentation can be found on the paml website.
Because codeml has many parameters, some incompatible parameter combinations may still be selectable in this tool.
This Galaxy implementation:
We recommend reading the full paml manual before changing the advanced parameters, in order to spot incompatible parameters and to understand what each model does. If you choose incompatible parameters by mistake, the output files will be empty, except for the log file ("run_codeml" output), which will normally report the error.
Known incompatibilities:
Description
codeml detects positive selection acting on particular branches of a tree or on particular codons, given a tree and a set of sequences.
Input files
Parameters
Several models are available.
In short, this tool writes a configuration file called codeml.ctl containing the specified parameters and then launches codeml.
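As a rough illustration of that workflow, the sketch below builds the text of a codeml.ctl file from a dictionary of parameters and shows how codeml could then be launched. It is not the code used by this Galaxy tool: the function names, the file names (alignment.phy, tree.nwk, results.txt) and the choice of parameters are assumptions made for the example, and a real control file usually contains more entries than shown here.

```python
import subprocess
from pathlib import Path


def render_ctl(params):
    """Return control-file text with one 'name = value' line per parameter."""
    lines = [f"{key:>10} = {value}" for key, value in params.items()]
    return "\n".join(lines) + "\n"


def run_codeml(params, workdir="."):
    """Write codeml.ctl in workdir and launch codeml on it.

    Assumes a 'codeml' executable is available on the PATH.
    """
    ctl_path = Path(workdir) / "codeml.ctl"
    ctl_path.write_text(render_ctl(params))
    subprocess.run(["codeml", ctl_path.name], cwd=workdir, check=True)


# Hypothetical input and output names, with one-ratio (M0) settings.
m0_params = {
    "seqfile": "alignment.phy",
    "treefile": "tree.nwk",
    "outfile": "results.txt",
    "seqtype": 1,      # 1 = codon sequences
    "model": 0,        # one omega ratio for all branches
    "NSsites": 0,      # one omega ratio for all sites
    "fix_kappa": 0,    # estimate kappa ...
    "kappa": 2,        # ... starting from this initial value
}

ctl_text = render_ctl(m0_params)
```

Calling run_codeml(m0_params) would write the file and start codeml; it is left uncalled here because it needs the paml programs and the input files to be present.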
Branch models allow the omega ratio to vary among branches in the phylogeny and are useful for detecting positive selection acting on particular lineages. Sites models allow the omega ratio to vary among sites (codons or amino acids).
Two pairs of models appear to be particularly useful, forming two likelihood ratio tests of positive selection. The first compares M1a ('NearlyNeutral', NSsites=1) and M2a ('PositiveSelection', NSsites=2), while the second compares M7 ('beta', NSsites=7) and M8 ('beta&ω', NSsites=8).
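The likelihood ratio tests mentioned above can be sketched as follows. The example assumes that each comparison (M1a vs M2a, M7 vs M8) is tested against a chi-square distribution with 2 degrees of freedom, which is the usual convention for these pairs; check the paml manual before relying on it. The function name and the log-likelihood values are hypothetical, and in practice the lnL values are read from the codeml output of each run.

```python
import math


def lrt_two_df(lnl_null, lnl_alt):
    """Likelihood ratio test with 2 degrees of freedom.

    lnl_null: log-likelihood of the simpler model (e.g. M1a or M7)
    lnl_alt:  log-likelihood of the model allowing positive selection
              (e.g. M2a or M8)
    Returns the test statistic and its p-value.
    """
    # The statistic cannot be negative in theory; clamp small numerical noise.
    statistic = max(2.0 * (lnl_alt - lnl_null), 0.0)
    # For a chi-square distribution with 2 degrees of freedom,
    # the upper-tail probability is exactly exp(-x / 2).
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


# Hypothetical log-likelihoods for an M1a / M2a comparison.
statistic, p_value = lrt_two_df(lnl_null=-1000.0, lnl_alt=-995.0)
```

A small p-value suggests that the model allowing positive selection fits the data significantly better than the simpler model.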
Other examples of model
How to run the branch-site models (A & B in Yang & Nielsen 2002 MBE)? The options are:
How to run the M0 (one-ratio) model:
model = 0, NSsites = 0.
Advanced Parameters
See paml complete manual and FAQ on the paml website.
Details of some parameters:
'kappa' denotes the transition/transversion rate ratio.
'fix_kappa' specifies whether kappa in K80, F84, or HKY85 is given at a fixed value or is to be estimated by iteration from the data.
-> If fix_kappa = 1 (fixed), the value of kappa is the given value
-> If fix_kappa = 0 (estimated) the value of kappa is used as the initial estimate for iteration.
'alpha' refers to the shape parameter alpha of the gamma distribution for variable substitution rates across sites (Yang 1994a).
'fix_alpha' works in the same way as fix_kappa.
-> The model of a single rate for all sites is specified as fix_alpha = 1 and alpha = 0 (0 means infinity)
-> The (discrete-) gamma model is specified by a positive value for alpha, and 'ncatG' is then the number of categories for the discrete-gamma model. Values such as 5, 4, 8, or 10 are reasonable.
fix_rho and rho work in a similar way and concern independence or correlation of rates at adjacent sites, where rho is the correlation parameter of the auto-discrete-gamma model (Yang 1995).
-> The model of independent rates for sites is specified as fix_rho = 1 and rho = 0; choosing alpha = 0 further means a constant rate for all sites.
-> The auto-discrete-gamma model is specified by positive values for both alpha and rho.
-> The model of a constant rate for sites is a special case of the (discrete) gamma model with alpha = 0 (means infinity).
-> The model of independent rates for sites is a special case of the auto-discrete-gamma model with rho = 0.
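The way alpha and rho select a rate model, as described above, can be summarised in a few lines of code. This is only a reading aid: the function name and the returned descriptions are made up for the example and do not correspond to anything in codeml itself.

```python
def describe_rate_model(alpha, rho):
    """Name the among-site rate model implied by the values of alpha and rho."""
    if alpha < 0 or rho < 0:
        raise ValueError("alpha and rho must not be negative")
    if alpha == 0:
        # alpha = 0 stands for infinity: every site has the same rate.
        return "constant rate for all sites"
    if rho == 0:
        # Positive alpha, no correlation between adjacent sites.
        return "discrete-gamma, independent rates"
    # Positive alpha and positive rho.
    return "auto-discrete-gamma, correlated rates at adjacent sites"


single_rate = describe_rate_model(alpha=0, rho=0)
gamma = describe_rate_model(alpha=0.5, rho=0)
auto_gamma = describe_rate_model(alpha=0.5, rho=0.2)
```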
Output files
How to manually edit the tree file: branch or node labels
Some models implemented in codeml allow several groups of branches on the tree, which are assigned different parameters of interest.
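The following two trees are equivalent:
- (((rabbit, rat) $1, human), goat_cow, marsupial);
- (((rabbit #1, rat #1) #1, human), goat_cow, marsupial);
# is the symbol for branch labels and $ is the symbol for clade labels. A clade label applies to every branch within the clade, including the branch leading to it.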
Rules concerning nested clade labels: the symbol # takes precedence over the symbol $, and clade labels close to the tips take precedence over clade labels for ancestral nodes close to the root.
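To make these rules concrete, here is a minimal sketch that rewrites clade labels ($) as the equivalent branch labels (#). It is an illustration, not part of codeml or of this tool, and it assumes a simple Newick tree without branch lengths or quoted names. All function names are hypothetical.

```python
import re

# Tokens: structural characters, #n or $n labels, or plain names.
TOKEN = re.compile(r"\s*([(),;]|[#$]\d+|[^\s(),;#$]+)")
STRUCTURAL = {"(", ")", ",", ";"}


def parse_newick(text):
    """Parse a labelled Newick string into nested dictionaries."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read_node():
        nonlocal pos
        node = {"name": "", "children": [], "branch": None, "clade": None}
        if tokens[pos] == "(":
            pos += 1
            node["children"].append(read_node())
            while tokens[pos] == ",":
                pos += 1
                node["children"].append(read_node())
            if tokens[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1
        # Read the optional name and labels that follow the node.
        while pos < len(tokens) and tokens[pos] not in STRUCTURAL:
            token = tokens[pos]
            if token.startswith("#"):
                node["branch"] = token
            elif token.startswith("$"):
                node["clade"] = token
            else:
                node["name"] = token
            pos += 1
        return node

    return read_node()


def write_with_branch_labels(node, inherited=None):
    """Serialise a node, replacing clade labels by branch labels."""
    # The nearest clade label (closest to the tips) wins over ancestral ones.
    clade = node["clade"] or inherited
    # An explicit # label takes precedence over any $ label.
    label = node["branch"] or ("#" + clade[1:] if clade else "")
    if node["children"]:
        inner = ", ".join(write_with_branch_labels(c, clade) for c in node["children"])
        text = "(" + inner + ")" + node["name"]
    else:
        text = node["name"]
    return text + (" " + label if label else "")


def expand_clade_labels(newick):
    return write_with_branch_labels(parse_newick(newick)) + ";"


expanded = expand_clade_labels("(((rabbit, rat) $1, human), goat_cow, marsupial);")
nested = expand_clade_labels("(((a, b #2) $1, c) $3, d);")
```

In the nested case, b keeps its own #2 label, a takes #1 from the innermost clade, and c takes #3 from the outer clade.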