
Phylocatenator (version 1.0.0)
Minimum genes per species
Minimum length of an aligned gene family to be included
Minimum species per gene
Only species in the list are retained in the concatenated file
Partition data by model (protein, DNA, binary, etc.) according to a lookup table (LUT)

Generate a concatenated dataset from a phytab data table for phylogenetic analysis.


What it does

This tool produces a concatenated data set for phylogenetics when not all genes are sampled for all species.


Basic Example

The input data must be in phytab column format: column 1 is the species name, column 2 the gene family, column 3 the individual gene name, and column 4 the sequence. The sequences within each gene family must be aligned:

species1      gene1   genenameA       acgttagcgcgctatagc
species2      gene1   genenameB       acgttag--cgctataaa
species3      gene1   genenameC       acgttagcgcgctatagc
species4      gene1   genenameD       acgttagcgcgctatagc
species1      gene2   genenameE       --gttagtttgcta
species3      gene2   genenameF       gtgttagtttgcta

Two variables, $gene and $species, set thresholds for inclusion of data. $species is the minimum number of species that must contain a particular gene family for that family to be included. $gene is the minimum number of gene families that a species must have to be included in the dataset.

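Running phylocatenator on the above data with 0 for genes and 0 for species yields the following. The first line gives the number of species and the total alignment length, and a gene family that is missing for a species is filled with question marks: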

4 32
species1      acgttagcgcgctatagc--gttagtttgcta
species2      acgttag--cgctataaa??????????????
species3      acgttagcgcgctatagcgtgttagtttgcta
species4      acgttagcgcgctatagc??????????????
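
The concatenation step described above can be sketched in Python. This is only an illustrative sketch, not the tool's own code: the function names `read_phytab` and `concatenate` are hypothetical, and the order in which the two thresholds are applied (gene families first, then species) is an assumption.

```python
from collections import defaultdict

PHYTAB = """\
species1 gene1 genenameA acgttagcgcgctatagc
species2 gene1 genenameB acgttag--cgctataaa
species3 gene1 genenameC acgttagcgcgctatagc
species4 gene1 genenameD acgttagcgcgctatagc
species1 gene2 genenameE --gttagtttgcta
species3 gene2 genenameF gtgttagtttgcta
"""

def read_phytab(text):
    """Split each non-empty line into (species, family, gene name, sequence)."""
    return [tuple(line.split()) for line in text.splitlines() if line.strip()]

def concatenate(rows, min_genes=0, min_species=0):
    """Return (number of species, alignment length, [(species, sequence)])."""
    families = defaultdict(dict)  # gene family -> {species: aligned sequence}
    for species, family, _name, seq in rows:
        families[family][species] = seq

    # Assumption: gene families are filtered first, then species.
    kept = [f for f in families if len(families[f]) >= min_species]
    lengths = {f: len(next(iter(families[f].values()))) for f in kept}

    counts = defaultdict(int)  # species -> number of kept gene families
    for f in kept:
        for sp in families[f]:
            counts[sp] += 1
    species = sorted(sp for sp, n in counts.items() if n >= min_genes)

    result = []
    for sp in species:
        # A missing gene family is padded with '?' to the family's length.
        parts = [families[f].get(sp, "?" * lengths[f]) for f in kept]
        result.append((sp, "".join(parts)))
    return len(species), sum(lengths.values()), result

n_species, length, rows = concatenate(read_phytab(PHYTAB))
print(n_species, length)
for sp, seq in rows:
    print(sp, seq, sep="\t")
```

Raising `min_species` to 3 in this sketch would drop gene2, which is sampled in only two species, and raising `min_genes` to 2 would drop species2 and species4.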

Optional Functionality

I. You may enter a list of species. Species not in this list will not be written to the output file. For example, a species list of:

species1
species2

would change the above output to:

species1      acgttagcgcgctatagc--gttagtttgcta
species2      acgttag--cgctataaa??????????????
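
As a rough illustration of the species filter, the following Python sketch keeps only the listed species from an already concatenated dataset. The function name `keep_listed_species` is hypothetical, and the sketch assumes the filter is a simple membership test on the species name.

```python
def keep_listed_species(rows, species_list):
    """Keep only the (species, sequence) pairs whose species is in the list."""
    wanted = set(species_list)
    return [(species, seq) for species, seq in rows if species in wanted]

concatenated = [
    ("species1", "acgttagcgcgctatagc--gttagtttgcta"),
    ("species2", "acgttag--cgctataaa" + "?" * 14),
    ("species3", "acgttagcgcgctatagcgtgttagtttgcta"),
    ("species4", "acgttagcgcgctatagc" + "?" * 14),
]

kept = keep_listed_species(concatenated, ["species1", "species2"])
for species, seq in kept:
    print(species, seq, sep="\t")
```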
II. Table of partition models

You may enter a table of models for each gene family/partition. Phylocatenator will then sort the data so that all gene families with the same model are grouped together. It will then create the corresponding partition file, which specifies each model for RAxML. Currently, data can only be partitioned into valid RAxML models.

The format is a tab-delimited file as follows:

gene1 WAG
gene2 JTT
gene3 DNA
gene4 WAG

Valid models include the following:

BIN = binary morphological data
MULTI = multistate morphological data
DNA = DNA data
WAG = one of several protein models listed in the RAxML help documents
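
The grouping by model and the resulting partition file can be sketched as follows. This is a sketch under stated assumptions: the function names and the alignment lengths are made up for illustration, each gene family is written as its own partition line, and the lines follow the general RAxML partition-file layout (`MODEL, name = start-end`). The exact file written by Phylocatenator may differ.

```python
MODEL_TABLE = """\
gene1\tWAG
gene2\tJTT
gene3\tDNA
gene4\tWAG
"""

# Hypothetical alignment lengths for each gene family (illustration only).
LENGTHS = {"gene1": 120, "gene2": 90, "gene3": 300, "gene4": 60}

def read_models(text):
    """Parse the model table into {gene family: model}."""
    models = {}
    for line in text.splitlines():
        if line.strip():
            family, model = line.split()
            models[family] = model
    return models

def build_partitions(lengths, models):
    """Order gene families by model and compute their column ranges."""
    ordered = sorted(lengths, key=lambda fam: (models[fam], fam))
    lines, start = [], 1
    for fam in ordered:
        end = start + lengths[fam] - 1
        lines.append(f"{models[fam]}, {fam} = {start}-{end}")
        start = end + 1
    return ordered, lines

order, partition_lines = build_partitions(LENGTHS, read_models(MODEL_TABLE))
print("\n".join(partition_lines))
```

The sequences would then be concatenated in the order given by `order`, so that the column ranges in the partition lines match the concatenated alignment.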
III. Attribute

You may enter a table with an attribute/value for each gene family/partition. Phylocatenator will then select the data based on that value.

The format is a tab-delimited file as follows:

gene1 3.1
gene2 2.2
gene3 0.9
gene4 6.5

You can then select gene partitions based on the attribute value. For example, if the numbers above represent rates of evolution, you could choose to include only 'slow' genes with a rate less than 2.5.
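
A minimal Python sketch of this selection, using the table above and the 'less than 2.5' example, might look like this. The function name `select_by_attribute` and the strict less-than comparison are assumptions for illustration, not the tool's actual interface.

```python
ATTRIBUTE_TABLE = """\
gene1\t3.1
gene2\t2.2
gene3\t0.9
gene4\t6.5
"""

def select_by_attribute(text, max_value):
    """Return the gene families whose attribute value is below max_value."""
    selected = []
    for line in text.splitlines():
        if not line.strip():
            continue
        family, value = line.split()
        if float(value) < max_value:
            selected.append(family)
    return selected

slow_genes = select_by_attribute(ATTRIBUTE_TABLE, 2.5)
print(slow_genes)
```

Only the selected gene families would then be passed on to the concatenation step.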