
Cactus (version 2.7.1+galaxy0)
The taxonomic relationship between the input genomes. If the genomes are from multiple individuals of the same species, select ‘Within-species’.
Phylogenetic tree in Newick format. Cactus requires it to achieve linear scaling with the number of input genomes.

What it does

Cactus is a reference-free whole-genome multiple alignment program. It can be used to progressively align a large number of genomes.


Usage

Between-species mode (Progressive Cactus)

If you are aligning genomes from multiple species, you need to provide a guide tree in Newick format. Cactus uses the guide tree to progressively align genomes, meaning that it doesn’t need to align all possible pairs of genomes.
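To get a feel for why a guide tree helps, the sketch below compares the number of genome pairs with the number of internal nodes in a fully resolved (binary) rooted tree. It assumes, purely for illustration, roughly one alignment step per internal node; the function names are hypothetical and this is not a description of Cactus internals.

```python
def all_pairs(n_genomes):
    """Number of distinct genome pairs: n * (n - 1) / 2."""
    return n_genomes * (n_genomes - 1) // 2


def internal_nodes(n_genomes):
    """Internal nodes in a rooted binary tree with n leaves: n - 1."""
    return n_genomes - 1


# Assumption for illustration: one alignment step per internal node.
for n in (4, 10, 100):
    print(f"{n} genomes: {all_pairs(n)} pairs vs {internal_nodes(n)} internal nodes")
```

The pair count grows quadratically with the number of genomes, while the internal-node count grows linearly.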

A Newick-formatted tree for human, chimp, gorilla and orangutan genomes looks like this:

(((human:0.006,chimp:0.006667):0.0022,gorilla:0.008825):0.0096,orang:0.01831);

The number after each colon is the length of the branch leading to that leaf or internal node.
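As a rough illustration of how the tree is structured, the following sketch pulls the leaf names and their branch lengths out of the example tree using only a regular expression. The helper name `leaf_branch_lengths` is hypothetical, and the pattern assumes simple, unquoted leaf names; a dedicated Newick parser would be a better choice for real data.

```python
import re

NEWICK = "(((human:0.006,chimp:0.006667):0.0022,gorilla:0.008825):0.0096,orang:0.01831);"


def leaf_branch_lengths(newick):
    """Return {leaf_name: branch_length} for a simple Newick string.

    A leaf is taken to be a name that directly follows '(' or ',' and is
    followed by ':length'. Lengths that follow ')' belong to internal
    nodes and are skipped.
    """
    pattern = re.compile(r"[(,]\s*([^():,;\s]+)\s*:\s*([0-9.eE+-]+)")
    return {name: float(length) for name, length in pattern.findall(newick)}


leaves = leaf_branch_lengths(NEWICK)
for name, length in leaves.items():
    print(f"{name}\t{length}")
```

The values 0.0022 and 0.0096 do not appear in the result because they are the lengths of the branches above the two internal nodes.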

Within-species mode (Minigraph-Cactus)

You can also run Cactus in pangenome mode to align genomes of multiple individuals from the same species. In this mode you will not use a guide tree. Cactus will use minigraph to generate a graph of the input genomes and then use the graph to order the alignments. To use pangenome mode, select ‘Within-species’ in the ‘Alignment mode’ dropdown.

Unlike Between-species mode, Within-species mode depends on a predetermined reference genome.


Input

The developers recommend soft-masking your genomes with RepeatMasker before running Cactus. RepeatMasker is available on Galaxy.
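Soft-masking marks repeats by writing them in lowercase rather than replacing them with N. A quick way to check whether a FASTA file has been soft-masked at all is to measure the fraction of lowercase bases, as in this minimal sketch. The function name `soft_masked_fraction` and the example sequences are made up for illustration.

```python
def soft_masked_fraction(fasta_text):
    """Return the fraction of sequence characters that are lowercase."""
    masked = 0
    total = 0
    for line in fasta_text.splitlines():
        seq = line.strip()
        # Skip header lines and blank lines.
        if not seq or seq.startswith(">"):
            continue
        total += len(seq)
        masked += sum(1 for base in seq if base.islower())
    return masked / total if total else 0.0


# Hypothetical two-record FASTA used only to exercise the function.
EXAMPLE = ">chr1\nACGTacgtNN\n>chr2\nacgtACGTAC\n"
fraction = soft_masked_fraction(EXAMPLE)
print(f"{fraction:.1%} of bases are soft-masked")
```

A result of zero suggests the genome has not been soft-masked, or that it was hard-masked with N instead.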

If you’re using Between-species mode, you need to give each FASTA file a label that exactly matches a leaf name in the guide tree. In the example above, you would use the label ‘human’ for the human FASTA file.
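Label mismatches are easy to miss, so it can help to compare the labels with the tree before submitting the job. The sketch below is one possible check; `tree_leaves` and `check_labels` are hypothetical helpers, and the regular expression assumes simple, unquoted leaf names.

```python
import re


def tree_leaves(newick):
    """Return the set of leaf names in a simple Newick string."""
    # Leaf names directly follow '(' or ','; internal labels follow ')'.
    return set(re.findall(r"[(,]\s*([^():,;\s]+)", newick))


def check_labels(newick, labels):
    """Return (leaves without a label, labels not found in the tree)."""
    leaves = tree_leaves(newick)
    labels = set(labels)
    return sorted(leaves - labels), sorted(labels - leaves)


tree = "(((human:0.006,chimp:0.006667):0.0022,gorilla:0.008825):0.0096,orang:0.01831);"
missing, unexpected = check_labels(tree, ["human", "chimp", "gorilla", "orangutan"])
print("Leaves without a matching label:", missing)
print("Labels not in the tree:", unexpected)
```

Here the label ‘orangutan’ does not match the leaf ‘orang’, so both lists are non-empty; renaming the label to ‘orang’ would make both lists empty.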


Output

The main output of Cactus is in Hierarchical Alignment (HAL) format. You can use the Cactus: export tool to convert the HAL output to a VG or Multiple Alignment Format (MAF) file.