Galaxy | Tool Preview

CheckM lineage_wf (version 1.2.3+galaxy0)
Bin placement in the genome tree and marker gene identifications
Bin lineage-specific marker set inferences
Bin assessments

What it does

CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes. It provides robust estimates of genome completeness and contamination by using collocated sets of genes that are ubiquitous and single-copy within a phylogenetic lineage. Genome quality can also be examined using plots of key genomic characteristics (e.g., GC content, coding density), which highlight sequences outside the expected distributions of a typical genome. CheckM also provides tools for identifying genome bins that are likely candidates for merging based on marker set compatibility, similarity in genomic characteristics, and proximity within a reference genome tree.
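
To make the idea of marker-set-based estimates concrete, here is a minimal sketch of how completeness and contamination could be computed from collocated marker sets. It is a simplified illustration, not CheckM's implementation: the function name, the input layout, and the example marker identifiers are all hypothetical, and the averaging over marker sets is an assumption based on the approach described for CheckM.

```python
def estimate_quality(marker_sets, gene_counts):
    """Estimate completeness and contamination (both in percent).

    marker_sets: list of sets of marker gene IDs (hypothetical collocated sets).
    gene_counts: dict mapping marker gene ID -> copies found in the genome bin.
    Assumption: each marker set contributes equally, whatever its size.
    """
    if not marker_sets:
        raise ValueError("at least one marker set is required")

    completeness = 0.0
    contamination = 0.0
    for marker_set in marker_sets:
        # Fraction of this set's markers found at least once.
        present = sum(1 for g in marker_set if gene_counts.get(g, 0) >= 1)
        # Extra copies of markers that are expected to be single-copy.
        extra = sum(gene_counts.get(g, 0) - 1
                    for g in marker_set if gene_counts.get(g, 0) > 1)
        completeness += present / len(marker_set)
        contamination += extra / len(marker_set)

    n_sets = len(marker_sets)
    return 100.0 * completeness / n_sets, 100.0 * contamination / n_sets


if __name__ == "__main__":
    # Hypothetical marker IDs; B is found twice, E and F are missing.
    sets = [{"A", "B"}, {"C"}, {"D", "E", "F"}]
    counts = {"A": 1, "B": 2, "C": 1, "D": 1}
    comp, cont = estimate_quality(sets, counts)
    print(f"completeness={comp:.2f}% contamination={cont:.2f}%")
```

Grouping markers into sets means that a single missing genomic region, which may carry several collocated markers, is not counted once per marker.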

This command runs the recommended workflow for assessing the completeness and contamination of genome bins, which uses lineage-specific marker sets. The workflow consists of 4 mandatory (M) steps and 1 recommended (R) step:

    1. (M) The tree command places genome bins into a reference genome tree.
    2. (R) The tree_qa command reports the number of phylogenetically informative marker genes found in each genome bin, along with a taxonomic string indicating its approximate placement in the tree.

    If desired, genome bins with few phylogenetically informative marker genes may be removed to reduce the computational requirements of the following commands. Alternatively, if only genomes from a particular taxonomic group are of interest, these can be moved to a new directory and analyzed separately.

    3. (M) The lineage_set command creates a marker file indicating lineage-specific marker sets suitable for evaluating each genome bin.
    4. (M) The analyze command identifies marker genes and estimates the completeness and contamination of each genome bin.
    5. (M) The qa command can be used to produce different tables summarizing the quality of each genome bin.
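
For readers who want to run the steps individually rather than through lineage_wf, the sequence above can be sketched as a small Python helper that builds the five command lines. This is a sketch under stated assumptions: the helper name `lineage_wf_commands`, the directory names, and the default marker file name `lineage.ms` are hypothetical, and the argument order reflects the standalone CheckM command-line usage as commonly documented, so confirm it with `checkm <command> -h` for your installed version.

```python
import shlex
from pathlib import Path


def lineage_wf_commands(bin_dir, out_dir, marker_file=None):
    """Build the five CheckM commands of the lineage-specific workflow.

    Returns a list of argv lists; nothing is executed here.
    Assumption: positional argument order as in the CheckM v1 CLI.
    """
    bin_dir, out_dir = str(bin_dir), str(out_dir)
    # Hypothetical default location for the marker file.
    marker_file = str(marker_file or Path(out_dir) / "lineage.ms")
    return [
        ["checkm", "tree", bin_dir, out_dir],                  # (M) place bins in the tree
        ["checkm", "tree_qa", out_dir],                        # (R) inspect tree placement
        ["checkm", "lineage_set", out_dir, marker_file],       # (M) infer marker sets
        ["checkm", "analyze", marker_file, bin_dir, out_dir],  # (M) identify marker genes
        ["checkm", "qa", marker_file, out_dir],                # (M) summarize bin quality
    ]


if __name__ == "__main__":
    # Print the commands for review instead of running them.
    for argv in lineage_wf_commands("bins", "checkm_out"):
        print(shlex.join(argv))
```

Each argv list can be passed to `subprocess.run(argv, check=True)` once the paths have been verified. Building the commands separately from running them keeps the sequence easy to review, and makes it simple to pause after tree_qa and remove bins before continuing.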