CheckM qa (version 1.2.5+galaxy0)
Each marker to exclude should be listed on a separate line of the file.
Output of the CheckM lineage_set or taxon_set tools
Output of the CheckM analyze tool
It will ignore co-located set structure
Generated by the coverage command

What it does

CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes. It provides robust estimates of genome completeness and contamination by using collocated sets of genes that are ubiquitous and single-copy within a phylogenetic lineage. Assessment of genome quality can also be examined using plots depicting key genomic characteristics (e.g., GC, coding density) which highlight sequences outside the expected distributions of a typical genome. CheckM also provides tools for identifying genome bins that are likely candidates for merging based on marker set compatibility, similarity in genomic characteristics, and proximity within a reference genome tree.
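
As a rough illustration of how collocated marker sets can feed into completeness and contamination estimates, the sketch below averages, over all marker sets, the fraction of markers found at least once (completeness) and the number of extra copies relative to set size (contamination). The function name, the marker ids, and the input layout are assumptions made for this example; it is a simplified sketch, not CheckM's own code.

```python
def estimate_quality(marker_sets, hit_counts):
    """Return (completeness %, contamination %) for one bin.

    marker_sets: list of non-empty sets of marker ids (hypothetical layout)
    hit_counts:  dict mapping marker id -> number of times it was identified
    """
    if not marker_sets:
        raise ValueError("at least one marker set is required")

    completeness = 0.0
    contamination = 0.0
    for marker_set in marker_sets:
        # Markers seen at least once count towards completeness.
        present = sum(1 for m in marker_set if hit_counts.get(m, 0) >= 1)
        # Every copy beyond the first counts towards contamination.
        extra = sum(hit_counts.get(m, 0) - 1
                    for m in marker_set if hit_counts.get(m, 0) > 1)
        completeness += present / len(marker_set)
        contamination += extra / len(marker_set)

    n_sets = len(marker_sets)
    return 100.0 * completeness / n_sets, 100.0 * contamination / n_sets


# Hypothetical marker ids: m1 found once, m2 missing, m3 found twice.
example_sets = [{"m1", "m2"}, {"m3"}]
example_hits = {"m1": 1, "m2": 0, "m3": 2}
print(estimate_quality(example_sets, example_hits))
```

Averaging per set rather than per marker means that a cluster of co-located markers lost or duplicated together is weighted as a single event.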

This command identifies marker genes in bins and calculates genome statistics.

Adjacent called genes matching the same marker gene may indicate a true duplication event, a gene calling error, or an assembly error. If adjacent genes hit distinct regions of the same marker gene HMM, CheckM assumes a gene calling error has occurred and concatenates the two genes. When this occurs, CheckM joins the gene ids of the two genes with a pair of ampersands (&&).
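
When post-processing outputs that list called gene ids, a concatenated id can be split back into its parts on the && separator. The helper name and the gene ids below are hypothetical and only illustrate the convention described above.

```python
def split_gene_ids(called_gene_id):
    """Split a reported gene id into the ids of the underlying called genes.

    An id without "&&" refers to a single gene and is returned as a
    one-element list.
    """
    return called_gene_id.split("&&")


# Hypothetical gene ids for illustration.
print(split_gene_ids("scaffold_1_12&&scaffold_1_13"))
print(split_gene_ids("scaffold_2_7"))
```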

Outputs

The output depends on the selected output format:

  1. Summary of bin completeness, contamination, and strain heterogeneity
    Bin Id: bin identifier derived from input FASTA file
    Marker lineage: indicates lineage used for inferring marker set (a precise indication of where a bin was placed in CheckM's reference tree can be obtained with the tree_qa command)
    No. genomes: number of reference genomes used to infer marker set
    No. markers: number of inferred marker genes
    No. marker sets: number of inferred co-located marker sets
    0-5+: number of times each marker gene is identified
    Completeness: estimated completeness
    Contamination: estimated contamination
    Strain heterogeneity: estimated strain heterogeneity
  2. Extended summary of bin quality (includes GC, genome size, coding density, ...)
  3. Summary of bin quality for increasingly basal lineage-specific marker sets
    Node Id: unique id of internal node in genome tree from which lineage-specific markers were inferred
  4. List of marker genes for each bin along with the number of times each marker was identified
    Node Id: unique id of internal node in genome tree from which lineage-specific markers were inferred
    Marker lineage: indicates lineage used for inferring marker set
    Useful for identifying lineage-specific gene loss or duplication
  5. List of bin id, marker gene id, and called gene id for each identified marker gene
  6. List of marker genes present multiple times in a bin
  7. List of marker genes present multiple times on the same scaffold
    Useful for identifying true gene duplication events, gene calling errors, or assembly errors. See the note on adjacent called genes above.
  8. List indicating the position of each marker gene within a bin
  9. Marker genes identified in each bin and their sequences
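
The summary in output 1 lends itself to simple downstream filtering. The sketch below assumes the table has been saved as tab-separated text with a header row containing the column names "Bin Id", "Completeness", and "Contamination" as listed above; the delimiter, the thresholds, and the example rows are illustrative assumptions, not values prescribed by CheckM.

```python
import csv
import io


def filter_bins(handle, min_completeness=90.0, max_contamination=5.0):
    """Return the ids of bins that pass both (arbitrary) thresholds."""
    reader = csv.DictReader(handle, delimiter="\t")
    kept = []
    for row in reader:
        completeness = float(row["Completeness"])
        contamination = float(row["Contamination"])
        if completeness >= min_completeness and contamination <= max_contamination:
            kept.append(row["Bin Id"])
    return kept


# Made-up rows standing in for a real summary table.
example = (
    "Bin Id\tCompleteness\tContamination\n"
    "bin_1\t98.2\t1.1\n"
    "bin_2\t61.0\t12.5\n"
)
print(filter_bins(io.StringIO(example)))
```

With a real file, pass an open file handle instead of the in-memory example, and adjust the thresholds to the quality criteria of the study.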