Smudgeplot (version 0.2.5+galaxy3)
Assume no heterozygosity in the genome - plotting a paralog structure.

What it does

This tool extracts heterozygous kmer pairs from kmer count databases and performs gymnastics with them. We are able to disentangle genome structure by comparing the sum of kmer pair coverages (CovA + CovB) to their relative coverage (CovB / (CovA + CovB)). Such an approach also allows us to analyze obscure genomes with duplications, various ploidy levels, etc.
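
The two quantities above can be illustrated with a short Python sketch. It is not Smudgeplot's own code: the function names are hypothetical, the coverage values are made up, and it assumes that CovB denotes the less-covered kmer of each pair and that the haploid (1n) coverage is already known.

```python
def pair_coordinates(cov_a, cov_b):
    """Return (total coverage, relative minor coverage) for one kmer pair."""
    minor = min(cov_a, cov_b)  # assumption: CovB is the less-covered kmer
    total = cov_a + cov_b      # CovA + CovB
    return total, minor / total  # CovB / (CovA + CovB)


def guess_structure(cov_a, cov_b, haploid_cov):
    """Hypothetical helper: name the haplotype structure a pair points to."""
    total, rel = pair_coordinates(cov_a, cov_b)
    copies = round(total / haploid_cov)  # genome copies covered by the pair
    minor_copies = round(rel * copies)   # copies carrying the minor kmer
    return "A" * (copies - minor_copies) + "B" * minor_copies


if __name__ == "__main__":
    n = 20  # assumed 1n coverage, chosen only for illustration
    for a, b in [(20, 20), (40, 20), (60, 20), (40, 40)]:
        print(a, b, pair_coordinates(a, b), guess_structure(a, b, n))
```

With an assumed 1n coverage of 20, a pair with coverages 40 and 20 has a total coverage of 60 and a relative coverage of 1/3, which this sketch labels AAB. Real data is noisy, so Smudgeplot works on the two-dimensional distribution of many pairs rather than classifying pairs one at a time.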

Smudgeplots are computed from raw or, even better, trimmed reads, and show the haplotype structure using heterozygous kmer pairs. For example:

Example smudgeplot graph

Every haplotype structure has a unique smudge on the graph and the heat of the smudge indicates how frequently the haplotype structure is represented in the genome compared to the other structures. The image above is an ideal case, where the sequencing coverage is sufficient to beautifully separate all the smudges, providing very strong and clear evidence of triploidy.

Please see Smudgeplot on GitHub for further documentation and tutorials.

Inputs

You have two choices when running Smudgeplot in Galaxy:

  1. Input reads file(s) for default kmer-counting with Jellyfish

This should be at least one file that provides coverage of your genome of interest. The tool accepts compressed (.gz) inputs. If you choose this option, you can optionally specify manual cutoff values for the kmer dump step. The Smudgeplot docs suggest running GenomeScope on a kmer histogram to choose reasonable lower and upper cutoff values.

  2. Input your own kmer dump file for more control of kmer counting parameters

This file would be created by running jellyfish count and then jellyfish dump - the process is well described on GitHub.
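
The lower and upper cutoffs mentioned under option 1 act as a coverage filter on the kmer dump. The following Python sketch shows the idea, assuming a two-column text dump with one "kmer count" pair per line (the column format of jellyfish dump) and inclusive bounds. The function name `filter_dump` and the example values are hypothetical; this is not the code the Galaxy tool runs.

```python
def filter_dump(lines, lower, upper):
    """Yield (kmer, count) for kmers whose count lies within [lower, upper]."""
    for line in lines:
        fields = line.split()  # accepts space- or tab-separated columns
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        kmer, count = fields[0], int(fields[1])
        if lower <= count <= upper:  # assumption: both bounds are inclusive
            yield kmer, count


if __name__ == "__main__":
    # Made-up dump lines: a likely sequencing error, a genomic kmer, a repeat.
    dump = ["AAAC 3", "ACGT 25", "GGTA 900"]
    for kmer, count in filter_dump(dump, lower=10, upper=500):
        print(kmer, count)
```

The lower cutoff is meant to drop low-coverage kmers that mostly come from sequencing errors, and the upper cutoff drops highly repetitive kmers, which is why a kmer histogram is a useful guide for choosing both.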

Outputs

Default operation

If choosing reads as the input, a default kmer counting procedure will be used to create a kmer dump. This default process is summarized as follows:

The kmer dump file is then used to create a smudgeplot: