Galaxy | Tool Preview

Purge overlaps (version 1.2.6+galaxy0)
PBCSTAT options
Calcuts options
Histogram plot options

Purpose

The purge_dups tools are designed to remove haplotigs and contig overlaps from a de novo assembly based on read depth. purge_dups can significantly improve genome assemblies by removing the overlaps and haplotigs caused by sequence divergence in heterozygous regions. This removes false duplications from primary draft assemblies while retaining completeness and sequence integrity, and it can also improve scaffolding.
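To make "based on read depth" concrete, here is a minimal sketch of the idea, not purge_dups' actual rules. It assumes three hypothetical depth cutoffs (`low`, `mid`, `high`). Positions below `low` are treated as junk, positions between `low` and `mid` (roughly half the main coverage peak) as haploid, positions between `mid` and `high` as diploid, and anything above `high` as repeat. A contig dominated by junk or haploid positions becomes a haplotig candidate. The function names and the `min_fraction` threshold are hypothetical. The real tool derives its cutoffs with calcuts and also requires an alignment to another contig, which is not modelled here.

```python
from typing import Sequence


def classify_depth(depth: float, low: float, mid: float, high: float) -> str:
    """Bin one position by read depth using three hypothetical cutoffs."""
    if depth < low:
        return "junk"      # too little read support
    if depth < mid:
        return "haploid"   # about half the main peak: possible haplotig
    if depth <= high:
        return "diploid"   # expected depth for collapsed sequence
    return "repeat"        # well above the main peak


def haploid_fraction(depths: Sequence[float], low: float, mid: float, high: float) -> float:
    """Fraction of a contig's positions that fall in the junk or haploid bins."""
    if not depths:
        return 0.0
    flagged = sum(classify_depth(d, low, mid, high) in ("junk", "haploid") for d in depths)
    return flagged / len(depths)


def is_haplotig_candidate(depths: Sequence[float], low: float, mid: float, high: float,
                          min_fraction: float = 0.8) -> bool:
    # Depth alone is only half the story: purge_dups also needs the contig to
    # align to another contig (the self-alignment step), which is not checked here.
    return haploid_fraction(depths, low, mid, high) >= min_fraction
```

For example, with cutoffs 5, 22 and 60, a contig whose depths are `[15, 16, 14, 30, 15]` has four of five positions in the haploid bin. `is_haplotig_candidate` therefore returns `True`.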


Pipeline Guide

Given a primary assembly and, optionally, an alternate assembly, follow the steps below to build your own purge_dups pipeline. Steps that do not depend on each other, such as steps 1 and 2, can be run simultaneously. Although step 5 is optional, we highly recommend running it, because assemblers may produce overrepresented sequences, which step 5 can remove.

  • Step 1: Calculate the coverage cutoffs and base coverages.
  • Step 2: Split an assembly with the split_fa function and perform a self-self alignment with minimap2.
  • Step 3: Purge haplotigs and overlaps with the purge_dups function.
  • Step 4: Get purged primary and haplotig sequences from the draft assembly with the get_seqs function.
  • Step 5: Merge the hap.fa file generated in the previous step with the alternate assembly, and repeat the steps above to obtain a decent haplotig set.
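The steps above can be sketched as the command lines they correspond to outside Galaxy. This sketch only builds and prints the commands and does not run them. The program names and flags follow the invocations in the purge_dups README as best recalled (`minimap2 -x map-pb`, `pbcstat`, `calcuts`, `split_fa`, `minimap2 -x asm5 -D -P`, `purge_dups -2 -T ... -c ...`, `get_seqs`), so check them against the help output of your installed versions. The `Step` class, the helper functions and all file names are hypothetical. The README compresses the read alignments with gzip. Here they are written uncompressed for simplicity, which assumes pbcstat also accepts plain PAF.

```python
import shlex
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    number: int                   # step number from the pipeline guide
    argv: List[str]               # program and its arguments
    stdout: Optional[str] = None  # file that standard output is redirected to

    def render(self) -> str:
        """Return the step as a quoted shell command line with redirection."""
        cmd = shlex.join(self.argv)
        return f"{cmd} > {shlex.quote(self.stdout)}" if self.stdout else cmd


def purge_dups_steps(pri_asm: str, reads: str) -> List[Step]:
    """Hypothetical helper: steps 1-4 for one primary assembly and one read file."""
    reads_paf = "reads.paf"
    split = f"{pri_asm}.split"
    self_paf = f"{split}.self.paf"
    return [
        # Step 1: map reads, compute base coverage (PB.base.cov, PB.stat), derive cutoffs.
        Step(1, ["minimap2", "-x", "map-pb", pri_asm, reads], stdout=reads_paf),
        Step(1, ["pbcstat", reads_paf]),
        Step(1, ["calcuts", "PB.stat"], stdout="cutoffs"),
        # Step 2: split the assembly and align it against itself.
        Step(2, ["split_fa", pri_asm], stdout=split),
        Step(2, ["minimap2", "-x", "asm5", "-D", "-P", split, split], stdout=self_paf),
        # Step 3: flag haplotigs and overlaps in a BED file.
        Step(3, ["purge_dups", "-2", "-T", "cutoffs", "-c", "PB.base.cov", self_paf],
             stdout="dups.bed"),
        # Step 4: write the purged primary (purged.fa) and haplotig (hap.fa) sequences.
        Step(4, ["get_seqs", "dups.bed", pri_asm]),
    ]


def second_round_input(alt_asm: str, merged: str = "merged_hap.fa") -> Step:
    """Hypothetical helper for step 5: concatenate hap.fa with the alternate assembly."""
    return Step(5, ["cat", "hap.fa", alt_asm], stdout=merged)


if __name__ == "__main__":
    for step in purge_dups_steps("primary.fa", "reads.fq.gz"):
        print(f"# step {step.number}")
        print(step.render())
    print(second_round_input("alternate.fa").render())
```

For step 5, the merged file would be passed back through `purge_dups_steps` as the new input assembly. Keeping the commands as argument lists and quoting them only when rendering avoids shell-quoting mistakes with unusual file names.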

Limitations


Purged assembly validation

There are several ways to validate the purged assembly: make a coverage plot for it, run BUSCO, or use Merqury.
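A coverage plot is essentially a histogram of read depth across the assembly's positions. The sketch below bins per-base depths and draws a text histogram. How you obtain the depths, for example by mapping the reads back to the purged assembly, is not shown, and the input list is hypothetical. If purging worked, one would expect fewer positions at roughly half the main coverage peak. The function names and the bin width are illustrative choices.

```python
from collections import Counter
from typing import Dict, Iterable


def depth_histogram(depths: Iterable[int], bin_width: int = 5) -> Dict[int, int]:
    """Count positions per read-depth bin; keys are the lower edge of each bin."""
    counts = Counter((d // bin_width) * bin_width for d in depths)
    return dict(sorted(counts.items()))


def print_histogram(hist: Dict[int, int], scale: int = 1) -> None:
    """Draw one text bar per bin, with one '#' for every `scale` positions."""
    for lower, n in hist.items():
        print(f"{lower:>4} | {'#' * (n // scale)}")


if __name__ == "__main__":
    # Hypothetical per-base depths: a main peak near 30x and a smaller one near 15x.
    depths = [14, 15, 16, 29, 30, 30, 31, 32, 31, 29]
    print_histogram(depth_histogram(depths))
```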