Repository dedup_hash
Name: dedup_hash
Owner: mvdbeek
Synopsis: This is a command-line utility to remove exact duplicate reads from paired-end fastq files.
This is a command-line utility to remove exact duplicate reads from
paired-end fastq files. Reads are assumed to be in 2 separate files (one
per mate). For each pair, the two read sequences are concatenated and a
short hash is calculated on the concatenated sequence. If that hash has
been seen previously, the read pair is dropped from the output. Because
only exact sequence matches are detected, read pairs that share the same
start and end coordinates but differ in length are not removed (although
each distinct variant is still "flattened" to at most 1 occurrence). This
algorithm is very simple and fast, and uses less memory than reading the
whole fastq file into memory, as fastuniq does.
Development repository: https://github.com/mvdbeek/dedup_hash
Type: unrestricted
Revision: 0:f33e9e6a6c88
This revision can be installed: True
Times cloned / installed: 333

Contents of this repository

Description: with fast and memory-efficient sequence hashes
Version: 0.1.1
Minimum Galaxy Version: any

Categories
Fastq Manipulation - Tools for manipulating fastq data