
Repository dedup_hash

Name: dedup_hash

Owner: mvdbeek

Synopsis: This is a command-line utility to remove exact duplicate reads from paired-end FASTQ files.

Detailed description:

This is a command-line utility to remove exact duplicate reads from
paired-end FASTQ files. Reads are assumed to be in two separate files. The
sequences of each read pair are concatenated and a short hash is calculated
on the concatenated sequence. If the hash has been seen previously, the read
pair is dropped from the output files. This means that reads that have the
same start and end coordinates but differ in length will not be removed (but
those will be "flattened" to at most one occurrence). This algorithm is very
simple and fast, and it saves memory compared to reading the whole FASTQ file
into memory, as fastuniq does.
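
The approach described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the tool's actual code: the function names `read_fastq` and `dedup_pairs` are hypothetical, and the choice of an 8-byte BLAKE2b digest from the standard library as the "short hash" is an assumption, since the description does not say which hash function is used. It also assumes plain four-line FASTQ records with both files in the same read order.

```python
import hashlib
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as lists of 4 lines (header, sequence, '+', quality)."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return  # end of file (or a truncated final record)
        yield record


def dedup_pairs(in1, in2, out1, out2):
    """Write each read pair whose concatenated sequence was not seen before.

    Only the short digests are kept in memory, not the reads themselves.
    Returns the number of pairs written.
    """
    seen = set()
    kept = 0
    for rec1, rec2 in zip(read_fastq(in1), read_fastq(in2)):
        # Concatenate the two mate sequences and hash the result.
        key = (rec1[1].strip() + rec2[1].strip()).encode()
        # Assumption: an 8-byte BLAKE2b digest stands in for the "short hash".
        digest = hashlib.blake2b(key, digest_size=8).digest()
        if digest in seen:
            continue  # duplicate pair: drop it from both outputs
        seen.add(digest)
        out1.writelines(rec1)
        out2.writelines(rec2)
        kept += 1
    return kept
```

The function works on any text file objects, for example `dedup_pairs(open("r1.fastq"), open("r2.fastq"), open("out1.fastq", "w"), open("out2.fastq", "w"))`, where the file names are placeholders. Because only fixed-size digests are stored, memory use grows with the number of distinct pairs rather than with the size of the input files; the trade-off is that a hash collision would cause a non-duplicate pair to be dropped.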

Content homepage: https://github.com/mvdbeek/dedup_hash

Development repository: https://github.com/mvdbeek/dedup_hash

Link to this repository: https://toolshed.g2.bx.psu.edu/view/mvdbeek/dedup_hash/f33e9e6a6c88

Clone this repository: hg clone https://toolshed.g2.bx.psu.edu/repos/mvdbeek/dedup_hash

Type: unrestricted

Revision: 0:f33e9e6a6c88

This revision can be installed: True

Times cloned / installed: 381

Contents of this repository

Valid tools
Name	Description	Version	Minimum Galaxy Version
Deduplicate FASTQ files	with fast and memory-efficient sequence hashes	0.1.1	any
