diff README.rst @ 0:f33e9e6a6c88 draft default tip

planemo upload for repository https://github.com/mvdbeek/dedup_hash commit 367da560c5924d56c39f91ef9c731e523825424b-dirty
author mvdbeek
date Wed, 23 Nov 2016 07:49:05 -0500
parents
children
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/README.rst	Wed Nov 23 07:49:05 2016 -0500
@@ -0,0 +1,26 @@
+.. image:: https://travis-ci.org/mvdbeek/dedup_hash.svg?branch=master
+    :target: https://travis-ci.org/mvdbeek/dedup_hash
+
+dedup_hash
+----------------------------
+
+
+This is a command-line utility that removes exact duplicate reads
+from paired-end fastq files. Reads are assumed to be in 2 separate
+files. The forward and reverse read sequences are concatenated, and a
+short hash is calculated on the concatenated sequence. If the hash has
+been previously seen, the read pair is dropped from the output files.
+Consequently, reads that share the same start and end coordinates but
+differ in length are not removed, while exact duplicates are
+"flattened" to at most one occurrence.
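
The idea above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the tool's actual code: it assumes well-formed 4-line FASTQ records and uses the standard-library ``hashlib`` in place of cityhash.

```python
# Minimal sketch of hash-based paired-end deduplication.
# Assumption: each FASTQ record is exactly 4 lines (@name, seq, +, qual).
import hashlib
from itertools import islice

def dedup_pairs(fastq1_lines, fastq2_lines):
    """Yield (record1, record2) pairs whose concatenated sequence
    has not been seen before."""
    seen = set()
    it1, it2 = iter(fastq1_lines), iter(fastq2_lines)
    while True:
        rec1 = list(islice(it1, 4))
        rec2 = list(islice(it2, 4))
        if not rec1 or not rec2:
            break
        # Hash only the concatenated forward + reverse sequences,
        # so only the short digests are kept in memory, not the reads.
        digest = hashlib.md5((rec1[1] + rec2[1]).encode()).digest()
        if digest not in seen:
            seen.add(digest)
            yield rec1, rec2

# Two read pairs with identical concatenated sequences: one survives.
r1 = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGT", "+", "IIII"]
r2 = ["@r1", "TTTT", "+", "IIII", "@r2", "TTTT", "+", "IIII"]
unique = list(dedup_pairs(r1, r2))
```

Storing fixed-size digests rather than full sequences is what keeps the memory footprint small relative to whole-file approaches.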
+
+This algorithm is very simple and fast, and it uses less memory than
+tools such as fastuniq, which read the whole fastq file into memory.
+
+Installation
+------------
+
+dedup_hash relies on the cityhash python package,
+which supports python-2.7 exclusively.
+
+``pip install dedup_hash``
+