# HG changeset patch
# User nick
# Date 1448334381 18000
# Node ID d2e46adc199ec81772bb0de575039a903eaaf2bb
planemo upload commit 35b743e6492923c0e2b1e5e434eaf4e56d268108
diff -r 000000000000 -r d2e46adc199e align_families.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/align_families.xml Mon Nov 23 22:06:21 2015 -0500
@@ -0,0 +1,64 @@
+
+
+ from duplex sequencing data
+
+ mafft
+ duplex
+ DUPLEX_DIR
+
+ python \$DUPLEX_DIR/align_families.py $input > $output
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+**What it does**
+
+This tool processes duplex sequencing data. It performs a multiple sequence alignment on each single-strand family of reads.
+
+-----
+
+**Input**
+
+The input must be in the format produced by the "Make families" tool.
+
+-----
+
+**Output**
+
+The output is a tabular file in which each line corresponds to a single read.
+
+The columns are::
+
+ 1: barcode (both tags)
+ 2: tag order in barcode ("ab" or "ba")
+ 3: read mate ("1" or "2")
+ 4: read name
+ 5: read sequence, aligned ("-" for gaps)
+ 6: read quality scores, aligned (" " for gaps)
+
+-----
+
+**Alignments**
+
+Each family is aligned with MAFFT, using the command
+::
+
+ $ mafft --nuc --quiet family.fa > family.aligned.fa
+
+
+
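The six-column output described in the align_families.xml help above can be read back with a few lines of Python. This is only a minimal sketch: the names `AlignedRead` and `parse_aligned_line` are hypothetical, the columns are assumed to be tab-separated (as in Galaxy's tabular datatype), and the aligned sequence and quality strings are assumed to have equal length.

```python
from typing import NamedTuple


class AlignedRead(NamedTuple):
    """One line of the "Align families" output (hypothetical helper type)."""
    barcode: str  # column 1: both tags
    order: str    # column 2: "ab" or "ba"
    mate: str     # column 3: "1" or "2"
    name: str     # column 4: read name
    seq: str      # column 5: aligned sequence, "-" for gaps
    quals: str    # column 6: aligned quality scores, " " for gaps


def parse_aligned_line(line: str) -> AlignedRead:
    # Strip only the line terminator. Gaps in the quality column are spaces,
    # so a plain .strip() could delete trailing gap characters.
    fields = line.rstrip("\r\n").split("\t")  # assumption: tab-separated
    if len(fields) != 6:
        raise ValueError("expected 6 columns, got %d" % len(fields))
    read = AlignedRead(*fields)
    if read.order not in ("ab", "ba"):
        raise ValueError("unexpected tag order: %r" % read.order)
    if read.mate not in ("1", "2"):
        raise ValueError("unexpected read mate: %r" % read.mate)
    # Assumption: aligned sequence and qualities cover the same columns.
    if len(read.seq) != len(read.quals):
        raise ValueError("sequence and quality lengths differ")
    return read
```

Stripping only the newline matters here because a read whose alignment ends in gaps also ends in spaces in the quality column.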
diff -r 000000000000 -r d2e46adc199e duplex.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/duplex.xml Mon Nov 23 22:06:21 2015 -0500
@@ -0,0 +1,61 @@
+
+
+ from duplex sequencing data
+
+ duplex
+ DUPLEX_DIR
+
+ duplex.fa
+ && awk -f \$DUPLEX_DIR/utils/outconv.awk -v target=1 duplex.fa > $output1
+ && awk -f \$DUPLEX_DIR/utils/outconv.awk -v target=2 duplex.fa > $output2
+ ]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ keep_sscs
+
+
+
+
+
+
+
+
+
+
+
+**What it does**
+
+This tool processes duplex sequencing data. It creates single-strand and duplex consensus reads from aligned read families.
+
+-----
+
+**Input**
+
+The input must be in the format produced by the "Align families" tool.
+
+-----
+
+**Output**
+
+The tool outputs the final duplex consensus reads in two FASTA files, one for the first reads of the pairs and one for the second. Optionally, the single-strand consensus reads can also be saved in a separate FASTA file.
+
+
+
diff -r 000000000000 -r d2e46adc199e make_families.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/make_families.xml Mon Nov 23 22:06:21 2015 -0500
@@ -0,0 +1,83 @@
+
+
+ from duplex sequencing data
+
+ duplex
+ DUPLEX_DIR
+
+ paste $fastq1 $fastq2
+ | paste - - - -
+ | awk -f \$DUPLEX_DIR/make-barcodes.awk -v TAG_LEN=$taglen -v INVARIANT=$invariant
+ | sort
+ > $output
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+**What it does**
+
+This tool processes raw duplex sequencing data: it removes the barcodes from the reads and uses them to group the reads into families that come from the same fragment.
+
+-----
+
+**Output**
+
+The output will be a tabular file where each line corresponds to a pair of input reads.
+
+The columns are::
+
+ 1: barcode (both tags joined and ordered)
+ 2: tag order in barcode ("ab" or "ba")
+ 3: read1 name
+ 4: read1 sequence (minus the tag and invariant sequences)
+ 5: read1 quality scores (minus the same tag and invariant)
+ 6: read2 name
+ 7: read2 sequence (minus the tag and invariant sequences)
+ 8: read2 quality scores (minus the same tag and invariant)
+
+-----
+
+**Barcode creation**
+
+For each pair, the tool removes the tag at the beginning of each read and creates a barcode by concatenating the two tags. The tags are ordered by string comparison, so pairs carrying the same two tags in either order yield an identical barcode. The original tag order is recorded in the second column.
+
+Since pairs from opposite strands have the same tags but in reverse order, this produces the same barcode for all reads from the same fragment, regardless of strand. A simple sort then groups all reads from the same fragment together, subdivided into strands by their "order" values.
+
+Examples::
+
+ +---------------+-----------------+
+ | input tags | output |
+ +-------+-------+-------+---------+
+ | read1 | read2 | order | barcode |
+ +-------+-------+-------+---------+
+ | ATG | CCT | ab | ATGCCT |
+ +-------+-------+-------+---------+
+ | CCT | ATG | ba | ATGCCT |
+ +-------+-------+-------+---------+
+
+
+
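The barcode rule described in the make_families.xml help above can be sketched in Python as follows. This is an illustration only, not the code in make-barcodes.awk: the function name `make_barcode` is hypothetical, and the handling of two identical tags (reported here as "ab") is an assumption, since the help text does not cover that case.

```python
from typing import List, Tuple


def make_barcode(tag1: str, tag2: str) -> Tuple[str, str]:
    """Return (barcode, order) for the tags taken from read1 and read2."""
    # Order the tags by plain string comparison so that both strands of a
    # fragment produce the same barcode.
    if tag1 <= tag2:  # assumption: identical tags are reported as "ab"
        return tag1 + tag2, "ab"
    return tag2 + tag1, "ba"


# Pairs from opposite strands carry the same tags in reverse order.
pairs = [("CCT", "ATG"), ("GGA", "TTC"), ("ATG", "CCT")]
records: List[Tuple[str, str]] = [make_barcode(t1, t2) for t1, t2 in pairs]

# Sorting on (barcode, order) puts reads from one fragment next to each
# other, with the "ab" strand ahead of the "ba" strand.
records.sort()
```

With the two example pairs from the table, both calls return the barcode `ATGCCT`, with order `ab` and `ba` respectively.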
diff -r 000000000000 -r d2e46adc199e tool_dependencies.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tool_dependencies.xml Mon Nov 23 22:06:21 2015 -0500
@@ -0,0 +1,22 @@
+
+
+
+
+
+
+
+
+ https://github.com/makrutenko/duplex/archive/master.tar.gz
+ make
+
+ .
+ $INSTALL_DIR
+
+
+ $INSTALL_DIR
+ $INSTALL_DIR
+
+
+
+
+