Galaxy | Tool Preview

Du Novo: Make families (version 0.6)
length of each random barcode on the ends of the fragments
length of the sequence between the tag and the actual sample sequence (normally the restriction site)

What it does

This tool processes raw duplex sequencing data. It removes the barcodes from the reads and uses them to group the reads into families, where each family contains the reads that came from the same fragment.


Output

The output will be a tabular file where each line corresponds to a pair of input reads.

The columns are:

1: barcode (both tags joined and ordered)
2: tag order in barcode ("ab" or "ba")
3: read1 name
4: read1 sequence (with the tag and invariant sequences removed)
5: read1 quality scores (with the same tag and invariant positions removed)
6: read2 name
7: read2 sequence (with the tag and invariant sequences removed)
8: read2 quality scores (with the same tag and invariant positions removed)
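
For readers who want to consume this output in a script, the column layout above can be read roughly as follows. This is only a sketch: it assumes the columns are tab-separated, and the names `PairRecord` and `parse_line`, as well as the example read names, are hypothetical and not part of the tool.

```python
from typing import NamedTuple


class PairRecord(NamedTuple):
    """One output line, with fields in the column order listed above."""
    barcode: str
    order: str
    name1: str
    seq1: str
    qual1: str
    name2: str
    seq2: str
    qual2: str


def parse_line(line: str) -> PairRecord:
    """Split one tab-separated line into its 8 named columns."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, got {len(fields)}")
    return PairRecord(*fields)


# Made-up example line (read names and sequences are illustrative only).
example = "ATGCCT\tab\tpair1/1\tGATTACA\tIIIIIII\tpair1/2\tTTGACCA\tIIIIIII"
record = parse_line(example)
print(record.barcode, record.order, record.seq1)
```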

Barcode creation

For each pair, the tool removes the tag from the beginning of each read and creates a barcode by concatenating the two tags. The tags are ordered by a string comparison (in the examples below, the alphabetically smaller tag comes first), so a pair yields the same barcode whichever read carries which tag. The original tag order is recorded in the second column.
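
The step described above might look like the following minimal sketch. It is not the tool's actual code: the function name `make_barcode` and the parameters `tag_len` and `invariant_len` are hypothetical, and placing the smaller tag first (with ties labelled "ba") is an assumption inferred from the examples in this section. Quality strings would be trimmed by the same amount as the sequences.

```python
def make_barcode(read1: str, read2: str, tag_len: int, invariant_len: int):
    """Return (barcode, order, trimmed_read1, trimmed_read2) for one pair."""
    tag1 = read1[:tag_len]
    tag2 = read2[:tag_len]
    # Assumption: the tag that compares smaller as a string goes first.
    if tag1 < tag2:
        barcode, order = tag1 + tag2, "ab"
    else:
        barcode, order = tag2 + tag1, "ba"
    # Drop the tag plus the invariant sequence from the start of each read.
    start = tag_len + invariant_len
    return barcode, order, read1[start:], read2[start:]


# Made-up reads: 3-base tag, 2-base invariant sequence, then sample sequence.
forward = "ATG" + "TT" + "GATTACA"
reverse = "CCT" + "TT" + "CAGGTC"
print(make_barcode(forward, reverse, tag_len=3, invariant_len=2))
print(make_barcode(reverse, forward, tag_len=3, invariant_len=2))
```

Both calls give the barcode `ATGCCT`; only the order value ("ab" versus "ba") differs.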

Pairs from opposite strands carry the same tags in the reverse order, so this produces the same barcode for all reads from the same fragment, regardless of strand. A simple sort then groups all reads from the same fragment together, with the two strands distinguished by their different "order" values.

Examples:

+---------------+-----------------+
|  input tags   |     output      |
+-------+-------+-------+---------+
| read1 | read2 | order | barcode |
+-------+-------+-------+---------+
|  ATG  |  CCT  |  ab   | ATGCCT  |
+-------+-------+-------+---------+
|  CCT  |  ATG  |  ba   | ATGCCT  |
+-------+-------+-------+---------+
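
The sorting step described in this section can be illustrated with a short sketch. It assumes each row has already been reduced to a (barcode, order, read name) tuple; the function name `group_families` and the read names are hypothetical, and the tool's own grouping may be implemented differently.

```python
from itertools import groupby


def group_families(rows):
    """Group (barcode, order, name) rows by barcode, then by tag order."""
    def key(row):
        return (row[0], row[1])

    families = {}
    # Sorting puts rows with the same barcode next to each other,
    # and within a barcode separates "ab" rows from "ba" rows.
    for (barcode, order), members in groupby(sorted(rows, key=key), key=key):
        families.setdefault(barcode, {})[order] = [row[2] for row in members]
    return families


# Made-up rows: three pairs share one barcode, one pair has another.
rows = [
    ("ATGCCT", "ba", "pair3"),
    ("GGATTC", "ab", "pair2"),
    ("ATGCCT", "ab", "pair1"),
    ("ATGCCT", "ab", "pair4"),
]
families = group_families(rows)
print(families)
```

Here the three `ATGCCT` pairs end up in one family, split into an "ab" strand and a "ba" strand, while `GGATTC` forms a separate family.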