<?xml version="1.0"?>
<tool id="make_families" name="Du Novo: Make families" version="0.6">
  <description>of duplex sequencing reads</description>
  <requirements>
    <requirement type="package" version="0.6">duplex</requirement>
    <requirement type="set_environment">DUPLEX_DIR</requirement>
  </requirements>
  <!-- TODO: Add dependency on coreutils to get paste? -->
  <command>paste '$fastq1' '$fastq2'
    | paste - - - -
    | awk -f "\$DUPLEX_DIR/make-barcodes.awk" -v TAG_LEN=$taglen -v INVARIANT=$invariant
    | sort
    &gt; '$output'
  </command>
  <inputs>
    <param name="fastq1" type="data" format="fastq" label="Sequencing reads, mate 1"/>
    <param name="fastq2" type="data" format="fastq" label="Sequencing reads, mate 2"/>
    <param name="taglen" type="integer" value="12" min="0" label="Tag length" help="length of each random barcode on the ends of the fragments"/>
    <param name="invariant" type="integer" value="5" min="0" label="Invariant sequence length" help="length of the sequence between the tag and actual sample sequence (the restriction site, normally)"/>
  </inputs>
  <outputs>
    <data name="output" format="tabular"/>
  </outputs>
  <tests>
    <test>
      <param name="fastq1" value="smoke_1.fq"/>
      <param name="fastq2" value="smoke_2.fq"/>
      <param name="taglen" value="5"/>
      <param name="invariant" value="1"/>
      <output name="output" file="smoke.families.tsv"/>
    </test>
    <test>
      <param name="fastq1" value="smoke_1.fq"/>
      <param name="fastq2" value="smoke_2.fq"/>
      <param name="taglen" value="5"/>
      <param name="invariant" value="0"/>
      <output name="output" file="smoke.families.i0.tsv"/>
    </test>
  </tests>
  <citations>
    <citation type="bibtex">@article{Stoler2016,
      author = {Stoler, Nicholas and Arbeithuber, Barbara and Guiblet, Wilfried and Makova, Kateryna D and Nekrutenko, Anton},
      doi = {10.1186/s13059-016-1039-4},
      issn = {1474-760X},
      journal = {Genome biology},
      number = {1},
      pages = {180},
      pmid = {27566673},
      publisher = {Genome Biology},
      title = {{Streamlined analysis of duplex sequencing data with Du Novo.}},
      url = {http://www.ncbi.nlm.nih.gov/pubmed/27566673},
      volume = {17},
      year = {2016}
    }</citation>
  </citations>
  <help>

**What it does**

This tool processes raw duplex sequencing data: it removes the barcodes from the reads and uses them to group read pairs into families that originate from the same fragment.

-----

**Output**

The output will be a tabular file where each line corresponds to a pair of input reads.

The columns are::

  1: barcode (both tags joined and ordered)
  2: tag order in barcode ("ab" or "ba")
  3: read1 name
  4: read1 sequence (minus the tag and invariant sequences)
  5: read1 quality scores (minus the same tag and invariant)
  6: read2 name
  7: read2 sequence (minus the tag and invariant sequences)
  8: read2 quality scores (minus the same tag and invariant)
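
As a rough illustration of how this output might be consumed downstream, the sketch below reads the eight columns and collects consecutive lines that share a barcode and order. It assumes the file is tab-separated and already sorted, as described here; the names ``COLUMNS`` and ``read_families`` are hypothetical and not part of Du Novo.

```python
import itertools

# Hypothetical field names for the eight columns listed above.
COLUMNS = ("barcode", "order", "name1", "seq1", "qual1", "name2", "seq2", "qual2")


def read_families(lines):
    """Yield (barcode, order, rows) for each run of lines sharing both values.

    Assumes the input is tab-separated and sorted, so that all lines with the
    same barcode and order are adjacent.
    """
    rows = (
        dict(zip(COLUMNS, line.rstrip("\n").split("\t")))
        for line in lines
        if line.strip()  # skip blank lines
    )
    by_family = itertools.groupby(rows, key=lambda row: (row["barcode"], row["order"]))
    for (barcode, order), group in by_family:
        yield barcode, order, list(group)


if __name__ == "__main__":
    # Made-up example lines, not real tool output.
    example = [
        "ATGCCT\tab\tpair1\tACGT\tIIII\tpair1\tTTGA\tIIII\n",
        "ATGCCT\tab\tpair2\tACGA\tIIII\tpair2\tTTGA\tIIII\n",
        "ATGCCT\tba\tpair3\tTTGA\tIIII\tpair3\tACGT\tIIII\n",
    ]
    for barcode, order, rows in read_families(example):
        print(barcode, order, len(rows))
```

Because ``itertools.groupby`` only merges adjacent items, this relies on the sort step; unsorted input would split one family into several groups.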

-----

**Barcode creation**

For each pair, the tool removes the tag from the beginning of each read and creates a barcode by concatenating the two tags. The tags are ordered by string comparison, so a pair yields the same barcode whichever read carries which tag. The original tag order is recorded in the second column.

Since pairs from opposite strands have the same tags, but in the reverse order, this produces the same barcode for all reads from the same fragment, regardless of strand. A simple sort then groups all reads from the same fragment together, and within each fragment the two strands are separated by their different "order" values.

Examples::

  +---------------+-----------------+
  |  input tags   |     output      |
  +-------+-------+-------+---------+
  | read1 | read2 | order | barcode |
  +-------+-------+-------+---------+
  |  ATG  |  CCT  |  ab   | ATGCCT  |
  +-------+-------+-------+---------+
  |  CCT  |  ATG  |  ba   | ATGCCT  |
  +-------+-------+-------+---------+
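
The logic in this section can be sketched in a few lines of Python. This is only an illustration of the idea, not the tool's actual awk implementation: the function names ``split_read`` and ``make_barcode`` are hypothetical, and sending equal tags to "ab" is an assumption.

```python
def split_read(seq, qual, taglen, invariant):
    """Split a read into its tag and the trimmed sequence and qualities.

    The tag is the first `taglen` bases; the invariant sequence that follows
    it is discarded along with the matching quality scores.
    """
    start = taglen + invariant
    return seq[:taglen], seq[start:], qual[start:]


def make_barcode(tag1, tag2):
    """Return (barcode, order) for the tags taken from read1 and read2.

    The tags are joined in string-comparison order, so both strands of a
    fragment get the same barcode. Equal tags go to "ab" (an assumption).
    """
    if tag1 > tag2:
        return tag2 + tag1, "ba"
    return tag1 + tag2, "ab"


if __name__ == "__main__":
    # The two rows from the examples table above.
    print(make_barcode("ATG", "CCT"))
    print(make_barcode("CCT", "ATG"))
```

Both calls return the barcode ``ATGCCT``, differing only in the order value, which matches the two rows of the table.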

    </help>
</tool>
author	nick
date	Mon, 06 Feb 2017 23:39:11 -0500
parents	4bc49a5769ee
children