<tool id="overlapping_reads" name="Get overlapping reads" version="0.9.4">
    <description />
    <requirements>
        <requirement type="package" version="0.11.2.1=py27_0">pysam</requirement>
    </requirements>
    <stdio>
        <exit_code range="1:" level="fatal" description="Tool exception" />
    </stdio>
    <command detect_errors="exit_code"><![CDATA[
        samtools index '$input' &&
        python '$__tool_directory__'/overlapping_reads.py
           --input '$input'
           --minquery '$minquery'
           --maxquery '$maxquery'
           --mintarget '$mintarget'
           --maxtarget '$maxtarget'
           --overlap '$overlap'
           --output '$output'
    ]]></command>
    <inputs>
        <param format="bam" label="Compute signature from this sorted BAM alignment" name="input" type="data" />
        <param help="'23' = 23 nucleotides" label="Min size of query small RNAs" name="minquery" size="3" type="integer" value="23" />
        <param help="'29' = 29 nucleotides" label="Max size of query small RNAs" name="maxquery" size="3" type="integer" value="29" />
        <param help="'23' = 23 nucleotides" label="Min size of target small RNAs" name="mintarget" size="3" type="integer" value="23" />
        <param help="'29' = 29 nucleotides" label="Max size of target small RNAs" name="maxtarget" size="3" type="integer" value="29" />
        <param help="'10' = 10 nucleotides overlap" label="Overlap (in nt)" name="overlap" size="3" type="integer" value="10" />
    </inputs>
    <outputs>
        <data format="fasta" label="pairable reads" name="output" />
    </outputs>
    <tests>
        <test>
            <param ftype="bam" name="input" value="sr_bowtie.bam" />
            <param name="minquery" value="23" />
            <param name="maxquery" value="29" />
            <param name="mintarget" value="23" />
            <param name="maxtarget" value="29" />
            <param name="overlap" value="10" />
            <output file="paired.fa" ftype="fasta" name="output" />
        </test>
        <test>
            <param ftype="bam" name="input" value="sr_bowtie.bam" />
            <param name="minquery" value="20" />
            <param name="maxquery" value="22" />
            <param name="mintarget" value="23" />
            <param name="maxtarget" value="29" />
            <param name="overlap" value="10" />
            <output file="paired_2.fa" ftype="fasta" name="output" />
        </test>
        <test>
            <param ftype="bam" name="input" value="sr_bowtie.bam" />
            <param name="minquery" value="23" />
            <param name="maxquery" value="29" />
            <param name="mintarget" value="20" />
            <param name="maxtarget" value="22" />
            <param name="overlap" value="10" />
            <output file="paired_3.fa" ftype="fasta" name="output" />
        </test>
        <test>
            <param ftype="bam" name="input" value="sr_bowtie.bam" />
            <param name="minquery" value="20" />
            <param name="maxquery" value="22" />
            <param name="mintarget" value="20" />
            <param name="maxtarget" value="22" />
            <param name="overlap" value="10" />
            <output file="paired_4.fa" ftype="fasta" name="output" />
        </test>
    </tests>
    <help>

**What it does**

Extract reads that overlap reads of the opposite strand by the specified number of
nucleotides, and return these "pairable" reads in a fasta file.

See `Antoniewski (2014)`_ for background and details.

.. _Antoniewski (2014): https://link.springer.com/protocol/10.1007%2F978-1-4939-0931-5_12

**Input**

A **sorted** BAM alignment file (the tool indexes it with samtools before the search).

*Query and target sizes:*

For each *query* read (within the specified size range), the algorithm searches the BAM
alignment for *target* reads (within the specified size range) that align on the opposite
strand and overlap the query read by the specified number of nucleotides (10 nt by default).
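
The search described above can be sketched as a predicate on two aligned reads. This is a minimal illustration, not the code of overlapping_reads.py: the `Read` tuple, the `five_prime_overlap` and `is_pair` functions, and the choice to measure the overlap between the 5' ends of the two reads (the usual definition of an overlap signature) are assumptions made for the example. Coordinates are 1-based leftmost positions, as in the output headers.

```python
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical aligned read: 1-based leftmost coordinate, strand and size."""
    coord: int
    strand: str  # "+" or "-"
    size: int

    @property
    def end(self) -> int:
        # 1-based, inclusive rightmost coordinate of the alignment
        return self.coord + self.size - 1


def five_prime_overlap(a: Read, b: Read) -> int:
    """Number of nt shared by two opposite-strand reads whose 5' ends face each other.

    Assumption: the overlap runs from the 5' end of the plus-strand read (its
    leftmost coordinate) to the 5' end of the minus-strand read (its rightmost
    coordinate). Returns 0 for same-strand reads or reads not overlapping that way.
    """
    if a.strand == b.strand:
        return 0
    plus, minus = (a, b) if a.strand == "+" else (b, a)
    # each 5' end must fall inside the other read
    if plus.coord >= minus.coord and plus.end >= minus.end:
        return max(minus.end - plus.coord + 1, 0)
    return 0


def is_pair(query: Read, target: Read, minquery: int, maxquery: int,
            mintarget: int, maxtarget: int, overlap: int) -> bool:
    """True if both reads are in their size ranges and overlap by exactly `overlap` nt."""
    return (query.size in range(minquery, maxquery + 1)
            and target.size in range(mintarget, maxtarget + 1)
            and five_prime_overlap(query, target) == overlap)


query = Read(5855, "+", 23)
target = Read(5839, "-", 26)
print(five_prime_overlap(query, target))              # 10
print(is_pair(query, target, 23, 29, 23, 29, 10))     # True
```

Requiring an exact overlap length, rather than a minimum, follows the help text ("overlap by 10 nt") but is itself an assumption about the tool.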

Searching query reads of 20-22 nt that overlap by 10 nt with target reads of 23-29 nt is
equivalent to searching query reads of 23-29 nt that overlap by 10 nt with target reads of
20-22 nt; i.e., searching for siRNAs that pair with piRNAs is equivalent to searching for
piRNAs that pair with siRNAs. In contrast, searching query reads of 20-22 nt that overlap by
10 nt with target reads of 23-29 nt is different from searching query reads of 23-29 nt that
overlap by 10 nt with target reads of 23-29 nt, since the number of "heterotypic" pairs of
reads (siRNA/piRNA) is likely to differ from the number of "homotypic" pairs (piRNA/piRNA).
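
The symmetry between swapped query and target size ranges can be checked on a toy dataset. In this sketch, which is not the tool's implementation, the reads, the `find_pairs` function and the 5'-to-5' overlap definition are hypothetical; each search is reduced to the set of unordered read pairs it finds.

```python
from itertools import product
from typing import FrozenSet, List, Set, Tuple

# Hypothetical read: (1-based leftmost coordinate, strand, size)
Read = Tuple[int, str, int]


def five_prime_overlap(a: Read, b: Read) -> int:
    """Overlap (nt) between the 5' ends of two reads on opposite strands, else 0."""
    if a[1] == b[1]:
        return 0
    plus, minus = (a, b) if a[1] == "+" else (b, a)
    plus_end = plus[0] + plus[2] - 1
    minus_end = minus[0] + minus[2] - 1
    if plus[0] >= minus[0] and plus_end >= minus_end:
        return max(minus_end - plus[0] + 1, 0)
    return 0


def find_pairs(reads: List[Read], qsizes: range, tsizes: range,
               overlap: int) -> Set[FrozenSet[int]]:
    """Unordered pairs of read indices (query, target) overlapping by `overlap` nt."""
    pairs: Set[FrozenSet[int]] = set()
    for i, j in product(range(len(reads)), repeat=2):
        q, t = reads[i], reads[j]
        if q[2] in qsizes and t[2] in tsizes and five_prime_overlap(q, t) == overlap:
            pairs.add(frozenset((i, j)))
    return pairs


reads = [
    (5839, "-", 26),  # 0: piRNA-sized
    (5855, "+", 23),  # 1: piRNA-sized, 10 nt overlap with read 0
    (5855, "+", 21),  # 2: siRNA-sized, 10 nt overlap with read 0
    (6000, "-", 21),  # 3: siRNA-sized
    (6011, "+", 25),  # 4: piRNA-sized, 10 nt overlap with read 3
]
sirna_sizes, pirna_sizes = range(20, 23), range(23, 30)

hetero = find_pairs(reads, sirna_sizes, pirna_sizes, 10)
print(hetero == find_pairs(reads, pirna_sizes, sirna_sizes, 10))       # True
print(sorted(sorted(p) for p in hetero))                               # [[0, 2], [3, 4]]
print(sorted(sorted(p) for p in find_pairs(reads, pirna_sizes, pirna_sizes, 10)))  # [[0, 1]]
```

Swapping the ranges returns the same heterotypic pairs, while the homotypic search returns a different set.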

*Overlap:*

The number of nucleotides by which the query and target reads of a pair overlap.
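
Assuming the overlap is counted between the 5' ends of the two reads (the usual definition of an overlap signature, not confirmed here for this tool's code), let s+ and s- be the 1-based leftmost positions of the plus- and minus-strand reads and l- the size of the minus-strand read, with the 5' end of each read falling inside the other. The overlap o then follows from the last position of the minus-strand read, here worked with the two reads of the output example below:

```latex
o = (s_- + l_- - 1) - s_+ + 1 = s_- + l_- - s_+
\qquad \text{e.g.} \quad o = 5839 + 26 - 5855 = 10
```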


**Outputs**

A fasta file of pairable reads, such as::

    >FBgn0000004_17.6|coord=5839|strand -|size=26|nreads=1
    TTTTCGTCAATTGTGCCAAATAGGTA
    >FBgn0000004_17.6|coord=5855|strand +|size=23|nreads=1
    TTGACGAAAATGATCGAGTGGAT


where FBgn0000004_17.6 is the name of the reference sequence (chromosome), 5839 is the 1-based
leftmost position of the read, 'strand -' indicates the minus (reverse) strand of the reference,
26 is the size of the sequence and nreads=1 is the number of reads with this sequence in the
dataset.

The second sequence in this example corresponds to 1 read that overlaps by 10 nt with
1 read of the first sequence.
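
The header fields can be parsed to check the 10 nt overlap of the example. This sketch assumes that `coord` is the 1-based leftmost position of the alignment and that the overlap is measured between the 5' ends of the two reads; `Header`, `parse_header` and `revcomp` are hypothetical helpers, not part of the tool.

```python
from typing import NamedTuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


class Header(NamedTuple):
    ref: str
    coord: int
    strand: str
    size: int
    nreads: int


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_header(header: str) -> Header:
    """Parse a header like '>FBgn0000004_17.6|coord=5839|strand -|size=26|nreads=1'."""
    ref, coord, strand, size, nreads = header.lstrip(">").split("|")
    return Header(
        ref=ref,
        coord=int(coord.split("=")[1]),
        strand=strand.split()[1],
        size=int(size.split("=")[1]),
        nreads=int(nreads.split("=")[1]),
    )


# The two records of the output example above
minus = parse_header(">FBgn0000004_17.6|coord=5839|strand -|size=26|nreads=1")
plus = parse_header(">FBgn0000004_17.6|coord=5855|strand +|size=23|nreads=1")
minus_seq = "TTTTCGTCAATTGTGCCAAATAGGTA"
plus_seq = "TTGACGAAAATGATCGAGTGGAT"

# 5'-to-5' overlap: last position of the minus read minus first position of the plus read, plus 1
overlap = minus.coord + minus.size - plus.coord
print(overlap)  # 10

# The first `overlap` nt of each read (their 5' ends) are reverse complements of each other
print(revcomp(minus_seq[:overlap]) == plus_seq[:overlap])  # True
```

The sequence check is consistent with the coordinate interpretation: both reads share the stretch TTGACGAAAA on the plus strand.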

    </help>
    <citations>
        <citation type="doi">10.1007/978-1-4939-0931-5_12</citation>
    </citations>
</tool>
author	artbio
date	Sat, 09 Sep 2017 11:57:39 -0400
parents	20d28cfdeefe
children	4da23f009c9e