view overlapping_reads.xml @ 6:4da23f009c9e draft
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/small_rna_signatures commit 6c727f4b2288c9b2517b28addf1eed6409d682a4
author | artbio |
---|---|
date | Sun, 10 Sep 2017 10:27:19 -0400 |
parents | a7fd04208764 |
children | 07771982ef9b |
<tool id="overlapping_reads" name="Get overlapping reads" version="0.9.5">
    <description />
    <requirements>
        <requirement type="package" version="0.11.2.1=py27_0">pysam</requirement>
    </requirements>
    <stdio>
        <exit_code range="1:" level="fatal" description="Tool exception" />
    </stdio>
    <command detect_errors="exit_code"><![CDATA[
        samtools index '$input' &&
        python '$__tool_directory__'/overlapping_reads.py
           --input '$input'
           --minquery '$minquery'
           --maxquery '$maxquery'
           --mintarget '$mintarget'
           --maxtarget '$maxtarget'
           --overlap '$overlap'
           --output '$output'
    ]]></command>
    <inputs>
        <param format="bam" label="Compute signature from this bowtie standard output" name="input" type="data" />
        <param help="'23' = 23 nucleotides" label="Min size of query small RNAs" name="minquery" size="3" type="integer" value="23" />
        <param help="'29' = 29 nucleotides" label="Max size of query small RNAs" name="maxquery" size="3" type="integer" value="29" />
        <param help="'23' = 23 nucleotides" label="Min size of target small RNAs" name="mintarget" size="3" type="integer" value="23" />
        <param help="'29' = 29 nucleotides" label="Max size of target small RNAs" name="maxtarget" size="3" type="integer" value="29" />
        <param help="'10' = 10 nucleotides overlap" label="Overlap (in nt)" name="overlap" size="3" type="integer" value="10" />
    </inputs>
    <outputs>
        <data format="fasta" label="pairable reads" name="output" />
    </outputs>
    <tests>
        <test>
            <param ftype="bam" name="input" value="sr_bowtie.bam" />
            <param name="minquery" value="23" />
            <param name="maxquery" value="29" />
            <param name="mintarget" value="23" />
            <param name="maxtarget" value="29" />
            <param name="overlap" value="10" />
            <output file="paired.fa" ftype="fasta" name="output" />
        </test>
        <test>
            <param ftype="bam" name="input" value="sr_bowtie.bam" />
            <param name="minquery" value="20" />
            <param name="maxquery" value="22" />
            <param name="mintarget" value="23" />
            <param name="maxtarget" value="29" />
            <param name="overlap" value="10" />
            <output file="paired_2.fa" ftype="fasta" name="output" />
        </test>
        <test>
            <param ftype="bam" name="input" value="sr_bowtie.bam" />
            <param name="minquery" value="23" />
            <param name="maxquery" value="29" />
            <param name="mintarget" value="20" />
            <param name="maxtarget" value="22" />
            <param name="overlap" value="10" />
            <output file="paired_3.fa" ftype="fasta" name="output" />
        </test>
        <test>
            <param ftype="bam" name="input" value="sr_bowtie.bam" />
            <param name="minquery" value="20" />
            <param name="maxquery" value="22" />
            <param name="mintarget" value="20" />
            <param name="maxtarget" value="22" />
            <param name="overlap" value="10" />
            <output file="paired_4.fa" ftype="fasta" name="output" />
        </test>
    </tests>
    <help>

**What it does**

Extracts reads that show an overlap signature of the specified length (in nt) and
returns a FASTA file of these "pairable" reads.

See `Antoniewski (2014)`_ for background and details.

.. _Antoniewski (2014): https://link.springer.com/protocol/10.1007%2F978-1-4939-0931-5_12

**Input**

*A **sorted** BAM alignment file.*

*Query and target sizes:*

For each *query* read (of the specified size) in the BAM alignment, the algorithm searches
for *target* reads (of the specified size) that align on the opposite strand with the
specified overlap (10 nt by default).

Searching query reads of 20-22 nt that overlap by 10 nt with target reads of 23-29 nt is
equivalent to searching query reads of 23-29 nt that overlap by 10 nt with target reads of
20-22 nt, i.e., searching for siRNAs that pair with piRNAs is equivalent to searching for
piRNAs that pair with siRNAs.

In contrast, searching query reads of 20-22 nt that overlap by 10 nt with target reads of
23-29 nt is different from searching query reads of 23-29 nt that overlap by 10 nt with
target reads of 23-29 nt, since the number of "heterotypic" pairs of reads is likely to
differ from the number of "homotypic" pairs of reads.

*Overlap:*

The number of nucleotides by which the pairs of sequences overlap.

**Outputs**

A FASTA file of pairable reads, such as::

    >FBgn0000004_17.6|coord=5839|strand -|size=26|nreads=1
    TTTTCGTCAATTGTGCCAAATAGGTA
    >FBgn0000004_17.6|coord=5855|strand +|size=23|nreads=1
    TTGACGAAAATGATCGAGTGGAT

where FBgn0000004_17.6 stands for the chromosome, 5839 stands for the 1-based read position,
'strand -' stands for the lower strand of the chromosome, 26 stands for the size of the
sequence and nreads=1 stands for the number of reads of the sequence in the dataset.

The second sequence in this example corresponds to 1 read that overlaps by 10 nt with 1 read
of the first sequence.

    </help>
    <citations>
        <citation type="doi">10.1007/978-1-4939-0931-5_12</citation>
    </citations>
</tool>
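
The pairing described in the help text can be checked by hand on the two example records. The sketch below is not the code of `overlapping_reads.py`; it is a minimal illustration that assumes `coord` is the leftmost 1-based aligned position for reads on both strands and that sequences are reported 5' to 3' as sequenced. The helper names `reverse_complement` and `overlap_length` are hypothetical.

```python
# Minimal sketch, NOT the tool's implementation.
# Assumption: `coord` is the leftmost 1-based aligned position on either strand.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap_length(plus_coord, minus_coord, minus_size):
    """Number of nt between the 5' end of a + read and the 5' end of a - read.

    The 5' end of the - read is its rightmost aligned base,
    i.e. minus_coord + minus_size - 1 under the assumption above.
    """
    return minus_coord + minus_size - plus_coord


# The two records shown in the help text.
minus_read = {"coord": 5839, "size": 26, "seq": "TTTTCGTCAATTGTGCCAAATAGGTA"}
plus_read = {"coord": 5855, "size": 23, "seq": "TTGACGAAAATGATCGAGTGGAT"}

overlap = overlap_length(plus_read["coord"], minus_read["coord"], minus_read["size"])

# Over the overlap, the 5' ends of the two reads should be reverse complements.
pairs = plus_read["seq"][:overlap] == reverse_complement(minus_read["seq"][:overlap])

print(overlap, pairs)
```

Under these assumptions the two example reads share a 10 nt overlap, and their first 10 nucleotides are reverse complements of each other, which matches the "overlaps by 10 nt" statement in the help.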