Mercurial > repos > malex > bayesase

<tool id="base_sam2bed" name="SAM to BED"  version="21.1.13">
    <description>converts a SAM file to a BED file for use in BayesASE</description>
    <macros>
        <import>macros.xml</import>
    </macros>
    <expand macro="requirements"/>
    <command><![CDATA[
        awk
            -v readLen=`awk '{if(NR%4==2 && length) {count++; bases += length}} END {print bases/count}' ${fastq} | awk '{printf "%.0f", $1}'`
            -v OFS='	'
            '{print $3,$4,$4+readLen}'
            '$SAMFILE'
        > '$AWKTMP'
]]>
    </command>
    <inputs>
        <param name="fastq" format="fastq" type="data" label="Select FASTQ files" help= "FASTQ files will be used to calculate average read length." />
        <param name="SAMFILE" format="sam" type="data" label="SAM files to convert" help="Input SAM files to be converted into BED files."  />
    </inputs>
    <outputs>
        <data name="AWKTMP" format="bed" label="${tool.name} on ${on_string}: SAM2BED" />
    </outputs>
    <tests>
        <test>
            <param name="fastq" value="align_and_counts_test_data/W55_M_1_1.fastq" ftype="fastq"/>
            <param name="SAMFILE" value="align_and_counts_test_data/W1118_G1_unique_sam_for_BASE.sam" ftype="sam"/>
            <output name="AWKTMP" file="align_and_counts_test_data/W1118_SAM2BED_for_BASE.bed" ftype="bed"/>
        </test>
    </tests>
    <help><![CDATA[
**Tool Description**

The SAM to BED tool converts a SAM file containing uniquely mapping reads from the BWASplitSAM step into a 3-column BED file containing the start and end positions of every uniquely mapping read. This BED file is required for entry into step 6, BEDTools Intersect Intervals, to identify uniquely mapping reads that overalp with genic features of interest.

The tool calculates the average read length from the input FASTQ file and creates a resulting BED file where “chrom” is the chrom field from the SAM file, “start” is the POS field from the SAM file and “end” is the POS field plus the average read length.

**Inputs**

    Two input datasets are required:

**FASTQ File**
    -The original input FASTQ file containing F1 reads. The tool will use this file to calculate the average read length.

**Unique SAM File**
    -The SAM file containing uniquely mapping reads.  This file can be created by the **BWASplitSAM** tool.


**Output**

The tool outputs a 3-column BED file containing the following fields for each uniquely mapping read in the SAM file:
    (1) Chrom: chromosome
    (2) Start: the 5 prime start position of a read
    (3) End: the 5 prime start position + the read length


]]></help>
    <citations>
            <citation type="bibtex">@ARTICLE{Miller20BASE,
            author = {Brecca Miller, Alison M. Morse, Elyse Borgert, Zihao Liu, Kelsey Sinclair, Gavin Gamble, Fei Zou, Jeremy Newman, Luis Leon Novello, Fabio Marroni, Lauren M. McIntyre},
            title = {Testcrosses are an efficient strategy for identifying cis regulatory variation: Bayesian analysis of allele imbalance among conditions (BASE)},
            journal = {????},
            year = {submitted for publication}
            }</citation>
        </citations>
</tool>
author	malex
date	Thu, 14 Jan 2021 21:51:36 +0000
parents
children