<tool id="rseqc_infer_experiment" name="Infer Experiment" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@GALAXY_VERSION@">
    <description>infers how an RNA-seq experiment was configured</description>
    <macros>
        <import>rseqc_macros.xml</import>
    </macros>
    <expand macro="bio_tools"/>
    <expand macro="requirements"/>
    <expand macro="stdio"/>
    <version_command><![CDATA[infer_experiment.py --version]]></version_command>
    <command><![CDATA[
        @BAM_SAM_INPUTS@
        infer_experiment.py -i 'input.${extension}' -r '${refgene}'
            --sample-size ${sample_size}
            --mapq ${mapq}
            > '${output}'
            ]]>
    </command>
    <inputs>
        <expand macro="bam_param"/>
        <expand macro="refgene_param"/>
        <expand macro="sample_size_param"/>
        <expand macro="mapq_param"/>
    </inputs>
    <outputs>
        <data format="txt" name="output" label="${tool.name} on ${on_string}: RNA-seq experiment configuration"/>
    </outputs>
    <tests>
        <test expect_num_outputs="1">
            <param name="input" value="pairend_strandspecific_51mer_hg19_chr1_1-100000.bam"/>
            <param name="refgene" value="hg19_RefSeq_chr1_1-100000.bed" ftype="bed12"/>
            <output name="output" file="output.infer_experiment.txt"/>
        </test>
    </tests>
    <help><![CDATA[
infer_experiment.py
+++++++++++++++++++

This program infers how an RNA-seq experiment was configured, in particular how
reads were stranded for strand-specific RNA-seq data, by comparing the reads' mapping
information to the underlying gene model.


Inputs
++++++++++++++

Input BAM/SAM file
    Alignment file in BAM/SAM format.

Reference gene model
    Gene model in BED format.

Number of usable sampled reads (default=200000)
    Number of usable reads sampled from the SAM/BAM file. Sampling more reads gives a more accurate estimate but makes the program slightly slower.

Outputs
+++++++

For paired-end RNA-seq, there are two different
ways to strand reads (for example, with the Illumina ScriptSeq protocol):

1. 1++,1--,2+-,2-+

* read1 mapped to '+' strand indicates parental gene on '+' strand
* read1 mapped to '-' strand indicates parental gene on '-' strand
* read2 mapped to '+' strand indicates parental gene on '-' strand
* read2 mapped to '-' strand indicates parental gene on '+' strand

2. 1+-,1-+,2++,2--

* read1 mapped to '+' strand indicates parental gene on '-' strand
* read1 mapped to '-' strand indicates parental gene on '+' strand
* read2 mapped to '+' strand indicates parental gene on '+' strand
* read2 mapped to '-' strand indicates parental gene on '-' strand

For single-end RNA-seq, there are also two different ways to strand reads:

1. ++,--

* read mapped to '+' strand indicates parental gene on '+' strand
* read mapped to '-' strand indicates parental gene on '-' strand

2. +-,-+

* read mapped to '+' strand indicates parental gene on '-' strand
* read mapped to '-' strand indicates parental gene on '+' strand
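
The rule strings above can be read token by token: an optional read number, then the strand the read mapped to, then the strand of the parental gene. The following is a minimal sketch of that decoding, not part of RSeQC itself; the function name ``parse_strand_rule`` and the use of ``None`` as the read number for single-end data are assumptions made for illustration.

```python
def parse_strand_rule(rule):
    """Turn a rule such as "1++,1--,2+-,2-+" into a lookup table.

    Keys are (read_number, read_strand); values are the parental gene strand.
    Single-end rules such as "++,--" use None as the read number
    (an illustrative convention, not an RSeQC one).
    """
    table = {}
    for token in rule.split(","):
        if len(token) == 3:
            # Paired-end token, e.g. "2+-": read 2 on '+', gene on '-'
            read_number, read_strand, gene_strand = token[0], token[1], token[2]
        elif len(token) == 2:
            # Single-end token, e.g. "+-": read on '+', gene on '-'
            read_number, read_strand, gene_strand = None, token[0], token[1]
        else:
            raise ValueError(f"unrecognised token: {token!r}")
        table[(read_number, read_strand)] = gene_strand
    return table


paired = parse_strand_rule("1++,1--,2+-,2-+")
single = parse_strand_rule("+-,-+")
print(paired[("2", "+")])   # read2 on '+' strand
print(single[(None, "-")])  # single-end read on '-' strand
```

Looking up ``("2", "+")`` in the first paired-end rule gives the gene strand for a read2 alignment on the '+' strand, matching the bullet list for that rule.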


Example Output
++++++++++++++

**Example1** ::

    =========================================================
    This is PairEnd Data

    Fraction of reads explained by "1++,1--,2+-,2-+": 0.4992
    Fraction of reads explained by "1+-,1-+,2++,2--": 0.5008
    Fraction of reads explained by other combinations: 0.0000
    =========================================================

*Conclusion*: We can infer that this is NOT strand-specific data, because about 50% of reads can be explained by "1++,1--,2+-,2-+" while the other 50% can be explained by "1+-,1-+,2++,2--".

**Example2** ::

    ============================================================
    This is PairEnd Data

    Fraction of reads explained by "1++,1--,2+-,2-+": 0.9644
    Fraction of reads explained by "1+-,1-+,2++,2--": 0.0356
    Fraction of reads explained by other combinations: 0.0000
    ============================================================

*Conclusion*: We can infer that this is strand-specific RNA-seq data. The strandness of read1 is consistent with that of the gene model, while the strandness of read2 is opposite to the strand of the reference gene model.

**Example3** ::

    =========================================================
    This is SingleEnd Data

    Fraction of reads explained by "++,--": 0.9840
    Fraction of reads explained by "+-,-+": 0.0160
    Fraction of reads explained by other combinations: 0.0000
    =========================================================

*Conclusion*: This is single-end, strand-specific RNA-seq data. The strandness of the reads is concordant with the strandness of the reference gene.
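
The conclusions in the examples can also be drawn programmatically. The sketch below reads the "Fraction of reads explained by" lines from a report and returns the dominant rule, if any. It is an illustration only: the helper names ``read_fractions`` and ``dominant_rule`` and the 0.8 cutoff are assumptions, not values defined by RSeQC, and the cutoff should be chosen to suit your data.

```python
import re

# Matches lines such as: Fraction of reads explained by "++,--": 0.9840
# The "other combinations" line has no quoted rule and is skipped on purpose.
FRACTION_RE = re.compile(r'Fraction of reads explained by "([^"]+)":\s*([0-9.]+)')


def read_fractions(report):
    """Return {rule: fraction} for every quoted rule found in the report text."""
    return {rule: float(value) for rule, value in FRACTION_RE.findall(report)}


def dominant_rule(fractions, threshold=0.8):
    """Return the rule explaining at least `threshold` of reads, else None.

    The default threshold is a hypothetical cutoff chosen for illustration.
    None suggests the data are not strand specific (or are inconclusive).
    """
    for rule, fraction in fractions.items():
        if fraction >= threshold:
            return rule
    return None


report = '''This is PairEnd Data

Fraction of reads explained by "1++,1--,2+-,2-+": 0.9644
Fraction of reads explained by "1+-,1-+,2++,2--": 0.0356
Fraction of reads explained by other combinations: 0.0000
'''
print(dominant_rule(read_fractions(report)))
```

With the values from Example2 this reports the first rule as dominant; with the near-even split of Example1 it returns ``None``.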

@ABOUT@

]]>
    </help>
    <expand macro="citations"/>
</tool>
author	iuc
date	Tue, 09 Apr 2024 11:24:55 +0000
parents	5968573462fa
children