comparison infer_experiment.xml @ 43:378d05d35705 draft

Uploaded
author lparsons
date Wed, 23 Jul 2014 10:58:28 -0400
42:6dceb276bb35 43:378d05d35705
1 <tool id="rseqc_infer_experiment" name="Infer Experiment" version="2.3.9">
2 <description>speculates how RNA-seq sequencing was configured</description>
3 <requirements>
4 <requirement type="package" version="1.7.1">numpy</requirement>
5 <requirement type="package" version="2.3.9">rseqc</requirement>
6 </requirements>
7 <command>
8 infer_experiment.py -i $input -r $refgene
9 #if $sample_size.boolean
10 -s $sample_size.size
11 #end if
12
13 > $output
14 </command>
15 <stdio>
16 <exit_code range="1:" level="fatal" description="An error occurred during execution, see stderr and stdout for more information" />
17 <regex match="[Ee]rror" source="both" description="An error occurred during execution, see stderr and stdout for more information" />
18 </stdio>
19 <inputs>
20 <param name="input" type="data" format="bam,sam" label="Input BAM/SAM file" />
21 <param name="refgene" type="data" format="bed" label="Reference gene model in bed format" />
22 <conditional name="sample_size">
23 <param name="boolean" type="boolean" label="Modify usable sampled reads" value="false" />
24 <when value="true">
25 <param name="size" type="integer" label="Number of usable sampled reads (default = 200000)" value="200000" />
26 </when>
27 </conditional>
28 </inputs>
29 <outputs>
30 <data format="txt" name="output" />
31 </outputs>
32 <help>
33 infer_experiment.py
34 +++++++++++++++++++
35
36 This program infers how RNA-seq sequencing was configured, in particular how
37 reads were stranded for strand-specific RNA-seq data, by comparing the reads' mapping
38 information to the underlying gene model.
39
40
41 Inputs
42 ++++++++++++++
43
44 Input BAM/SAM file
45 Alignment file in BAM/SAM format.
46
47 Reference gene model
48 Gene model in BED format.
49
50 Number of usable sampled reads (default=200000)
51 Number of usable reads sampled from the SAM/BAM file. Sampling more reads gives a more accurate estimate but makes the program slightly slower.
52
53 Outputs
54 +++++++
55
56 For paired-end RNA-seq, there are two different
57 ways to strand reads (for example, the Illumina ScriptSeq protocol):
58
59 1. 1++,1--,2+-,2-+
60
61 * read1 mapped to '+' strand indicates parental gene on '+' strand
62 * read1 mapped to '-' strand indicates parental gene on '-' strand
63 * read2 mapped to '+' strand indicates parental gene on '-' strand
64 * read2 mapped to '-' strand indicates parental gene on '+' strand
65
66 2. 1+-,1-+,2++,2--
67
68 * read1 mapped to '+' strand indicates parental gene on '-' strand
69 * read1 mapped to '-' strand indicates parental gene on '+' strand
70 * read2 mapped to '+' strand indicates parental gene on '+' strand
71 * read2 mapped to '-' strand indicates parental gene on '-' strand
72
73 For single-end RNA-seq, there are also two different ways to strand reads:
74
75 1. ++,--
76
77 * read mapped to '+' strand indicates parental gene on '+' strand
78 * read mapped to '-' strand indicates parental gene on '-' strand
79
80 2. +-,-+
81
82 * read mapped to '+' strand indicates parental gene on '-' strand
83 * read mapped to '-' strand indicates parental gene on '+' strand
84
85
86 Example Output
87 ++++++++++++++
88
89 **Example1** ::
90
91 =========================================================
92 This is PairEnd Data
93
94 Fraction of reads explained by "1++,1--,2+-,2-+": 0.4992
95 Fraction of reads explained by "1+-,1-+,2++,2--": 0.5008
96 Fraction of reads explained by other combinations: 0.0000
97 =========================================================
98
99 *Conclusion*: We can infer that this is NOT strand-specific data, because about 50% of reads can be explained by "1++,1--,2+-,2-+", while the other 50% can be explained by "1+-,1-+,2++,2--".
100
101 **Example2** ::
102
103 ============================================================
104 This is PairEnd Data
105
106 Fraction of reads explained by "1++,1--,2+-,2-+": 0.9644
107 Fraction of reads explained by "1+-,1-+,2++,2--": 0.0356
108 Fraction of reads explained by other combinations: 0.0000
109 ============================================================
110
111 *Conclusion*: We can infer that this is strand-specific RNA-seq data. The strandness of read1 is consistent with that of the gene model, while the strandness of read2 is opposite to the strand of the reference gene model.
112
113 **Example3** ::
114
115 =========================================================
116 This is SingleEnd Data
117
118 Fraction of reads explained by "++,--": 0.9840
119 Fraction of reads explained by "+-,-+": 0.0160
120 Fraction of reads explained by other combinations: 0.0000
121 =========================================================
122
123 *Conclusion*: This is single-end, strand-specific RNA-seq data. The strandness of the reads is concordant with the strandness of the reference gene.
124
125
126 -----
127
128 About RSeQC
129 +++++++++++
130
131 The RSeQC_ package provides a number of useful modules that can comprehensively evaluate high-throughput sequence data, especially RNA-seq data. "Basic modules" quickly inspect sequence quality, nucleotide composition bias, PCR bias and GC bias, while "RNA-seq specific modules" investigate sequencing saturation status of both splicing junction detection and expression estimation, mapped reads clipping profile, mapped reads distribution, coverage uniformity over gene body, reproducibility, strand specificity and splice junction annotation.
132
133 The RSeQC package is licensed under the GNU GPL v3 license.
134
135 .. image:: http://rseqc.sourceforge.net/_static/logo.png
136
137 .. _RSeQC: http://rseqc.sourceforge.net/
138
139
140 </help>
141 </tool>
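
The three example conclusions in the help text follow one rule: compare the two "Fraction of reads explained by" values and see whether one pattern clearly dominates. A minimal Python sketch of that reading is given here for illustration. The helper names `parse_fractions` and `classify` and the 0.5 cutoff on the difference between the two fractions are assumptions made for this example; they are not part of RSeQC or of this wrapper, and a suitable cutoff may depend on the library.

```python
import re

# Matches report lines such as:
#   Fraction of reads explained by "1++,1--,2+-,2-+": 0.4992
#   Fraction of reads explained by other combinations: 0.0000
FRACTION_RE = re.compile(
    r'Fraction of reads explained by "?([^":]+)"?:\s*([0-9.]+)'
)


def parse_fractions(report):
    """Return {pattern: fraction} for each 'Fraction of reads' line."""
    return {
        match.group(1).strip(): float(match.group(2))
        for match in FRACTION_RE.finditer(report)
    }


def classify(fractions, min_difference=0.5):
    """Return ("unstranded", None) or ("stranded", dominant_pattern).

    min_difference is a hypothetical cutoff chosen for illustration only.
    """
    # Ignore the catch-all line; only the two strand patterns are compared.
    patterns = {
        name: value
        for name, value in fractions.items()
        if name != "other combinations"
    }
    if len(patterns) != 2:
        raise ValueError("expected exactly two strand patterns")
    (top_name, top), (_, low) = sorted(
        patterns.items(), key=lambda item: item[1], reverse=True
    )
    if top - low < min_difference:
        return ("unstranded", None)
    return ("stranded", top_name)


if __name__ == "__main__":
    example = (
        "This is PairEnd Data\n"
        'Fraction of reads explained by "1++,1--,2+-,2-+": 0.9644\n'
        'Fraction of reads explained by "1+-,1-+,2++,2--": 0.0356\n'
        "Fraction of reads explained by other combinations: 0.0000\n"
    )
    print(classify(parse_fractions(example)))
```

Applied to Example1 the sketch reports unstranded data, and for Example2 and Example3 it reports stranded data with the dominant pattern, matching the conclusions stated in the help text.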