comparison read_distribution.xml @ 31:cc5eaa9376d8

Lance's updates
author nilesh
date Wed, 02 Oct 2013 02:20:04 -0400
parents 6a354a3248b6
children 580ee0c4bc4e
30:b5d2f575ccb6 31:cc5eaa9376d8
1 <tool id="read_distribution" name="Read Distribution"> 1 <tool id="read_distribution" name="Read Distribution" version="1.1">
2 <description>calculates how mapped reads were distributed over genome feature</description> 2 <description>calculates how mapped reads were distributed over genome feature</description>
3 <requirements> 3 <requirements>
4 <requirement type="package" version="1.7.1">numpy</requirement>
4 <requirement type="package" version="2.3.7">rseqc</requirement> 5 <requirement type="package" version="2.3.7">rseqc</requirement>
5 </requirements> 6 </requirements>
6 <command interpreter="python"> read_distribution.py -i $input -r $refgene > $output 7 <command> read_distribution.py -i $input -r $refgene > $output
7 </command> 8 </command>
8 <inputs> 9 <inputs>
9 <param name="input" type="data" format="bam,sam" label="input bam/sam file" /> 10 <param name="input" type="data" format="bam,sam" label="input bam/sam file" />
10 <param name="refgene" type="data" format="bed" label="reference gene model" /> 11 <param name="refgene" type="data" format="bed" label="reference gene model" />
11 </inputs> 12 </inputs>
12 <outputs> 13 <outputs>
13 <data format="txt" name="output" /> 14 <data format="txt" name="output" />
14 </outputs> 15 </outputs>
16 <stdio>
17 <exit_code range="1:" level="fatal" description="An error occured during execution, see stderr and stdout for more information" />
18 <regex match="[Ee]rror" source="both" description="An error occured during execution, see stderr and stdout for more information" />
19 </stdio>
15 <help> 20 <help>
16 .. image:: https://code.google.com/p/rseqc/logo?cct=1336721062 21 read_distribution.py
22 ++++++++++++++++++++
17 23
18 ----- 24 Provided a BAM/SAM file and reference gene model, this module will calculate how mapped
25 reads were distributed over genome feature (like CDS exon, 5'UTR exon, 3' UTR exon, Intron,
26 Intergenic regions). When genome features are overlapped (e.g. a region could be annotated
27 as both exon and intron by two different transcripts) , they are prioritize as:
28 CDS exons > UTR exons > Introns > Intergenic regions, for example, if a read was mapped to
29 both CDS exon and intron, it will be assigned to CDS exons.
19 30
20 About RSeQC 31 * "Total Reads": This does NOT include those QC fail,duplicate and non-primary hit reads
21 +++++++++++ 32 * "Total Tags": reads spliced once will be counted as 2 tags, reads spliced twice will be counted as 3 tags, etc. And because of this, "Total Tags" >= "Total Reads"
33 * "Total Assigned Tags": number of tags that can be unambiguously assigned the 10 groups (see below table).
34 * Tags assigned to "TSS_up_1kb" were also assigned to "TSS_up_5kb" and "TSS_up_10kb", tags assigned to "TSS_up_5kb" were also assigned to "TSS_up_10kb". Therefore, "Total Assigned Tags" = CDS_Exons + 5'UTR_Exons + 3'UTR_Exons + Introns + TSS_up_10kb + TES_down_10kb.
35 * When assign tags to genome features, each tag is represented by its middle point.
22 36
23 The RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data. “Basic modules” quickly inspect sequence quality, nucleotide composition bias, PCR bias and GC bias, while “RNA-seq specific modules” investigate sequencing saturation status of both splicing junction detection and expression estimation, mapped reads clipping profile, mapped reads distribution, coverage uniformity over gene body, reproducibility, strand specificity and splice junction annotation. 37 RSeQC cannot assign those reads that:
24 38
25 The RSeQC package is licensed under the GNU GPL v3 license. 39 * hit to intergenic regions that beyond region starting from TSS upstream 10Kb to TES downstream 10Kb.
40 * hit to regions covered by both 5'UTR and 3' UTR. This is possible when two head-to-tail transcripts are overlapped in UTR regions.
41 * hit to regions covered by both TSS upstream 10Kb and TES downstream 10Kb.
42
26 43
27 Inputs 44 Inputs
28 ++++++++++++++ 45 ++++++++++++++
29 46
30 Input BAM/SAM file 47 Input BAM/SAM file
34 Gene model in BED format. 51 Gene model in BED format.
35 52
36 Sample Output 53 Sample Output
37 ++++++++++++++ 54 ++++++++++++++
38 55
39 :: 56 Output:
40 57
41 Total Read: 44,826,454 :: 58 =============== ============ =========== ===========
59 Group Total_bases Tag_count Tags/Kb
60 =============== ============ =========== ===========
61 CDS_Exons 33302033 20002271 600.63
62 5'UTR_Exons 21717577 4408991 203.01
63 3'UTR_Exons 15347845 3643326 237.38
64 Introns 1132597354 6325392 5.58
65 TSS_up_1kb 17957047 215331 11.99
66 TSS_up_5kb 81621382 392296 4.81
67 TSS_up_10kb 149730983 769231 5.14
68 TES_down_1kb 18298543 266161 14.55
69 TES_down_5kb 78900674 729997 9.25
70 TES_down_10kb 140361190 896882 6.39
71 =============== ============ =========== ===========
42 72
43 Total Tags: 50,023,249 :: 73 -----
44 74
45 Total Assigned Tags: 36,057,402 :: 75 About RSeQC
76 +++++++++++
46 77
47 Group Total_bases Tag_count Tags/Kb 78 The RSeQC_ package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data. "Basic modules" quickly inspect sequence quality, nucleotide composition bias, PCR bias and GC bias, while "RNA-seq specific modules" investigate sequencing saturation status of both splicing junction detection and expression estimation, mapped reads clipping profile, mapped reads distribution, coverage uniformity over gene body, reproducibility, strand specificity and splice junction annotation.
48 CDS_Exons 33302033 20022538 601.24
49 5'UTR_Exons 21717577 4414913 203.29
50 3'UTR_Exons 15347845 3641689 237.28
51 Introns 1132597354 6312099 5.57
52 TSS_up_1kb 17957047 215220 11.99
53 TSS_up_5kb 81621382 392192 4.81
54 TSS_up_10kb 149730983 769210 5.14
55 TES_down_1kb 18298543 266157 14.55
56 TES_down_5kb 78900674 730072 9.25
57 TES_down_10kb 140361190 896953 6.39
58 79
59 Note: 80 The RSeQC package is licensed under the GNU GPL v3 license.
60 - "Total Reads": This does NOT include those QC fail,duplicate and non-primary hit reads 81
61 - "Total Tags": reads spliced once will be counted as 2 tags, reads spliced twice will be counted as 3 tags, etc. And because of this, "Total Fragments" >= "Total Reads" 82 .. image:: http://rseqc.sourceforge.net/_static/logo.png
62 - "Total Assigned Tags": number of tags that can be unambiguously assigned the 10 groups (above table). 83
63 - Tags assigned to "TSS_up_1kb" were also assigned to "TSS_up_5kb" and "TSS_up_10kb", tags assigned to "TSS_up_5kb" were also assigned to "TSS_up_10kb". Therefore, "Total Assigned Tags" = CDS_Exons + 5'UTR_Exons + 3'UTR_Exons + Introns + TSS_up_10kb + TES_down_10kb. 84 .. _RSeQC: http://rseqc.sourceforge.net/
64 - When assigning tags to genome features, each tag is represented by its middle point. 85
65 - RSeQC cannot assign those reads that: 1) hit to intergenic regions that beyond region starting from TSS upstream 10Kb to TES downstream 10Kb. 2) hit to regions covered by both 5'UTR and 3' UTR. This is possible when two head-to-tail transcripts are overlapped in UTR regions. 3) hit to regions covered by both TSS upstream 10Kb and TES downstream 10Kb.
66 86
67 87
68 </help> 88 </help>
69 </tool> 89 </tool>
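
Note on the revised help text: the bullet about "Total Assigned Tags" and the Tags/Kb column can be checked against the sample table with a few lines of Python. This is a minimal sketch, not RSeQC code. The helper names (tags_per_kb, total_assigned_tags) are hypothetical, the rows are copied from the new-side table (revision 31), and the Tags/Kb formula (Tag_count * 1000 / Total_bases) is an assumption inferred from the column values.

```python
# Rows copied from the new-side sample output table (revision 31).
# Columns: group name, Total_bases, Tag_count.
ROWS = [
    ("CDS_Exons", 33302033, 20002271),
    ("5'UTR_Exons", 21717577, 4408991),
    ("3'UTR_Exons", 15347845, 3643326),
    ("Introns", 1132597354, 6325392),
    ("TSS_up_1kb", 17957047, 215331),
    ("TSS_up_5kb", 81621382, 392296),
    ("TSS_up_10kb", 149730983, 769231),
    ("TES_down_1kb", 18298543, 266161),
    ("TES_down_5kb", 78900674, 729997),
    ("TES_down_10kb", 140361190, 896882),
]

# Per the help text, the 1kb and 5kb windows are nested inside the 10kb
# windows, so only these six groups enter the "Total Assigned Tags" sum.
NON_NESTED = (
    "CDS_Exons",
    "5'UTR_Exons",
    "3'UTR_Exons",
    "Introns",
    "TSS_up_10kb",
    "TES_down_10kb",
)


def tags_per_kb(total_bases, tag_count):
    """Tag density per kilobase (assumed formula, inferred from the table)."""
    return tag_count * 1000.0 / total_bases


def total_assigned_tags(rows):
    """Sum tag counts over the six non-nested groups named in the help text."""
    counts = {name: tags for name, _bases, tags in rows}
    return sum(counts[name] for name in NON_NESTED)


if __name__ == "__main__":
    for name, bases, tags in ROWS:
        print("%-15s %8.2f" % (name, tags_per_kb(bases, tags)))
    print("Total Assigned Tags:", total_assigned_tags(ROWS))
```

Applying the same six-group sum to the old-side tag counts gives 36,057,402, which agrees with the "Total Assigned Tags" line removed from the old help text, so the formula in the bullet appears consistent with the tool's own output.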