annotate read_distribution.xml @ 31:cc5eaa9376d8

Lance's updates
author nilesh
date Wed, 02 Oct 2013 02:20:04 -0400
parents 6a354a3248b6
children 580ee0c4bc4e
<tool id="read_distribution" name="Read Distribution" version="1.1">
    <description>calculates how mapped reads were distributed over genome features</description>
    <requirements>
        <requirement type="package" version="1.7.1">numpy</requirement>
        <requirement type="package" version="2.3.7">rseqc</requirement>
    </requirements>
    <command> read_distribution.py -i $input -r $refgene > $output
    </command>
    <inputs>
        <param name="input" type="data" format="bam,sam" label="input bam/sam file" />
        <param name="refgene" type="data" format="bed" label="reference gene model" />
    </inputs>
    <outputs>
        <data format="txt" name="output" />
    </outputs>
    <stdio>
        <exit_code range="1:" level="fatal" description="An error occurred during execution, see stderr and stdout for more information" />
        <regex match="[Ee]rror" source="both" description="An error occurred during execution, see stderr and stdout for more information" />
    </stdio>
    <help>
read_distribution.py
++++++++++++++++++++

Given a BAM/SAM file and a reference gene model, this module calculates how mapped
reads are distributed over genome features (such as CDS exons, 5' UTR exons, 3' UTR exons,
introns, and intergenic regions). When genome features overlap (e.g. a region could be
annotated as both exon and intron by two different transcripts), they are prioritized as:
CDS exons > UTR exons > Introns > Intergenic regions. For example, if a read maps to
both a CDS exon and an intron, it is assigned to CDS exons.

* "Total Reads": does NOT include QC-failed, duplicate, or non-primary-hit reads.
* "Total Tags": a read spliced once is counted as 2 tags, a read spliced twice as 3 tags, and so on. Because of this, "Total Tags" >= "Total Reads".
* "Total Assigned Tags": the number of tags that can be unambiguously assigned to the 10 groups (see the table below).
* Tags assigned to "TSS_up_1kb" were also assigned to "TSS_up_5kb" and "TSS_up_10kb"; tags assigned to "TSS_up_5kb" were also assigned to "TSS_up_10kb". Therefore, "Total Assigned Tags" = CDS_Exons + 5'UTR_Exons + 3'UTR_Exons + Introns + TSS_up_10kb + TES_down_10kb.
* When assigning tags to genome features, each tag is represented by its midpoint.

RSeQC cannot assign reads that:

* hit intergenic regions beyond the region spanning from 10 kb upstream of the TSS to 10 kb downstream of the TES.
* hit regions covered by both a 5' UTR and a 3' UTR. This is possible when two head-to-tail transcripts overlap in their UTR regions.
* hit regions covered by both the 10 kb upstream of a TSS and the 10 kb downstream of a TES.


Inputs
++++++++++++++

Input BAM/SAM file
    Alignment file in BAM/SAM format.

Reference gene model
    Gene model in BED format.

Sample Output
++++++++++++++

Output:

=============== ============ =========== ===========
Group           Total_bases  Tag_count   Tags/Kb
=============== ============ =========== ===========
CDS_Exons       33302033     20002271    600.63
5'UTR_Exons     21717577     4408991     203.01
3'UTR_Exons     15347845     3643326     237.38
Introns         1132597354   6325392     5.58
TSS_up_1kb      17957047     215331      11.99
TSS_up_5kb      81621382     392296      4.81
TSS_up_10kb     149730983    769231      5.14
TES_down_1kb    18298543     266161      14.55
TES_down_5kb    78900674     729997      9.25
TES_down_10kb   140361190    896882      6.39
=============== ============ =========== ===========

-----

About RSeQC
+++++++++++

The RSeQC_ package provides a number of useful modules that can comprehensively evaluate high-throughput sequence data, especially RNA-seq data. "Basic modules" quickly inspect sequence quality, nucleotide composition bias, PCR bias and GC bias, while "RNA-seq specific modules" investigate sequencing saturation status of both splicing junction detection and expression estimation, mapped reads clipping profile, mapped reads distribution, coverage uniformity over gene body, reproducibility, strand specificity and splice junction annotation.

The RSeQC package is licensed under the GNU GPL v3 license.

.. image:: http://rseqc.sourceforge.net/_static/logo.png

.. _RSeQC: http://rseqc.sourceforge.net/

    </help>
</tool>
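
The counting rules described in the help text can be illustrated with a short Python sketch. This is not code from read_distribution.py: the function names (count_tags, assign_group, tags_per_kb, total_assigned_tags), the (start, end, group) interval representation, and the assumption that every splice appears as an N operation in the CIGAR string are all hypothetical choices made for illustration. The sketch shows how a spliced read could yield several tags, how a tag's midpoint might be resolved against overlapping features using the stated priority order, and how the Tags/Kb column relates to Tag_count and Total_bases.

```python
# Illustrative sketch only: these helpers are hypothetical and are NOT taken
# from read_distribution.py.

# Priority order stated in the help text, highest first.
PRIORITY = ["CDS_Exons", "UTR_Exons", "Introns", "Intergenic"]

# Groups summed into "Total Assigned Tags" according to the help text.
ASSIGNED_GROUPS = [
    "CDS_Exons", "5'UTR_Exons", "3'UTR_Exons",
    "Introns", "TSS_up_10kb", "TES_down_10kb",
]


def count_tags(cigar):
    """Return the number of tags in a read: one more than its splice count.

    Assumption: every splice is recorded as an 'N' operation in the CIGAR.
    """
    return cigar.count("N") + 1


def assign_group(tag_start, tag_end, features):
    """Return the highest-priority group covering the tag's midpoint.

    `features` is a list of (start, end, group) half-open intervals; this
    representation is an assumption made for the example.
    """
    midpoint = (tag_start + tag_end) // 2
    hits = {group for start, end, group in features if start <= midpoint < end}
    for group in PRIORITY:
        if group in hits:
            return group
    return "Intergenic"


def tags_per_kb(tag_count, total_bases):
    """Tag density per kilobase, rounded to two decimals."""
    return round(tag_count * 1000.0 / total_bases, 2)


def total_assigned_tags(tag_counts):
    """Sum the six non-nested groups; the 1kb/5kb windows sit inside 10kb."""
    return sum(tag_counts[group] for group in ASSIGNED_GROUPS)


if __name__ == "__main__":
    print(count_tags("20M100N30M"))  # read with a single splice
    features = [(0, 1000, "Introns"), (400, 600, "CDS_Exons")]
    print(assign_group(450, 500, features))  # midpoint lies in both features
    print(tags_per_kb(20002271, 33302033))  # CDS_Exons row of the sample table
```

With the sample table, tags_per_kb(20002271, 33302033) reproduces the 600.63 reported for CDS_Exons, which suggests the Tags/Kb column is Tag_count * 1000 / Total_bases.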