# HG changeset patch
# User nilesh
# Date 1373559916 14400
# Node ID 6a354a3248b6793390628e9c2e9d3a0f1057356f
# Parent  d064a3014efd87425a0ed42e5f3f90fed8d236d8
Uploaded

diff -r d064a3014efd -r 6a354a3248b6 read_distribution.xml
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/read_distribution.xml	Thu Jul 11 12:25:16 2013 -0400
@@ -0,0 +1,69 @@
+
+ calculates how mapped reads were distributed over genome features
+
+ rseqc
+
+ read_distribution.py -i $input -r $refgene > $output
+
+
+
+
+
+
+
+
+.. image:: https://code.google.com/p/rseqc/logo?cct=1336721062
+
+-----
+
+About RSeQC
++++++++++++
+
+The RSeQC package provides a number of useful modules that can comprehensively evaluate high-throughput sequence data, especially RNA-seq data. “Basic modules” quickly inspect sequence quality, nucleotide composition bias, PCR bias, and GC bias, while “RNA-seq specific modules” investigate the sequencing saturation status of both splice junction detection and expression estimation, the mapped-read clipping profile, mapped-read distribution, coverage uniformity over the gene body, reproducibility, strand specificity, and splice junction annotation.
+
+The RSeQC package is licensed under the GNU GPL v3.
+
+Inputs
+++++++++++++++
+
+Input BAM/SAM file
+    Alignment file in BAM/SAM format.
+
+Reference gene model
+    Gene model in BED format.
+
+Sample Output
+++++++++++++++
+
+::
+
+    Total Reads: 44,826,454
+    Total Tags: 50,023,249
+    Total Assigned Tags: 36,057,402
+
+    Group            Total_bases    Tag_count    Tags/Kb
+    CDS_Exons        33302033       20022538     601.24
+    5'UTR_Exons      21717577       4414913      203.29
+    3'UTR_Exons      15347845       3641689      237.28
+    Introns          1132597354     6312099      5.57
+    TSS_up_1kb       17957047       215220       11.99
+    TSS_up_5kb       81621382       392192       4.81
+    TSS_up_10kb      149730983      769210       5.14
+    TES_down_1kb     18298543       266157       14.55
+    TES_down_5kb     78900674       730072       9.25
+    TES_down_10kb    140361190      896953       6.39
+
+Notes:
+
+- "Total Reads": does NOT include reads that failed QC, duplicate reads, or non-primary hits.
+- "Total Tags": a read spliced once is counted as 2 tags, a read spliced twice is counted as 3 tags, and so on. Because of this, "Total Tags" >= "Total Reads".
+- "Total Assigned Tags": the number of tags that can be unambiguously assigned to the 10 groups in the table above.
+- Tags assigned to "TSS_up_1kb" are also assigned to "TSS_up_5kb" and "TSS_up_10kb", and tags assigned to "TSS_up_5kb" are also assigned to "TSS_up_10kb". Therefore, "Total Assigned Tags" = CDS_Exons + 5'UTR_Exons + 3'UTR_Exons + Introns + TSS_up_10kb + TES_down_10kb.
+- When assigning tags to genome features, each tag is represented by its midpoint.
+- RSeQC cannot assign reads that: 1) hit intergenic regions outside the span from 10 kb upstream of the TSS to 10 kb downstream of the TES; 2) hit regions covered by both a 5' UTR and a 3' UTR, which is possible when two head-to-tail transcripts overlap in their UTR regions; 3) hit regions covered by both the 10 kb upstream of a TSS and the 10 kb downstream of a TES.
+
+
+
+
+
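The arithmetic behind the sample output can be checked with a short script. The sketch below is not part of the patch or of RSeQC: the names `SAMPLE`, `DISJOINT_GROUPS`, `tags_per_kb`, and `total_assigned_tags` are hypothetical, and the Tags/Kb formula (tag count divided by total bases in kilobases) is an assumption inferred from the table values rather than taken from the RSeQC source.

```python
# Values copied from the sample output table: group -> (Total_bases, Tag_count).
SAMPLE = {
    "CDS_Exons": (33302033, 20022538),
    "5'UTR_Exons": (21717577, 4414913),
    "3'UTR_Exons": (15347845, 3641689),
    "Introns": (1132597354, 6312099),
    "TSS_up_1kb": (17957047, 215220),
    "TSS_up_5kb": (81621382, 392192),
    "TSS_up_10kb": (149730983, 769210),
    "TES_down_1kb": (18298543, 266157),
    "TES_down_5kb": (78900674, 730072),
    "TES_down_10kb": (140361190, 896953),
}

# Groups named in the "Total Assigned Tags" formula. The 1kb and 5kb windows
# are nested inside the 10kb windows, so they are left out of the sum.
DISJOINT_GROUPS = (
    "CDS_Exons",
    "5'UTR_Exons",
    "3'UTR_Exons",
    "Introns",
    "TSS_up_10kb",
    "TES_down_10kb",
)


def tags_per_kb(total_bases, tag_count):
    """Tag density per kilobase (assumed formula: tags / (bases / 1000))."""
    return tag_count / (total_bases / 1000.0)


def total_assigned_tags(table):
    """Sum tag counts over the non-nested groups only."""
    return sum(table[group][1] for group in DISJOINT_GROUPS)


if __name__ == "__main__":
    for group, (bases, tags) in SAMPLE.items():
        print(f"{group:<15}{tags_per_kb(bases, tags):>10.2f}")
    print("Total Assigned Tags:", total_assigned_tags(SAMPLE))
```

Summing the six non-nested groups gives 36,057,402, which matches the "Total Assigned Tags" value reported in the sample output.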