Mercurial > repos > bgruening > deeptools_compute_gc_bias
diff computeGCBias.xml @ 3:12a3082cf023 draft
planemo upload for repository https://github.com/fidelram/deepTools/tree/master/galaxy/wrapper/ commit 2e8510e4f4015f51f7726de5697ba2de9b4e2f4c
author | bgruening |
---|---|
date | Wed, 09 Mar 2016 18:13:26 -0500 |
parents | e74853730716 |
children | 1c9d626635b4 |
--- a/computeGCBias.xml Thu Feb 18 11:48:32 2016 -0500
+++ b/computeGCBias.xml Wed Mar 09 18:13:26 2016 -0500
@@ -49,7 +49,7 @@
 </command>
 <inputs>
 <param name="bamInput" format="bam" type="data" label="BAM file"
- help="The BAM file must be sorted."/>
+ help=""/>
 <expand macro="reference_genome_source" />
 <expand macro="effectiveGenomeSize" />
@@ -127,29 +127,29 @@
 <help>
 <![CDATA[
 What it does
--------------------
+------------
 This tool computes the GC bias using the method proposed in Benjamini and Speed (2012) Nucleic Acids Res. (see below for further details). The output is used to plot the results and can also be used later on to correct the bias with the tool ``correctGCbias``. There are two plots produced by the tool: a boxplot showing the absolute read numbers per GC-content bin and an x-y plot depicting the ratio of observed/expected reads per GC-content bin.
 Output files
----------------
+------------
 - Diagnostic plots:
- - box plot of absolute read numbers per GC-content bin
- - x-y plot of observed/expected read ratios per GC-content bin
+ - box plot of absolute read numbers per GC-content bin
+ - x-y plot of observed/expected read ratios per GC-content bin
 - Tabular file: to be used for GC correction with ``correctGCbias``
 .. image:: $PATH_TO_IMAGES/computeGCBias_output.png
- :width: 600
- :height: 455
+ :width: 600
+ :height: 455
----------------------------------------------
+-----
-Background
-------------
+Theoretical Background
+----------------------
 ``computeGCBias`` is based on a paper by `Benjamini and Speed <http://nar.oxfordjournals.org/content/40/10/e72>`_. The basic assumption of the GC bias diagnosis is that an ideal sample should show a uniform distribution of sequenced reads across the genome, i.e. all regions of the genome should have similar numbers of reads, regardless of their base-pair composition.
@@ -172,8 +172,8 @@
 Now, let's have a look at **real-life data** from genomic DNA sequencing.
 Panels A and B can be clearly distinguished and the major change that took place between the experiments underlying the plots was that the samples in panel A were prepared with too many PCR cycles and a standard polymerase whereas the samples of panel B were subjected to very few rounds of amplification using a high fidelity DNA polymerase.
 .. image:: $PATH_TO_IMAGES/QC_GCplots_input.png
- :width: 600
- :height: 452
+ :width: 600
+ :height: 452
 **Note:** The expected GC profile depends on the reference genome as different organisms have very different GC contents. For example, one would expect more fragments with GC fractions between 30% to 60% in mouse samples (average GC content of the mouse genome: 45 %) than for genome fragments from, for example, *Plasmodium falciparum* (average genome GC content *P. falciparum*: 20%).