view tools/metag_tools/short_reads_figure_score.xml @ 0:9071e359b9a3

Uploaded
author xuebing
date Fri, 09 Mar 2012 19:37:19 -0500

<tool id="quality_score_distribution" name="Build base quality distribution" version="1.0.2">
<description></description>

<command interpreter="python">short_reads_figure_score.py $input1 $output1 </command>

<inputs>
<page>
    <param name="input1" type="data" format="qualsolexa, qual454" label="Quality score file" help="No dataset? Read tip below"/>
</page>
</inputs>

<outputs>
  	<data name="output1" format="png" />
</outputs> 
<requirements>
	<requirement type="python-module">rpy</requirement>
</requirements>
<tests>
	<test>
		<param name="input1" value="solexa.qual" ftype="qualsolexa" />
  		<output name="output1" file="solexaScore.png" ftype="png" />
	</test>
	<test>
		<param name="input1" value="454.qual" ftype="qual454" />
		<output name="output1" file="454Score.png" ftype="png" />
	</test>
</tests>
<help>

.. class:: warningmark

To use this tool, your dataset needs to be in the *Quality Score* format. Click the pencil icon next to your dataset to set the datatype to *Quality Score* (see below for examples).

-----

**What it does**

This tool takes quality files generated by Roche (454), Illumina (Solexa), or ABI SOLiD machines and builds a graph of the score distribution like the one below. Such a graph lets you perform an initial evaluation of data quality in a single pass.

-----

**Examples of Quality Data**

Roche (454) or ABI SOLiD data::

	&gt;seq1
	23 33 34 25 28 28 28 32 23 34 27 4 28 28 31 21 28

Illumina (Solexa) data::

 	-40 -40 40 -40	 -40 -40 -40 40	 
 
-----

**Output example**

Quality scores are summarized as a boxplot (Roche 454 FLX data):

.. image:: ./static/images/short_reads_boxplot.png

where the **X-axis** is the coordinate along the read and the **Y-axis** is the quality score adjusted to comply with the Phred score metric. Units on the X-axis depend on whether your data comes from Roche (454) machines or from Illumina (Solexa) and ABI SOLiD machines:

  - For Roche (454), the X-axis (shown above) indicates the **relative** position (in %) within reads, as this technology produces reads of different lengths;
  - For Illumina (Solexa) and ABI SOLiD, the X-axis shows the **absolute** position in nucleotides within reads.
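
The Phred adjustment of the Y-axis can be illustrated with the commonly used Solexa-to-Phred conversion. The sketch below is not taken from this tool's script: the function names are hypothetical, and it assumes that each group of four Solexa values describes one read position and that the highest value in the group is the one of interest.

```python
import math


def solexa_to_phred(q_solexa):
    """Convert one Solexa-scaled quality score to the Phred scale."""
    # Q_phred = 10 * log10(10 ** (Q_solexa / 10) + 1)
    return 10.0 * math.log10(10.0 ** (q_solexa / 10.0) + 1.0)


def best_phred_per_base(line):
    """Hypothetical helper: read one line of Solexa scores, four per read
    position (an assumption about the layout), and return the Phred-scaled
    score of the highest value at each position."""
    scores = [int(token) for token in line.split()]
    phred = []
    for start in range(0, len(scores), 4):
        phred.append(solexa_to_phred(max(scores[start:start + 4])))
    return phred


# The two positions from the Illumina (Solexa) example.
example = best_phred_per_base("-40 -40 40 -40 -40 -40 -40 40")
print([round(q, 2) for q in example])
```

On this scale high scores are almost unchanged, while scores near or below zero move noticeably: a Solexa score of 0 corresponds to a Phred score of about 3.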
  
Every box on the plot shows the following values::

       o     &lt;---- Outliers
       o
      -+-    &lt;---- Upper Extreme Value that is no more
       |           than one box length away from the box
       |
    +--+--+  &lt;---- Upper Quartile
    |     |
    +-----+  &lt;---- Median
    |     |
    +--+--+  &lt;---- Lower Quartile 
       |
       |
      -+-    &lt;---- Lower Extreme Value that is no more
                   than one box length away from the box
       o     &lt;---- Outlier
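
As a rough illustration of how the values in the diagram relate to the data, the following sketch computes them for the 454 example scores shown earlier. It is not the tool's own code: ``box_stats`` is a hypothetical helper, the quartile method is one of several common definitions, and the whisker limit of one box length is taken from the diagram labels rather than from the plotting script.

```python
import bisect
import statistics


def box_stats(scores, whisker=1.0):
    """Return the values drawn for one box: quartiles, extremes, outliers.

    whisker=1.0 follows the diagram labels (one box length); it is an
    assumption, not a value read from the tool's script.
    """
    data = sorted(scores)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    box_length = q3 - q1
    # Index range of the points that lie within the whisker limits.
    first = bisect.bisect_left(data, q1 - whisker * box_length)
    last = bisect.bisect_right(data, q3 + whisker * box_length)
    inside = data[first:last]
    return {
        "lower_extreme": inside[0],
        "lower_quartile": q1,
        "median": median,
        "upper_quartile": q3,
        "upper_extreme": inside[-1],
        "outliers": data[:first] + data[last:],
    }


stats = box_stats([23, 33, 34, 25, 28, 28, 28, 32, 23, 34, 27, 4, 28, 28, 31, 21, 28])
print(stats)
```

For these scores the single value 4 falls outside the lower limit and is reported as an outlier, while the extremes are the lowest and highest scores that remain within one box length of the box.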
 
 
     
</help>
</tool>