<tool id="quality_score_distribution" name="Build base quality distribution" version="1.0.2">
<description></description>

<command interpreter="python">short_reads_figure_score.py $input1 $output1 </command>

<inputs>
<page>
    <param name="input1" type="data" format="qualsolexa, qual454" label="Quality score file" help="No dataset? Read tip below"/>
</page>
</inputs>

<outputs>
    <data name="output1" format="png" />
</outputs>
<requirements>
    <requirement type="python-module">rpy</requirement>
</requirements>
<tests>
    <test>
        <param name="input1" value="solexa.qual" ftype="qualsolexa" />
        <output name="output1" file="solexaScore.png" ftype="png" />
    </test>
    <test>
        <param name="input1" value="454.qual" ftype="qual454" />
        <output name="output1" file="454Score.png" ftype="png" />
    </test>
</tests>
<help>

.. class:: warningmark

To use this tool, your dataset must be in the *Quality Score* format. Click the pencil icon next to your dataset to set the datatype to *Quality Score* (see below for examples).

-----

**What it does**

This tool takes quality files generated by Roche (454), Illumina (Solexa), or ABI SOLiD machines and builds a graph showing the score distribution, like the one below. Such a graph lets you perform an initial evaluation of data quality in a single pass.

-----

**Examples of Quality Data**

Roche (454) or ABI SOLiD data::

    >seq1
    23 33 34 25 28 28 28 32 23 34 27 4 28 28 31 21 28

Illumina (Solexa) data::

    -40 -40 40 -40 -40 -40 -40 40

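
As a minimal sketch of how records like the Roche (454) example could be read, the snippet below parses FASTA-like quality text into named score lists. It also shows the standard mapping from Solexa-scaled scores to the Phred scale. The names ``parse_qual`` and ``solexa_to_phred`` are hypothetical, and whether this tool parses or converts scores in exactly this way is an assumption:

```python
import math


def parse_qual(text):
    """Yield (name, scores) pairs from FASTA-like quality text.

    Hypothetical helper: a header line starts with '>' and is followed by
    one or more lines of whitespace-separated integer scores.
    """
    name, scores = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, scores  # emit the previous record
            name, scores = line[1:], []
        else:
            scores.extend(int(tok) for tok in line.split())
    if name is not None:
        yield name, scores  # emit the final record


def solexa_to_phred(q_solexa):
    """Standard Solexa-to-Phred mapping: 10 * log10(10 ** (Q / 10) + 1).

    Assumption: the tool may adjust scores differently.
    """
    return 10.0 * math.log10(10.0 ** (q_solexa / 10.0) + 1.0)


example = ">seq1\n23 33 34 25 28 28 28 32 23 34 27 4 28 28 31 21 28\n"
for read_name, read_scores in parse_qual(example):
    print(read_name, len(read_scores))
```

On the Phred scale, high Solexa scores stay almost unchanged, while strongly negative ones approach zero.
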
-----

**Output example**

Quality scores are summarized as a boxplot (Roche 454 FLX data):

.. image:: ./static/images/short_reads_boxplot.png

where the **X-axis** is the coordinate along the read and the **Y-axis** is the quality score adjusted to comply with the Phred score metric. Units on the X-axis depend on whether your data come from Roche (454) or from Illumina (Solexa) and ABI SOLiD machines:

- For Roche (454), the X-axis (shown above) indicates **relative** position (in %) within reads, because this technology produces reads of different lengths;
- For Illumina (Solexa) and ABI SOLiD, the X-axis shows **absolute** position in nucleotides within reads.

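
The difference between relative and absolute X-axis positions can be sketched as follows. This is only an illustration: the names ``relative_bin`` and ``group_by_position`` are hypothetical, and the exact binning rule used by the tool is an assumption:

```python
def relative_bin(index, read_length, n_bins=100):
    """Map a 0-based base index to a percent bin in the range 1..n_bins.

    Integer arithmetic keeps the result inside 1..n_bins for any read length.
    """
    return (index * n_bins) // read_length + 1


def group_by_position(reads, relative):
    """Collect scores per X-axis position across all reads.

    reads    -- list of score lists, one list per read
    relative -- True for percent bins (variable-length reads),
                False for absolute 1-based nucleotide positions
    """
    columns = {}
    for scores in reads:
        for i, q in enumerate(scores):
            key = relative_bin(i, len(scores)) if relative else i + 1
            columns.setdefault(key, []).append(q)
    return columns


reads = [[10, 20], [30, 40, 50]]
print(group_by_position(reads, relative=False))
print(sorted(group_by_position(reads, relative=True)))
```

With relative binning, reads of different lengths contribute to the same percent positions, so each box on the plot pools comparable parts of every read.
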

Every box on the plot shows the following values::

       o     &lt;---- Outliers
       o
      -+-    &lt;---- Upper Extreme Value that is no more
       |           than box length away from the box
       |
    +--+--+  &lt;---- Upper Quartile
    |     |
    +-----+  &lt;---- Median
    |     |
    +--+--+  &lt;---- Lower Quartile
       |
       |
      -+-    &lt;---- Lower Extreme Value that is no more
                   than box length away from the box
       o     &lt;---- Outlier

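
The values in the diagram can be computed for one position roughly as shown below. This is a sketch under stated assumptions: ``box_stats`` and its ``coef`` parameter are hypothetical names, quartiles come from Python's ``statistics.quantiles`` with the inclusive method (R may compute hinges slightly differently), and ``coef=1.0`` follows the "no more than box length away" wording above, whereas R's ``boxplot`` defaults to 1.5:

```python
import statistics


def box_stats(values, coef=1.0):
    """Return quartiles, whisker ends, and outliers for a list of scores.

    coef -- how many box lengths (Q3 - Q1) a whisker may extend from the
            box; an assumption taken from the help text, not from the tool.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    box = q3 - q1
    lo, hi = q1 - coef * box, q3 + coef * box
    # Whiskers end at the most extreme data points still inside the fences
    inside = [v for v in data if hi >= v and v >= lo]
    outliers = [v for v in data if v > hi or lo > v]
    return {
        "lower_extreme": inside[0],
        "lower_quartile": q1,
        "median": median,
        "upper_quartile": q3,
        "upper_extreme": inside[-1],
        "outliers": outliers,
    }


scores = [23, 33, 34, 25, 28, 28, 28, 32, 23, 34, 27, 4, 28, 28, 31, 21, 28]
print(box_stats(scores))
```

For the example scores, the single low value of 4 falls outside the lower fence and is reported as an outlier rather than as the lower extreme.
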
</help>
</tool>