<tool id="fastqc" name="FastQC" version="0.63">
    <description>Read Quality reports</description>
    <requirements>
        <requirement type="package" version="0.11.2">FastQC</requirement>
    </requirements>
    <stdio>
        <exit_code range="1:" />
        <exit_code range=":-1" />
        <regex match="Error:" />
        <regex match="Exception:" />
    </stdio>
    <command interpreter="python">
    rgFastQC.py
        -i "$input_file"
        -d "$html_file.files_path"
        -o "$html_file"
        -t "$text_file"
        -f "$input_file.ext"
        -j "$input_file.name"
        -e "\$FASTQC_JAR_PATH/fastqc"
    #if $contaminants.dataset and str($contaminants) > ''
        -c "$contaminants"
    #end if
    #if $limits.dataset and str($limits) > ''
        -l "$limits"
    #end if
    </command>
    <inputs>
        <param format="fastqsanger,fastq,bam,sam" name="input_file" type="data" label="Short read data from your current history" />
        <param name="contaminants" type="data" format="tabular" optional="true" label="Contaminant list"
            help="Tab-delimited file with 2 columns: name and sequence. For example: Illumina Small RNA RT Primer CAAGCAGAAGACGGCATACGA"/>
        <param name="limits" type="data" format="txt" optional="true" label="Submodule and limit specifying file"
            help="A file that specifies which submodules are to be executed (default=all) and sets the thresholds for each submodule's warning parameter" />
    </inputs>
    <outputs>
        <data format="html" name="html_file" label="${tool.name} on ${on_string}: Webpage" />
        <data format="txt" name="text_file" label="${tool.name} on ${on_string}: RawData" />
    </outputs>
    <tests>
        <test>
            <param name="input_file" value="1000gsample.fastq" />
            <param name="contaminants" value="fastqc_contaminants.txt" ftype="tabular" />
            <output name="html_file" file="fastqc_report.html" ftype="html" lines_diff="100"/>
            <output name="text_file" file="fastqc_data.txt" ftype="txt" lines_diff="100"/>
        </test>
        <test>
            <param name="input_file" value="1000gsample.fastq" />
            <param name="limits" value="fastqc_customlimits.txt" ftype="txt" />
            <output name="html_file" file="fastqc_report2.html" ftype="html" lines_diff="100"/>
            <output name="text_file" file="fastqc_data2.txt" ftype="txt" lines_diff="100"/>
        </test>
    </tests>
    <help>

.. class:: infomark

**Purpose**

FastQC aims to provide a simple way to do some quality control checks on raw
sequence data coming from high throughput sequencing pipelines.
It provides a modular set of analyses which you can use to give a quick
impression of whether your data has any problems of
which you should be aware before doing any further analysis.

The main functions of FastQC are:

- Import of data from BAM, SAM or FastQ files (any variant)
- Providing a quick overview to tell you in which areas there may be problems
- Summary graphs and tables to quickly assess your data
- Export of results to an HTML based permanent report
- Offline operation to allow automated generation of reports without running the interactive application


-----


.. class:: infomark

**FastQC**

This is a Galaxy wrapper. It merely exposes the external package FastQC_, which is documented at FastQC_.
Kindly acknowledge it as well as this tool if you use it.
FastQC incorporates the Picard-tools_ libraries for sam/bam processing.

The contaminants file parameter was borrowed from the independently developed
fastqcwrapper contributed to the Galaxy Community Tool Shed by J. Johnson.
Adaptation to version 0.11.2 by T. McGowan.

-----

.. class:: infomark

**Inputs and outputs**

FastQC_ is the best place to look for documentation - it's very good.
A summary follows below for those in a tearing hurry.

This wrapper will accept a Galaxy fastq, sam or bam as the input read file to check.
It will also take an optional file listing contaminants, in the form of
a tab-delimited file with 2 columns, name and sequence. As another option the tool takes a custom
limits.txt file that allows setting the warning thresholds for the different modules and also specifies
which modules to include in the output.
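
As a rough illustration of the two optional files described above (a sketch, not copied
from the FastQC distribution), a contaminants file holds one entry per line, with the name
and the sequence separated by a tab. The entry below reuses the example from the
parameter help::

    Illumina Small RNA RT Primer	CAAGCAGAAGACGGCATACGA

A custom limits file is assumed here to follow the layout of the limits.txt shipped with
FastQC: each line names a module, an instruction (ignore, warn or error) and a value.
The module names and thresholds shown are illustrative only and should be checked against
the limits.txt of the FastQC version in use::

    # Set ignore to 1 to skip a module
    kmer           ignore    1
    # Thresholds for the duplication module
    duplication    warn      70
    duplication    error     50

Both files are optional. When neither is supplied, FastQC falls back to its built-in defaults.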

The tool produces a basic text file and an HTML file that contain all of the results, including the following:

- Basic Statistics
- Per base sequence quality
- Per sequence quality scores
- Per base sequence content
- Per base GC content
- Per sequence GC content
- Per base N content
- Sequence Length Distribution
- Sequence Duplication Levels
- Overrepresented sequences
- Kmer Content

All except Basic Statistics and Overrepresented sequences are plots.

.. _FastQC: http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
.. _Picard-tools: http://picard.sourceforge.net/index.shtml

    </help>
    <citations>
        <citation type="bibtex">
        @ARTICLE{andrews_s,
            author = {Andrews, S.},
            keywords = {bioinformatics, ngs, qc},
            priority = {2},
            title = {{FastQC A Quality Control tool for High Throughput Sequence Data}},
            url = {http://www.bioinformatics.babraham.ac.uk/projects/fastqc/}
        }
        </citation>
    </citations>
</tool>