comparison rgFastQC.xml @ 1:39b1c10532a4 draft

Remove intermediate directory
author iuc
date Sat, 18 Jan 2014 22:33:24 -0500
0:a1472b1151d4 1:39b1c10532a4
1 <tool name="FastQC: Comprehensive QC" id="fastqc" version="0.53">
2 <description>reporting for short read sequence</description>
3 <command interpreter="python">
4 rgFastQC.py -i "$input_file" -d "$html_file.files_path" -o "$html_file" -n "$out_prefix" -f "$input_file.ext" -j "$input_file.name"
5 #if $contaminants.dataset and str($contaminants) > ''
6 -c "$contaminants"
7 #end if
8 -e fastqc
9 </command>
10 <requirements>
11 <requirement type="package" version="0.10.1">fastqc_dist_0_10_1</requirement>
12 </requirements>
13 <inputs>
14 <param format="fastqsanger,fastq,bam,sam" name="input_file" type="data" label="Short read data from your current history" />
15 <param name="out_prefix" value="FastQC" type="text" label="Title for the output file - to remind you what the job was for" size="80"
16 help="Letters and numbers only please - other characters will be removed">
17 <sanitizer invalid_char="">
18 <valid initial="string.letters,string.digits"/>
19 </sanitizer>
20 </param>
21 <param name="contaminants" type="data" format="tabular" optional="true" label="Contaminant list"
22 help="tab delimited file with 2 columns: name and sequence. For example: Illumina Small RNA RT Primer CAAGCAGAAGACGGCATACGA"/>
23 </inputs>
24 <outputs>
25 <data format="html" name="html_file" label="${out_prefix}_${input_file.name}.html" />
26 </outputs>
27 <tests>
28 <test>
29 <param name="input_file" value="1000gsample.fastq" />
30 <param name="out_prefix" value="fastqc_out" />
31 <param name="contaminants" value="fastqc_contaminants.txt" ftype="tabular" />
32 <output name="html_file" file="fastqc_report.html" ftype="html" lines_diff="100"/>
33 </test>
34 </tests>
35 <help>
36
37 .. class:: infomark
38
39 **Purpose**
40 Quote from FastQC_
41
42 FastQC aims to provide a simple way to do some quality control checks on raw
43 sequence data coming from high throughput sequencing pipelines.
44 It provides a modular set of analyses which you can use to give a quick
45 impression of whether your data has any problems of
46 which you should be aware before doing any further analysis.
47
48 The main functions of FastQC are:
49
50 - Import of data from BAM, SAM or FastQ files (any variant)
51 - Providing a quick overview to tell you in which areas there may be problems
52 - Summary graphs and tables to quickly assess your data
53 - Export of results to an HTML based permanent report
54 - Offline operation to allow automated generation of reports without running the interactive application
55
56 FastQC_ is the best place to look for documentation - it's very good.
57 Some features of the Galaxy wrapper you are using are described below.
58
59 -----
60
61 .. class:: infomark
62
63 **This Galaxy Tool**
64 You are using FastQC_ in Galaxy.
65 This is easy because it has been packaged into a Galaxy tool by the Intergalactic Utilities Commission.
66 It exposes the external package FastQC_, which is documented on the FastQC_ website.
67 Kindly acknowledge it as well as this tool if you use it.
68 FastQC incorporates the Picard-tools_ libraries for sam/bam processing.
69
70 The contaminants file parameter was borrowed from the independently developed
71 fastqcwrapper contributed to the Galaxy Community Tool Shed by Jim Johnson.
72
73 -----
74
75 .. class:: infomark
76
77 **Inputs and outputs**
78
79 This wrapper accepts a Galaxy fastq, sam or bam dataset as the input read file to check.
80 It also takes an optional list of contaminants, in the form of
81 a tab-delimited file with 2 columns: name and sequence.
82
83 FastQC_ produces a single HTML output file containing all of the results, slightly adjusted so that it displays well in Galaxy. The results include the following:
84
85 - Basic Statistics
86 - Per base sequence quality
87 - Per sequence quality scores
88 - Per base sequence content
89 - Per base GC content
90 - Per sequence GC content
91 - Per base N content
92 - Sequence Length Distribution
93 - Sequence Duplication Levels
94 - Overrepresented sequences
95 - Kmer Content
96
97 All except Basic Statistics and Overrepresented sequences are plots.
98 .. _FastQC: http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
99 .. _Picard-tools: http://picard.sourceforge.net/index.shtml
100
101 </help>
102 </tool>
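The command template and the out_prefix sanitizer in the tool definition above can be illustrated with a small Python sketch. This is not code from rgFastQC.py: the helper names `sanitize_title` and `build_command` are hypothetical, and the sanitizer is assumed to keep ASCII letters and digits only. The flags (-i, -d, -o, -n, -f, -j, -c, -e) are the ones that appear in the `<command>` block.

```python
import string

# Assumption: the sanitizer keeps ASCII letters and digits and drops
# everything else (invalid_char="" in the tool definition).
ALLOWED = set(string.ascii_letters + string.digits)


def sanitize_title(title):
    """Drop every character that is not an ASCII letter or digit."""
    return "".join(ch for ch in title if ch in ALLOWED)


def build_command(input_file, files_path, html_file, title, ext, name,
                  contaminants=None):
    """Assemble an rgFastQC.py argument list (hypothetical helper)."""
    args = [
        "rgFastQC.py",
        "-i", input_file,
        "-d", files_path,
        "-o", html_file,
        "-n", sanitize_title(title),
        "-f", ext,
        "-j", name,
    ]
    # The contaminants file is optional, so -c is added only when one is given.
    if contaminants:
        args += ["-c", contaminants]
    # The executable name is passed last, as in the command template.
    args += ["-e", "fastqc"]
    return args


if __name__ == "__main__":
    print(build_command("reads.fastq", "report_files", "report.html",
                        "My run #1", "fastq", "reads.fastq"))
```

Returning a list rather than a single string avoids shell quoting problems if the list is later handed to `subprocess`; the tool itself quotes each value inside the template instead.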