annotate hairpinTool.xml @ 10:8923d4ea858b
- Added check for zero library size, will now filter out zero library size
samples and generate report of filtered samples in html output
author | shian_su <registertonysu@gmail.com> |
---|---|
date | Tue, 12 Aug 2014 14:42:27 +1000 |
parents | f1076bfb0ed1 |
children | c0a76e30d61b |
rev | line source |
---|---|
1 <tool id="shRNAseq" name="shRNAseq Tool" version="1.0.12"> |
2 | 2 <description> |
3 Analyse hairpin differential representation using edgeR | |
4 </description> | |
5 | |
6 <requirements> | |
9
f1076bfb0ed1
- Fixed tool to actually make use of barcode location options
shian_su <registertonysu@gmail.com>
parents:
8
diff
changeset
|
7 <requirement type="R-module" version="3.6.2">edgeR</requirement> |
8 <requirement type="R-module" version="3.20.7">limma</requirement> |
9 <requirement type="package" version="3.0.3">R_3_0_3</requirement> |
2 | 10 </requirements> |
11 | |
12 <stdio> | |
13 <exit_code range="1:" level="fatal" description="Tool exception" /> | |
14 </stdio> | |
15 | |
16 <command interpreter="Rscript"> | |
6
3d04308a99f9
- Added differentially expressed hairpin count output
shian_su <registertonysu@gmail.com>
parents:
2
diff
changeset
|
17 hairpinTool.R $inputOpt.inputType |
18 #if $inputOpt.inputType=="fastq": |
2 | 19 #for $i, $fas in enumerate($inputOpt.fastq): |
20 fastq::$fas.file | |
21 #end for | |
22 | |
23 $inputOpt.hairpin | |
24 $inputOpt.samples | |
25 | |
26 #if $inputOpt.positions.posOption=="yes": |
2 | 27 $inputOpt.positions.barstart |
28 $inputOpt.positions.barend | |
29 $inputOpt.positions.hpstart | |
30 $inputOpt.positions.hpend | |
31 #else: | |
32 1 | |
33 5 | |
34 37 | |
35 57 | |
36 #end if | |
37 #else: | |
38 $inputOpt.counts | |
39 $inputOpt.hairpin |
40 $inputOpt.samples |
2 | 41 0 0 0 |
42 #end if | |
43 | |
44 #if $filterCPM.filtOption=="yes": |
2 | 45 $filterCPM.cpmReq |
46 $filterCPM.sampleReq | |
47 #else: | |
48 -Inf | |
49 -Inf | |
50 #end if | |
51 | |
52 $fdr | |
53 $lfc | |
54 $workMode.mode | |
55 $outFile | |
56 $outFile.files_path | |
57 | |
58 #if $workMode.mode=="classic": | |
59 "$workMode.pair1" | |
60 "$workMode.pair2" | |
61 #elif $workMode.mode=="glm": |
2 | 62 "$workMode.contrast" |
63 $workMode.roast.roastOption |
64 #if $workMode.roast.roastOption=="yes": |
2 | 65 $workMode.roast.hairpinReq |
66 $workMode.roast.select.selOption |
2 | 67 "$workMode.roast.select.selection" |
68 #else: | |
69 0 | |
70 0 | |
71 0 | |
72 #end if | |
73 #end if | |
74 </command> | |
75 | |
76 <inputs> | |
77 <conditional name="inputOpt"> | |
78 <param name="inputType" type="select" label="Input File Type"> |
2 | 79 <option value="fastq">FastQ File</option> |
80 <option value="counts">Table of Counts</option> | |
81 </param> | |
82 | |
83 <when value="fastq"> | |
84 <param name="hairpin" type="data" format="tabular" | |
85 label="Hairpin Annotation"/> | |
86 | |
87 | |
88 <param name="samples" type="data" format="tabular" | |
89 label="Sample Annotation"/> | |
90 | |
91 <repeat name="fastq" title="FastQ Files"> | |
92 <param name="file" type="data" format="fastq"/> | |
93 </repeat> | |
94 | |
95 <conditional name="positions"> | |
96 <param name="posOption" type="select" |
2 | 97 label="Specify Barcode and Hairpin Locations?" |
98 help="Default Positions: Barcode: 1 to 5, Hairpin: 37 to 57."> | |
99 <option value="no" selected="True">No</option> | |
100 <option value="yes">Yes</option> | |
101 </param> | |
102 | |
103 <when value="yes"> | |
104 <param name="barstart" type="integer" value="1" | |
105 label="Barcode Starting Position"/> | |
106 <param name="barend" type="integer" value="5" | |
107 label="Barcode Ending Position"/> | |
108 | |
109 <param name="hpstart" type="integer" value="37" | |
110 label="Hairpin Starting Position"/> | |
111 | |
112 <param name="hpend" type="integer" value="57" | |
113 label="Hairpin Ending Position"/> | |
114 </when> | |
115 | |
116 <when value="no"/> | |
117 </conditional> | |
118 </when> | |
119 | |
120 <when value="counts"> | |
121 <param name="counts" type="data" format="tabular" label="Counts Table"/> | |
122 <param name="hairpin" type="data" format="tabular" |
2 | 123 label="Hairpin Annotation"/> |
124 <param name="samples" type="data" format="tabular" |
2 | 125 label="Sample Annotation"/> |
126 </when> | |
127 </conditional> | |
128 | |
129 <conditional name="filterCPM"> | |
130 <param name="filtOption" type="select" label="Filter Low CPM?" |
2 | 131 help="Ignore hairpins with very low representation when performing |
132 analysis."> | |
133 <option value="yes">Yes</option> | |
134 <option value="no">No</option> | |
135 </param> | |
136 | |
137 <when value="yes"> | |
138 <param name="cpmReq" type="float" value="0.5" min="0" max="1" | |
139 label="Minimum CPM"/> | |
140 | |
141 <param name="sampleReq" type="integer" value="1" min="0" | |
142 label="Minimum Samples" | |
143 help="Filter out all the hairpins that do not meet the minimum | |
144 CPM in at least this many samples."/> | |
145 </when> | |
146 | |
147 <when value="no"/> | |
148 | |
149 </conditional> | |
150 | |
151 <conditional name="workMode"> | |
152 <param name="mode" type="select" label="Analysis Type" | |
153 help="Classic Exact Tests are useful for simple comparisons across | |
154 two sampling groups. Generalised linear models allow for more | |
155 complex contrasts and gene level analysis to be made."> | |
156 <option value="classic">Classic Exact Test</option> | |
157 <option value="glm">Generalised Linear Model</option> | |
158 </param> | |
159 | |
160 <when value="classic"> | |
161 <param name="pair1" type="text" label="Compare" size="40"/> | |
162 <param name="pair2" type="text" label="To" size="40" | |
163 help="The analysis will subtract values of this group from those | |
164 in the group above to establish the difference."/> | |
165 </when> | |
166 | |
167 <when value="glm"> | |
168 <param name="contrast" type="text" size="60" | |
169 label="Contrasts of interest" | |
170 help="Specify equations defining contrasts to be made. Eg. | |
171 KD-Control will result in positive fold change if KD has | |
172 greater expression and negative if Control has greater | |
173 expression."/> | |
174 | |
175 <conditional name="roast"> | |
176 <param name="roastOption" type="select" |
2 | 177 label="Perform Gene Level Analysis?" |
178 help="Analyse LogFC tendencies for hairpins belonging | |
179 to the same gene."> | |
180 <option value="no">No</option> | |
181 <option value="yes">Yes</option> | |
182 </param> | |
183 | |
184 <when value="yes"> | |
185 <param name="hairpinReq" type="integer" value="2" min="2" | |
186 label="Minimum Hairpins" | |
187 help="Only genes with at least this many hairpins will | |
188 be analysed."/> | |
189 | |
190 <conditional name="select"> | |
191 <param name="selOption" type="select" |
2 | 192 label="Gene Selection Method"> |
193 <option value="rank">By p-value Rank</option> | |
194 <option value="geneID">By Gene Identifier</option> | |
195 </param> | |
196 <when value="rank"> | |
197 <param name="selection" type="text" size="40" value="1:5" | |
198 label="Ranks of Top Genes to Plot" | |
199 help="Genes are ranked in ascending p-value for | |
200 differential representation; individual ranks can | |
201 be entered separated by commas, or a range separated | |
202 by a colon."/> | |
203 </when> | |
204 <when value="geneID"> | |
205 <param name="selection" type="text" size="80" value="" | |
206 label="Symbols of Genes to Plot" | |
207 help="Select genes based on their identifier in the | |
208 'Gene' column of the hairpin annotation file. | |
209 Please ensure an exact match with the values in the input | |
210 file and separate selections with commas."/> | |
211 </when> | |
212 </conditional> | |
213 | |
214 | |
215 </when> | |
216 | |
217 <when value="no"/> | |
218 </conditional> | |
219 </when> | |
220 </conditional> | |
221 | |
222 <param name="fdr" type="float" value="0.05" min="0" max="1" | |
223 label="FDR Threshold" | |
224 help="All observations below this threshold will be highlighted | |
225 in the smear plot."/> | |
226 <param name="lfc" type="float" value="0" min="0" | |
227 label="Absolute LogFC Threshold" | |
228 help="In addition to meeting the FDR requirement, the absolute | |
229 value of the log-fold-change of the observation must be above | |
230 this threshold to be highlighted."/> | |
231 </inputs> | |
232 | |
233 <outputs> | |
234 <data format="html" name="outFile" label="shRNAseq Analysis"/> | |
235 </outputs> | |
236 <help> | |
237 .. class:: infomark | |
238 | |
239 **What it does** | |
240 | |
241 Given tables containing information about the hairpins and their associated | |
242 barcodes, information about the samples, and FastQ files containing the hairpin | |
243 reads, this tool generates plots and tables for the analysis of differential | |
244 representation. | |
245 | |
7 | 246 .. class:: infomark |
247 | |
248 A tutorial on how to use this tool is available at: | |
249 http://bioinf.wehi.edu.au/shRNAseq/galaxy.html | |
250 | |
2 | 251 ----- |
252 | |
253 .. class:: infomark | |
254 | |
255 **INPUTS** | |
256 | |
257 **Input File Type:** | |
258 | |
259 This tool can either generate counts from raw FastQ files, given information | |
260 about the samples and hairpins, or use a table of counts that has already been | |
261 generated. | |
262 | |
263 **Counts Table (Counts Input):** | |
264 | |
265 A tab delimited text table of information regarding the counts of hairpins. | |
266 Should have a column 'ID' to denote the hairpins that counts correspond to. Each | |
267 additional column should have titles corresponding to the label for the sample. | |
268 | |
269 Example:: | |
270 | |
271 ID Sample1 Sample2 Sample3 | |
272 Control1 49802 48014 40148 | |
273 Control2 12441 16352 14232 | |
274 Control3 9842 9148 9111 | |
275 Hairpin1 3300 3418 2914 | |
276 Hairpin2 91418 95812 93174 | |
277 Hairpin3 32985 31975 35104 | |
278 Hairpin4 12082 14081 14981 | |
279 Hairpin5 2491 2769 2691 | |
280 Hairpin6 1294 1486 1642 | |
281 Hairpin7 49501 49076 47611 | |
282 ... | |
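As a rough illustration of the table layout described above, the following Python sketch reads a tab delimited counts table and checks for the 'ID' column. The function name `read_counts_table` is hypothetical and the tool itself does this in R, so this is only a sketch of the expected format, not the tool's implementation.

```python
import csv
import io


def read_counts_table(handle):
    """Read a tab delimited counts table with an 'ID' column.

    Returns (sample_names, counts) where counts maps hairpin ID to a list
    of integer counts, one per sample column.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    if "ID" not in (reader.fieldnames or []):
        raise ValueError("counts table needs an 'ID' column")
    # Every column other than 'ID' is treated as a sample label.
    samples = [name for name in reader.fieldnames if name != "ID"]
    counts = {row["ID"]: [int(row[s]) for s in samples] for row in reader}
    return samples, counts


# Two rows taken from the example table above.
example = (
    "ID\tSample1\tSample2\tSample3\n"
    "Control1\t49802\t48014\t40148\n"
    "Hairpin1\t3300\t3418\t2914\n"
)
samples, counts = read_counts_table(io.StringIO(example))
print(samples, counts["Hairpin1"])
```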
283 | |
284 **Hairpin Annotation:** | |
285 | |
286 A tab delimited text table of information regarding the hairpins. Should have | |
287 columns 'ID', 'Sequences' and 'Gene' to uniquely identify the hairpin, align it | |
288 with the reads to produce counts and identify which gene the hairpin acts on. | |
289 | |
290 NOTE: the column names are case sensitive and should be input exactly as they | |
291 are shown here. | |
292 | |
293 Example:: | |
294 | |
295 ID Sequences Gene | |
296 Control1 TCTCGCTTGGGCGAGAGTAAG 2 | |
297 Control2 CCGCCTGAAGTCTCTGATTAA 2 | |
298 Control3 AGGAATTATAATGCTTATCTA 2 | |
299 Hairpin1 AAGGCAGAGACTGACCACCTA 4 | |
300 Hairpin2 GAGCGACCTGGTGTTACTCTA 4 | |
301 Hairpin3 ATGGTGTAAATAGAGCTGTTA 4 | |
302 Hairpin4 CAGCTCATCTTCTGTGAAGAA 4 | |
303 Hairpin5 CAGCTCTGTGGGTCAGAAGAA 4 | |
304 Hairpin6 CCAGGCACAGATCTCAAGATA 4 | |
305 Hairpin7 ATGACAAGAAAGACATCTCAA 7 | |
306 ... | |
307 | |
308 **Sample Annotation (FastQ Input):** | |
309 | |
310 A tab delimited text table of information regarding the samples. Should have | |
311 columns 'ID', 'Sequences' and 'group' to uniquely identify each sample, identify | |
312 the sample in the reads by its barcode sequence and correctly group replicates | |
313 for analysis. Additional columns may be inserted for annotation purposes and will | |
314 not interfere with analysis as long as the necessary columns are present. | |
315 | |
316 NOTE: the column names are case sensitive and should be input exactly as they | |
317 are shown here. | |
318 | |
319 Example:: | |
320 | |
321 ID Sequences group Replicate | |
322 3 GAAAG Day 2 1 | |
323 6 GAACC Day 10 1 | |
324 9 GAAGA Day 5 GFP neg 1 | |
325 16 GAATT Day 5 GFP pos 1 | |
326 18 GACAC Day 2 2 | |
327 21 GACCA Day 10 2 | |
328 28 GACGT Day 5 GFP neg 2 | |
329 31 GACTG Day 5 GFP pos 2 | |
330 33 GAGAA Day 2 3 | |
331 40 GAGCT Day 10 3 | |
332 ... | |
333 | |
334 **Specify Barcode and Hairpin Locations (FastQ Input):** | |
335 | |
336 The first 5 bases of each sequencing read are assumed to be the barcode and | |
337 bases 37-57 the hairpin. If this is not the case, the positions can be changed, | |
338 but the barcodes and hairpins must still be at a consistent location in every | |
339 read and each form a continuous sequence. | |
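The position convention can be illustrated with a short Python sketch that slices a read using 1-based, inclusive start and end positions. The function name `extract_barcode_and_hairpin` and the example read are hypothetical; the tool performs this step in R, and this is not its implementation.

```python
def extract_barcode_and_hairpin(read, barstart=1, barend=5, hpstart=37, hpend=57):
    """Return (barcode, hairpin) from a read using 1-based inclusive positions."""
    # Python slices are 0-based and end-exclusive, so only the start shifts.
    barcode = read[barstart - 1:barend]
    hairpin = read[hpstart - 1:hpend]
    return barcode, hairpin


# Hypothetical 57-base read: 5-base barcode, 31 filler bases, 21-base hairpin.
read = "GAAAG" + "N" * 31 + "TCTCGCTTGGGCGAGAGTAAG"
barcode, hairpin = extract_barcode_and_hairpin(read)
print(barcode, hairpin)
```

With the default positions, the barcode is the first 5 bases and the hairpin is the 21 bases from position 37 to 57.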
340 | |
341 **Filter Low CPM?:** | |
342 | |
343 A large screen often contains hairpins with very low counts that are of no | |
344 interest in the experiment; these may be filtered out to speed up computations. | |
345 Filtering is based on counts per million in a required number of samples. | |
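A minimal Python sketch of this kind of filter follows. It assumes counts per million means a count divided by its sample's library size (column total) times one million. It also assumes that samples with zero library size are set aside first, as the changeset description mentions. The function name `filter_low_cpm` and the strictly-greater-than comparison are assumptions; the tool's own R code may differ.

```python
def filter_low_cpm(counts, cpm_req=0.5, sample_req=1):
    """Keep hairpins whose CPM exceeds cpm_req in at least sample_req samples.

    counts maps hairpin ID to a list of per-sample counts.
    Returns (kept_counts, kept_sample_indices).
    """
    n_samples = len(next(iter(counts.values())))
    # The library size of each sample is its column total.
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    # Samples with zero library size cannot be scaled, so set them aside.
    kept_samples = [j for j in range(n_samples) if lib_sizes[j] > 0]
    kept = {}
    for hairpin, row in counts.items():
        cpms = [row[j] / lib_sizes[j] * 1e6 for j in kept_samples]
        passing = sum(1 for c in cpms if c > cpm_req)
        if passing >= sample_req:
            kept[hairpin] = [row[j] for j in kept_samples]
    return kept, kept_samples


# Hypothetical counts: the third sample has a library size of zero.
example = {
    "Hairpin1": [3300, 3418, 0],
    "Hairpin2": [0, 1, 0],
    "Hairpin3": [996700, 996581, 0],
}
kept, kept_samples = filter_low_cpm(example, cpm_req=0.5, sample_req=2)
print(sorted(kept), kept_samples)
```

In this example Hairpin2 exceeds 0.5 CPM in only one of the two remaining samples, so it is dropped when two samples are required.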
346 | |
347 **Analysis Type:** | |
348 | |
349 * **Classic Exact Test:** This allows two experimental groups to be compared and | |
350 p-values for differential representation to be derived for each hairpin. Simple | |
351 and fast for straightforward comparisons. This mode takes the form | |
352 "*Compare* x *To* y", which implicitly subtracts the data of y from that of x | |
353 to produce the comparison. | |
354 | |
355 * **Generalised Linear Model:** This allows for complex contrasts to be specified | |
356 and also gene level analysis to be performed. If this option is chosen then | |
357 contrasts must be explicitly stated in equations and multiple contrasts can be | |
358 made. In addition there will be the option to analyse hairpins on a per-gene | |
359 basis to see if hairpins belonging to a particular gene have any overall | |
360 tendencies for the direction of their log-fold-change. | |
361 | |
362 **FDR Threshold:** | |
363 The smear plot in the output will have hairpins highlighted to signify | |
364 significant differential representation. Significance is determined by | |
365 controlling the false discovery rate; only hairpins with an FDR lower than the | |
366 threshold will be highlighted in the plot. | |
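The thresholding can be sketched in Python as follows, assuming the FDR values are Benjamini-Hochberg adjusted p-values (the default adjustment in edgeR's `topTags`). The function names `benjamini_hochberg` and `highlighted` are hypothetical. Combining the FDR cut-off with the Absolute LogFC Threshold parameter, and treating a logFC equal to the threshold as passing, are also assumptions rather than the tool's confirmed behaviour.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending p-value order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def highlighted(pvalues, logfcs, fdr=0.05, lfc=0.0):
    """Indices with FDR below `fdr` and absolute logFC of at least `lfc`."""
    fdrs = benjamini_hochberg(pvalues)
    return [i for i in range(len(pvalues))
            if fdr > fdrs[i] and abs(logfcs[i]) >= lfc]


# Hypothetical p-values and log-fold-changes for four hairpins.
pvals = [0.01, 0.04, 0.03, 0.5]
lfcs = [1.2, -0.8, 0.3, 2.0]
print(benjamini_hochberg(pvals))
print(highlighted(pvals, lfcs))
```

Raising `lfc` narrows the highlighted set further, since a hairpin must then meet both conditions.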
367 | |
368 ----- | |
369 | |
370 **Citations:** | |
371 | |
372 .. class:: infomark | |
373 | |
374 limma | |
375 | |
376 Please cite the paper below for the limma software itself. Please also try | |
377 to cite the appropriate methodology articles that describe the statistical | |
378 methods implemented in limma, depending on which limma functions you are | |
379 using. The methodology articles are listed in Section 2.1 of the limma | |
380 User's Guide. | |
381 | |
382 * Smyth, GK (2005). Limma: linear models for microarray data. In: | |
383 'Bioinformatics and Computational Biology Solutions using R and | |
384 Bioconductor'. R. Gentleman, V. Carey, S. Dudoit, R. Irizarry, | |
385 W. Huber (eds), Springer, New York, pages 397-420. | |
386 | |
387 .. class:: infomark | |
388 | |
389 edgeR | |
390 | |
391 Please cite the first paper for the software itself and the other papers for | |
392 the various original statistical methods implemented in edgeR. See | |
393 Section 1.2 in the User's Guide for more detail. | |
394 | |
395 * Robinson MD, McCarthy DJ and Smyth GK (2010). edgeR: a Bioconductor | |
396 package for differential expression analysis of digital gene expression | |
397 data. Bioinformatics 26, 139-140 | |
398 | |
399 * Robinson MD and Smyth GK (2007). Moderated statistical tests for assessing | |
400 differences in tag abundance. Bioinformatics 23, 2881-2887 | |
401 | |
402 * Robinson MD and Smyth GK (2008). Small-sample estimation of negative | |
403 binomial dispersion, with applications to SAGE data. | |
404 Biostatistics, 9, 321-332 | |
405 | |
406 * McCarthy DJ, Chen Y and Smyth GK (2012). Differential expression analysis | |
407 of multifactor RNA-Seq experiments with respect to biological variation. | |
408 Nucleic Acids Research 40, 4288-4297 | |
8 | 409 |
410 Report problems to: su.s@wehi.edu.au | |
2 | 411 |
412 .. _edgeR: http://www.bioconductor.org/packages/release/bioc/html/edgeR.html | |
413 .. _limma: http://www.bioconductor.org/packages/release/bioc/html/limma.html | |
414 </help> | |
415 </tool> | |
416 |