comparison bwa_macros.xml @ 2:e29bc5c169bc draft

Uploaded
author devteam
date Fri, 20 Mar 2015 12:09:08 -0400
parents ff1ae217ccc2
children ac30bfd3e2a8
1:c71dd035971e 2:e29bc5c169bc
1 <macros> 1 <macros>
2
3 <token name="@set_rg_string@">
4 #set $rg_string = "@RG\tID:" + str($rg.ID) + "\tSM:" + str($rg.SM) + "\tPL:" + str($rg.PL)
5 #if $rg.LB
6 #set $rg_string += "\tLB:$rg.LB"
7 #end if
8 #if $rg.CN
9 #set $rg_string += "\tCN:$rg.CN"
10 #end if
11 #if $rg.DS
12 #set $rg_string += "\tDS:$rg.DS"
13 #end if
14 #if $rg.DT
15 #set $rg_string += "\tDT:$rg.DT"
16 #end if
17 #if $rg.FO
18 #set $rg_string += "\tFO:$rg.FO"
19 #end if
20 #if $rg.KS
21 #set $rg_string += "\tKS:$rg.KS"
22 #end if
23 #if $rg.PG
24 #set $rg_string += "\tPG:$rg.PG"
25 #end if
26 #if str($rg.PI)
27 #set $rg_string += "\tPI:$rg.PI"
28 #end if
29 #if $rg.PU
30 #set $rg_string += "\tPU:$rg.PU"
31 #end if
32 </token>
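
The `@set_rg_string@` token above builds the read group header line field by field. A minimal Python sketch of the same logic follows, for readers who want to check the resulting string outside Galaxy. The function name `build_rg_string` and the plain-dict input are hypothetical and not part of the macro; the sketch also assumes that `\t` in the template denotes a real tab character and that an unset optional field is an empty string or `None`.

```python
def build_rg_string(rg):
    """Assemble an @RG header line from a dict of read group fields.

    Hypothetical helper that mirrors the Cheetah token: ID, SM and PL are
    always written, every other tag only when it has a value.
    """
    rg_string = (
        "@RG\tID:" + str(rg["ID"])
        + "\tSM:" + str(rg["SM"])
        + "\tPL:" + str(rg["PL"])
    )
    # Optional tags, in the same order as the #if blocks in the token.
    for tag in ("LB", "CN", "DS", "DT", "FO", "KS", "PG", "PI", "PU"):
        value = rg.get(tag)
        # Compare the string form so that a numeric PI of 0 is still written,
        # which is the effect of the str($rg.PI) test in the token.
        if value is not None and str(value) != "":
            rg_string += "\t" + tag + ":" + str(value)
    return rg_string


if __name__ == "__main__":
    example = {"ID": "FLOWCELL1.LANE1", "SM": "KID", "PL": "ILLUMINA",
               "LB": "LIB-KID-1", "PI": 200}
    print(build_rg_string(example))
```

The loop replaces the nine near-identical `#if` blocks; this is a design choice of the sketch, not a statement about how the macro should be written.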
2 33
3 <token name="@RG@"> 34 <token name="@RG@">
4 ----- 35 -----
5 36
6 .. class:: warningmark 37 .. class:: warningmark
7 38
8 **Read Groups are Important!** 39 **Read Groups are Important!**
9 40
10 One of the recommended best practices in NGS analysis is adding read group information to BAM files. You can do this directly in the BWA interface using the 41 One of the recommended best practices in NGS analysis is adding read group information to BAM files. You can do this directly in the BWA interface using the
11 **Specify readgroup information?** widget. If you are not familiar with readgroups you should know that this is effectively a way to tag reads with an additional ID. 42 **Specify read group information?** widget. If you are not familiar with read groups you should know that this is effectively a way to tag reads with an additional ID.
12 This allows you to combine BAM files from, for example, multiple BWA runs into a single dataset. This significantly simplifies downstream processing as 43 This allows you to combine BAM files from, for example, multiple BWA runs into a single dataset. This significantly simplifies downstream processing as
13 instead of dealing with multiple datasets you only have to handle one. This is possible because the readgroup information allows you to identify 44 instead of dealing with multiple datasets you only have to handle one. This is possible because the read group information allows you to identify
14 data from different experiments even if they are combined in one file. Many downstream analysis tools such as variant callers (e.g., FreeBayes or Naive Variant Caller 45 data from different experiments even if they are combined in one file. Many downstream analysis tools such as variant callers (e.g., FreeBayes or Naive Variant Caller
15 present in Galaxy) are aware of read groups and will automatically generate calls for each individual sample even if they are combined within a single file. 46 present in Galaxy) are aware of read groups and will automatically generate calls for each individual sample even if they are combined within a single file.
16 47
17 **Description of read group fields** 48 **Description of read group fields**
18 49
49 @RG ID:FLOWCELL2.LANE2 PL:illumina LB:LIB-KID-1 SM:KID PI:200 80 @RG ID:FLOWCELL2.LANE2 PL:illumina LB:LIB-KID-1 SM:KID PI:200
50 @RG ID:FLOWCELL2.LANE3 PL:illumina LB:LIB-KID-2 SM:KID PI:400 81 @RG ID:FLOWCELL2.LANE3 PL:illumina LB:LIB-KID-2 SM:KID PI:400
51 @RG ID:FLOWCELL2.LANE4 PL:illumina LB:LIB-KID-2 SM:KID PI:400 82 @RG ID:FLOWCELL2.LANE4 PL:illumina LB:LIB-KID-2 SM:KID PI:400
52 83
53 Note the hierarchical relationship between read groups (unique for each lane), libraries (sequenced on two lanes), and samples (across four lanes, two lanes for each library). 84 Note the hierarchical relationship between read groups (unique for each lane), libraries (sequenced on two lanes), and samples (across four lanes, two lanes for each library).
54 </token> 85 </token>
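
The sample, library, and read group hierarchy described in the `@RG@` help text can be recovered from the header lines themselves. The sketch below is illustrative only: `group_read_groups` is a hypothetical helper, and it splits fields on any whitespace so that it works on the lines as displayed here. Real SAM headers are tab-separated, and splitting on whitespace would break a `DS` value that contains spaces.

```python
from collections import defaultdict


def group_read_groups(header_lines):
    """Return {sample: {library: [read group IDs]}} from @RG header lines."""
    tree = defaultdict(lambda: defaultdict(list))
    for line in header_lines:
        fields = line.split()
        if not fields or fields[0] != "@RG":
            continue  # ignore anything that is not a read group line
        # Each remaining field has the form TAG:value; split on the first colon.
        tags = dict(field.split(":", 1) for field in fields[1:])
        tree[tags["SM"]][tags["LB"]].append(tags["ID"])
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {sample: dict(libs) for sample, libs in tree.items()}


if __name__ == "__main__":
    lines = [
        "@RG ID:FLOWCELL2.LANE2 PL:illumina LB:LIB-KID-1 SM:KID PI:200",
        "@RG ID:FLOWCELL2.LANE3 PL:illumina LB:LIB-KID-2 SM:KID PI:400",
        "@RG ID:FLOWCELL2.LANE4 PL:illumina LB:LIB-KID-2 SM:KID PI:400",
    ]
    print(group_read_groups(lines))
```

For the three lines visible above, the result maps sample `KID` to library `LIB-KID-1` with one lane and library `LIB-KID-2` with two lanes.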
55 <token name="@info@"> 86 <token name="@info@">
56 ----- 87 -----
57 88
58 .. class:: infomark 89 .. class:: infomark
59 90
60 **More info** 91 **More info**
64 1. https://biostar.usegalaxy.org/ 95 1. https://biostar.usegalaxy.org/
65 2. https://www.biostars.org/ 96 2. https://www.biostars.org/
66 3. https://github.com/lh3/bwa 97 3. https://github.com/lh3/bwa
67 4. http://bio-bwa.sourceforge.net/ 98 4. http://bio-bwa.sourceforge.net/
68 99
69 </token> 100 </token>
70 101
71 <token name="@dataset_collections@"> 102 <token name="@dataset_collections@">
72 ------ 103 ------
73 104
74 **Dataset collections - processing large numbers of datasets at once** 105 **Dataset collections - processing large numbers of datasets at once**
75 106
76 This will be added shortly 107 This will be added shortly
77 108
78 109
79 </token> 110 </token>
80 111 <xml name="readgroup_params">
112 <conditional name="rg">
113 <param name="rg_selector" type="select" label="Set read groups information?" help="-R; Specifying read group information can greatly simplify your downstream analyses by allowing combining multiple datasets. See help below for more details">
114 <option value="set">Set</option>
115 <option value="do_not_set" selected="True">Do not set</option>
116 </param>
117 <when value="set">
118 <param name="ID" type="text" value="" size="20" label="Read group identifier (ID)" help="This value must be unique among multiple samples in your experiment">
119 <validator type="empty_field" />
120 </param>
121 <param name="SM" type="text" value="" size="20" label="Read group sample name (SM)" help="This value should be descriptive. Use pool name where a pool is being sequenced" />
122 <param name="PL" type="select" label="Platform/technology used to produce the reads (PL)">
123 <option value="CAPILLARY">CAPILLARY</option>
124 <option value="LS454">LS454</option>
125 <option value="ILLUMINA">ILLUMINA</option>
126 <option value="SOLID">SOLID</option>
127 <option value="HELICOS">HELICOS</option>
128 <option value="IONTORRENT">IONTORRENT</option>
129 <option value="PACBIO">PACBIO</option>
130 </param>
131 <param name="LB" type="text" size="25" label="Library name (LB)" />
132 <param name="CN" type="text" size="25" label="Sequencing center that produced the read (CN)" />
133 <param name="DS" type="text" size="25" label="Description (DS)" />
134 <param name="DT" type="text" size="25" label="Date that run was produced (DT)" help="ISO8601 format date or date/time, like YYYY-MM-DD" />
135 <param name="FO" type="text" size="25" optional="true" label="Flow order (FO)" help="The array of nucleotide bases that correspond to the nucleotides used for each flow of each read. Multi-base flows are encoded in IUPAC format, and non-nucleotide flows by various other characters. Format: /\*|[ACMGRSVTWYHKDBN]+/">
136 <validator type="regex" message="Invalid flow order">\*|[ACMGRSVTWYHKDBN]+$</validator>
137 </param>
138 <param name="KS" type="text" size="25" label="The array of nucleotide bases that correspond to the key sequence of each read (KS)" />
139 <param name="PG" type="text" size="25" label="Programs used for processing the read group (PG)" />
140 <param name="PI" type="integer" optional="true" label="Predicted median insert size (PI)" />
141 <param name="PU" type="text" size="25" label="Platform unit (PU)" help="Unique identifier (e.g. flowcell-barcode.lane for Illumina or slide for SOLiD)" />
142 </when>
143 <when value="do_not_set">
144 <!-- do nothing -->
145 </when>
146 </conditional>
147 </xml>
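
The flow order (FO) format quoted in the `readgroup_params` help text, `/\*|[ACMGRSVTWYHKDBN]+/`, can be tried out independently of Galaxy. The sketch below is a hedged illustration: `is_valid_flow_order` is a hypothetical name, and using `re.fullmatch` is a choice of this example so that both alternatives must cover the whole value. If a pattern such as `\*|[ACMGRSVTWYHKDBN]+$` is instead applied with match-from-start semantics, the `$` binds only to the second alternative, so a value like `*XYZ` would be accepted.

```python
import re

# Either a single "*" or one or more IUPAC nucleotide codes.
FLOW_ORDER = re.compile(r"\*|[ACMGRSVTWYHKDBN]+")


def is_valid_flow_order(value):
    """Return True when the whole value matches the FO format."""
    return FLOW_ORDER.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("TACG", "*", "*XYZ", "tacg", ""):
        print(repr(candidate), is_valid_flow_order(candidate))
```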
81 148
82 </macros> 149 </macros>