comparison varextract.xml @ 0:aa82b2e54055 draft

planemo upload for repository https://github.com/wm75/mimodd_galaxy_wrappers commit b36048cd608ede0ec6f6559648525c9350caae34-dirty
author wolma
date Sat, 11 Nov 2017 18:19:22 -0500
-1:000000000000 0:aa82b2e54055
1 <tool id="mimodd_varextract" name="MiModD Extract Variant Sites"
2 version="@MIMODD_WRAPPER_VERSION@">
3 <description>from a BCF file</description>
4 <macros>
5 <import>macros.xml</import>
6 </macros>
7 <expand macro="requirements" />
8 <expand macro="stdio" />
9 <expand macro="version_command" />
10 <command><![CDATA[
11 mimodd varextract '$ifile'
12 #if $len($sitesinfo)
13 -p
14 #for $source in $sitesinfo
15 '${source.pre_vcf}'
16 #end for
17 #end if
18 --ofile '$output_vcf'
19 $keep_alts
20 --verbose
21 ]]></command>
22
23 <inputs>
24 <param name="ifile" type="data" format="bcf" label="BCF input file"
25 help="Use the MiModD Variant Calling tool to generate the input for this tool."/>
26 <repeat name="sitesinfo" title="include information from pre-calculated vcf dataset" default="0">
27 <param name="pre_vcf" type="data" format="vcf"
28 label="independently generated vcf dataset" />
29 </repeat>
30 <param name="keep_alts" type="boolean" truevalue="-a" falsevalue="" checked="false"
31 label="keep all sites with alternate bases"
32 help="If selected, the VCF output will include ALL sites for which non-reference bases have been observed, i.e., even those not considered allelic sites by the variant caller." />
33 </inputs>
34 <outputs>
35 <data name="output_vcf" format="vcf"
36 label="Variants extracted with MiModD from ${on_string}"/>
37 </outputs>
38
39 <tests>
40 <test>
41 <param name="ifile" value="a.bcf" />
42 <output name="output_vcf" ftype="vcf">
43 <assert_contents>
44 <has_line_matching expression="#CHROM.POS.ID.REF.ALT.QUAL.FILTER.INFO.FORMAT.N2.ot266" />
45 </assert_contents>
46 </output>
47 <assert_command>
48 <not_has_text text="-a" />
49 </assert_command>
50 </test>
51 <test>
52 <param name="ifile" value="a_part2.bcf" />
53 <param name="keep_alts" value="true" />
54 <param name="pre_vcf" value="a.vcf" />
55 <output name="output_vcf" ftype="vcf">
56 <assert_contents>
57 <has_line_matching expression="#CHROM.POS.ID.REF.ALT.QUAL.FILTER.INFO.FORMAT.ot266.external_source_1_N2.external_source_1_ot266" />
58 </assert_contents>
59 </output>
60 <assert_command>
61 <has_text text="-a" />
62 </assert_command>
63 </test>
64 </tests>
65 <help><![CDATA[
66 .. class:: infomark
67
68 **What it does**
69
70 The tool takes as input a BCF dataset like the ones produced by the
71 *MiModD Variant Calling* tool, extracts just the variant sites from it and
72 reports them in VCF format.
73
74 If the BCF input file specifies multiple samples, sites are included if they qualify as variant sites in at least one sample.
75
76 ----------
77
78 **Options:**
79
80 **keep all sites with alternate bases**
81
82 By default, a variant site is considered to be a position in the genome for
83 which a non-reference allele appears in the inferred genotype of any sample.
84
85 You can select the *keep all sites with alternate bases* option if you
86 instead want to extract all sites for which at least one non-reference base has
87 been observed (whether resulting in a non-reference allele call or not).
88 Using this option should rarely be necessary, but can occasionally be
89 helpful for closer inspection of candidate genomic regions.
90
91
92 **include information from pre-calculated vcf dataset**
93
94 During the process of variant extraction the tool can take into account
95 genome positions specified in one or more independently generated VCF datasets.
96 If such additional VCF input is provided, the tool output will contain the
97 samples found in these files as additional samples and sites from the main BCF
98 dataset will be included not only if they qualify as variant sites in at least
99 one sample specified in the BCF, but also if they are listed in any of the
100 additional VCF datasets.
101
102 Optional VCF input can be particularly useful in one of the following
103 situations:
104
105 1) you have prior information that leads you to think that certain genome
106 positions are of special relevance for your project and, thus, you are
107 interested in the statistics produced by the variant caller for these
108 positions even if they are not considered variant sites. In this case you
109 can use a minimal VCF dataset to guide the variant extraction process to
110 include these positions. This dataset needs a minimal header of the form:
111
112 ``##fileformat=VCFv4.2``
113
114 followed by positional information like in this example::
115
116 #CHROM POS ID REF ALT QUAL FILTER INFO
117 chrI 1222 . . . . . .
118 chrI 2651 . . . . . .
119 chrI 3659 . . . . . .
120 chrI 3731 . . . . . .
121
122 , where columns are tab-separated and . serves as a placeholder for missing
123 information.
124
125 2) you have actual variant calls from an additional sample, but you do not
126 have access to the original sequenced reads data (if you had, the
127 recommended approach would be to include that data in the
128 *MiModD Variant Calling* step).
129
130 This situation is often encountered with published datasets. Assume you
131 have obtained a list of known single nucleotide variants (SNVs) found in
132 one particular strain of your favorite model organism and you would like
133 to know which of these SNVs are present in the related strains you have
134 sequenced. You have aligned the sequenced reads from your samples and have
135 used the *MiModD Variant Calling* tool, which has generated a BCF dataset
136 ready for variant extraction. If the SNV list for the previously sequenced
137 strain is in VCF format already, you can now just plug it into the
138 analysis process by specifying it in the tool interface as an
139 *independently generated vcf dataset*.
140 The resulting vcf output will contain all SNV sites along with the variant
141 sites found in the BCF alone. You can then proceed to the
142 *MiModD VCF Filter* tool to look at the original SNV sites only or to
143 investigate any other interesting subset of sites. If the SNV list is in
144 some other format, you will have to convert it to VCF first. At a minimum,
145 the dataset must have a ``##fileformat`` header line like the previous
146 example and have the ``REF`` and ``ALT`` columns filled in like so::
147
148 #CHROM POS ID REF ALT QUAL FILTER INFO
149 chrI 1897409 . A G . . .
150 chrI 1897492 . C T . . .
151 chrI 1897616 . C A . . .
152 chrI 1897987 . A T . . .
153 chrI 1898185 . C T . . .
154 chrI 1898715 . G A . . .
155 chrI 1898729 . T C . . .
156 chrI 1900288 . T A . . .
157
158 , in which case the tool will assume that the corresponding sample is
159 homozygous for each of the SNVs.
160 If you need to distinguish between homozygous and heterozygous SNVs you
161 will have to extend the format to include a format and a sample column
162 with genotype (GT) information like in this example::
163
164 #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sampleX
165 chrI 1897409 . A G . . . GT 1/1
166 chrI 1897492 . C T . . . GT 0/1
167 chrI 1897616 . C A . . . GT 0/1
168 chrI 1897987 . A T . . . GT 0/1
169 chrI 1898185 . C T . . . GT 0/1
170 chrI 1898715 . G A . . . GT 0/1
171 chrI 1898729 . T C . . . GT 0/1
172 chrI 1900288 . T A . . . GT 0/1
173
174 , in which sampleX would be heterozygous for all SNVs except the first.
175
176 .. class:: warningmark
177
178 If the optional VCF input contains INDEL calls, these will be ignored by the
179 tool.
180
181 @HELP_FOOTER@
182 ]]></help>
183 <expand macro="citations" />
184 </tool>
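
The help text above describes two minimal VCF layouts that can be supplied as an *independently generated vcf dataset*: positions only, and positions with ``REF``/``ALT`` plus an optional ``FORMAT``/sample column carrying ``GT``. The following is a small sketch of how such a file could be produced from a list of sites. It is not part of MiModD or of this wrapper; the function name `format_guide_vcf` and its tuple-based input are assumptions made for illustration only.

```python
def format_guide_vcf(sites, sample=None):
    """Return the text of a minimal, tab-separated VCF for the given sites.

    Hypothetical helper, not part of MiModD.

    sites  -- iterable of tuples, written in the order given:
              (chrom, pos)                 positions only, REF/ALT become "."
              (chrom, pos, ref, alt)       REF and ALT filled in
              (chrom, pos, ref, alt, gt)   required form when `sample` is set
    sample -- optional sample name; if given, FORMAT and sample columns
              with GT information are appended to every record.
    """
    columns = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]
    if sample is not None:
        columns += ["FORMAT", sample]

    # Minimal header as described in the help text, then the column line.
    lines = ["##fileformat=VCFv4.2", "\t".join(columns)]

    for site in sites:
        chrom, pos = site[0], site[1]
        # "." is the VCF placeholder for missing information.
        ref, alt = (site[2], site[3]) if len(site) >= 4 else (".", ".")
        fields = [chrom, str(pos), ".", ref, alt, ".", ".", "."]
        if sample is not None:
            if len(site) < 5:
                raise ValueError("a genotype is required when sample is set")
            fields += ["GT", site[4]]
        lines.append("\t".join(fields))

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    import sys

    # Positions-only guide file, as in the first example of the help text.
    sys.stdout.write(format_guide_vcf([("chrI", 1222), ("chrI", 2651)]))
```

The resulting text can be saved to a file and uploaded to Galaxy as a vcf dataset. Whether a given file is accepted depends on the tool itself, so the output should be checked against the examples in the help text.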