Mercurial > repos > wolma > mimodd_core
comparison varextract.xml @ 0:aa82b2e54055 draft
planemo upload for repository https://github.com/wm75/mimodd_galaxy_wrappers commit b36048cd608ede0ec6f6559648525c9350caae34-dirty
author | wolma |
---|---|
date | Sat, 11 Nov 2017 18:19:22 -0500 |
parents | |
children |
-1:000000000000 | 0:aa82b2e54055 |
---|---|
1 <tool id="mimodd_varextract" name="MiModD Extract Variant Sites" | |
2 version="@MIMODD_WRAPPER_VERSION@"> | |
3 <description>from a BCF file</description> | |
4 <macros> | |
5 <import>macros.xml</import> | |
6 </macros> | |
7 <expand macro="requirements" /> | |
8 <expand macro="stdio" /> | |
9 <expand macro="version_command" /> | |
10 <command><![CDATA[ | |
11 mimodd varextract '$ifile' | |
12 #if $len($sitesinfo) | |
13 -p | |
14 #for $source in $sitesinfo | |
15 '${source.pre_vcf}' | |
16 #end for | |
17 #end if | |
18 --ofile '$output_vcf' | |
19 $keep_alts | |
20 --verbose | |
21 ]]></command> | |
22 | |
23 <inputs> | |
24 <param name="ifile" type="data" format="bcf" label="BCF input file" | |
25 help="Use the MiModD Variant Calling tool to generate the input for this tool."/> | |
26 <repeat name="sitesinfo" title="include information from pre-calculated vcf dataset" default="0"> | |
27 <param name="pre_vcf" type="data" format="vcf" | |
28 label="independently generated vcf dataset" /> | |
29 </repeat> | |
30 <param name="keep_alts" type="boolean" truevalue="-a" falsevalue="" checked="false" | |
31 label="keep all sites with alternate bases" | |
32 help="If selected, the VCF output will include ALL sites for which non-reference bases have been observed, i.e., even those not considered allelic sites by the variant caller." /> | |
33 </inputs> | |
34 <outputs> | |
35 <data name="output_vcf" format="vcf" | |
36 label="Variants extracted with MiModD from ${on_string}"/> | |
37 </outputs> | |
38 | |
39 <tests> | |
40 <test> | |
41 <param name="ifile" value="a.bcf" /> | |
42 <output name="output_vcf" ftype="vcf"> | |
43 <assert_contents> | |
44 <has_line_matching expression="#CHROM.POS.ID.REF.ALT.QUAL.FILTER.INFO.FORMAT.N2.ot266" /> | |
45 </assert_contents> | |
46 </output> | |
47 <assert_command> | |
48 <not_has_text text="-a" /> | |
49 </assert_command> | |
50 </test> | |
51 <test> | |
52 <param name="ifile" value="a_part2.bcf" /> | |
53 <param name="keep_alts" value="true" /> | |
54 <param name="pre_vcf" value="a.vcf" /> | |
55 <output name="output_vcf" ftype="vcf"> | |
56 <assert_contents> | |
57 <has_line_matching expression="#CHROM.POS.ID.REF.ALT.QUAL.FILTER.INFO.FORMAT.ot266.external_source_1_N2.external_source_1_ot266" /> | |
58 </assert_contents> | |
59 </output> | |
60 <assert_command> | |
61 <has_text text="-a" /> | |
62 </assert_command> | |
63 </test> | |
64 </tests> | |
65 <help><![CDATA[ | |
66 .. class:: infomark | |
67 | |
68 **What it does** | |
69 | |
70 The tool takes as input a BCF dataset like the ones produced by the | |
71 *MiModD Variant Calling* tool, extracts just the variant sites from it and | |
72 reports them in VCF format. | |
73 | |
74 If the BCF input file specifies multiple samples, sites are included if they qualify as variant sites in at least one sample. | |
75 | |
76 ---------- | |
77 | |
78 **Options:** | |
79 | |
80 **keep all sites with alternate bases** | |
81 | |
82 By default, a variant site is considered to be a position in the genome for | |
83 which a non-reference allele appears in the inferred genotype of any sample. | |
84 | |
85 You can select the *keep all sites with alternate bases* option if instead | |
86 you want to extract all sites for which at least one non-reference base has | |
87 been observed (whether resulting in a non-reference allele call or not). | |
88 Using this option should rarely be necessary, but can occasionally be | |
89 helpful for closer inspection of candidate genomic regions. | |
90 | |
91 | |
92 **include information from pre-calculated vcf dataset** | |
93 | |
94 During the process of variant extraction the tool can take into account | |
95 genome positions specified in one or more independently generated VCF datasets. | |
96 If such additional VCF input is provided, the tool output will contain the | |
97 samples found in these files as additional samples and sites from the main BCF | |
98 dataset will be included not only if they qualify as variant sites in at least | |
99 one sample specified in the BCF, but also if they are listed in any of the | |
100 additional VCF datasets. | |
101 | |
102 Optional VCF input can be particularly useful in one of the following | |
103 situations: | |
104 | |
105 1) you have prior information that leads you to think that certain genome | |
106 positions are of special relevance for your project and, thus, you are | |
107 interested in the statistics produced by the variant caller for these | |
108 positions even if they are not considered variant sites. In this case you | |
109 can use a minimal VCF dataset to guide the variant extraction process to | |
110 include these positions. This dataset needs a minimal header of the form: | |
111 | |
112 ``##fileformat=VCFv4.2`` | |
113 | |
114 followed by positional information like in this example:: | |
115 | |
116 #CHROM POS ID REF ALT QUAL FILTER INFO | |
117 chrI 1222 . . . . . . | |
118 chrI 2651 . . . . . . | |
119 chrI 3659 . . . . . . | |
120 chrI 3731 . . . . . . | |
121 | |
122 , where columns are tab-separated and . serves as a placeholder for missing | |
123 information. | |
124 | |
125 2) you have actual variant calls from an additional sample, but you do not | |
126 have access to the original sequenced read data (if you had, the | |
127 recommended approach would be to include that data in the | |
128 *MiModD Variant Calling* step). | |
129 | |
130 This situation is often encountered with published datasets. Assume you | |
131 have obtained a list of known single nucleotide variants (SNVs) found in | |
132 one particular strain of your favorite model organism and you would like | |
133 to know which of these SNVs are present in the related strains you have | |
134 sequenced. You have aligned the sequenced reads from your samples and have | |
135 used the *MiModD Variant Calling* tool, which has generated a BCF dataset | |
136 ready for variant extraction. If the SNV list for the previously sequenced | |
137 strain is in VCF format already, you can now just plug it into the | |
138 analysis process by specifying it in the tool interface as an | |
139 *independently generated vcf dataset*. | |
140 The resulting vcf output will contain all SNV sites along with the variant | |
141 sites found in the BCF alone. You can then proceed to the | |
142 *MiModD VCF Filter* tool to look at the original SNV sites only or to | |
143 investigate any other interesting subset of sites. If the SNV list is in | |
144 some other format, you will have to convert it to VCF first. At a minimum, | |
145 the dataset must have a ``##fileformat`` header line like the previous | |
146 example and have the ``REF`` and ``ALT`` columns filled in like so:: | |
147 | |
148 #CHROM POS ID REF ALT QUAL FILTER INFO | |
149 chrI 1897409 . A G . . . | |
150 chrI 1897492 . C T . . . | |
151 chrI 1897616 . C A . . . | |
152 chrI 1897987 . A T . . . | |
153 chrI 1898185 . C T . . . | |
154 chrI 1898715 . G A . . . | |
155 chrI 1898729 . T C . . . | |
156 chrI 1900288 . T A . . . | |
157 | |
158 , in which case the tool will assume that the corresponding sample is | |
159 homozygous for each of the SNVs. | |
160 If you need to distinguish between homozygous and heterozygous SNVs you | |
161 will have to extend the format to include a format and a sample column | |
162 with genotype (GT) information like in this example:: | |
163 | |
164 #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sampleX | |
165 chrI 1897409 . A G . . . GT 1/1 | |
166 chrI 1897492 . C T . . . GT 0/1 | |
167 chrI 1897616 . C A . . . GT 0/1 | |
168 chrI 1897987 . A T . . . GT 0/1 | |
169 chrI 1898185 . C T . . . GT 0/1 | |
170 chrI 1898715 . G A . . . GT 0/1 | |
171 chrI 1898729 . T C . . . GT 0/1 | |
172 chrI 1900288 . T A . . . GT 0/1 | |
173 | |
174 , in which sampleX would be heterozygous for all SNVs except the first. | |
175 | |
176 .. class:: warningmark | |
177 | |
178 If the optional VCF input contains INDEL calls, these will be ignored by the | |
179 tool. | |
180 | |
181 @HELP_FOOTER@ | |
182 ]]></help> | |
183 <expand macro="citations" /> | |
184 </tool> |
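The help text above describes two kinds of minimal VCF input: a positions-only dataset, and a dataset with ``REF``/``ALT`` values plus optional ``GT`` information for one sample. The following is a minimal sketch of how such a dataset could be generated from a list of sites. The function name `minimal_vcf_lines` and its argument layout are hypothetical and not part of MiModD or the wrapper; the sketch only reproduces the column layout shown in the help text and has not been checked against the tool itself.

```python
def minimal_vcf_lines(sites, sample=None):
    """Build the lines of a minimal VCF dataset (hypothetical helper).

    sites  -- iterable of (chrom, pos, ref, alt, gt) tuples; ref, alt and gt
              may be None when that information is not available
    sample -- optional sample name; if given, FORMAT and sample columns
              with GT information are written and every site needs a gt
    """
    columns = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]
    if sample is not None:
        columns += ["FORMAT", sample]
    # Minimal header line, followed by the tab-separated column header.
    lines = ["##fileformat=VCFv4.2", "\t".join(columns)]
    for chrom, pos, ref, alt, gt in sites:
        # "." is the placeholder for missing information.
        fields = [chrom, str(pos), ".", ref or ".", alt or ".", ".", ".", "."]
        if sample is not None:
            if gt is None:
                raise ValueError("missing genotype for %s:%s" % (chrom, pos))
            fields += ["GT", gt]
        lines.append("\t".join(fields))
    return lines


if __name__ == "__main__":
    # Positions-only dataset, as in situation 1) of the help text.
    positions = [("chrI", 1222, None, None, None),
                 ("chrI", 2651, None, None, None)]
    print("\n".join(minimal_vcf_lines(positions)))
```

Passing `sample="sampleX"` together with tuples such as `("chrI", 1897409, "A", "G", "1/1")` would produce the extended layout with `FORMAT` and sample columns. Writing the returned lines to a file and uploading it as a vcf dataset is left to the user.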