annotate varextract.xml @ 0:76cce6f89a9e draft

Imported from capsule None
author wolma
date Sat, 13 Dec 2014 17:19:20 -0500
parents
children c636b35c5e59
<tool id="extract_variants" name="Extract Variant Sites">
  <description>from a BCF file</description>
  <requirements>
    <requirement type="package" version="0.1.5">mimodd</requirement>
  </requirements>
  <version_command>mimodd version -q</version_command>
  <command>
    mimodd varextract $ifile
    #if $len($sitesinfo)
    -p
    #for $source in $sitesinfo
    "${source.pre_vcf}"
    #end for
    #end if
    --ofile $output_vcf
    $keep_alts
    --verbose
    --quiet
  </command>

  <inputs>
    <param name="ifile" type="data" format="bcf" label="BCF input file" help="Use the Variant Calling tool to generate the input for this tool."/>
    <repeat name="sitesinfo" title="include information from pre-calculated vcf file" default="0">
      <param name="pre_vcf" type="data" format="vcf" label="independently generated vcf file" />
    </repeat>
    <param name="keep_alts" type="boolean" label="keep all sites with alternate bases" truevalue="-a" falsevalue="" checked="false" help="If selected, the VCF output will include ALL sites for which non-reference bases have been observed, i.e., even those not considered allelic sites by the variant caller." />
  </inputs>
  <outputs>
    <data name="output_vcf" format="vcf" label="Variants extracted with MiModd from ${on_string}"/>
  </outputs>

  <help>
.. class:: infomark

**What it does**

The tool takes as input a BCF file like the ones produced by the *Variant Calling* tool, extracts just the variant sites from it and reports them in VCF format.

If the BCF input file specifies multiple samples, sites are included if they qualify as variant sites in at least one sample.

In a typical analysis workflow, you will use the tool's VCF output as input for the *VCF Filter* tool to reduce the often still long list of sites to a subset relevant to your project.

**Options:**

1) By default, a variant site is considered to be a position in the genome for which a non-reference allele appears in the inferred genotype of any sample.

Select the *keep all sites with alternate bases* option if you instead want to extract all sites for which at least one non-reference base has been observed (whether or not this results in a non-reference allele call). This option should rarely be necessary, but can occasionally help with closer inspection of candidate genomic regions.

2) During variant extraction, the tool can take into account genome positions specified in one or more independently generated VCF files. If such additional VCF input is provided, the tool output will contain the samples found in these files as additional samples. Sites from the main BCF file will then be included if they qualify as variant sites in at least one sample specified in the BCF or if they are listed in any of the additional VCF files.
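
The selection rules described so far can be summarized in a few lines of Python. This is only a sketch of the documented behavior, not MiModD's actual implementation; the function ``keep_site`` and all of its parameters are hypothetical names introduced for illustration.

```python
def keep_site(genotypes, alt_bases_seen, position, extra_positions, keep_alts=False):
    """Decide whether a site is reported, following the documented rules.

    genotypes: inferred genotype strings of all BCF samples, e.g. ["0/0", "0/1"]
    alt_bases_seen: True if at least one non-reference base was observed
    position: (chromosome, coordinate) tuple identifying the site
    extra_positions: set of positions listed in the additional VCF files
    keep_alts: mirrors the "keep all sites with alternate bases" option
    """
    # Default rule: a non-reference allele appears in the genotype of any sample.
    for genotype in genotypes:
        alleles = genotype.replace("|", "/").split("/")
        if any(allele not in ("0", ".") for allele in alleles):
            return True
    # Relaxed rule: with the option selected, any observed non-reference base suffices.
    if keep_alts and alt_bases_seen:
        return True
    # Additional VCF input: the position is listed in one of the extra files.
    return position in extra_positions
```

Under this reading, a site with genotypes 0/0 and 0/1 is kept by default, while a site at which every sample is 0/0 is kept only if the option is selected and a non-reference base was observed, or if the position is listed in an additional VCF file.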

Optional VCF input can be particularly useful in one of the following situations:

*scenario i* - you have prior information that leads you to think that certain genome positions are of special relevance for your project and, thus, you are interested in the statistics produced by the variant caller for these positions even if they are not considered variant sites. In this case, you can use a minimal VCF file to guide the variant extraction process to include these positions. This minimal VCF file needs a minimal header:

``##fileformat=VCFv4.2``

followed by positional information as in this example::

  #CHROM POS ID REF ALT QUAL FILTER INFO
  chrI 1222 . . . . . .
  chrI 2651 . . . . . .
  chrI 3659 . . . . . .
  chrI 3731 . . . . . .

Columns are tab-separated, and ``.`` serves as a placeholder for missing information.
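
If you have the positions of interest as a simple list, a short script can produce such a file. The following is a minimal sketch, assuming the positions are available as (chromosome, coordinate) pairs; the function name ``minimal_vcf`` is hypothetical and not part of MiModD.

```python
def minimal_vcf(positions):
    """Return the text of a minimal VCF file listing the given positions.

    positions: iterable of (chromosome, coordinate) pairs
    """
    columns = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]
    lines = ["##fileformat=VCFv4.2", "\t".join(columns)]
    for chrom, pos in positions:
        # "." is the placeholder for the six fields that carry no information
        lines.append("\t".join([chrom, str(pos)] + ["."] * 6))
    return "\n".join(lines) + "\n"


# Reproduce the four example positions shown in the help text
example = minimal_vcf([("chrI", 1222), ("chrI", 2651), ("chrI", 3659), ("chrI", 3731)])
```

Writing the returned text to a file gives a dataset that can be selected as an *independently generated vcf file*.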

*scenario ii* - you have actual variant calls from an additional sample, but you do not have access to the original sequenced reads (if you had, the recommended approach would be to align these reads along with your other sequencing data or, at least, to perform the *Variant Calling* step together).

This situation is often encountered with published datasets. Assume you have obtained a list of known single nucleotide variants (SNVs) found in one particular strain of your favorite model organism and you would like to know which of these SNVs are present in the related strains you have sequenced. You have aligned the sequenced reads from your samples and have used the *Variant Calling* tool, which has generated a BCF file ready for variant extraction. If the SNV list for the previously sequenced strain is in VCF format already, you can now just plug it into the analysis process by specifying it in the tool interface as an *independently generated vcf file*. The resulting VCF output file will contain all SNV sites along with the variant sites found in the BCF alone. You can then proceed to the *VCF Filter* tool to look at the original SNV sites only or to investigate any other interesting subset of sites.

If the SNV list is in some other format, you will have to convert it to VCF first. At a minimum, the file must have a ``##fileformat`` header line as in the previous example and have the ``REF`` and ``ALT`` columns filled in as follows::

  #CHROM POS ID REF ALT QUAL FILTER INFO
  chrI 1897409 . A G . . .
  chrI 1897492 . C T . . .
  chrI 1897616 . C A . . .
  chrI 1897987 . A T . . .
  chrI 1898185 . C T . . .
  chrI 1898715 . G A . . .
  chrI 1898729 . T C . . .
  chrI 1900288 . T A . . .

In this case, the tool will assume that the corresponding sample is homozygous for each of the SNVs. If you need to distinguish between homozygous and heterozygous SNVs, you will have to extend the format to include a FORMAT and a sample column with genotype (GT) information, as in this example::

  #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sampleX
  chrI 1897409 . A G . . . GT 1/1
  chrI 1897492 . C T . . . GT 0/1
  chrI 1897616 . C A . . . GT 0/1
  chrI 1897987 . A T . . . GT 0/1
  chrI 1898185 . C T . . . GT 0/1
  chrI 1898715 . G A . . . GT 0/1
  chrI 1898729 . T C . . . GT 0/1
  chrI 1900288 . T A . . . GT 0/1

Here, sampleX would be heterozygous for all SNVs except the first.
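
Converting an SNV list from another format usually comes down to rearranging columns. The sketch below assumes the list has already been parsed into (chromosome, coordinate, reference base, alternate base, genotype) tuples; the function ``snv_vcf`` and its parameters are hypothetical and only illustrate the two layouts described for scenario ii.

```python
def snv_vcf(snvs, sample=None):
    """Return VCF text for a list of SNVs.

    snvs: iterable of (chromosome, coordinate, ref, alt, genotype) tuples;
          the genotype is only written when a sample name is given
    sample: optional sample name; adds FORMAT and sample columns with GT values
    """
    columns = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]
    if sample is not None:
        columns += ["FORMAT", sample]
    lines = ["##fileformat=VCFv4.2", "\t".join(columns)]
    for chrom, pos, ref, alt, genotype in snvs:
        fields = [chrom, str(pos), ".", ref, alt, ".", ".", "."]
        if sample is not None:
            # Explicit genotype, e.g. "1/1" (homozygous) or "0/1" (heterozygous)
            fields += ["GT", genotype]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


# First two SNVs of the example, once with and once without genotype columns
calls = [("chrI", 1897409, "A", "G", "1/1"), ("chrI", 1897492, "C", "T", "0/1")]
with_gt = snv_vcf(calls, sample="sampleX")
without_gt = snv_vcf(calls)
```

Leaving out the sample name produces the shorter layout, for which the tool assumes homozygous SNVs as described for scenario ii.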

.. class:: warningmark

If the optional VCF input contains INDEL calls, these will be ignored by the tool.


  </help>
</tool>