view TV-vs-background.xml @ 4:58815aed4ec3 draft default tip
few bugfixes in VCF-2-variantlist
author | saskia-hiltemann |
---|---|
date | Wed, 04 Nov 2015 05:06:12 -0500 |
parents | 885ba15c2564 |
<tool id="t-vs-vnormal" name="Virtual Normal Correction SmallVars" version="1.7">

    <description>Filter small variants based on presence in Virtual Normal set</description>

    <requirements>
        <requirement type="package" version="1.7">cgatools</requirement>
    </requirements>

    <command interpreter="bash">
        TV-vs-background.sh
            --variants $variants
            --reference ${reference.fields.reference_crr_cgatools}
            --VN_varfiles "${reference.fields.VN_genomes_varfiles_list}${VNset}"
            --threshold $threshold
            --thresholdhc $thresholdhc
            --outputfile_all $output_all
            --outputfile_filtered $output_filtered
    </command>

    <inputs>
        <param name="variants" type="data" format="tabular" label="List of variants as produced by the ListVariants program or the VCF-2-LV conversion program"/>

        <!-- select build -->
        <param name="reference" type="select" label="Select Build">
            <options from_data_table="virtual_normal_correction" />
        </param>

        <!-- Edit these options to reflect the sets of normals you have available.
             The values must name files within the directories specified in the data_table_conf.xml file. -->
        <param name="VNset" type="select" label="Select Virtual Normal set to use" help="The 1000 Genomes sets can only be used for hg19 samples; for hg18, the 54-genome set will be used.">
            <option value="46_diversity.txt">CG Diversity Panel and trios (54 Genomes)</option>
            <option value="433_1000g.txt">CG 1000G project genomes (433 Genomes) (hg19 only)</option>
            <option value="479_diversity_1000g.txt">Diversity and 1000G (479 Genomes) (hg19 only)</option>
            <option value="10_tutorial.txt">Small VN for tutorial (10 Genomes)</option>
        </param>

        <param name="threshold" type="text" value="1" label="Filter out variants present in at least this number of the virtual normal genomes"/>
        <param name="thresholdhc" type="text" value="10" label="High Confidence Threshold: Label a somatic variant as high-confidence if the locus was fully called in at least this many normal genomes" help="Please adjust according to the number of normals used and the desired stringency."/>
        <param name="fname" type="text" value="" label="Prefix for your output file" help="Optional. For example, the sample name."/>
    </inputs>

    <outputs>
        <data format="tabular" name="output_all" label="All variants for ${tool.name} on ${on_string}"/>
        <data format="tabular" name="output_filtered" label="Filtered variants for ${tool.name} on ${on_string}"/>
        <data format="tabular" name="output_filtered_highconf" label="${fname} High Confidence Filtered variants for ${tool.name} on ${on_string}" from_work_dir="output_filtered_highconf.tsv"/>
    </outputs>

    <help>
**What it does**

This tool compares a list of variants to a set of normal genomes. Each variant is annotated with the number of normal samples it appears in.
The tool also reports how often the variant was found on one allele (01) or on both alleles (11), and distinguishes between the variant being absent from a normal (00), the location being no-called in a normal (NN), and the location being half-called (0N, 1N).

Running the tool may take quite some time, depending on the number of input variants and the number of normal genomes.

**Input Files**

This program takes as input a list of variants as produced by the ListVariants tool or by the vcf-to-LV preprocessing tool.

Input must be a tab-separated file of the following format::

    variantID - chromosome - begin - end - varType - reference - alleleSeq - xRef
    1034  chr1  972803  972804  snp  T  C  dbsnp:rs31238120

Valid entries in the varType column are: snp, sub, ins, del.

Chromosome coordinates must be zero-based, half-open.

Column names must match the ones given above.

**Output Files**

1) The original input file, annotated with presence (or lack thereof) in the background genomes.

2) Filtered version of output 1. Variants are removed when present in at least *threshold* of the background normal genomes (default: 1). The filter is applied to column 9 of the output file.

3) High-confidence filtered version of output 2. Of all the variants labelled somatic, any variant not fully called in at least *high confidence threshold* normals is filtered out. The filter is applied to column 11 of the output file.

Example output format::

    variantId  chromosome  begin  end    varType  reference  alleleSeq  xRef                 VN_occurrences  VN_frequency  VN_fullycalled_count  VN_fullycalled_frequency  VN_00  VN_01  VN_11  VN_0N  VN_1N  VN_NN  VN_0  VN_1  VN_N
    34         chr1        46661  46662  snp      T          C          dbsnp.100:rs2691309  26              0.472727      33                    0.6                       7      19     7      1      0      20     0     0     0
    35         chr1        46850  46850  ins                 A                               0               0             10                    0.181818                  10     0      0      5      0      39     0     0     0
    36         chr1        46895  46896  snp      T          C          dbsnp.100:rs2691311  8               0.145455      40                    0.727273                  33     7      0      2      1      11     0     0     0
    37         chr1        46926  46927  snp      G          A          dbsnp.100:rs2548884  7               0.127273      43                    0.781818                  36     7      0      2      0      9      0     0     0
    </help>

</tool>