comparison TV-vs-background.xml @ 0:1209f18a5a83 draft
Uploaded
author | saskia-hiltemann |
---|---|
date | Mon, 03 Aug 2015 05:01:15 -0400 |
parents | |
children | 885ba15c2564 |
-1:000000000000 | 0:1209f18a5a83 |
---|---|
1 <tool id="t-vs-vnormal" name="Virtual Normal Correction SmallVars" version="1.6"> | |
2 <description> Filter small variants based on presence in Virtual Normal set </description> | |
3 | |
4 <requirements> | |
5 <requirement type="package" version="1.7">cgatools</requirement> | |
6 </requirements> | |
7 | |
8 <command interpreter="bash"> | |
9 TV-vs-background.sh | |
10 --variants $variants | |
11 --reference ${reference.fields.reference_crr_cgatools} | |
12 #if $virtnorm.VNset == "diversity": | |
13 --VN_varfiles ${reference.fields.VN_genomes_varfiles_list} | |
14 #else | |
15 --VN_varfiles ${reference.fields.VN_genomes_varfiles_list_1000G} | |
16 #end if | |
17 --threshold $threshold | |
18 --thresholdhc $thresholdhc | |
19 --outputfile_all $output_all | |
20 --outputfile_filtered $output_filtered | |
21 </command> | |
22 | |
23 <inputs> | |
24 <param name="variants" type="data" format="tabular" label="List of variants as produced by the ListVariants program or the VCF-2-LV conversion program"/> | |
25 <!--select build--> | |
26 <param name="reference" type="select" label="Select Build"> | |
27 <options from_data_table="virtual_normal_correction"> | |
28 <filter type="data_meta" ref="variants" key="dbkey" column="0" /> </options> | |
29 </param> | |
30 <conditional name="virtnorm" > | |
31 <param name="VNset" type="select" label="Select Virtual Normal set to use" help="The 1000 Genomes set can only be used for hg19 samples; for hg18 samples, the 54-genome set will be used."> | |
32 <option value="diversity" > CG Diversity Panel and trios (54 Genomes) </option> | |
33 <option value="thousand" > CG 1000G project genomes (433 Genomes) (hg19 only) </option> | |
34 </param> | |
35 </conditional> | |
36 | |
37 <param name="threshold" type="text" value="1" label="Threshold: Filter variants if present in at least this number of the background genomes"/> | |
38 <param name="thresholdhc" type="text" value="10" label="High Confidence Threshold: Label a somatic variant as high-confidence if the locus was fully called in at least this many normal genomes" help="Please adjust according to the number of normals used and the desired stringency."/> | |
39 <param name="fname" type="text" value="" label="Prefix for your output file" help="Optional. For example sample name."/> | |
40 <!--<param name="debug" type="select" label="Individual-level annotations?" help="Get one column per normal sample indicating whether the variant was present (only available for fully public normal samples)"> | |
41 <option value="N" > No </option> | |
42 <option value="Y" > Yes </option> | |
43 </param> | |
44 --> | |
45 </inputs> | |
46 | |
47 <outputs> | |
48 <data format="tabular" name="output_all" label="${fname} All variants for ${tool.name} on ${on_string}"/> | |
49 <data format="tabular" name="output_filtered" label="${fname} Filtered variants for ${tool.name} on ${on_string}"/> | |
50 <data format="tabular" name="output_filtered_highconf" label="${fname} High Confidence Filtered variants for ${tool.name} on ${on_string}" from_work_dir="output_filtered_highconf.tsv"/> | |
51 <!--<data format="tabular" name="output_filtered" label="${fname} Filtered variants for ${tool.name} on ${on_string}"/> | |
52 <data format="tabular" name="output_expanded" from_work_dir="output_expanded" label="${fname} expanded annotation for ${tool.name} on ${on_string}"> | |
53 <filter> $debug == "Y" </filter> | |
54 </data> | |
55 --> | |
56 </outputs> | |
57 | |
58 <help> | |
59 **What it does** | |
60 | |
61 This tool compares a list of variants to a set of normal genomes. Each variant is annotated with the number of normal samples in which it appears. | |
62 The tool also reports how often the variant was found on one or both alleles (01 or 11), and distinguishes a variant that is absent from a normal (00) | |
63 from a locus that is no-called (NN) or half-called (0N, 1N) in that normal. | |
64 | |
65 This may take quite some time depending on the number of input variants and the number of normal genomes. | |
66 | |
67 **Input Files** | |
68 | |
69 This program takes as input a list of variants as produced by the ListVariants tool, or the vcf-to-LV preprocessing tool. Input must be a tab-separated file of the following format:: | |
70 | |
71 variantId chromosome begin end varType reference alleleSeq xRef | |
72 1034 chr1 972803 972804 snp T C dbsnp:rs31238120 | |
73 | |
74 Valid entries in the varType column are: snp, sub, ins, del. | |
75 | |
76 Chromosome coordinates must be zero-based half-open. | |
77 | |
78 Column names must match the ones given above. | |
79 | |
80 | |
81 **Output Files** | |
82 | |
83 1) Original input file annotated with presence (or lack thereof) in background genomes | |
84 | |
85 2) Filtered version of output 1. Variants are removed when present in at least *threshold* of the background normal genomes (default: 1). The filter is applied to column 9 of the output file (VN_occurrences). | |
86 | |
87 3) High-confidence filtered version of output 2. Of the variants labelled somatic, those not fully called in at least *high confidence threshold* normal genomes (default: 10) are removed. The filter is applied to column 11 of the output file (VN_fullycalled_count). | |
88 | |
89 Example output format:: | |
90 | |
91 variantId  chromosome  begin  end    varType  reference  alleleSeq  xRef                 VN_occurrences  VN_frequency  VN_fullycalled_count  VN_fullycalled_frequency  VN_00  VN_01  VN_11  VN_0N  VN_1N  VN_NN  VN_0  VN_1  VN_N | |
92 34         chr1        46661  46662  snp      T          C          dbsnp.100:rs2691309  26              0.472727      33                    0.6                       7      19     7      1      0      20     0     0     0 | |
93 35         chr1        46850  46850  ins                 A                               0               0             10                    0.181818                  10     0      0      5      0      39     0     0     0 | |
94 36         chr1        46895  46896  snp      T          C          dbsnp.100:rs2691311  8               0.145455      40                    0.727273                  33     7      0      2      1      11     0     0     0 | |
95 37         chr1        46926  46927  snp      G          A          dbsnp.100:rs2548884  7               0.127273      43                    0.781818                  36     7      0      2      0      9      0     0     0 | |
96 | |
97 </help> | |
98 | |
99 </tool> | |
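The input requirements stated in the help text (tab-separated columns with fixed names, a restricted set of varType values, zero-based half-open coordinates) can be checked before a job is submitted. The following is a minimal sketch, not part of the tool: the function name `check_variant_list` is hypothetical, the column names follow the header shown in the example output (`variantId`), and the coordinate check (`0 <= begin <= end`) is only an assumption about what a valid zero-based half-open interval should satisfy.

```python
import csv
import io

# Column names as they appear in the example output header of the help text.
REQUIRED_COLUMNS = ["variantId", "chromosome", "begin", "end",
                    "varType", "reference", "alleleSeq", "xRef"]
VALID_VARTYPES = {"snp", "sub", "ins", "del"}


def check_variant_list(text):
    """Return a list of problems found in a tab-separated variant list.

    Hypothetical helper: it mirrors the rules stated in the help text and
    does not reproduce what TV-vs-background.sh itself checks.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    problems = []
    for row in reader:
        vid = row["variantId"]
        if row["varType"] not in VALID_VARTYPES:
            problems.append(f"{vid}: unknown varType {row['varType']!r}")
        try:
            begin, end = int(row["begin"]), int(row["end"])
        except (TypeError, ValueError):
            problems.append(f"{vid}: begin/end are not integers")
            continue
        # Assumed sanity check for a zero-based half-open interval.
        if begin < 0 or end < begin:
            problems.append(f"{vid}: invalid interval [{begin}, {end})")
    return problems


example = (
    "variantId\tchromosome\tbegin\tend\tvarType\treference\talleleSeq\txRef\n"
    "1034\tchr1\t972803\t972804\tsnp\tT\tC\tdbsnp:rs31238120\n"
)
print(check_variant_list(example))  # prints []
```

An empty list means the file passed these checks; it does not guarantee that the wrapped script will accept it.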
100 | |
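The relationship between the three outputs described in the help text can be sketched as two successive filters. This is an illustration under stated assumptions, not the logic of TV-vs-background.sh: the function name `filter_variants` is hypothetical, the comparisons (`<` for column 9, `>=` for column 11) are read off the wording "present in at least" and "fully called in at least", and the example table is trimmed to the three columns the filters need, with values taken from the example output in the help text.

```python
import csv
import io


def filter_variants(text, threshold=1, thresholdhc=10):
    """Split an annotated variant table into the three described outputs.

    Hypothetical helper; the defaults mirror the tool's form defaults.
    Returns (all_rows, filtered, highconf) as lists of dicts.
    """
    all_rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    # Output 2: drop variants seen in at least `threshold` normals.
    filtered = [r for r in all_rows
                if int(r["VN_occurrences"]) < threshold]
    # Output 3: keep only variants whose locus was fully called in at
    # least `thresholdhc` normals.
    highconf = [r for r in filtered
                if int(r["VN_fullycalled_count"]) >= thresholdhc]
    return all_rows, filtered, highconf


# Trimmed version of the example output (only the columns used here).
example = (
    "variantId\tVN_occurrences\tVN_fullycalled_count\n"
    "34\t26\t33\n"
    "35\t0\t10\n"
    "36\t8\t40\n"
    "37\t7\t43\n"
)
all_rows, filtered, highconf = filter_variants(example)
print([r["variantId"] for r in filtered])  # prints ['35']
print([r["variantId"] for r in highconf])  # prints ['35']
```

With the defaults, only variant 35 survives both filters: it occurs in no normal genome and its locus was fully called in exactly 10 of them. Raising `thresholdhc` to 11 would remove it from the high-confidence output.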
101 |
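The genotype count columns in the example output appear to be related to the summary columns. In the four example rows, VN_occurrences equals VN_01 + VN_11 + VN_1N and VN_fullycalled_count equals VN_00 + VN_01 + VN_11. Whether the script defines the columns this way is an assumption drawn only from those rows; the sketch below just verifies the pattern on them. Row 35 is read with empty reference and xRef fields, and the helper name `is_consistent` is hypothetical.

```python
# Summary and genotype counts copied from the four example rows.
EXAMPLE_ROWS = [
    {"variantId": 34, "VN_occurrences": 26, "VN_fullycalled_count": 33,
     "VN_00": 7, "VN_01": 19, "VN_11": 7, "VN_0N": 1, "VN_1N": 0, "VN_NN": 20},
    {"variantId": 35, "VN_occurrences": 0, "VN_fullycalled_count": 10,
     "VN_00": 10, "VN_01": 0, "VN_11": 0, "VN_0N": 5, "VN_1N": 0, "VN_NN": 39},
    {"variantId": 36, "VN_occurrences": 8, "VN_fullycalled_count": 40,
     "VN_00": 33, "VN_01": 7, "VN_11": 0, "VN_0N": 2, "VN_1N": 1, "VN_NN": 11},
    {"variantId": 37, "VN_occurrences": 7, "VN_fullycalled_count": 43,
     "VN_00": 36, "VN_01": 7, "VN_11": 0, "VN_0N": 2, "VN_1N": 0, "VN_NN": 9},
]


def is_consistent(row):
    """Check the assumed relationships between summary and genotype counts."""
    # Assumed: a variant "occurs" in a normal if at least one allele carries it.
    occurrences = row["VN_01"] + row["VN_11"] + row["VN_1N"]
    # Assumed: "fully called" means neither allele is no-called.
    fully_called = row["VN_00"] + row["VN_01"] + row["VN_11"]
    return (row["VN_occurrences"] == occurrences
            and row["VN_fullycalled_count"] == fully_called)


for row in EXAMPLE_ROWS:
    print(row["variantId"], is_consistent(row))
```

The haploid columns (VN_0, VN_1, VN_N) are zero in every example row, so the sketch leaves them out; how they enter the summary counts cannot be inferred from this data.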