comparison mutspecAnnot.xml @ 0:8c682b3a7c5b draft

Uploaded
author iarc
date Tue, 19 Apr 2016 03:07:11 -0400
parents
children 748b7a8b634c
-1:000000000000 0:8c682b3a7c5b
1 <tool id="mutSpecannot" name="MutSpec Annot" version="0.1" hidden="false">
2 <description>Annotate variants with ANNOVAR and other databases</description>
3
4 <requirements>
5 <requirement type="set_environment">SCRIPT_PATH</requirement>
6 <requirement type="package" version="5.18.1">perl</requirement>
7 </requirements>
8
9 <command interpreter="bash">
10 mutspecAnnot_wrapper.sh
11 $output
12 --refGenome ${refGenome}
13 --AVDB ${refGenome.fields.path}
14 --interval $interval
15 --fullAnnotation ${annotation_type}
16 $input
17 </command>
18
19 <inputs>
20 <param name="input" type="data" format="txt" label="Input file" help="Select a single file, multiple files or a dataset collection"/>
21
22 <param name="refGenome" type="select" label="Reference genome" help="Select the reference genome that was used for generating your data">
23 <options from_data_table="annovar_index" />
24 </param>
25
26 <param name="interval" type="text" value="10" label="Sequence context of variants" help="Number of retrieved bases that flank variants in 5' and 3'"/>
27
28 <param name="annotation_type" type="boolean" checked="true" truevalue="yes" falsevalue="no" label="Complete annotations" help="Select No if you have a file with millions of variants and you are just interested in having a quick overview of the mutational spectrum. Only the annotation from refGene, the strand orientation and the sequence context will be added." />
29
30 </inputs>
31
32 <outputs>
33 <data name="output" type="data" format="tabular" label="${input.name} annotated" />
34 </outputs>
35
36
37 <stdio>
38 <regex match="ANNOVAR LOG FILE"
39 source="stdout"
40 level="fatal"
41 description="Read Annovar log file for more information" />
42 </stdio>
43
44 <help>
45
46 **What it does**
47
48 MutSpec-Annot provides functional annotations from the `ANNOVAR software`__ (the June 2015 version is provided here), together with the transcript strand orientation (from the refGene database) and the sequence context of each variant (extracted from the selected reference genome).
49
50 .. __: http://www.openbioinformatics.org/annovar/
51
52 --------------------------------------------------------------------------------------------------------------------------------------------------
53
54 **Input formats**
55
56 MutSpec-Annot accepts files in VCF (version 4.1) or tab-delimited (TAB) format.
57
58 .. class:: infomark
59
60 TIP: If your data is not TAB delimited, use *Text manipulation -> convert*
61
62 .. class:: warningmark
63
64 Filenames must be 31 characters or fewer.
65
66 .. class:: warningmark
67
68 These files should contain at least four columns that describe, for each variant, the chromosome number, the genomic start position, the reference allele and the alternate allele.
69
70 .. class:: warningmark
71
72 The tool supports different column names (**names are case-sensitive**) depending on the source file as follows:
73
74 **mutect** : contig position ref_allele alt_allele
75
76 **vcf** : CHROM POS REF ALT
77
78 **cosmic** : Mutation_GRCh37_chromosome_number Mutation_GRCh37_genome_position Description_Ref_Genomic Description_Alt_Genomic
79
80 **icgc** : chromosome chromosome_start reference_genome_allele mutated_to_allele
81
82 **tcga** : Chromosome Start_position Reference_Allele Tumor_Seq_Allele2
83
84 **ionTorrent** : chr Position Ref Alt
85
86 **proton** : Chrom Position Ref Variant
87
88 **varScan2** : Chrom Position Ref VarAllele
89
90 **annovar** : Chr Start Ref Obs
91
92 **custom** : Chromosome Start Wild_Type Mutant
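
The column-name sets above can be used to guess which source a file comes from. The following is a minimal sketch in Python, not the logic of the actual wrapper script: the names ``COLUMN_SETS`` and ``detect_format`` are hypothetical, and the sketch assumes a tab-delimited header line whose leading ``#`` (as in VCF) can be ignored.

```python
# Column names per source format, copied from the list above (case-sensitive).
COLUMN_SETS = {
    "mutect": ("contig", "position", "ref_allele", "alt_allele"),
    "vcf": ("CHROM", "POS", "REF", "ALT"),
    "cosmic": (
        "Mutation_GRCh37_chromosome_number",
        "Mutation_GRCh37_genome_position",
        "Description_Ref_Genomic",
        "Description_Alt_Genomic",
    ),
    "icgc": ("chromosome", "chromosome_start",
             "reference_genome_allele", "mutated_to_allele"),
    "tcga": ("Chromosome", "Start_position",
             "Reference_Allele", "Tumor_Seq_Allele2"),
    "ionTorrent": ("chr", "Position", "Ref", "Alt"),
    "proton": ("Chrom", "Position", "Ref", "Variant"),
    "varScan2": ("Chrom", "Position", "Ref", "VarAllele"),
    "annovar": ("Chr", "Start", "Ref", "Obs"),
    "custom": ("Chromosome", "Start", "Wild_Type", "Mutant"),
}


def detect_format(header_line):
    """Return the first format whose four columns all appear in the header, else None."""
    # Hypothetical helper: strip the newline and a leading '#', then split on tabs.
    names = set(header_line.rstrip("\n").lstrip("#").split("\t"))
    for fmt, columns in COLUMN_SETS.items():
        if all(column in names for column in columns):
            return fmt
    return None
```

Because the comparison is an exact string match, a header such as ``Position`` will not match ``position``, which mirrors the case-sensitivity warning above.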
93
94 .. class:: infomark
95
96 For MuTect output files, only confident calls are considered: variants containing the string REJECT in the judgement column are not annotated and are excluded from the MutSpec-Annot output, because these calls are very likely to be dubious or artefacts.
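
As an illustration of the MuTect rule, the sketch below drops every row whose judgement value contains REJECT. It is an assumption-laden example rather than the tool's own code: ``keep_confident_calls`` is a hypothetical name, and the input is assumed to be tab-delimited text with a header row that includes a ``judgement`` column.

```python
import csv
import io


def keep_confident_calls(tsv_text, column="judgement"):
    """Return the rows whose `column` value does not contain the string REJECT."""
    # Hypothetical helper: parse the text as a tab-delimited table with a header row.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if "REJECT" not in row[column]]
```

Each returned row is a dictionary keyed by column name, so the remaining variants can be passed on or written back out with ``csv.DictWriter``.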
97
98 .. class:: infomark
99
100 For COSMIC and ICGC files, variants are reported on several transcripts. These duplicate variants need to be removed before annotating the file.
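
One possible way to remove such duplicates before annotation is to keep only the first row for each combination of chromosome, position, reference allele and alternate allele. This is a minimal sketch under that assumption: ``drop_duplicate_variants`` is a hypothetical helper, and rows are assumed to be dictionaries keyed by column name.

```python
def drop_duplicate_variants(rows, key_columns):
    """Keep the first row seen for each combination of values in key_columns."""
    seen = set()
    unique = []
    for row in rows:
        # The key identifies a variant regardless of the transcript it is reported on.
        key = tuple(row[column] for column in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

For an ICGC file, ``key_columns`` would be the four ICGC column names listed above; any transcript-specific columns are simply taken from the first occurrence.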
101
102 .. class:: warningmark
103
104 If multiple input files are specified, they should all be from the **same genome build**.
105
106
107 --------------------------------------------------------------------------------------------------------------------------------------------------
108
109 **Output**
110
111 The output is a tabular text file that contains the retrieved annotations in the first columns, followed by all columns from the original file.
112
113 .. class:: infomark
114
115 Variants on chromosome M and on random chromosomes are not annotated and are excluded from the MutSpec-Annot output.
116
117 The following annotations are retrieved:
118
119 **ANNOVAR annotations**
120
121 Examples of the annotations retrieved by the tool:
122
123 Gene-based: RefSeqGene, UCSC Known Gene and Ensembl Gene
124
125 Region-based: location of the variant on a cytogenetic band (cytoBand), variants reported in genome-wide association studies (gwasCatalog) and variants mapped to segmental duplications (genomicSuperDups)
126
127 Filter-based:
128
129 - dbSNP: For the human genome, two versions are available: the default version (snp) and a pre-filtered version (snpNonFlagged). The pre-filtered version omits the flagged SNPs, that is, SNPs with a minor allele frequency (MAF) below 1% (or unknown) that map only once to the reference assembly and are flagged in dbSNP as clinically associated.
130
131 - 1000 Genomes Project (ALL, AFR (African), AMR (Admixed American), EAS (East Asian), EUR (European), SAS (South Asian))
132
133 - ESP: Exome Sequencing Project (ALL, AA (African American), EA (European American))
134
135 - ExAC: Exome Aggregation Consortium (ALL, AFR (African), AMR (Admixed American), EAS (East Asian), FIN (Finnish), NFE (Non-Finnish European), OTH (other), SAS (South Asian))
136
137 - LJB26: SIFT, PolyPhen-2 (HDIV and HVAR)
138
139 **Transcript orientation**
140
141 The strand annotation corresponding to the transcript orientation within genic regions is retrieved from the RefSeqGene database.
142
143 **Sequence context**
144
145 Bases flanking the variant position on both the 5' and 3' sides, retrieved from the selected reference genome.
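
The idea of a sequence context can be sketched for a single-base variant as follows. This is only an illustration: ``sequence_context`` is a hypothetical function, the chromosome sequence is assumed to be available as a plain string, positions are assumed to be 1-based, and the way the tool itself handles indels or strand is not covered.

```python
def sequence_context(sequence, position, flank):
    """Return the base at a 1-based position with `flank` bases on each side."""
    # Convert to 0-based slice bounds and clamp at the start of the sequence.
    start = max(0, position - 1 - flank)
    end = position + flank
    # Slicing past the end of a string is safe in Python, so no upper clamp is needed.
    return sequence[start:end]
```

With ``flank`` set to the value of the sequence context parameter, the returned string has up to ``2 * flank + 1`` bases, and fewer when the variant lies near either end of the sequence.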
146
147 --------------------------------------------------------------------------------------------------------------------------------------------------
148
149 **Example**
150
151 Annotate the following file::
152
153 Chromosome Start_Position End_Position Reference_Allele Tumor_Seq_Allele2
154 chr7 121717919 121717920 - G
155 chr1 230846235 230846235 T A
156 chr14 33290999 33290999 A G
157 chr12 8082458 8082458 C T
158 chr4 70156391 70156391 T C
159
160 Will produce::
161
162 Chr Start End Ref Alt Func.refGene Gene.refGene ExonicFunc.refGene AAChange.refGene genomicSuperDups snp138 1000g2014oct_all esp6500si_all Strand context Chromosome Start_Position End_Position Reference_Allele Tumor_Seq_Allele2
163 chr7 121717919 121717920 - G exonic AASS frameshift insertion AASS:NM_005763:exon23:c.2634dupC:p.A879fs NA rs147476318 NA NA - GCG chr7 121717919 121717920 - G
164 chr1 230846235 230846235 T A exonic AGT nonsynonymous SNV AGT:NM_000029:exon2:c.A362T:p.H121L NA NA NA NA - GTG chr1 230846235 230846235 T A
165 chr14 33290999 33290999 A G exonic AKAP6 nonsynonymous SNV AKAP6:NM_004274:exon13:c.A3980G:p.D1327G NA NA NA NA + GAC chr14 33290999 33290999 A G
166 chr12 8082458 8082458 C T exonic SLC2A3 nonsynonymous SNV SLC2A3:NM_006931:exon6:c.G683A:p.R228Q NA rs200481428 0.000199681 NA - CCG chr12 8082458 8082458 C T
167 chr4 70156391 70156391 T C exonic UGT2B28 nonsynonymous SNV UGT2B28:NM_053039:exon5:c.T1172C:p.V391A score=0.949699;Name=chr4:70035680 NA 0.000199681 NA + GTA chr4 70156391 70156391 T C
168
169
170 </help>
171
172 </tool>