annotate mutspecAnnot.xml @ 0:8c682b3a7c5b draft

Uploaded
author iarc
date Tue, 19 Apr 2016 03:07:11 -0400
parents
children 748b7a8b634c
rev   line source
0
8c682b3a7c5b Uploaded
iarc
parents:
diff changeset
1 <tool id="mutSpecannot" name="MutSpec Annot" version="0.1" hidden="false">
2 <description>Annotate variants with ANNOVAR and other databases</description>
3
4 <requirements>
5 <requirement type="set_environment">SCRIPT_PATH</requirement>
6 <requirement type="package" version="5.18.1">perl</requirement>
7 </requirements>
8
9 <command interpreter="bash">
10 mutspecAnnot_wrapper.sh
11 $output
12 --refGenome ${refGenome}
13 --AVDB ${refGenome.fields.path}
14 --interval $interval
15 --fullAnnotation ${annotation_type}
16 $input
17 </command>
18
19 <inputs>
20 <param name="input" type="data" format="txt" label="Input file" help="Select a single file, multiple files or a dataset collection"/>
21
22 <param name="refGenome" type="select" label="Reference genome" help="Select the reference genome that was used for generating your data">
23 <options from_data_table="annovar_index" />
24 </param>
25
26 <param name="interval" type="text" value="10" label="Sequence context of variants" help="Number of retrieved bases that flank variants in 5' and 3'"/>
27
28 <param name="annotation_type" type="boolean" checked="true" truevalue="yes" falsevalue="no" label="Complete annotations" help="Select No if you have a file with millions of variants and you are just interested in having a quick overview of the mutational spectrum. Only the annotation from refGene, the strand orientation and the sequence context will be added." />
29
30 </inputs>
31
32 <outputs>
33 <data name="output" type="data" format="tabular" label="${input.name} annotated" />
34 </outputs>
35
36
37 <stdio>
38 <regex match="ANNOVAR LOG FILE"
39 source="stdout"
40 level="fatal"
41 description="Read Annovar log file for more information" />
42 </stdio>
43
44 <help>
45
46 **What it does**
47
48 MutSpec-Annot provides functional annotations from `ANNOVAR software`__ (the June 2015 version is provided here), as well as the transcript strand orientation (from the refGene database) and the sequence context of variants (extracted from the selected reference genome).
49
50 .. __: http://www.openbioinformatics.org/annovar/
51
52 --------------------------------------------------------------------------------------------------------------------------------------------------
53
54 **Input formats**
55
56 MutSpec-Annot accepts files in VCF (version 4.1) or tab-delimited (TAB) format.
57
58 .. class:: infomark
59
60 TIP: If your data is not TAB delimited, use *Text manipulation -> convert*
61
62 .. class:: warningmark
63
64 Filenames must be &#60;= 31 characters.
65
66 .. class:: warningmark
67
68 These files should contain at least four columns describing, for each variant, the chromosome number, the genomic start position, the reference allele and the alternate allele.
69
70 .. class:: warningmark
71
72 The tool supports different column names (**names are case-sensitive**) depending on the source file as follows:
73
74 **mutect** : contig position ref_allele alt_allele
75
76 **vcf** : CHROM POS REF ALT
77
78 **cosmic** : Mutation_GRCh37_chromosome_number Mutation_GRCh37_genome_position Description_Ref_Genomic Description_Alt_Genomic
79
80 **icgc** : chromosome chromosome_start reference_genome_allele mutated_to_allele
81
82 **tcga** : Chromosome Start_position Reference_Allele Tumor_Seq_Allele2
83
84 **ionTorrent** : chr Position Ref Alt
85
86 **proton** : Chrom Position Ref Variant
87
88 **varScan2** : Chrom Position Ref VarAllele
89
90 **annovar** : Chr Start Ref Obs
91
92 **custom** : Chromosome Start Wild_Type Mutant
93
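The column-name conventions listed above can be read as a lookup table. The following Python sketch is illustrative only: the column names are copied from the list above, but the function name ``detect_source`` and the matching strategy are assumptions, not the behaviour of the actual wrapper script.

```python
# Column-name conventions copied from the help text (names are case-sensitive).
# The lookup logic is a hypothetical sketch, not the wrapper's real code.
COLUMN_CONVENTIONS = {
    "mutect": ("contig", "position", "ref_allele", "alt_allele"),
    "vcf": ("CHROM", "POS", "REF", "ALT"),
    "cosmic": (
        "Mutation_GRCh37_chromosome_number",
        "Mutation_GRCh37_genome_position",
        "Description_Ref_Genomic",
        "Description_Alt_Genomic",
    ),
    "icgc": ("chromosome", "chromosome_start",
             "reference_genome_allele", "mutated_to_allele"),
    "tcga": ("Chromosome", "Start_position",
             "Reference_Allele", "Tumor_Seq_Allele2"),
    "ionTorrent": ("chr", "Position", "Ref", "Alt"),
    "proton": ("Chrom", "Position", "Ref", "Variant"),
    "varScan2": ("Chrom", "Position", "Ref", "VarAllele"),
    "annovar": ("Chr", "Start", "Ref", "Obs"),
    "custom": ("Chromosome", "Start", "Wild_Type", "Mutant"),
}


def detect_source(header_line):
    """Return (source, column indexes) for the first convention whose four
    columns are all present in a tab-delimited header, or None."""
    # A VCF header starts with '#CHROM', so the leading '#' is dropped.
    names = header_line.rstrip("\n").lstrip("#").split("\t")
    for source, required in COLUMN_CONVENTIONS.items():
        if all(col in names for col in required):
            return source, [names.index(col) for col in required]
    return None
```

Because the comparison is an exact string match, a header such as ``chrom pos ref alt`` in lower case matches none of the conventions.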
94 .. class:: infomark
95
96 For MuTect output files, only confident calls are considered: variants containing the string REJECT in the judgement column are not annotated and are excluded from the MutSpec-Annot output, as these calls are very likely to be dubious or artefacts.
97
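The MuTect rule above amounts to a simple row filter. This is a minimal Python sketch, assuming rows have already been split into fields; the function name ``keep_confident_calls`` and the ``judgement_index`` parameter are hypothetical and do not come from the tool itself.

```python
def keep_confident_calls(rows, judgement_index):
    """Yield only rows whose judgement field does not contain 'REJECT'.

    rows: iterable of lists of strings (one list per variant).
    judgement_index: position of the judgement column (an assumption of
    this sketch; the real wrapper may locate the column differently).
    """
    for row in rows:
        if "REJECT" not in row[judgement_index]:
            yield row
```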
98 .. class:: infomark
99
100 For COSMIC and ICGC files, variants are reported on several transcripts. These duplicate variants need to be removed before annotating the file.
101
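One possible way to remove such duplicates before annotation is to keep the first row for each combination of chromosome, position, reference allele and alternate allele. The Python sketch below assumes that this four-column key is what defines a duplicate; the function name ``drop_duplicate_variants`` is hypothetical.

```python
def drop_duplicate_variants(rows, key_indexes):
    """Keep the first row seen for each variant key.

    rows: list of lists of strings (one list per variant).
    key_indexes: positions of the chromosome, position, reference allele
    and alternate allele columns (an assumption of this sketch).
    """
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[i] for i in key_indexes)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```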
102 .. class:: warningmark
103
104 If multiple input files are specified, they should be from the **same genome build**.
105
106
107 --------------------------------------------------------------------------------------------------------------------------------------------------
108
109 **Output**
110
111 The output is a tabular text file that contains the retrieved annotations in the first columns and all columns from the original file at the end.
112
113 .. class:: infomark
114
115 Variants on chromosome M and on random chromosomes are not annotated and are excluded from the MutSpec-Annot output.
116
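The exclusion rule above can be sketched as a predicate on the chromosome name. This Python example assumes UCSC-style names (``chrM``, ``chr1_gl000191_random``) with an optional ``chr`` prefix; the function name ``is_annotated_chromosome`` and the exact matching rules are assumptions, not the wrapper's implementation.

```python
def is_annotated_chromosome(chrom):
    """Return False for chromosome M and for '*_random' contigs.

    Assumes UCSC-style naming; other conventions for the mitochondrial
    chromosome are not covered by this sketch.
    """
    name = chrom[3:] if chrom.startswith("chr") else chrom
    return name != "M" and "random" not in name
```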
117 The following annotations are retrieved:
118
119 **ANNOVAR annotations**
120
121 An example of annotations retrieved by the tool.
122
123 Gene-based: RefSeqGene, UCSC Known Gene and Ensembl Gene
124
125 Region-based: localization of the variant on cytogenetic band (cytoBand), variant reported in Genome-Wide association studies (gwasCatalog) and variant mapped to segmental duplications (genomicSuperDups)
126
127 Filter-based:
128
129 - dbSNP: For the human genome, two versions are available: the default version (snp) and a pre-filtered version (snpNonFlagged). In the pre-filtered version, all SNPs with &#60; 1% minor allele frequency (MAF) (or unknown), mapping only once to the reference assembly, or flagged in dbSNP as clinically associated are removed from the full dbSNP database and are therefore not present in this version.
130
131 - 1000 Genomes Project (ALL, AFR (African), AMR (Admixed American), EAS (East Asian), EUR (European), SAS (South Asian))
132
133 - ESP: Exome Sequencing Project (ALL, AA (African American), EA (European American))
134
135 - ExAC: Exome Aggregation Consortium (ALL, AFR (African), AMR (Admixed American), EAS (East Asian), FIN (Finnish), NFE (Non-Finnish European), OTH (Other), SAS (South Asian))
136
137 - LJB26: SIFT, PolyPhen-2 (HDIV and HVAR)
138
139 **Transcript orientation**
140
141 The strand annotation corresponding to transcript orientation within genic regions is retrieved from the RefSeqGene database.
142
143 **Sequence context**
144
145 Bases flanking the variant position on both the 5' and 3' sides, retrieved from the selected reference genome.
146
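Retrieving the sequence context can be pictured as slicing the reference sequence around the variant position. The Python sketch below assumes a 1-based position, an in-memory chromosome sequence, and that the reference base itself is included in the returned string; the function name ``sequence_context`` and its parameters are hypothetical, and the exact output layout of the tool may differ.

```python
def sequence_context(chrom_seq, position, flank):
    """Return the base at a 1-based position with `flank` bases on each side.

    chrom_seq: reference sequence of one chromosome as a string.
    position: 1-based genomic coordinate of the variant.
    flank: number of bases to retrieve on the 5' and on the 3' side.
    The window is clipped at the start and end of the sequence.
    """
    start = max(position - 1 - flank, 0)  # convert to a 0-based slice start
    end = position + flank                # slice end is exclusive
    return chrom_seq[start:end]
```

With a flank of 1, a variant at position 4 of ``ACGTACGT`` gives the three-base context ``GTA``.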
147 --------------------------------------------------------------------------------------------------------------------------------------------------
148
149 **Example**
150
151 Annotate the following file::
152
153 Chromosome Start_Position End_Position Reference_Allele Tumor_Seq_Allele2
154 chr7 121717919 121717920 - G
155 chr1 230846235 230846235 T A
156 chr14 33290999 33290999 A G
157 chr12 8082458 8082458 C T
158 chr4 70156391 70156391 T C
159
160 Will produce::
161
162 Chr Start End Ref Alt Func.refGene Gene.refGene ExonicFunc.refGene AAChange.refGene genomicSuperDups snp138 1000g2014oct_all esp6500si_all Strand context Chromosome Start_Position End_Position Reference_Allele Tumor_Seq_Allele2
163 chr7 121717919 121717920 - G exonic AASS frameshift insertion AASS:NM_005763:exon23:c.2634dupC:p.A879fs NA rs147476318 NA NA - GCG chr7 121717919 121717920 - G
164 chr1 230846235 230846235 T A exonic AGT nonsynonymous SNV AGT:NM_000029:exon2:c.A362T:p.H121L NA NA NA NA - GTG chr1 230846235 230846235 T A
165 chr14 33290999 33290999 A G exonic AKAP6 nonsynonymous SNV AKAP6:NM_004274:exon13:c.A3980G:p.D1327G NA NA NA NA + GAC chr14 33290999 33290999 A G
166 chr12 8082458 8082458 C T exonic SLC2A3 nonsynonymous SNV SLC2A3:NM_006931:exon6:c.G683A:p.R228Q NA rs200481428 0.000199681 NA - CCG chr12 8082458 8082458 C T
167 chr4 70156391 70156391 T C exonic UGT2B28 nonsynonymous SNV UGT2B28:NM_053039:exon5:c.T1172C:p.V391A score=0.949699;Name=chr4:70035680 NA 0.000199681 NA + GTA chr4 70156391 70156391 T C
168
169
170 </help>
171
172 </tool>