Repository 'annovar'
hg clone https://toolshed.g2.bx.psu.edu/repos/saskia-hiltemann/annovar

Changeset 5:4600be69b96f (2015-10-01)
Previous changeset 4:e423536a0780 (2014-04-10) Next changeset 6:a3b16fd125c3 (2015-10-01)
Commit message:
Added databases 1000g2015aug, SPIDEX, avsnp138, avsnp142, exac03
modified:
tools/annovar/annovar.sh
tools/annovar/annovar.xml
added:
tools/README
tools/tool-data/annovar.loc.sample
tools/tool_data_table_conf.xml.sample
tools/tool_dependencies.xml
tools/tools/annovar/annovar.sh
tools/tools/annovar/annovar.xml
diff -r e423536a0780 -r 4600be69b96f tools/README
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/README Thu Oct 01 04:24:45 2015 -0400
@@ -0,0 +1,242 @@
+ANNOVAR needs to be installed manually in the following way:
+
+
+1) If you already have ANNOVAR installed on your system, simply edit the tool-data/annovar.loc file to reflect the locations of
+ the perl scripts (annotate_variation.pl and convert2annovar.pl) and the humandb directory (the directory containing the ANNOVAR database files)
+1b) Restart the Galaxy instance for changes in the .loc file to take effect
+
+
+2) If you do not have ANNOVAR installed, request the ANNOVAR download and sign the license here:
+ http://www.openbioinformatics.org/annovar/annovar_download_form.php
+
+ 3) Once downloaded, install ANNOVAR per its installation instructions and edit the annovar.loc file to reflect the location of the directory containing the perl scripts.
+ The tool uses annotate_variation.pl and convert2annovar.pl
+
+ 4) Then download all desired databases for all desired builds as follows:
+ annotate_variation.pl -downdb -buildver <build> [-webfrom annovar] <database> <humandb>
+
+ where <humandb> is the location where all database files should be stored,
+ <database> is the database file to download, e.g. refGene (see the bottom of this document for all database files available at the time of writing this tool),
+ and <build> can be hg18 or hg19 for humans; other organisms are also available.
+
+ A list of all available databases can be found here: http://www.openbioinformatics.org/annovar/annovar_db.html
+
+ 5) Edit the tool-data/annovar.loc file to reflect the location of the humandb folder
+ 5b) Restart the Galaxy instance for changes in the .loc file to take effect
+
+6) The tool uses cgatools join to combine files; this should be installed automatically with the repository. If not, get a copy from Complete Genomics directly:
+ wget http://sourceforge.net/projects/cgatools/files/1.7.1/cgatools-1.7.1.5-linux_binary-x86_64.tar.gz
+ tar xvzf cgatools-1.7.1.5-linux_binary-x86_64.tar.gz
+
+ and place the "cgatools" binary found in the bin/ directory on your $PATH
+
+
+list of files in my own humandb folder:
+
+hg18_ALL.sites.2012_04.txt
+hg18_ALL.sites.2012_04.txt.idx
+hg18_CEU.sites.2010_07.txt
+hg18_CEU.sites.2010_07.txt.idx
+hg18_JPTCHB.sites.2010_07.txt
+hg18_JPTCHB.sites.2010_07.txt.idx
+hg18_YRI.sites.2010_07.txt
+hg18_YRI.sites.2010_07.txt.idx
+hg18_cg46.txt
+hg18_cg46.txt.idx
+hg18_cg69.txt
+hg18_cg69.txt.idx
+hg18_cytoBand.txt
+hg18_dgvMerged.txt
+hg18_ensGene.txt
+hg18_ensGeneMrna.fa
+hg18_esp5400_aa.txt
+hg18_esp5400_aa.txt.idx
+hg18_esp5400_all.txt
+hg18_esp5400_all.txt.idx
+hg18_esp6500_aa.txt
+hg18_esp6500_aa.txt.idx
+hg18_esp6500_all.txt
+hg18_esp6500_all.txt.idx
+hg18_esp6500_ea.txt
+hg18_esp6500_ea.txt.idx
+hg18_esp6500si_aa.txt
+hg18_esp6500si_aa.txt.idx
+hg18_esp6500si_all.txt
+hg18_esp6500si_all.txt.idx
+hg18_esp6500si_ea.txt
+hg18_esp6500si_ea.txt.idx
+hg18_example_db_generic.txt
+hg18_example_db_gff3.txt
+hg18_genomicSuperDups.txt
+hg18_gerp++gt2.txt
+hg18_gerp++gt2.txt.idx
+hg18_gwasCatalog.txt
+hg18_kgXref.txt
+hg18_knownGene.txt
+hg18_knownGeneMrna.fa
+hg18_ljb2_fathmm.txt
+hg18_ljb2_fathmm.txt.idx
+hg18_ljb2_gerp++.txt
+hg18_ljb2_gerp++.txt.idx
+hg18_ljb2_ma.txt
+hg18_ljb2_ma.txt.idx
+hg18_ljb2_mt.txt
+hg18_ljb2_mt.txt.idx
+hg18_ljb2_phylop.txt
+hg18_ljb2_phylop.txt.idx
+hg18_ljb2_pp2hdiv.txt
+hg18_ljb2_pp2hdiv.txt.idx
+hg18_ljb2_pp2hvar.txt
+hg18_ljb2_pp2hvar.txt.idx
+hg18_ljb2_sift.txt
+hg18_ljb2_sift.txt.idx
+hg18_ljb2_siphy.txt
+hg18_ljb2_siphy.txt.idx
+hg18_phastConsElements44way.txt
+hg18_refGene.txt
+hg18_refGeneMrna.fa
+hg18_refLink.txt
+hg18_snp128.txt
+hg18_snp128.txt.idx
+hg18_snp128NonFlagged.txt
+hg18_snp128NonFlagged.txt.idx
+hg18_snp129.txt
+hg18_snp129.txt.idx
+hg18_snp129NonFlagged.txt
+hg18_snp129NonFlagged.txt.idx
+hg18_snp130.txt
+hg18_snp130.txt.idx
+hg18_snp130NonFlagged.txt
+hg18_snp130NonFlagged.txt.idx
+hg18_snp131.txt
+hg18_snp131.txt.idx
+hg18_snp131NonFlagged.txt
+hg18_snp131NonFlagged.txt.idx
+hg18_snp132.txt
+hg18_snp132.txt.idx
+hg18_snp132NonFlagged.txt
+hg18_snp132NonFlagged.txt.idx
+hg18_tfbsConsSites.txt
+hg19_AFR.sites.2012_04.txt
+hg19_AFR.sites.2012_04.txt.idx
+hg19_ALL.sites.2010_11.txt
+hg19_ALL.sites.2010_11.txt.idx
+hg19_ALL.sites.2012_02.txt
+hg19_ALL.sites.2012_02.txt.idx
+hg19_ALL.sites.2012_04.txt
+hg19_ALL.sites.2012_04.txt.idx
+hg19_AMR.sites.2012_04.txt
+hg19_AMR.sites.2012_04.txt.idx
+hg19_ASN.sites.2012_04.txt
+hg19_ASN.sites.2012_04.txt.idx
+hg19_EUR.sites.2012_04.txt
+hg19_EUR.sites.2012_04.txt.idx
+hg19_avsift.txt
+hg19_avsift.txt.idx
+hg19_cg46.txt
+hg19_cg46.txt.idx
+hg19_cg69.txt
+hg19_cg69.txt.idx
+hg19_clinvar_20131105.txt
+hg19_clinvar_20131105.txt.idx
+hg19_cosmic61.txt
+hg19_cosmic61.txt.idx
+hg19_cosmic63.txt
+hg19_cosmic63.txt.idx
+hg19_cosmic64.txt
+hg19_cosmic64.txt.idx
+hg19_cosmic65.txt
+hg19_cosmic65.txt.idx
+hg19_cosmic67.txt
+hg19_cytoBand.txt
+hg19_dgvMerged.txt
+hg19_ensGene.txt
+hg19_ensGeneMrna.fa
+hg19_esp5400_aa.txt
+hg19_esp5400_aa.txt.idx
+hg19_esp5400_all.txt
+hg19_esp5400_all.txt.idx
+hg19_esp6500_aa.txt
+hg19_esp6500_aa.txt.idx
+hg19_esp6500_all.txt
+hg19_esp6500_all.txt.idx
+hg19_esp6500_ea.txt
+hg19_esp6500_ea.txt.idx
+hg19_esp6500si_aa.txt
+hg19_esp6500si_aa.txt.idx
+hg19_esp6500si_all.txt
+hg19_esp6500si_all.txt.idx
+hg19_esp6500si_ea.txt
+hg19_esp6500si_ea.txt.idx
+hg19_genomicSuperDups.txt
+hg19_gerp++gt2.txt
+hg19_gerp++gt2.txt.idx
+hg19_gwasCatalog.txt
+hg19_kgXref.txt
+hg19_knownGene.txt
+hg19_knownGeneMrna.fa
+hg19_ljb2_fathmm.txt
+hg19_ljb2_fathmm.txt.idx
+hg19_ljb2_gerp++.txt
+hg19_ljb2_gerp++.txt.idx
+hg19_ljb2_ma.txt
+hg19_ljb2_ma.txt.idx
+hg19_ljb2_mt.txt
+hg19_ljb2_phylop.txt
+hg19_ljb2_phylop.txt.idx
+hg19_ljb2_pp2hdiv.txt
+hg19_ljb2_pp2hdiv.txt.idx
+hg19_ljb2_pp2hvar.txt
+hg19_ljb2_pp2hvar.txt.idx
+hg19_ljb2_sift.txt
+hg19_ljb2_sift.txt.idx
+hg19_ljb2_siphy.txt
+hg19_nci60.txt
+hg19_nci60.txt.idx
+hg19_phastConsElements46way.txt
+hg19_refGene.txt
+hg19_refGeneMrna.fa
+hg19_refLink.txt
+hg19_snp130.txt
+hg19_snp130.txt.idx
+hg19_snp130NonFlagged.txt
+hg19_snp130NonFlagged.txt.idx
+hg19_snp131.txt
+hg19_snp131NonFlagged.txt
+hg19_snp131NonFlagged.txt.idx
+hg19_snp132.txt
+hg19_snp132.txt.idx
+hg19_snp132NonFlagged.txt
+hg19_snp132NonFlagged.txt.idx
+hg19_snp135.txt
+hg19_snp135NonFlagged.txt
+hg19_snp135NonFlagged.txt.idx
+hg19_snp137.txt
+hg19_snp137NonFlagged.txt
+hg19_snp137NonFlagged.txt.idx
+hg19_tfbsConsSites.txt
+
+
+obsolete functional impact database files: (disabled by default)
+hg18_avsift.txt
+hg18_avsift.txt.idx
+hg19_ljb_all.txt
+hg19_ljb_all.txt.idx
+hg19_ljb_lrt.txt
+hg19_ljb_lrt.txt.idx
+hg19_ljb_mt.txt
+hg19_ljb_mt.txt.idx
+hg19_ljb_phylop.txt
+hg19_ljb_phylop.txt.idx
+hg19_ljb_pp2.txt
+hg19_ljb_pp2.txt.idx
+hg18_ljb_all.txt
+hg18_ljb_all.txt.idx
+hg18_ljb_lrt.txt
+hg18_ljb_lrt.txt.idx
+hg18_ljb_mt.txt
+hg18_ljb_mt.txt.idx
+hg18_ljb_phylop.txt
+hg18_ljb_phylop.txt.idx
+hg18_ljb_pp2.txt
+hg18_ljb_pp2.txt.idx
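
Step 4 of the README above downloads one database per call to annotate_variation.pl. A minimal sketch of how that could be batched is shown here. The function name build_download_cmds and the dry-run approach are illustrative assumptions, not part of the repository, and the sketch always passes -webfrom annovar even though the README marks that flag as optional (whether a given database needs it depends on where it is hosted).

```shell
#!/bin/bash
# Hypothetical helper (not part of the repository): print, but do not run,
# one download command per requested database.
# The command name and the -downdb / -buildver / -webfrom flags come from the README.
build_download_cmds() {
    local build=$1      # e.g. hg18 or hg19
    local humandb=$2    # directory where database files should be stored
    shift 2
    local db
    for db in "$@"; do
        # Assumption: every database is fetched with "-webfrom annovar".
        echo "annotate_variation.pl -downdb -buildver ${build} -webfrom annovar ${db} ${humandb}"
    done
}

# Dry run: inspect the generated commands before executing any of them.
build_download_cmds hg19 /path/to/humandb refGene avsnp142 exac03
```

Printing the commands first makes it easy to check the build and target directory; once they look right, the output can be piped to bash to perform the downloads.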
diff -r e423536a0780 -r 4600be69b96f tools/annovar/annovar.sh
--- a/tools/annovar/annovar.sh Thu Apr 10 09:31:26 2014 -0400
+++ b/tools/annovar/annovar.sh Thu Oct 01 04:24:45 2015 -0400
b'@@ -179,7 +179,7 @@\n #################################\n \n \n-set -- `getopt -n$0 -u -a --longoptions="inputfile: buildver: humandb: varfile: VCF: chrcol: startcol: endcol: refcol: obscol: vartypecol: convertcoords: geneanno: hgvs: verdbsnp: tfbs: mce: cytoband: segdup: dgv: gwas: ver1000g: cg46: cg69: impactscores: newimpactscores: otherinfo: esp: gerp: cosmic61: cosmic63: cosmic64: cosmic65: cosmic67: cosmic68: clinvar: nci60: outall: outfilt: outinvalid: scriptsdir: dorunannovar: dofilter: filt_dbsnp: filt1000GALL: filt1000GAFR: filt1000GAMR: filt1000GASN: filt1000GEUR: filtESP6500ALL: filtESP6500EA: filtESP6500AA: filtcg46: filtcg69: dummy:" "h:" "$@"` || usage\n+set -- `getopt -n$0 -u -a --longoptions="inputfile: buildver: humandb: varfile: VCF: chrcol: startcol: endcol: refcol: obscol: vartypecol: convertcoords: geneanno: hgvs: verdbsnp: tfbs: mce: cytoband: segdup: dgv: gwas: ver1000g: cg46: cg69: impactscores: newimpactscores: otherinfo: esp: exac03: spidex: gonl: gerp: cosmic61: cosmic63: cosmic64: cosmic65: cosmic67: cosmic68: clinvar: nci60: outall: outfilt: outinvalid: scriptsdir: dorunannovar: dofilter: filt_dbsnp: filt1000GALL: filt1000GAFR: filt1000GAMR: filt1000GASN: filt1000GEUR: filtESP6500ALL: filtESP6500EA: filtESP6500AA: filtcg46: filtcg69: dummy:" "h:" "$@"` || usage\n [ $# -eq 0 ] && usage\n \n \n@@ -216,6 +216,9 @@\n \t\t--otherinfo)\t\t\t\totherinfo=$2;shift;; \n \t\t--scriptsdir)\t      \t\tscriptsdirtmp=$2;shift;; # Y or N \n \t\t--esp)      \t\t\t\tesp=$2;shift;; \t# Y or N \n+\t\t--exac03)                   exac03=$2;shift;;\n+\t\t--gonl)                     gonl=$2;shift;;\n+\t\t--spidex)                   spidex=$2;shift;;\n \t\t--gerp)      \t\t\t\tgerp=$2;shift;; \t# Y or N \n \t\t--cosmic61)\t\t\t\t\tcosmic61=$2;shift;;  # Y or N\n \t\t--cosmic63)\t\t\t\t\tcosmic63=$2;shift;;  # Y or N\n@@ -524,6 +527,7 @@\n \t\t\t\tOFS="\\t";\n \t\t\t}{\n \t\t\t\tif(FNR>1) { \n+                                        
gsub(/chr/,"",$"\'"${chrcol}"\'")\n \t\t\t\t\tif( $"\'"${vartypecol}"\'" == "snp" ){ $"\'"${startcol}"\'" += 1 }; \n \t\t\t\t\tif( $"\'"${vartypecol}"\'" == "ins" ){ $"\'"${refcol}"\'" = "-" };\n \t\t\t\t\tif( $"\'"${vartypecol}"\'" == "del" ){ $"\'"${startcol}"\'" +=1; $"\'"${obscol}"\'" = "-" };\n@@ -536,13 +540,15 @@\n \t\t\t}\' $infile > annovarinput\n \n \t\t\t#remove any "chr" prefixes\n-\t\t\tsed -i \'s/chr//g\' annovarinput\n+\t\t\t#sed -i \'2,$s/chr//g\' annovarinput\n \n \t\t\tawk \'BEGIN{\n \t\t\t\tFS="\\t";\n-\t\t\t\tOFS="\\t";\n+\t\t\t\tOFS="\\t";\t\t\t\t\n \t\t\t}{\n+                                \n \t\t\t\tif(FNR>=1) { \n+\t\t\t\t        gsub(/chr/,"",$"\'"${chrcol}"\'")\n \t\t\t\t\tif( $"\'"${vartypecol}"\'" == "snp" ){ $"\'"${startcol}"\'" += 1 }; \n \t\t\t\t\tif( $"\'"${vartypecol}"\'" == "ins" ){ $"\'"${refcol}"\'" = "-" };\n \t\t\t\t\tif( $"\'"${vartypecol}"\'" == "del" ){ $"\'"${startcol}"\'" +=1; $"\'"${obscol}"\'" = "-" };\n@@ -555,7 +561,7 @@\n \t\t\t}\' $infile > originalfile\n \n \t\t\t#remove any "chr" prefixes\n-\t\t\tsed -i \'s/chr//g\' originalfile\n+\t\t\t#sed -i \'2,$s/chr//g\' originalfile\n \t\t\tsed -i \'s/omosome/chromosome/g\' originalfile\n \n \n@@ -565,16 +571,16 @@\n \t\t\t\tFS="\\t";\n \t\t\t\tOFS="\\t";\n \t\t\t}{\n-\t\t\t\tif(FNR>1) { \t\t\t\n-\t\t\t\t\tprintf("%s\\t%s\\t%s\\t%s\\t%s\\n",$"\'"${chrcol}"\'",$"\'"${startcol}"\'",$"\'"${endcol}"\'",$"\'"${refcol}"\'",$"\'"${obscol}"\'");\t\t\t\t\n+\t\t\t\tif(FNR>1) { \t\t                                    \n+                                    printf("%s\\t%s\\t%s\\t%s\\t%s\\n",$"\'"${chrcol}"\'",$"\'"${startcol}"\'",$"\'"${endcol}"\'",$"\'"${refcol}"\'",$"\'"${obscol}"\'");\t\t\t\t\n \t\t\t\t}\n \t\t\t}\t\n \t\t\tEND{\n \t\t\t}\' $infile > annovarinput\n \n \t\t\t#remove any "chr" prefixes\n-\t\t\tsed -i \'s/chr//g\' annovarinput\n-\t\t\tsed \'s/chr//g\' $infile > originalfile\n+\t\t\tsed -i \'2,$s/chr//g\' annovarinput\n+\t\t\tsed \'2,$s/chr//g\' $infile > originalfile\n 
\t\t\tsed -i \'s/omosome/chromosome/g\' originalfile\n \tfi\n \n@@ -745,9 +751,15 @@\n \t\techo -e "\\ndbSNP region Annotation, version: $version"\n \t\t$scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype ${version} annovarinput $humandb 2>&1\n \t\n+\t\tcolumnname=${version}\n+\t\tif [[ $columnname == snp* ]]\n+\t\tthen\n+\t\t\tcolumnname="db${version}"\n+\t\tfi\n+\n \t\tannovarout=annovarinput.${buildver}_${versi'..b'opped\n+\t\t\t\tsed -i \'1i\\db\\t\'$g1000_colheader_SAS\'\\tchromosome\\tstart\\tend\\treference\\talleleSeq"\'"$vcfheader"\'"\' $annovarout \n+\t\t\t\tjoinresults originalfile $annovarout 3 4 5 6 7 B.$g1000_colheader_SAS\t\n+\t\t\tfi\n+\t\t\t\n \t\t\t# EUR\n \t\t\tif [ $doEUR == "Y"  ]\n \t\t\tthen\n@@ -937,6 +998,8 @@\n \t\t$scriptsdir/annotate_variation.pl --filter --buildver $buildver $otherinfo -dbtype ljb2_pp2hvar annovarinput $humandb 2>&1\n \t\n \t\tannovarout=annovarinput.${buildver}_ljb2_pp2hvar_dropped\n+\t\t\n+\t\thead $annovarout\n \t\tsed -i \'1i\\db\\tLJB2_PolyPhen2_HVAR\\tchromosome\\tstart\\tend\\treference\\talleleSeq"\'"$vcfheader"\'"\' $annovarout \n \t\tjoinresults originalfile $annovarout 3 4 5 6 7 B.LJB2_PolyPhen2_HVAR\t\n \tfi\n@@ -1191,7 +1254,76 @@\n \t\tdone\n \tfi\t\n \n+\t\n+\t#ExAC-03 database \n+\tif [ $exac03 == "Y" ]\n+\tthen\n+\t\techo -e "\\nExAC03 Annotation"\n+\t\t$scriptsdir/annotate_variation.pl --filter -otherinfo --buildver $buildver --otherinfo -dbtype exac03 annovarinput $humandb 2>&1\n+\t        \n+\t\t#annovarout=annovarinput.${buildver}_exac03_dropped\n+\t\t\n+\t\t# split allelefrequency column into several columns, one per population\n+\t\tawk \'BEGIN{FS="\\t"\n+\t\t           OFS="\\t"\t\t           \n+\t\t           }{\t\t           \n+\t\t           gsub(",","\\t",$2)\n+\t\t           print $0\t\t           \n+\t\t           }END{}\' annovarinput.${buildver}_exac03_dropped > $annovarout\n+\t\t\n+\t\tsed -i 
\'1i\\db\\tExAC_Freq\\tExAC_AFR\\tExAC_AMR\\tExAC_EAS\\tExAC_FIN\\tExAC_NFE\\tExAC_OTH\\tExAC_SAS\\tchromosome\\tstart\\tend\\treference\\talleleSeq"\'"$vcfheader"\'"\' $annovarout \n+\t\tjoinresults originalfile $annovarout 10 11 12 13 14 B.ExAC_Freq,B.ExAC_AFR,B.ExAC_AMR,B.ExAC_EAS,B.ExAC_FIN,B.ExAC_NFE,B.ExAC_OTH,B.ExAC_SAS\t\n+\tfi\n \n+    #GoNL database \n+\tif [ $gonl == "Y" ]\n+\tthen\n+\t\n+            if [ $buildver == "hg19" ]\n+            then\n+\t\techo -e "\\nGoNL Annotation"\n+\t\t$scriptsdir/annotate_variation.pl --filter --buildver $buildver --otherinfo -dbtype generic -genericdbfile ${buildver}_gonl.txt annovarinput $humandb 2>&1\n+\t        \n+\t        ls\n+\t\tannovarout=annovarinput.${buildver}_generic_dropped\n+\t\t\n+\t\thead $annovarout\n+\t\t\n+\t\tsed -i \'1i\\db\\tGoNL\\tchromosome\\tstart\\tend\\treference\\talleleSeq"\'"$vcfheader"\'"\' $annovarout \n+\t\tjoinresults originalfile $annovarout 3 4 5 6 7 B.GoNL\t\n+\t\t\n+            fi\n+            \n+\tfi\n+\t\n+\t#SPIDEX database \n+\tif [ $spidex == "Y" ]\n+\tthen\n+\t\n+        if [ $buildver == "hg19" ]\n+        then\n+\t\t\techo -e "\\nSPIDEX Annotation"\n+\t\t\t$scriptsdir/annotate_variation.pl --filter --buildver $buildver --otherinfo -dbtype spidex annovarinput $humandb 2>&1\n+\t\t\t    \n+\t\t\t# split allelefrequency column into several columns, one per population\n+\t\t    awk \'BEGIN{FS="\\t"\n+\t\t           OFS="\\t"\t\t           \n+\t\t           }{\t\t           \n+\t\t           gsub(",","\\t",$2)\n+\t\t           print $0\t\t           \n+\t\t    }END{}\' annovarinput.${buildver}_spidex_dropped > $annovarout    \n+\t\t\t\n+\t\t\t#annovarout=annovarinput.${buildver}_spidex_dropped\n+\t\t    #head $annovarout\n+\t\t\n+\t\t\tsed -i \'1i\\db\\tSPIDEX_dpsi_max_tissue\\tSPIDEX_dpsi_zscore\\tchromosome\\tstart\\tend\\treference\\talleleSeq"\'"$vcfheader"\'"\' $annovarout \n+\t\t\tjoinresults originalfile $annovarout 4 5 6 7 8 
B.SPIDEX_dpsi_max_tissue,B.SPIDEX_dpsi_zscore\t\n+\t\t\n+        fi\n+            \n+\tfi\n+\t\n+\t\n \t#GERP++\n \tif [ $gerp == "Y" ]\n \tthen\n@@ -1271,6 +1403,16 @@\n \n \tfi\n \t\n+\tif [[ $cosmic70 == "Y" && $buildver == "hg19" ]]\n+\tthen\n+\t\techo -e "\\nCOSMIC70 Annotation"\n+\t\t$scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype cosmic70 annovarinput $humandb 2>&1\n+\t\n+\t\tannovarout="annovarinput.${buildver}_cosmic70_dropped"\n+\t\tsed -i \'1i\\db\\tCOSMIC70\\tchromosome\\tstart\\tend\\treference\\talleleSeq"\'"$vcfheader"\'"\' $annovarout \n+\t\tjoinresults originalfile $annovarout 3 4 5 6 7 B.COSMIC70\n+\n+\tfi\n \n \tif [[ $clinvar == "Y" && $buildver == "hg19" ]]\n \tthen\n@@ -1356,6 +1498,10 @@\n \t\n \tcp originalfile_coords $outfile_all\n \tcp annovarinput.invalid_input $outfile_invalid 2>&1\n+\t\n+\tsed -i \'s/chrchr/chr/g\' $outfile_all\n+\tsed -i \'s/chrchr/chr/g\' $outfile_invalid\n+\t\n fi #if $dorunannovar\n \n \n'
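
One change in the hunk above replaces a file-wide sed 's/chr//g' with an awk gsub() restricted to the chromosome column. The sketch below illustrates the difference on a tiny made-up table: the global substitution also rewrites the header word "chromosome" to "omosome" (which is why the script later runs sed 's/omosome/chromosome/g') and damages other fields that happen to contain "chr". The function name strip_chr_column and the sample data are assumptions for illustration, and unlike part of the original script this sketch skips the header line entirely.

```shell
#!/bin/bash
# Hypothetical helper: remove "chr" only from one column of a tab-separated
# stream, leaving the header (line 1) and all other columns untouched.
strip_chr_column() {
    local chrcol=$1   # 1-based index of the chromosome column
    awk -v col="$chrcol" 'BEGIN { FS = "\t"; OFS = "\t" }
        FNR > 1 { gsub(/chr/, "", $col) }   # data lines only, chosen column only
        { print }'
}

# Made-up input: a header plus one variant whose gene name contains "chr".
sample=$(printf 'chromosome\tbegin\tgene\nchr1\t100\tchrA\n')

echo "column-restricted:"
printf '%s\n' "$sample" | strip_chr_column 1

echo "global sed (old behaviour):"
printf '%s\n' "$sample" | sed 's/chr//g'
```

With the column-restricted version, the header and the gene field survive unchanged; with the global sed, both are altered.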
diff -r e423536a0780 -r 4600be69b96f tools/annovar/annovar.xml
--- a/tools/annovar/annovar.xml Thu Apr 10 09:31:26 2014 -0400
+++ b/tools/annovar/annovar.xml Thu Oct 01 04:24:45 2015 -0400
@@ -1,4 +1,4 @@
-<tool id="AnnovarShed" name="ANNOVAR" version="2013aug">
+<tool id="AnnovarShed" name="ANNOVAR" version="2015may">
  <description> Annotate a file using ANNOVAR </description>
 
  <requirements>
@@ -8,6 +8,9 @@
  <command interpreter="bash">
  annovar.sh
  --esp ${esp}
+ --gonl ${gonl}
+ --exac03 ${exac03}
+ --spidex ${spidex}
  --gerp ${gerp}
  --cosmic61 ${cosmic61}
  --cosmic63 ${cosmic63}
@@ -67,8 +70,9 @@
 
  <inputs>
  <param name="dorun" type="hidden" value="Y"/> <!-- will add tool in future to filter on annovar columns, then will call annovar.sh with dorun==N -->
- <param name="reference" type="select" label="Reference">
+ <param name="reference" type="select" label="Reference">         
  <options from_data_table="annovar_loc" />
+ <filter type="data_meta" ref="infile" key="dbkey" column="0"/>
  </param>
 
  <param name="infile" type="data" label="Select file to annotate" help="Must be CG varfile or a tab-separated file with a 1 line header"/>
@@ -117,7 +121,7 @@
 
 
  <!-- filter-based annotation -->
- <param name="verdbsnp" type="select" label="Select dbSNP version(s) to annotate with" multiple="true" display="checkboxes"  optional="true" help="SNPs in dbSNP may be flagged as Clinically Associated, Select the NonFlagged version if you do not wish to annotate with these SNPs ">
+ <param name="verdbsnp" type="select" label="Select dbSNP version(s) to annotate with" multiple="true" display="checkboxes"  optional="true" help="avSNP are reformatted dbSNP databases with one variant per line and left-normalized indels (for a more detailed discussion read this article: http://annovar.openbioinformatics.org/en/latest/articles/dbSNP/). Flagged SNPs include SNPs less than 1% minor allele frequency (MAF) (or unknown), mapping only once to reference assembly, flagged in dbSnp as clinically associated">
  <option value="snp128"          > 128            (hg18/hg19) </option>
  <option value="snp128NonFlagged"> 128 NonFlagged  </option>
  <option value="snp129"          > 129            (hg18/hg19) </option>
@@ -134,9 +138,13 @@
  <option value="snp137NonFlagged"> 137 NonFlagged  </option>
  <option value="snp138"          > 138            (hg19 only) </option>
  <option value="snp138NonFlagged"> 138 NonFlagged  </option>
+ <option value="avsnp138"          > 138            (avSNP, hg19 only) </option>
+ <option value="avsnp142"          > 142            (avSNP, hg19 only) </option>
  </param>
 
  <param name="ver1000g" type="select" label="Select 1000Genomes Annotation(s)" multiple="true" display="checkboxes"  optional="true" help="2012april database for ALL populations was converted to hg18 using the UCSC liftover program">
+ <option value="1000g2015aug"> 2015aug (hg19) (6 populations: AMR,AFR,EUR,EAS,SAS,ALL) </option>
+ <option value="1000g2014oct"> 2014oct (hg18/hg19) (6 populations: AMR,AFR,EUR,EAS,SAS,ALL) </option>
  <option value="1000g2012apr"> 2012apr (hg18/hg19) (5 populations: AMR,AFR,ASN,CEU,ALL) </option>
  <option value="1000g2012feb"> 2012feb (hg19) (1 population: ALL) </option>
  <option value="1000g2010nov"> 2010nov (hg19) (1 population: ALL) </option>
@@ -159,21 +167,28 @@
  <option value="esp5400_aa"          > ESP5400   African Americans  </option>
  </param>
 
+        <param name="exac03" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with ExAC 03? (The Exome Aggregation Consortium)" help=" The Exome Aggregation Consortium (ExAC) is a coalition of investigators seeking to aggregate and harmonize exome sequencing data from a wide variety of large-scale sequencing projects, and to make summary data available for the wider scientific community. The data set provided on this website spans 60,706 unrelated individuals sequenced as part of various disease-specific and population genetic studies. See http://exac.broadinstitute.org/faq for more information."/> 
 
+        <param name="gonl" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with GoNL data? (hg19 only)"/> 
+        <param name="spidex" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with SPIDEX database? (hg19 only)" help="This dataset provides machine-learning prediction on how genetic variants affect RNA splicing. (Xiong et al, Science 2015)"/> 
+
+                
  <param name="gerp" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="GERP++ Annotation?" help="GERP identifies constrained elements in multiple alignments by quantifying substitution deficits (see http://mendel.stanford.edu/SidowLab/downloads/gerp/ for details) This option annotates those variants having GERP++>2 in human genome, as this threshold is typically regarded as evolutionarily conserved and potentially functional"/>
 
  <param name="clinvar" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="CLINVAR Annotation? (hg19 only)" help="version 2014-02-11. Annotations include Variant Clinical Significance (unknown, untested, non-pathogenic, probable-non-pathogenic, probable-pathogenic, pathogenic, drug-response, histocompatibility, other) and Variant disease name."/>
  <param name="nci60" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with NCI60? (hg19 only)" help="NCI-60 exome allele frequency data"/>
  <param name="cgfortysix" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Complete Genomics 46 Genomes?" help="Diversity Panel; 46 unrelated individuals"/>
  <param name="cgsixtynine" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Complete Genomics 69 Genomes?" help="Diversity Panel, Pedigree, YRI trio and PUR trio"/>
- <param name="cosmic61" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC61? (hg19 only)"/>
- <param name="cosmic63" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC63? (hg19 only)"/>
- <param name="cosmic64" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC64? (hg19 only)"/>
- <param name="cosmic65" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC65? (hg19 only)"/>
- <param name="cosmic67" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC67? (hg19 only)"/>
- <param name="cosmic68" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC68? (hg19 only)"/>
+ <param name="cosmic61" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC 61? (hg19 only)"/>
+ <param name="cosmic63" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC 63? (hg19 only)"/>
+ <param name="cosmic64" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC 64? (hg19 only)"/>
+ <param name="cosmic65" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC 65? (hg19 only)"/>
+ <param name="cosmic67" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC 67? (hg19 only)"/>
+ <param name="cosmic68" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC 68? (hg19 only)"/>
+ <param name="cosmic70" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC 70? (hg19 only)"/>
+
+       
 
-
  <param name="newimpactscores" type="select" label="Select functional impact scores (LJB2)" multiple="true" display="checkboxes" optional="true" help="LJB refers to Liu, Jian, Boerwinkle paper in Human Mutation, pubmed ID 21520341. ">
  <option value="ljb2_sift"> SIFT score </option>
  <option value="ljb2_pp2hdiv"> PolyPhen2 HDIV score </option>
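
The new boolean parameters in annovar.xml reach annovar.sh as "--gonl Y", "--exac03 N" and so on, where they are consumed by a case statement that assigns the value and shifts. The following is a simplified sketch of that pattern. It deliberately omits the getopt call used by the real script, and the function name parse_flags and the "N" defaults are assumptions for illustration.

```shell
#!/bin/bash
# Simplified sketch of the option loop (the real script first normalises
# its arguments with getopt; that step is left out here).
parse_flags() {
    # Assumed defaults: treat an option that was not passed as "N".
    exac03="N"; gonl="N"; spidex="N"
    while [ $# -gt 0 ]; do
        case "$1" in
            --exac03) exac03=$2; shift ;;   # Y or N
            --gonl)   gonl=$2;   shift ;;   # Y or N
            --spidex) spidex=$2; shift ;;   # Y or N
        esac
        shift
    done
}

parse_flags --gonl Y --exac03 N --spidex Y
echo "exac03=$exac03 gonl=$gonl spidex=$spidex"
# prints: exac03=N gonl=Y spidex=Y
```

Each truevalue/falsevalue pair in the XML therefore has to match the "Y" string that the shell script later compares against.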
diff -r e423536a0780 -r 4600be69b96f tools/tool-data/annovar.loc.sample
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/tool-data/annovar.loc.sample Thu Oct 01 04:24:45 2015 -0400
@@ -0,0 +1,6 @@
+#loc file for annovar tool
+#
+# <columns>value, dbkey, name, ANNOVAR_scripts, ANNOVAR_humandb</columns>
+
+#hg18 hg18 hg18 [Human Mar. 2006 (NCBI36/hg18)] /path/to/annovarscripts /path/to/humandb
+#hg19 hg19 hg19 [Human Feb. 2009 (GRCh37/hg19)] /path/to/annovarscripts /path/to/humandb
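
The sample above documents five columns per entry: value, dbkey, name, ANNOVAR_scripts, ANNOVAR_humandb. As a rough sketch of how an edited annovar.loc could be sanity-checked before restarting Galaxy, the helper below looks up the two directories for a given build. It assumes the fields are tab-separated and that lines starting with "#" are comments (matching comment_char="#" in the data table definition); the function name lookup_loc is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print "<scripts dir><TAB><humandb dir>" for the first
# entry in a .loc file whose dbkey column matches the requested build.
# Assumed columns: value, dbkey, name, ANNOVAR_scripts, ANNOVAR_humandb
lookup_loc() {
    local locfile=$1 build=$2
    awk -F'\t' -v build="$build" '
        /^#/ || NF < 5 { next }                    # skip comments and incomplete lines
        $2 == build    { print $4 "\t" $5; exit }  # first matching entry wins
    ' "$locfile"
}

# Usage (paths are placeholders):
#   lookup_loc tool-data/annovar.loc hg19
```

An empty result means no uncommented, complete entry exists for that build, which is the state the sample file is in until its lines are edited and uncommented.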
diff -r e423536a0780 -r 4600be69b96f tools/tool_data_table_conf.xml.sample
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/tool_data_table_conf.xml.sample Thu Oct 01 04:24:45 2015 -0400
@@ -0,0 +1,7 @@
+<!-- ANNOVAR files -->
+<tables>
+<table name="annovar_loc" comment_char="#">
+<columns>value, dbkey, name, ANNOVAR_scripts, ANNOVAR_humandb</columns>
+<file path="tool-data/annovar.loc" /> 
+</table>
+</tables>
diff -r e423536a0780 -r 4600be69b96f tools/tool_dependencies.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/tool_dependencies.xml Thu Oct 01 04:24:45 2015 -0400
@@ -0,0 +1,23 @@
+<?xml version="1.0"?>
+<tool_dependency>
+ <package name="cgatools" version="1.7"> 
+        <install version="1.0">
+            <actions>                
+                <action type="download_by_url">http://sourceforge.net/projects/cgatools/files/1.7.1/cgatools-1.7.1.5-linux_binary-x86_64.tar.gz</action>
+ <action type="shell_command"> chmod a+x bin/cgatools</action>
+                <action type="move_file">
+                 <source>bin/cgatools</source>
+                 <destination>$INSTALL_DIR/bin</destination>
+                </action>     
+ <action type="set_environment">
+                    <environment_variable name="PATH" action="prepend_to">$INSTALL_DIR/bin</environment_variable>
+                    <environment_variable name="PATH" action="prepend_to">$REPOSITORY_INSTALL_DIR</environment_variable>
+                </action>                            
+            </actions>
+        </install>
+        <readme>
+ Downloads and installs the cgatools binary. 
+        </readme>
+    </package>      
+</tool_dependency>
+
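
The dependency definition above installs cgatools and prepends its directory to PATH; the README describes a manual fallback if that does not happen. A small sketch of a pre-flight check that could tell the two situations apart is shown here. The function name require_tool and the message text are assumptions, not part of the repository.

```shell
#!/bin/bash
# Hypothetical pre-flight check: succeed only if the named program is on PATH.
require_tool() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "missing required tool: $1" >&2
        return 1
    fi
}

# If cgatools is absent, fall back to the manual install steps in the README.
require_tool cgatools || echo "cgatools not found on PATH; see step 6 of tools/README"
```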
diff -r e423536a0780 -r 4600be69b96f tools/tools/annovar/annovar.sh
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/tools/annovar/annovar.sh Thu Oct 01 04:24:45 2015 -0400
b'@@ -0,0 +1,1461 @@\n+#!/bin/bash\n+\n+test="N"\n+dofilter="N"\n+\n+#########################\n+#\t   DEFINE SOME\n+#\t    FUNCTIONS\n+#########################\n+\n+function usage(){\n+\techo "usage: $0 todo"\n+}\n+\n+function runfilter(){\n+\tifile=$1\t\n+\tcolumnname=$2\n+\tthreshold=$3\n+\n+\tif [[ $threshold == "-1" ]]\n+\tthen\n+\t\techo "not filtering"\n+\t\treturn\n+\tfi\n+\t\n+\techo "filtering: $columnname, $threshold"\n+\tcat $ifile\n+\n+\t#get column number corresponding to column header\n+\tcolumn=`awk \'BEGIN{\n+\t\t\t\t\tFS="\\t";\n+\t\t\t\t\tcol=-1\n+\t\t\t\t}{\n+\t\t\t\t\tif(FNR==1){\n+\t\t\t\t\t\tfor(i=1;i<=NF;i++){\n+\t\t\t\t\t\t\tif($i == "\'"${columnname}"\'") \n+\t\t\t\t\t\t\t\tcol=i \n+\t\t\t\t\t\t} \n+\t\t\t\t\t\tprint col \n+\t\t\t\t\t}\n+\t\t\t\t}\' $ifile `\n+\n+\tif [ $column == -1 ]\n+\tthen\n+\t\techo "no such column, exiting"\n+\t\treturn\n+\tfi\t\n+\n+\t#perform filtering using the threshold\n+\tawk \'BEGIN{\n+\t\tFS="\\t";\n+\t\tOFS="\\t";\n+\t}{\n+\t\tif(FNR==1) \n+\t\t\tprint $0; \n+\t\tif(FNR>1){\n+\t\t\tif( $"\'"${column}"\'" == "" )  # empty column, then print\n+\t\t\t\tprint $0\n+\t\t\telse if ("\'"${threshold}"\'" == "text"){}  #if set to text dont check threshold\n+\t\t\t\t\n+\t\t\telse if ($"\'"${column}"\'" < "\'"${threshold}"\'")  #else do check it\n+\t\t\t\tprint $0\t\n+\t\t}\n+\t}\' $ifile > tmpfile\n+\n+\tmv tmpfile $ifile\t\n+}\n+\n+# arguments: originalfile,resultfile,chrcol,startcol,endcol,refcol,obscol,addcols\n+function joinresults(){\n+\tofile=$1\n+\trfile=$2\n+\tcolchr=$3\n+\tcolstart=$4\n+\tcolend=$5\n+\tcolref=$6\n+\tcolobs=$7\n+\taddcols=$8 #e.g. "B.col1,B.col2"\n+\t\n+\ttest="N"\n+\t\n+\t# echo "joining result with original file"\n+\tif [ $test == "Y" ]\n+\tthen \t\n+\t\techo "ofile: $ofile"\n+\t\thead $ofile \n+\t\techo "rfile: $rfile"\n+\t\thead $rfile\n+\tfi\n+\tnumlines=`wc $rfile | cut -d" " -f2`\n+\t\n+\t# if empty results file, just add header fields\n+\tif [[ ! 
-s $rfile ]] \n+\tthen\t\t\t\n+\t\tdummycol=${addcols:2}\n+\t\toutputcol=${dummycol//",B."/"\t"}\n+\t\tnumcommas=`echo "$addcols" | grep -o "," | wc -l`\t\t\n+\t\t\n+\t\tawk \'BEGIN{FS="\\t";OFS="\\t"}{\n+\t\t\t\tif(FNR==1)\n+\t\t\t\t\tprint $0,"\'"$outputcol"\'"; \n+\t\t\t\telse{\n+\t\t\t\t\tprintf $0\n+\t\t\t\t\tfor(i=0;i<="\'"$numcommas"\'"+1;i++)\n+\t\t\t\t\t\tprintf "\\t"\n+\t\t\t\t\tprintf "\\n"\n+\t\t\t\t}\n+\t\t\t}END{}\' $ofile > tempofile\n+\t\t\t\n+\t\t\tmv tempofile $ofile\t\t\n+\t\treturn\n+\tfi\n+\t\n+\n+\t#get input file column names for cgatools join\n+\tcol_chr_name=`head -1  $rfile | cut -f${colchr}`\n+\tcol_start_name=`head -1  $rfile | cut -f${colstart}`\n+\tcol_end_name=`head -1  $rfile | cut -f${colend}`\n+\tcol_ref_name=`head -1  $rfile | cut -f${colref}`\n+\tcol_obs_name=`head -1  $rfile | cut -f${colobs}`\n+\n+\t#get annotation file column names for cgatools join\n+\tchr_name=`head -1  $ofile | cut -f${chrcol}`\n+\tstart_name=`head -1  $ofile | cut -f${startcol}`\n+\tend_name=`head -1  $ofile | cut -f${endcol}`\n+\tref_name=`head -1  $ofile | cut -f${refcol}`\n+\tobs_name=`head -1  $ofile | cut -f${obscol}`\n+\n+\tif [ $test == "Y" ]\t\n+\tthen\n+\t\techo "input file"\n+\t\techo "chr   col: $col_chr_name ($colchr)"\t\n+\t\techo "start col: $col_start_name ($colstart)"\t\n+\t\techo "end   col: $col_end_name ($colend)"\t\n+\t\techo "ref   col: $col_ref_name ($colref)"\t\n+\t\techo "obs   col: $col_obs_name ($colobs)"\t\n+\t\techo ""\n+\t\techo "annotation file"\n+\t\techo "chr   col: $chr_name ($chrcol)"\t\n+\t\techo "start col: $start_name ($startcol)"\t\n+\t\techo "end   col: $end_name ($endcol)"\t\n+\t\techo "ref   col: $ref_name ($refcol)"\t\n+\t\techo "obs   col: $obs_name ($obscol)"\t\n+\tfi\n+\n+\t#perform join\n+\tcgatools join --beta \\\n+\t\t--input $ofile $rfile \\\n+\t\t--output temporiginal \\\n+\t\t--match ${chr_name}:${col_chr_name} \\\n+\t\t--match ${start_name}:${col_start_name} \\\n+\t\t--match ${end_name}:${col_end_name} 
\\\n+\t\t--match ${ref_name}:${col_ref_name} \\\n+\t\t--match ${obs_name}:${col_obs_name} \\\n+\t\t--select A.*,$addcols \\\n+\t\t--always-dump \\\n+\t\t--output-mode compact \n+\n+\t#replace originalfile\n+\tsed -i \'s/^>//g\' temporiginal #join sometimes adds a \'>\' symbol to header\n+\tmv temporiginal originalfile\n+\t\t\n+\tif [ $test == "Y" ]\n+\tthen\n+\t\techo "joining complete"\n+\t\thead originalfile\n+\t\techo ""\t\n+\tfi\n+\t\n+}\n+\n+\n+\n+\n+#################################\n+#\n+#\t   PARSE PARAMETERS\n+#\n+#################################\n+\n+\n+set -- `getop'..b' $humandb 2>&1\n+\t\n+\t\tannovarout="annovarinput.${buildver}_cosmic67_dropped"\n+\t\tsed -i \'1i\\db\\tCOSMIC67\\tchromosome\\tstart\\tend\\treference\\talleleSeq"\'"$vcfheader"\'"\' $annovarout \n+\t\tjoinresults originalfile $annovarout 3 4 5 6 7 B.COSMIC67\n+\n+\tfi\n+\t\n+\tif [[ $cosmic68 == "Y" && $buildver == "hg19" ]]\n+\tthen\n+\t\techo -e "\\nCOSMIC68 Annotation"\n+\t\t$scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype cosmic68 annovarinput $humandb 2>&1\n+\t\n+\t\tannovarout="annovarinput.${buildver}_cosmic68_dropped"\n+\t\tsed -i \'1i\\db\\tCOSMIC68\\tchromosome\\tstart\\tend\\treference\\talleleSeq"\'"$vcfheader"\'"\' $annovarout \n+\t\tjoinresults originalfile $annovarout 3 4 5 6 7 B.COSMIC68\n+\n+\tfi\n+\t\n+\tif [[ $cosmic70 == "Y" && $buildver == "hg19" ]]\n+\tthen\n+\t\techo -e "\\nCOSMIC70 Annotation"\n+\t\t$scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype cosmic70 annovarinput $humandb 2>&1\n+\t\n+\t\tannovarout="annovarinput.${buildver}_cosmic70_dropped"\n+\t\tsed -i \'1i\\db\\tCOSMIC70\\tchromosome\\tstart\\tend\\treference\\talleleSeq"\'"$vcfheader"\'"\' $annovarout \n+\t\tjoinresults originalfile $annovarout 3 4 5 6 7 B.COSMIC70\n+\n+\tfi\n+\n+\tif [[ $clinvar == "Y" && $buildver == "hg19" ]]\n+\tthen\n+\t\techo -e "\\nCLINVAR Annotation"\n+\t\t$scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype 
clinvar_20140211 annovarinput $humandb 2>&1\n+\t\n+\t\tannovarout="annovarinput.${buildver}_clinvar_20140211_dropped"\n+\t\tsed -i \'1i\\db\\tCLINVAR\\tchromosome\\tstart\\tend\\treference\\talleleSeq"\'"$vcfheader"\'"\' $annovarout \n+\t\tjoinresults originalfile $annovarout 3 4 5 6 7 B.CLINVAR\n+\n+\tfi\n+\t\n+\tif [[ $nci60 == "Y" && $buildver == "hg19" ]]\n+\tthen\n+\t\techo -e "\\nNCI60 Annotation"\n+\t\t$scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype nci60 annovarinput $humandb 2>&1\n+\t\n+\t\tannovarout="annovarinput.${buildver}_nci60_dropped"\n+\t\tsed -i \'1i\\db\\tNCI60\\tchromosome\\tstart\\tend\\treference\\talleleSeq"\'"$vcfheader"\'"\' $annovarout \n+\t\tjoinresults originalfile $annovarout 3 4 5 6 7 B.NCI60\n+\n+\tfi\n+\t\n+\t#cg46\n+\tif [[ $cg46 == "Y"  ]]\n+\tthen\n+\t\techo -e "\\nCG 46 genomes Annotation"\n+\t\t$scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype cg46 annovarinput $humandb 2>&1\n+\t\n+\t\tannovarout="annovarinput.${buildver}_cg46_dropped"\n+\t\tsed -i \'1i\\db\\t\'${cg46_colheader}\'\\tchromosome\\tstart\\tend\\treference\\talleleSeq"\'"$vcfheader"\'"\' $annovarout \n+\t\tjoinresults originalfile $annovarout 3 4 5 6 7 B.${cg46_colheader}\n+\n+\tfi\n+\n+\n+\t#cg69\n+\tif [[ $cg69 == "Y"  ]]\n+\tthen\n+\t\techo -e "\\nCG 69 genomes Annotation"\n+\t\t$scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype cg69 annovarinput $humandb 2>&1\n+\t\n+\t\tannovarout="annovarinput.${buildver}_cg69_dropped"\n+\t\tsed -i \'1i\\db\\t\'${cg69_colheader}\'\\tchromosome\\tstart\\tend\\treference\\talleleSeq"\'"$vcfheader"\'"\' $annovarout \n+\t\tjoinresults originalfile $annovarout 3 4 5 6 7 B.${cg69_colheader}\n+\n+\tfi\n+\n+\n+\t\n+\tif [ $convertcoords == "Y" ]\n+\tthen\n+\t\techo "converting back coordinates"\n+\t\tawk \'BEGIN{\n+\t\t\t\tFS="\\t";\n+\t\t\t\tOFS="\\t";\n+\t\t\t}{\n+\t\t\t\tif (FNR==1)\n+\t\t\t\t\tprint $0\n+\t\t\t\tif(FNR>1) { \n+\t\t\t\t\t$"\'"${chrcol}"\'" = 
"chr"$"\'"${chrcol}"\'"\n+\t\t\t\t\tif( $"\'"${vartypecol}"\'" == "snp" ){ $"\'"${startcol}"\'" -= 1 }; \t\n+\t\t\t\t\tif( $"\'"${vartypecol}"\'" == "ins" ){ $"\'"${refcol}"\'" = "" };\t\t\t\n+\t\t\t\t\tif( $"\'"${vartypecol}"\'" == "del" ){ $"\'"${startcol}"\'" -=1; $"\'"${obscol}"\'" = "" };\n+\t\t\t\t\tif( $"\'"${vartypecol}"\'" == "sub" ){ $"\'"${startcol}"\'" -= 1 }; \n+\t\t\t\t\tprint $0\n+\t\t\t\t\t\t\t\t\n+\t\t\t\t}\n+\t\t\t}\t\n+\t\t\tEND{\n+\t\t\t}\' originalfile > originalfile_coords\n+\telse\n+\t\tmv originalfile originalfile_coords\n+\tfi\n+\n+\t#restore "chr" prefix?\n+\n+\t#move to outputfile\n+\tif [ ! -s annovarinput.invalid_input ]\n+\tthen\n+\t\techo "Congrats, your input file contained no invalid lines!" > annovarinput.invalid_input\n+\tfi\n+\t\n+\tcp originalfile_coords $outfile_all\n+\tcp annovarinput.invalid_input $outfile_invalid 2>&1\n+\t\n+\tsed -i \'s/chrchr/chr/g\' $outfile_all\n+\tsed -i \'s/chrchr/chr/g\' $outfile_invalid\n+\t\n+fi #if $dorunannovar\n+\n+\n+\n+\n+\n+\n+\n+\n+\n+\n+\n+\n+\n+\n+\n+\n+\n+\n+\n+\n+\n+\n+\n'
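Reviewer sketch of the "converting back coordinates" step near the end of annovar.sh, written as a standalone function so the per-varType rules are easier to read. The function name `to_cg_coords` and the fixed column layout are assumptions made for illustration; the real script takes the column numbers from its `--chrcol`, `--startcol`, `--refcol`, `--obscol` and `--vartypecol` arguments.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT part of annovar.sh.
# Assumed tab-separated layout with one header line:
#   1=chromosome 2=start 3=end 4=varType 5=reference 6=observed
# Mirrors the rules in the diff: snp/sub/del shift start back by one,
# ins blanks the reference allele, del blanks the observed allele.
to_cg_coords() {
    awk 'BEGIN { FS = "\t"; OFS = "\t" }
         FNR == 1 { print; next }                 # header passes through unchanged
         {
             if ($1 !~ /^chr/) { $1 = "chr" $1 }  # restore prefix without doubling it
             if ($4 == "snp" || $4 == "sub") { $2 -= 1 }
             else if ($4 == "ins")           { $5 = "" }
             else if ($4 == "del")           { $2 -= 1; $6 = "" }
             print
         }' "$1"
}
```

Testing for an existing `chr` prefix inside awk is a design choice of this sketch; the script itself always prepends `chr` and then collapses `chrchr` with `sed` afterwards.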
diff -r e423536a0780 -r 4600be69b96f tools/tools/annovar/annovar.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/tools/annovar/annovar.xml Thu Oct 01 04:24:45 2015 -0400
b'@@ -0,0 +1,255 @@\n+<tool id="AnnovarShed" name="ANNOVAR" version="2015may">\n+\t<description> Annotate a file using ANNOVAR </description>\n+\t\n+\t<requirements>\t\t\n+\t\t<requirement type="package" version="1.7">cgatools</requirement>\n+\t</requirements>\n+\t\n+\t<command interpreter="bash">\n+\t\tannovar.sh\t\t\n+\t\t--esp ${esp}\n+\t\t--exac03 ${exac03}\n+\t\t--gerp ${gerp}\n+\t\t--cosmic61 ${cosmic61}\n+\t\t--cosmic63 ${cosmic63}\t\n+\t\t--cosmic64 ${cosmic64}\t\t\n+\t\t--cosmic65 ${cosmic65}\n+\t\t--cosmic67 ${cosmic67}\n+\t\t--cosmic68 ${cosmic68}\n+\t\t--outall ${annotated}\t\t\n+\t\t--outinvalid ${invalid}\n+\t\t--dorunannovar ${dorun}\n+\t\t--inputfile ${infile}\n+\t\t--buildver ${reference.fields.dbkey}\n+\t\t--humandb ${reference.fields.ANNOVAR_humandb}\n+\t\t--scriptsdir ${reference.fields.ANNOVAR_scripts}\t\n+\t\t--verdbsnp ${verdbsnp}\n+\t\t--geneanno ${geneanno}\n+\t\t--tfbs ${tfbs}\n+\t\t--mce ${mce}\n+\t\t--cytoband ${cytoband}\n+\t\t--segdup ${segdup}\n+        --dgv ${dgv}\n+\t\t--gwas ${gwas}\t\t\t\t\n+\t\t#if $filetype.type == "other"\n+\t\t\t--varfile N\n+\t\t\t--VCF N\n+\t\t\t--chrcol ${filetype.col_chr}\n+\t\t\t--startcol ${filetype.col_start}\n+\t\t\t--endcol ${filetype.col_end}\n+\t\t\t--obscol ${filetype.col_obs}\n+\t\t\t--refcol ${filetype.col_ref}\n+\t\t\n+\t\t\t#if $filetype.convertcoords.convert == "Y"\n+\t\t\t\t--vartypecol ${filetype.convertcoords.col_vartype}\n+\t\t\t\t--convertcoords Y\n+\t\t\t#else\n+\t\t\t\t--convertcoords N\n+\t\t\t#end if\n+\t\t#end if\n+\t\t#if $filetype.type == "vcf"\n+\t\t\t--varfile N\n+\t\t\t--VCF Y\n+\t\t\t--convertcoords N\n+\t\t#end if\n+\t\t#if $filetype.type == "varfile"\n+\t\t\t--varfile Y\n+\t\t\t--VCF N\t\t\t\n+\t\t#end if\t\t\t\n+\t\t--cg46 ${cgfortysix}\n+\t\t--cg69 ${cgsixtynine}\n+\t\t--ver1000g ${ver1000g}\n+\t\t--hgvs ${hgvs}\n+\t\t--otherinfo ${otherinfo}\n+\t\t--newimpactscores ${newimpactscores}\n+\t\t--clinvar ${clinvar}\n+\t\t\n+\t</command>\n+\t\t\n+\t<inputs>\n+\t\t<param 
name="dorun" type="hidden" value="Y"/> <!-- will add tool in future to filter on annovar columns, then will call annovar.sh with dorun==N -->\n+\t\t<param name="reference" type="select" label="Reference">\t\t\t        \n+\t\t\t<options from_data_table="annovar_loc" />\t\t\t\t\n+\t\t\t<filter type="data_meta" ref="infile" key="dbkey" column="0"/>\t\t\t\n+\t\t</param>\n+\t\t\t\t\n+\t\t<param name="infile" type="data" label="Select file to annotate" help="Must be CG varfile or a tab-separated file with a 1 line header"/>\n+\t\t<conditional name="filetype">\n+\t\t\t<param name="type" type="select" label="Select filetype" >\n+\t\t\t\t<option value="vcf" selected="false"> VCF4 file </option>\n+\t\t\t\t<option value="varfile" selected="false"> CG varfile </option>\n+\t\t\t\t<option value="other" selected="false"> Other </option>\n+\t\t\t</param>\n+\t\t\t<when value="other">\n+\t\t\t\t<param name="col_chr"     type="data_column"   data_ref="infile" multiple="False" label="Chromosome Column"  /> \n+\t\t\t\t<param name="col_start"   type="data_column"   data_ref="infile" multiple="False" label="Start Column"  /> \n+\t\t\t\t<param name="col_end"     type="data_column"   data_ref="infile" multiple="False" label="End Column"  /> \n+\t\t\t\t<param name="col_ref"     type="data_column"   data_ref="infile" multiple="False" label="Reference Allele Column"  /> \n+\t\t\t\t<param name="col_obs"     type="data_column"   data_ref="infile" multiple="False" label="Observed Allele Column"  /> \t\n+\t\t\t\t<conditional name="convertcoords">\n+\t\t\t\t\t<param name="convert" type="select" label="Is this file using Complete Genomics (0-based half-open) coordinates?" 
>\n+\t\t\t\t\t\t<option value="Y"> Yes </option>\n+\t\t\t\t\t\t<option value="N" selected="True"> No </option>\n+\t\t\t\t\t</param>\n+\t\t\t\t\t<when value="Y">\n+\t\t\t\t\t\t<param name="col_vartype" type="data_column"   data_ref="infile" multiple="False" label="varType Column"  /> \n+\t\t\t\t\t</when>\n+\t\t\t\t</conditional>\n+\t\t\t</when>\n+\t\t</conditional>\n+\n+\n+\n+\t\t<param name="geneanno" type="select" label="Select Gene Annotation(s)" multiple="true" optional="true" display="checkboxes">\t\t\t\n+\t\t\t<option value="refSeq" selected="true"  > RefSeq </option>\n+\t\t\t<option value="knowngene"> UCSC KnownGene </option>\n+\t\t\t<option value="ensgene"  > Ensembl </option>\t\t\t\n+\t\t</param>\t\n+\t\t<param name="hgvs" type="boolean" check'..b'g19 only)"/>\n+\t\t<param name="cosmic70" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC 70? (hg19 only)"/>\n+\t\t\n+       \n+\n+\t\t<param name="newimpactscores" type="select" label="Select functional impact scores (LJB2)" multiple="true" display="checkboxes" optional="true" help="LJB refers to Liu, Jian, Boerwinkle paper in Human Mutation, pubmed ID 21520341. 
">\t\t\t\t\t\t\n+\t\t\t<option value="ljb2_sift"> SIFT score </option>\n+\t\t\t<option value="ljb2_pp2hdiv"> PolyPhen2 HDIV score </option>\n+\t\t\t<option value="ljb2_pp2hvar" > PolyPhen2 HVAR score </option>\n+\t\t\t<option value="ljb2_mt" > MutationTaster score </option>\n+\t\t\t<option value="ljb2_ma" > MutationAssessor score </option>\n+\t\t\t<option value="ljb2_lrt"> LRT score (Likelihood Ratio Test) </option>\t\t\t\n+\t\t\t<option value="ljb2_phylop"> PhyloP score </option>\n+\t\t\t<option value="ljb2_fathmm" > FATHMM score </option>\n+\t\t\t<option value="ljb2_gerp"> GERP++ score </option>\t\t\t\n+\t\t\t<option value="ljb2_siphy"> SiPhy score </option>\n+\t\t</param>\t\n+\t\t<param name="otherinfo" type="boolean" checked="False" truevalue="-otherinfo" falsevalue="N" label="Also get predictions where possible?" help="e.g. annotated as -score,damaging- or -score,benign- instead of just score"/>\n+\t\t\n+\t\t<!--  OBSOLETE impact scores, uncomment for backwards compatibility, add argument impactscores to command\n+<param name="impactscores" type="select" label="Select functional impact scores annotate with (OBSOLETE)" multiple="true" display="checkboxes" optional="true" help="LJB refers to Liu, Jian, Boerwinkle paper in Human Mutation, pubmed ID 21520341.">\t\t\t\n+\t\t\t<option value="avsift"> AV SIFT </option>\n+\t\t\t<option value="ljbsift"> LJB SIFT (corresponds to 1-SIFT)</option>\n+\t\t\t<option value="pp2"> PolyPhen2 </option>\n+\t\t\t<option value="mutationtaster" > MutationTaster </option>\n+\t\t\t<option value="lrt"> LRT (Likelihood Ratio Test) </option>\t\t\t\n+\t\t\t<option value="phylop"> PhyloP </option>\n+\t\t</param>\t\n+\t\t\t-->\n+\n+\t\t<!-- prefix for output file so you dont have to manually rename history items -->\n+\t\t<param name="fname" type="text" value="" label="Prefix for your output file" help="Optional"/>\t\t\n+\t\t\t\t\n+\t</inputs>\n+\n+\t<outputs>\n+\t\t<data format="tabular" name="invalid"   label="$fname ANNOVAR Invalid input 
on ${on_string}"/>\t\n+\t\t<data format="tabular" name="annotated" label="$fname ANNOVAR Annotated variants on ${on_string}"/>\n+\t</outputs>\n+\n+\t<help> \n+**What it does**\n+\n+This tool will annotate a file using ANNOVAR.\n+\n+**ANNOVAR Website and Documentation**\n+\n+Website: http://www.openbioinformatics.org/annovar/\n+\n+Paper: http://nar.oxfordjournals.org/content/38/16/e164\n+\n+**Input Formats**\n+\n+Input Formats may be one of the following:\n+\t\n+VCF file\n+Complete Genomics varfile\n+\n+Custom tab-delimited file (specify chromosome, start, end, reference allele, observed allele columns)\t\n+\t\n+Custom tab-delimited CG-derived file (specify chromosome, start, end, reference allele, observed allele, varType columns)\n+\t\t\n+\t\t\n+**Database Notes**\n+\n+See the ANNOVAR website for extensive documentation. A few notes on some of the databases:\n+\n+**LJB2 Database**\n+\n+PolyPhen2 HVAR should be used for diagnostics of Mendelian diseases, which requires distinguishing mutations with drastic effects from all the remaining human variation, including abundant mildly deleterious alleles. The authors recommend calling probably damaging if the score is between 0.909 and 1, possibly damaging if the score is between 0.447 and 0.908, and benign if the score is between 0 and 0.446.\n+\n+PolyPhen2 HDIV should be used when evaluating rare alleles at loci potentially involved in complex phenotypes, dense mapping of regions identified by genome-wide association studies, and analysis of natural selection from sequence data. The authors recommend calling probably damaging if the score is between 0.957 and 1, possibly damaging if the score is between 0.453 and 0.956, and benign if the score is between 0 and 0.452. \t\t\n+\t\t\n+\t</help>\n+\n+</tool>\n+\n+\n'
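Reviewer sketch of how the PolyPhen2 cutoffs quoted in the help text could be applied to a single score. The function name `pp2_call` and its argument order are hypothetical and not part of the tool. Scores that fall between the published ranges (for example 0.4465 for HVAR) are treated here as belonging to the lower category, which is an assumption of this sketch rather than something the help text specifies.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT part of the ANNOVAR wrapper.
# usage: pp2_call <hvar|hdiv> <score>
pp2_call() {
    awk -v model="$1" -v score="$2" 'BEGIN {
        # Lower bounds of the "probably" and "possibly damaging" ranges
        # as quoted in the tool help text.
        if (model == "hvar")      { probably = 0.909; possibly = 0.447 }
        else if (model == "hdiv") { probably = 0.957; possibly = 0.453 }
        else { print "unknown model"; exit 1 }

        s = score + 0   # force numeric comparison
        if (s >= probably)      print "probably damaging"
        else if (s >= possibly) print "possibly damaging"
        else                    print "benign"
    }'
}
```

The same score can land in different categories depending on the model: 0.95 is above the HVAR "probably damaging" cutoff but below the HDIV one.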