
Changeset 0:d3a72e55deca (2013-09-18)

Next changeset 1:7d9353127f8a (2013-11-05)

Commit message:
Uploaded

added:
README
README~
tool-data/annovar.loc.sample
tool_data_table_conf.xml.sample
tool_dependencies.xml
tools/annovar/annovar.sh
tools/annovar/annovar.xml

diff -r 000000000000 -r d3a72e55deca README
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/README Wed Sep 18 10:51:20 2013 -0400

@@ -0,0 +1,213 @@
+ANNOVAR needs to be installed manually in the following way:
+
+
+1) If you already have ANNOVAR installed on your system, edit the tool-data/annovar.loc file to point to the locations of
+ the Perl scripts (annotate_variation.pl and convert2annovar.pl) and the humandb directory (the directory containing the ANNOVAR database files).
+1b) Restart the Galaxy instance for changes to the .loc file to take effect.
+
+
+2) If you do not have ANNOVAR installed, request the ANNOVAR download and sign the license here:
+ http://www.openbioinformatics.org/annovar/annovar_download_form.php
+
+ 3) Once downloaded, install ANNOVAR per its installation instructions and edit the annovar.loc file to point to the directory containing the Perl scripts.
+ The tool uses annotate_variation.pl and convert2annovar.pl.
+
+ 4) Then download all desired databases for all desired builds as follows:
+ annotate_variation.pl -downdb -buildver <build> [-webfrom annovar] <database> <humandb>
+
+ where <humandb> is the directory in which all database files should be stored,
+ <database> is the database to download, e.g. refGene (the end of this document lists the database files in my own humandb folder at the time of writing this tool),
+ and <build> can be hg18 or hg19 for human; builds for other organisms are also available.
+
+ A list of all available databases can be found here: http://www.openbioinformatics.org/annovar/annovar_db.html
+
+ 5) Edit the tool-data/annovar.loc file to point to the location of the humandb folder.
+ 5b) Restart the Galaxy instance for changes to the .loc file to take effect.
+
+6) The tool uses cgatools join to combine files. It should be installed automatically with the repository; if not, get a copy from Complete Genomics directly:
+ wget http://sourceforge.net/projects/cgatools/files/1.7.1/cgatools-1.7.1.5-linux_binary-x86_64.tar.gz
+ tar xvzf cgatools-1.7.1.5-linux_binary-x86_64.tar.gz
+
+ and place the "cgatools" binary found in the bin/ directory on your $PATH.
+
+
+list of files in my own humandb folder:
+
+ hg18_ALL.sites.2012_04.txt
+ hg18_ALL.sites.2012_04.txt.idx
+ hg18_avsift.txt
+ hg18_avsift.txt.idx
+ hg18_CEU.sites.2010_07.txt
+ hg18_CEU.sites.2010_07.txt.idx
+ hg18_cg46.txt
+ hg18_cg46.txt.idx
+ hg18_cg69.txt
+ hg18_cg69.txt.idx
+ hg18_cytoBand.txt
+ hg18_dgv.txt
+ hg18_ensGeneMrna.fa
+ hg18_ensGene.txt
+ hg18_esp5400_aa.txt
+ hg18_esp5400_aa.txt.idx
+ hg18_esp5400_all.txt
+ hg18_esp5400_all.txt.idx
+ hg18_esp5400_ea.txt
+ hg18_esp5400_ea.txt.idx
+ hg18_esp6500_aa.txt
+ hg18_esp6500_aa.txt.idx
+ hg18_esp6500_all.txt
+ hg18_esp6500_all.txt.idx
+ hg18_esp6500_ea.txt
+ hg18_esp6500_ea.txt.idx
+ hg18_esp6500si_aa.txt
+ hg18_esp6500si_aa.txt.idx
+ hg18_esp6500si_all.txt
+ hg18_esp6500si_all.txt.idx
+ hg18_esp6500si_ea.txt
+ hg18_esp6500si_ea.txt.idx
+ hg18_example_db_generic.txt
+ hg18_example_db_gff3.txt
+ hg18_genomicSuperDups.txt
+ hg18_gerp++gt2.txt
+ hg18_gerp++gt2.txt.idx
+ hg18_gwasCatalog.txt
+ hg18_JPTCHB.sites.2010_07.txt
+ hg18_JPTCHB.sites.2010_07.txt.idx
+ hg18_keggMapDesc.txt
+ hg18_keggPathway.txt
+ hg18_kgXref.txt
+ hg18_knownGeneMrna.fa
+ hg18_knownGene.txt
+ hg18_ljb_all.txt
+ hg18_ljb_all.txt.idx
+ hg18_ljb_lrt.txt
+ hg18_ljb_lrt.txt.idx
+ hg18_ljb_mt.txt
+ hg18_ljb_mt.txt.idx
+ hg18_ljb_phylop.txt
+ hg18_ljb_phylop.txt.idx
+ hg18_ljb_pp2.txt
+ hg18_ljb_pp2.txt.idx
+ hg18_ljb_sift.txt
+ hg18_ljb_sift.txt.idx
+ hg18_phastConsElements44way.txt
+ hg18_refGeneMrna.fa
+ hg18_refGene.txt
+ hg18_refLink.txt
+ hg18_snp128NonFlagged.txt
+ hg18_snp128NonFlagged.txt.idx
+ hg18_snp128.txt
+ hg18_snp128.txt.idx
+ hg18_snp129NonFlagged.txt
+ hg18_snp129NonFlagged.txt.idx
+ hg18_snp129.txt
+ hg18_snp129.txt.idx
+ hg18_snp130NonFlagged.txt
+ hg18_snp130NonFlagged.txt.idx
+ hg18_snp130.txt
+ hg18_snp130.txt.idx
+ hg18_snp131NonFlagged.txt
+ hg18_snp131NonFlagged.txt.idx
+ hg18_snp131.txt
+ hg18_snp131.txt.idx
+ hg18_snp132NonFlagged.txt
+ hg18_snp132NonFlagged.txt.idx
+ hg18_snp132.txt
+ hg18_snp132.txt.idx
+ hg18_tfbsConsSites.txt
+ hg18_YRI.sites.2010_07.txt
+ hg18_YRI.sites.2010_07.txt.idx
+ hg19_AFR.sites.2012_04.txt
+ hg19_AFR.sites.2012_04.txt.idx
+ hg19_ALL.sites.2010_11.txt
+ hg19_ALL.sites.2010_11.txt.idx
+ hg19_ALL.sites.2012_02.txt
+ hg19_ALL.sites.2012_02.txt.idx
+ hg19_ALL.sites.2012_04.txt
+ hg19_ALL.sites.2012_04.txt.idx
+ hg19_AMR.sites.2012_04.txt
+ hg19_AMR.sites.2012_04.txt.idx
+ hg19_ASN.sites.2012_04.txt
+ hg19_ASN.sites.2012_04.txt.idx
+ hg19_avsift.txt
+ hg19_avsift.txt.idx
+ hg19_cg46.txt
+ hg19_cg46.txt.idx
+ hg19_cg69.txt
+ hg19_cg69.txt.idx
+ hg19_cosmic61.txt
+ hg19_cosmic61.txt.idx
+ hg19_cosmic63.txt
+ hg19_cosmic63.txt.idx
+ hg19_cosmic64.txt
+ hg19_cosmic64.txt.idx
+ hg19_cosmic65.txt
+ hg19_cosmic65.txt.idx
+ hg19_cytoBand.txt
+ hg19_dgv.txt
+ hg19_ensGeneMrna.fa
+ hg19_ensGene.txt
+ hg19_esp5400_aa.txt
+ hg19_esp5400_aa.txt.idx
+ hg19_esp5400_all.txt
+ hg19_esp5400_all.txt.idx
+ hg19_esp5400_ea.txt
+ hg19_esp5400_ea.txt.idx
+ hg19_esp6500_aa.txt
+ hg19_esp6500_aa.txt.idx
+ hg19_esp6500_all.txt
+ hg19_esp6500_all.txt.idx
+ hg19_esp6500_ea.txt
+ hg19_esp6500_ea.txt.idx
+ hg19_esp6500si_aa.txt
+ hg19_esp6500si_aa.txt.idx
+ hg19_esp6500si_all.txt
+ hg19_esp6500si_all.txt.idx
+ hg19_esp6500si_ea.txt
+ hg19_esp6500si_ea.txt.idx
+ hg19_EUR.sites.2012_04.txt
+ hg19_EUR.sites.2012_04.txt.idx
+ hg19_genomicSuperDups.txt
+ hg19_gerp++gt2.txt
+ hg19_gerp++gt2.txt.idx
+ hg19_gwasCatalog.txt
+ hg19_keggMapDesc.txt
+ hg19_keggPathway.txt
+ hg19_kgXref.txt
+ hg19_knownGeneMrna.fa
+ hg19_knownGene.txt
+ hg19_ljb_all.txt
+ hg19_ljb_all.txt.idx
+ hg19_ljb_lrt.txt
+ hg19_ljb_lrt.txt.idx
+ hg19_ljb_mt.txt
+ hg19_ljb_mt.txt.idx
+ hg19_ljb_phylop.txt
+ hg19_ljb_phylop.txt.idx
+ hg19_ljb_pp2.txt
+ hg19_ljb_pp2.txt.idx
+ hg19_ljb_sift.txt
+ hg19_ljb_sift.txt.idx
+ hg19_phastConsElements46way.txt
+ hg19_refGeneMrna.fa
+ hg19_refGene.txt
+ hg19_refLink.txt
+ hg19_snp130NonFlagged.txt
+ hg19_snp130NonFlagged.txt.idx
+ hg19_snp130.txt
+ hg19_snp130.txt.idx
+ hg19_snp131NonFlagged.txt
+ hg19_snp131NonFlagged.txt.idx
+ hg19_snp131.txt
+ hg19_snp132NonFlagged.txt
+ hg19_snp132NonFlagged.txt.idx
+ hg19_snp132.txt
+ hg19_snp132.txt.idx
+ hg19_snp135NonFlagged.txt
+ hg19_snp135NonFlagged.txt.idx
+ hg19_snp135.txt
+ hg19_snp137NonFlagged.txt
+ hg19_snp137NonFlagged.txt.idx
+ hg19_snp137.txt
+ hg19_tfbsConsSites.txt
+
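
Step 4 of the README can be scripted when several databases are needed for both builds. The following is a minimal dry-run sketch, not part of the changeset: it only prints the commands in the form the README gives, so they can be inspected before anything is downloaded. The function name `print_downdb_commands`, the `none` keyword, and the example path are assumptions made for illustration; whether a given database needs `-webfrom annovar` depends on where it is hosted, so check the database list linked in the README.

```shell
#!/bin/sh
# print_downdb_commands HUMANDB WEBFROM DATABASE...
# Prints (does not run) one annotate_variation.pl download command per
# build/database pair, using the command form from step 4 of the README.
# WEBFROM is either "none" (omit the optional flag) or a source such as "annovar".
print_downdb_commands() (
    humandb=$1
    webfrom=$2
    shift 2
    for build in hg18 hg19; do
        for db in "$@"; do
            if [ "$webfrom" = "none" ]; then
                echo "annotate_variation.pl -downdb -buildver $build $db $humandb"
            else
                echo "annotate_variation.pl -downdb -buildver $build -webfrom $webfrom $db $humandb"
            fi
        done
    done
)

# Example: list the commands for refGene on both builds.
print_downdb_commands /path/to/humandb annovar refGene
```

Piping the output to `sh` would execute the commands once they look right.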

diff -r 000000000000 -r d3a72e55deca README~
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/README~ Wed Sep 18 10:51:20 2013 -0400

@@ -0,0 +1,211 @@
+ANNOVAR needs to be installed manually in the following way:
+
+
+1) If you already have ANNOVAR installed on your system, simply edit the tool-data/annovar.loc file to reflect locations of
+ the perl scripts (annotate_variation.pl and convert2annovar.pl) and humandb directory (directory containing the annovar database files)
+1b) Restart galaxy instance for changes in .loc file to take effect
+
+
+2) If you do not have ANNOVAR installed, request annovar download and sign license here:
+ http://www.openbioinformatics.org/annovar/annovar_download_form.php
+
+ 3) Once downloaded, install annovar per the installation instructions and edit annovar.loc file to reflect location of directory containing perl scripts.
+ tool uses annotate_variation.pl and convert2annovar.pl
+
+ 4) Then download all desired databases for all desired builds as follows:
+ annotate_variation.pl -downdb -buildver <build> [-webfrom annovar] <database> <humandb>
+
+ where <humandb> is location where all database files should be stored
+ and <database> is the database file to download, e.g. refGene (see bottom of document for all available database files at the time of writing this tool)
+ and <build> can be hg18 or hg19 for humans, also other organisms available.
+
+ list of all available databases can be found here: http://www.openbioinformatics.org/annovar/annovar_db.html
+
+ 5) edit the tool-data/annovar.loc file to reflect location of humandb folder
+ 5b) restart galaxy instance for changes in .loc file to take effect
+
+6) Tool uses cgatools join for combining of files, this should be installed automatically with repository. If not, get a copy from Complete Genomics directly:
+ wget http://sourceforge.net/projects/cgatools/files/1.7.1/cgatools-1.7.1.5-linux_binary-x86_64.tar.gz
+ tar xvzf cgatools-1.7.1.5-linux_binary-x86_64.tar.gz
+
+ and place the "cgatools" binary found in bin/ directory on your $PATH
+
+
+list of files in my own humandb folder:
+
+ hg18_ALL.sites.2012_04.txt
+ hg18_ALL.sites.2012_04.txt.idx
+ hg18_avsift.txt
+ hg18_avsift.txt.idx
+ hg18_CEU.sites.2010_07.txt
+ hg18_CEU.sites.2010_07.txt.idx
+ hg18_cg46.txt
+ hg18_cg46.txt.idx
+ hg18_cg69.txt
+ hg18_cg69.txt.idx
+ hg18_cytoBand.txt
+ hg18_dgv.txt
+ hg18_ensGeneMrna.fa
+ hg18_ensGene.txt
+ hg18_esp5400_aa.txt
+ hg18_esp5400_aa.txt.idx
+ hg18_esp5400_all.txt
+ hg18_esp5400_all.txt.idx
+ hg18_esp5400_ea.txt
+ hg18_esp5400_ea.txt.idx
+ hg18_esp6500_aa.txt
+ hg18_esp6500_aa.txt.idx
+ hg18_esp6500_all.txt
+ hg18_esp6500_all.txt.idx
+ hg18_esp6500_ea.txt
+ hg18_esp6500_ea.txt.idx
+ hg18_esp6500si_aa.txt
+ hg18_esp6500si_aa.txt.idx
+ hg18_esp6500si_all.txt
+ hg18_esp6500si_all.txt.idx
+ hg18_esp6500si_ea.txt
+ hg18_esp6500si_ea.txt.idx
+ hg18_example_db_generic.txt
+ hg18_example_db_gff3.txt
+ hg18_genomicSuperDups.txt
+ hg18_gerp++gt2.txt
+ hg18_gerp++gt2.txt.idx
+ hg18_gwasCatalog.txt
+ hg18_JPTCHB.sites.2010_07.txt
+ hg18_JPTCHB.sites.2010_07.txt.idx
+ hg18_keggMapDesc.txt
+ hg18_keggPathway.txt
+ hg18_kgXref.txt
+ hg18_knownGeneMrna.fa
+ hg18_knownGene.txt
+ hg18_ljb_all.txt
+ hg18_ljb_all.txt.idx
+ hg18_ljb_lrt.txt
+ hg18_ljb_lrt.txt.idx
+ hg18_ljb_mt.txt
+ hg18_ljb_mt.txt.idx
+ hg18_ljb_phylop.txt
+ hg18_ljb_phylop.txt.idx
+ hg18_ljb_pp2.txt
+ hg18_ljb_pp2.txt.idx
+ hg18_ljb_sift.txt
+ hg18_ljb_sift.txt.idx
+ hg18_phastConsElements44way.txt
+ hg18_refGeneMrna.fa
+ hg18_refGene.txt
+ hg18_refLink.txt
+ hg18_snp128NonFlagged.txt
+ hg18_snp128NonFlagged.txt.idx
+ hg18_snp128.txt
+ hg18_snp128.txt.idx
+ hg18_snp129NonFlagged.txt
+ hg18_snp129NonFlagged.txt.idx
+ hg18_snp129.txt
+ hg18_snp129.txt.idx
+ hg18_snp130NonFlagged.txt
+ hg18_snp130NonFlagged.txt.idx
+ hg18_snp130.txt
+ hg18_snp130.txt.idx
+ hg18_snp131NonFlagged.txt
+ hg18_snp131NonFlagged.txt.idx
+ hg18_snp131.txt
+ hg18_snp131.txt.idx
+ hg18_snp132NonFlagged.txt
+ hg18_snp132NonFlagged.txt.idx
+ hg18_snp132.txt
+ hg18_snp132.txt.idx
+ hg18_tfbsConsSites.txt
+ hg18_YRI.sites.2010_07.txt
+ hg18_YRI.sites.2010_07.txt.idx
+ hg19_AFR.sites.2012_04.txt
+ hg19_AFR.sites.2012_04.txt.idx
+ hg19_ALL.sites.2010_11.txt
+ hg19_ALL.sites.2010_11.txt.idx
+ hg19_ALL.sites.2012_02.txt
+ hg19_ALL.sites.2012_02.txt.idx
+ hg19_ALL.sites.2012_04.txt
+ hg19_ALL.sites.2012_04.txt.idx
+ hg19_AMR.sites.2012_04.txt
+ hg19_AMR.sites.2012_04.txt.idx
+ hg19_ASN.sites.2012_04.txt
+ hg19_ASN.sites.2012_04.txt.idx
+ hg19_avsift.txt
+ hg19_avsift.txt.idx
+ hg19_cg46.txt
+ hg19_cg46.txt.idx
+ hg19_cg69.txt
+ hg19_cg69.txt.idx
+ hg19_cosmic61.txt
+ hg19_cosmic61.txt.idx
+ hg19_cosmic63.txt
+ hg19_cosmic63.txt.idx
+ hg19_cosmic64.txt
+ hg19_cosmic64.txt.idx
+ hg19_cytoBand.txt
+ hg19_dgv.txt
+ hg19_ensGeneMrna.fa
+ hg19_ensGene.txt
+ hg19_esp5400_aa.txt
+ hg19_esp5400_aa.txt.idx
+ hg19_esp5400_all.txt
+ hg19_esp5400_all.txt.idx
+ hg19_esp5400_ea.txt
+ hg19_esp5400_ea.txt.idx
+ hg19_esp6500_aa.txt
+ hg19_esp6500_aa.txt.idx
+ hg19_esp6500_all.txt
+ hg19_esp6500_all.txt.idx
+ hg19_esp6500_ea.txt
+ hg19_esp6500_ea.txt.idx
+ hg19_esp6500si_aa.txt
+ hg19_esp6500si_aa.txt.idx
+ hg19_esp6500si_all.txt
+ hg19_esp6500si_all.txt.idx
+ hg19_esp6500si_ea.txt
+ hg19_esp6500si_ea.txt.idx
+ hg19_EUR.sites.2012_04.txt
+ hg19_EUR.sites.2012_04.txt.idx
+ hg19_genomicSuperDups.txt
+ hg19_gerp++gt2.txt
+ hg19_gerp++gt2.txt.idx
+ hg19_gwasCatalog.txt
+ hg19_keggMapDesc.txt
+ hg19_keggPathway.txt
+ hg19_kgXref.txt
+ hg19_knownGeneMrna.fa
+ hg19_knownGene.txt
+ hg19_ljb_all.txt
+ hg19_ljb_all.txt.idx
+ hg19_ljb_lrt.txt
+ hg19_ljb_lrt.txt.idx
+ hg19_ljb_mt.txt
+ hg19_ljb_mt.txt.idx
+ hg19_ljb_phylop.txt
+ hg19_ljb_phylop.txt.idx
+ hg19_ljb_pp2.txt
+ hg19_ljb_pp2.txt.idx
+ hg19_ljb_sift.txt
+ hg19_ljb_sift.txt.idx
+ hg19_phastConsElements46way.txt
+ hg19_refGeneMrna.fa
+ hg19_refGene.txt
+ hg19_refLink.txt
+ hg19_snp130NonFlagged.txt
+ hg19_snp130NonFlagged.txt.idx
+ hg19_snp130.txt
+ hg19_snp130.txt.idx
+ hg19_snp131NonFlagged.txt
+ hg19_snp131NonFlagged.txt.idx
+ hg19_snp131.txt
+ hg19_snp132NonFlagged.txt
+ hg19_snp132NonFlagged.txt.idx
+ hg19_snp132.txt
+ hg19_snp132.txt.idx
+ hg19_snp135NonFlagged.txt
+ hg19_snp135NonFlagged.txt.idx
+ hg19_snp135.txt
+ hg19_snp137NonFlagged.txt
+ hg19_snp137NonFlagged.txt.idx
+ hg19_snp137.txt
+ hg19_tfbsConsSites.txt
+

diff -r 000000000000 -r d3a72e55deca tool-data/annovar.loc.sample
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tool-data/annovar.loc.sample Wed Sep 18 10:51:20 2013 -0400

@@ -0,0 +1,6 @@
+#loc file for annovar tool
+
+# <columns>value, dbkey, name, ANNOVAR_scripts, ANNOVAR_humandb</columns>
+
+hg18	hg18	build 36 (hg18)	/path/to/annovarscripts	/path/to/humandb
+hg19	hg19	build 37 (hg19)	/path/to/annovarscripts	/path/to/humandb
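
To show how one entry maps onto the five declared columns (value, dbkey, name, ANNOVAR_scripts, ANNOVAR_humandb), here is a small sketch. It assumes the columns are separated by single tab characters, which is the usual convention for Galaxy .loc files and the only way the space-containing name column can be parsed unambiguously. The function name `parse_loc_line` is hypothetical.

```shell
#!/bin/sh
# parse_loc_line LINE: split one tab-separated annovar.loc entry and print
# the three fields the tool wrapper passes on (dbkey, scripts dir, humandb dir).
parse_loc_line() {
    # -F'\t' makes awk split on tabs only, so "build 37 (hg19)" stays one field.
    printf '%s\n' "$1" | awk -F'\t' '{ print "dbkey=" $2 " scripts=" $4 " humandb=" $5 }'
}

# Example entry with the placeholder paths from the sample file.
line=$(printf 'hg19\thg19\tbuild 37 (hg19)\t/path/to/annovarscripts\t/path/to/humandb')
parse_loc_line "$line"
```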

diff -r 000000000000 -r d3a72e55deca tool_data_table_conf.xml.sample
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tool_data_table_conf.xml.sample Wed Sep 18 10:51:20 2013 -0400

@@ -0,0 +1,5 @@
+
+<table name="annovar_loc" comment_char="#">
+<columns>value, dbkey, name, ANNOVAR_scripts, ANNOVAR_humandb</columns>
+<file path="tool-data/annovar.loc" />
+</table>

diff -r 000000000000 -r d3a72e55deca tool_dependencies.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tool_dependencies.xml Wed Sep 18 10:51:20 2013 -0400

@@ -0,0 +1,23 @@
+<?xml version="1.0"?>
+<tool_dependency>
+    <package name="cgatools17" version="1">
+        <install version="1.0">
+            <actions>
+                <action type="download_by_url">http://sourceforge.net/projects/cgatools/files/1.7.1/cgatools-1.7.1.5-linux_binary-x86_64.tar.gz</action>
+                <action type="shell_command"> chmod a+x bin/cgatools</action>
+                <action type="move_file">
+                 <source>bin/cgatools</source>
+                 <destination>$INSTALL_DIR/bin</destination>
+                </action>
+                <action type="set_environment">
+                    <environment_variable name="PATH" action="prepend_to">$INSTALL_DIR/bin</environment_variable>
+                    <environment_variable name="PATH" action="prepend_to">$REPOSITORY_INSTALL_DIR</environment_variable>
+                </action>
+            </actions>
+        </install>
+        <readme>
+            Downloads and installs the cgatools binary.
+        </readme>
+    </package>
+</tool_dependency>
+
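
If the automatic installation does not work, the README's fallback is to put the `cgatools` binary on `$PATH` by hand. A quick way to confirm the result is sketched below; the helper name `have_tool` is an assumption for illustration and is not part of the repository.

```shell
#!/bin/sh
# have_tool NAME: succeed if NAME resolves to a command on the current PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool cgatools; then
    echo "cgatools found at $(command -v cgatools)"
else
    echo "cgatools not on PATH; see step 6 of the README"
fi
```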

diff -r 000000000000 -r d3a72e55deca tools/annovar/annovar.sh
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/annovar/annovar.sh Wed Sep 18 10:51:20 2013 -0400

@@ -0,0 +1,1203 @@
+#!/bin/bash
+
+test="N"
+
+
+function usage(){
+	echo "usage: $0 todo"
+}
+
+function runfilter(){
+	ifile=$1
+	columnname=$2
+	threshold=$3
+
+	if [[ $threshold == "-1" ]]
+	then
+		echo "not filtering"
+		return
+	fi
+
+	echo "filtering: $columnname, $threshold"
+	cat $ifile
+
+	#get column number corresponding to column header
+	column=`awk 'BEGIN{
+					FS="\t";
+					col=-1
+				}{
+					if(FNR==1){
+						for(i=1;i<=NF;i++){
+							if($i == "'"${columnname}"'")
+								col=i
+						}
+						print col
+					}
+				}' $ifile `
+
+	if [ $column == -1 ]
+	then
+		echo "no such column, exiting"
+		return
+	fi
+
+	#perform filtering using the threshold
+	awk 'BEGIN{
+		FS="\t";
+		OFS="\t";
+	}{
+		if(FNR==1)
+			print $0;
+		if(FNR>1){
+			if( $"'"${column}"'" == "" ) # empty column, then print
+				print $0
+			else if ("'"${threshold}"'" == "text"){} #if set to text dont check threshold
+
+			else if ($"'"${column}"'" < "'"${threshold}"'") #else do check it
+				print $0
+		}
+	}' $ifile > tmpfile
+
+	mv tmpfile $ifile
+}
+
+# arguments: originalfile,resultfile,chrcol,startcol,endcol,refcol,obscol,addcols
+function joinresults(){
+	ofile=$1
+	rfile=$2
+	colchr=$3
+	colstart=$4
+	colend=$5
+	colref=$6
+	colobs=$7
+	addcols=$8 #e.g. "B.col1,B.col2"
+
+	test="N"
+
+	# echo "joining result with original file"
+	if [ $test == "Y" ]
+	then
+		echo "ofile: $ofile"
+		head $ofile
+		echo "rfile: $rfile"
+		head $rfile
+	fi
+	numlines=`wc $rfile | cut -d" " -f2`
+
+	# if empty results file, just add header fields
+	if [[ ! -s $rfile ]]
+	then
+		dummycol=${addcols:2}
+		outputcol=${dummycol//",B."/"	"}
+		numcommas=`echo "$addcols" | grep -o "," | wc -l`
+
+		awk 'BEGIN{FS="\t";OFS="\t"}{
+				if(FNR==1)
+					print $0,"'"$outputcol"'";
+				else{
+					printf $0
+					for(i=0;i<="'"$numcommas"'"+1;i++)
+						printf "\t"
+					printf "\n"
+				}
+			}END{}' $ofile > tempofile
+
+			mv tempofile $ofile
+		return
+	fi
+
+
+	#get input file column names for cgatools join
+	col_chr_name=`head -1 $rfile | cut -f${colchr}`
+	col_start_name=`head -1 $rfile | cut -f${colstart}`
+	col_end_name=`head -1 $rfile | cut -f${colend}`
+	col_ref_name=`head -1 $rfile | cut -f${colref}`
+	col_obs_name=`head -1 $rfile | cut -f${colobs}`
+
+	#get annotation file column names for cgatools join
+	chr_name=`head -1 $ofile | cut -f${chrcol}`
+	start_name=`head -1 $ofile | cut -f${startcol}`
+	end_name=`head -1 $ofile | cut -f${endcol}`
+	ref_name=`head -1 $ofile | cut -f${refcol}`
+	obs_name=`head -1 $ofile | cut -f${obscol}`
+
+	if [ $test == "Y" ]
+	then
+		echo "input file"
+		echo "chr col: $col_chr_name ($colchr)"
+		echo "start col: $col_start_name ($colstart)"
+		echo "end col: $col_end_name ($colend)"
+		echo "ref col: $col_ref_name ($colref)"
+		echo "obs col: $col_obs_name ($colobs)"
+		echo ""
+		echo "annotation file"
+		echo "chr col: $chr_name ($chrcol)"
+		echo "start col: $start_name ($startcol)"
+		echo "end col: $end_name ($endcol)"
+		echo "ref col: $ref_name ($refcol)"
+		echo "obs col: $obs_name ($obscol)"
+	fi
+
+	#perform join
+	cgatools join --beta \
+		--input $ofile $rfile \
+		--output temporiginal \
+		--match ${chr_name}:${col_chr_name} \
+		--match ${start_name}:${col_start_name} \
+		--match ${end_name}:${col_end_name} \
+		--match ${ref_name}:${col_ref_name} \
+		--match ${obs_name}:${col_obs_name} \
+		--select A.*,$addcols \
+		--always-dump \
+		--output-mode compact
+
+	#replace originalfile
+	sed -i 's/^>//g' temporiginal #join sometimes adds a '>' symbol to header
+	mv temporiginal originalfile
+
+	if [ $test == "Y" ]
+	then
+		echo "joining complete"
+		head originalfile
+		echo ""
+	fi
+
+}
+
+
+
+
+set -- `getopt -n$0 -u -a --longoptions="inputfile: buildver: humandb: varfile: VCF: chrcol: startcol: endcol: refcol: obscol: vartypecol: convertcoords: geneanno: verdbsnp: tfbs: mce: cytoband: segdup: dgv: gwas: ver10
..
ut $humandb 2>&1
+
+		annovarout="annovarinput.${buildver}_cosmic64_dropped"
+		sed -i '1i\db\tCOSMIC64\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout
+		joinresults originalfile $annovarout 3 4 5 6 7 B.COSMIC64
+
+	fi
+
+	if [[ $cosmic65 == "Y" && $buildver == "hg19" ]]
+	then
+		echo -e "\nCOSMIC65 Annotation"
+		$scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype cosmic65 annovarinput $humandb 2>&1
+
+		annovarout="annovarinput.${buildver}_cosmic65_dropped"
+		sed -i '1i\db\tCOSMIC65\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout
+		joinresults originalfile $annovarout 3 4 5 6 7 B.COSMIC65
+
+	fi
+
+	#cg46
+	if [[ $cg46 == "Y" ]]
+	then
+		echo -e "\nCG 46 genomes Annotation"
+		$scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype cg46 annovarinput $humandb 2>&1
+
+		annovarout="annovarinput.${buildver}_cg46_dropped"
+		sed -i '1i\db\t'${cg46_colheader}'\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout
+		joinresults originalfile $annovarout 3 4 5 6 7 B.${cg46_colheader}
+
+	fi
+
+
+	#cg69
+	if [[ $cg69 == "Y" ]]
+	then
+		echo -e "\nCG 69 genomes Annotation"
+		$scriptsdir/annotate_variation.pl --filter --buildver $buildver -dbtype cg69 annovarinput $humandb 2>&1
+
+		annovarout="annovarinput.${buildver}_cg69_dropped"
+		sed -i '1i\db\t'${cg69_colheader}'\tchromosome\tstart\tend\treference\talleleSeq"'"$vcfheader"'"' $annovarout
+		joinresults originalfile $annovarout 3 4 5 6 7 B.${cg69_colheader}
+
+	fi
+
+
+
+	if [ $convertcoords == "Y" ]
+	then
+		echo "converting back coordinates"
+		awk 'BEGIN{
+				FS="\t";
+				OFS="\t";
+			}{
+				if (FNR==1)
+					print $0
+				if(FNR>1) {
+					$"'"${chrcol}"'" = "chr"$"'"${chrcol}"'"
+					if( $"'"${vartypecol}"'" == "snp" ){ $"'"${startcol}"'" -= 1 };
+					if( $"'"${vartypecol}"'" == "ins" ){ $"'"${refcol}"'" = "" };
+					if( $"'"${vartypecol}"'" == "del" ){ $"'"${startcol}"'" -=1; $"'"${obscol}"'" = "" };
+					if( $"'"${vartypecol}"'" == "sub" ){ $"'"${startcol}"'" -= 1 };
+					print $0
+
+				}
+			}
+			END{
+			}' originalfile > originalfile_coords
+	else
+		mv originalfile originalfile_coords
+	fi
+
+	#restore "chr" prefix?
+
+	#move to outputfile
+	if [ ! -s annovarinput.invalid_input ]
+	then
+		echo "Congrats, your input file contained no invalid lines!" > annovarinput.invalid_input
+	fi
+
+	cp originalfile_coords $outfile_all
+	cp annovarinput.invalid_input $outfile_invalid 2>&1
+fi #if $dorunannovar
+
+
+
+
+
+############################################
+#
+# Filter Annotated Variants
+#
+############################################
+
+
+if [[ $dofilter == "Y" ]]
+then
+	echo "starting filtering"
+	cp originalfile filteredfile
+
+	### do the filtering
+	# usage: runfilter <column name> <threshold> (-1=do not filter, 0=filter any value)
+
+	#1000genomes
+	runfilter filteredfile ${g1000_colheader_ALL} ${threshold_1000g_ALL}
+	runfilter filteredfile ${g1000_colheader_AFR} ${threshold_1000g_AFR}
+	runfilter filteredfile ${g1000_colheader_AMR} ${threshold_1000g_AMR}
+	runfilter filteredfile ${g1000_colheader_ASN} ${threshold_1000g_ASN}
+	runfilter filteredfile ${g1000_colheader_EUR} ${threshold_1000g_EUR}
+
+	#esp
+	runfilter filteredfile ${esp6500_colheader_ALL} ${threshold_ESP6500_ALL}
+	runfilter filteredfile ${esp6500_colheader_EA} ${threshold_ESP6500_EA}
+	runfilter filteredfile ${esp6500_colheader_AA} ${threshold_ESP6500_AA}
+
+	#dbsnp
+	for version in $filt_dbsnpstr
+	do
+		if [ $version == "None" ]
+		then
+			break
+		fi
+		runfilter filteredfile "db$version" "text" #-42 will filter any non-empty string in that field
+
+	done
+
+	#complete genomics
+	runfilter filteredfile ${cg46_colheader} ${threshold_cg46}
+	runfilter filteredfile ${cg69_colheader} ${threshold_cg69}
+
+	#move filtered output file to galaxy output file
+	cp filteredfile $outfile_filt
+
+fi
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
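
Review note on `runfilter`: the column name and threshold are spliced into the awk program text through shell quoting. On a reviewer's reading (not confirmed by the author), this makes the threshold a quoted string constant inside awk, and POSIX awk then compares the field and the threshold as strings, so a value such as `1e-05` would not compare as smaller than `0.01`. A minimal sketch of the same two steps, passing values with `-v` and forcing a numeric comparison, follows. The function names `find_column` and `filter_below` and the sample data are assumptions for illustration; the "text" threshold mode of the original is not covered here.

```shell
#!/bin/sh
# find_column FILE NAME: print the 1-based index of the header field NAME,
# or -1 if the tab-separated header line has no such field.
find_column() {
    awk -F'\t' -v name="$2" '
        NR == 1 {
            col = -1
            for (i = 1; i <= NF; i++) if ($i == name) col = i
            print col
            exit
        }' "$1"
}

# filter_below FILE COLUMN THRESHOLD: keep the header, rows whose field is
# empty, and rows whose field is numerically below THRESHOLD.
filter_below() {
    # Adding 0 forces both operands into numeric context.
    awk -F'\t' -v c="$2" -v t="$3" 'NR == 1 || $c == "" || ($c + 0) < (t + 0)' "$1"
}

# Example on a small made-up table with a frequency column.
tmp=$(mktemp)
printf 'chr\tfreq\n1\t0.5\n2\t1e-05\n3\t\n' > "$tmp"
col=$(find_column "$tmp" freq)
filter_below "$tmp" "$col" 0.01
rm -f "$tmp"
```

Writing the result to standard output rather than overwriting the input also avoids the fixed `tmpfile` name, which could collide if two jobs share a working directory.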

diff -r 000000000000 -r d3a72e55deca tools/annovar/annovar.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/annovar/annovar.xml Wed Sep 18 10:51:20 2013 -0400

@@ -0,0 +1,211 @@
+<tool id="Annovar" name="ANNOVAR" version="2013aug">
+	<description> Annotate a file using ANNOVAR </description>
+
+	<requirements>
+		<requirement type="package" version="1">cgatools17</requirement>
+	</requirements>
+
+	<command interpreter="bash">
+		annovar.sh
+		--impactscores ${impactscores}
+		--esp ${esp}
+		--gerp ${gerp}
+		--cosmic61 ${cosmic61}
+		--cosmic63 ${cosmic63}
+		--cosmic64 ${cosmic64}
+		--cosmic65 ${cosmic65}
+		--outall ${annotated}
+		--outinvalid ${invalid}
+		--dorunannovar ${dorun}
+		--inputfile ${infile}
+		--buildver ${reference.fields.dbkey}
+		--humandb ${reference.fields.ANNOVAR_humandb}
+		--scriptsdir ${reference.fields.ANNOVAR_scripts}
+		--verdbsnp ${verdbsnp}
+		--geneanno ${geneanno}
+		--tfbs ${tfbs}
+		--mce ${mce}
+		--cytoband ${cytoband}
+		--segdup ${segdup}
+		--dgv ${dgv}
+		--gwas ${gwas}
+		#if $filetype.type == "other"
+			--varfile N
+			--VCF N
+			--chrcol ${filetype.col_chr}
+			--startcol ${filetype.col_start}
+			--endcol ${filetype.col_end}
+			--obscol ${filetype.col_obs}
+			--refcol ${filetype.col_ref}
+
+			#if $filetype.convertcoords.convert == "Y"
+				--vartypecol ${filetype.convertcoords.col_vartype}
+				--convertcoords Y
+			#else
+				--convertcoords N
+			#end if
+		#end if
+		#if $filetype.type == "vcf"
+			--varfile N
+			--VCF Y
+			--convertcoords N
+		#end if
+		#if $filetype.type == "varfile"
+			--varfile Y
+			--VCF N
+		#end if
+		--cg46 ${cgfortysix}
+		--cg69 ${cgsixtynine}
+		--ver1000g ${ver1000g}
+
+	</command>
+
+	<inputs>
+		<param name="dorun" type="hidden" value="Y"/>
+		<param name="reference" type="select" label="Reference">
+			<options from_data_table="annovar_loc" />
+		</param>
+
+		<param name="infile" type="data" label="Select file to annotate" help="Must be CG varfile or a tab-separated file with a 1 line header"/>
+		<conditional name="filetype">
+			<param name="type" type="select" label="Select filetype" >
+				<option value="vcf" selected="false"> VCF4 file </option>
+				<option value="varfile" selected="false"> CG varfile </option>
+				<option value="other" selected="false"> Other </option>
+			</param>
+			<when value="other">
+				<param name="col_chr" type="data_column" data_ref="infile" multiple="False" label="Chromosome Column" />
+				<param name="col_start" type="data_column" data_ref="infile" multiple="False" label="Start Column" />
+				<param name="col_end" type="data_column" data_ref="infile" multiple="False" label="End Column" />
+				<param name="col_ref" type="data_column" data_ref="infile" multiple="False" label="Reference Allele Column" />
+				<param name="col_obs" type="data_column" data_ref="infile" multiple="False" label="Observed Allele Column" />
+				<conditional name="convertcoords">
+					<param name="convert" type="select" label="Is this file using Complete Genomics (0-based half-open) coordinates?" >
+						<option value="Y"> Yes </option>
+						<option value="N" selected="True"> No </option>
+					</param>
+					<when value="Y">
+						<param name="col_vartype" type="data_column" data_ref="infile" multiple="False" label="varType Column" />
+					</when>
+				</conditional>
+			</when>
+		</conditional>
+
+
+
+		<param name="geneanno" type="select" label="Select Gene Annotation(s)" multiple="true" optional="true" display="checkboxes">
+			<option value="refSeq" selected="true" > RefSeq </option>
+			<option value="knowngene"> UCSC KnownGene </option>
+			<option value="ensgene" > Ensembl </option>
+		</param>
+
+
+
+		<param name="cytoband" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Cytogenetic band Annotation?" help="This option identifies Giemsa-stained chromosome bands (e.g. 1q21.1-q23.3)."/>
+		<param name="tfbs" type=
..
e="N" label="Annotate with 1000genomes project? (version 2012april)"/>
+		-->
+
+
+	<param name="esp" type="select" label="Select Exome Variant Server version(s) to annotate with" multiple="true" display="checkboxes" optional="true" help="si versions of databases contain indels and chrY calls">
+			<option value="esp6500si_all" > ESP6500si ALL </option>
+			<option value="esp6500si_ea" > ESP6500si European Americans </option>
+			<option value="esp6500si_aa" > ESP6500si African Americans </option>
+			<option value="esp6500_all" > ESP6500 ALL </option>
+			<option value="esp6500_ea" > ESP6500 European Americans </option>
+			<option value="esp6500_aa" > ESP6500 African Americans </option>
+			<option value="esp5400_all" > ESP5400 ALL </option>
+			<option value="esp5400_ea" > ESP5400 European Americans </option>
+			<option value="esp5400_aa" > ESP5400 African Americans </option>
+		</param>
+
+
+		<param name="gerp" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="GERP++ Annotation?" help="GERP identifies constrained elements in multiple alignments by quantifying substitution deficits (see http://mendel.stanford.edu/SidowLab/downloads/gerp/ for details) This option annotates those variants having GERP++>2 in human genome, as this threshold is typically regarded as evolutionarily conserved and potentially functional"/>
+
+		<param name="cgfortysix" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Complete Genomics 46 Genomes?" help="Diversity Panel; 46 unrelated individuals"/>
+		<param name="cgsixtynine" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Complete Genomics 69 Genomes?" help="Diversity Panel, Pedigree, YRI trio and PUR trio"/>
+		<param name="cosmic61" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC61? (hg19 only)"/>
+		<param name="cosmic63" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC63? (hg19 only)"/>
+		<param name="cosmic64" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC64? (hg19 only)"/>
+		<param name="cosmic65" type="boolean" checked="False" truevalue="Y" falsevalue="N" label="Annotate with COSMIC65? (hg19 only)"/>
+
+<param name="impactscores" type="select" label="Select functional impact scores to annotate with" multiple="true" display="checkboxes" optional="true" help="LJB refers to Liu, Jian, Boerwinkle paper in Human Mutation, pubmed ID 21520341.">
+			<option value="avsift"> AV SIFT </option>
+			<option value="ljbsift"> LJB SIFT (corresponds to 1-SIFT)</option>
+			<option value="pp2"> PolyPhen2 </option>
+			<option value="mutationtaster" > MutationTaster </option>
+			<option value="lrt"> LRT (Likelihood Ratio Test) </option>
+			<option value="phylop"> PhyloP </option>
+		</param>
+
+
+
+		<param name="fname" type="text" value="" label="Prefix for your output file" help="Optional"/>
+
+	</inputs>
+
+	<outputs>
+		<data format="tabular" name="invalid" label="$fname ANNOVAR Invalid input on ${on_string}"/>
+		<data format="tabular" name="annotated" label="$fname ANNOVAR Annotated variants on ${on_string}"/>
+	</outputs>
+
+	<help>
+**What it does**
+
+This tool will annotate a file using ANNOVAR.
+
+**ANNOVAR Website and Documentation**
+
+Website: http://www.openbioinformatics.org/annovar/
+
+Paper: http://nar.oxfordjournals.org/content/38/16/e164
+
+**Input Formats**
+
+Input Formats may be one of the following:
+
+	VCF file
+
+	Complete Genomics varfile
+
+	Custom tab-delimited file (specify chromosome, start, end, reference allele, observed allele columns)
+
+	Custom tab-delimited CG-derived file (specify chromosome, start, end, reference allele, observed allele, varType columns)
+
+	</help>
+
+</tool>
+
+
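
The `command` block above hands every setting to annovar.sh as a `--name value` pair. The script's own option handling is cut off in this view, so the following is only a sketch of one way such pairs can be consumed in plain shell, shown for three of the options that appear in the wrapper. The function name `parse_args` and the lower-case variable `vcf` are assumptions for illustration, not taken from annovar.sh.

```shell
#!/bin/sh
# parse_args --name value ...: store the value following each recognised
# long option in a shell variable; stop with an error on anything else.
parse_args() {
    while [ $# -gt 0 ]; do
        case "$1" in
            --buildver)      buildver=$2;      shift 2 ;;
            --VCF)           vcf=$2;           shift 2 ;;
            --convertcoords) convertcoords=$2; shift 2 ;;
            --)              shift; break ;;
            *) echo "unknown option: $1" >&2; return 1 ;;
        esac
    done
}

# Example: the pairs the wrapper would emit for a VCF input on hg19.
parse_args --buildver hg19 --VCF Y --convertcoords N
echo "buildver=$buildver vcf=$vcf convertcoords=$convertcoords"
```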