Repository 'defuse'
hg clone https://toolshed.g2.bx.psu.edu/repos/jjohnson/defuse

Changeset 11:b22f8634ff84 (2016-01-17)
Previous changeset 10:f65857c1b92e (2013-01-14) Next changeset 12:4fe2e80d4ae1 (2016-05-03)
Commit message:
planemo upload for repository https://github.com/jj-umn/galaxytools/tree/master/defuse commit 23b94b5747c6956360cd2eca0a07a669929ea141-dirty
modified:
README
defuse.xml
tool_dependencies.xml
added:
create_reference_dataset.xml
data_manager_conf.xml
datamanager_create_reference.py
datamanager_create_reference.xml
datatypes_conf.xml
defuse_bamfastq.xml
defuse_results_to_vcf.py
defuse_results_to_vcf.xml
defuse_trinity_analysis.py
defuse_trinity_analysis.xml
macros.xml
test-data/mm10_results.filtered.tsv
test-data/mm10_results.filtered.vcf
test-data/tophat_out2h.bam
tool-data/defuse_reference.loc.sample
tool_data_table_conf.xml.sample
removed:
tool-data/defuse.loc.sample
diff -r f65857c1b92e -r b22f8634ff84 README
--- a/README Mon Jan 14 12:24:28 2013 -0600
+++ b/README Sun Jan 17 14:11:06 2016 -0500
@@ -1,11 +1,12 @@
-The DeFuse galaxy tool is based on DeFuse_Version_0.6.0
+The DeFuse galaxy tool is based on DeFuse_Version_0.6.2
 http://sourceforge.net/apps/mediawiki/defuse/index.php?title=Main_Page
+https://bitbucket.org/dranew/defuse
 
 DeFuse is a software package for gene fusion discovery using RNA-Seq data. The software uses clusters of discordant paired end alignments to inform a split read alignment analysis for finding fusion boundaries. The software also employs a number of heuristic filters in an attempt to reduce the number of false positives and produces a fully annotated output for each predicted fusion.
 
 
 Manual:
-http://sourceforge.net/apps/mediawiki/defuse/index.php?title=DeFuse_Version_0.6.0
+http://sourceforge.net/apps/mediawiki/defuse/index.php?title=DeFuse_Version_0.6.2
 
 The included tool_dependencies.xml will download and install the defuse code.  
 It will set the environment variable: "DEFUSE_PATH" to the location of the defuse install.  
@@ -34,8 +35,13 @@
 
 These datasets should be referenced in the tool-data/defuse.loc file. 
 
+The create_reference_dataset tool runs the create_reference_dataset.pl script to generate deFuse genome reference data in a Galaxy dataset.
+This should be made available in the future as a Galaxy DataManager.
 
-External Tools  ( http://sourceforge.net/apps/mediawiki/defuse/index.php?title=DeFuse_Version_0.6.0 )
+
+Galaxy will try to auto-install dependencies:
+
+External Tools  ( http://sourceforge.net/apps/mediawiki/defuse/index.php?title=DeFuse_Version_0.6.2 )
 deFuse relies on other publicly available tools as part of its pipeline. Some of these tools are not included with the deFuse download. Obtain these tools as detailed below.
 Download samtools
 The latest version of samtools can be downloaded from sourceforge: https://sourceforge.net/projects/samtools/files/samtools.
diff -r f65857c1b92e -r b22f8634ff84 create_reference_dataset.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/create_reference_dataset.xml Sun Jan 17 14:11:06 2016 -0500
b'@@ -0,0 +1,308 @@\n+<tool id="create_defuse_reference" name="Create DeFuse Reference" version="@DEFUSE_VERSION@.1">\n+ <description>create a defuse reference from Ensembl and UCSC sources</description>\n+    <macros>\n+        <import>macros.xml</import>\n+    </macros>\n+    <requirements>\n+        <expand macro="defuse_requirement" />\n+        <expand macro="mapping_requirements" />\n+    </requirements>\n+  <command interpreter="command"> /bin/bash $defuse_script </command>\n+ <inputs>\n+  <conditional name="genome">\n+    <param name="choice" type="select" label="Select a Genome Build">\n+      <option value="GRCh38">Homo_sapiens GRCh38  hg38</option>\n+      <option value="GRCh37">Homo_sapiens GRCh37  hg19</option>\n+      <option value="NCBI36">Homo_sapiens NCBI36 hg18</option>\n+      <option value="GRCm38">Mus_musculus GRCm38 mm10</option>\n+      <option value="NCBIM37">Mus_musculus NCBIM37 mm9</option>\n+      <option value="Rnor_5.0">Rattus_norvegicus Rnor_5.0 rn5</option>\n+      <option value="user_specified">User specified</option>\n+    </param>\n+    <when value="GRCh38">\n+      <param name="ensembl_organism" type="hidden" value="homo_sapiens"/>\n+      <param name="ensembl_prefix" type="hidden" value="Homo_sapiens"/>\n+      <param name="ensembl_genome_version" type="hidden" value="GRCh38"/>\n+      <param name="ensembl_version" type="hidden" value="80"/>\n+      <param name="ncbi_organism" type="hidden" value="Homo_sapiens"/>\n+      <param name="ncbi_prefix" type="hidden" value="Hs"/>\n+      <param name="ucsc_genome_version" type="hidden" value="hg38"/>\n+      <param name="chromosomes" type="hidden" value="1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y,MT"/>\n+      <param name="mt_chromosome" type="hidden" value="MT"/>\n+      <param name="gene_sources" type="hidden" value="IG_C_gene,IG_D_gene,IG_J_gene,IG_V_gene,processed_transcript,protein_coding"/>\n+      <param name="ig_gene_sources" type="hidden" 
value="IG_C_gene,IG_D_gene,IG_J_gene,IG_V_gene,IG_pseudogene"/>\n+      <param name="rrna_gene_sources" type="hidden" value="Mt_rRNA,rRNA,rRNA_pseudogene"/>\n+    </when>\n+    <when value="GRCh37">\n+      <param name="ensembl_organism" type="hidden" value="homo_sapiens"/>\n+      <param name="ensembl_prefix" type="hidden" value="Homo_sapiens"/>\n+      <param name="ensembl_genome_version" type="hidden" value="GRCh37"/>\n+      <param name="ensembl_version" type="hidden" value="71"/>\n+      <param name="ncbi_organism" type="hidden" value="Homo_sapiens"/>\n+      <param name="ncbi_prefix" type="hidden" value="Hs"/>\n+      <param name="ucsc_genome_version" type="hidden" value="hg19"/>\n+      <param name="chromosomes" type="hidden" value="1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y,MT"/>\n+      <param name="mt_chromosome" type="hidden" value="MT"/>\n+      <param name="gene_sources" type="hidden" value="IG_C_gene,IG_D_gene,IG_J_gene,IG_V_gene,processed_transcript,protein_coding"/>\n+      <param name="ig_gene_sources" type="hidden" value="IG_C_gene,IG_D_gene,IG_J_gene,IG_V_gene,IG_pseudogene"/>\n+      <param name="rrna_gene_sources" type="hidden" value="Mt_rRNA,rRNA,rRNA_pseudogene"/>\n+    </when>\n+    <when value="NCBI36">\n+      <param name="ensembl_organism" type="hidden" value="homo_sapiens"/>\n+      <param name="ensembl_prefix" type="hidden" value="Homo_sapiens"/>\n+      <param name="ensembl_genome_version" type="hidden" value="NCBI36"/>\n+      <param name="ensembl_version" type="hidden" value="54"/>\n+      <param name="ncbi_organism" type="hidden" value="Homo_sapiens"/>\n+      <param name="ncbi_prefix" type="hidden" value="Hs"/>\n+      <param name="ucsc_genome_version" type="hidden" value="hg18"/>\n+      <param name="chromosomes" type="hidden" value="1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y,MT"/>\n+      <param name="mt_chromosome" type="hidden" value="MT"/>\n+      <param name="gene_sources" type="hidden" 
value="IG_C_gene,IG_D_gene,IG_J_gene,IG_V_gene,processed_transcript,protein_coding"/>\n+ '..b' <configfile name="defuse_script">\n+#!/bin/bash\n+## define some things for cheetah proccessing\n+#set $amp = chr(38)\n+#set $gt = chr(62)\n+## substitute pathnames into config file\n+if `grep __DEFUSE_PATH__ $defuse_config ${gt} /dev/null`;then sed -i\'.tmp\' "s#__DEFUSE_PATH__#\\${DEFUSE_PATH}#" $defuse_config; fi\n+if `grep __SAMTOOLS_BIN__ $defuse_config ${gt} /dev/null` ${amp}${amp} SAMTOOLS_BIN=`which samtools`;then sed -i\'.tmp\' "s#__SAMTOOLS_BIN__#\\${SAMTOOLS_BIN}#" $defuse_config; fi\n+if `grep __BOWTIE_BIN__ $defuse_config ${gt} /dev/null` ${amp}${amp} BOWTIE_BIN=`which bowtie`;then sed -i\'.tmp\' "s#__BOWTIE_BIN__#\\${BOWTIE_BIN}#" $defuse_config; fi\n+if `grep __BOWTIE_BUILD_BIN__ $defuse_config ${gt} /dev/null` ${amp}${amp} BOWTIE_BUILD_BIN=`which bowtie-build`;then sed -i\'.tmp\' "s#__BOWTIE_BUILD_BIN__#\\${BOWTIE_BUILD_BIN}#" $defuse_config; fi\n+if `grep __BLAT_BIN__ $defuse_config ${gt} /dev/null` ${amp}${amp} BLAT_BIN=`which blat`;then sed -i\'.tmp\' "s#__BLAT_BIN__#\\${BLAT_BIN}#" $defuse_config; fi\n+if `grep __FATOTWOBIT_BIN__ $defuse_config ${gt} /dev/null` ${amp}${amp} FATOTWOBIT_BIN=`which faToTwoBit`;then sed -i\'.tmp\' "s#__FATOTWOBIT_BIN__#\\${FATOTWOBIT_BIN}#" $defuse_config; fi\n+if `grep __GMAP_BIN__ $defuse_config ${gt} /dev/null` ${amp}${amp} GMAP_BIN=`which gmap`;then sed -i\'.tmp\' "s#__GMAP_BIN__#\\${GMAP_BIN}#" $defuse_config; fi\n+if `grep __GMAP_SETUP_BIN__ $defuse_config ${gt} /dev/null` ${amp}${amp} GMAP_SETUP_BIN=`which gmap_setup`;then sed -i\'.tmp\' "s#__GMAP_SETUP_BIN__#\\${GMAP_SETUP_BIN}#" $defuse_config; fi\n+if `grep __GMAP_INDEX_DIR__ $defuse_config ${gt} /dev/null` ${amp}${amp} GMAP_INDEX_DIR=`pwd`/gmap;then sed -i\'.tmp\' "s#__GMAP_INDEX_DIR__#\\${GMAP_INDEX_DIR}#" $defuse_config; fi\n+if `grep __R_BIN__ $defuse_config ${gt} /dev/null` ${amp}${amp} R_BIN=`which R`;then sed -i\'.tmp\' "s#__R_BIN__#\\${R_BIN}#" 
$defuse_config; fi\n+if `grep __RSCRIPT_BIN__ $defuse_config ${gt} /dev/null` ${amp}${amp} RSCRIPT_BIN=`which Rscript`;then sed -i\'.tmp\' "s#__RSCRIPT_BIN__#\\${RSCRIPT_BIN}#" $defuse_config; fi\n+## copy config to output\n+cp $defuse_config $config_txt\n+## make a data_dir  and ln -s the input fastq\n+mkdir -p $config_txt.dataset.extra_files_path\n+## create_reference_dataset.pl\n+perl \\${DEFUSE_PATH}/scripts/create_reference_dataset.pl -c $defuse_config \n+  </configfile>\n+ </configfiles>\n+\n+ <tests>\n+ </tests>\n+ <help>\n+**DeFuse**\n+\n+DeFuse_ is a software package for gene fusion discovery using RNA-Seq data. The software uses clusters of discordant paired end alignments to inform a split read alignment analysis for finding fusion boundaries. The software also employs a number of heuristic filters in an attempt to reduce the number of false positives and produces a fully annotated output for each predicted fusion.  See the DeFuse_Version_0.6_ manual for details.\n+\n+DeFuse uses a Reference Dataset to search for gene fusions.  The Reference Dataset is generated from the following sources in DeFuse_Version_0.6_:\n+    - genome_fasta from Ensembl\n+    - gene_models from Ensembl\n+    - repeats_filename from UCSC RepeatMasker rmsk.txt\n+    - est_fasta from UCSC\n+    - est_alignments from UCSC intronEst.txt\n+    - unigene_fasta from NCBI\n+\n+The create_defuse_reference Galaxy tool downloads the reference genome and other source files, and builds any derivative files including bowtie indices, gmap indices, and 2bit files. Expect this step to take at least 12 hours.\n+\n+\n+It will generate a config.txt file that can be input into the deFuse Galaxy tool.  \n+\n+Journal reference: http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1001138\n+\n+.. _DeFuse: http://sourceforge.net/apps/mediawiki/defuse/index.php?title=Main_Page\n+\n+.. 
_DeFuse_Version_0.6: http://sourceforge.net/apps/mediawiki/defuse/index.php?title=DeFuse_Version_0.6.1\n+\n+------\n+\n+**Outputs**\n+\n+The galaxy history will contain: the config.txt file that provides DeFuse with the reference data paths.  \n+\n+ </help>\n+    <expand macro="citations"/>\n+</tool>\n'
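
Note on the defuse_script configfile above: each shell line checks whether a __NAME__ placeholder is still present in the config, resolves the matching executable with `which`, and rewrites the placeholder with sed only if the lookup succeeded. The Python below is a minimal sketch of that same idea, not code from this repository. The function name fill_placeholders and the lookup parameter are hypothetical, and unlike the sed calls (which have no g flag) it replaces every occurrence of a placeholder.

```python
import shutil


def fill_placeholders(config_text, tools, lookup=shutil.which):
    """Replace __NAME__ placeholders with the resolved path of each tool.

    tools maps a placeholder (e.g. "__SAMTOOLS_BIN__") to an executable name.
    lookup is injectable so the function can be exercised without the tools
    being installed; by default it searches PATH like `which`.
    """
    for placeholder, executable in tools.items():
        if placeholder not in config_text:
            continue  # same role as the grep guard in the shell script
        path = lookup(executable)
        if path is None:
            continue  # tool not found: leave the placeholder untouched
        config_text = config_text.replace(placeholder, path)
    return config_text


if __name__ == "__main__":
    sample = "samtools_bin = __SAMTOOLS_BIN__\nblat_bin = __BLAT_BIN__\n"
    mapping = {"__SAMTOOLS_BIN__": "samtools", "__BLAT_BIN__": "blat"}
    print(fill_placeholders(sample, mapping))
```

Passing the lookup function in as a parameter keeps the sketch testable on a machine where samtools, blat and the other tools are absent.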
diff -r f65857c1b92e -r b22f8634ff84 data_manager_conf.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/data_manager_conf.xml Sun Jan 17 14:11:06 2016 -0500
@@ -0,0 +1,25 @@
+<?xml version="1.0"?>
+<data_managers>
+  <data_manager tool_file="datamanager_create_reference.xml" id="data_manager_defuse_reference" >
+    <data_table name="defuse_reference">  <!-- Defines a Data Table to be modified. -->
+            <output> <!-- Handle the output of the Data Manager Tool -->
+                <column name="value" /> <!-- columns that are going to be specified by the Data Manager Tool -->
+                <column name="dbkey" />
+                <column name="name" />
+                <column name="path" output_ref="out_file" >  <!-- The value of this column will be modified based upon data in "out_file". example value "phiX.fa" -->
+                    <move type="directory"> <!-- Moving a file from the extra files path of "out_file" -->
+                        <!-- <source>${path}</source>--> <!-- out_file.extra_files_path is used as base by default --> <!-- if no source, eg for type=directory, then refers to base -->
+                        <target base="${GALAXY_DATA_MANAGER_DATA_PATH}">${value}/defuse</target> <!-- Target Location to store the file, directories are created as needed -->
+                    </move>
+                    <!-- datamanager_create_reference.py should have copied the defuse config file to the working directory,
+                         so if we put the ${dbkey}.config path in this column, defuse.xml can set the data_directory to this directory.
+                     -->
+                    <value_translation>${GALAXY_DATA_MANAGER_DATA_PATH}/${value}/defuse/${value}.config</value_translation> <!-- Store this value in the final Data Table -->
+                    <value_translation type="function">abspath</value_translation>
+                </column>
+            </output>
+        </data_table>
+  </data_manager>
+</data_managers>
+
+
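
Note on the two value_translation entries above: the first is a template that builds the config path from the data manager base path and the table's value column, and the second names abspath. The sketch below shows roughly how those two steps could combine. It is an illustration only and does not reproduce Galaxy's own implementation; translated_path and its arguments are hypothetical names.

```python
import os
from string import Template

# Same template text as the first <value_translation> element.
PATH_TEMPLATE = Template(
    "${GALAXY_DATA_MANAGER_DATA_PATH}/${value}/defuse/${value}.config"
)


def translated_path(data_manager_data_path, value):
    """Fill in the template, then make the result absolute."""
    raw = PATH_TEMPLATE.substitute(
        GALAXY_DATA_MANAGER_DATA_PATH=data_manager_data_path,
        value=value,
    )
    return os.path.abspath(raw)


if __name__ == "__main__":
    # "tool-data" is an assumed, relative base path used only for illustration.
    print(translated_path("tool-data", "GRCh38"))
```

Under these assumptions the stored path column ends in <value>/defuse/<value>.config, which matches the config name that datamanager_create_reference.py derives from the dbkey.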
diff -r f65857c1b92e -r b22f8634ff84 datamanager_create_reference.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/datamanager_create_reference.py Sun Jan 17 14:11:06 2016 -0500
@@ -0,0 +1,118 @@
+#!/usr/bin/env python
+
+import sys
+import os
+import re
+import tempfile
+import subprocess
+import fileinput
+import shutil
+import optparse
+import urllib2
+from ftplib import FTP
+import tarfile
+
+from galaxy.util.json import from_json_string, to_json_string
+
+
+def stop_err(msg):
+    sys.stderr.write(msg)
+    sys.exit(1)
+
+def get_config_dict(config,dataset_directory=None):
+    keys = ['dataset_directory','ensembl_organism','ensembl_prefix','ensembl_version','ensembl_genome_version','ucsc_genome_version','ncbi_organism','ncbi_prefix','chromosomes','mt_chromosome','gene_sources','ig_gene_sources','rrna_gene_sources']
+    pat = '^([^=]+?)\s*=\s*(.*)$'
+    config_dict = {}
+    try:
+        fh = open(config)
+        for i,l in enumerate(fh):
+           line = l.strip() 
+           if line.startswith('#'):
+               continue
+           m = re.match(pat,line)
+           if m and len(m.groups()) == 2:
+               (k,v) = m.groups()
+               if k in keys:
+                   config_dict[k] = v
+    except Exception as e:
+        stop_err( 'Error parsing %s %s\n' % (config,str( e )) )
+    else:
+        fh.close()
+    if dataset_directory:
+        config_dict['dataset_directory'] = dataset_directory
+    return config_dict
+
+def run_defuse_script(data_manager_dict, params, target_directory, dbkey, description, config, script):
+    if not os.path.isdir(target_directory):
+        os.makedirs(target_directory)
+    ## Name the config consistently with data_manager_conf.xml
+    #  copy the config file to the target_directory
+    #  when the DataManager moves files to their tool-data location, the config will get moved as well,
+    #   and the value_translation in data_manager_conf.xml will tell us the new location
+    #  defuse.xml will use the path to this config file to set the dataset_directory
+    config_name = '%s.config' % dbkey
+    defuse_config = os.path.join( target_directory, config_name)
+    shutil.copyfile(config,defuse_config) 
+    cmd = "/bin/bash %s %s" % (script,target_directory)
+    # Run
+    try:
+        tmp_out = tempfile.NamedTemporaryFile().name
+        tmp_stdout = open( tmp_out, 'wb' )
+        tmp_err = tempfile.NamedTemporaryFile().name
+        tmp_stderr = open( tmp_err, 'wb' )
+        proc = subprocess.Popen( args=cmd, shell=True, cwd=".", stdout=tmp_stdout, stderr=tmp_stderr )
+        returncode = proc.wait()
+        tmp_stderr.close()
+        # get stderr, allowing for case where it's very large
+        tmp_stderr = open( tmp_err, 'rb' )
+        stderr = ''
+        buffsize = 1048576
+        try:
+            while True:
+                stderr += tmp_stderr.read( buffsize )
+                if not stderr or len( stderr ) % buffsize != 0:
+                    break
+        except OverflowError:
+            pass
+        tmp_stdout.close()
+        tmp_stderr.close()
+        if returncode != 0:
+            raise Exception(stderr)
+
+        # TODO: look for errors in program output.
+    except Exception as e:
+        stop_err( 'Error creating defuse reference:\n' + str( e ) )
+    config_dict = get_config_dict(config, dataset_directory=target_directory)
+    data_table_entry = dict(value=dbkey, dbkey=dbkey, name=description, path=config_name)
+    _add_data_table_entry( data_manager_dict, data_table_entry )
+def _add_data_table_entry( data_manager_dict, data_table_entry ):
+    data_manager_dict['data_tables'] = data_manager_dict.get( 'data_tables', {} )
+    data_manager_dict['data_tables']['defuse_reference'] = data_manager_dict['data_tables'].get( 'defuse_reference', [] )
+    data_manager_dict['data_tables']['defuse_reference'].append( data_table_entry )
+    return data_manager_dict
+
+def main():
+    #Parse Command Line
+    parser = optparse.OptionParser()
+    parser.add_option( '-k', '--dbkey', dest='dbkey', action='store', type="string", default=None, help='dbkey' )
+    parser.add_option( '-d', '--description', dest='description', action='store', type="string", default=None, help='description' )
+    parser.add_option( '-c', '--defuse_config', dest='defuse_config', action='store', type="string", default=None, help='defuse_config' )
+    parser.add_option( '-s', '--defuse_script', dest='defuse_script', action='store', type="string", default=None, help='defuse_script' )
+    (options, args) = parser.parse_args()
+
+    filename = args[0]
+
+    params = from_json_string( open( filename ).read() )
+    target_directory = params[ 'output_data' ][0]['extra_files_path']
+    os.mkdir( target_directory )
+    data_manager_dict = {}
+
+     
+    #Create Defuse Reference Data
+    run_defuse_script( data_manager_dict, params, target_directory, options.dbkey, options.description,options.defuse_config,options.defuse_script)
+
+    #save info to json file
+    open( filename, 'wb' ).write( to_json_string( data_manager_dict ) )
+
+if __name__ == "__main__": main()
+
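
Note on get_config_dict above: it keeps only a fixed set of keys from "key = value" lines, skips comment lines, and optionally overrides dataset_directory. The following is a minimal Python 3 sketch of the same parsing, using the same regular expression and key list. It takes an iterable of lines rather than a file path so it can be run without a config file; the name parse_config_lines is hypothetical and is not part of the committed script.

```python
import re

# Keys copied from get_config_dict in datamanager_create_reference.py.
KEYS = {
    'dataset_directory', 'ensembl_organism', 'ensembl_prefix',
    'ensembl_version', 'ensembl_genome_version', 'ucsc_genome_version',
    'ncbi_organism', 'ncbi_prefix', 'chromosomes', 'mt_chromosome',
    'gene_sources', 'ig_gene_sources', 'rrna_gene_sources',
}
# Same pattern as the original, written as a raw string.
PATTERN = re.compile(r'^([^=]+?)\s*=\s*(.*)$')


def parse_config_lines(lines, dataset_directory=None):
    """Collect the recognised 'key = value' settings from config lines."""
    config = {}
    for raw in lines:
        line = raw.strip()
        if line.startswith('#'):
            continue  # skip comment lines
        match = PATTERN.match(line)
        if match:
            key, value = match.groups()
            if key in KEYS:
                config[key] = value
    if dataset_directory:
        config['dataset_directory'] = dataset_directory
    return config


if __name__ == "__main__":
    example = ["# deFuse config", "ensembl_version = 80", "chromosomes=1,2,X"]
    print(parse_config_lines(example, dataset_directory="/tmp/ref"))
```

With a real file, the lines argument can simply be the open file handle, for example inside a with open(path) block.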
diff -r f65857c1b92e -r b22f8634ff84 datamanager_create_reference.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/datamanager_create_reference.xml Sun Jan 17 14:11:06 2016 -0500
b'@@ -0,0 +1,307 @@\n+<tool id="data_manager_defuse_reference" name="DeFuse Reference DataManager" version="1.6.1" tool_type="manage_data">\n+ <description>create a defuse reference from Ensembl and UCSC sources</description>\n+ <requirements>\n+  <requirement type="package" version="0.6.1">defuse</requirement>\n+  <requirement type="package" version="0.1.18">samtools</requirement>\n+  <requirement type="package" version="1.0.0">bowtie</requirement>\n+  <requirement type="package" version="2013-05-09">gmap</requirement>\n+  <requirement type="package" version="latest">kent</requirement>\n+ </requirements>\n+ <command interpreter="python"> datamanager_create_reference.py \n+    --dbkey $genome.ensembl_genome_version \n+    --description "$genome.ensembl_prefix $genome.ensembl_genome_version ($genome.ucsc_genome_version)"\n+    --defuse_config $defuse_config\n+    --defuse_script $defuse_script\n+    $out_file\n+ </command>\n+ <inputs>\n+  <conditional name="genome">\n+    <param name="choice" type="select" label="Select a Genome Build">\n+      <option value="GRCh38">Homo_sapiens GRCh38  hg38</option>\n+      <option value="GRCh37">Homo_sapiens GRCh37  hg19</option>\n+      <option value="NCBI36">Homo_sapiens NCBI36 hg18</option>\n+      <option value="GRCm38">Mus_musculus GRCm38 mm10</option>\n+      <option value="NCBIM37">Mus_musculus NCBIM37 mm9</option>\n+      <option value="Rnor_5.0">Rattus_norvegicus Rnor_5.0 rn5</option>\n+      <option value="user_specified">User specified</option>\n+    </param>\n+    <when value="GRCh38">\n+      <param name="ensembl_organism" type="hidden" value="homo_sapiens"/>\n+      <param name="ensembl_prefix" type="hidden" value="Homo_sapiens"/>\n+      <param name="ensembl_genome_version" type="hidden" value="GRCh38"/>\n+      <param name="ensembl_version" type="hidden" value="80"/>\n+      <param name="ncbi_organism" type="hidden" value="Homo_sapiens"/>\n+      <param name="ncbi_prefix" type="hidden" value="Hs"/>\n+      <param 
name="ucsc_genome_version" type="hidden" value="hg38"/>\n+      <param name="chromosomes" type="hidden" value="1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y,MT"/>\n+      <param name="mt_chromosome" type="hidden" value="MT"/>\n+      <param name="gene_sources" type="hidden" value="IG_C_gene,IG_D_gene,IG_J_gene,IG_V_gene,processed_transcript,protein_coding"/>\n+      <param name="ig_gene_sources" type="hidden" value="IG_C_gene,IG_D_gene,IG_J_gene,IG_V_gene,IG_pseudogene"/>\n+      <param name="rrna_gene_sources" type="hidden" value="Mt_rRNA,rRNA,rRNA_pseudogene"/>\n+    </when>\n+    <when value="GRCh37">\n+      <param name="ensembl_organism" type="hidden" value="homo_sapiens"/>\n+      <param name="ensembl_prefix" type="hidden" value="Homo_sapiens"/>\n+      <param name="ensembl_genome_version" type="hidden" value="GRCh37"/>\n+      <param name="ensembl_version" type="hidden" value="71"/>\n+      <param name="ncbi_organism" type="hidden" value="Homo_sapiens"/>\n+      <param name="ncbi_prefix" type="hidden" value="Hs"/>\n+      <param name="ucsc_genome_version" type="hidden" value="hg19"/>\n+      <param name="chromosomes" type="hidden" value="1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y,MT"/>\n+      <param name="mt_chromosome" type="hidden" value="MT"/>\n+      <param name="gene_sources" type="hidden" value="IG_C_gene,IG_D_gene,IG_J_gene,IG_V_gene,processed_transcript,protein_coding"/>\n+      <param name="ig_gene_sources" type="hidden" value="IG_C_gene,IG_D_gene,IG_J_gene,IG_V_gene,IG_pseudogene"/>\n+      <param name="rrna_gene_sources" type="hidden" value="Mt_rRNA,rRNA,rRNA_pseudogene"/>\n+    </when>\n+    <when value="NCBI36">\n+      <param name="ensembl_organism" type="hidden" value="homo_sapiens"/>\n+      <param name="ensembl_prefix" type="hidden" value="Homo_sapiens"/>\n+      <param name="ensembl_genome_version" type="hidden" value="NCBI36"/>\n+      <param name="ensembl_version" type="hidden" value="54"/>\n+      <param 
name="ncbi_organism" type="hidden" value="Homo_sapiens"/>\n+    '..b'igfile name="defuse_script">#slurp\n+#!/bin/bash\n+## define some things for cheetah proccessing\n+#set $amp = chr(38)\n+#set $gt = chr(62)\n+## substitute pathnames into config file\n+if `grep __DATASET_DIRECTORY__ $defuse_config ${gt} /dev/null`;then sed -i\'.tmp\' "s#__DATASET_DIRECTORY__#\\$1#" $defuse_config; fi\n+if `grep __DEFUSE_PATH__ $defuse_config ${gt} /dev/null`;then sed -i\'.tmp\' "s#__DEFUSE_PATH__#\\${DEFUSE_PATH}#" $defuse_config; fi\n+if `grep __SAMTOOLS_BIN__ $defuse_config ${gt} /dev/null` ${amp}${amp} SAMTOOLS_BIN=`which samtools`;then sed -i\'.tmp\' "s#__SAMTOOLS_BIN__#\\${SAMTOOLS_BIN}#" $defuse_config; fi\n+if `grep __BOWTIE_BIN__ $defuse_config ${gt} /dev/null` ${amp}${amp} BOWTIE_BIN=`which bowtie`;then sed -i\'.tmp\' "s#__BOWTIE_BIN__#\\${BOWTIE_BIN}#" $defuse_config; fi\n+if `grep __BOWTIE_BUILD_BIN__ $defuse_config ${gt} /dev/null` ${amp}${amp} BOWTIE_BUILD_BIN=`which bowtie-build`;then sed -i\'.tmp\' "s#__BOWTIE_BUILD_BIN__#\\${BOWTIE_BUILD_BIN}#" $defuse_config; fi\n+if `grep __BLAT_BIN__ $defuse_config ${gt} /dev/null` ${amp}${amp} BLAT_BIN=`which blat`;then sed -i\'.tmp\' "s#__BLAT_BIN__#\\${BLAT_BIN}#" $defuse_config; fi\n+if `grep __FATOTWOBIT_BIN__ $defuse_config ${gt} /dev/null` ${amp}${amp} FATOTWOBIT_BIN=`which faToTwoBit`;then sed -i\'.tmp\' "s#__FATOTWOBIT_BIN__#\\${FATOTWOBIT_BIN}#" $defuse_config; fi\n+if `grep __GMAP_BIN__ $defuse_config ${gt} /dev/null` ${amp}${amp} GMAP_BIN=`which gmap`;then sed -i\'.tmp\' "s#__GMAP_BIN__#\\${GMAP_BIN}#" $defuse_config; fi\n+if `grep __GMAP_SETUP_BIN__ $defuse_config ${gt} /dev/null` ${amp}${amp} GMAP_SETUP_BIN=`which gmap_setup`;then sed -i\'.tmp\' "s#__GMAP_SETUP_BIN__#\\${GMAP_SETUP_BIN}#" $defuse_config; fi\n+if `grep __GMAP_INDEX_DIR__ $defuse_config ${gt} /dev/null` ${amp}${amp} GMAP_INDEX_DIR=`pwd`/gmap;then sed -i\'.tmp\' "s#__GMAP_INDEX_DIR__#\\${GMAP_INDEX_DIR}#" $defuse_config; fi\n+if `grep __R_BIN__ 
$defuse_config ${gt} /dev/null` ${amp}${amp} R_BIN=`which R`;then sed -i\'.tmp\' "s#__R_BIN__#\\${R_BIN}#" $defuse_config; fi\n+if `grep __RSCRIPT_BIN__ $defuse_config ${gt} /dev/null` ${amp}${amp} RSCRIPT_BIN=`which Rscript`;then sed -i\'.tmp\' "s#__RSCRIPT_BIN__#\\${RSCRIPT_BIN}#" $defuse_config; fi\n+## copy config to output\n+cp $defuse_config \\$1/defuse_config.txt\n+## Run the create_reference_dataset.pl\n+perl \\${DEFUSE_PATH}/scripts/create_reference_dataset.pl -c $defuse_config \n+  </configfile>\n+ </configfiles>\n+\n+ <tests>\n+ </tests>\n+ <help>\n+**DeFuse**\n+\n+DeFuse_ is a software package for gene fusion discovery using RNA-Seq data. The software uses clusters of discordant paired end alignments to inform a split read alignment analysis for finding fusion boundaries. The software also employs a number of heuristic filters in an attempt to reduce the number of false positives and produces a fully annotated output for each predicted fusion.  See the DeFuse_Version_0.6_ manual for details.\n+\n+DeFuse uses a Reference Dataset to search for gene fusions.  The Reference Dataset is generated from the following sources in DeFuse_Version_0.6_:\n+    - genome_fasta from Ensembl\n+    - gene_models from Ensembl\n+    - repeats_filename from UCSC RepeatMasker rmsk.txt\n+    - est_fasta from UCSC\n+    - est_alignments from UCSC intronEst.txt\n+    - unigene_fasta from NCBI\n+\n+The create_defuse_reference Galaxy tool downloads the reference genome and other source files, and builds any derivative files including bowtie indices, gmap indices, and 2bit files. Expect this step to take at least 12 hours.\n+\n+\n+It will generate the refernce data for deFuse Galaxy tool.  \n+\n+Journal reference: http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1001138\n+\n+.. _DeFuse: http://sourceforge.net/apps/mediawiki/defuse/index.php?title=Main_Page\n+\n+.. 
_DeFuse_Version_0.6: http://sourceforge.net/apps/mediawiki/defuse/index.php?title=DeFuse_Version_0.6.1\n+\n+------\n+\n+**Outputs**\n+\n+The galaxy history will contain: the config.txt file that provides DeFuse with the reference data paths.  \n+\n+ </help>\n+</tool>\n'
diff -r f65857c1b92e -r b22f8634ff84 datatypes_conf.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/datatypes_conf.xml Sun Jan 17 14:11:06 2016 -0500
@@ -0,0 +1,7 @@
+<?xml version="1.0"?>
+<datatypes>
+    <registration>
+        <datatype extension="defuse.conf" type="galaxy.datatypes.data:Text" subclass="True" display_in_upload="true"/>
+        <datatype extension="defuse.results.tsv" type="galaxy.datatypes.tabular:Tabular" subclass="True" display_in_upload="true"/>
+    </registration>
+</datatypes>
diff -r f65857c1b92e -r b22f8634ff84 defuse.xml
--- a/defuse.xml Mon Jan 14 12:24:28 2013 -0600
+++ b/defuse.xml Sun Jan 17 14:11:06 2016 -0500
b'@@ -1,103 +1,150 @@\n-<tool id="defuse" name="DeFuse" version="1.6">\n- <description>identify fusion transcripts</description>\n- <requirements>\n-  <requirement type="package" version="0.6.0">defuse</requirement>\n-  <requirement type="package" version="0.1.18">samtools</requirement>\n-  <requirement type="package" version="0.12.7">bowtie</requirement>\n-  <requirement type="package" version="2012-07-20">gmap</requirement>\n-  <requirement type="package" version="34x10">blat</requirement>\n-  <requirement type="package" version="34x10">fatotwobit</requirement>\n- </requirements>\n+<tool id="defuse" name="DeFuse" version="@DEFUSE_VERSION@.1">\n+    <description>identify fusion transcripts</description>\n+    <macros>\n+        <import>macros.xml</import>\n+    </macros>\n+    <requirements>\n+        <expand macro="defuse_requirement" />\n+        <expand macro="mapping_requirements" />\n+        <expand macro="r_requirements" />\n+    </requirements>\n   <command interpreter="command"> /bin/bash $shscript </command>\n  <inputs>\n   <param name="left_pairendreads" type="data" format="fastq" label="left part of read pairs" help="The left and right reads pairs must be in the same order, and not have any unpaired reads.  (FASTQ interlacer will pair reads and remove the unpaired.   
FASTQ de-interlacer will separate the result into left and right reads.)"/>\n   <param name="right_pairendreads" type="data" format="fastq" label="right part of read pairs" help="In the same order as the left reads"/>\n+  <param name="library_name" type="text" value="unknown" label="library name" help="Value to put in the results library_name column">\n+    <validator type="length" min="1"/>\n+  </param>\n   <conditional name="refGenomeSource">\n-      <param name="genomeSource" type="select" label="Will you select a built-in DeFuse Reference Dataset, or supply a configuration from your history" help="">\n-        <option value="indexed">Use a built-in DeFuse Reference Dataset</option>\n-        <option value="history">Use a configuration from your history that specifies the DeFuse Reference Dataset</option>\n+    <param name="genomeSource" type="select" label="Will you select a built-in DeFuse Reference Dataset, or supply a configuration from your history" help="">\n+      <option value="indexed">Use a built-in DeFuse Reference Dataset</option>\n+      <option value="history">Use a configuration from your history that specifies the DeFuse Reference Dataset</option>\n+    </param>\n+    <when value="indexed">\n+      <param name="index" type="select" label="Select a Reference Dataset" help="if your genome of interest is not listed - contact Galaxy team">\n+        <options from_file="defuse_reference.loc">\n+          <column name="name" index="1"/>\n+          <column name="value" index="3"/>\n+          <filter type="sort_by" column="0" />\n+          <validator type="no_options" message="No indexes are available" />\n+        </options>\n+      </param>\n+    </when>\n+    <when value="history">\n+      <param name="config" type="data" format="defuse.conf" label="Defuse Config file" help=""/>\n+    </when>  <!-- history -->\n+  </conditional>  <!-- refGenomeSource -->\n+  <conditional name="defuse_param">\n+    <param name="settings" type="select" label="Defuse 
parameter settings" help="">\n+      <option value="preSet">Default settings</option>\n+      <option value="full">Full parameter list</option>\n+    </param>\n+    <when value="preSet" />\n+    <when value="full">\n+      <param name="max_insert_size" type="integer" value="500" optional="true" label="Bowtie max_insert_size" />\n+      <param name="dna_concordant_length" type="integer" value="2000" optional="true" label="Minimum gene fusion range dna_concordant_length" />\n+      <param name="discord_read_trim" type="integer" value="50" optional="true" label="Trim length for discordant reads discord_read_trim" help="(split reads are not trimmed)" />\n+      <param name="calculate_extra_annotations" type="select" label="Calculate extra annotations, fus'..b'/defuse.pl -c $defuse_config -d data_dir -o output_dir  -p 8\n+perl \\${DEFUSE_PATH}/scripts/defuse.pl -name "$library_name" -c $defuse_config -1 data_dir/reads_1.fastq -2 data_dir/reads_2.fastq -o output_dir  -p \\$GALAXY_SLOTS\n ## copy primary results to output datasets\n if [ -e output_dir/log/defuse.log ]; then cp output_dir/log/defuse.log $defuse_log; fi\n-if [ -e output_dir/results.tsv ]; then cp output_dir/results.tsv $results_tsv; fi\n+## if [ -e output_dir/results.tsv ]; then cp output_dir/results.tsv $results_tsv; fi\n if [ -e output_dir/results.filtered.tsv ]; then cp output_dir/results.filtered.tsv $results_filtered_tsv; fi\n if [ -e output_dir/results.classify.tsv ]; then cp output_dir/results.classify.tsv $results_classify_tsv; fi\n+#if $breakpoints_bam:\n+if [ -e output_dir/results.filtered.tsv ] ${amp}${amp}  [ -e output_dir/breakpoints.genome.psl ]\n+then\n+  awk "\\\\$10 ~ /^(`awk \'\\\\$1 ~ /[0-9]+/{print \\\\$1}\' output_dir/results.filtered.tsv | tr \'\\n\' \'|\'`)\\\\$/{print \\\\$0}" output_dir/breakpoints.genome.psl > breakpoints.genome.filtered.psl ${amp}${amp}\n+  psl2sam.pl breakpoints.genome.filtered.psl > breakpoints.genome.filtered.sam ${amp}${amp}\n+  samtools view -b -T 
/panfs/roc/rissdb/galaxy/genomes/NCBIM37/defuse/defuse.reference.fa -o breakpoints.genome.filtered.bam breakpoints.genome.filtered.sam ${amp}${amp}\n+  samtools sort breakpoints.genome.filtered.bam breakpoints ${amp}${amp}\n+  ## samtools index breakpoints.bam\n+  cp breakpoints.bam $fusions_bam\n+fi\n+#end if\n ## create html with links for output_dir\n #if $defuse_out.__str__ != \'None\':\n if [ -e $defuse_out ]\n then\n   echo \'${lt}html${gt}${lt}head${gt}${lt}title${gt}Defuse Output${lt}/title${gt}${lt}/head${gt}${lt}body${gt}\' ${gt} $defuse_out\n   echo \'${lt}h2${gt}Defuse Output Files${lt}/h2${gt}${lt}ul${gt}\' ${gt}${gt}  $defuse_out\n-  pushd $defuse_out.extra_files_path\n+  pushd $defuse_out.dataset.extra_files_path\n   for f in `find -L . -maxdepth 1 -type f`; \n    do fn=`basename ${ds}f`; echo \'${lt}li${gt}${lt}a href="\'${ds}fn\'"${gt}\'${ds}fn\'${lt}/a${gt}${lt}/li${gt}\' ${gt}${gt}  $defuse_out; \n   done\n@@ -623,8 +662,8 @@\n #if $fusion_reads.__str__ != \'None\':\n if [ -e output_dir/results.filtered.tsv -a -e $fusion_reads ] \n then\n-  mkdir -p $fusion_reads.extra_files_path\n-  results2html output_dir/results.filtered.tsv $fusion_reads $fusion_reads.extra_files_path\n+  mkdir -p $fusion_reads.dataset.extra_files_path\n+  results2html output_dir/results.filtered.tsv $fusion_reads $fusion_reads.dataset.extra_files_path\n fi\n #end if\n   </configfile>\n@@ -753,4 +792,5 @@\n   
3596\tTGGGGGTTGAGGCTTCTGTTCCCAGGTTCCATGACCTCAGAGGTGGCTGGTGAGGTTATGACCTTTGCCCTCCAGCCCTGGCTTAAAACCTCAGCCCTAGGACCTGGTTAAAGGAAGGGGAGATGGAGCTTTGCCCCGACCCCCCCCCGTTCCCCTCACCTGTCAGCCCGAGCTGGGCCAGGGCCCCTAGGTGGGGAACTGGGCCGGGGGGCGGGCACAAGCGGAGGTGGTGCCCCCAAAAGGGCTCCCGGTGGGGTCTTGCTGAGAAGGTGAGGGGTTCCCGGGGCCGCAGCAGGTGGTGGTGGAGGAGCCAAGCGGCTGTAGAGCAAGGGGTGAGCAGGTTCCAGACCGTAGAGGCGGGCAGCGGCCACGGCCCCGGGTCCAGTTAGCTCCTCACCCGCCTCATAGAAGCGGGGTGGCCTTGCCAGGCGTGGGGGTGCTGCC|TTCCTTGGATGTGGTAGCCGTTTCTCAGGCTCCCTCTCCGGAATCGAACCCTGATTCCCCGTCACCCGTGGTCACCATGGTAGGCACGGCGACTACCATCGAAAGTTGATAGGGCAGACGTTCGAATGGGTCGTCGCCGCCACGGGGGGCGTGCGATCAGCCCGAGGTTATCTAGAGTCACCAAAGCCGCCGGCGCCCGCCCCCCGGCCGGGGCCGGAGAGGGGCTGACCGGGTTGGTTTTGATCTGATAAATGCACGCATCCCCCCCGCGAAGGGGGTCAGCGCCCGTCGGCATGTATTAGCTCTAGAATTACCACAGTTATCCAAGTAGGAGAGGAGCGAGCGACCAAAGGAACCATAACTGATTTAATGAGCCATTCGCAGTTTCACTGTACCGGCCGTGCGTACTTAGACATGCATGGCTTAATCTTTGAGACAAGCATATGCTACTGGCAGG\t250\t7.00711162298275e-72\t0.00912124762512338\t0.00684237452309549\tN\tN\t3.31745197152461\t3.47233119514066\t3.31745197152461\tsplitr\t7\t0.0157657657657656\t0\t0\tN\t0.0135135135135136\tN\tN\t0\t0\tENSG00000156860\tENSG00000212932\t-\t+\t16\t21\t30682131\t48111157\tcoding\tupstream\tFBRS\tRPL23AP4\t30670289\t48110676\t+\t+\t0.0157657657657656\t30680678\t9827473\t-\t+\tY\t-\t-\tN\toutput_dir\t2\t1\t1.11111111111111\t1\t1\t1\tN\tN\t0\t1\t9\t0.325530693397641\t0.296465452915709\t0.325530693397641\t0.296465452915709\t2\t-\t-\t\n \n  </help>\n+    <expand macro="citations"/>\n </tool>\n'
diff -r f65857c1b92e -r b22f8634ff84 defuse_bamfastq.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/defuse_bamfastq.xml Sun Jan 17 14:11:06 2016 -0500
@@ -0,0 +1,63 @@
+<?xml version="1.0"?>
+<tool id="defuse_bamfastq" name="Defuse BamFastq" version="@DEFUSE_VERSION@.1">
+  <description>converts a bam file to fastq files.</description>
+    <macros>
+        <import>macros.xml</import>
+    </macros>
+    <requirements>
+        <expand macro="defuse_requirement" />
+    </requirements>
+  <command>bamfastq
+    #if $pair == True :
+      $pair
+    #end if
+    #if $multiple == True :
+      $multiple
+    #end if
+    #if $rename == True :
+      $rename
+    #end if
+    -b $bamfile
+    -1 $fastq1
+    -2 $fastq2
+  </command>
+  <inputs>
+    <param name="bamfile" type="data" format="bam" label="Bam file"/> 
+    <param name="pair" type="boolean" truevalue="-p" falsevalue="" checked="true" label="Name contains pair info as /1 /2."/>
+    <param name="multiple" type="boolean" truevalue="-m" falsevalue="" checked="true" label="Bam contains multiple mappings per read."/>
+    <param name="rename" type="boolean" truevalue="-r" falsevalue="" checked="true" label="Rename with integer IDs."/>
+  </inputs>
+  <stdio>
+    <exit_code range="1:" level="fatal" description="Error" />
+  </stdio>
+  <outputs>
+    <data format="fastqsanger" name="fastq1" label="fastq1"  />
+    <data format="fastqsanger" name="fastq2" label="fastq2"  />
+  </outputs>
+  <tests>
+    <test>
+      <param name="bamfile" ftype="bam" value="tophat_out2h.bam" />
+      <param name="pair" value="True" />
+      <param name="multiple" value="True" />
+      <param name="rename" value="True" />
+      <output name="fastq1">
+        <assert_contents>
+          <has_text text="@test_mRNA_36_146_27/1" />
+          <not_has_text text="@test_mRNA_36_146_27/2" />
+          <not_has_text text="test_mRNA_150_290_0" />
+        </assert_contents>
+      </output>
+      <output name="fastq2">
+        <assert_contents>
+          <has_text text="@test_mRNA_36_146_27/2" />
+          <not_has_text text="@test_mRNA_36_146_27/1" />
+          <not_has_text text="test_mRNA_150_290_0" />
+        </assert_contents>
+      </output>
+    </test>
+  </tests>
+  <help>
+    bamfastq converts a bam file input into a pair of fastq files that can be used as input to deFuse.
+  </help>
+    <expand macro="citations"/>
+</tool>
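As a rough illustration of what the `<command>` template in defuse_bamfastq.xml renders to, the Python sketch below assembles the same argument list. The function name `build_bamfastq_command` and the example file names are hypothetical; only the `-p`, `-m`, `-r`, `-b`, `-1` and `-2` flags are taken from the tool definition.

```python
def build_bamfastq_command(bamfile, fastq1, fastq2,
                           pair=True, multiple=True, rename=True):
    """Return the bamfastq argument list the template is expected to render.

    Hypothetical helper: it mirrors the three boolean params, each of which
    contributes its truevalue only when checked.
    """
    cmd = ["bamfastq"]
    if pair:
        cmd.append("-p")   # read names carry pair info as /1 /2
    if multiple:
        cmd.append("-m")   # BAM contains multiple mappings per read
    if rename:
        cmd.append("-r")   # rename reads with integer IDs
    cmd += ["-b", bamfile, "-1", fastq1, "-2", fastq2]
    return cmd


# Example file names are placeholders, not files from the repository.
print(" ".join(build_bamfastq_command("in.bam", "reads_1.fastq", "reads_2.fastq")))
```

Because each boolean already defines `truevalue`/`falsevalue`, an unchecked box simply contributes nothing to the command line.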
diff -r f65857c1b92e -r b22f8634ff84 defuse_results_to_vcf.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/defuse_results_to_vcf.py Sun Jan 17 14:11:06 2016 -0500
[
b'@@ -0,0 +1,273 @@\n+#!/usr/bin/env python\n+"""\n+#\n+#------------------------------------------------------------------------------\n+#                         University of Minnesota\n+#         Copyright 2012, Regents of the University of Minnesota\n+#------------------------------------------------------------------------------\n+# Author:\n+#\n+#  James E Johnson\n+#  Jesse Erdmann\n+#\n+#------------------------------------------------------------------------------\n+"""\n+\n+\n+"""\n+This tool takes the defuse results.tsv  tab-delimited file as input and creates a Variant Call Format file as output.\n+"""\n+\n+import sys,re,os.path\n+import optparse\n+from optparse import OptionParser\n+\n+"""\n+http://www.1000genomes.org/wiki/analysis/variant-call-format/vcf-variant-call-format-version-42\n+\n+5. INFO keys used for structural variants\n+When the INFO keys reserved for encoding structural variants are used for imprecise variants, the values should be best estimates. When a key reflects a property of a single alt allele (e.g. SVLEN), then when there are multiple alt alleles there will be multiple values for the key corresponding to each alelle (e.g. SVLEN=-100,-110 for a deletion with two distinct alt alleles).\n+The following INFO keys are reserved for encoding structural variants. In general, when these keys are used by imprecise variants, the values should be best estimates. When a key reflects a property of a single alt allele (e.g. SVLEN), then when there are multiple alt alleles there will be multiple values for the key corresponding to each alelle (e.g. 
SVLEN=-100,-110 for a deletion with two distinct alt alleles).\n+##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise structural variation">\n+##INFO=<ID=NOVEL,Number=0,Type=Flag,Description="Indicates a novel structural variation">\n+##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">\n+For precise variants, END is POS + length of REF allele - 1, and the for imprecise variants the corresponding best estimate.\n+##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">\n+Value should be one of DEL, INS, DUP, INV, CNV, BND. This key can be derived from the REF/ALT fields but is useful for filtering.\n+##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">\n+One value for each ALT allele. Longer ALT alleles (e.g. insertions) have positive values, shorter ALT alleles (e.g. deletions) have negative values.\n+##INFO=<ID=CIPOS,Number=2,Type=Integer,Description="Confidence interval around POS for imprecise variants">\n+##INFO=<ID=CIEND,Number=2,Type=Integer,Description="Confidence interval around END for imprecise variants">\n+##INFO=<ID=HOMLEN,Number=.,Type=Integer,Description="Length of base pair identical micro-homology at event breakpoints">\n+##INFO=<ID=HOMSEQ,Number=.,Type=String,Description="Sequence of base pair identical micro-homology at event breakpoints">\n+##INFO=<ID=BKPTID,Number=.,Type=String,Description="ID of the assembled alternate allele in the assembly file">\n+For precise variants, the consensus sequence the alternate allele assembly is derivable from the REF and ALT fields. 
However, the alternate allele assembly file may contain additional information about the characteristics of the alt allele contigs.\n+##INFO=<ID=MEINFO,Number=4,Type=String,Description="Mobile element info of the form NAME,START,END,POLARITY">\n+##INFO=<ID=METRANS,Number=4,Type=String,Description="Mobile element transduction info of the form CHR,START,END,POLARITY">\n+##INFO=<ID=DGVID,Number=1,Type=String,Description="ID of this element in Database of Genomic Variation">\n+##INFO=<ID=DBVARID,Number=1,Type=String,Description="ID of this element in DBVAR">\n+##INFO=<ID=DBRIPID,Number=1,Type=String,Description="ID of this element in DBRIP">\n+##INFO=<ID=MATEID,Number=.,Type=String,Description="ID of mate breakends">\n+##INFO=<ID=PARID,Number=1,Type=String,Description="ID of partner breakend">\n+##INFO'..b'enomic_strand2\')]\n+      gene1 = fields[columns.index(\'gene1\')]\n+      gene2 = fields[columns.index(\'gene2\')]\n+      gene_info = \'GENEID=%s,%s\' % (gene1,gene2)\n+      gene_name1 = fields[columns.index(\'gene_name1\')]\n+      gene_name2 = fields[columns.index(\'gene_name2\')]\n+      gene_name_info = \'GENE=%s,%s\' % (gene_name1,gene_name2)\n+      gene_location1 = fields[columns.index(\'gene_location1\')]\n+      gene_location2 = fields[columns.index(\'gene_location2\')]\n+      gene_loc = \'GENELOC=%s,%s\' % (gene_location1,gene_location2)\n+      expression1 = int(fields[columns.index(\'expression1\')])\n+      expression2 = int(fields[columns.index(\'expression2\')])\n+      expr = \'EXPR=%d,%d\' % (expression1,expression2)\n+      genomic_break_pos1 = int(fields[columns.index(\'genomic_break_pos1\')])\n+      genomic_break_pos2 = int(fields[columns.index(\'genomic_break_pos2\')])\n+      breakpoint_homology = int(fields[columns.index(\'breakpoint_homology\')])\n+      homlen = \'HOMLEN=%s\' % breakpoint_homology\n+      orf = fields[columns.index(\'orf\')] == \'Y\'\n+      exonboundaries = fields[columns.index(\'exonboundaries\')] == \'Y\'\n+      
read_through = fields[columns.index(\'read_through\')] == \'Y\'\n+      interchromosomal = fields[columns.index(\'interchromosomal\')] == \'Y\'\n+      adjacent = fields[columns.index(\'adjacent\')] == \'Y\'\n+      altsplice = fields[columns.index(\'altsplice\')] == \'Y\'\n+      deletion = fields[columns.index(\'deletion\')] == \'Y\'\n+      eversion = fields[columns.index(\'eversion\')] == \'Y\'\n+      inversion = fields[columns.index(\'inversion\')] == \'Y\'\n+      span_count = int(fields[columns.index(\'span_count\')])\n+      splitr_count = int(fields[columns.index(\'splitr_count\')])\n+      splice_score = int(fields[columns.index(\'splice_score\')])\n+      probability = fields[columns.index(\'probability\')] if columns.index(\'probability\') else \'.\'\n+      splitr_sequence = fields[columns.index(\'splitr_sequence\')]\n+      split_seqs = splitr_sequence.split(\'|\')\n+      mate_id1 = "bnd_%s_1" % cluster_id\n+      mate_id2 = "bnd_%s_2" % cluster_id\n+      ref1 = split_seqs[0][-1]\n+      ref2 = split_seqs[1][0]\n+      b1 = \'[\' if genomic_strand1 == \'+\' else \']\'\n+      b2 = \'[\' if genomic_strand2 == \'+\' else \']\'\n+      alt1 = "%s%s%s:%d%s" %  (ref1,b2,gene_chromosome2,genomic_break_pos2,b2) \n+      alt2 = "%s%s:%d%s%s" %  (b1,gene_chromosome1,genomic_break_pos1,b1,ref2) \n+      #TODO evaluate what should be included in the INFO field\n+      info = [\'DP=%d\' % (span_count + splitr_count),\'SPLITCNT=%d\' % splitr_count,\'SPANCNT=%d\' % span_count,gene_name_info,gene_info,gene_loc,expr,homlen,\'SPLICESCORE=%d\' % splice_score]\n+      if orf:\n+        info.append(\'ORF\')\n+      if exonboundaries:\n+        info.append(\'EXONBND\')\n+      if interchromosomal:\n+        info.append(\'INTERCHROM\')\n+      if read_through:\n+        info.append(\'READTHROUGH\')\n+      if adjacent:\n+        info.append(\'ADJACENT\')\n+      if altsplice:\n+        info.append(\'ALTSPLICE\')\n+      if deletion:\n+        
info.append(\'DELETION\')\n+      if eversion:\n+        info.append(\'EVERSION\')\n+      if inversion:\n+        info.append(\'INVERSION\')\n+      info1 = [svtype,\'MATEID=%s;MATELOC=%s:%d\' % (mate_id2,gene_chromosome2,genomic_break_pos2)] + info\n+      info2 = [svtype,\'MATEID=%s;MATELOC=%s:%d\' % (mate_id1,gene_chromosome1,genomic_break_pos1)] + info\n+      qual = int(float(fields[columns.index(\'probability\')]) * 255) if columns.index(\'probability\') else \'.\'\n+      vcf1 = \'%s\\t%d\\t%s\\t%s\\t%s\\t%s\\t%s\\t%s\'% (gene_chromosome1,genomic_break_pos1, mate_id1, ref1, alt1, qual, filt, \';\'.join(info1) )\n+      vcf2 = \'%s\\t%d\\t%s\\t%s\\t%s\\t%s\\t%s\\t%s\'% (gene_chromosome2,genomic_break_pos2, mate_id2, ref2, alt2, qual, filt, \';\'.join(info2) )\n+      add_vcf_line(gene_chromosome1,genomic_break_pos1,mate_id1,vcf1)\n+      add_vcf_line(gene_chromosome2,genomic_break_pos2,mate_id2,vcf2)\n+    write_vcf()\n+  except Exception, e:\n+    print >> sys.stderr, "failed: %s" % e\n+    sys.exit(1)\n+\n+if __name__ == "__main__" : __main__()\n+\n'
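The breakend REF/ALT construction visible in defuse_results_to_vcf.py can be isolated as a small function. The sketch below is a Python 3 restatement of those few lines only; the function name `breakend_alts` is hypothetical, the split sequence is shortened for readability, and nothing here is meant as a statement about VCF conformance.

```python
def breakend_alts(splitr_sequence, chrom1, pos1, strand1, chrom2, pos2, strand2):
    """Return ((ref1, alt1), (ref2, alt2)) for the two mate breakends.

    Mirrors the expressions in the script: the REF base is the base adjacent
    to the '|' breakpoint marker, and the bracket type follows the strand.
    """
    left, right = splitr_sequence.split("|")
    ref1 = left[-1]     # last base before the breakpoint
    ref2 = right[0]     # first base after the breakpoint
    b1 = "[" if strand1 == "+" else "]"
    b2 = "[" if strand2 == "+" else "]"
    alt1 = "%s%s%s:%d%s" % (ref1, b2, chrom2, pos2, b2)
    alt2 = "%s%s:%d%s%s" % (b1, chrom1, pos1, b1, ref2)
    return (ref1, alt1), (ref2, alt2)


# Positions and strands from cluster 8647 in test-data/mm10_results.filtered.tsv;
# the sequence is truncated to a few bases on either side of the breakpoint.
rec1, rec2 = breakend_alts("ACATTAGGT|GAAGATTG", "3", 144201813, "-", "5", 149198645, "-")
print(rec1)  # ('T', 'T]5:149198645]')
print(rec2)  # ('G', ']3:144201813]G')
```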
diff -r f65857c1b92e -r b22f8634ff84 defuse_results_to_vcf.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/defuse_results_to_vcf.xml Sun Jan 17 14:11:06 2016 -0500
@@ -0,0 +1,34 @@
+<?xml version="1.0"?>
+<tool id="defuse_results_to_vcf" name="Defuse Results to VCF" version="0.6.1">
+  <description>generate a VCF from a DeFuse Results file</description>
+  <requirements>
+    <requirement type="package" version="0.6.1">defuse</requirement>
+  </requirements>
+  <command interpreter="python">defuse_results_to_vcf.py  --input $defuse_results --reference ${defuse_results.metadata.dbkey} --output $vcf
+  </command>
+  <inputs>
+    <param name="defuse_results" type="data" format="defuse.results.tsv" label="Defuse Results file"/> 
+  </inputs>
+  <stdio>
+    <exit_code range="1:" level="fatal" description="Error" />
+  </stdio>
+  <outputs>
+    <data name="vcf" metadata_source="defuse_results" format="vcf"/>
+  </outputs>
+  <tests>
+    <test>
+      <param name="defuse_results" value="mm10_results.filtered.tsv" ftype="defuse.results.tsv" dbkey="mm10"/>
+      <output name="vcf" file="mm10_results.filtered.vcf"/>
+    </test>
+  </tests>
+  <help>
+**Defuse Results to VCF**
+
+Generates a VCF_ Variant Call Format file from a DeFuse_ results.tsv file.   
+
+This program relies on the header line of the results.tsv to determine which columns to use for generating the VCF file.
+
+.. _VCF: http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41
+.. _DeFuse: http://sourceforge.net/apps/mediawiki/defuse
+  </help>
+</tool>
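The help text says the converter relies on the header line to locate columns. One way to express that idea, as an alternative to the `columns.index(...)` lookups in the script, is to key every row by its header name. This is a minimal Python 3 sketch; `read_results` and `optional_field` are hypothetical helpers, and the two-column table is made-up sample input rather than the real test data.

```python
import csv
import io


def read_results(handle):
    """Yield one dict per fusion row, keyed by the header line's column names."""
    for row in csv.DictReader(handle, delimiter="\t"):
        yield row


def optional_field(row, name, default="."):
    """Return row[name], or the VCF missing-value marker if the column is absent."""
    value = row.get(name)
    return value if value not in (None, "") else default


# Tiny in-memory example; the real results.tsv has many more columns.
sample = io.StringIO("cluster_id\tgene_name1\n8647\tLmo4\n")
for row in read_results(sample):
    print(row["cluster_id"], optional_field(row, "probability"))
```

Looking a column up by name with a default avoids the `ValueError` that `list.index` raises when an optional column such as `probability` is missing.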
diff -r f65857c1b92e -r b22f8634ff84 defuse_trinity_analysis.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/defuse_trinity_analysis.py Sun Jan 17 14:11:06 2016 -0500
[
b'@@ -0,0 +1,466 @@\n+#!/usr/bin/env python\n+"""\n+#\n+#------------------------------------------------------------------------------\n+#                         University of Minnesota\n+#         Copyright 2014, Regents of the University of Minnesota\n+#------------------------------------------------------------------------------\n+# Author:\n+#\n+#  James E Johnson\n+#\n+#------------------------------------------------------------------------------\n+"""\n+\n+\n+"""\n+This tool takes the defuse results.tsv  tab-delimited file, trinity \n+and creates a tabular report\n+\n+Would it be possible to create 2 additional files from the deFuse-Trinity comparison program.  \n+One containing all the Trinity records matched to deFuse records (with the deFuse ID number), \n+and the other with the ORFs records matching back to the Trinity records in the first files?\n+\n+M045_Report.csv\n+"","deFuse_subset.count","deFuse.gene_name1","deFuse.gene_name2","deFuse.span_count","deFuse.probability","deFuse.gene_chromosome1","deFuse.gene_location1","deFuse.gene_chromosome2","deFuse.gene_location2","deFuse_subset.type"\n+"1",1,"Rps6","Dennd4c",7,0.814853504,"4","coding","4","coding","TIC  "\n+\n+\n+\n+OS03_Matched_Rev.csv\n+"count","gene1","gene2","breakpoint","fusion","Trinity_transcript_ID","Trinity_transcript","ID1","protein"\n+\n+"","deFuse.splitr_sequence","deFuse.gene_chromosome1","deFuse.gene_chromosome2","deFuse.gene_location1","deFuse.gene_location2","deFuse.gene_name1","deFuse.gene_name2","deFuse.span_count","deFuse.probability","word1","word2","fusion_part_1","fusion_part_2","fusion_point","fusion_point_rc","count","transcript"\n+\n+"""\n+\n+import sys,re,os.path,math\n+import textwrap\n+import optparse\n+from optparse import OptionParser\n+\n+revcompl = lambda x: \'\'.join([{\'A\':\'T\',\'C\':\'G\',\'G\':\'C\',\'T\':\'A\',\'a\':\'t\',\'c\':\'g\',\'g\':\'c\',\'t\':\'a\',\'N\':\'N\',\'n\':\'n\'}[B] for B in x][::-1])\n+\n+codon_map = {"UUU":"F", "UUC":"F", "UUA":"L", 
"UUG":"L",\n+    "UCU":"S", "UCC":"S", "UCA":"S", "UCG":"S",\n+    "UAU":"Y", "UAC":"Y", "UAA":"*", "UAG":"*",\n+    "UGU":"C", "UGC":"C", "UGA":"*", "UGG":"W",\n+    "CUU":"L", "CUC":"L", "CUA":"L", "CUG":"L",\n+    "CCU":"P", "CCC":"P", "CCA":"P", "CCG":"P",\n+    "CAU":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",\n+    "CGU":"R", "CGC":"R", "CGA":"R", "CGG":"R",\n+    "AUU":"I", "AUC":"I", "AUA":"I", "AUG":"M",\n+    "ACU":"T", "ACC":"T", "ACA":"T", "ACG":"T",\n+    "AAU":"N", "AAC":"N", "AAA":"K", "AAG":"K",\n+    "AGU":"S", "AGC":"S", "AGA":"R", "AGG":"R",\n+    "GUU":"V", "GUC":"V", "GUA":"V", "GUG":"V",\n+    "GCU":"A", "GCC":"A", "GCA":"A", "GCG":"A",\n+    "GAU":"D", "GAC":"D", "GAA":"E", "GAG":"E",\n+    "GGU":"G", "GGC":"G", "GGA":"G", "GGG":"G",}\n+\n+def translate(seq) :\n+  rna = seq.upper().replace(\'T\',\'U\')\n+  aa = []\n+  for i in range(0,len(rna) - 2, 3):\n+    codon = rna[i:i+3]\n+    aa.append(codon_map[codon] if codon in codon_map else \'X\')\n+  return \'\'.join(aa)\n+\n+def get_stop_codons(seq) :\n+  rna = seq.upper().replace(\'T\',\'U\')\n+  stop_codons = []\n+  for i in range(0,len(rna) - 2, 3):\n+    codon = rna[i:i+3]\n+    aa = codon_map[codon] if codon in codon_map else \'X\'\n+    if aa == \'*\':\n+      stop_codons.append(codon)\n+  return stop_codons\n+\n+def read_fasta(fp):\n+    name, seq = None, []\n+    for line in fp:\n+        line = line.rstrip()\n+        if line.startswith(">"):\n+            if name: yield (name, \'\'.join(seq))\n+            name, seq = line, []\n+        else:\n+            seq.append(line)\n+    if name: yield (name, \'\'.join(seq))\n+\n+\n+def test_rcomplement(seq, target):\n+  try:\n+    comp = revcompl(seq)\n+    return comp in target\n+  except:\n+    pass\n+  return False\n+\n+def test_reverse(seq,target):\n+  return options.test_reverse and seq and seq[::-1] in target\n+\n+def cmp_alphanumeric(s1,s2):\n+  if s1 == s2:\n+    return 0\n+  a1 = re.findall("\\d+|[a-zA-Z]+",s1)\n+  a2 = 
re.findall("\\d+|[a-zA-Z]+",s2)\n+  for i in range(min(len(a1),len(a2))):\n+    if a1[i] == a2[i]:\n+      continue\n+    if a1[i].isdigit() and a2[i].isdigit():\n+      return int(a1[i]) - int(a2[i]'..b'#fusion_id","cluster_id","gene1","gene2","breakpoint","fusion","Trinity_transcript_ID","Trinity_transcript","Trinity_ORF_Transcript","Trinity_ORF_ID","protein","stop_codons"])\n+    for i,fusion in enumerate(fusions):\n+      if len(fusion[\'transcripts\']) > 0:\n+        for tx_id in fusion[\'transcripts\'].keys():\n+          if tx_id in transcript_orfs:\n+            for orf_dict in transcript_orfs[tx_id]: \n+              if \'tx_seq\' not in orf_dict:\n+                print >> sys.stderr, "orf_dict %s" % orf_dict\n+              #fields = [str(fusion[\'ordinal\']),str(fusion[\'cluster_id\']),fusion[\'gene_name1\'],fusion[\'gene_name2\'],fusion[\'fwd_seq\'],fusion[\'splitr_sequence\'],tx_id, fusion[\'transcripts\'][tx_id][\'seq1\']+\'|\'+fusion[\'transcripts\'][tx_id][\'seq2\'],orf_dict[\'tx_seq\'],orf_dict[\'orf_id\'],orf_dict[\'seq\'],orf_dict[\'read_thru_pep\'],orf_dict[\'stop_codons\']]\n+              fields = [str(fusion[\'ordinal\']),str(fusion[\'cluster_id\']),fusion[\'gene_name1\'],fusion[\'gene_name2\'],fusion[\'fwd_seq\'],fusion[\'splitr_sequence\'],tx_id, fusion[\'transcripts\'][tx_id][\'seq1\']+\'|\'+fusion[\'transcripts\'][tx_id][\'seq2\'],orf_dict[\'tx_seq\'],orf_dict[\'orf_id\'],orf_dict[\'read_thru_pep\'],orf_dict[\'stop_codons\']]\n+              print >> outputMatchFile, \'\\t\'.join(fields)\n+    outputMatchFile.close()\n+  if options.transcripts and options.transcript_alignment: \n+    if outputTxFile:\n+      id_fields = [\'gene_name1\',\'alignments1\',\'gene_name2\',\'alignments2\',\'span_count\',\'probability\',\'gene_chromosome1\',\'gene_location1\',\'gene_chromosome2\',\'gene_location2\',\'fusion_type\',\'Transcript\',\'Protein\',\'flags\']\n+      fa_width = 80\n+      for i,fusion in enumerate(fusions):\n+        if 
len(fusion[\'transcripts\']) > 0:\n+          alignments1 = "%s%s%s" % (fusion[\'genomic_strand1\'], fusion[\'gene_strand1\'], fusion[\'gene_align_strand1\'])\n+          alignments2 = "%s%s%s" % (fusion[\'genomic_strand2\'], fusion[\'gene_strand2\'], fusion[\'gene_align_strand2\'])\n+          alignments = "%s%s%s %s%s%s" % (fusion[\'genomic_strand1\'], fusion[\'gene_strand1\'], fusion[\'gene_align_strand1\'], fusion[\'genomic_strand2\'], fusion[\'gene_strand2\'], fusion[\'gene_align_strand2\'])\n+          fusion_id = "%s (%s) %s" % (i + 1,alignments,\' \'.join([str(fusion[x]) for x in report_fields]))\n+          for tx_id in fusion[\'transcripts\'].keys():\n+            m1 = fusion[\'transcripts\'][tx_id][\'match1\']\n+            f_seq1 = fusion[\'split_seqs\'][0][:-m1].lower() +  fusion[\'split_seqs\'][0][-m1:]\n+            t_seq1 = fusion[\'transcripts\'][tx_id][\'seq1\'][:-m1].lower() + fusion[\'transcripts\'][tx_id][\'seq1\'][-m1:]\n+            if len(f_seq1) > len(t_seq1):\n+              t_seq1 = t_seq1.rjust(len(f_seq1),\'.\')\n+            elif len(f_seq1) < len(t_seq1):\n+              f_seq1 = f_seq1.rjust(len(t_seq1),\'.\')\n+            m2 = fusion[\'transcripts\'][tx_id][\'match2\']\n+            f_seq2 = fusion[\'split_seqs\'][1][:m2] +  fusion[\'split_seqs\'][1][m2:].lower()\n+            t_seq2 = fusion[\'transcripts\'][tx_id][\'seq2\'][:m2] + fusion[\'transcripts\'][tx_id][\'seq2\'][m2:].lower()\n+            if len(f_seq2) > len(t_seq2):\n+              t_seq2 = t_seq2.ljust(len(f_seq2),\'.\')\n+            elif len(f_seq2) < len(t_seq2):\n+              f_seq2 = f_seq2.ljust(len(t_seq2),\'.\')\n+            print >> outputTxFile, ">%s\\n%s\\n%s" % (fusion_id,\'\\n\'.join(textwrap.wrap(f_seq1,fa_width)),\'\\n\'.join(textwrap.wrap(f_seq2,fa_width)))\n+            print >> outputTxFile, "%s bkpt:%d rev_compl:%s\\n%s\\n%s" % 
(fusion[\'transcripts\'][tx_id][\'full_id\'],fusion[\'transcripts\'][tx_id][\'bkpt\'],str(fusion[\'transcripts\'][tx_id][\'revcompl\']),\'\\n\'.join(textwrap.wrap(t_seq1,fa_width)),\'\\n\'.join(textwrap.wrap(t_seq2,fa_width)))\n+  """\n+  if options.peptides and options.orf_alignment: \n+    pass\n+  """\n+  print >> outputFile,"%s\\t%s" % (\'#\',\'\\t\'.join([report_colnames[x] for x in report_fields]))\n+  for i,fusion in enumerate(fusions): \n+    print >> outputFile,"%s\\t%s" % (i + 1,\'\\t\'.join([str(fusion[x]) for x in report_fields]))\n+\n+if __name__ == "__main__" : __main__()\n+\n'
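The sequence helpers at the top of defuse_trinity_analysis.py (`revcompl`, `codon_map`, `translate`, `get_stop_codons`) are written for Python 2. The sketch below restates them for Python 3 under two assumptions that differ from the original: the codon table is generated from the standard genetic code string instead of being spelled out, and `revcompl` leaves unknown characters unchanged where the original lambda would raise a `KeyError`.

```python
from itertools import product

# Complement table covering the same characters as the original lambda.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

# Standard genetic code, codons enumerated in U, C, A, G order.
_BASES = "UCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_MAP = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def revcompl(seq):
    """Reverse-complement a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def _codons(seq):
    """Split a sequence into complete RNA codons starting at offset 0."""
    rna = seq.upper().replace("T", "U")
    return [rna[i:i + 3] for i in range(0, len(rna) - 2, 3)]


def translate(seq):
    """Translate frame 0; codons not in the table become 'X'."""
    return "".join(CODON_MAP.get(codon, "X") for codon in _codons(seq))


def get_stop_codons(seq):
    """List the stop codons encountered in frame 0, in order."""
    return [codon for codon in _codons(seq) if CODON_MAP.get(codon) == "*"]


print(revcompl("ATGC"))              # GCAT
print(translate("ATGGCCTAA"))        # MA*
print(get_stop_codons("ATGTAATGA"))  # ['UAA', 'UGA']
```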
diff -r f65857c1b92e -r b22f8634ff84 defuse_trinity_analysis.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/defuse_trinity_analysis.xml Sun Jan 17 14:11:06 2016 -0500
@@ -0,0 +1,55 @@
+<?xml version="1.0"?>
+<tool id="defuse_trinity_analysis" name="Defuse Trinity" version="0.6.1">
+  <description>verify fusions with trinity</description>
+  <stdio>
+    <exit_code range="1:" level="fatal" description="Error" />
+  </stdio>
+  <command interpreter="python">defuse_trinity_analysis.py --input $defuse_results --transcripts $trinity_transcripts --peptides $trinity_orfs 
+  --nbases $nbases --min_pep_len $min_pep_len --ticdist $ticdist --readthrough=$readthrough
+  #if 'matched' in str($outputs).split(','):
+    --matched="$matched_output"
+  #end if  
+  #if 'aligned' in str($outputs).split(','):
+    --transcript_alignment="$aligned_output"
+  #end if  
+  --output $output 
+  </command>
+  <inputs>
+    <param name="defuse_results" type="data" format="defuse.results.tsv" label="Defuse Results file"/> 
+    <param name="trinity_transcripts" type="data" format="fasta" label="TrinityRNAseq: Assembled Transcripts"/> 
+    <param name="trinity_orfs" type="data" format="fasta" label="transcriptsToOrfs: Candidate Peptide Sequences"/> 
+    <param name="nbases" type="integer" value="12" min="1" label="Number of bases on either side of the fusion to compare"/> 
+    <param name="min_pep_len" type="integer" value="100" min="0" label="Minimum length of peptide to report"/> 
+    <param name="ticdist" type="integer" value="1000000" min="0" label="Maximum intrachromosomal distance to be classified as a transcription-induced chimera (TIC)"/>
+    <param name="readthrough" type="integer" value="4" min="0" label="Number of stop_codons to read through"/> 
+    <param name="outputs" type="select" multiple="true" display="checkboxes" label="Additional outputs">
+      <option value="matched">Matched Fusions Trinity Transcripts and ORFs Tabular</option>
+      <option value="aligned">Aligned Fusion and Trinity Transcripts Fasta</option>
+    </param>
+  </inputs>
+  <outputs>
+    <data name="matched_output" metadata_source="defuse_results" format="tabular" label="${tool.name} on ${on_string}: Fusions Trinity Matched ">
+      <filter>(outputs and 'matched' in outputs)</filter>
+    </data>
+    <data name="aligned_output" metadata_source="defuse_results" format="fasta" label="${tool.name} on ${on_string}: Fusion Trinity Sequences">
+      <filter>(outputs and 'aligned' in outputs)</filter>
+    </data>
+    <data name="output" metadata_source="defuse_results" format="tabular" label="${tool.name} on ${on_string}: Fusion Report"/>
+  </outputs>
+  <tests>
+    <test>
+      <param name="defuse_results" value="mm10_results.filtered.tsv" ftype="defuse.results.tsv" dbkey="mm10"/>
+      <output name="vcf" file="mm10_results.filtered.vcf"/>
+    </test>
+  </tests>
+  <help>
+**Defuse Trinity**
+
+Verifies DeFuse_ fusion predictions in results.tsv with TrinityRNAseq_ assembled transcripts and ORFs.   
+
+This program relies on the header line of the results.tsv to determine which columns to use for analysis.   
+
+.. _DeFuse: http://sourceforge.net/apps/mediawiki/defuse
+.. _TrinityRNAseq: http://trinityrnaseq.github.io/
+  </help>
+</tool>
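The two `#if ... in str($outputs).split(',')` blocks in the command template turn the multi-select into optional flags. A minimal Python sketch of that mapping follows; `optional_output_args` and the example paths are hypothetical, and it assumes an empty selection stringifies to `None`, as it does for a Python `None`.

```python
def optional_output_args(outputs, matched_path, aligned_path):
    """Return the optional flags implied by the 'Additional outputs' selection.

    `outputs` stands in for the multi-select value: a comma-separated string,
    or None when nothing is selected.
    """
    selected = str(outputs).split(",")
    args = []
    if "matched" in selected:
        args.append('--matched="%s"' % matched_path)
    if "aligned" in selected:
        args.append('--transcript_alignment="%s"' % aligned_path)
    return args


print(optional_output_args("matched,aligned", "matched.tsv", "aligned.fa"))
print(optional_output_args(None, "matched.tsv", "aligned.fa"))  # []
```

Splitting on commas before the membership test keeps the check exact, so a value is matched as a whole option name and not as a substring.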
diff -r f65857c1b92e -r b22f8634ff84 macros.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/macros.xml Sun Jan 17 14:11:06 2016 -0500
@@ -0,0 +1,28 @@
+<macros>
+    <token name="@DEFUSE_VERSION@">0.6.2</token>
+    <xml name="defuse_requirement">
+            <requirement type="package" version="@DEFUSE_VERSION@">defuse</requirement>
+    </xml>
+    <xml name="mapping_requirements">
+            <requirement type="package" version="0.1.19">samtools</requirement>
+            <requirement type="package" version="1.0.0">bowtie</requirement>
+            <requirement type="package" version="2013-05-09">gmap</requirement>
+            <requirement type="package" version="35x1">blat</requirement>
+    </xml>
+    <xml name="r_requirements">
+            <requirement type="package" version="3.1.2">R</requirement>
+            <requirement type="package" version="2.0.3">ada</requirement>
+    </xml>
+    <xml name="stdio">
+        <stdio>
+            <exit_code range=":-1"  level="fatal" description="Error: Cannot open file" />
+            <exit_code range="1:"  level="fatal" description="Error" />
+        </stdio>
+  </xml>
+  <xml name="citations">
+      <citations>
+        <citation type="doi">10.1371/journal.pcbi.1001138</citation>
+        <yield />
+      </citations>
+  </xml>
+</macros>
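The `@DEFUSE_VERSION@` token defined in macros.xml is substituted textually wherever it appears, which is how `version="@DEFUSE_VERSION@.1"` in defuse_bamfastq.xml resolves. The sketch below shows only that token step as plain string replacement; `expand_tokens` is a hypothetical helper, and it deliberately ignores the `<expand>` and `<yield>` handling that real macro processing also performs.

```python
def expand_tokens(text, tokens):
    """Replace each @TOKEN@ name in `text` with its value (simplified sketch)."""
    for name, value in tokens.items():
        text = text.replace(name, value)
    return text


tokens = {"@DEFUSE_VERSION@": "0.6.2"}
print(expand_tokens('version="@DEFUSE_VERSION@.1"', tokens))  # version="0.6.2.1"
```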
diff -r f65857c1b92e -r b22f8634ff84 test-data/mm10_results.filtered.tsv
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/mm10_results.filtered.tsv Sun Jan 17 14:11:06 2016 -0500
b'@@ -0,0 +1,46 @@\n+cluster_id\tsplitr_sequence\tsplitr_count\tsplitr_span_pvalue\tsplitr_pos_pvalue\tsplitr_min_pvalue\tadjacent\taltsplice\tbreak_adj_entropy1\tbreak_adj_entropy2\tbreak_adj_entropy_min\tbreakpoint_homology\tbreakseqs_estislands_percident\tcdna_breakseqs_percident\tdeletion\test_breakseqs_percident\teversion\texonboundaries\texpression1\texpression2\tgene1\tgene2\tgene_align_strand1\tgene_align_strand2\tgene_chromosome1\tgene_chromosome2\tgene_end1\tgene_end2\tgene_location1\tgene_location2\tgene_name1\tgene_name2\tgene_start1\tgene_start2\tgene_strand1\tgene_strand2\tgenome_breakseqs_percident\tgenomic_break_pos1\tgenomic_break_pos2\tgenomic_strand1\tgenomic_strand2\tinterchromosomal\tinterrupted_index1\tinterrupted_index2\tinversion\tlibrary_name\tmax_map_count\tmax_repeat_proportion\tmean_map_count\tmin_map_count\tnum_multi_map\tnum_splice_variants\torf\tread_through\trepeat_proportion1\trepeat_proportion2\tspan_count\tspan_coverage1\tspan_coverage2\tspan_coverage_max\tspan_coverage_min\tsplice_score\tsplicing_index1\tsplicing_index2\tprobability\n+8647\tGTGCTCCTGCTGCCAGGCGCAGCTGGGCGACATTGGCACGTCCTGTTACACCAAGAGCGGCATGATCCTTTGCAGAAATGACTACATTAGGT|GAAGATTGTAAAAAATTGACATCAGAAATATTTACAGAAATAGATACCTGTTTGAATAAAGTTAGAGATGAAATTTTTGCTAAACTTCAACCGAAGCTTAGATGCACATTAGGTGACATGGAAAGTCCTGTGTTTGCACTTCCTG\t4\t0.849232794977309\t0.875860929877954\t0.794775556794258\tN\tN\t3.72551845106187\t3.02448896101185\t3.02448896101185\t1\t0.0366733649981732\t0\tN\t0\tN\tY\t2482\t2085\tENSMUSG00000028266\tENSMUSG00000041264\t+\t-\t3\t5\t144205220\t149215434\tcoding\tcoding\tLmo4\tUspl1\t144188530\t149184350\t-\t+\t0\t144201813\t149198645\t-\t-\tY\t-\t-\tN\tdataset_6344_files\t1\t0\t1\t1\t0\t1\tN\tN\t0\t0\t5\t0.831776131589327\t0.982288003019777\t0.982288003019777\t0.831776131589327\t4\t-\t-\t0.950339354539546\n+12095\tCTTCCAGGGTCCCCCGAGCCTAATGGATGCCGAGACAGACGAGGGCATGGACTATACAGGCTGTAGCCCTGGAGCGGCGTCCTCAGAGTCTTCCACCATGGACCGTAGCTGTTCCAGCACCC|CTGGCCCTTGACATCTAGCACCCCTTCACCCTCTT
CCTGGGGACCCAGCAGGTGGTATGTGGCCGTGGAGCCCTCCGGGCTGTGGCTGTCCTTCCCAGGAGAGGATGACGTAGACTCGTTGCTGACAGGGGAGATGTCACTGCTGC\t6\t0.869976758916331\t0.907802910133282\t0.774396849342807\tY\tY\t3.64400585547602\t3.28189243439864\t3.28189243439864\t0\t0.983667499542934\t0\tY\t0\tN\tN\t0\t1453\tENSMUSG00000086606\tENSMUSG00000028975\t-\t+\t4\t4\t148948914\t149099876\tintron\tcoding\tGm13205\tPex14\t148947492\t148960535\t-\t-\t0\t148948907\t148961548\t+\t-\tN\t-\t-\tN\tdataset_6344_files\t1\t0\t1\t1\t0\t1\tN\tY\t0\t0\t5\t0.760481034595956\t1.02189639023832\t1.02189639023832\t0.760481034595956\t4\t-\t-\t0.976704594819493\n+4068\tCGACACGCCGGGGCTGGCTGAGGAAAACAAAACGAAGCCCCTGGAGACCCGGCTTATCTCAGAGACCACAGCTATTTGCAAACCAGAGCAGGTGGCCAAACAAATTGTCAAAGATGCCATA|TACCGTTCCCCAGCTGAAGAGTTCTGAATCCACGCCGGATCCTTCTCAACAGTCTGTTTTACGGGAACTTTTATTAACCACTCCTTCCCCGTGATGCAGTTCTGAATCCTCCCTGTAGCAGGGGGTCTTCACTCATGCCTGAAGATGTTTCTTTTCC\t8\t1\t0.541521196503195\t0.26082452496772\tN\tY\t3.46128890676658\t3.81976875220098\t3.46128890676658\t2\t0\t0\tN\t0.794128973014604\tN\tN\t775\t35409\tENSMUSG00000009905\tENSMUSG00000022816\t+\t-\t1\t16\t106759742\t37836514\tcoding\tintron\tKdsr\tFstl1\t106720410\t37776873\t-\t+\t0.00358422939068104\t106734547\t37799068\t-\t-\tY\t-\t-\tN\tdataset_6344_files\t4\t1\t1.81818181818182\t1\t4\t1\tN\tN\t0\t1\t11\t0.974366325576069\t1.17240826166877\t1.17240826166877\t0.974366325576069\t4\t-\t-\t0.913877182950088\n+12868\tGCGGTCTCGGCTCCAGCGGCAGTAGCAGCGGCGCCGGTCCCGTGTGCAGGAGCTCCTTTGCGGCCCAGTTTCTTGGCCATCGCCTGCTCTCCCCACAGCGCCAGGACGAGTCCCGTGCGCGTCCGTCCGCGGAGGTCTTTCTCATCTCGCTCGGCTGCGGGAAATCGGGCTGAAGCGACTGAGTCCGCGATGGAGA|AAACTTTAGAAACTGTTCCTTTGGAGAGGAAAAAGAGAGAAAAGGAAAACTT\t69\t1\t0.439895614069599\t0.332872660216425\tN\tY\t3.30124852419771\t2.96787791690762\t2.96787791690762\t0\t0.0997837503861599\t0.00401606425702794\tN\t0.540315106580167\tN\tN\t41631\t0\tENSMUSG00000004980\tENSMUSG00000085456\t+\t+\t6\t10\t51469894\t73327027\tcoding\tintron\tHnrnpa2b1\tGm15398\t51460434\t73201399\t-\t-\t0.09978
37503861599\t51467295\t73201702\t-\t-\tY\t-\t-\tN\tdataset_6344_files\t1\t0\t1\t1\t0\t1\tN\tN\t0\t0\t41\t1.14864322933764\t0.720872647377417\t1.14864322933764\t0.720872647377417\t1\t-\t-\t0.520098627306064\n+5160\tTACGGATTCATTCAGTGTTCAGAACGGCAAGCTAGACTTTTCTTCCACTGTTCACAATATAATGGCAACCTCCAAGACTTAAAAGTAGGAGATGATGTTGAATTTGAAGTATCA'..b'201612903226\tN\tY\t23545\t2145\tENSMUSG00000073411\tENSMUSG00000035929\t+\t-\t17\t17\t35267499\t35385290\tcoding\tutr3p\tH2-D1\tH2-Q4\t35262730\t35379617\t+\t+\t0\t35266514\t35383995\t+\t-\tN\t-\t-\tN\tdataset_6344_files\t4\t0.22972972972973\t2.57142857142857\t2\t7\t1\tN\tN\t0\t0.22972972972973\t7\t0.562439098503259\t1.03773974512573\t1.03773974512573\t0.562439098503259\t4\t-\t-\t0.85501289117385\n+9158\tCCGAATTTCAACCTCCTTATCAACAGTGGGATCTTCAAAGAGTTGTACCCTGAAGTTGCTCTTTCTCAGTGGAGTCCCACACTCAGGACAGTTTCCAGCTCCTCTTACAAACAGTAAGTCCACACAACTCTCACAC|CTTCCCCACCAGCCTGGTCCGGCTGCCCACCTCTCCCCGCCCCCCACCTCGCTTCCCTACCGGGGTGGTAGGGGGGACGACGGTGGCAACGAGCGGGCGGGGGATCCTCCC\t5\t1\t0.770112168741176\t0.964592797110167\tY\tY\t3.1490579347374\t3.11232376639641\t3.11232376639641\t0\t0\t0\tY\t0\tN\tN\t1882\t677\tENSMUSG00000021103\tENSMUSG00000034460\t-\t-\t12\t12\t73273988\t73114037\tcoding\tintron\tMnat1\tSix4\t73123717\t73099609\t+\t-\t0\t73168000\t73112254\t-\t+\tN\t-\t-\tN\tdataset_6344_files\t1\t0\t1\t1\t0\t1\tN\tY\t0\t0\t5\t1.0060530353509\t0.78424606692708\t1.0060530353509\t0.78424606692708\t4\t-\t-\t0.934413835802699\n+1851\tGCTGCAGGTGGGCTTATTCTACCATTGCTACTGTTCTGTCTTTGAAGATGTTCTTTATAGCATACTGAACACATGCCATTTGTACGAGGGTTTCCATAAAATCCGCAGCCAGTGGAGCAAAGCATAGGTGCTTGGCTGTGATTAGTTTCTTGAGCCATGTTCTTCTGTTGCA|CACCTGGCGGCGGCCCTCTCCGCCGGACGCGCTGCCGCCGCCGCCTCTCGCCGCCGCTGAGAGTGAGGACAGGTGAGGCCGCCAAACCCCCACTCGCTCCCGGCCCGCCGCCGCCGGCCCTCCGTCCGC\t21\t0.826443478467522\t0.547250352165575\t0.543160934834641\tY\tY\t3.38908332958226\t2.95732075160099\t2.95732075160099\t0\t0\t0\tY\t0.992273730684327\tN\tN\t1944\t1\tENSMUSG00000030629\tENSMUSG00000085236\t-\t-\t7\t7\t84679361\t84776549\tut
r5p\tintron\tZfand6\t2610206C17Rik\t84615054\t84689640\t-\t+\t0\t84634406\t84689743\t+\t-\tN\t-\t-\tN\tdataset_6344_files\t2\t0.159420289855072\t1.04878048780488\t1\t2\t1\tN\tY\t0\t0.159420289855072\t41\t1.14864322933764\t1.09319148723169\t1.14864322933764\t1.09319148723169\t1\t-\t-\t0.715005473321084\n+9962\tCTGCGGCCCGCCGGGTCCCGGAGCCCACTGCCCCAGCACCCCGCGCTCGGCGCCCGCAGACGGCGCGGACCTCAGCGCGCACTTATGGGCTCGTTACCAGGACATGCGGAGGCTGGTGCACG|ACCTTCTGCCCCCTGAGGTCTGCAGCCTCCTAAACCCAGCAGCTATTTATGCCAACAATGAGATCAGCCTGAGTGACGTCGAAGTCTATGGCTTTGACTACGACTACACGCTGGCCCAGTATGCGGATGCACTGCACCCTGAGATCTTCAATGCTGCCCGGGACATCTTGATAGAGC\t93\t0.0971360740885943\t0.213119196903586\t0.467950871740144\tY\tY\t3.71720464963688\t3.27448372166755\t3.27448372166755\t0\t0.983661202185792\t0\tY\t0.991830601092896\tN\tN\t211\t4122\tENSMUSG00000058351\tENSMUSG00000071547\t-\t-\t14\t14\t31128930\t31139121\tupstream\tupstream\t2010107H07Rik\tNt5dc2\t31088869\t31134853\t-\t+\t0\t31131376\t31134739\t+\t-\tN\t-\t-\tN\tdataset_6344_files\t1\t0\t1\t1\t0\t1\tN\tY\t0\t0\t33\t1.1011131646754\t0.808011099258204\t1.1011131646754\t0.808011099258204\t4\t-\t-\t0.743419539119423\n+12549\tTTGGAGATGCCAGTACCATGAGATGACCACCAAGAGCAGCAGCAGCAGTGGAGTACAGGCATCTGGAGCCTAGAGGATGACACATGTGCTACAAAGGGCCTGGCTGGAGAAGTGACCCAAGCCCTTGGAGGAGCCCAGAAGATC|GTCCATCCTGATAAAAATCACCATCCCCGGGCTGAGGAGGCCTTCAAAATTTTGCGGGCAGCTTGGGACATTGTCAGCAACCCAGAGAGGCGGAAGGAATATGAGATGAAACGGAT\t18\t1\t0.96278392593137\t0.394484707065412\tN\tY\t3.2808195118354\t3.46448140203209\t3.2808195118354\t1\t0.965649359228432\t0\tN\t0.974237019421324\tN\tY\t1732\t2437\tENSMUSG00000039307\tENSMUSG00000025354\t+\t-\t11\t10\t121222655\t128819446\tutr5p\tcoding\tHexdc\tDnajc14\t121204433\t128805676\t+\t+\t0\t121206748\t128814038\t+\t-\tY\t-\t-\tN\tdataset_6344_files\t3\t0.993103448275862\t1.72222222222222\t1\t11\t1\tN\tN\t0.993103448275862\t0\t18\t1.14864322933764\t0.879306196251575\t1.14864322933764\t0.879306196251575\t4\t-\t-\t0.927135541379163\n+3179\tCTGGAGCAGTCCCCGTGACGCCGGGTGGCGACTGGC
TCCCGGGTCTGAGGGGCTTCTGCTTGTCAGGTTCT|AGATATGTGCTGACTAGCAGGCTCACGTGCACAGTGTGGAGGATAAGCTATATCTTACAAAATGGGATTTGGGAGTGACCTGAAGAACTCACAGGAAGCTGTGTTAAAGTTGCAAGACTGGGAACTACGGTTGCTGGAGACAGTGAAGAAATTTATGGCTCTGAG\t10\t1\t0.557528435226891\t0.364482194613623\tY\tY\t3.15741158895574\t3.54759612884129\t3.15741158895574\t0\t0.985974921257503\t0\tY\t0.873774291317525\tN\tN\t10\t1325\tENSMUSG00000045506\tENSMUSG00000000127\t-\t-\t17\t17\t63863791\t64139494\tupstream\tupstream\tA930002H24Rik\tFer\t63863300\t63896018\t-\t+\t0\t63864053\t63896016\t+\t-\tN\t-\t-\tN\tdataset_6344_files\t1\t0\t1\t1\t0\t1\tN\tY\t0\t0\t7\t0.499065678953596\t1.12487819700652\t1.12487819700652\t0.499065678953596\t3\t-\t-\t0.767822892174251\n'
diff -r f65857c1b92e -r b22f8634ff84 test-data/mm10_results.filtered.vcf
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/mm10_results.filtered.vcf Sun Jan 17 14:11:06 2016 -0500
b'@@ -0,0 +1,115 @@\n+##fileformat=VCFv4.1\n+##source=defuse\n+##reference=mm10\n+##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">\n+##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">\n+##INFO=<ID=MATEID,Number=1,Type=String,Description="ID of the BND mate">\n+##INFO=<ID=DP,Number=1,Type=Integer,Description="Read Depth of segment containing breakend">\n+##INFO=<ID=SPLITCNT,Number=1,Type=Integer,Description="number of split reads supporting the prediction">\n+##INFO=<ID=SPANCNT,Number=1,Type=Integer,Description="number of spanning reads supporting the fusion">\n+##INFO=<ID=HOMLEN,Number=1,Type=Integer,Description="Length of base pair identical micro-homology at event breakpoints">\n+##INFO=<ID=SPLICESCORE,Number=1,Type=Integer,Description="number of nucleotides similar to GTAG at fusion splice">\n+##INFO=<ID=GENE,Number=2,Type=String,Description="Gene Names at each breakend">\n+##INFO=<ID=GENEID,Number=2,Type=String,Description="Gene IDs at each breakend">\n+##INFO=<ID=GENELOC,Number=2,Type=String,Description="location of breakpoint releative to genes">\n+##INFO=<ID=EXPR,Number=2,Type=Integer,Description="expression of genes as number of concordant pairs aligned to exons">\n+##INFO=<ID=ORF,Number=0,Type=Flag,Description="fusion combines genes in a way that preserves a reading frame">\n+##INFO=<ID=EXONBND,Number=0,Type=Flag,Description="fusion splice at exon boundaries">\n+##INFO=<ID=INTERCHROM,Number=0,Type=Flag,Description="fusion produced by an interchromosomal translocation">\n+##INFO=<ID=READTHROUGH,Number=0,Type=Flag,Description="fusion involving adjacent potentially resulting from co-transcription rather than genome rearrangement">\n+##INFO=<ID=ADJACENT,Number=0,Type=Flag,Description="fusion between adjacent genes">\n+##INFO=<ID=ALTSPLICE,Number=0,Type=Flag,Description="fusion likely the product of alternative splicing between adjacent 
genes">\n+##INFO=<ID=DELETION,Number=0,Type=Flag,Description="fusion produced by a genomic deletion">\n+##INFO=<ID=EVERSION,Number=0,Type=Flag,Description="fusion produced by a genomic eversion">\n+##INFO=<ID=INVERSION,Number=0,Type=Flag,Description="fusion produced by a genomic inversion">\n+#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n+1\t106734547\tbnd_4068_1\tA\tA]16:37799068]\t233\tPASS\tSVTYPE=BND;MATEID=bnd_4068_2;DP=19;SPLITCNT=8;SPANCNT=11;GENE=Kdsr,Fstl1;GENEID=ENSMUSG00000009905,ENSMUSG00000022816;GENELOC=coding,intron;EXPR=775,35409;HOMLEN=2;SPLICESCORE=4;INTERCHROM;ALTSPLICE\n+1\t127753955\tbnd_3783_1\tT\tT]1:127773930]\t181\tPASS\tSVTYPE=BND;MATEID=bnd_3783_2;DP=8;SPLITCNT=2;SPANCNT=6;GENE=Acmsd,Ccnt2;GENEID=ENSMUSG00000026348,ENSMUSG00000026349;GENELOC=intron,upstream;EXPR=0,752;HOMLEN=0;SPLICESCORE=3;READTHROUGH;ADJACENT;ALTSPLICE;DELETION\n+1\t127773930\tbnd_3783_2\tT\t[1:127753955[T\t181\tPASS\tSVTYPE=BND;MATEID=bnd_3783_1;DP=8;SPLITCNT=2;SPANCNT=6;GENE=Acmsd,Ccnt2;GENEID=ENSMUSG00000026348,ENSMUSG00000026349;GENELOC=intron,upstream;EXPR=0,752;HOMLEN=0;SPLICESCORE=3;READTHROUGH;ADJACENT;ALTSPLICE;DELETION\n+1\t161036581\tbnd_5020_2\tA\t[5:79638362[A\t208\tPASS\tSVTYPE=BND;MATEID=bnd_5020_1;DP=36;SPLITCNT=27;SPANCNT=9;GENE=7SK,Gas5;GENEID=ENSMUSG00000088422,ENSMUSG00000053332;GENELOC=downstream,intron;EXPR=0,0;HOMLEN=0;SPLICESCORE=1;INTERCHROM;ALTSPLICE\n+1\t161036831\tbnd_4912_2\tG\t]5:79638354]G\t151\tPASS\tSVTYPE=BND;MATEID=bnd_4912_1;DP=20;SPLITCNT=15;SPANCNT=5;GENE=7SK,Gas5;GENEID=ENSMUSG00000088422,ENSMUSG00000053332;GENELOC=downstream,intron;EXPR=0,0;HOMLEN=0;SPLICESCORE=1;INTERCHROM;ALTSPLICE\n+1\t168121480\tbnd_3839_2\tG\t]1:168183530]G\t249\tPASS\tSVTYPE=BND;MATEID=bnd_3839_1;DP=29;SPLITCNT=13;SPANCNT=16;GENE=Pbx1,Gm20711;GENEID=ENSMUSG00000052534,ENSMUSG00000093538;GENELOC=coding,intron;EXPR=3895,0;HOMLEN=0;SPLICESCORE=4;READTHROUGH;ADJACENT;ALTSPLICE;DELETION\n+1\t168183530\tbnd_3839_1\tT\tT[1:168121480[\t249\tPASS\tSVTYPE=BND;MATEID=b
nd_3839_2;DP=29;SPLITCNT=13;SPANCNT=16;GENE=Pbx1,Gm20711;GENEID=ENSMUSG00000052534,ENSMUSG00000093538;GENELOC=coding,intron;EXPR=389'..b'NCNT=10;GENE=Pik3c3,Olfr166;GENEID=ENSMUSG00000033628,ENSMUSG00000056822;GENELOC=coding,downstream;EXPR=1791,0;HOMLEN=0;SPLICESCORE=1;INTERCHROM\n+16\t37799068\tbnd_4068_2\tT\t]1:106734547]T\t233\tPASS\tSVTYPE=BND;MATEID=bnd_4068_1;DP=19;SPLITCNT=8;SPANCNT=11;GENE=Kdsr,Fstl1;GENEID=ENSMUSG00000009905,ENSMUSG00000022816;GENELOC=coding,intron;EXPR=775,35409;HOMLEN=2;SPLICESCORE=4;INTERCHROM;ALTSPLICE\n+17\t17395298\tbnd_3153_1\tC\tC]17:17411818]\t206\tPASS\tSVTYPE=BND;MATEID=bnd_3153_2;DP=41;SPLITCNT=16;SPANCNT=25;GENE=AC154200.1,Lix1;GENEID=ENSMUSG00000097379,ENSMUSG00000047786;GENELOC=intron,intron;EXPR=0,30;HOMLEN=0;SPLICESCORE=1;READTHROUGH;ADJACENT;ALTSPLICE;DELETION\n+17\t17411818\tbnd_3153_2\tT\t[17:17395298[T\t206\tPASS\tSVTYPE=BND;MATEID=bnd_3153_1;DP=41;SPLITCNT=16;SPANCNT=25;GENE=AC154200.1,Lix1;GENEID=ENSMUSG00000097379,ENSMUSG00000047786;GENELOC=intron,intron;EXPR=0,30;HOMLEN=0;SPLICESCORE=1;READTHROUGH;ADJACENT;ALTSPLICE;DELETION\n+17\t35266514\tbnd_3184_1\tG\tG]17:35383995]\t218\tPASS\tSVTYPE=BND;MATEID=bnd_3184_2;DP=47;SPLITCNT=40;SPANCNT=7;GENE=H2-D1,H2-Q4;GENEID=ENSMUSG00000073411,ENSMUSG00000035929;GENELOC=coding,utr3p;EXPR=23545,2145;HOMLEN=0;SPLICESCORE=4;EXONBND;ALTSPLICE;DELETION\n+17\t35383995\tbnd_3184_2\tA\t[17:35266514[A\t218\tPASS\tSVTYPE=BND;MATEID=bnd_3184_1;DP=47;SPLITCNT=40;SPANCNT=7;GENE=H2-D1,H2-Q4;GENEID=ENSMUSG00000073411,ENSMUSG00000035929;GENELOC=coding,utr3p;EXPR=23545,2145;HOMLEN=0;SPLICESCORE=4;EXONBND;ALTSPLICE;DELETION\n+17\t63864053\tbnd_3179_1\tT\tT]17:63896016]\t195\tPASS\tSVTYPE=BND;MATEID=bnd_3179_2;DP=17;SPLITCNT=10;SPANCNT=7;GENE=A930002H24Rik,Fer;GENEID=ENSMUSG00000045506,ENSMUSG00000000127;GENELOC=upstream,upstream;EXPR=10,1325;HOMLEN=0;SPLICESCORE=3;READTHROUGH;ADJACENT;ALTSPLICE;DELETION\n+17\t63896016\tbnd_3179_2\tA\t[17:63864053[A\t195\tPASS\tSVTYPE=BND;MATEID=bnd_3179_1;DP=17;S
PLITCNT=10;SPANCNT=7;GENE=A930002H24Rik,Fer;GENEID=ENSMUSG00000045506,ENSMUSG00000000127;GENELOC=upstream,upstream;EXPR=10,1325;HOMLEN=0;SPLICESCORE=3;READTHROUGH;ADJACENT;ALTSPLICE;DELETION\n+18\t4198107\tbnd_5326_2\tG\t]2:165827729]G\t136\tPASS\tSVTYPE=BND;MATEID=bnd_5326_1;DP=20;SPLITCNT=7;SPANCNT=13;GENE=Zmynd8,Gm10557;GENEID=ENSMUSG00000039671,ENSMUSG00000073647;GENELOC=utr3p,downstream;EXPR=2963,0;HOMLEN=9;SPLICESCORE=1;INTERCHROM;ALTSPLICE\n+18\t28188917\tbnd_5160_2\tA\t[3:103040040[A\t213\tPASS\tSVTYPE=BND;MATEID=bnd_5160_1;DP=63;SPLITCNT=33;SPANCNT=30;GENE=Csde1,SNORA17;GENEID=ENSMUSG00000068823,ENSMUSG00000087940;GENELOC=coding,upstream;EXPR=10681,0;HOMLEN=91;SPLICESCORE=3;INTERCHROM;ALTSPLICE\n+18\t30311220\tbnd_7456_1\tG\tG]16:19493776]\t189\tPASS\tSVTYPE=BND;MATEID=bnd_7456_2;DP=19;SPLITCNT=9;SPANCNT=10;GENE=Pik3c3,Olfr166;GENEID=ENSMUSG00000033628,ENSMUSG00000056822;GENELOC=coding,downstream;EXPR=1791,0;HOMLEN=0;SPLICESCORE=1;INTERCHROM\n+18\t53932206\tbnd_5141_1\tT\tT[X:53418862[\t201\tPASS\tSVTYPE=BND;MATEID=bnd_5141_2;DP=26;SPLITCNT=9;SPANCNT=17;GENE=Csnk1g3,Gm14584;GENEID=ENSMUSG00000073563,ENSMUSG00000083798;GENELOC=coding,intron;EXPR=2227,0;HOMLEN=266;SPLICESCORE=2;INTERCHROM;ALTSPLICE\n+19\t37678397\tbnd_13800_1\tC\tC]19:37696729]\t204\tPASS\tSVTYPE=BND;MATEID=bnd_13800_2;DP=15;SPLITCNT=3;SPANCNT=12;GENE=Exoc6,Cyp26a1;GENEID=ENSMUSG00000053799,ENSMUSG00000024987;GENELOC=intron,upstream;EXPR=575,443;HOMLEN=0;SPLICESCORE=1;ALTSPLICE;DELETION\n+19\t37696729\tbnd_13800_2\tA\t[19:37678397[A\t204\tPASS\tSVTYPE=BND;MATEID=bnd_13800_1;DP=15;SPLITCNT=3;SPANCNT=12;GENE=Exoc6,Cyp26a1;GENEID=ENSMUSG00000053799,ENSMUSG00000024987;GENELOC=intron,upstream;EXPR=575,443;HOMLEN=0;SPLICESCORE=1;ALTSPLICE;DELETION\n+X\t37143984\tbnd_8690_2\tG\t]9:7872842]G\t132\tPASS\tSVTYPE=BND;MATEID=bnd_8690_1;DP=9;SPLITCNT=4;SPANCNT=5;GENE=Birc3,Nkap;GENEID=ENSMUSG00000032000,ENSMUSG00000016409;GENELOC=utr5p,intron;EXPR=4781,556;HOMLEN=39;SPLICESCORE=1;INTERCHROM;ALTSPLICE\n+X\
t53418862\tbnd_5141_2\tA\t]18:53932206]A\t201\tPASS\tSVTYPE=BND;MATEID=bnd_5141_1;DP=26;SPLITCNT=9;SPANCNT=17;GENE=Csnk1g3,Gm14584;GENEID=ENSMUSG00000073563,ENSMUSG00000083798;GENELOC=coding,intron;EXPR=2227,0;HOMLEN=266;SPLICESCORE=2;INTERCHROM;ALTSPLICE\n'
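The breakend (BND) records in this test VCF pair up through `MATEID`, and each `ALT` value encodes the mate position plus the join orientation. The following is a minimal sketch of how such a record could be unpacked, assuming the four breakend `ALT` forms described in the VCF 4.1 specification. The helper names `parse_bnd_alt` and `parse_info` are hypothetical and are not taken from `defuse_results_to_vcf.py`.

```python
import re

# Assumed pattern for the four VCF 4.1 breakend ALT forms:
#   t[p[   t]p]   ]p]t   [p[t
# where t is the replacement base(s) and p is chrom:pos of the mate.
BND_ALT = re.compile(
    r'^(?P<lead>[ACGTN]*)'
    r'(?P<bracket>[\[\]])'
    r'(?P<chrom>[^:\[\]]+):(?P<pos>\d+)'
    r'(?P=bracket)'
    r'(?P<trail>[ACGTN]*)$'
)


def parse_bnd_alt(alt):
    """Split a breakend ALT string into mate position and orientation."""
    m = BND_ALT.match(alt)
    if m is None:
        raise ValueError('not a breakend ALT: %r' % alt)
    return {
        'mate_chrom': m.group('chrom'),
        'mate_pos': int(m.group('pos')),
        'bracket': m.group('bracket'),
        # True when the local base precedes the bracketed mate position.
        'ref_first': bool(m.group('lead')),
    }


def parse_info(info):
    """Turn an INFO column into a dict; flags without '=' map to True."""
    fields = {}
    for item in info.split(';'):
        key, sep, value = item.partition('=')
        fields[key] = value if sep else True
    return fields


# Values copied from the bnd_4068_1 record in the hunk above.
alt = parse_bnd_alt('A]16:37799068]')
info = parse_info('SVTYPE=BND;MATEID=bnd_4068_2;DP=19;INTERCHROM;ALTSPLICE')
```

Flag fields such as `INTERCHROM` carry no value in the file, so the sketch maps them to `True` rather than to an empty string.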
diff -r f65857c1b92e -r b22f8634ff84 test-data/tophat_out2h.bam
Binary file test-data/tophat_out2h.bam has changed
diff -r f65857c1b92e -r b22f8634ff84 tool-data/defuse.loc.sample
--- a/tool-data/defuse.loc.sample Mon Jan 14 12:24:28 2013 -0600
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,11 +0,0 @@
-## Configurstion info for prepared data references for DeFuse
-## http://sourceforge.net/apps/mediawiki/defuse/index.php?title=DeFuse_Version_0.4.2
-## 3 columns separated by the TAB character
-## The 3rd column has dictionary values that will be substituted in the config file for defuse
-## It should likely contain keys:   dataset_directory gene_models genome_fasta repeats_filename est_fasta est_alignments unigene_fasta
-## If this is not a Homo_sapiens reference also need keys:  gene_id_pattern transcript_id_pattern chromosomes
-
-#db_key name {'config_key':'config_value'}
-#hg19 GRCh37(hg19) {'gene_id_pattern':'ENSG\d+', 'transcript_id_pattern':'ENST\d+', 'dataset_directory':'/data/genomes/Hsapiens/hg19/defuse', 'gene_models':'$(dataset_directory)/Homo_sapiens.GRCh37.62.gtf', 'genome_fasta':'$(dataset_directory)/Homo_sapiens.GRCh37.62.dna.chromosome.fa', 'repeats_filename':'$(dataset_directory)/rmsk.txt', 'est_fasta':'$(dataset_directory)/est.fa', 'est_alignments':'$(dataset_directory)/intronEst.txt', 'unigene_fasta':'$(dataset_directory)/Hs.seq.uniq', 'chromosomes':'1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y,MT', 'mt_chromosome':'MT', 'gene_sources':'IG_C_gene,IG_D_gene,IG_J_gene,IG_V_gene,processed_transcript,protein_coding', 'ig_gene_sources':'IG_C_gene,IG_D_gene,IG_J_gene,IG_V_gene,IG_pseudogene', 'rrna_gene_sources':'Mt_rRNA,rRNA,rRNA_pseudogene'}
-#mm9 NCBIM37(mm9) {'gene_id_pattern':'ENSMUSG\d+', 'transcript_id_pattern':'ENSMUST\d+', 'dataset_directory':'/data/genomes/Mmusculus/mm9/defuse', 'gene_models':'$(dataset_directory)/Mus_musculus.NCBIM37.63.gtf', 'genome_fasta':'$(dataset_directory)/Mus_musculus.NCBIM37.63.dna.chromosome.fa', 'repeats_filename':'$(dataset_directory)/rmsk.txt', 'est_fasta':'$(dataset_directory)/est.fa', 'est_alignments':'$(dataset_directory)/intronEst.txt', 'unigene_fasta':'$(dataset_directory)/Mm.seq.uniq', 'chromosomes':'1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,X,Y,MT', 'mt_chromosome':'MT', 'gene_sources':'IG_C_gene,IG_D_gene,IG_J_gene,IG_V_gene,processed_transcript,protein_coding', 'ig_gene_sources':'IG_C_gene,IG_D_gene,IG_J_gene,IG_V_gene,IG_pseudogene', 'rrna_gene_sources':'Mt_rRNA,rRNA,rRNA_pseudogene'}
-#mm8 NCBIM36(mm8) {'gene_id_pattern':'ENSMUSG\d+', 'transcript_id_pattern':'ENSMUST\d+', 'dataset_directory':'/data/genomes/Mmusculus/mm9/defuse', 'gene_models':'$(dataset_directory)/Mus_musculus.NCBIM36.46.gtf', 'genome_fasta':'$(dataset_directory)/Mus_musculus.NCBIM36.46.dna.chromosome.fa', 'repeats_filename':'$(dataset_directory)/rmsk.txt', 'est_fasta':'$(dataset_directory)/est.fa', 'est_alignments':'$(dataset_directory)/intronEst.txt', 'unigene_fasta':'$(dataset_directory)/Mm.seq.uniq', 'mt_chromosome':'MT', 'gene_sources':'IG_C_gene,IG_D_gene,IG_J_gene,IG_V_gene,processed_transcript,protein_coding', 'ig_gene_sources':'IG_C_gene,IG_D_gene,IG_J_gene,IG_V_gene,IG_pseudogene', 'rrna_gene_sources':'Mt_rRNA,rRNA,rRNA_pseudogene'}
diff -r f65857c1b92e -r b22f8634ff84 tool-data/defuse_reference.loc.sample
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tool-data/defuse_reference.loc.sample Sun Jan 17 14:11:06 2016 -0500
@@ -0,0 +1,7 @@
+## Configurstion info for prepared data references for DeFuse
+## http://sourceforge.net/apps/mediawiki/defuse/index.php?title=DeFuse_Version_0.6.0
+## 4 columns separated by the TAB character
+## The 4th column has the path to the defuse config.txt file, it needs to have the dataset_directory set the directory path where the defuse reference data resides.
+## The defuse galaxy tool  will substitute the directory path of config.txt if the dataset_directory property is not set '__DATASET_DIRECTORY__'
+#<unique_build_id>   <dbkey>   <display_name>   <file_base_path>
+GRCh37 GRCh37 Human GRCh37 (hg19) /depot/GRCh37/defuse/GRCh37.config
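The comments in this new `.loc` sample describe four TAB-separated columns, with lines starting with `#` treated as comments. The following is a rough sketch of reading such a file, assuming that layout. The `DefuseReference` tuple and `parse_loc` function are hypothetical names for illustration, not Galaxy's own loader.

```python
from collections import namedtuple

# Column order follows the header comment in the sample file:
# <unique_build_id> <dbkey> <display_name> <file_base_path>
DefuseReference = namedtuple(
    'DefuseReference', ['value', 'dbkey', 'name', 'path'])


def parse_loc(lines):
    """Return one DefuseReference per non-comment, non-blank line."""
    entries = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip('\r\n')
        if not line.strip() or line.startswith('#'):
            continue
        fields = line.split('\t')
        if len(fields) != 4:
            raise ValueError(
                'line %d: expected 4 TAB-separated columns, got %d'
                % (number, len(fields)))
        entries.append(DefuseReference(*fields))
    return entries


# The sample entry from the hunk above, with explicit TAB separators.
sample = [
    '## 4 columns separated by the TAB character\n',
    'GRCh37\tGRCh37\tHuman GRCh37 (hg19)\t/depot/GRCh37/defuse/GRCh37.config\n',
]
references = parse_loc(sample)
```

Splitting on TAB only (rather than on any whitespace) matters here because the display name itself contains spaces.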
diff -r f65857c1b92e -r b22f8634ff84 tool_data_table_conf.xml.sample
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tool_data_table_conf.xml.sample Sun Jan 17 14:11:06 2016 -0500
@@ -0,0 +1,7 @@
+<tables>
+    <!-- Locations of all fasta files under genome directory -->
+    <table name="defuse_reference" comment_char="#">
+        <columns>value, dbkey, name, path</columns>
+        <file path="tool-data/defuse_reference.loc" />
+    </table>
+</tables>
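The `<columns>` element in this data table definition names the fields of each `.loc` row. The following is a small sketch, using only the standard library, of pulling the column names and `.loc` path for one table. The function name `table_layout` is hypothetical, and this is not how Galaxy itself loads data tables.

```python
import xml.etree.ElementTree as ET

# Contents of tool_data_table_conf.xml.sample as added in this changeset.
SAMPLE = """<tables>
    <!-- Locations of all fasta files under genome directory -->
    <table name="defuse_reference" comment_char="#">
        <columns>value, dbkey, name, path</columns>
        <file path="tool-data/defuse_reference.loc" />
    </table>
</tables>"""


def table_layout(xml_text, table_name):
    """Return (column_names, loc_path) for the named data table."""
    root = ET.fromstring(xml_text)
    for table in root.iter('table'):
        if table.get('name') != table_name:
            continue
        raw = table.findtext('columns', default='')
        columns = [c.strip() for c in raw.split(',') if c.strip()]
        file_node = table.find('file')
        path = file_node.get('path') if file_node is not None else None
        return columns, path
    raise KeyError(table_name)


columns, loc_path = table_layout(SAMPLE, 'defuse_reference')
```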
diff -r f65857c1b92e -r b22f8634ff84 tool_dependencies.xml
--- a/tool_dependencies.xml Mon Jan 14 12:24:28 2013 -0600
+++ b/tool_dependencies.xml Sun Jan 17 14:11:06 2016 -0500
b'@@ -1,180 +1,24 @@\n <?xml version="1.0"?>\n <tool_dependency>\n-    <package name="defuse" version="0.6.0">\n-        <install version="1.0">\n-            <actions>\n-                <action type="download_by_url">http://sourceforge.net/projects/defuse/files/defuse/0.6/defuse-0.6.0.tar.gz</action>\n-                <action type="shell_command">cd tools &amp;&amp; make</action>\n-                <action type="move_directory_files">\n-                    <source_directory>.</source_directory>\n-                    <destination_directory>$INSTALL_DIR</destination_directory>\n-                </action>\n-                <action type="set_environment">\n-                    <environment_variable name="DEFUSE_PATH" action="set_to">$INSTALL_DIR</environment_variable>\n-                </action>\n-            </actions>\n-        </install>\n-        <readme>\n-deFuse code\n-To build the deFuse toolset you must have the boost c++ development libraries installed. If they are not installed on your system you can download them from the boost website. A full install of boost is not required. 
The easiest thing to do is to download the latest boost source tar.gz, extract it, then add the extracted path to the CPLUS_INCLUDE_PATH environment variable (in bash, `export CPLUS_INCLUDE_PATH=/boost/directory/:$CPLUS_INCLUDE_PATH`)\n-        </readme>\n+    <package name="defuse" version="0.6.2">\n+        <repository changeset_revision="5a4237bbe6bf" name="package_defuse_0_6_2" owner="jjohnson" toolshed="https://toolshed.g2.bx.psu.edu" />\n     </package>\n-\n-    <package name="samtools" version="0.1.18">\n-        <install version="1.0">\n-            <actions>\n-                <action type="download_by_url">http://sourceforge.net/projects/samtools/files/samtools/0.1.18/samtools-0.1.18.tar.bz2</action>\n-                <action type="shell_command">sed -i.bak -e \'s/-lcurses/-lncurses/g\' Makefile</action>\n-                <action type="shell_command">make</action>\n-                <action type="move_file">\n-                    <source>samtools</source>\n-                    <destination>$INSTALL_DIR/bin</destination>\n-                </action>\n-                <action type="move_file">\n-                    <source>misc/maq2sam-long</source>\n-                    <destination>$INSTALL_DIR/bin</destination>\n-                </action>\n-                <action type="set_environment">\n-                    <environment_variable name="PATH" action="prepend_to">$INSTALL_DIR/bin</environment_variable>\n-                </action>\n-            </actions>\n-        </install>\n-        <readme>\n-Compiling SAMtools requires the ncurses and zlib development libraries.\n-        </readme>\n+    <package name="samtools" version="0.1.19">\n+        <repository changeset_revision="96aab723499f" name="package_samtools_0_1_19" owner="iuc" toolshed="https://toolshed.g2.bx.psu.edu" />\n     </package>\n-\n-\n-    <package name="bowtie" version="0.12.7">\n-        <install version="1.0">\n-            <actions>\n-                <action 
type="download_by_url">http://downloads.sourceforge.net/project/bowtie-bio/bowtie/0.12.7/bowtie-0.12.7-src.zip</action>\n-                <action type="shell_command">make</action>\n-                <action type="move_file">\n-                    <source>bowtie</source>\n-                    <destination>$INSTALL_DIR/bin</destination>\n-                </action>\n-                <action type="move_file">\n-                    <source>bowtie-build</source>\n-                    <destination>$INSTALL_DIR/bin</destination>\n-                </action>\n-                <action type="move_file">\n-                    <source>bowtie-inspect</source>\n-                    <destination>$INSTALL_DIR/bin</destination>\n-                </action>\n-                <action type="set_environment">\n-                    <environment_variable name="PATH" action="prepend_to">$INSTALL_DIR/bin</environment_variable>\n-            </action>\n-            </actions>\n-        </install>\n-        <readme>\n-      '..b'tion>\n-                </action>\n-                <action type="move_file">\n-                    <source>src/snpindex</source>\n-                    <destination>$INSTALL_DIR/bin</destination>\n-                </action>\n-                <action type="move_file">\n-                    <source>src/cmetindex</source>\n-                    <destination>$INSTALL_DIR/bin</destination>\n-                </action>\n-                <action type="move_file">\n-                    <source>src/get-genome</source>\n-                    <destination>$INSTALL_DIR/bin</destination>\n-                </action>\n-                <action type="move_directory_files">\n-                    <source_directory>util</source_directory>\n-                    <destination_directory>$INSTALL_DIR/bin</destination_directory>\n-                </action>\n-                <action type="set_environment">\n-                    <environment_variable name="PATH" 
action="prepend_to">$INSTALL_DIR/bin</environment_variable>\n-                </action>\n-            </actions>\n-        </install>\n-        <readme>\n-        </readme>\n+    <package name="gmap" version="2013-05-09">\n+        <repository changeset_revision="953f5eb53593" name="package_gmap_2013_05_09" owner="jjohnson" toolshed="https://toolshed.g2.bx.psu.edu" />\n+    </package>\n+    <package name="blat" version="35x1">\n+        <repository changeset_revision="cc0f4b49b6f1" name="package_blat_35x1" owner="iuc" toolshed="https://toolshed.g2.bx.psu.edu" />\n     </package>\n-\n-    <package name="blat" version="34x10">\n-        <install version="1.0">\n-            <actions>\n-                <action type="download_by_url">http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/blat/blat</action>\n-                <action type="shell_command">chmod 755 blat</action>\n-                <action type="move_file">\n-                    <source>blat</source>\n-                    <destination>$INSTALL_DIR/bin</destination>\n-                </action>\n-                <action type="set_environment">\n-                    <environment_variable name="PATH" action="prepend_to">$INSTALL_DIR/bin</environment_variable>\n-                </action>\n-            </actions>\n-        </install>\n-        <readme>\n-This only handles blat for a non-commercial linux system.\n-\n-Please note that the Blat source and executables are freely available for\n-academic, nonprofit and personal use. 
Commercial licensing information is\n-available on the Kent Informatics website (http://www.kentinformatics.com/).\n-        </readme>\n+    <package name="R" version="3.1.2">\n+        <repository changeset_revision="c987143177d4" name="package_r_3_1_2" owner="iuc" toolshed="https://toolshed.g2.bx.psu.edu" />\n     </package>\n-\n-    <package name="fatotwobit" version="34x10">\n-        <install version="1.0">\n-            <actions>\n-                <action type="download_by_url">http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/faToTwoBit</action>\n-                <action type="shell_command">chmod 755 faToTwoBit</action>\n-                <action type="move_file">\n-                    <source>faToTwoBit</source>\n-                    <destination>$INSTALL_DIR/bin</destination>\n-                </action>\n-                <action type="set_environment">\n-                    <environment_variable name="PATH" action="prepend_to">$INSTALL_DIR/bin</environment_variable>\n-                </action>\n-            </actions>\n-        </install>\n-        <readme>\n-This only handles faToTwoBit for a non-commercial linux system.\n-\n-Please note that the source and executables are freely available for\n-academic, nonprofit and personal use. Commercial licensing information is\n-available on the Kent Informatics website (http://www.kentinformatics.com/).\n-        </readme>\n+    <package name="ada" version="2.0.3">\n+        <repository changeset_revision="f0e6af8a95e5" name="package_r_ada_2_0_3" owner="jjohnson" toolshed="https://toolshed.g2.bx.psu.edu" />\n     </package>\n-\n </tool_dependency>\n'