
Changeset 1:ebb02ba5987c (2014-03-21)

Previous changeset 0:6820983ba5d5 (2014-03-18)

Commit message:
Rewrite of param handling. interPairEnd param moved to "paired" section. Add param for '-a' option. Remove basic parallelism tag, which does not work with multiple inputs (thanks Bjoern Gruening for the notice). Simplify Python code.
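
To illustrate the param-handling rewrite, here is a minimal Python sketch of the pattern visible in the bwa_mem.py diff: numeric options are declared without a default and boolean options use action='store_true', so only values the user actually set get forwarded. The helper bwa_flags is hypothetical; the part of bwa_mem.py that assembles the bwa command line is truncated in this changeset view, so the real logic may differ.

```python
import optparse


def build_parser():
    # Two options declared the way the rewritten wrapper declares them:
    # no default for the numeric option, store_true for the boolean one.
    parser = optparse.OptionParser()
    parser.add_option('-k', '--minSeedLength', type=int, help='Minimum seed length [19]')
    parser.add_option('-S', '--mateRescue', action='store_true', help='Skip mate rescue')
    return parser


def bwa_flags(options):
    # Hypothetical helper (an assumption, not taken from bwa_mem.py):
    # emit a bwa flag only when the corresponding option was given,
    # so bwa's own defaults apply otherwise.
    flags = []
    if options.minSeedLength is not None:
        flags.append('-k %d' % options.minSeedLength)
    if options.mateRescue:
        flags.append('-S')
    return ' '.join(flags)
```

With this layout the Cheetah template can pass --mateRescue as a bare flag and omit --minSeedLength entirely when the field is left empty, instead of always passing every option with a value.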

modified:
bwa_mem.py
bwa_mem.xml
readme.rst

diff -r 6820983ba5d5 -r ebb02ba5987c bwa_mem.py
--- a/bwa_mem.py Tue Mar 18 07:49:22 2014 -0400
+++ b/bwa_mem.py Fri Mar 21 12:56:15 2014 -0400

@@ -18,25 +18,6 @@
 
 import optparse, os, shutil, subprocess, sys, tempfile
 
-def stop_err( msg ):
- sys.stderr.write( '%s\n' % msg )
- sys.exit()
-
-def check_is_double_encoded( fastq ):
- # check that first read is bases, not one base followed by numbers
- bases = [ 'A', 'C', 'G', 'T', 'a', 'c', 'g', 't', 'N' ]
- nums = [ '0', '1', '2', '3' ]
- for line in file( fastq, 'rb'):
- if not line.strip() or line.startswith( '@' ):
- continue
- if len( [ b for b in line.strip() if b in nums ] ) > 0:
- return False
- elif line.strip()[0] in bases and len( [ b for b in line.strip() if b in bases ] ) == len( line.strip() ):
- return True
- else:
- raise Exception, 'First line in first read does not appear to be a valid FASTQ read in either base-space or color-space'
- raise Exception, 'There is no non-comment and non-blank line in your FASTQ file'
-
 def __main__():
 descr = "bwa_mem.py: Map (long length) reads against a reference genome with BWA-MEM."
 parser = optparse.OptionParser(description=descr)
@@ -50,20 +31,20 @@
 parser.add_option( '-s', '--fileSource', help='Whether to use a previously indexed reference sequence or one form history (indexed or history)' )
 parser.add_option( '-D', '--dbkey', help='Dbkey for reference genome' )
 
- parser.add_option( '-k', '--minEditDistSeed', default=19, type=int, help='Minimum edit distance to the seed [19]' )
- parser.add_option( '-w', '--bandWidth', default=100, type=int, help='Band width for banded alignment [100]' )
- parser.add_option( '-d', '--offDiagonal', default=100, type=int, help='off-diagonal X-dropoff [100]' )
- parser.add_option( '-r', '--internalSeeds', default=1.5, type=float, help='look for internal seeds inside a seed longer than {-k} * FLOAT [1.5]' )
- parser.add_option( '-c', '--seedsOccurrence', default=10000, type=int, help='skip seeds with more than INT occurrences [10000]' )
- parser.add_option( '-S', '--mateRescue', default=False, help='skip mate rescue' )
- parser.add_option( '-P', '--skipPairing', default=False, help='skpe pairing, mate rescue performed unless -S also in use' )
- parser.add_option( '-A', '--seqMatch', default=1, type=int, help='score of a sequence match' )
- parser.add_option( '-B', '--mismatch', default=4, type=int, help='penalty for a mismatch' )
- parser.add_option( '-O', '--gapOpen', default=6, type=int, help='gap open penalty' )
- parser.add_option( '-E', '--gapExtension', default=None, help='gap extension penalty; a gap of size k cost {-O} + {-E}*k [1]' )
- parser.add_option( '-L', '--clipping', default=5, type=int, help='penalty for clipping [5]' )
- parser.add_option( '-U', '--unpairedReadpair', default=17, type=int, help='penalty for an unpaired read pair [17]' )
- parser.add_option( '-p', '--interPairEnd', default=False, help='first query file consists of interleaved paired-end sequences' )
+ parser.add_option( '-k', '--minSeedLength', type=int, help='Minimum seed length [19]' )
+ parser.add_option( '-w', '--bandWidth', type=int, help='Band width for banded alignment [100]' )
+ parser.add_option( '-d', '--offDiagonal', type=int, help='Off-diagonal X-dropoff [100]' )
+ parser.add_option( '-r', '--internalSeeds', type=float, help='Look for internal seeds inside a seed longer than {-k} * FLOAT [1.5]' )
+ parser.add_option( '-c', '--seedsOccurrence', type=int, help='Skip seeds with more than INT occurrences [10000]' )
+ parser.add_option( '-S', '--mateRescue', action='store_true', help='Skip mate rescue' )
+ parser.add_option( '-P', '--skipPairing', action='store_true', help='Skip pairing' )
+ parser.add_option( '-A', '--seqMatch', type=int, help='Score for a sequence match [1]' )
+ parser.add_option( '-B', '--mismatch', type=int, help='Penalty for a mismatch [4]' )
+ parser.add_option( '-O', '--gapOpen', type=int, help='Gap open penalty [6]' )
+ parser.add_ [...]
[...] tmp_stderr = open( tmp, 'rb' )
+ stderr = ''
 try:
- tmp = tempfile.NamedTemporaryFile( dir=tmp_dir ).name
- tmp_stderr = open( tmp, 'wb' )
- print "The cmd is %s" % cmd
- proc = subprocess.Popen( args=cmd, shell=True, cwd=tmp_dir, stderr=tmp_stderr.fileno() )
- returncode = proc.wait()
- tmp_stderr.close()
- # get stderr, allowing for case where it's very large
- tmp_stderr = open( tmp, 'rb' )
- stderr = ''
- try:
- while True:
- stderr += tmp_stderr.read( buffsize )
- if not stderr or len( stderr ) % buffsize != 0:
- break
- except OverflowError:
- pass
- tmp_stderr.close()
- if returncode != 0:
- raise Exception, stderr
+ while True:
+ stderr += tmp_stderr.read( buffsize )
+ if not stderr or len( stderr ) % buffsize != 0:
+ break
+ except OverflowError:
+ pass
+ tmp_stderr.close()
+ if returncode != 0:
+ raise Exception, stderr
+ except Exception, e:
+ raise Exception, 'Error generating alignments. ' + str( e )
+ # remove header if necessary
+ if options.suppressHeader:
+ tmp_out = tempfile.NamedTemporaryFile( dir=tmp_dir)
+ tmp_out_name = tmp_out.name
+ tmp_out.close()
+ try:
+ shutil.move( options.output, tmp_out_name )
 except Exception, e:
- raise Exception, 'Error generating alignments. ' + str( e )
- # remove header if necessary
- if options.suppressHeader == 'true':
- tmp_out = tempfile.NamedTemporaryFile( dir=tmp_dir)
- tmp_out_name = tmp_out.name
- tmp_out.close()
- try:
- shutil.move( options.output, tmp_out_name )
- except Exception, e:
- raise Exception, 'Error moving output file before removing headers. ' + str( e )
- fout = file( options.output, 'w' )
- for line in file( tmp_out.name, 'r' ):
- if not ( line.startswith( '@HD' ) or line.startswith( '@SQ' ) or line.startswith( '@RG' ) or line.startswith( '@PG' ) or line.startswith( '@CO' ) ):
- fout.write( line )
- fout.close()
- # check that there are results in the output file
- if os.path.getsize( options.output ) > 0:
- sys.stdout.write( 'BWA run on %s-end data' % options.genAlignType )
- else:
- raise Exception, 'The output file is empty. You may simply have no matches, or there may be an error with your input file or settings.'
- except Exception, e:
- stop_err( 'The alignment failed.\n' + str( e ) )
+ raise Exception, 'Error moving output file before removing headers. ' + str( e )
+ fout = file( options.output, 'w' )
+ for line in file( tmp_out.name, 'r' ):
+ if not ( line.startswith( '@HD' ) or line.startswith( '@SQ' ) or line.startswith( '@RG' ) or line.startswith( '@PG' ) or line.startswith( '@CO' ) ):
+ fout.write( line )
+ fout.close()
+ # check that there are results in the output file
+ if os.path.getsize( options.output ) > 0:
+ sys.stdout.write( 'BWA run on %s-end data' % options.genAlignType )
+ else:
+ raise Exception, 'The output file is empty. You may simply have no matches, or there may be an error with your input file or settings.'
 finally:
 # clean up temp dir
 if os.path.exists( tmp_index_dir ):
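
The header-suppression step in the bwa_mem.py hunk drops every line that starts with @HD, @SQ, @RG, @PG or @CO. As a rough sketch only, the same filter can be written in current Python as follows; the function name strip_sam_header and the constant SAM_HEADER_TAGS are hypothetical and do not appear in bwa_mem.py, which uses Python 2 file() calls and a temporary file instead.

```python
# Header record prefixes filtered by the wrapper when suppressHeader is set.
SAM_HEADER_TAGS = ('@HD', '@SQ', '@RG', '@PG', '@CO')


def strip_sam_header(lines):
    """Return the lines that do not start with one of the SAM header tags.

    Hypothetical helper for illustration; str.startswith accepts a tuple,
    which replaces the chain of five startswith() calls in the diff.
    """
    return [line for line in lines if not line.startswith(SAM_HEADER_TAGS)]
```

For example, passing the lines of a small SAM file keeps only the alignment records and leaves their order unchanged.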

diff -r 6820983ba5d5 -r ebb02ba5987c bwa_mem.xml
--- a/bwa_mem.xml Tue Mar 18 07:49:22 2014 -0400
+++ b/bwa_mem.xml Fri Mar 21 12:56:15 2014 -0400

@@ -1,26 +1,29 @@
-<tool id="bwa_mem" name="Map with BWA-MEM" version="0.7.7">
+<tool id="bwa_mem" name="Map with BWA-MEM" version="0.8.0">
 <requirements>
 <requirement type="package" version="0.7.7">bwa</requirement>
 </requirements>
 <description></description>
- <parallelism method="basic"></parallelism>
 <version_command>bwa 2>&1 | grep "Version: " | sed -e 's/Version: //'</version_command>
 <command interpreter="python">
 bwa_mem.py
 --threads="\${GALAXY_SLOTS:-1}"
 --fileSource="${genomeSource.refGenomeSource}"
- #if $genomeSource.refGenomeSource == "history":
+ #if $genomeSource.refGenomeSource == "history"
 ##build index on the fly
 --ref="${genomeSource.ownFile}"
 --dbkey="${dbkey}"
- #else:
+ #else
 ##use precomputed indexes
 --ref="${genomeSource.indices.fields.path}"
 #end if
 
 ## input file(s)
 --fastq="${paired.fastq}"
- #if $paired.sPaired == "paired":
+ #if $paired.sPaired == "single"
+ #if $paired.interPairEnd
+ --interPairEnd
+ #end if
+ #else
 --rfastq="${paired.rfastq}"
 #end if
 
@@ -30,28 +33,60 @@
 ## run parameters
 --genAlignType="${paired.sPaired}"
 --params="${params.source_select}"
- #if $params.source_select != "pre_set":
- --minEditDistSeed="${params.minEditDistSeed}"
- --bandWidth="${params.bandWidth}"
- --offDiagonal="${params.offDiagonal}"
- --internalSeeds="${params.internalSeeds}"
- --seedsOccurrence="${params.seedsOccurrence}"
- --mateRescue="${params.mateRescue}"
- --skipPairing="${params.skipPairing}"
- --seqMatch="${params.seqMatch}"
- --mismatch="${params.mismatch}"
- --gapOpen="${params.gapOpen}"
- --gapExtension="${params.gapExtension}"
- --clipping="${params.clipping}"
- --unpairedReadpair="${params.unpairedReadpair}"
- --interPairEnd="${params.interPairEnd}"
- --minScore="${params.minScore}"
- --mark="${params.mark}"
+ #if $params.source_select != "pre_set"
+ #if str($params.minEditDistSeed)
+ --minSeedLength ${params.minEditDistSeed}
+ #end if
+ #if str($params.bandWidth)
+ --bandWidth ${params.bandWidth}
+ #end if
+ #if str($params.offDiagonal)
+ --offDiagonal ${params.offDiagonal}
+ #end if
+ #if str($params.internalSeeds)
+ --internalSeeds ${params.internalSeeds}
+ #end if
+ #if str($params.seedsOccurrence)
+ --seedsOccurrence ${params.seedsOccurrence}
+ #end if
+ #if $params.mateRescue
+ --mateRescue
+ #end if
+ #if $params.skipPairing
+ --skipPairing
+ #end if
+ #if str($params.seqMatch)
+ --seqMatch ${params.seqMatch}
+ #end if
+ #if str($params.mismatch)
+ --mismatch ${params.mismatch}
+ #end if
+ #if str($params.gapOpen)
+ --gapOpen ${params.gapOpen}
+ #end if
+ #if str($params.gapExtension)
+ --gapExtension ${params.gapExtension}
+ #end if
+ #if $params.clipping
+ --clipping "${params.clipping}"
+ #end if
+ #if str($params.unpairedReadpair)
+ --unpairedReadpair ${params.unpairedReadpair}
+ #end if
+ #if str($params.minScore)
+ --minScore ${params.minScore}
+ #end if
+ #if $params.outputAll
+ --outputAll
+ #end if
+ #if $params.mark
+ --mark
+ #end if
 
 #if $params.readGroup.specReadGroup == "yes"
 --rgid="${params.readGroup.rgid}"
 --rgsm="${params.readGroup.rgsm}"
- --rgpl="${params.readGroup.rgpl}"
+ --rgpl ${params.readGroup.rgpl}
 --rglb="${params.readGroup.rglb}"
 --rgpu="${params.readGroup.rgpu}"
 --rgcn="${params.readGroup.rg [...]
[...]lues : CAPILLARY, LS454, ILLUMINA, 
-SOLID, HELICOS, IONTORRENT and PACBIO" />
- <param name="rglb" type="text" size="25" label="[Essential]Library name (LB)" help="Required if RG specified" />
- <param name="rgsm" type="text" size="25" label="[Essential]Sample (SM)" help="Required if RG specified. Use pool name where a pool is being sequenced" />
+ <param name="rgid" type="text" size="25" label="Read group identifier (ID). Each @RG line must have a unique ID. The value of ID is used in the RG tags of alignment records. Must be unique among all read groups in header section." help="Required if RG specified. Read group IDs may be modified when merging SAM files in order to handle collisions.">
+ <validator type="empty_field" />
+ </param>
+ <param name="rgpl" type="select" label="Platform/technology used to produce the reads (PL)">
+ <option value="CAPILLARY">CAPILLARY</option>
+ <option value="LS454">LS454</option>
+ <option value="ILLUMINA">ILLUMINA</option>
+ <option value="SOLID">SOLID</option>
+ <option value="HELICOS">HELICOS</option>
+ <option value="IONTORRENT">IONTORRENT</option>
+ <option value="PACBIO">PACBIO</option>
+ </param>
+ <param name="rglb" type="text" size="25" label="Library name (LB)" help="Required if RG specified">
+ <validator type="empty_field" />
+ </param>
+ <param name="rgsm" type="text" size="25" label="Sample (SM)" help="Required if RG specified. Use pool name where a pool is being sequenced">
+ <validator type="empty_field" />
+ </param>
 <param name="rgpu" type="text" size="25" label="Platform unit (PU)" help="Optional. Unique identifier (e.g. flowcell-barcode.lane for Illumina or slide for SOLiD)" />
 <param name="rgcn" type="text" size="25" label="Sequencing center that produced the read (CN)" help="Optional" />
 <param name="rgds" type="text" size="25" label="Description (DS)" help="Optional" />
 <param name="rgdt" type="text" size="25" label="Date that run was produced (DT)" help="Optional. ISO8601 format date or date/time, like YYYY-MM-DD" />
- <param name="rgfo" type="text" size="25" label="Flow order (FO). The array of nucleotide bases that correspond to the nucleotides used for each 
-ﬂow of each read." help="Optional. Multi-base ﬂows are encoded in IUPAC format, and non-nucleotide ﬂows by 
-various other characters. Format : /\*|[ACMGRSVTWYHKDBN]+/" />
+ <param name="rgfo" type="text" size="25" optional="true" label="Flow order (FO). The array of nucleotide bases that correspond to the nucleotides used for each flow of each read" help="Optional. Multi-base flows are encoded in IUPAC format, and non-nucleotide flows by various other characters. Format: /\*|[ACMGRSVTWYHKDBN]+/">
+ <validator type="regex">\*|[ACMGRSVTWYHKDBN]+$</validator>
+ </param>
 <param name="rgks" type="text" size="25" label="The array of nucleotide bases that correspond to the key sequence of each read (KS)" help="Optional" />
 <param name="rgpg" type="text" size="25" label="Programs used for processing the read group (PG)" help="Optional" />
 <param name="rgpi" type="text" size="25" label="Predicted median insert size (PI)" help="Optional" />
@@ -151,7 +201,7 @@
 </conditional>
 </when>
 </conditional>
- <param name="suppressHeader" type="boolean" truevalue="true" falsevalue="false" checked="False" label="Suppress the header in the output SAM file" help="BWA produces SAM with several lines of header information" />
+ <param name="suppressHeader" type="boolean" checked="false" label="Suppress the header in the output SAM file" help="BWA produces SAM with several lines of header information" />
 </inputs>
 
 <outputs>

diff -r 6820983ba5d5 -r ebb02ba5987c readme.rst
--- a/readme.rst Tue Mar 18 07:49:22 2014 -0400
+++ b/readme.rst Fri Mar 21 12:56:15 2014 -0400

@@ -25,9 +25,10 @@
Version history
---------------

-- Release 0: Initial release in the Tool Shed. This is a fork of http://toolshed.g2.bx.psu.edu/view/yufei-luo/bwa_0_7_5 repository with the following changes: Remove .loc file, only .loc.sample should be included. Fix bwa_index.loc.sample file to contain only comments. Add suppressHeader param as in bwa_wrappers. Use $GALAXY_SLOTS environment variable when available. Add <version_command> and <help>. Remove unused import. Fix spacing and typos. Use new recommended citation. Add tool_dependencies.xml . Rename to bwa_mem. Remove definitively colorspace support. Use optparse instead of argparse since Galaxy still supports Python 2.6 .
+- Release 1 (bwa_mem 0.8.0): Rewrite of param handling. interPairEnd param moved to "paired" section. Add param for '-a' option. Remove basic parallelism tag, which does not work with multiple inputs (thanks Björn Grüning for the notice). Simplify Python code.
+- Release 0 (bwa_mem 0.7.7): Initial release in the Tool Shed. This is a fork of http://toolshed.g2.bx.psu.edu/view/yufei-luo/bwa_0_7_5 repository with the following changes: Remove .loc file, only .loc.sample should be included. Fix bwa_index.loc.sample file to contain only comments. Add suppressHeader param as in bwa_wrappers. Use $GALAXY_SLOTS environment variable when available. Add <version_command> and <help>. Remove unused import. Fix spacing and typos. Use new recommended citation. Add tool_dependencies.xml . Rename to bwa_mem. Remove definitively colorspace support. Use optparse instead of argparse since Galaxy still supports Python 2.6 . Add COPYING and readme.rst files.

Development
-----------

-Development is hosted at https://bitbucket.org/nsoranzo/bwa_mem . Contributions and bug reports are very welcome!
+Development is hosted at https://bitbucket.org/crs4/orione-tools . Contributions and bug reports are very welcome!