Previous changeset 1:70248e6e3efc (2015-08-05) Next changeset 3:a4f602cc3aa9 (2015-10-02) |
Commit message:
v0.0.8 - renamed folder, added note about mirabait
added:
tools/mira4_0/README.rst
tools/mira4_0/mira4.py
tools/mira4_0/mira4_bait.py
tools/mira4_0/mira4_convert.py
tools/mira4_0/mira4_de_novo.xml
tools/mira4_0/mira4_make_bam.py
tools/mira4_0/mira4_mapping.xml
tools/mira4_0/mira4_validator.py
tools/mira4_0/repository_dependencies.xml
tools/mira4_0/tool_dependencies.xml
removed:
tools/mira4_assembler/README.rst
tools/mira4_assembler/mira4.py
tools/mira4_assembler/mira4_bait.py
tools/mira4_assembler/mira4_convert.py
tools/mira4_assembler/mira4_de_novo.xml
tools/mira4_assembler/mira4_make_bam.py
tools/mira4_assembler/mira4_mapping.xml
tools/mira4_assembler/mira4_validator.py
tools/mira4_assembler/repository_dependencies.xml
tools/mira4_assembler/tool_dependencies.xml
diff -r 70248e6e3efc -r 4eb32a3d67d1 tools/mira4_0/README.rst
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/mira4_0/README.rst Wed Sep 02 07:46:29 2015 -0400
@@ -0,0 +1,157 @@
+Galaxy wrapper for the MIRA assembly program (v4.0)
+===================================================
+
+This tool is copyright 2011-2015 by Peter Cock, The James Hutton Institute
+(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
+See the licence text below (MIT licence).
+
+This tool is a short Python script (to collect the MIRA output and move it
+to where Galaxy expects the files) and associated Galaxy wrapper XML file.
+
+It is available from the Galaxy Tool Shed at:
+http://toolshed.g2.bx.psu.edu/view/peterjc/mira4_assembler
+
+It uses a Galaxy datatype definition 'mira' for the MIRA Assembly Format,
+http://toolshed.g2.bx.psu.edu/view/peterjc/mira_datatypes
+
+A separate wrapper for MIRA v3.4 is available from the Galaxy Tool Shed at:
+http://toolshed.g2.bx.psu.edu/view/peterjc/mira_assembler
+
+Automated Installation
+======================
+
+This should be straightforward. Via the Tool Shed, Galaxy should automatically
+install the 'mira' datatype, samtools, and download and install the precompiled
+binary for MIRA v4.0.2 for the Galaxy wrapper, and run any tests.
+
+For MIRA 4, the Galaxy wrapper has been split in two, allowing separate
+cluster settings for de novo usage (high RAM) and mapping (lower RAM).
+Consult the Galaxy administration documentation for your cluster setup.
+
+WARNING: For larger tasks, be aware that MIRA can require vast amounts
+of RAM and run-times of over a week are possible. This tool wrapper makes
+no attempt to spot and reject such large jobs.
+
+
+Manual Installation
+===================
+
+First install the 'mira' datatype for Galaxy, available here:
+
+* http://toolshed.g2.bx.psu.edu/view/peterjc/mira_datatypes
+
+There are various Python and XML files to install into Galaxy:
+
+* ``mira4_de_novo.xml`` (the Galaxy tool definition for de novo usage)
+* ``mira4_mapping.xml`` (the Galaxy tool definition for mapping usage)
+* ``mira4_convert.xml`` (the Galaxy tool definition for converting MIRA files)
+* ``mira4_bait.xml`` (the Galaxy tool definition for mirabait)
+* ``mira4.py`` (the Python wrapper script)
+* ``mira4_convert.py`` (the Python wrapper script for miraconvert)
+* ``mira4_bait.py`` (the Python wrapper script for mirabait)
+* ``mira4_validator.py`` (the XML parameter validation script)
+
+The suggested location is a new ``tools/mira4_0`` folder. You will also need to
+modify the ``tool_conf.xml`` file to tell Galaxy to offer the tool::
+
+    <tool file="mira4_0/mira4_de_novo.xml" />
+    <tool file="mira4_0/mira4_mapping.xml" />
+    ...
+
+You will also need to install MIRA (we used version 4.0.2) and define the
+environment variable ``$MIRA4`` pointing at the folder containing the binaries.
+See:
+
+* http://chevreux.org/projects_mira.html
+* http://sourceforge.net/projects/mira-assembler/
+
+You may wish to use different cluster setups for the de novo and mapping
+tools, see above.
+
+You will also need to install samtools (for generating a BAM file from MIRA's
+SAM output).
+
+If you wish to run the unit tests, also move/copy the ``test-data/`` files
+under Galaxy's ``test-data/`` folder. Then::
+
+    $ ./run_tests.sh -id mira_4_0_bait
+    $ ./run_tests.sh -id mira_4_0_de_novo
+    $ ./run_tests.sh -id mira_4_0_mapping
+    $ ./run_tests.sh -id mira_4_0_convert
+
+
+History
+=======
+
+======= ======================================================================
+Version Changes
+------- ----------------------------------------------------------------------
+v0.0.1  - Initial version (prototype for MIRA 4.0 RC4, based on wrapper for v3.4)
+v0.0.2  - Include BAM output (using ``miraconvert`` and ``samtools``).
+        - Updated to target MIRA 4.0.1
+        - Simplified XML to apply input format to output data.
+        - Sets temporary folder at run time to respect environment variables
+          (``$TMPDIR``, ``$TEMP``, or ``$TMP`` in that order). This was
+          previously hard coded as ``/tmp``.
+v0.0.3  - Updated to target MIRA 4.0.2
+v0.0.4  - Using ``optparse`` for the Python wrapper script API
+        - Made MAF and BAM outputs optional
+        - Include wrapper for ``miraconvert``
+v0.0.5  - Tool definition now embeds citation information.
+v0.0.6  - Fixed error handling in ``mira4_convert.py``.
+v0.0.7  - Renamed folder (internal change only).
+        - Reorder XML elements (internal change only).
+        - Use the ``format_source=...`` tag in the MIRA bait wrapper.
+        - Planemo for Tool Shed upload (``.shed.yml``, internal change only).
+        - MIRA 4.0.2 dependency now declared via dedicated Tool Shed package.
+v0.0.8  - Renamed folder now that there is a MIRA 4.9.x wrapper (internal change only).
+======= ======================================================================
+
+
+Developers
+==========
+
+Development is on a dedicated GitHub repository:
+https://github.com/peterjc/pico_galaxy/tree/master/tools/mira_4_0
+
+For pushing a release to the test or main "Galaxy Tool Shed", use the following
+Planemo commands (which require that you have set your Tool Shed access details in
+``~/.planemo.yml`` and that you have access rights on the Tool Shed)::
+
+    $ planemo shed_update -t testtoolshed --check_diff ~/repositories/pico_galaxy/tools/mira4_0/
+    ...
+
+or::
+
+    $ planemo shed_update -t toolshed --check_diff ~/repositories/pico_galaxy/tools/mira4_0/
+    ...
+
+To just build and check the tarball, use::
+
+    $ planemo shed_upload --tar_only ~/repositories/pico_galaxy/tools/mira4_0/
+    ...
+    $ tar -tzf shed_upload.tar.gz
+    test-data/U13small_m.fastq
+    ...
+
+
+Licence (MIT)
+=============
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
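Note on the v0.0.2 history entry: the README says the temporary folder is taken from ``$TMPDIR``, ``$TEMP``, or ``$TMP`` (in that order) instead of the hard-coded ``/tmp``. The body of ``override_temp`` is truncated in this changeset view, so the following is only a minimal sketch of that lookup order, not the wrapper's actual code. The function names ``pick_temp_dir`` and ``override_manifest_temp`` and the ``/tmp`` fallback are assumptions made for illustration.

```python
import os


def pick_temp_dir(environ=None, default="/tmp"):
    """Return the first non-empty value of TMPDIR, TEMP, TMP (in that order).

    Hypothetical helper; falling back to ``default`` is an assumption.
    """
    if environ is None:
        environ = os.environ
    for name in ("TMPDIR", "TEMP", "TMP"):
        value = environ.get(name)
        if value:
            return value
    return default


def override_manifest_temp(manifest_text, temp_dir):
    """Swap the ``-DI:trt=/tmp`` placeholder in manifest text for temp_dir."""
    return manifest_text.replace("-DI:trt=/tmp", "-DI:trt=" + temp_dir)
```

Passing the environment as a plain dictionary keeps the lookup easy to exercise without touching the real process environment.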
diff -r 70248e6e3efc -r 4eb32a3d67d1 tools/mira4_0/mira4.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/mira4_0/mira4.py Wed Sep 02 07:46:29 2015 -0400
b'@@ -0,0 +1,313 @@\n+#!/usr/bin/env python\n+"""A simple wrapper script to call MIRA and collect its output.\n+"""\n+import os\n+import sys\n+import subprocess\n+import shutil\n+import time\n+import tempfile\n+from optparse import OptionParser\n+\n+#Do we need any PYTHONPATH magic?\n+from mira4_make_bam import make_bam\n+\n+WRAPPER_VER = "0.0.4" #Keep in sync with the XML file\n+\n+def sys_exit(msg, err=1):\n+ sys.stderr.write(msg+"\\n")\n+ sys.exit(err)\n+\n+\n+def get_version(mira_binary):\n+ """Run MIRA to find its version number"""\n+ # At the commend line I would use: mira -v | head -n 1\n+ # however there is some pipe error when doing that here.\n+ cmd = [mira_binary, "-v"]\n+ try:\n+ child = subprocess.Popen(cmd,\n+ stdout=subprocess.PIPE,\n+ stderr=subprocess.STDOUT)\n+ except Exception, err:\n+ sys.stderr.write("Error invoking command:\\n%s\\n\\n%s\\n" % (" ".join(cmd), err))\n+ sys.exit(1)\n+ ver, tmp = child.communicate()\n+ del child\n+ return ver.split("\\n", 1)[0].strip()\n+\n+#Parse Command Line\n+usage = """Galaxy MIRA4 wrapper script v%s - use as follows:\n+\n+$ python mira4.py ...\n+\n+This will run the MIRA binary and collect its output files as directed.\n+""" % WRAPPER_VER\n+parser = OptionParser(usage=usage)\n+parser.add_option("-m", "--manifest", dest="manifest",\n+ default=None, metavar="FILE",\n+ help="MIRA manifest filename")\n+parser.add_option("--maf", dest="maf",\n+ default="-", metavar="FILE",\n+ help="MIRA MAF output filename")\n+parser.add_option("--bam", dest="bam",\n+ default="-", metavar="FILE",\n+ help="Unpadded BAM output filename")\n+parser.add_option("--fasta", dest="fasta",\n+ default="-", metavar="FILE",\n+ help="Unpadded FASTA output filename")\n+parser.add_option("--log", dest="log",\n+ default="-", metavar="FILE",\n+ help="MIRA logging output filename")\n+parser.add_option("-v", "--version", dest="version",\n+ default=False, action="store_true",\n+ help="Show version and quit")\n+options, args = 
parser.parse_args()\n+manifest = options.manifest\n+out_maf = options.maf\n+out_bam = options.bam\n+out_fasta = options.fasta\n+out_log = options.log\n+\n+try:\n+ mira_path = os.environ["MIRA4"]\n+except KeyError:\n+ sys_exit("Environment variable $MIRA4 not set")\n+mira_binary = os.path.join(mira_path, "mira")\n+if not os.path.isfile(mira_binary):\n+ sys_exit("Missing mira under $MIRA4, %r\\nFolder contained: %s"\n+ % (mira_binary, ", ".join(os.listdir(mira_path))))\n+mira_convert = os.path.join(mira_path, "miraconvert")\n+if not os.path.isfile(mira_convert):\n+ sys_exit("Missing miraconvert under $MIRA4, %r\\nFolder contained: %s"\n+ % (mira_convert, ", ".join(os.listdir(mira_path))))\n+\n+mira_ver = get_version(mira_binary)\n+if not mira_ver.strip().startswith("4.0"):\n+ sys_exit("This wrapper is for MIRA V4.0, not:\\n%s\\n%s" % (mira_ver, mira_binary))\n+mira_convert_ver = get_version(mira_convert)\n+if not mira_convert_ver.strip().startswith("4.0"):\n+ sys_exit("This wrapper is for MIRA V4.0, not:\\n%s\\n%s" % (mira_ver, mira_convert))\n+if options.version:\n+ print "%s, MIRA wrapper version %s" % (mira_ver, WRAPPER_VER)\n+ if mira_ver != mira_convert_ver:\n+ print "WARNING: miraconvert %s" % mira_convert_ver\n+ sys.exit(0)\n+\n+if not manifest:\n+ sys_exit("Manifest is required")\n+elif not os.path.isfile(manifest):\n+ sys_exit("Missing input MIRA manifest file: %r" % manifest)\n+\n+\n+try:\n+ threads = int(os.environ.get("GALAXY_SLOTS", "1"))\n+except ValueError:\n+ threads = 1\n+assert 1 <= threads, threads\n+\n+\n+def override_temp(manifest):\n+ """Override ``-DI:trt=/tmp`` in manifest with environment variable.\n+\n+ Currently MIRA 4 does not allow envronment variables like ``$TMP``\n+ inside the manifest, which'..b't_maf, ref_fasta, out_bam, handle)\n+ else:\n+ #Not collecting the MAF file, use original location \n+ msg = make_bam(mira_convert, old_maf, ref_fasta, out_bam, handle)\n+ if msg:\n+ sys_exit(msg)\n+\n+def clean_up(temp, name):\n+ folder = 
"%s/%s_assembly" % (temp, name)\n+ if os.path.isdir(folder):\n+ shutil.rmtree(folder)\n+\n+#TODO - Run MIRA in /tmp or a configurable directory?\n+#Currently Galaxy puts us somewhere safe like:\n+#/opt/galaxy-dist/database/job_working_directory/846/\n+temp = "."\n+\n+name = "MIRA"\n+\n+override_temp(manifest)\n+\n+start_time = time.time()\n+cmd_list = [mira_binary, "-t", str(threads), manifest]\n+cmd = " ".join(cmd_list)\n+\n+assert os.path.isdir(temp)\n+d = "%s_assembly" % name\n+#This can fail on my development machine if stale folders exist\n+#under Galaxy\'s .../database/job_working_directory/ tree:\n+assert not os.path.isdir(d), "Path %r already exists:\\n%s" % (d, os.path.abspath(d))\n+try:\n+ #Check path access\n+ os.mkdir(d)\n+except Exception, err:\n+ log_manifest(manifest)\n+ sys.stderr.write("Error making directory %s\\n%s" % (d, err))\n+ sys.exit(1)\n+\n+#print os.path.abspath(".")\n+#print cmd\n+\n+if out_log and out_log != "-":\n+ handle = open(out_log, "w")\n+else:\n+ handle = open(os.devnull, "w")\n+handle.write("======================== MIRA manifest (instructions) ========================\\n")\n+m = open(manifest, "rU")\n+for line in m:\n+ handle.write(line)\n+m.close()\n+del m\n+handle.write("\\n")\n+handle.write("============================ Starting MIRA now ===============================\\n")\n+handle.flush()\n+try:\n+ #Run MIRA\n+ child = subprocess.Popen(cmd_list,\n+ stdout=handle,\n+ stderr=subprocess.STDOUT)\n+except Exception, err:\n+ log_manifest(manifest)\n+ sys.stderr.write("Error invoking command:\\n%s\\n\\n%s\\n" % (cmd, err))\n+ #TODO - call clean up?\n+ handle.write("Error invoking command:\\n%s\\n\\n%s\\n" % (cmd, err))\n+ handle.close()\n+ sys.exit(1)\n+#Use .communicate as can get deadlocks with .wait(),\n+stdout, stderr = child.communicate()\n+assert not stdout and not stderr #Should be empty as sent to handle\n+run_time = time.time() - start_time\n+return_code = 
child.returncode\n+handle.write("\\n")\n+handle.write("============================ MIRA has finished ===============================\\n")\n+handle.write("MIRA took %0.2f hours\\n" % (run_time / 3600.0))\n+if return_code:\n+ print "MIRA took %0.2f hours" % (run_time / 3600.0)\n+ handle.write("Return error code %i from command:\\n" % return_code)\n+ handle.write(cmd + "\\n")\n+ handle.close()\n+ clean_up(temp, name)\n+ log_manifest(manifest)\n+ sys_exit("Return error code %i from command:\\n%s" % (return_code, cmd),\n+ return_code)\n+handle.flush()\n+\n+if os.path.isfile("MIRA_assembly/MIRA_d_results/ec.log"):\n+ handle.write("\\n")\n+ handle.write("====================== Extract Large Contigs failed ==========================\\n")\n+ e = open("MIRA_assembly/MIRA_d_results/ec.log", "rU")\n+ for line in e:\n+ handle.write(line)\n+ e.close()\n+ handle.write("============================ (end of ec.log) =================================\\n")\n+ handle.flush()\n+\n+#print "Collecting output..."\n+start_time = time.time()\n+collect_output(temp, name, handle)\n+collect_time = time.time() - start_time\n+handle.write("MIRA took %0.2f hours; collecting output %0.2f minutes\\n" % (run_time / 3600.0, collect_time / 60.0))\n+print("MIRA took %0.2f hours; collecting output %0.2f minutes\\n" % (run_time / 3600.0, collect_time / 60.0))\n+\n+if os.path.isfile("MIRA_assembly/MIRA_d_results/ec.log"):\n+ #Treat as an error, but doing this AFTER collect_output\n+ sys.stderr.write("Extract Large Contigs failed\\n")\n+ handle.write("Extract Large Contigs failed\\n")\n+ handle.close()\n+ sys.exit(1)\n+\n+#print "Cleaning up..."\n+clean_up(temp, name)\n+\n+handle.write("\\nDone\\n")\n+handle.close()\n+print("Done")\n' |
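Note on ``mira4.py``: the script reads the thread count from ``$GALAXY_SLOTS`` (defaulting to one) and passes it to MIRA with ``-t``. The sketch below isolates that step as two small functions so it can be checked on its own. The names ``thread_count`` and ``build_mira_command`` are hypothetical. Unlike the script, which asserts ``1 <= threads``, this sketch clamps values below one, and that clamping is an assumption rather than the wrapper's behaviour.

```python
def thread_count(environ):
    """Read GALAXY_SLOTS from a mapping; fall back to 1 if unset or invalid."""
    try:
        threads = int(environ.get("GALAXY_SLOTS", "1"))
    except ValueError:
        threads = 1
    # Assumption: clamp instead of asserting, so zero or negative gives 1.
    return max(1, threads)


def build_mira_command(mira_binary, manifest, threads):
    """Return the argument list used to launch MIRA on a manifest file."""
    return [mira_binary, "-t", str(threads), manifest]
```

Keeping the command as a list (rather than one string) matches how the script calls ``subprocess.Popen`` without a shell.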
diff -r 70248e6e3efc -r 4eb32a3d67d1 tools/mira4_0/mira4_bait.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/mira4_0/mira4_bait.py Wed Sep 02 07:46:29 2015 -0400
@@ -0,0 +1,114 @@ +#!/usr/bin/env python +"""A simple wrapper script to call MIRA4's mirabait and collect its output. +""" +import os +import sys +import subprocess +import shutil +import time + +WRAPPER_VER = "0.0.5" #Keep in sync with the XML file + +def sys_exit(msg, err=1): + sys.stderr.write(msg+"\n") + sys.exit(err) + + +def get_version(mira_binary): + """Run MIRA to find its version number""" + # At the commend line I would use: mira -v | head -n 1 + # however there is some pipe error when doing that here. + cmd = [mira_binary, "-v"] + try: + child = subprocess.Popen(cmd, + stdout=subprocess.PIPE, + stderr=subprocess.STDOUT) + except Exception, err: + sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (" ".join(cmd), err)) + sys.exit(1) + ver, tmp = child.communicate() + del child + #Workaround for -v not working in mirabait 4.0RC4 + if "invalid option" in ver.split("\n", 1)[0]: + for line in ver.split("\n", 1): + if " version " in line: + line = line.split() + return line[line.index("version")+1].rstrip(")") + sys_exit("Could not determine MIRA version:\n%s" % ver) + return ver.split("\n", 1)[0] + +try: + mira_path = os.environ["MIRA4"] +except KeyError: + sys_exit("Environment variable $MIRA4 not set") +mira_binary = os.path.join(mira_path, "mirabait") +if not os.path.isfile(mira_binary): + sys_exit("Missing mirabait under $MIRA4, %r\nFolder contained: %s" + % (mira_binary, ", ".join(os.listdir(mira_path)))) +mira_ver = get_version(mira_binary) +if not mira_ver.strip().startswith("4.0"): + sys_exit("This wrapper is for MIRA V4.0, not:\n%s" % mira_ver) +if "-v" in sys.argv or "--version" in sys.argv: + print "%s, MIRA wrapper version %s" % (mira_ver, WRAPPER_VER) + sys.exit(0) + + +format, output_choice, strand_choice, kmer_length, min_occurance, bait_file, in_file, out_file = sys.argv[1:] + +if format.startswith("fastq"): + format = "fastq" +elif format == "mira": + format = "maf" +elif format != "fasta": + sys_exit("Was not expected format %r" % 
format) + +assert out_file.endswith(".dat") +out_file_stem = out_file[:-4] + +cmd_list = [mira_binary, "-f", format, "-t", format, + "-k", kmer_length, "-n", min_occurance, + bait_file, in_file, out_file_stem] +if output_choice == "pos": + pass +elif output_choice == "neg": + #Invert the selection... + cmd_list.insert(1, "-i") +else: + sys_exit("Output choice should be 'pos' or 'neg', not %r" % output_choice) +if strand_choice == "both": + pass +elif strand_choice == "fwd": + #Ingore reverse strand... + cmd_list.insert(1, "-r") +else: + sys_exit("Strand choice should be 'both' or 'fwd', not %r" % strand_choice) + +cmd = " ".join(cmd_list) +#print cmd +start_time = time.time() +try: + #Run MIRA + child = subprocess.Popen(cmd_list, + stdout=subprocess.PIPE, + stderr=subprocess.STDOUT) +except Exception, err: + sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (cmd, err)) + sys.exit(1) +#Use .communicate as can get deadlocks with .wait(), +stdout, stderr = child.communicate() +assert stderr is None # Due to way we ran with subprocess +run_time = time.time() - start_time +return_code = child.returncode +print "mirabait took %0.2f minutes" % (run_time / 60.0) + +if return_code: + sys.stderr.write(stdout) + sys_exit("Return error code %i from command:\n%s" % (return_code, cmd), + return_code) + +#Capture output +out_tmp = out_file_stem + "." + format +if not os.path.isfile(out_tmp): + sys.stderr.write(stdout) + sys_exit("Missing output file from mirabait: %s" % out_tmp) +shutil.move(out_tmp, out_file) |
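Note on ``mira4_bait.py``: the command line for ``mirabait`` is assembled as a list, with ``-i`` inserted to invert the selection and ``-r`` inserted for forward-strand-only matching. The following sketch restates that logic as a function so the resulting argument order is easy to see. The function name ``build_bait_command`` is hypothetical, and it raises ``ValueError`` where the script calls ``sys_exit``; the flags themselves are the ones that appear in the diff above.

```python
def build_bait_command(mira_binary, fmt, output_choice, strand_choice,
                       kmer_length, min_occurrence,
                       bait_file, in_file, out_file_stem):
    """Return the mirabait argument list, mirroring the wrapper script."""
    cmd_list = [mira_binary, "-f", fmt, "-t", fmt,
                "-k", kmer_length, "-n", min_occurrence,
                bait_file, in_file, out_file_stem]
    if output_choice == "neg":
        # Invert the selection (keep reads that do NOT match the bait).
        cmd_list.insert(1, "-i")
    elif output_choice != "pos":
        raise ValueError("Output choice should be 'pos' or 'neg', not %r"
                         % output_choice)
    if strand_choice == "fwd":
        # Ignore the reverse strand.
        cmd_list.insert(1, "-r")
    elif strand_choice != "both":
        raise ValueError("Strand choice should be 'both' or 'fwd', not %r"
                         % strand_choice)
    return cmd_list
```

Because both switches are inserted at index 1, ``-r`` ends up before ``-i`` when both apply.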
diff -r 70248e6e3efc -r 4eb32a3d67d1 tools/mira4_0/mira4_convert.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/mira4_0/mira4_convert.py Wed Sep 02 07:46:29 2015 -0400
b'@@ -0,0 +1,226 @@\n+#!/usr/bin/env python\n+"""A simple wrapper script to call MIRA and collect its output.\n+\n+This focuses on the miraconvert binary.\n+"""\n+import os\n+import sys\n+import subprocess\n+import shutil\n+import time\n+import tempfile\n+from optparse import OptionParser\n+try:\n+ from io import BytesIO\n+except ImportError:\n+ #Should we worry about Python 2.5 or older?\n+ from StringIO import StringIO as BytesIO\n+\n+#Do we need any PYTHONPATH magic?\n+from mira4_make_bam import depad\n+\n+WRAPPER_VER = "0.0.7" # Keep in sync with the XML file\n+\n+def sys_exit(msg, err=1):\n+ sys.stderr.write(msg+"\\n")\n+ sys.exit(err)\n+\n+def run(cmd):\n+ #Avoid using shell=True when we call subprocess to ensure if the Python\n+ #script is killed, so too is the child process.\n+ try:\n+ child = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)\n+ except Exception, err:\n+ sys_exit("Error invoking command:\\n%s\\n\\n%s\\n" % (" ".join(cmd), err))\n+ #Use .communicate as can get deadlocks with .wait(),\n+ stdout, stderr = child.communicate()\n+ return_code = child.returncode\n+ if return_code:\n+ cmd_str = " ".join(cmd) # doesn\'t quote spaces etc\n+ if stderr and stdout:\n+ sys_exit("Return code %i from command:\\n%s\\n\\n%s\\n\\n%s" % (return_code, cmd_str, stdout, stderr))\n+ else:\n+ sys_exit("Return code %i from command:\\n%s\\n%s" % (return_code, cmd_str, stderr))\n+\n+def get_version(mira_binary):\n+ """Run MIRA to find its version number"""\n+ # At the commend line I would use: mira -v | head -n 1\n+ # however there is some pipe error when doing that here.\n+ cmd = [mira_binary, "-v"]\n+ try:\n+ child = subprocess.Popen(cmd,\n+ stdout=subprocess.PIPE,\n+ stderr=subprocess.STDOUT)\n+ except Exception, err:\n+ sys.stderr.write("Error invoking command:\\n%s\\n\\n%s\\n" % (" ".join(cmd), err))\n+ sys.exit(1)\n+ ver, tmp = child.communicate()\n+ del child\n+ return ver.split("\\n", 1)[0].strip()\n+\n+#Parse Command Line\n+usage = 
"""Galaxy MIRA4 wrapper script v%s - use as follows:\n+\n+$ python mira4_convert.py ...\n+\n+This will run the MIRA miraconvert binary and collect its output files as directed.\n+""" % WRAPPER_VER\n+parser = OptionParser(usage=usage)\n+parser.add_option("--input", dest="input",\n+ default=None, metavar="FILE",\n+ help="MIRA input filename")\n+parser.add_option("-x", "--min_length", dest="min_length",\n+ default="0",\n+ help="Minimum contig length")\n+parser.add_option("-y", "--min_cover", dest="min_cover",\n+ default="0",\n+ help="Minimum average contig coverage")\n+parser.add_option("-z", "--min_reads", dest="min_reads",\n+ default="0",\n+ help="Minimum reads per contig")\n+parser.add_option("--maf", dest="maf",\n+ default="", metavar="FILE",\n+ help="MIRA MAF output filename")\n+parser.add_option("--ace", dest="ace",\n+ default="", metavar="FILE",\n+ help="ACE output filename")\n+parser.add_option("--bam", dest="bam",\n+ default="", metavar="FILE",\n+ help="Unpadded BAM output filename")\n+parser.add_option("--fasta", dest="fasta",\n+ default="", metavar="FILE",\n+ help="Unpadded FASTA output filename")\n+parser.add_option("--cstats", dest="cstats",\n+ default="", metavar="FILE",\n+ help="Contig statistics filename")\n+parser.add_option("-v", "--version", dest="version",\n+ default=False, action="store_true",\n+ help="Show version and quit")\n+options, args = parser.parse_args()\n+if args:\n+ sys_exit("Expected options (e.g. 
--input example.maf), not arguments")\n+\n+input_maf = options.input\n+out_maf = options.maf\n+out_bam = options.bam\n+out_fasta '..b's.fasta\n+out_ace = options.ace\n+out_cstats = options.cstats\n+\n+try:\n+ mira_path = os.environ["MIRA4"]\n+except KeyError:\n+ sys_exit("Environment variable $MIRA4 not set")\n+mira_convert = os.path.join(mira_path, "miraconvert")\n+if not os.path.isfile(mira_convert):\n+ sys_exit("Missing miraconvert under $MIRA4, %r\\nFolder contained: %s"\n+ % (mira_convert, ", ".join(os.listdir(mira_path))))\n+\n+mira_convert_ver = get_version(mira_convert)\n+if not mira_convert_ver.strip().startswith("4.0"):\n+ sys_exit("This wrapper is for MIRA V4.0, not:\\n%s\\n%s" % (mira_convert_ver, mira_convert))\n+if options.version:\n+ print("%s, MIRA wrapper version %s" % (mira_convert_ver, WRAPPER_VER))\n+ sys.exit(0)\n+\n+if not input_maf:\n+ sys_exit("Input MIRA file is required")\n+elif not os.path.isfile(input_maf):\n+ sys_exit("Missing input MIRA file: %r" % input_maf)\n+\n+if not (out_maf or out_bam or out_fasta or out_ace or out_cstats):\n+ sys_exit("No output requested")\n+\n+\n+def check_min_int(value, name):\n+ try:\n+ i = int(value)\n+ except:\n+ sys_exit("Bad %s setting, %r" % (name, value))\n+ if i < 0:\n+ sys_exit("Negative %s setting, %r" % (name, value))\n+ return i\n+\n+min_length = check_min_int(options.min_length, "minimum length")\n+min_cover = check_min_int(options.min_cover, "minimum cover")\n+min_reads = check_min_int(options.min_reads, "minimum reads")\n+\n+#TODO - Run MIRA in /tmp or a configurable directory?\n+#Currently Galaxy puts us somewhere safe like:\n+#/opt/galaxy-dist/database/job_working_directory/846/\n+temp = "."\n+\n+\n+cmd_list = [mira_convert]\n+if min_length:\n+ cmd_list.extend(["-x", str(min_length)])\n+if min_cover:\n+ cmd_list.extend(["-y", str(min_cover)])\n+if min_reads:\n+ cmd_list.extend(["-z", str(min_reads)])\n+cmd_list.extend(["-f", "maf", input_maf, os.path.join(temp, "converted")])\n+if out_maf:\n+ 
cmd_list.append("maf")\n+if out_bam:\n+ cmd_list.append("samnbb")\n+ if not out_fasta:\n+ #Need this for samtools depad\n+ out_fasta = os.path.join(temp, "depadded.fasta")\n+if out_fasta:\n+ cmd_list.append("fasta")\n+if out_ace:\n+ cmd_list.append("ace")\n+if out_cstats:\n+ cmd_list.append("cstats")\n+run(cmd_list)\n+\n+def collect(old, new):\n+ if not os.path.isfile(old):\n+ sys_exit("Missing expected output file %s" % old)\n+ shutil.move(old, new)\n+\n+if out_maf:\n+ collect(os.path.join(temp, "converted.maf"), out_maf)\n+if out_fasta:\n+ #Can we look at the MAF file to see if there are multiple strains?\n+ old = os.path.join(temp, "converted_AllStrains.unpadded.fasta")\n+ if os.path.isfile(old):\n+ collect(old, out_fasta)\n+ else:\n+ #Might the output be filtered down to zero contigs?\n+ old = os.path.join(temp, "converted.fasta")\n+ if not os.path.isfile(old):\n+ sys_exit("Missing expected output FASTA file")\n+ elif os.path.getsize(old) == 0:\n+ print("Warning - no contigs (harsh filters?)")\n+ collect(old, out_fasta)\n+ else:\n+ sys_exit("Missing expected output FASTA file (only generic file present)")\n+if out_ace:\n+ collect(os.path.join(temp, "converted.maf"), out_ace)\n+if out_cstats:\n+ collect(os.path.join(temp, "converted_info_contigstats.txt"), out_cstats)\n+\n+if out_bam:\n+ assert os.path.isfile(out_fasta)\n+ old = os.path.join(temp, "converted.samnbb")\n+ if not os.path.isfile(old):\n+ old = os.path.join(temp, "converted.sam")\n+ if not os.path.isfile(old):\n+ sys_exit("Missing expected intermediate file %s" % old)\n+ h = BytesIO()\n+ msg = depad(out_fasta, old, out_bam, h)\n+ if msg:\n+ print(msg)\n+ print(h.getvalue())\n+ h.close()\n+ sys.exit(1)\n+ h.close()\n+ if out_fasta == os.path.join(temp, "depadded.fasta"):\n+ #Not asked for by Galaxy, no longer needed\n+ os.remove(out_fasta)\n+\n+if min_length or min_cover or min_reads:\n+ print("Filtered.")\n+else:\n+ print("Converted.")\n' |
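Note on ``mira4_convert.py``: the three filter settings are validated as non-negative integers and only passed to ``miraconvert`` (as ``-x``, ``-y``, ``-z``) when non-zero. A minimal sketch of that validation and flag building follows. It raises ``ValueError`` instead of calling ``sys_exit``, and catches only ``TypeError`` and ``ValueError`` rather than using a bare ``except``; both are deliberate changes for illustration, and ``filter_args`` is a hypothetical name.

```python
def check_min_int(value, name):
    """Parse value as a non-negative integer, or raise ValueError."""
    try:
        i = int(value)
    except (TypeError, ValueError):
        raise ValueError("Bad %s setting, %r" % (name, value))
    if i < 0:
        raise ValueError("Negative %s setting, %r" % (name, value))
    return i


def filter_args(min_length, min_cover, min_reads):
    """Return the miraconvert filter switches for the non-zero settings."""
    args = []
    for flag, value in (("-x", min_length),
                        ("-y", min_cover),
                        ("-z", min_reads)):
        if value:
            args.extend([flag, str(value)])
    return args
```

With all three settings left at zero the list is empty, which corresponds to the script's plain "Converted." case rather than "Filtered.".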
diff -r 70248e6e3efc -r 4eb32a3d67d1 tools/mira4_0/mira4_de_novo.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/mira4_0/mira4_de_novo.xml Wed Sep 02 07:46:29 2015 -0400
b'@@ -0,0 +1,275 @@\n+<tool id="mira_4_0_de_novo" name="MIRA v4.0 de novo assember" version="0.0.8">\n+ <description>Takes Sanger, Roche 454, Solexa/Illumina, Ion Torrent and PacBio reads</description>\n+ <requirements>\n+ <requirement type="binary">mira</requirement>\n+ <requirement type="binary">miraconvert</requirement>\n+ <requirement type="package" version="4.0.2">MIRA</requirement>\n+ <requirement type="binary">samtools</requirement>\n+ <requirement type="package" version="0.1.19">samtools</requirement>\n+ </requirements>\n+ <code file="mira4_validator.py" />\n+ <stdio>\n+ <!-- Assume anything other than zero is an error -->\n+ <exit_code range="1:" />\n+ <exit_code range=":-1" />\n+ </stdio>\n+ <version_command interpreter="python">mira4.py --version</version_command>\n+ <command interpreter="python">mira4.py\n+--manifest "$manifest"\n+#if str($maf_wanted)=="true":\n+--maf "$out_maf"\n+#end if\n+#if str($bam_wanted)=="true":\n+--bam "$out_bam"\n+#end if\n+--fasta "$out_fasta"\n+--log "$out_log"\n+ </command>\n+ <configfiles>\n+ <configfile name="manifest">\n+project = MIRA\n+job = denovo,${job_type},${job_quality}\n+parameters = -NW:cmrnl=no -DI:trt=/tmp -OUT:orc=no\n+## -GE:not is short for -GENERAL:number_of_threads and using one (1)\n+## can be useful for repeatability of assemblies and bug hunting.\n+## This is overriden by the command line -t switch which is easier\n+## to set from within Galaxy.\n+##\n+## -NW:cmrnl is short for -NAG_AND_WARN:check_maxreadnamelength\n+## and without this MIRA aborts with read names over 40 characters\n+## due to limitations of some downstream tools.\n+##\n+## -DI:trt is short for -DIRECTORY:tmp_redirected_to and should\n+## point to a local hard drive (not something like NFS on network).\n+## We replace /tmp with an environment variable via mira4.py\n+##\n+## -OUT:orc=no is short for -OUTPUT:output_result_caf=no \n+## which turns off an output file we don\'t want anyway.\n+\n+#for $rg in $read_group\n+\n+##This bar goes 
into the manifest as a comment line\n+#------------------------------------------------------------------------------\n+\n+readgroup\n+technology = ${rg.technology}\n+##Record the segment placement (if any)\n+#if str($rg.segments.type) == "paired"\n+segment_placement = ${rg.segments.placement}\n+segment_naming = ${rg.segments.naming}\n+#if str($rg.segments.min_size) != "" or str($rg.segments.max_size) != ""\n+##If our min/max validation failed I trust MIRA to give an error message...\n+template_size = $rg.segments.min_size $rg.segments.max_size\n+#end if\n+#end if\n+##if str($rg.segments.type) == "none"\n+##MIRA4 manual says use segment_placement = unknown or ? for unpaired data\n+##but this stopped working in MIRA 4.0 RC5 and 4.0 (final). See:\n+##http://www.freelists.org/post/mira_talk/Unpaired-reads-and-segment-placement--or-unknown\n+##segment_placement = ?\n+##end if\n+##MIRA will accept multiple filenames on one data line, or multiple data lines\n+#for $f in $rg.filenames\n+##Must now map Galaxy datatypes to MIRA file types...\n+#if $f.ext.startswith("fastq")\n+##MIRA doesn\'t like fastqsanger etc, just plain old fastq:\n+data = fastq::$f\n+#elif $f.ext == "mira"\n+##We\'re calling *.maf the "mira" format in Galaxy (name space collision)\n+data = maf::$f\n+#else\n+##MIRA is happy with fasta as name,\n+data = ${f.ext}::$f\n+#end if\n+#end for\n+#end for\n+ </configfile>\n+ </configfiles>\n+ <inputs>\n+ <param name="job_type" type="select" label="Assembly type">\n+ <option value="genome">Genome</option>\n+ <option value="est">EST (transcriptome)</option>\n+ </param>\n+ <param name="job_quality" type="select" label="Assembly quality grade">\n+ <option value="accurate">Accurate</option>\n+ <option value="draft">Draft</option>\n+ </param>\n+ <repeat name="read_group" title="Read Group" min="1">\n+ <param name="technology" type="select" label="Read technology">\n+ '..b'\n+ Note we\'re using just one repeat group,\n+ but two parameters within the repeat (filename, 
no pairing)\n+ -->\n+ <test>\n+ <param name="job_type" value="genome" />\n+ <param name="job_quality" value="accurate" />\n+ <param name="type" value="none" />\n+ <param name="filenames" value="ecoli.fastq" ftype="fastqsanger" />\n+ <param name="maf_wanted" value="false"/>\n+ <param name="bam_wanted" value="false"/>\n+ <output name="out_fasta" file="ecoli.mira4_de_novo.fasta" ftype="fasta" />\n+ <output name="out_log" file="empty_file.dat" compare="contains" />\n+ </test>\n+ </tests>\n+ <help>\n+\n+**What it does**\n+\n+Runs MIRA v4.0 in de novo mode, collects the output, generates a sorted BAM\n+file, and then throws away all the temporary files.\n+\n+MIRA is an open source assembly tool capable of handling sequence data from\n+a range of platforms (Sanger capillary, Solexa/Illumina, Roche 454, Ion Torrent\n+and also PacBio).\n+\n+It is particularly suited to small genomes such as bacteria.\n+\n+\n+**Notes on paired reads**\n+\n+.. class:: warningmark\n+\n+MIRA uses read naming conventions to identify paired read partners\n+(and does not care about their order in the input files). In most cases,\n+the Solexa/Illumina setting is fine. For Sanger capillary sequencing,\n+you may need to rename your reads to match one of the standard conventions\n+supported by MIRA. 
For Roche 454 or Ion Torrent the appropriate settings\n+depend on how the FASTQ file was produced:\n+\n+* If using Roche\'s ``sffinfo`` or older versions of ``sff_extract``\n+ to convert SFF files to FASTQ, your reads will probably have the\n+ ``---> <---`` orientation and use the ``.f`` and ``.r``\n+ suffixes (FR naming).\n+\n+* If using a recent version of ``sff_extract``, then the ``/1`` and ``/2``\n+ suffixes are used (Solexa/Illumina style naming) and the original\n+ ``2---> 1--->`` orientation is preserved.\n+\n+The reason for this is the raw data for Roche 454 and Ion Torrent paired-end\n+libraries sequences a circularised fragment such that the raw data begins\n+with the end of the fragment, a linker, then the start of the fragment.\n+This means both the start and end are sequenced from the same strand, and\n+have the orientation ``2---> 1--->``. However, in order to use the data\n+with traditional tools expecting Sanger capillary style ``---> <---``\n+orientation it was common to reverse complement one of the pair to mimic this.\n+\n+\n+**Citation**\n+\n+If you use this Galaxy tool in work leading to a scientific publication please\n+cite the following papers:\n+\n+Peter J.A. Cock, Bj\xc3\xb6rn A. Gr\xc3\xbcning, Konrad Paszkiewicz and Leighton Pritchard (2013).\n+Galaxy tools and workflows for sequence analysis with applications\n+in molecular plant pathology. PeerJ 1:e167\n+http://dx.doi.org/10.7717/peerj.167\n+\n+Bastien Chevreux, Thomas Wetter and S\xc3\xa1ndor Suhai (1999).\n+Genome Sequence Assembly Using Trace Signals and Additional Sequence Information.\n+Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB) 99, pp. 
45-56.\n+http://www.bioinfo.de/isb/gcb99/talks/chevreux/main.html\n+\n+This wrapper is available to install into other Galaxy Instances via the Galaxy\n+Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/mira4_assembler\n+ </help>\n+ <citations>\n+ <citation type="doi">10.7717/peerj.167</citation>\n+ <citation type="bibtex">@ARTICLE{Chevreux1999-mira3,\n+ author = {B. Chevreux and T. Wetter and S. Suhai},\n+ year = {1999},\n+ title = {Genome Sequence Assembly Using Trace Signals and Additional Sequence Information},\n+ journal = {Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB)}\n+ volume = {99},\n+ pages = {45-56},\n+ url = {http://www.bioinfo.de/isb/gcb99/talks/chevreux/main.html}\n+ }</citation>\n+ </citations>\n+</tool>\n' |
diff -r 70248e6e3efc -r 4eb32a3d67d1 tools/mira4_0/mira4_make_bam.py --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/mira4_0/mira4_make_bam.py Wed Sep 02 07:46:29 2015 -0400 |
@@ -0,0 +1,92 @@
+#!/usr/bin/env python
+"""Wrapper script using miraconvert & samtools to get BAM from MIRA.
+"""
+import os
+import sys
+import shutil
+import subprocess
+import tempfile
+
+def sys_exit(msg, err=1):
+    sys.stderr.write(msg+"\n")
+    sys.exit(err)
+
+def run(cmd, log_handle):
+    try:
+        child = subprocess.Popen(cmd, shell=True,
+                                 stdout=subprocess.PIPE,
+                                 stderr=subprocess.STDOUT)
+    except Exception, err:
+        sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (cmd, err))
+        #TODO - call clean up?
+        log_handle.write("Error invoking command:\n%s\n\n%s\n" % (cmd, err))
+        sys.exit(1)
+    #Use .communicate as can get deadlocks with .wait(),
+    stdout, stderr = child.communicate()
+    assert not stderr #Should be empty as sent to stdout
+    if len(stdout) > 10000:
+        #miraconvert can be very verbose (is holding stdout in RAM a problem?)
+        stdout = stdout.split("\n")
+        stdout = stdout[:10] + ["...", "<snip>", "..."] + stdout[-10:]
+        stdout = "\n".join(stdout)
+    log_handle.write(stdout)
+    return child.returncode
+
+def depad(fasta_file, sam_file, bam_file, log_handle):
+    log_handle.write("\n================= Converting MIRA assembly from SAM to BAM ===================\n")
+    #Also doing SAM to (uncompressed) BAM during depad
+    bam_stem = bam_file + ".tmp" # Have write permissions and want final file in this folder
+    cmd = 'samtools depad -S -u -T "%s" "%s" | samtools sort - "%s"' % (fasta_file, sam_file, bam_stem)
+    return_code = run(cmd, log_handle)
+    if return_code:
+        return "Error %i from command:\n%s" % (return_code, cmd)
+    if not os.path.isfile(bam_stem + ".bam"):
+        return "samtools depad or sort failed to produce BAM file"
+
+    log_handle.write("\n====================== Indexing MIRA assembly BAM file =======================\n")
+    cmd = 'samtools index "%s.bam"' % bam_stem
+    return_code = run(cmd, log_handle)
+    if return_code:
+        return "Error %i from command:\n%s" % (return_code, cmd)
+    if not os.path.isfile(bam_stem + ".bam.bai"):
+        return "samtools indexing of BAM file failed to produce BAI file"
+
+    shutil.move(bam_stem + ".bam", bam_file)
+    os.remove(bam_stem + ".bam.bai") #Let Galaxy handle that...
+
+
+def make_bam(mira_convert, maf_file, fasta_file, bam_file, log_handle):
+    if not os.path.isfile(mira_convert):
+        return "Missing binary %r" % mira_convert
+    if not os.path.isfile(maf_file):
+        return "Missing input MIRA file: %r" % maf_file
+    if not os.path.isfile(fasta_file):
+        return "Missing padded FASTA file: %r" % fasta_file
+
+    log_handle.write("\n====================== Converting MIRA assembly to SAM =======================\n")
+    tmp_dir = tempfile.mkdtemp()
+    sam_file = os.path.join(tmp_dir, "x.sam")
+
+    # Note add nbb to the template name, possible MIRA 4.0 RC4 bug
+    cmd = '"%s" -f maf -t samnbb "%s" "%snbb"' % (mira_convert, maf_file, sam_file)
+    return_code = run(cmd, log_handle)
+    if return_code:
+        return "Error %i from command:\n%s" % (return_code, cmd)
+    if not os.path.isfile(sam_file):
+        return "Conversion from MIRA to SAM failed"
+
+    #Also doing SAM to (uncompressed) BAM during depad
+    msg = depad(fasta_file, sam_file, bam_file, log_handle)
+    if msg:
+        return msg
+
+    os.remove(sam_file)
+    os.rmdir(tmp_dir)
+
+    return None #Good :)
+
+if __name__ == "__main__":
+    mira_convert, maf_file, fasta_file, bam_file = sys.argv[1:]
+    msg = make_bam(mira_convert, maf_file, fasta_file, bam_file, sys.stdout)
+    if msg:
+        sys_exit(msg)
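Reviewer note: the ``run()`` helper in ``mira4_make_bam.py`` shortens very verbose ``miraconvert`` output before logging it, keeping only the first and last ten lines once the captured text passes 10000 characters. The following is a minimal Python 3 sketch of that truncation step in isolation; the function name ``truncate_log`` and its keyword parameters are hypothetical and not part of the wrapper.

```python
def truncate_log(text, max_chars=10000, keep=10):
    """Return text unchanged if short, else only its first and last `keep` lines.

    Hypothetical helper mirroring the truncation done inside run();
    the name and the keyword parameters are illustrative assumptions.
    """
    if len(text) <= max_chars:
        return text
    lines = text.split("\n")
    # Same marker lines as the wrapper script uses between head and tail
    return "\n".join(lines[:keep] + ["...", "<snip>", "..."] + lines[-keep:])


if __name__ == "__main__":
    verbose = "\n".join("line %d" % i for i in range(5000))
    print(truncate_log(verbose))
```

As in the original, output with fewer than twenty lines but more than 10000 characters would have some lines repeated in the head and tail; the sketch does not try to change that behaviour.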
diff -r 70248e6e3efc -r 4eb32a3d67d1 tools/mira4_0/mira4_mapping.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/mira4_0/mira4_mapping.xml Wed Sep 02 07:46:29 2015 -0400 |
b'@@ -0,0 +1,279 @@\n+<tool id="mira_4_0_mapping" name="MIRA v4.0 mapping" version="0.0.8">\n+ <description>Maps Sanger, Roche 454, Solexa/Illumina, Ion Torrent and PacBio reads</description>\n+ <requirements>\n+ <requirement type="binary">mira</requirement>\n+ <requirement type="binary">miraconvert</requirement>\n+ <requirement type="package" version="4.0.2">MIRA</requirement>\n+ <requirement type="binary">samtools</requirement>\n+ <requirement type="package" version="0.1.19">samtools</requirement>\n+ </requirements>\n+ <stdio>\n+ <!-- Assume anything other than zero is an error -->\n+ <exit_code range="1:" />\n+ <exit_code range=":-1" />\n+ </stdio>\n+ <version_command interpreter="python">mira4.py --version</version_command>\n+ <command interpreter="python">mira4.py\n+--manifest "$manifest"\n+#if str($maf_wanted) == "true":\n+--maf "$out_maf"\n+#end if\n+#if str($bam_wanted) == "true":\n+--bam "$out_bam"\n+#end if\n+--fasta "$out_fasta"\n+--log "$out_log"\n+ </command>\n+ <configfiles>\n+ <configfile name="manifest">\n+project = MIRA\n+job = mapping,${job_type},${job_quality}\n+parameters = -NW:cmrnl=no -DI:trt=/tmp -OUT:orc=no\n+## -GE:not is short for -GENERAL:number_of_threads and using one (1)\n+## can be useful for repeatability of assemblies and bug hunting.\n+## This is overriden by the command line -t switch which is easier\n+## to set from within Galaxy.\n+##\n+## -NW:cmrnl is short for -NAG_AND_WARN:check_maxreadnamelength\n+## and without this MIRA aborts with read names over 40 characters\n+## due to limitations of some downstream tools.\n+##\n+## -DI:trt is short for -DIRECTORY:tmp_redirected_to and should\n+## point to a local hard drive (not something like NFS on network).\n+## We replace /tmp with an environment variable via mira4.py\n+##\n+## -OUT:orc=no is short for -OUTPUT:output_result_caf=no\n+## which turns off an output file we don\'t want anyway.\n+\n+##This bar goes into the manifest as a comment 
line\n+#------------------------------------------------------------------------------\n+\n+readgroup\n+is_reference\n+#if str($strain_setup)=="same"\n+strain = StrainX\n+#end if\n+#for $f in $references\n+##Must now map Galaxy datatypes to MIRA file types...\n+#if $f.ext.startswith("fastq")\n+##MIRA doesn\'t like fastqsanger etc, just plain old fastq:\n+data = fastq::$f\n+#elif $f.ext == "mira"\n+##We\'re calling *.maf the "mira" format in Galaxy (name space collision)\n+data = maf::$f\n+#elif $f.ext == "fasta"\n+##We\'re calling MIRA with the file type as "fna" as otherwise it wants quals\n+data = fna::$f\n+#else\n+##Currently don\'t expect anything else...\n+data = ${f.ext}::$f\n+#end if\n+#end for\n+#for $rg in $read_group\n+\n+##This bar goes into the manifest as a comment line\n+#------------------------------------------------------------------------------\n+\n+readgroup\n+technology = ${rg.technology}\n+#if str($strain_setup)=="same"\n+##This is perhaps redundant as MIRA defaults to StrainX for the reads:\n+strain = StrainX\n+#end if\n+##Record the segment placement (if any)\n+#if str($rg.segments.type) == "paired"\n+segment_placement = ${rg.segments.placement}\n+segment_naming = ${rg.segments.naming}\n+#end if\n+##if str($rg.segments.type) == "none"\n+##MIRA4 manual says use segment_placement = unknown or ? for unpaired data\n+##but this stopped working in MIRA 4.0 RC5 and 4.0 (final). 
See:\n+##http://www.freelists.org/post/mira_talk/Unpaired-reads-and-segment-placement--or-unknown\n+##segment_placement = ?\n+##end if\n+##MIRA will accept multiple filenames on one data line, or multiple data lines\n+#for $f in $rg.filenames\n+##Must now map Galaxy datatypes to MIRA file types...\n+#if $f.ext.startswith("fastq")\n+##MIRA doesn\'t like fastqsanger etc, just plain old fastq:\n+data = fastq::$f\n+#elif $f.ext == "mira"\n+##We\'re calling *.maf the "mira" format in Galaxy (name space collision)\n+data = maf::$f\n+#else\n+##Currently don\'t expect anything else...\n+data = ${f.ext}::$f\n+#end if\n+#end for\n+#end for\n+ </co'..b'</test>\n+ <test>\n+ <param name="job_type" value="genome" />\n+ <param name="job_quality" value="accurate" />\n+ <param name="references" value="tvc_contigs.fasta" ftype="fasta" />\n+ <param name="strain_setup" value="same" />\n+ <param name="type" value="none" />\n+ <param name="filenames" value="tvc_mini.fastq" ftype="fastqsanger" />\n+ <param name="maf_wanted" value="false"/>\n+ <param name="bam_wanted" value="false"/>\n+ <output name="out_fasta" file="tvc_map_same_strain.fasta" ftype="fasta" />\n+ <output name="out_log" file="empty_file.dat" compare="contains" />\n+ </test>\n+ </tests>\n+ <help>\n+\n+**What it does**\n+\n+Runs MIRA v4.0 in mapping mode, collects the output, generates a sorted BAM\n+file, and throws away all the temporary files.\n+\n+MIRA is an open source assembly tool capable of handling sequence data from\n+a range of platforms (Sanger capillary, Solexa/Illumina, Roche 454, Ion Torrent\n+and also PacBio).\n+\n+It is particularly suited to small genomes such as bacteria.\n+\n+\n+**Notes on paired reads**\n+\n+.. class:: warningmark\n+\n+MIRA uses read naming conventions to identify paired read partners\n+(and does not care about their order in the input files). In most cases,\n+the Solexa/Illumina setting is fine. 
For Sanger capillary sequencing,\n+you may need to rename your reads to match one of the standard conventions\n+supported by MIRA. For Roche 454 or Ion Torrent the appropriate settings\n+depend on how the FASTQ file was produced:\n+\n+* If using Roche\'s ``sffinfo`` or older versions of ``sff_extract``\n+ to convert SFF files to FASTQ, your reads will probably have the\n+ ``---> <---`` orientation and use the ``.f`` and ``.r``\n+ suffixes (FR naming).\n+\n+* If using a recent version of ``sff_extract``, then the ``/1`` and ``/2``\n+ suffixes are used (Solexa/Illumina style naming) and the original\n+ ``2---> 1--->`` orientation is preserved.\n+\n+The reason for this is the raw data for Roche 454 and Ion Torrent paired-end\n+libraries sequences a circularised fragment such that the raw data begins\n+with the end of the fragment, a linker, then the start of the fragment.\n+This means both the start and end are sequenced from the same strand, and\n+have the orientation ``2---> 1--->``. However, in order to use the data\n+with traditional tools expecting Sanger capillary style ``---> <---``\n+orientation it was common to reverse complement one of the pair to mimic this.\n+\n+\n+**Citation**\n+\n+If you use this Galaxy tool in work leading to a scientific publication please\n+cite the following papers:\n+\n+Peter J.A. Cock, Bj\xc3\xb6rn A. Gr\xc3\xbcning, Konrad Paszkiewicz and Leighton Pritchard (2013).\n+Galaxy tools and workflows for sequence analysis with applications\n+in molecular plant pathology. PeerJ 1:e167\n+http://dx.doi.org/10.7717/peerj.167\n+\n+Bastien Chevreux, Thomas Wetter and S\xc3\xa1ndor Suhai (1999).\n+Genome Sequence Assembly Using Trace Signals and Additional Sequence Information.\n+Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB) 99, pp. 
45-56.\n+http://www.bioinfo.de/isb/gcb99/talks/chevreux/main.html\n+\n+This wrapper is available to install into other Galaxy Instances via the Galaxy\n+Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/mira4_assembler\n+ </help>\n+ <citations>\n+ <citation type="doi">10.7717/peerj.167</citation>\n+ <citation type="bibtex">@ARTICLE{Chevreux1999-mira3,\n+ author = {B. Chevreux and T. Wetter and S. Suhai},\n+ year = {1999},\n+ title = {Genome Sequence Assembly Using Trace Signals and Additional Sequence Information},\n+ journal = {Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB)}\n+ volume = {99},\n+ pages = {45-56},\n+ url = {http://www.bioinfo.de/isb/gcb99/talks/chevreux/main.html}\n+ }</citation>\n+ </citations>\n+</tool>\n' |
diff -r 70248e6e3efc -r 4eb32a3d67d1 tools/mira4_0/mira4_validator.py --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/mira4_0/mira4_validator.py Wed Sep 02 07:46:29 2015 -0400 |
@@ -0,0 +1,64 @@
+#Called from the Galaxy Tool XML file
+#import sys
+
+def validate_input(trans, error_map, param_values, page_param_map):
+    """Validates the min_size/max_size user input, before execution."""
+    err_list = []
+    for read_group in param_values["read_group"]:
+        err = dict()
+        segments = read_group["segments"]
+        if str(segments["type"]) != "paired":
+            err_list.append(dict())
+            continue
+
+        min_size = str(segments["min_size"]).strip()
+        max_size = str(segments["max_size"]).strip()
+        #sys.stderr.write("DEBUG min_size=%r, max_size=%r\n" % (min_size, max_size))
+
+        #Somehow Galaxy seems to turn an empty field into string "None"...
+        if min_size=="None":
+            min_size = ""
+        if max_size=="None":
+            max_size = ""
+
+        if min_size=="" and max_size=="":
+            #Both missing is good
+            pass
+        elif min_size=="":
+            err["min_size"] = "Minimum size required if maximum size given"
+        elif max_size=="":
+            err["max_size"] = "Maximum size required if minimum size given"
+
+        if min_size:
+            try:
+                min_size_int = int(min_size)
+                if min_size_int < 0:
+                    err["min_size"] = "Minimum size must not be negative (%i)" % min_size_int
+                    min_size = None # Avoid doing comparison below
+            except ValueError:
+                err["min_size"] = "Minimum size is not an integer (%s)" % min_size
+                min_size = None # Avoid doing comparison below
+
+        if max_size:
+            try:
+                max_size_int = int(max_size)
+                if max_size_int< 0:
+                    err["max_size"] = "Maximum size must not be negative (%i)" % max_size_int
+                    max_size = None # Avoid doing comparison below
+            except ValueError:
+                err["max_size"] = "Maximum size is not an integer (%s)" % max_size
+                max_size = None # Avoid doing comparison below
+
+        if min_size and max_size and min_size_int > max_size_int:
+            msg = "Minimum size must be less than maximum size (%i vs %i)" % (min_size_int, max_size_int)
+            err["min_size"] = msg
+            err["max_size"] = msg
+
+        if err:
+            err_list.append({"segments":err})
+        else:
+            err_list.append(dict())
+
+    if any(err_list):
+        #Return an error map only if any readgroup gave errors
+        error_map["read_group"] = err_list
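Reviewer note: the per-read-group checks in ``validate_input`` can be exercised outside Galaxy by pulling the min/max logic into a plain function. The sketch below is a Python 3 approximation under stated assumptions: ``check_template_size`` is a hypothetical name, the error wording is simplified, and it takes the two raw field values directly rather than Galaxy's ``param_values`` structure.

```python
def check_template_size(min_size, max_size):
    """Return a dict mapping field name to error message (empty if valid).

    Hypothetical stand-alone version of the min_size/max_size checks;
    not part of the wrapper itself.
    """
    err = {}
    # Galaxy may hand over an empty field as the string "None"
    min_size = str(min_size).strip()
    max_size = str(max_size).strip()
    if min_size == "None":
        min_size = ""
    if max_size == "None":
        max_size = ""

    # Either both values are given or neither is
    if bool(min_size) != bool(max_size):
        missing = "min_size" if not min_size else "max_size"
        err[missing] = "Both sizes are required if either is given"

    # Parse whichever values are present, recording only the valid ones
    values = {}
    for name, raw in (("min_size", min_size), ("max_size", max_size)):
        if not raw:
            continue
        try:
            value = int(raw)
        except ValueError:
            err[name] = "%s is not an integer (%s)" % (name, raw)
            continue
        if value < 0:
            err[name] = "%s must not be negative (%i)" % (name, value)
        else:
            values[name] = value

    # Only compare when both values parsed cleanly
    if len(values) == 2 and values["min_size"] > values["max_size"]:
        msg = "Minimum size must not exceed maximum size (%i vs %i)" % (
            values["min_size"], values["max_size"])
        err["min_size"] = msg
        err["max_size"] = msg
    return err


if __name__ == "__main__":
    print(check_template_size("500", "100"))
```

Collecting the parsed integers in a dict avoids the original's trick of resetting ``min_size`` or ``max_size`` to ``None`` to skip the final comparison, at the cost of diverging slightly from the committed code.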
diff -r 70248e6e3efc -r 4eb32a3d67d1 tools/mira4_0/repository_dependencies.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/mira4_0/repository_dependencies.xml Wed Sep 02 07:46:29 2015 -0400 |
@@ -0,0 +1,4 @@
+<?xml version="1.0"?>
+<repositories description="This requires the MIRA datatype definitions (e.g. the MIRA Assembly Format).">
+    <repository changeset_revision="ddd2e3362c5e" name="mira_datatypes" owner="peterjc" toolshed="https://toolshed.g2.bx.psu.edu" />
+</repositories>
diff -r 70248e6e3efc -r 4eb32a3d67d1 tools/mira4_0/tool_dependencies.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/mira4_0/tool_dependencies.xml Wed Sep 02 07:46:29 2015 -0400 |
@@ -0,0 +1,9 @@
+<?xml version="1.0"?>
+<tool_dependency>
+    <package name="samtools" version="0.1.19">
+        <repository changeset_revision="96aab723499f" name="package_samtools_0_1_19" owner="iuc" toolshed="https://toolshed.g2.bx.psu.edu" />
+    </package>
+    <package name="MIRA" version="4.0.2">
+        <repository changeset_revision="8564aa1dbbf5" name="package_mira_4_0_2" owner="peterjc" toolshed="https://toolshed.g2.bx.psu.edu" />
+    </package>
+</tool_dependency>
diff -r 70248e6e3efc -r 4eb32a3d67d1 tools/mira4_assembler/README.rst --- a/tools/mira4_assembler/README.rst Wed Aug 05 11:31:05 2015 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 |
@@ -1,177 +0,0 @@
-Galaxy wrapper for the MIRA assembly program (v4.0)
-===================================================
-
-This tool is copyright 2011-2014 by Peter Cock, The James Hutton Institute
-(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
-See the licence text below (MIT licence).
-
-This tool is a short Python script (to collect the MIRA output and move it
-to where Galaxy expects the files) and associated Galaxy wrapper XML file.
-
-It is available from the Galaxy Tool Shed at:
-http://toolshed.g2.bx.psu.edu/view/peterjc/mira4_assembler
-
-It uses a Galaxy datatype definition 'mira' for the MIRA Assembly Format,
-http://toolshed.g2.bx.psu.edu/view/peterjc/mira_datatypes
-
-A separate wrapper for MIRA v3.4 is available from the Galaxy Tool Shed at:
-http://toolshed.g2.bx.psu.edu/view/peterjc/mira_assembler
-
-Automated Installation
-======================
-
-This should be straightforward. Via the Tool Shed, Galaxy should automatically
-install the 'mira' datatype, samtools, and download and install the precompiled
-binary for MIRA v4.0.2 for the Galaxy wrapper, and run any tests.
-
-For MIRA 4, the Galaxy wrapper has been split in two, allowing separate
-cluster settings for de novo usage (high RAM) and mapping (lower RAM).
-Consult the Galaxy adminstration documentation for your cluster setup.
-
-WARNING: For larger tasks, be aware that MIRA can require vast amounts
-of RAM and run-times of over a week are possible. This tool wrapper makes
-no attempt to spot and reject such large jobs.
-
-
-Manual Installation
-===================
-
-First install the 'mira' datatype for Galaxy, available here:
-
-* http://toolshed.g2.bx.psu.edu/view/peterjc/mira_datatypes
-
-There are four Galaxy files to install:
-
-* ``mira4_de_novo.xml`` (the Galaxy tool definition for de novo usage)
-* ``mira4_mapping.xml`` (the Galaxy tool definition for mapping usage)
-* ``mira4_convert.xml`` (the Galaxy tool definition for converting MIRA files)
-* ``mira4_bait.xml`` (the Galaxy tool definition for mirabait)
-* ``mira4.py`` (the Python wrapper script)
-* ``mira4_convert.py`` (the Python wrapper script for miraconvert)
-* ``mira4_bait.py`` (the Python wrapper script for mirabait)
-* ``mira4_validator.py`` (the XML parameter validation script)
-
-The suggested location is a new ``tools/mira4`` folder. You will also need to
-modify the ``tools_conf.xml`` file to tell Galaxy to offer the tool::
-
-  <tool file="mira4/mira4_de_novo.xml" />
-  <tool file="mira4/mira4_mapping.xml" />
-
-You will also need to install MIRA, we used version 4.0.2, and define the
-environment variable ``$MIRA4`` pointing at the folder containing the binaries.
-See:
-
-* http://chevreux.org/projects_mira.html
-* http://sourceforge.net/projects/mira-assembler/
-
-You may wish to use different cluster setups for the de novo and mapping
-tools, see above.
-
-You will also need to install samtools (for generating a BAM file from MIRA's
-SAM output).
-
-If you wish to run the unit tests, also move/copy the ``test-data/`` files
-under Galaxy's ``test-data/`` folder. Then::
-
-    $ ./run_tests.sh -id mira_4_0_bait
-    $ ./run_tests.sh -id mira_4_0_de_novo
-    $ ./run_tests.sh -id mira_4_0_mapping
-    $ ./run_tests.sh -id mira_4_0_convert
-
-
-History
-=======
-
-======= ======================================================================
-Version Changes
-------- ----------------------------------------------------------------------
-v0.0.1  - Initial version (prototype for MIRA 4.0 RC4, based on wrapper for v3.4)
-v0.0.2  - Include BAM output (using ``miraconvert`` and ``samtools``).
-        - Updated to target MIRA 4.0.1
-        - Simplified XML to apply input format to output data.
-        - Sets temporary folder at run time to respect environment variables
-          (``$TMPDIR``, ``$TEMP``, or ``$TMP`` in that order). This was
-          previously hard coded as ``/tmp``.
-v0.0.3  - Updated to target MIRA 4.0.2
-v0.0.4  - Using ``optparse`` for the Python wrapper script API
-        - Made MAF and BAM outputs optional
-        - Include wrapper for ``miraconvert``
-v0.0.5  - Tool definition now embeds citation information.
-v0.0.6  - Fixed error handling in ``mira4_convert.py``.
-v0.0.7  - Renamed folder (internal change only).
-        - Reorder XML elements (internal change only).
-        - Use the ``format_source=...`` tag in the MIRA bait wrapper.
-        - Planemo for Tool Shed upload (``.shed.yml``, internal change only).
-        - MIRA 4.0.2 dependency now declared via dedicated Tool Shed package.
-======= ======================================================================
-
-
-Developers
-==========
-
-Development is on a dedicated GitHub repository:
-https://github.com/peterjc/pico_galaxy/tree/master/tools/mira4_assembler
-
-For pushing a release to the test or main "Galaxy Tool Shed", use the following
-Planemo commands (which requires you have set your Tool Shed access details in
-``~/.planemo.yml`` and that you have access rights on the Tool Shed)::
-
-    $ planemo shed_update --shed_target testtoolshed --check_diff ~/repositories/pico_galaxy/tools/mira4_assembler/
-    ...
-
-or::
-
-    $ planemo shed_update --shed_target toolshed --check_diff ~/repositories/pico_galaxy/tools/mira4_assembler/
-    ...
-
-To just build and check the tar ball, use::
-
-    $ planemo shed_upload --tar_only ~/repositories/pico_galaxy/tools/mira4_assembler/
-    ...
-    $ tar -tzf shed_upload.tar.gz
-    test-data/U13small_m.fastq
-    test-data/U13small_m.mira4_de_novo.fasta
-    test-data/ecoli.fastq
-    test-data/ecoli.mira4_de_novo.fasta
-    test-data/empty_file.dat
-    test-data/header.mira
-    test-data/tvc_mini.fastq
-    test-data/tvc_contigs.fasta
-    test-data/tvc_map_ref_strain.fasta
-    test-data/tvc_map_same_strain.fasta
-    test-data/tvc_bait.fasta
-    test-data/tvc_mini_bait_neg.fastq
-    test-data/tvc_mini_bait_pos.fastq
-    test-data/tvc_mini_bait_strict.fastq
-    tools/mira4_assembler/README.rst
-    tools/mira4_assembler/mira4.py
-    tools/mira4_assembler/mira4_bait.py
-    tools/mira4_assembler/mira4_convert.py
-    tools/mira4_assembler/mira4_de_novo.xml
-    tools/mira4_assembler/mira4_make_bam.py
-    tools/mira4_assembler/mira4_mapping.xml
-    tools/mira4_assembler/mira4_validator.py
-    tools/mira4_assembler/repository_dependencies.xml
-    tools/mira4_assembler/tool_dependencies.xml
-
-
-Licence (MIT)
-=============
-
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in
-all copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
-THE SOFTWARE.
diff -r 70248e6e3efc -r 4eb32a3d67d1 tools/mira4_assembler/mira4.py --- a/tools/mira4_assembler/mira4.py Wed Aug 05 11:31:05 2015 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 |
b'@@ -1,313 +0,0 @@\n-#!/usr/bin/env python\n-"""A simple wrapper script to call MIRA and collect its output.\n-"""\n-import os\n-import sys\n-import subprocess\n-import shutil\n-import time\n-import tempfile\n-from optparse import OptionParser\n-\n-#Do we need any PYTHONPATH magic?\n-from mira4_make_bam import make_bam\n-\n-WRAPPER_VER = "0.0.4" #Keep in sync with the XML file\n-\n-def sys_exit(msg, err=1):\n- sys.stderr.write(msg+"\\n")\n- sys.exit(err)\n-\n-\n-def get_version(mira_binary):\n- """Run MIRA to find its version number"""\n- # At the commend line I would use: mira -v | head -n 1\n- # however there is some pipe error when doing that here.\n- cmd = [mira_binary, "-v"]\n- try:\n- child = subprocess.Popen(cmd,\n- stdout=subprocess.PIPE,\n- stderr=subprocess.STDOUT)\n- except Exception, err:\n- sys.stderr.write("Error invoking command:\\n%s\\n\\n%s\\n" % (" ".join(cmd), err))\n- sys.exit(1)\n- ver, tmp = child.communicate()\n- del child\n- return ver.split("\\n", 1)[0].strip()\n-\n-#Parse Command Line\n-usage = """Galaxy MIRA4 wrapper script v%s - use as follows:\n-\n-$ python mira4.py ...\n-\n-This will run the MIRA binary and collect its output files as directed.\n-""" % WRAPPER_VER\n-parser = OptionParser(usage=usage)\n-parser.add_option("-m", "--manifest", dest="manifest",\n- default=None, metavar="FILE",\n- help="MIRA manifest filename")\n-parser.add_option("--maf", dest="maf",\n- default="-", metavar="FILE",\n- help="MIRA MAF output filename")\n-parser.add_option("--bam", dest="bam",\n- default="-", metavar="FILE",\n- help="Unpadded BAM output filename")\n-parser.add_option("--fasta", dest="fasta",\n- default="-", metavar="FILE",\n- help="Unpadded FASTA output filename")\n-parser.add_option("--log", dest="log",\n- default="-", metavar="FILE",\n- help="MIRA logging output filename")\n-parser.add_option("-v", "--version", dest="version",\n- default=False, action="store_true",\n- help="Show version and quit")\n-options, args = 
parser.parse_args()\n-manifest = options.manifest\n-out_maf = options.maf\n-out_bam = options.bam\n-out_fasta = options.fasta\n-out_log = options.log\n-\n-try:\n- mira_path = os.environ["MIRA4"]\n-except KeyError:\n- sys_exit("Environment variable $MIRA4 not set")\n-mira_binary = os.path.join(mira_path, "mira")\n-if not os.path.isfile(mira_binary):\n- sys_exit("Missing mira under $MIRA4, %r\\nFolder contained: %s"\n- % (mira_binary, ", ".join(os.listdir(mira_path))))\n-mira_convert = os.path.join(mira_path, "miraconvert")\n-if not os.path.isfile(mira_convert):\n- sys_exit("Missing miraconvert under $MIRA4, %r\\nFolder contained: %s"\n- % (mira_convert, ", ".join(os.listdir(mira_path))))\n-\n-mira_ver = get_version(mira_binary)\n-if not mira_ver.strip().startswith("4.0"):\n- sys_exit("This wrapper is for MIRA V4.0, not:\\n%s\\n%s" % (mira_ver, mira_binary))\n-mira_convert_ver = get_version(mira_convert)\n-if not mira_convert_ver.strip().startswith("4.0"):\n- sys_exit("This wrapper is for MIRA V4.0, not:\\n%s\\n%s" % (mira_ver, mira_convert))\n-if options.version:\n- print "%s, MIRA wrapper version %s" % (mira_ver, WRAPPER_VER)\n- if mira_ver != mira_convert_ver:\n- print "WARNING: miraconvert %s" % mira_convert_ver\n- sys.exit(0)\n-\n-if not manifest:\n- sys_exit("Manifest is required")\n-elif not os.path.isfile(manifest):\n- sys_exit("Missing input MIRA manifest file: %r" % manifest)\n-\n-\n-try:\n- threads = int(os.environ.get("GALAXY_SLOTS", "1"))\n-except ValueError:\n- threads = 1\n-assert 1 <= threads, threads\n-\n-\n-def override_temp(manifest):\n- """Override ``-DI:trt=/tmp`` in manifest with environment variable.\n-\n- Currently MIRA 4 does not allow envronment variables like ``$TMP``\n- inside the manifest, which'..b't_maf, ref_fasta, out_bam, handle)\n- else:\n- #Not collecting the MAF file, use original location \n- msg = make_bam(mira_convert, old_maf, ref_fasta, out_bam, handle)\n- if msg:\n- sys_exit(msg)\n-\n-def clean_up(temp, name):\n- folder = 
"%s/%s_assembly" % (temp, name)\n- if os.path.isdir(folder):\n- shutil.rmtree(folder)\n-\n-#TODO - Run MIRA in /tmp or a configurable directory?\n-#Currently Galaxy puts us somewhere safe like:\n-#/opt/galaxy-dist/database/job_working_directory/846/\n-temp = "."\n-\n-name = "MIRA"\n-\n-override_temp(manifest)\n-\n-start_time = time.time()\n-cmd_list = [mira_binary, "-t", str(threads), manifest]\n-cmd = " ".join(cmd_list)\n-\n-assert os.path.isdir(temp)\n-d = "%s_assembly" % name\n-#This can fail on my development machine if stale folders exist\n-#under Galaxy\'s .../database/job_working_directory/ tree:\n-assert not os.path.isdir(d), "Path %r already exists:\\n%s" % (d, os.path.abspath(d))\n-try:\n- #Check path access\n- os.mkdir(d)\n-except Exception, err:\n- log_manifest(manifest)\n- sys.stderr.write("Error making directory %s\\n%s" % (d, err))\n- sys.exit(1)\n-\n-#print os.path.abspath(".")\n-#print cmd\n-\n-if out_log and out_log != "-":\n- handle = open(out_log, "w")\n-else:\n- handle = open(os.devnull, "w")\n-handle.write("======================== MIRA manifest (instructions) ========================\\n")\n-m = open(manifest, "rU")\n-for line in m:\n- handle.write(line)\n-m.close()\n-del m\n-handle.write("\\n")\n-handle.write("============================ Starting MIRA now ===============================\\n")\n-handle.flush()\n-try:\n- #Run MIRA\n- child = subprocess.Popen(cmd_list,\n- stdout=handle,\n- stderr=subprocess.STDOUT)\n-except Exception, err:\n- log_manifest(manifest)\n- sys.stderr.write("Error invoking command:\\n%s\\n\\n%s\\n" % (cmd, err))\n- #TODO - call clean up?\n- handle.write("Error invoking command:\\n%s\\n\\n%s\\n" % (cmd, err))\n- handle.close()\n- sys.exit(1)\n-#Use .communicate as can get deadlocks with .wait(),\n-stdout, stderr = child.communicate()\n-assert not stdout and not stderr #Should be empty as sent to handle\n-run_time = time.time() - start_time\n-return_code = 
child.returncode\n-handle.write("\\n")\n-handle.write("============================ MIRA has finished ===============================\\n")\n-handle.write("MIRA took %0.2f hours\\n" % (run_time / 3600.0))\n-if return_code:\n- print "MIRA took %0.2f hours" % (run_time / 3600.0)\n- handle.write("Return error code %i from command:\\n" % return_code)\n- handle.write(cmd + "\\n")\n- handle.close()\n- clean_up(temp, name)\n- log_manifest(manifest)\n- sys_exit("Return error code %i from command:\\n%s" % (return_code, cmd),\n- return_code)\n-handle.flush()\n-\n-if os.path.isfile("MIRA_assembly/MIRA_d_results/ec.log"):\n- handle.write("\\n")\n- handle.write("====================== Extract Large Contigs failed ==========================\\n")\n- e = open("MIRA_assembly/MIRA_d_results/ec.log", "rU")\n- for line in e:\n- handle.write(line)\n- e.close()\n- handle.write("============================ (end of ec.log) =================================\\n")\n- handle.flush()\n-\n-#print "Collecting output..."\n-start_time = time.time()\n-collect_output(temp, name, handle)\n-collect_time = time.time() - start_time\n-handle.write("MIRA took %0.2f hours; collecting output %0.2f minutes\\n" % (run_time / 3600.0, collect_time / 60.0))\n-print("MIRA took %0.2f hours; collecting output %0.2f minutes\\n" % (run_time / 3600.0, collect_time / 60.0))\n-\n-if os.path.isfile("MIRA_assembly/MIRA_d_results/ec.log"):\n- #Treat as an error, but doing this AFTER collect_output\n- sys.stderr.write("Extract Large Contigs failed\\n")\n- handle.write("Extract Large Contigs failed\\n")\n- handle.close()\n- sys.exit(1)\n-\n-#print "Cleaning up..."\n-clean_up(temp, name)\n-\n-handle.write("\\nDone\\n")\n-handle.close()\n-print("Done")\n' |
diff -r 70248e6e3efc -r 4eb32a3d67d1 tools/mira4_assembler/mira4_bait.py --- a/tools/mira4_assembler/mira4_bait.py Wed Aug 05 11:31:05 2015 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 |
@@ -1,114 +0,0 @@ -#!/usr/bin/env python -"""A simple wrapper script to call MIRA4's mirabait and collect its output. -""" -import os -import sys -import subprocess -import shutil -import time - -WRAPPER_VER = "0.0.5" #Keep in sync with the XML file - -def sys_exit(msg, err=1): - sys.stderr.write(msg+"\n") - sys.exit(err) - - -def get_version(mira_binary): - """Run MIRA to find its version number""" - # At the commend line I would use: mira -v | head -n 1 - # however there is some pipe error when doing that here. - cmd = [mira_binary, "-v"] - try: - child = subprocess.Popen(cmd, - stdout=subprocess.PIPE, - stderr=subprocess.STDOUT) - except Exception, err: - sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (" ".join(cmd), err)) - sys.exit(1) - ver, tmp = child.communicate() - del child - #Workaround for -v not working in mirabait 4.0RC4 - if "invalid option" in ver.split("\n", 1)[0]: - for line in ver.split("\n", 1): - if " version " in line: - line = line.split() - return line[line.index("version")+1].rstrip(")") - sys_exit("Could not determine MIRA version:\n%s" % ver) - return ver.split("\n", 1)[0] - -try: - mira_path = os.environ["MIRA4"] -except KeyError: - sys_exit("Environment variable $MIRA4 not set") -mira_binary = os.path.join(mira_path, "mirabait") -if not os.path.isfile(mira_binary): - sys_exit("Missing mirabait under $MIRA4, %r\nFolder contained: %s" - % (mira_binary, ", ".join(os.listdir(mira_path)))) -mira_ver = get_version(mira_binary) -if not mira_ver.strip().startswith("4.0"): - sys_exit("This wrapper is for MIRA V4.0, not:\n%s" % mira_ver) -if "-v" in sys.argv or "--version" in sys.argv: - print "%s, MIRA wrapper version %s" % (mira_ver, WRAPPER_VER) - sys.exit(0) - - -format, output_choice, strand_choice, kmer_length, min_occurance, bait_file, in_file, out_file = sys.argv[1:] - -if format.startswith("fastq"): - format = "fastq" -elif format == "mira": - format = "maf" -elif format != "fasta": - sys_exit("Was not expected format %r" % 
format) - -assert out_file.endswith(".dat") -out_file_stem = out_file[:-4] - -cmd_list = [mira_binary, "-f", format, "-t", format, - "-k", kmer_length, "-n", min_occurance, - bait_file, in_file, out_file_stem] -if output_choice == "pos": - pass -elif output_choice == "neg": - #Invert the selection... - cmd_list.insert(1, "-i") -else: - sys_exit("Output choice should be 'pos' or 'neg', not %r" % output_choice) -if strand_choice == "both": - pass -elif strand_choice == "fwd": - #Ingore reverse strand... - cmd_list.insert(1, "-r") -else: - sys_exit("Strand choice should be 'both' or 'fwd', not %r" % strand_choice) - -cmd = " ".join(cmd_list) -#print cmd -start_time = time.time() -try: - #Run MIRA - child = subprocess.Popen(cmd_list, - stdout=subprocess.PIPE, - stderr=subprocess.STDOUT) -except Exception, err: - sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (cmd, err)) - sys.exit(1) -#Use .communicate as can get deadlocks with .wait(), -stdout, stderr = child.communicate() -assert stderr is None # Due to way we ran with subprocess -run_time = time.time() - start_time -return_code = child.returncode -print "mirabait took %0.2f minutes" % (run_time / 60.0) - -if return_code: - sys.stderr.write(stdout) - sys_exit("Return error code %i from command:\n%s" % (return_code, cmd), - return_code) - -#Capture output -out_tmp = out_file_stem + "." + format -if not os.path.isfile(out_tmp): - sys.stderr.write(stdout) - sys_exit("Missing output file from mirabait: %s" % out_tmp) -shutil.move(out_tmp, out_file) |
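Reviewer note: the removed mira4_bait.py above maps the Galaxy datatype to a mirabait file type and then builds the command list, inserting `-i` (invert selection) and `-r` (ignore reverse strand) as needed. The Python 3 sketch below restates that logic as a single function; the name `build_mirabait_cmd` is hypothetical, and only flags that appear in the removed script are used.

```python
def build_mirabait_cmd(binary, fmt, output_choice, strand_choice,
                       kmer_length, min_occurrence,
                       bait_file, in_file, out_stem):
    """Return the mirabait argument list (hypothetical helper)."""
    # Map Galaxy datatype names onto the file types mirabait expects
    if fmt.startswith("fastq"):
        fmt = "fastq"
    elif fmt == "mira":
        fmt = "maf"
    elif fmt != "fasta":
        raise ValueError("Was not expecting format %r" % fmt)

    cmd = [binary]
    if strand_choice == "fwd":
        cmd.append("-r")  # ignore the reverse strand
    elif strand_choice != "both":
        raise ValueError("Strand choice should be 'both' or 'fwd', not %r"
                         % strand_choice)
    if output_choice == "neg":
        cmd.append("-i")  # invert the selection
    elif output_choice != "pos":
        raise ValueError("Output choice should be 'pos' or 'neg', not %r"
                         % output_choice)
    cmd.extend(["-f", fmt, "-t", fmt,
                "-k", str(kmer_length), "-n", str(min_occurrence),
                bait_file, in_file, out_stem])
    return cmd
```

Raising `ValueError` instead of calling `sys_exit` is a choice made for this sketch so the function can be tested in isolation.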
diff -r 70248e6e3efc -r 4eb32a3d67d1 tools/mira4_assembler/mira4_convert.py --- a/tools/mira4_assembler/mira4_convert.py Wed Aug 05 11:31:05 2015 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 |
b'@@ -1,226 +0,0 @@\n-#!/usr/bin/env python\n-"""A simple wrapper script to call MIRA and collect its output.\n-\n-This focuses on the miraconvert binary.\n-"""\n-import os\n-import sys\n-import subprocess\n-import shutil\n-import time\n-import tempfile\n-from optparse import OptionParser\n-try:\n- from io import BytesIO\n-except ImportError:\n- #Should we worry about Python 2.5 or older?\n- from StringIO import StringIO as BytesIO\n-\n-#Do we need any PYTHONPATH magic?\n-from mira4_make_bam import depad\n-\n-WRAPPER_VER = "0.0.7" # Keep in sync with the XML file\n-\n-def sys_exit(msg, err=1):\n- sys.stderr.write(msg+"\\n")\n- sys.exit(err)\n-\n-def run(cmd):\n- #Avoid using shell=True when we call subprocess to ensure if the Python\n- #script is killed, so too is the child process.\n- try:\n- child = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)\n- except Exception, err:\n- sys_exit("Error invoking command:\\n%s\\n\\n%s\\n" % (" ".join(cmd), err))\n- #Use .communicate as can get deadlocks with .wait(),\n- stdout, stderr = child.communicate()\n- return_code = child.returncode\n- if return_code:\n- cmd_str = " ".join(cmd) # doesn\'t quote spaces etc\n- if stderr and stdout:\n- sys_exit("Return code %i from command:\\n%s\\n\\n%s\\n\\n%s" % (return_code, cmd_str, stdout, stderr))\n- else:\n- sys_exit("Return code %i from command:\\n%s\\n%s" % (return_code, cmd_str, stderr))\n-\n-def get_version(mira_binary):\n- """Run MIRA to find its version number"""\n- # At the commend line I would use: mira -v | head -n 1\n- # however there is some pipe error when doing that here.\n- cmd = [mira_binary, "-v"]\n- try:\n- child = subprocess.Popen(cmd,\n- stdout=subprocess.PIPE,\n- stderr=subprocess.STDOUT)\n- except Exception, err:\n- sys.stderr.write("Error invoking command:\\n%s\\n\\n%s\\n" % (" ".join(cmd), err))\n- sys.exit(1)\n- ver, tmp = child.communicate()\n- del child\n- return ver.split("\\n", 1)[0].strip()\n-\n-#Parse Command Line\n-usage = 
"""Galaxy MIRA4 wrapper script v%s - use as follows:\n-\n-$ python mira4_convert.py ...\n-\n-This will run the MIRA miraconvert binary and collect its output files as directed.\n-""" % WRAPPER_VER\n-parser = OptionParser(usage=usage)\n-parser.add_option("--input", dest="input",\n- default=None, metavar="FILE",\n- help="MIRA input filename")\n-parser.add_option("-x", "--min_length", dest="min_length",\n- default="0",\n- help="Minimum contig length")\n-parser.add_option("-y", "--min_cover", dest="min_cover",\n- default="0",\n- help="Minimum average contig coverage")\n-parser.add_option("-z", "--min_reads", dest="min_reads",\n- default="0",\n- help="Minimum reads per contig")\n-parser.add_option("--maf", dest="maf",\n- default="", metavar="FILE",\n- help="MIRA MAF output filename")\n-parser.add_option("--ace", dest="ace",\n- default="", metavar="FILE",\n- help="ACE output filename")\n-parser.add_option("--bam", dest="bam",\n- default="", metavar="FILE",\n- help="Unpadded BAM output filename")\n-parser.add_option("--fasta", dest="fasta",\n- default="", metavar="FILE",\n- help="Unpadded FASTA output filename")\n-parser.add_option("--cstats", dest="cstats",\n- default="", metavar="FILE",\n- help="Contig statistics filename")\n-parser.add_option("-v", "--version", dest="version",\n- default=False, action="store_true",\n- help="Show version and quit")\n-options, args = parser.parse_args()\n-if args:\n- sys_exit("Expected options (e.g. 
--input example.maf), not arguments")\n-\n-input_maf = options.input\n-out_maf = options.maf\n-out_bam = options.bam\n-out_fasta '..b's.fasta\n-out_ace = options.ace\n-out_cstats = options.cstats\n-\n-try:\n- mira_path = os.environ["MIRA4"]\n-except KeyError:\n- sys_exit("Environment variable $MIRA4 not set")\n-mira_convert = os.path.join(mira_path, "miraconvert")\n-if not os.path.isfile(mira_convert):\n- sys_exit("Missing miraconvert under $MIRA4, %r\\nFolder contained: %s"\n- % (mira_convert, ", ".join(os.listdir(mira_path))))\n-\n-mira_convert_ver = get_version(mira_convert)\n-if not mira_convert_ver.strip().startswith("4.0"):\n- sys_exit("This wrapper is for MIRA V4.0, not:\\n%s\\n%s" % (mira_convert_ver, mira_convert))\n-if options.version:\n- print("%s, MIRA wrapper version %s" % (mira_convert_ver, WRAPPER_VER))\n- sys.exit(0)\n-\n-if not input_maf:\n- sys_exit("Input MIRA file is required")\n-elif not os.path.isfile(input_maf):\n- sys_exit("Missing input MIRA file: %r" % input_maf)\n-\n-if not (out_maf or out_bam or out_fasta or out_ace or out_cstats):\n- sys_exit("No output requested")\n-\n-\n-def check_min_int(value, name):\n- try:\n- i = int(value)\n- except:\n- sys_exit("Bad %s setting, %r" % (name, value))\n- if i < 0:\n- sys_exit("Negative %s setting, %r" % (name, value))\n- return i\n-\n-min_length = check_min_int(options.min_length, "minimum length")\n-min_cover = check_min_int(options.min_cover, "minimum cover")\n-min_reads = check_min_int(options.min_reads, "minimum reads")\n-\n-#TODO - Run MIRA in /tmp or a configurable directory?\n-#Currently Galaxy puts us somewhere safe like:\n-#/opt/galaxy-dist/database/job_working_directory/846/\n-temp = "."\n-\n-\n-cmd_list = [mira_convert]\n-if min_length:\n- cmd_list.extend(["-x", str(min_length)])\n-if min_cover:\n- cmd_list.extend(["-y", str(min_cover)])\n-if min_reads:\n- cmd_list.extend(["-z", str(min_reads)])\n-cmd_list.extend(["-f", "maf", input_maf, os.path.join(temp, "converted")])\n-if out_maf:\n- 
cmd_list.append("maf")\n-if out_bam:\n- cmd_list.append("samnbb")\n- if not out_fasta:\n- #Need this for samtools depad\n- out_fasta = os.path.join(temp, "depadded.fasta")\n-if out_fasta:\n- cmd_list.append("fasta")\n-if out_ace:\n- cmd_list.append("ace")\n-if out_cstats:\n- cmd_list.append("cstats")\n-run(cmd_list)\n-\n-def collect(old, new):\n- if not os.path.isfile(old):\n- sys_exit("Missing expected output file %s" % old)\n- shutil.move(old, new)\n-\n-if out_maf:\n- collect(os.path.join(temp, "converted.maf"), out_maf)\n-if out_fasta:\n- #Can we look at the MAF file to see if there are multiple strains?\n- old = os.path.join(temp, "converted_AllStrains.unpadded.fasta")\n- if os.path.isfile(old):\n- collect(old, out_fasta)\n- else:\n- #Might the output be filtered down to zero contigs?\n- old = os.path.join(temp, "converted.fasta")\n- if not os.path.isfile(old):\n- sys_exit("Missing expected output FASTA file")\n- elif os.path.getsize(old) == 0:\n- print("Warning - no contigs (harsh filters?)")\n- collect(old, out_fasta)\n- else:\n- sys_exit("Missing expected output FASTA file (only generic file present)")\n-if out_ace:\n- collect(os.path.join(temp, "converted.maf"), out_ace)\n-if out_cstats:\n- collect(os.path.join(temp, "converted_info_contigstats.txt"), out_cstats)\n-\n-if out_bam:\n- assert os.path.isfile(out_fasta)\n- old = os.path.join(temp, "converted.samnbb")\n- if not os.path.isfile(old):\n- old = os.path.join(temp, "converted.sam")\n- if not os.path.isfile(old):\n- sys_exit("Missing expected intermediate file %s" % old)\n- h = BytesIO()\n- msg = depad(out_fasta, old, out_bam, h)\n- if msg:\n- print(msg)\n- print(h.getvalue())\n- h.close()\n- sys.exit(1)\n- h.close()\n- if out_fasta == os.path.join(temp, "depadded.fasta"):\n- #Not asked for by Galaxy, no longer needed\n- os.remove(out_fasta)\n-\n-if min_length or min_cover or min_reads:\n- print("Filtered.")\n-else:\n- print("Converted.")\n' |
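Reviewer note: the removed mira4_convert.py above validates the three filter settings with `check_min_int` and only passes `-x`, `-y` and `-z` to miraconvert when the value is non-zero. A Python 3 sketch of that step is given here; `filter_args` is a hypothetical name, and the sketch raises `ValueError` where the original calls `sys_exit`.

```python
def check_min_int(value, name):
    """Return value as a non-negative int, or raise ValueError."""
    try:
        number = int(value)
    except (TypeError, ValueError):
        raise ValueError("Bad %s setting, %r" % (name, value))
    if number < 0:
        raise ValueError("Negative %s setting, %r" % (name, value))
    return number


def filter_args(min_length, min_cover, min_reads):
    """Build the optional miraconvert filter arguments (hypothetical helper)."""
    args = []
    settings = (("-x", min_length, "minimum length"),
                ("-y", min_cover, "minimum cover"),
                ("-z", min_reads, "minimum reads"))
    for flag, value, name in settings:
        number = check_min_int(value, name)
        if number:
            # Zero means "no filter", so the flag is left out entirely
            args.extend([flag, str(number)])
    return args
```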
diff -r 70248e6e3efc -r 4eb32a3d67d1 tools/mira4_assembler/mira4_de_novo.xml --- a/tools/mira4_assembler/mira4_de_novo.xml Wed Aug 05 11:31:05 2015 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 |
b'@@ -1,275 +0,0 @@\n-<tool id="mira_4_0_de_novo" name="MIRA v4.0 de novo assember" version="0.0.7">\n- <description>Takes Sanger, Roche 454, Solexa/Illumina, Ion Torrent and PacBio reads</description>\n- <requirements>\n- <requirement type="binary">mira</requirement>\n- <requirement type="binary">miraconvert</requirement>\n- <requirement type="package" version="4.0.2">MIRA</requirement>\n- <requirement type="binary">samtools</requirement>\n- <requirement type="package" version="0.1.19">samtools</requirement>\n- </requirements>\n- <code file="mira4_validator.py" />\n- <stdio>\n- <!-- Assume anything other than zero is an error -->\n- <exit_code range="1:" />\n- <exit_code range=":-1" />\n- </stdio>\n- <version_command interpreter="python">mira4.py --version</version_command>\n- <command interpreter="python">mira4.py\n---manifest "$manifest"\n-#if str($maf_wanted)=="true":\n---maf "$out_maf"\n-#end if\n-#if str($bam_wanted)=="true":\n---bam "$out_bam"\n-#end if\n---fasta "$out_fasta"\n---log "$out_log"\n- </command>\n- <configfiles>\n- <configfile name="manifest">\n-project = MIRA\n-job = denovo,${job_type},${job_quality}\n-parameters = -NW:cmrnl=no -DI:trt=/tmp -OUT:orc=no\n-## -GE:not is short for -GENERAL:number_of_threads and using one (1)\n-## can be useful for repeatability of assemblies and bug hunting.\n-## This is overriden by the command line -t switch which is easier\n-## to set from within Galaxy.\n-##\n-## -NW:cmrnl is short for -NAG_AND_WARN:check_maxreadnamelength\n-## and without this MIRA aborts with read names over 40 characters\n-## due to limitations of some downstream tools.\n-##\n-## -DI:trt is short for -DIRECTORY:tmp_redirected_to and should\n-## point to a local hard drive (not something like NFS on network).\n-## We replace /tmp with an environment variable via mira4.py\n-##\n-## -OUT:orc=no is short for -OUTPUT:output_result_caf=no \n-## which turns off an output file we don\'t want anyway.\n-\n-#for $rg in $read_group\n-\n-##This bar goes 
into the manifest as a comment line\n-#------------------------------------------------------------------------------\n-\n-readgroup\n-technology = ${rg.technology}\n-##Record the segment placement (if any)\n-#if str($rg.segments.type) == "paired"\n-segment_placement = ${rg.segments.placement}\n-segment_naming = ${rg.segments.naming}\n-#if str($rg.segments.min_size) != "" or str($rg.segments.max_size) != ""\n-##If our min/max validation failed I trust MIRA to give an error message...\n-template_size = $rg.segments.min_size $rg.segments.max_size\n-#end if\n-#end if\n-##if str($rg.segments.type) == "none"\n-##MIRA4 manual says use segment_placement = unknown or ? for unpaired data\n-##but this stopped working in MIRA 4.0 RC5 and 4.0 (final). See:\n-##http://www.freelists.org/post/mira_talk/Unpaired-reads-and-segment-placement--or-unknown\n-##segment_placement = ?\n-##end if\n-##MIRA will accept multiple filenames on one data line, or multiple data lines\n-#for $f in $rg.filenames\n-##Must now map Galaxy datatypes to MIRA file types...\n-#if $f.ext.startswith("fastq")\n-##MIRA doesn\'t like fastqsanger etc, just plain old fastq:\n-data = fastq::$f\n-#elif $f.ext == "mira"\n-##We\'re calling *.maf the "mira" format in Galaxy (name space collision)\n-data = maf::$f\n-#else\n-##MIRA is happy with fasta as name,\n-data = ${f.ext}::$f\n-#end if\n-#end for\n-#end for\n- </configfile>\n- </configfiles>\n- <inputs>\n- <param name="job_type" type="select" label="Assembly type">\n- <option value="genome">Genome</option>\n- <option value="est">EST (transcriptome)</option>\n- </param>\n- <param name="job_quality" type="select" label="Assembly quality grade">\n- <option value="accurate">Accurate</option>\n- <option value="draft">Draft</option>\n- </param>\n- <repeat name="read_group" title="Read Group" min="1">\n- <param name="technology" type="select" label="Read technology">\n- '..b'\n- Note we\'re using just one repeat group,\n- but two parameters within the repeat (filename, 
no pairing)\n- -->\n- <test>\n- <param name="job_type" value="genome" />\n- <param name="job_quality" value="accurate" />\n- <param name="type" value="none" />\n- <param name="filenames" value="ecoli.fastq" ftype="fastqsanger" />\n- <param name="maf_wanted" value="false"/>\n- <param name="bam_wanted" value="false"/>\n- <output name="out_fasta" file="ecoli.mira4_de_novo.fasta" ftype="fasta" />\n- <output name="out_log" file="empty_file.dat" compare="contains" />\n- </test>\n- </tests>\n- <help>\n-\n-**What it does**\n-\n-Runs MIRA v4.0 in de novo mode, collects the output, generates a sorted BAM\n-file, and then throws away all the temporary files.\n-\n-MIRA is an open source assembly tool capable of handling sequence data from\n-a range of platforms (Sanger capillary, Solexa/Illumina, Roche 454, Ion Torrent\n-and also PacBio).\n-\n-It is particularly suited to small genomes such as bacteria.\n-\n-\n-**Notes on paired reads**\n-\n-.. class:: warningmark\n-\n-MIRA uses read naming conventions to identify paired read partners\n-(and does not care about their order in the input files). In most cases,\n-the Solexa/Illumina setting is fine. For Sanger capillary sequencing,\n-you may need to rename your reads to match one of the standard conventions\n-supported by MIRA. 
For Roche 454 or Ion Torrent the appropriate settings\n-depend on how the FASTQ file was produced:\n-\n-* If using Roche\'s ``sffinfo`` or older versions of ``sff_extract``\n- to convert SFF files to FASTQ, your reads will probably have the\n- ``---> <---`` orientation and use the ``.f`` and ``.r``\n- suffixes (FR naming).\n-\n-* If using a recent version of ``sff_extract``, then the ``/1`` and ``/2``\n- suffixes are used (Solexa/Illumina style naming) and the original\n- ``2---> 1--->`` orientation is preserved.\n-\n-The reason for this is the raw data for Roche 454 and Ion Torrent paired-end\n-libraries sequences a circularised fragment such that the raw data begins\n-with the end of the fragment, a linker, then the start of the fragment.\n-This means both the start and end are sequenced from the same strand, and\n-have the orientation ``2---> 1--->``. However, in order to use the data\n-with traditional tools expecting Sanger capillary style ``---> <---``\n-orientation it was common to reverse complement one of the pair to mimic this.\n-\n-\n-**Citation**\n-\n-If you use this Galaxy tool in work leading to a scientific publication please\n-cite the following papers:\n-\n-Peter J.A. Cock, Bj\xc3\xb6rn A. Gr\xc3\xbcning, Konrad Paszkiewicz and Leighton Pritchard (2013).\n-Galaxy tools and workflows for sequence analysis with applications\n-in molecular plant pathology. PeerJ 1:e167\n-http://dx.doi.org/10.7717/peerj.167\n-\n-Bastien Chevreux, Thomas Wetter and S\xc3\xa1ndor Suhai (1999).\n-Genome Sequence Assembly Using Trace Signals and Additional Sequence Information.\n-Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB) 99, pp. 
45-56.\n-http://www.bioinfo.de/isb/gcb99/talks/chevreux/main.html\n-\n-This wrapper is available to install into other Galaxy Instances via the Galaxy\n-Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/mira4_assembler\n- </help>\n- <citations>\n- <citation type="doi">10.7717/peerj.167</citation>\n- <citation type="bibtex">@ARTICLE{Chevreux1999-mira3,\n- author = {B. Chevreux and T. Wetter and S. Suhai},\n- year = {1999},\n- title = {Genome Sequence Assembly Using Trace Signals and Additional Sequence Information},\n- journal = {Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB)}\n- volume = {99},\n- pages = {45-56},\n- url = {http://www.bioinfo.de/isb/gcb99/talks/chevreux/main.html}\n- }</citation>\n- </citations>\n-</tool>\n' |
diff -r 70248e6e3efc -r 4eb32a3d67d1 tools/mira4_assembler/mira4_make_bam.py --- a/tools/mira4_assembler/mira4_make_bam.py Wed Aug 05 11:31:05 2015 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 |
@@ -1,92 +0,0 @@ -#!/usr/bin/env python -"""Wrapper script using miraconvert & samtools to get BAM from MIRA. -""" -import os -import sys -import shutil -import subprocess -import tempfile - -def sys_exit(msg, err=1): - sys.stderr.write(msg+"\n") - sys.exit(err) - -def run(cmd, log_handle): - try: - child = subprocess.Popen(cmd, shell=True, - stdout=subprocess.PIPE, - stderr=subprocess.STDOUT) - except Exception, err: - sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (cmd, err)) - #TODO - call clean up? - log_handle.write("Error invoking command:\n%s\n\n%s\n" % (cmd, err)) - sys.exit(1) - #Use .communicate as can get deadlocks with .wait(), - stdout, stderr = child.communicate() - assert not stderr #Should be empty as sent to stdout - if len(stdout) > 10000: - #miraconvert can be very verbose (is holding stdout in RAM a problem?) - stdout = stdout.split("\n") - stdout = stdout[:10] + ["...", "<snip>", "..."] + stdout[-10:] - stdout = "\n".join(stdout) - log_handle.write(stdout) - return child.returncode - -def depad(fasta_file, sam_file, bam_file, log_handle): - log_handle.write("\n================= Converting MIRA assembly from SAM to BAM ===================\n") - #Also doing SAM to (uncompressed) BAM during depad - bam_stem = bam_file + ".tmp" # Have write permissions and want final file in this folder - cmd = 'samtools depad -S -u -T "%s" "%s" | samtools sort - "%s"' % (fasta_file, sam_file, bam_stem) - return_code = run(cmd, log_handle) - if return_code: - return "Error %i from command:\n%s" % (return_code, cmd) - if not os.path.isfile(bam_stem + ".bam"): - return "samtools depad or sort failed to produce BAM file" - - log_handle.write("\n====================== Indexing MIRA assembly BAM file =======================\n") - cmd = 'samtools index "%s.bam"' % bam_stem - return_code = run(cmd, log_handle) - if return_code: - return "Error %i from command:\n%s" % (return_code, cmd) - if not os.path.isfile(bam_stem + ".bam.bai"): - return "samtools indexing 
of BAM file failed to produce BAI file" - - shutil.move(bam_stem + ".bam", bam_file) - os.remove(bam_stem + ".bam.bai") #Let Galaxy handle that... - - -def make_bam(mira_convert, maf_file, fasta_file, bam_file, log_handle): - if not os.path.isfile(mira_convert): - return "Missing binary %r" % mira_convert - if not os.path.isfile(maf_file): - return "Missing input MIRA file: %r" % maf_file - if not os.path.isfile(fasta_file): - return "Missing padded FASTA file: %r" % fasta_file - - log_handle.write("\n====================== Converting MIRA assembly to SAM =======================\n") - tmp_dir = tempfile.mkdtemp() - sam_file = os.path.join(tmp_dir, "x.sam") - - # Note add nbb to the template name, possible MIRA 4.0 RC4 bug - cmd = '"%s" -f maf -t samnbb "%s" "%snbb"' % (mira_convert, maf_file, sam_file) - return_code = run(cmd, log_handle) - if return_code: - return "Error %i from command:\n%s" % (return_code, cmd) - if not os.path.isfile(sam_file): - return "Conversion from MIRA to SAM failed" - - #Also doing SAM to (uncompressed) BAM during depad - msg = depad(fasta_file, sam_file, bam_file, log_handle) - if msg: - return msg - - os.remove(sam_file) - os.rmdir(tmp_dir) - - return None #Good :) - -if __name__ == "__main__": - mira_convert, maf_file, fasta_file, bam_file = sys.argv[1:] - msg = make_bam(mira_convert, maf_file, fasta_file, bam_file, sys.stdout) - if msg: - sys_exit(msg) |
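Reviewer note: `run()` in the removed mira4_make_bam.py above trims very verbose miraconvert output to its first and last ten lines before logging it. The Python 3 sketch below isolates that trimming; `snip_log` is a hypothetical name. As a deliberate variation, it leaves the text untouched when there are too few lines for the head and tail to be distinct, a case the original does not check.

```python
def snip_log(text, max_chars=10000, keep=10):
    """Shorten long output to its first and last `keep` lines."""
    if len(text) <= max_chars:
        return text
    lines = text.split("\n")
    if len(lines) <= 2 * keep:
        # Few but very long lines: head and tail would overlap, keep as-is
        return text
    snipped = lines[:keep] + ["...", "<snip>", "..."] + lines[-keep:]
    return "\n".join(snipped)
```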
diff -r 70248e6e3efc -r 4eb32a3d67d1 tools/mira4_assembler/mira4_mapping.xml --- a/tools/mira4_assembler/mira4_mapping.xml Wed Aug 05 11:31:05 2015 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 |
b'@@ -1,279 +0,0 @@\n-<tool id="mira_4_0_mapping" name="MIRA v4.0 mapping" version="0.0.7">\n- <description>Maps Sanger, Roche 454, Solexa/Illumina, Ion Torrent and PacBio reads</description>\n- <requirements>\n- <requirement type="binary">mira</requirement>\n- <requirement type="binary">miraconvert</requirement>\n- <requirement type="package" version="4.0.2">MIRA</requirement>\n- <requirement type="binary">samtools</requirement>\n- <requirement type="package" version="0.1.19">samtools</requirement>\n- </requirements>\n- <stdio>\n- <!-- Assume anything other than zero is an error -->\n- <exit_code range="1:" />\n- <exit_code range=":-1" />\n- </stdio>\n- <version_command interpreter="python">mira4.py --version</version_command>\n- <command interpreter="python">mira4.py\n---manifest "$manifest"\n-#if str($maf_wanted) == "true":\n---maf "$out_maf"\n-#end if\n-#if str($bam_wanted) == "true":\n---bam "$out_bam"\n-#end if\n---fasta "$out_fasta"\n---log "$out_log"\n- </command>\n- <configfiles>\n- <configfile name="manifest">\n-project = MIRA\n-job = mapping,${job_type},${job_quality}\n-parameters = -NW:cmrnl=no -DI:trt=/tmp -OUT:orc=no\n-## -GE:not is short for -GENERAL:number_of_threads and using one (1)\n-## can be useful for repeatability of assemblies and bug hunting.\n-## This is overriden by the command line -t switch which is easier\n-## to set from within Galaxy.\n-##\n-## -NW:cmrnl is short for -NAG_AND_WARN:check_maxreadnamelength\n-## and without this MIRA aborts with read names over 40 characters\n-## due to limitations of some downstream tools.\n-##\n-## -DI:trt is short for -DIRECTORY:tmp_redirected_to and should\n-## point to a local hard drive (not something like NFS on network).\n-## We replace /tmp with an environment variable via mira4.py\n-##\n-## -OUT:orc=no is short for -OUTPUT:output_result_caf=no\n-## which turns off an output file we don\'t want anyway.\n-\n-##This bar goes into the manifest as a comment 
line\n-#------------------------------------------------------------------------------\n-\n-readgroup\n-is_reference\n-#if str($strain_setup)=="same"\n-strain = StrainX\n-#end if\n-#for $f in $references\n-##Must now map Galaxy datatypes to MIRA file types...\n-#if $f.ext.startswith("fastq")\n-##MIRA doesn\'t like fastqsanger etc, just plain old fastq:\n-data = fastq::$f\n-#elif $f.ext == "mira"\n-##We\'re calling *.maf the "mira" format in Galaxy (name space collision)\n-data = maf::$f\n-#elif $f.ext == "fasta"\n-##We\'re calling MIRA with the file type as "fna" as otherwise it wants quals\n-data = fna::$f\n-#else\n-##Currently don\'t expect anything else...\n-data = ${f.ext}::$f\n-#end if\n-#end for\n-#for $rg in $read_group\n-\n-##This bar goes into the manifest as a comment line\n-#------------------------------------------------------------------------------\n-\n-readgroup\n-technology = ${rg.technology}\n-#if str($strain_setup)=="same"\n-##This is perhaps redundant as MIRA defaults to StrainX for the reads:\n-strain = StrainX\n-#end if\n-##Record the segment placement (if any)\n-#if str($rg.segments.type) == "paired"\n-segment_placement = ${rg.segments.placement}\n-segment_naming = ${rg.segments.naming}\n-#end if\n-##if str($rg.segments.type) == "none"\n-##MIRA4 manual says use segment_placement = unknown or ? for unpaired data\n-##but this stopped working in MIRA 4.0 RC5 and 4.0 (final). 
See:\n-##http://www.freelists.org/post/mira_talk/Unpaired-reads-and-segment-placement--or-unknown\n-##segment_placement = ?\n-##end if\n-##MIRA will accept multiple filenames on one data line, or multiple data lines\n-#for $f in $rg.filenames\n-##Must now map Galaxy datatypes to MIRA file types...\n-#if $f.ext.startswith("fastq")\n-##MIRA doesn\'t like fastqsanger etc, just plain old fastq:\n-data = fastq::$f\n-#elif $f.ext == "mira"\n-##We\'re calling *.maf the "mira" format in Galaxy (name space collision)\n-data = maf::$f\n-#else\n-##Currently don\'t expect anything else...\n-data = ${f.ext}::$f\n-#end if\n-#end for\n-#end for\n- </co'..b'</test>\n- <test>\n- <param name="job_type" value="genome" />\n- <param name="job_quality" value="accurate" />\n- <param name="references" value="tvc_contigs.fasta" ftype="fasta" />\n- <param name="strain_setup" value="same" />\n- <param name="type" value="none" />\n- <param name="filenames" value="tvc_mini.fastq" ftype="fastqsanger" />\n- <param name="maf_wanted" value="false"/>\n- <param name="bam_wanted" value="false"/>\n- <output name="out_fasta" file="tvc_map_same_strain.fasta" ftype="fasta" />\n- <output name="out_log" file="empty_file.dat" compare="contains" />\n- </test>\n- </tests>\n- <help>\n-\n-**What it does**\n-\n-Runs MIRA v4.0 in mapping mode, collects the output, generates a sorted BAM\n-file, and throws away all the temporary files.\n-\n-MIRA is an open source assembly tool capable of handling sequence data from\n-a range of platforms (Sanger capillary, Solexa/Illumina, Roche 454, Ion Torrent\n-and also PacBio).\n-\n-It is particularly suited to small genomes such as bacteria.\n-\n-\n-**Notes on paired reads**\n-\n-.. class:: warningmark\n-\n-MIRA uses read naming conventions to identify paired read partners\n-(and does not care about their order in the input files). In most cases,\n-the Solexa/Illumina setting is fine. 
For Sanger capillary sequencing,\n-you may need to rename your reads to match one of the standard conventions\n-supported by MIRA. For Roche 454 or Ion Torrent the appropriate settings\n-depend on how the FASTQ file was produced:\n-\n-* If using Roche\'s ``sffinfo`` or older versions of ``sff_extract``\n- to convert SFF files to FASTQ, your reads will probably have the\n- ``---> <---`` orientation and use the ``.f`` and ``.r``\n- suffixes (FR naming).\n-\n-* If using a recent version of ``sff_extract``, then the ``/1`` and ``/2``\n- suffixes are used (Solexa/Illumina style naming) and the original\n- ``2---> 1--->`` orientation is preserved.\n-\n-The reason for this is the raw data for Roche 454 and Ion Torrent paired-end\n-libraries sequences a circularised fragment such that the raw data begins\n-with the end of the fragment, a linker, then the start of the fragment.\n-This means both the start and end are sequenced from the same strand, and\n-have the orientation ``2---> 1--->``. However, in order to use the data\n-with traditional tools expecting Sanger capillary style ``---> <---``\n-orientation it was common to reverse complement one of the pair to mimic this.\n-\n-\n-**Citation**\n-\n-If you use this Galaxy tool in work leading to a scientific publication please\n-cite the following papers:\n-\n-Peter J.A. Cock, Bj\xc3\xb6rn A. Gr\xc3\xbcning, Konrad Paszkiewicz and Leighton Pritchard (2013).\n-Galaxy tools and workflows for sequence analysis with applications\n-in molecular plant pathology. PeerJ 1:e167\n-http://dx.doi.org/10.7717/peerj.167\n-\n-Bastien Chevreux, Thomas Wetter and S\xc3\xa1ndor Suhai (1999).\n-Genome Sequence Assembly Using Trace Signals and Additional Sequence Information.\n-Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB) 99, pp. 
45-56.\n-http://www.bioinfo.de/isb/gcb99/talks/chevreux/main.html\n-\n-This wrapper is available to install into other Galaxy Instances via the Galaxy\n-Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/mira4_assembler\n- </help>\n- <citations>\n- <citation type="doi">10.7717/peerj.167</citation>\n- <citation type="bibtex">@ARTICLE{Chevreux1999-mira3,\n- author = {B. Chevreux and T. Wetter and S. Suhai},\n- year = {1999},\n- title = {Genome Sequence Assembly Using Trace Signals and Additional Sequence Information},\n- journal = {Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB)}\n- volume = {99},\n- pages = {45-56},\n- url = {http://www.bioinfo.de/isb/gcb99/talks/chevreux/main.html}\n- }</citation>\n- </citations>\n-</tool>\n' |
diff -r 70248e6e3efc -r 4eb32a3d67d1 tools/mira4_assembler/mira4_validator.py --- a/tools/mira4_assembler/mira4_validator.py Wed Aug 05 11:31:05 2015 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 |
@@ -1,64 +0,0 @@ -#Called from the Galaxy Tool XML file -#import sys - -def validate_input(trans, error_map, param_values, page_param_map): - """Validates the min_size/max_size user input, before execution.""" - err_list = [] - for read_group in param_values["read_group"]: - err = dict() - segments = read_group["segments"] - if str(segments["type"]) != "paired": - err_list.append(dict()) - continue - - min_size = str(segments["min_size"]).strip() - max_size = str(segments["max_size"]).strip() - #sys.stderr.write("DEBUG min_size=%r, max_size=%r\n" % (min_size, max_size)) - - #Somehow Galaxy seems to turn an empty field into string "None"... - if min_size=="None": - min_size = "" - if max_size=="None": - max_size = "" - - if min_size=="" and max_size=="": - #Both missing is good - pass - elif min_size=="": - err["min_size"] = "Minimum size required if maximum size given" - elif max_size=="": - err["max_size"] = "Maximum size required if minimum size given" - - if min_size: - try: - min_size_int = int(min_size) - if min_size_int < 0: - err["min_size"] = "Minumum size must not be negative (%i)" % min_size_int - min_size = None # Avoid doing comparison below - except ValueError: - err["min_size"] = "Minimum size is not an integer (%s)" % min_size - min_size = None # Avoid doing comparison below - - if max_size: - try: - max_size_int = int(max_size) - if max_size_int< 0: - err["max_size"] = "Maximum size must not be negative (%i)" % max_size_int - max_size = None # Avoid doing comparison below - except ValueError: - err["max_size"] = "Maximum size is not an integer (%s)" % max_size - max_size = None # Avoid doing comparison below - - if min_size and max_size and min_size_int > max_size_int: - msg = "Minimum size must be less than maximum size (%i vs %i)" % (min_size_int, max_size_int) - err["min_size"] = msg - err["max_size"] = msg - - if err: - err_list.append({"segments":err}) - else: - err_list.append(dict()) - - if any(err_list): - #Return an error map only if any 
readgroup gave errors - error_map["read_group"] = err_list |
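Reviewer note: the removed mira4_validator.py above checks each paired read group's `min_size` and `max_size`: both may be blank, but if one is given the other is required, both must be non-negative integers, and the minimum must not exceed the maximum. The Python 3 sketch below restates those rules for a single read group, outside Galaxy; `check_template_size` is a hypothetical name and the error wording is only approximate.

```python
def check_template_size(min_size, max_size):
    """Return a dict of field name -> error message (empty when valid)."""
    errors = {}
    # Galaxy can turn an empty field into the string "None"
    texts = {}
    for key, raw in (("min_size", min_size), ("max_size", max_size)):
        text = "" if raw is None else str(raw).strip()
        texts[key] = "" if text == "None" else text

    if not texts["min_size"] and not texts["max_size"]:
        return errors  # both missing is fine
    if not texts["min_size"]:
        errors["min_size"] = "Minimum size required if maximum size given"
    if not texts["max_size"]:
        errors["max_size"] = "Maximum size required if minimum size given"

    values = {}
    for key, text in texts.items():
        if not text:
            continue
        try:
            number = int(text)
        except ValueError:
            errors[key] = "%s is not an integer (%s)" % (key, text)
            continue
        if number < 0:
            errors[key] = "%s must not be negative (%i)" % (key, number)
            continue
        values[key] = number

    # Only compare when both values parsed cleanly
    if len(values) == 2 and values["min_size"] > values["max_size"]:
        msg = ("Minimum size must not exceed maximum size (%i vs %i)"
               % (values["min_size"], values["max_size"]))
        errors["min_size"] = msg
        errors["max_size"] = msg
    return errors
```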
diff -r 70248e6e3efc -r 4eb32a3d67d1 tools/mira4_assembler/repository_dependencies.xml --- a/tools/mira4_assembler/repository_dependencies.xml Wed Aug 05 11:31:05 2015 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 |
@@ -1,4 +0,0 @@ -<?xml version="1.0"?> -<repositories description="This requires the MIRA datatype definitions (e.g. the MIRA Assembly Format)."> - <repository changeset_revision="ddd2e3362c5e" name="mira_datatypes" owner="peterjc" toolshed="https://toolshed.g2.bx.psu.edu" /> -</repositories> |
diff -r 70248e6e3efc -r 4eb32a3d67d1 tools/mira4_assembler/tool_dependencies.xml --- a/tools/mira4_assembler/tool_dependencies.xml Wed Aug 05 11:31:05 2015 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 |
@@ -1,9 +0,0 @@ -<?xml version="1.0"?> -<tool_dependency> - <package name="samtools" version="0.1.19"> - <repository changeset_revision="96aab723499f" name="package_samtools_0_1_19" owner="iuc" toolshed="https://toolshed.g2.bx.psu.edu" /> - </package> - <package name="MIRA" version="4.0.2"> - <repository changeset_revision="8564aa1dbbf5" name="package_mira_4_0_2" owner="peterjc" toolshed="https://toolshed.g2.bx.psu.edu" /> - </package> -</tool_dependency> |