Repository 'vcfs2fasta'
hg clone https://toolshed.g2.bx.psu.edu/repos/ulfschaefer/vcfs2fasta

Changeset 21:b09ffe50c378 (2015-12-23)
Previous changeset 20:7ac17b6d031e (2015-12-18)
Next changeset 22:96f393ad7fc6 (2015-12-23)
Commit message:
Uploaded
removed:
LICENSE
phe/__init__.py
phe/metadata/__init__.py
phe/variant/GATKVariantCaller.py
phe/variant/MPileupVariantCaller.py
phe/variant/__init__.py
phe/variant/variant_factory.py
phe/variant_filters/__init__.py
test-data/1_short.vcf
test-data/2_short.vcf
test-data/testresult.fa
tool_dependencies.xml
vcfs2fasta.py
vcfs2fasta.sh
vcfs2fasta.xml
diff -r 7ac17b6d031e -r b09ffe50c378 LICENSE
--- a/LICENSE Fri Dec 18 07:31:09 2015 -0500
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,676 +0,0 @@
-
-
-                    GNU GENERAL PUBLIC LICENSE
-                       Version 3, 29 June 2007
-
- Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
- Everyone is permitted to copy and distribute verbatim copies
- of this license document, but changing it is not allowed.
-
-                            Preamble
-
-  The GNU General Public License is a free, copyleft license for
-software and other kinds of works.
-
-  The licenses for most software and other practical works are designed
-to take away your freedom to share and change the works.  By contrast,
-the GNU General Public License is intended to guarantee your freedom to
-share and change all versions of a program--to make sure it remains free
-software for all its users.  We, the Free Software Foundation, use the
-GNU General Public License for most of our software; it applies also to
-any other work released this way by its authors.  You can apply it to
-your programs, too.
-
-  When we speak of free software, we are referring to freedom, not
-price.  Our General Public Licenses are designed to make sure that you
-have the freedom to distribute copies of free software (and charge for
-them if you wish), that you receive source code or can get it if you
-want it, that you can change the software or use pieces of it in new
-free programs, and that you know you can do these things.
-
-  To protect your rights, we need to prevent others from denying you
-these rights or asking you to surrender the rights.  Therefore, you have
-certain responsibilities if you distribute copies of the software, or if
-you modify it: responsibilities to respect the freedom of others.
-
-  For example, if you distribute copies of such a program, whether
-gratis or for a fee, you must pass on to the recipients the same
-freedoms that you received.  You must make sure that they, too, receive
-or can get the source code.  And you must show them these terms so they
-know their rights.
-
-  Developers that use the GNU GPL protect your rights with two steps:
-(1) assert copyright on the software, and (2) offer you this License
-giving you legal permission to copy, distribute and/or modify it.
-
-  For the developers' and authors' protection, the GPL clearly explains
-that there is no warranty for this free software.  For both users' and
-authors' sake, the GPL requires that modified versions be marked as
-changed, so that their problems will not be attributed erroneously to
-authors of previous versions.
-
-  Some devices are designed to deny users access to install or run
-modified versions of the software inside them, although the manufacturer
-can do so.  This is fundamentally incompatible with the aim of
-protecting users' freedom to change the software.  The systematic
-pattern of such abuse occurs in the area of products for individuals to
-use, which is precisely where it is most unacceptable.  Therefore, we
-have designed this version of the GPL to prohibit the practice for those
-products.  If such problems arise substantially in other domains, we
-stand ready to extend this provision to those domains in future versions
-of the GPL, as needed to protect the freedom of users.
-
-  Finally, every program is threatened constantly by software patents.
-States should not allow patents to restrict development and use of
-software on general-purpose computers, but in those that do, we wish to
-avoid the special danger that patents applied to a free program could
-make it effectively proprietary.  To prevent this, the GPL assures that
-patents cannot be used to render the program non-free.
-
-  The precise terms and conditions for copying, distribution and
-modification follow.
-
-                       TERMS AND CONDITIONS
-
-  0. Definitions.
-
-  "This License" refers to version 3 of the GNU General Public License.
-
-  "Copyright" also means copyright-like laws that apply to other kinds of
-works, such as semiconductor masks.
-
-  "The Program" refers
[...]
CE OF THE PROGRAM
-IS WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
-ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
-
-  16. Limitation of Liability.
-
-  IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
-WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
-THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
-GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
-USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
-DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
-PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
-EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
-SUCH DAMAGES.
-
-  17. Interpretation of Sections 15 and 16.
-
-  If the disclaimer of warranty and limitation of liability provided
-above cannot be given local legal effect according to their terms,
-reviewing courts shall apply local law that most closely approximates
-an absolute waiver of all civil liability in connection with the
-Program, unless a warranty or assumption of liability accompanies a
-copy of the Program in return for a fee.
-
-                     END OF TERMS AND CONDITIONS
-
-            How to Apply These Terms to Your New Programs
-
-  If you develop a new program, and you want it to be of the greatest
-possible use to the public, the best way to achieve this is to make it
-free software which everyone can redistribute and change under these terms.
-
-  To do so, attach the following notices to the program.  It is safest
-to attach them to the start of each source file to most effectively
-state the exclusion of warranty; and each file should have at least
-the "copyright" line and a pointer to where the full notice is found.
-
-    {one line to give the program's name and a brief idea of what it does.}
-    Copyright (C) {year}  {name of author}
-
-    This program is free software: you can redistribute it and/or modify
-    it under the terms of the GNU General Public License as published by
-    the Free Software Foundation, either version 3 of the License, or
-    (at your option) any later version.
-
-    This program is distributed in the hope that it will be useful,
-    but WITHOUT ANY WARRANTY; without even the implied warranty of
-    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-    GNU General Public License for more details.
-
-    You should have received a copy of the GNU General Public License
-    along with this program.  If not, see <http://www.gnu.org/licenses/>.
-
-Also add information on how to contact you by electronic and paper mail.
-
-  If the program does terminal interaction, make it output a short
-notice like this when it starts in an interactive mode:
-
-    {project}  Copyright (C) {year}  {fullname}
-    This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
-    This is free software, and you are welcome to redistribute it
-    under certain conditions; type `show c' for details.
-
-The hypothetical commands `show w' and `show c' should show the appropriate
-parts of the General Public License.  Of course, your program's commands
-might be different; for a GUI interface, you would use an "about box".
-
-  You should also get your employer (if you work as a programmer) or school,
-if any, to sign a "copyright disclaimer" for the program, if necessary.
-For more information on this, and how to apply and follow the GNU GPL, see
-<http://www.gnu.org/licenses/>.
-
-  The GNU General Public License does not permit incorporating your program
-into proprietary programs.  If your program is a subroutine library, you
-may consider it more useful to permit linking proprietary applications with
-the library.  If this is what you want to do, use the GNU Lesser General
-Public License instead of this License.  But first, please read
-<http://www.gnu.org/philosophy/why-not-lgpl.html>.
diff -r 7ac17b6d031e -r b09ffe50c378 phe/__init__.py
--- a/phe/__init__.py Fri Dec 18 07:31:09 2015 -0500
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,4 +0,0 @@
-if __name__ == "phe":
-    # If this package is added as library, append extended path.
-    from pkgutil import extend_path
-    __path__ = extend_path(__path__, __name__)
\ No newline at end of file
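The removed `phe/__init__.py` makes `phe` an extensible package. When it is imported as `phe`, `pkgutil.extend_path` appends every other `phe/` directory found on `sys.path` to the package's `__path__`, so modules installed in separate locations can all be imported as `phe.*`. The sketch below shows that effect with two throwaway directories created by `tempfile`. The directory layout and the `make_portion` helper are hypothetical.

```python
import os
import pkgutil
import sys
import tempfile


def make_portion(root, name):
    """Create root/<name>/__init__.py and return the package directory."""
    pkg_dir = os.path.join(root, name)
    os.makedirs(pkg_dir)
    open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    return pkg_dir


# Two independent install locations, each shipping its own "phe" package.
root_a = tempfile.mkdtemp()
root_b = tempfile.mkdtemp()
portion_a = make_portion(root_a, "phe")
portion_b = make_portion(root_b, "phe")

sys.path[:0] = [root_a, root_b]
try:
    # Start from the __path__ that a regular import of root_a/phe would produce.
    merged_path = pkgutil.extend_path([portion_a], "phe")
finally:
    sys.path.remove(root_a)
    sys.path.remove(root_b)

print(merged_path)  # portion_a stays first; portion_b is appended after it
```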
diff -r 7ac17b6d031e -r b09ffe50c378 phe/metadata/__init__.py
--- a/phe/metadata/__init__.py Fri Dec 18 07:31:09 2015 -0500
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,16 +0,0 @@
-"""Metadata related information."""
-
-import abc
-
-class PHEMetaData(object):
-    """Abstract class to provide interface for meta-data creation."""
-
-    __metaclass__ = abc.ABCMeta
-
-    def __init__(self):
-        pass
-
-    @abc.abstractmethod
-    def get_meta(self):
-        """Get the metadata."""
-        raise NotImplementedError("get meta has not been implemented yet.")
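`PHEMetaData` declares its abstract interface with the Python 2 `__metaclass__ = abc.ABCMeta` class attribute. Python 3 ignores that attribute, so `@abc.abstractmethod` would not be enforced there. The following is a minimal Python 3 sketch of the same interface. It assumes that inheriting from `abc.ABC` is an acceptable substitute. The `StaticMeta` subclass is hypothetical and exists only for illustration.

```python
import abc
from collections import OrderedDict


class PHEMetaData(abc.ABC):
    """Abstract interface for meta-data creation (Python 3 spelling)."""

    @abc.abstractmethod
    def get_meta(self):
        """Return the metadata."""


class StaticMeta(PHEMetaData):
    """Hypothetical concrete implementation used only for illustration."""

    def get_meta(self):
        return OrderedDict([("ID", "Example")])


try:
    PHEMetaData()
except TypeError as exc:
    # ABCMeta refuses to instantiate a class with unimplemented abstract methods.
    print("cannot instantiate:", exc)

print(StaticMeta().get_meta())
```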
diff -r 7ac17b6d031e -r b09ffe50c378 phe/variant/GATKVariantCaller.py
--- a/phe/variant/GATKVariantCaller.py Fri Dec 18 07:31:09 2015 -0500
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,126 +0,0 @@
-'''
-Created on 22 Sep 2015
-
-@author: alex
-'''
-from collections import OrderedDict
-import logging
-import os
-import subprocess
-
-from phe.variant import VariantCaller
-
-
-class GATKVariantCaller(VariantCaller):
-    """Implementation of the Broad Institute's variant caller."""
-
-    name = "gatk"
-    """Plain text name of the variant caller."""
-
-    _default_options = "--sample_ploidy 2 --genotype_likelihoods_model BOTH -rf BadCigar -out_mode EMIT_ALL_SITES -nt 1"
-    """Default options for the variant caller."""
-
-    def __init__(self, cmd_options=None):
-        """Constructor"""
-        if cmd_options is None:
-            cmd_options = self._default_options
-
-        super(GATKVariantCaller, self).__init__(cmd_options=cmd_options)
-
-        self.last_command = None
-
-    def get_info(self, plain=False):
-        d = {"name": "gatk", "version": self.get_version(), "command": self.last_command}
-
-        if plain:
-            result = "GATK(%(version)s): %(command)s" % d
-        else:
-            result = OrderedDict(d)
-
-        return result
-
-    def get_version(self):
-
-        p = subprocess.Popen(["java", "-jar", os.environ["GATK_JAR"], "-version"], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
-        (output, _) = p.communicate()
-
-        # last character is EOL.
-        version = output.split("\n")[-2]
-
-        return version
-
-    def make_vcf(self, *args, **kwargs):
-        ref = kwargs.get("ref")
-        bam = kwargs.get("bam")
-
-        if kwargs.get("vcf_file") is None:
-            kwargs["vcf_file"] = "variants.vcf"
-
-        opts = {"ref": os.path.abspath(ref),
-                "bam": os.path.abspath(bam),
-                "gatk_jar": os.environ["GATK_JAR"],
-                "all_variants_file": os.path.abspath(kwargs.get("vcf_file")),
-                "extra_cmd_options": self.cmd_options}
-
-#         if not self.create_aux_files(ref):
-#             logging.warn("Auxiliary files were not created.")
-#             return False
-
-        # Call variants
-        # FIXME: Sample ploidy = 2?
-        os.environ["GATK_JAR"]
-        cmd = "java -XX:+UseSerialGC -jar %(gatk_jar)s -T UnifiedGenotyper -R %(ref)s -I %(bam)s -o %(all_variants_file)s %(extra_cmd_options)s" % opts
-        success = os.system(cmd)
-
-        if success != 0:
-            logging.warn("Calling variants returned non-zero exit status.")
-            return False
-
-        self.last_command = cmd
-
-        return True
-
-    def create_aux_files(self, ref):
-        """Create auxiliary files needed for this variant.
-
-        Tools needed: samtools and picard tools. Picard tools is a Java
-        library that can be defined using environment variable: PICARD_JAR
-        specifying path to picard.jar or PICARD_TOOLS_PATH specifying path
-        to the directory where separate jars are (older version before jars
-        were merged into a single picard.jar).
-        Parameters:
-        -----------
-        ref: str
-            Path to the reference file.
-        
-        Returns:
-        --------
-        bool:
-            True if auxiliary files were created, False otherwise.
-        """
-
-        ref_name, _ = os.path.splitext(ref)
-
-        success = os.system("samtools faidx %s" % ref)
-
-        if success != 0:
-            logging.warn("Fasta index could not be created.")
-            return False
-
-        d = {"ref": ref, "ref_name": ref_name}
-
-        if os.environ.get("PICARD_TOOLS_PATH"):
-            d["picard_tools_path"] = os.path.join(os.environ["PICARD_TOOLS_PATH"], "CreateSequenceDictionary.jar")
-        elif os.environ.get("PICARD_JAR"):
-            # This is used in newer version of PICARD tool where multiple
-            #    jars were merged into a single jar file.
-            d["picard_tools_path"] = "%s %s" % (os.environ["PICARD_JAR"], "CreateSequenceDictionary")
-        else:
-            logging.error("Picard tools are not present in the path.")
-            return False
-
-        success = os.system("java -jar %(picard_tools_path)s R=%(ref)s O=%(ref_name)s.dict" % d)
-
-        if success != 0:
-            logging.warn("Dictionary for the %s reference could not be created", ref)
-            return False
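`make_vcf` interpolates paths into a single shell string and runs it with `os.system`. A path containing spaces or shell metacharacters would therefore break the command. The hunk also shows two smaller details:

- The bare `os.environ["GATK_JAR"]` statement has no effect, because the same lookup already happens when `opts` is built.
- `create_aux_files` ends without `return True`, so it returns `None` on success.

The sketch below builds the same UnifiedGenotyper invocation as an argument list for `subprocess`, assuming the option string is split with `shlex`. The function name and the example paths are hypothetical, and Java/GATK are not invoked.

```python
import os
import shlex

# Default options copied from the removed GATKVariantCaller._default_options.
DEFAULT_GATK_OPTIONS = ("--sample_ploidy 2 --genotype_likelihoods_model BOTH "
                        "-rf BadCigar -out_mode EMIT_ALL_SITES -nt 1")


def build_unified_genotyper_argv(gatk_jar, ref, bam, vcf_file="variants.vcf",
                                 extra_options=DEFAULT_GATK_OPTIONS):
    """Return the UnifiedGenotyper command as an argv list (no shell involved)."""
    argv = [
        "java", "-XX:+UseSerialGC", "-jar", gatk_jar,
        "-T", "UnifiedGenotyper",
        "-R", os.path.abspath(ref),
        "-I", os.path.abspath(bam),
        "-o", os.path.abspath(vcf_file),
    ]
    # shlex.split honours quoting, so a quoted option value stays one argument.
    argv.extend(shlex.split(extra_options))
    return argv


argv = build_unified_genotyper_argv("/opt/gatk/GenomeAnalysisTK.jar",
                                    "/data/ref with space.fa", "/data/sample.bam")
print(argv)
```

Passing such a list to `subprocess.run(argv)` keeps the path with a space as a single argument. A `returncode` of 0 would then play the role of the `success != 0` check in the removed code.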
diff -r 7ac17b6d031e -r b09ffe50c378 phe/variant/MPileupVariantCaller.py
--- a/phe/variant/MPileupVariantCaller.py Fri Dec 18 07:31:09 2015 -0500
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,96 +0,0 @@
-'''
-Created on 22 Sep 2015
-
-@author: alex
-'''
-from collections import OrderedDict
-import logging
-import os
-import subprocess
-import tempfile
-
-from phe.variant import VariantCaller
-
-
-class MPileupVariantCaller(VariantCaller):
-    """Implementation of the samtools mpileup / bcftools call variant caller."""
-
-    name = "mpileup"
-    """Plain text name of the variant caller."""
-
-    _default_options = "-m -f GQ"
-    """Default options for the variant caller."""
-
-    def __init__(self, cmd_options=None):
-        """Constructor"""
-        if cmd_options is None:
-            cmd_options = self._default_options
-
-        super(MPileupVariantCaller, self).__init__(cmd_options=cmd_options)
-
-        self.last_command = None
-
-    def get_info(self, plain=False):
-        d = {"name": self.name, "version": self.get_version(), "command": self.last_command}
-
-        if plain:
-            result = "mpileup(%(version)s): %(command)s" % d
-        else:
-            result = OrderedDict(d)
-
-        return result
-
-    def get_version(self):
-
-        p = subprocess.Popen(["samtools", "--version"], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
-        (output, _) = p.communicate()
-
-        # first line is the version of the samtools
-        version = output.split("\n")[0].split(" ")[1]
-
-        return version
-
-    def make_vcf(self, *args, **kwargs):
-        ref = kwargs.get("ref")
-        bam = kwargs.get("bam")
-
-        if kwargs.get("vcf_file") is None:
-            kwargs["vcf_file"] = "variants.vcf"
-
-        opts = {"ref": os.path.abspath(ref),
-                "bam": os.path.abspath(bam),
-                "all_variants_file": os.path.abspath(kwargs.get("vcf_file")),
-                "extra_cmd_options": self.cmd_options}
-
-        with tempfile.NamedTemporaryFile(suffix=".pileup") as tmp:
-            opts["pileup_file"] = tmp.name
-            cmd = "samtools mpileup -t DP,DV,DP4,DPR,SP -Auf %(ref)s %(bam)s | bcftools call %(extra_cmd_options)s > %(all_variants_file)s" % opts
-            print cmd
-            self.last_command = cmd
-            if os.system(cmd) != 0:
-                logging.warn("Pileup creation was not successful.")
-                return False
-
-        return True
-
-    def create_aux_files(self, ref):
-        """Index reference with faidx from samtools.
-
-        Parameters:
-        -----------
-        ref: str
-            Path to the reference file.
-
-        Returns:
-        --------
-        bool:
-            True if auxiliary files were created, False otherwise.
-        """
-
-        success = os.system("samtools faidx %s" % ref)
-
-        if success != 0:
-            logging.warn("Fasta index could not be created.")
-            return False
-
-        return True
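Two details of the removed `MPileupVariantCaller` stand out in the hunk:

- `print cmd` is Python 2 syntax.
- The temporary `.pileup` file is created, but `opts["pileup_file"]` is never used by the command.

Its `get_version` takes the second whitespace-separated token of the first line of `samtools --version`. The sketch below shows that parsing in Python 3. It assumes output of the form `samtools 1.2` on the first line; the sample text is illustrative, not captured from a real run. Text decoding (`text=True`) is needed because `communicate()` and `run()` return bytes by default in Python 3.

```python
import subprocess


def parse_samtools_version(output):
    """Return the version token from `samtools --version` output.

    Mirrors the removed logic: first line, second whitespace-separated field.
    """
    first_line = output.splitlines()[0]
    return first_line.split()[1]


def samtools_version():
    """Run samtools and parse its version (requires samtools on PATH)."""
    result = subprocess.run(["samtools", "--version"],
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            text=True, check=True)
    return parse_samtools_version(result.stdout)


sample_output = "samtools 1.2\nUsing htslib 1.2.1\n"
print(parse_samtools_version(sample_output))
```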
diff -r 7ac17b6d031e -r b09ffe50c378 phe/variant/__init__.py
--- a/phe/variant/__init__.py Fri Dec 18 07:31:09 2015 -0500
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,288 +0,0 @@
-"""Classes and methods to work with variants and such."""
-import abc
-#ulf
-# from collections import OrderedDict
-try:
-    from collections import OrderedDict
-except ImportError:
-    from ordereddict import OrderedDict
-    
-import logging
-import pickle
-
-from vcf import filters
-import vcf
-from vcf.parser import _Filter
-
-from phe.metadata import PHEMetaData
-from phe.variant_filters import make_filters, PHEFilterBase, str_to_filters
-
-
-class VCFTemplate(object):
-    """This is a small hack class for the Template used in generating
-    VCF file."""
-
-    def __init__(self, vcf_reader):
-        self.infos = vcf_reader.infos
-        self.formats = vcf_reader.formats
-        self.filters = vcf_reader.filters
-        self.alts = vcf_reader.alts
-        self.contigs = vcf_reader.contigs
-        self.metadata = vcf_reader.metadata
-        self._column_headers = vcf_reader._column_headers
-        self.samples = vcf_reader.samples
-
-class VariantSet(object):
-    """A convenient representation of set of variants.
-    TODO: Implement iterator and generator for the variant set.
-    """
-
-    _reader = None
-
-    def __init__(self, vcf_in, filters=None):
-        """Constructor of variant set.
-
-        Parameters:
-        -----------
-        vcf_in: str
-            Path to the VCF file for loading information.
-        filters: str or dict, optional
-            Dictionary or string of the filter:threshold key value pairs.
-        """
-        self.vcf_in = vcf_in
-        self._reader = vcf.Reader(filename=vcf_in)
-        self.out_template = VCFTemplate(self._reader)
-
-        self.filters = []
-        if filters is not None:
-            if isinstance(filters, str):
-                self.filters = str_to_filters(filters)
-            elif isinstance(filters, dict):
-                self.filters = make_filters(config=filters)
-            elif isinstance(filters, list):
-                self.filters = filters
-            else:
-                logging.warn("Could not create filters from %s", filters)
-        else:
-            reader = vcf.Reader(filename=self.vcf_in)
-            filters = {}
-            for filter_id in reader.filters:
-                filters.update(PHEFilterBase.decode(filter_id))
-
-            if filters:
-                self.filters = make_filters(config=filters)
-
-        self.variants = []
-
-    def filter_variants(self, keep_only_snps=True):
-        """Create a variant """
-
-        if self._reader is None:
-            # Create a reader class from input VCF.
-            self._reader = vcf.Reader(filename=self.vcf_in)
-
-        # get list of existing filters.
-        existing_filters = {}
-        removed_filters = []
-
-        for filter_id in self._reader.filters:
-            conf = PHEFilterBase.decode(filter_id)
-            tuple(conf.keys())
-            existing_filters.update({tuple(conf.keys()):filter_id})
-
-        # Add each filter we are going to use to the record.
-        # This is needed for writing out proper #FILTER header in VCF.
-        for record_filter in self.filters:
-            # We know that each filter has short description method.
-            short_doc = record_filter.short_desc()
-            short_doc = short_doc.split('\n')[0].lstrip()
-
-            filter_name = PHEFilterBase.decode(record_filter.filter_name())
-
-            # Check if the sample has been filtered for this type of filter
-            #    in the past. If so remove is, because it is going to be refiltered.
-            if tuple(filter_name) in existing_filters:
-                logging.info("Removing existing filter: %s", existing_filters[tuple(filter_name)])
-                removed_filters.append(existing_filters[tuple(filter_name)])
-                del self._reader.filters[existing_filters[tuple(filter_name)]]
-
-            self._reader.filters[record_filter.filter_name()] = _Filter(record_filter.filter_name(), short_doc)
-
-        
[...]
    (default: False).
-
-        Returns:
-        int:
-            Number of records written.
-        """
-        written_variants = 0
-        with open(vcf_out, "w") as out_vcf:
-            writer = vcf.Writer(out_vcf, self.out_template)
-            for record in self.variants:
-
-                if only_snps and not record.is_snp:
-                    continue
-
-                if only_good and record.FILTER != "PASS" or record.FILTER is None:
-                    continue
-
-                writer.write_record(record)
-                written_variants += 1
-
-        return written_variants
-
-    def _write_bad_variants(self, vcf_out):
-        """**PRIVATE:** Write only those records that **haven't** passed."""
-        written_variants = 0
-        with open(vcf_out, "w") as out_vcf:
-            writer = vcf.Writer(out_vcf, self.out_template)
-            for record in self.variants:
-                if record.FILTER != "PASS" and record.FILTER is not None:
-                    writer.write_record(record)
-                    written_variants += 1
-        return written_variants
-
-    def serialise(self, out_file):
-        """Save the data in this class to a file for future use/reload.
-
-        Parameters:
-        -----------
-        out_file: str
-            path to file where the data should be written to.
-
-        Returns:
-        --------
-        int:
-            Number of variants written.
-        """
-        written_variants = 0
-        with open(out_file, "w") as out_vcf:
-            writer = vcf.Writer(out_vcf, self.out_template)
-            for record in self.variants:
-                writer.write_record(record)
-                written_variants += 1
-
-        return written_variants
-
-    def update_filters(self, new_filters):
-        """Update internal filters in the output template."""
-        for new_filter, filter_data in new_filters.items():
-            self.out_template.filters[new_filter] = filter_data
-
-
-class VariantCaller(PHEMetaData):
-    """Abstract class used for access to the implemented variant callers."""
-
-    __metaclass__ = abc.ABCMeta
-
-    def __init__(self, cmd_options=None):
-        """Constructor for variant caller.
-
-        Parameters:
-        -----------
-        cmd_options: str, optional
-            Command options to pass to the variant command.
-        """
-        self.cmd_options = cmd_options
-
-        super(VariantCaller, self).__init__()
-
-    @abc.abstractmethod
-    def make_vcf(self, *args, **kwargs):
-        """Create a VCF from **BAM** file.
-
-        Parameters:
-        -----------
-        ref: str
-            Path to the reference file.
-        bam: str
-            Path to the indexed **BAM** file for calling variants.
-        vcf_file: str
-            path to the VCF file where data will be written to.
-
-        Returns:
-        --------
-        bool:
-            True if variant calling was successful, False otherwise.
-        """
-        raise NotImplementedError("make_vcf is not implemented yet.")
-
-    @abc.abstractmethod
-    def create_aux_files(self, ref):
-        """Create needed (if any) auxiliary files.
-        These files are required for proper functioning of the variant caller.
-        """
-        raise NotImplementedError("create_aux_files is not implemeted.")
-
-    @abc.abstractmethod
-    def get_info(self, plain=False):
-        """Get information about this variant caller."""
-        raise NotImplementedError("Get info has not been implemented yet."
-                                  )
-    def get_meta(self):
-        """Get the metadata about this variant caller."""
-        od = self.get_info()
-        od["ID"] = "VariantCaller"
-        return OrderedDict({"PHEVariantMetaData": [od]})
-
-    @abc.abstractmethod
-    def get_version(self):
-        """Get the version of the underlying command used."""
-        raise NotImplementedError("Get version has not been implemented yet.")
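In `_write_variants`, the test `only_good and record.FILTER != "PASS" or record.FILTER is None` is parsed as `(only_good and record.FILTER != "PASS") or record.FILTER is None`, because `and` binds more tightly than `or`. As a result, records whose `FILTER` is `None` are skipped even when `only_good` is false. The diff does not show whether that was intended. The sketch below contrasts the condition as written with a hypothetical parenthesised reading. Both helper names and the sample `FILTER` values are illustrative.

```python
def skip_as_written(only_good, record_filter):
    # Parsed as: (only_good and record_filter != "PASS") or record_filter is None
    return only_good and record_filter != "PASS" or record_filter is None


def skip_if_only_good(only_good, record_filter):
    # Hypothetical alternative: apply the FILTER checks only when only_good is set.
    return only_good and (record_filter != "PASS" or record_filter is None)


for only_good in (False, True):
    for record_filter in ("PASS", None, ["LowQual"]):
        print(only_good, record_filter,
              skip_as_written(only_good, record_filter),
              skip_if_only_good(only_good, record_filter))
```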
diff -r 7ac17b6d031e -r b09ffe50c378 phe/variant/variant_factory.py
--- a/phe/variant/variant_factory.py Fri Dec 18 07:31:09 2015 -0500
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,92 +0,0 @@
-'''Classes and functions for working with variant callers.
-
-Created on 22 Sep 2015
-
-@author: alex
-'''
-import glob
-import inspect
-import logging
-import os
-import sys
-
-from phe.variant import VariantCaller
-
-def dynamic_caller_loader():
-    """Fancy way of dynamically importing existing variant callers.
-    
-    Returns
-    -------
-    dict:
-        Available variant callers dictionary. Keys are parameters that
-        can be used to call variants.
-    """
-
-    # We assume the caller are in the same directory as THIS file.
-    variants_dir = os.path.dirname(__file__)
-    variants_dir = os.path.abspath(variants_dir)
-
-    # This is populated when the module is first imported.
-    avail_callers = {}
-
-    # Add this directory to the syspath.
-    sys.path.insert(0, variants_dir)
-
-    # Find all "py" files.
-    for caller_mod in glob.glob(os.path.join(variants_dir, "*.py")):
-
-        # Derive name of the module where caller is.
-        caller_mod_file = os.path.basename(caller_mod)
-
-        # Ignore __init__ file, only base class is there.
-        if caller_mod_file.startswith("__init__"):
-            continue
-
-        # Import the module with a caller.
-        mod = __import__(caller_mod_file.replace(".pyc", "").replace(".py", ""))
-
-        # Find all the classes contained in this module.
-        classes = inspect.getmembers(mod, inspect.isclass)
-        for cls_name, cls in classes:
-            # For each class, if it is a subclass of VariantCaller, add it.
-            if cls_name != "VariantCaller" and issubclass(cls, VariantCaller):
-                # The name is inherited and defined within each caller.
-                avail_callers[cls.name] = cls
-
-    sys.path.remove(variants_dir)
-
-    return avail_callers
-
-_avail_variant_callers = dynamic_caller_loader()
-
-def available_callers():
-    """Return list of available variant callers."""
-    return _avail_variant_callers.keys()
-
-def factory(variant=None, custom_options=None):
-    """Make an instance of a variant class.
-    
-    Parameters:
-    -----------
-    variant: str, optional
-        Name of the variant class to instantiate.
-    custom_options: str, optional
-        Custom options to be passed directly to the implementing class.
-    
-    Returns:
-    --------
-    :py:class:`phe.variant.VariantCaller`:
-        Instance of the :py:class:`phe.variant.VariantCaller` for given
-        variant name, or None if one couldn't be found.
-    """
-    if variant is not None and isinstance(variant, str):
-
-        variant = variant.lower()
-        if variant in _avail_variant_callers:
-            return _avail_variant_callers[variant](cmd_options=custom_options)
-        else:
-            logging.error("No implementation for %s mapper.")
-            return None
-
-    logging.warn("Unknown parameters. Mapper could not be initialised.")
-    return None
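`variant_factory` discovers callers in three steps: it globs the `*.py` files next to the module, temporarily inserts that directory into `sys.path`, and collects `VariantCaller` subclasses keyed by their `name` attribute. One detail in `factory`: it calls `logging.error("No implementation for %s mapper.")` without an argument, so the literal `%s` is logged. The sketch below swaps in a different technique, a registry built from `VariantCaller.__subclasses__()`. This avoids the `sys.path` manipulation, assuming all callers are already imported. The class names are hypothetical stand-ins for the removed classes.

```python
import abc
import logging


class VariantCaller(abc.ABC):
    """Stand-in base class; each implementation sets a plain-text `name`."""

    name = None

    def __init__(self, cmd_options=None):
        self.cmd_options = cmd_options


class GATKCaller(VariantCaller):  # hypothetical stand-in for GATKVariantCaller
    name = "gatk"


class MPileupCaller(VariantCaller):  # hypothetical stand-in for MPileupVariantCaller
    name = "mpileup"


def available_callers():
    """Map caller names to classes (direct subclasses only)."""
    return {cls.name: cls for cls in VariantCaller.__subclasses__() if cls.name}


def factory(variant=None, custom_options=None):
    """Instantiate the caller registered under `variant`, or return None."""
    if not isinstance(variant, str):
        logging.warning("Unknown parameters. Variant caller could not be initialised.")
        return None
    callers = available_callers()
    key = variant.lower()
    if key not in callers:
        # Pass the name as an argument so that %s is actually substituted.
        logging.error("No implementation for %s variant caller.", variant)
        return None
    return callers[key](cmd_options=custom_options)


caller = factory("GATK", custom_options="-nt 4")
print(type(caller).__name__, caller.cmd_options)
```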
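The `phe/variant_filters` module maps sets of alleles to IUPAC ambiguity codes. In its `call_concensus` method, the loop unpacks each `IUPAC_CODES` item as `(code, cov)`. This binds `code` to the frozenset key and `cov` to the letter. A set never equals a one-letter string, so the method as written appears to always return `"N"`. The sketch below shows the presumably intended lookup, keyed directly by the allele set. The function name and the plain `ref`/`alts` arguments are assumptions; the removed method takes a PyVCF record.

```python
# Ambiguity table copied from the removed phe/variant_filters/__init__.py.
IUPAC_CODES = {frozenset(["A", "G"]): "R",
               frozenset(["C", "T"]): "Y",
               frozenset(["G", "C"]): "S",
               frozenset(["A", "T"]): "W",
               frozenset(["G", "T"]): "K",
               frozenset(["A", "C"]): "M",
               frozenset(["C", "G", "T"]): "B",
               frozenset(["A", "G", "T"]): "D",
               frozenset(["A", "C", "T"]): "H",
               frozenset(["A", "C", "G"]): "V"}


def call_consensus(ref, alts):
    """Return the IUPAC code for REF plus ALT alleles, or "N" if none applies."""
    alleles = frozenset([str(alt) for alt in alts] + [ref])
    # A dict lookup by the frozenset replaces the loop over (key, value) pairs.
    return IUPAC_CODES.get(alleles, "N")


print(call_consensus("A", ["G"]))       # R
print(call_consensus("A", ["C", "G"]))  # V
print(call_consensus("A", ["AT"]))      # N (indel alleles have no code)
```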
diff -r 7ac17b6d031e -r b09ffe50c378 phe/variant_filters/__init__.py
--- a/phe/variant_filters/__init__.py Fri Dec 18 07:31:09 2015 -0500
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,209 +0,0 @@
-"""Classes and functions for working with variant filters."""
-
-from __builtin__ import __import__
-from abc import abstractproperty
-import abc
-import argparse
-import glob
-import inspect
-import logging
-import os
-import re
-import sys
-
-import vcf
-import vcf.filters
-from vcf.parser import _Filter
-
-IUPAC_CODES = {frozenset(["A", "G"]): "R",
-                frozenset(["C", "T"]): "Y",
-                frozenset(["G", "C"]): "S",
-                frozenset(["A", "T"]): "W",
-                frozenset(["G", "T"]): "K",
-                frozenset(["A", "C"]): "M",
-                frozenset(["C", "G", "T"]): "B",
-                frozenset(["A", "G", "T"]): "D",
-                frozenset(["A", "C", "T"]): "H",
-                frozenset(["A", "C", "G"]): "V"
-              }
-
-class PHEFilterBase(vcf.filters.Base):
-    """Base class for VCF filters."""
-    __meta__ = abc.ABCMeta
-
-    magic_sep = ":"
-    decoder_pattern = re.compile(magic_sep)
-
-    @abc.abstractproperty
-    def parameter(self):
-        """Short name of parameter being filtered."""
-        return self.parameter
-
-    @abc.abstractproperty
-    def _default_threshold(self):
-        """Default threshold for filtering."""
-        return self._default_threshold
-
-    def __init__(self, args):
-        super(PHEFilterBase, self).__init__(args)
-
-        # Change the threshold to custom gq value.
-        self.threshold = self._default_threshold
-
-        if isinstance(args, dict):
-            self.threshold = args.get(self.parameter)
-
-    def __str__(self):
-        return self.filter_name()
-
-    @abc.abstractmethod
-    def short_desc(self):
-        """Short description of the filter (included in VCF)."""
-        raise NotImplementedError("Get short description is not implemented.")
-
-    def get_config(self):
-        """This is used for reconstructing filter."""
-        return {self.parameter: self.threshold}
-
-    def filter_name(self):
-        """Create filter names by their parameter separated by magic.
-        E.g. if filter parameter is ad_ratio and threshold is 0.9 then
-        ad_ratio:0.9 is the filter name.
-        """
-        return "%s%s%s" % (self.parameter, self.magic_sep, self.threshold)
-
-    @staticmethod
-    def decode(filter_id):
-        """Decode name of filter."""
-        conf = {}
-
-        if PHEFilterBase.magic_sep in filter_id:
-            info = PHEFilterBase.decoder_pattern.split(filter_id)
-            assert len(info) == 2
-            conf[info[0]] = info[1]
-        return conf
-
-    def is_gap(self):
-        return False
-
-    def is_n(self):
-        return True
-
-    @staticmethod
-    def call_concensus(record):
-        extended_code = "N"
-        try:
-            sample_ad = set([str(c) for c in record.ALT] + [record.REF])
-
-
-            for code, cov in IUPAC_CODES.items():
-                if sample_ad == cov:
-                    extended_code = code
-                    break
-        except AttributeError:
-            extended_code = "N"
-
-        return extended_code
-
-def dynamic_filter_loader():
-    """Fancy way of dynamically importing existing filters.
-    
-    Returns
-    -------
-    dict:
-        Available filters dictionary. Keys are parameters that
-        can be supplied to the filters.
-    """
-
-    # We assume the filters are in the same directory as THIS file.
-    filter_dir = os.path.dirname(__file__)
-    filter_dir = os.path.abspath(filter_dir)
-
-    # This is populated when the module is first imported.
-    avail_filters = {}
-
-    # Add this directory to the syspath.
-    sys.path.insert(0, filter_dir)
-
-    # Find all "py" files.
-    for filter_mod in glob.glob(os.path.join(filter_dir, "*.py")):
-
-        # Derive name of the module where filter is.
-        filter_mod_file = os.path.basename(filter_mod)
-
-        # Ignore this file, obviously.
-        if filter_mod_file.startswith("__init__"):
-            continue
-
-        # Import the module with a filter.
-        mod = __import__(filter_mod_file.replace(".pyc", "").replace(".py", ""))
-
-        # Find all the classes contained in this module.
-        classes = inspect.getmembers(mod, inspect.isclass)
-        for cls_name, cls in classes:
-            # For each class, if it is a subclass of PHEFilterBase, add it.
-            if cls_name != "PHEFilterBase" and issubclass(cls, PHEFilterBase):
-                # The parameters are inherited and defined within each filter.
-                avail_filters[cls.parameter] = cls
-
-    sys.path.remove(filter_dir)
-
-    return avail_filters
-
-_avail_filters = dynamic_filter_loader()
-
-def available_filters():
-    """Return the names (parameters) of the available filters."""
-    return _avail_filters.keys()
-
-def str_to_filters(filters):
-    """Convert a filter string to a list of filters.
-    E.g. ad_ratio:0.9,min_depth:5
-    
-    Parameters:
-    -----------
-    filters: str
-        String version of filters, separated by comma.
-    
-    Returns:
-    --------
-    list:
-        List of :py:class:`phe.variant_filters.PHEFilterBase` instances.
-    """
-
-    config = {}
-    for kv_pair in filters.split(","):
-        pair = kv_pair.split(":")
-        assert len(pair) == 2, "Filter parameter and value should be separated by ':', got %s" % kv_pair
-
-        # We don't care about casting them to correct type because Filters
-        #    will do it for us.
-        config[pair[0]] = pair[1]
-
-    return make_filters(config)
-
-def make_filters(config):
-    """Create a list of filters from *config*.
-    
-    Parameters:
-    -----------
-    config: dict, optional
-        Dictionary with parameter: value pairs. For each parameter, an
-        appropriate Filter will be found and instantiated.
-        
-    Returns:
-    --------
-    list:
-        List of :py:class:`PHEFilterBase` filters.
-    """
-    filters = []
-
-    if config:
-        for custom_filter in config:
-            if custom_filter in _avail_filters:
-                filters.append(_avail_filters[custom_filter](config))
-            else:
-                logging.warning("Could not find appropriate filter for %s",
-                                custom_filter)
-
-    return filters
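The filter-name encoding used by `filter_name`/`decode` and the comma/colon parsing in `str_to_filters` can be sketched standalone. This is a minimal sketch, not the `phe` implementation: `MAGIC_SEP` mirrors `magic_sep`, the function names are hypothetical, and malformed input raises `ValueError` instead of using `assert` (assertions are skipped under `python -O`).

```python
MAGIC_SEP = ":"  # mirrors PHEFilterBase.magic_sep


def encode_filter_name(parameter, threshold):
    """Join a parameter and its threshold, e.g. ("ad_ratio", 0.9) -> "ad_ratio:0.9"."""
    return "%s%s%s" % (parameter, MAGIC_SEP, threshold)


def decode_filter_name(filter_id):
    """Split "ad_ratio:0.9" into {"ad_ratio": "0.9"}; names without MAGIC_SEP give {}."""
    if MAGIC_SEP not in filter_id:
        return {}
    parts = filter_id.split(MAGIC_SEP)
    if len(parts) != 2:
        raise ValueError("Expected 'parameter%svalue', got %r" % (MAGIC_SEP, filter_id))
    return {parts[0]: parts[1]}


def parse_filter_string(filters):
    """Parse "ad_ratio:0.9,min_depth:5" into {"ad_ratio": "0.9", "min_depth": "5"}.

    Values stay strings, as in str_to_filters; casting is left to each filter.
    """
    config = {}
    for kv_pair in filters.split(","):
        decoded = decode_filter_name(kv_pair)
        if not decoded:
            raise ValueError("Expected 'parameter%svalue', got %r" % (MAGIC_SEP, kv_pair))
        config.update(decoded)
    return config


if __name__ == "__main__":
    name = encode_filter_name("ad_ratio", 0.9)
    print(name)
    print(decode_filter_name(name))
    print(parse_filter_string("ad_ratio:0.9,min_depth:5"))
```

Because `encode_filter_name` and `decode_filter_name` share one separator, a name written into a VCF FILTER column can be turned back into the config dict that `make_filters` expects.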
diff -r 7ac17b6d031e -r b09ffe50c378 test-data/1_short.vcf
--- a/test-data/1_short.vcf Fri Dec 18 07:31:09 2015 -0500
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
b'@@ -1,50000 +0,0 @@\n-##fileformat=VCFv4.1\n-##FILTER=<ID=LowQual,Description="Low quality">\n-##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">\n-##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">\n-##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">\n-##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n-##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">\n-##GATKCommandLine=<ID=SelectVariants,Version=2.6-5-gba531bd,Date="Thu May 22 09:24:36 BST 2014",Epoch=1400747076888,CommandLineOptions="analysis_type=SelectVariants input_file=[] read_buffer_size=null phone_home=AWS gatk_key=null tag=NA read_filter=[] intervals=null excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/phengs/galaxy/database/tmp/tmp-gatk-lMh4vi/gatk_input.fasta nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=1000 baq=OFF baqGapOpenPenalty=40.0 fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 allow_bqsr_on_reduced_bams_despite_repeated_warnings=false validation_strictness=SILENT remove_program_records=false keep_program_records=false unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false num_threads=2 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT 
allow_intervals_with_unindexed_bam=false generateShadowBCF=false logging_level=INFO log_to_file=null help=false version=false variant=(RodBinding name=variant source=/phengs/galaxy/database/tmp/tmp-gatk-lMh4vi/input_variant.vcf) discordance=(RodBinding name= source=UNBOUND) concordance=(RodBinding name= source=UNBOUND) out=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub no_cmdline_in_header=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub sites_only=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub bcf=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub sample_name=[] sample_expressions=null sample_file=null exclude_sample_name=[] exclude_sample_file=[] select_expressions=[] excludeNonVariants=false excludeFiltered=false restrictAllelesTo=ALL keepOriginalAC=false mendelianViolation=false mendelianViolationQualThreshold=0.0 select_random_fraction=0.0 remove_fraction_genotypes=0.0 selectTypeToInclude=[SNP] keepIDs=null fullyDecode=false forceGenotypesDecode=false justRead=false maxIndelSize=2147483647 ALLOW_NONOVERLAPPING_COMMAND_LINE_SAMPLES=false filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false">\n-##GATKCommandLine=<ID=UnifiedGenotyper,Version=2.6-5-gba531bd,Date="Thu May 22 09:10:24 BST 2014",Epoch=1400746224902,CommandLineOptions="analysis_type=UnifiedGenotyper input_file=[/phengs/galaxy/database/tmp/tmp-gatk-x7S0pF/gatk_input_0.bam] read_buffer_size=null phone_home=AWS gatk_key=null tag=NA read_filter=[BadCigar] intervals=null excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/phengs/galaxy/database/tmp/tmp-gatk-x7S0pF/gatk_input.fasta nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=NONE downsample_to_fraction=null downsample_to_coverage=null baq=OFF baqGapOpenPenalty=40.0 fix_misencoded_quality_scores=false 
allow_potentially_misencoded_quality_scores=false useOrigina'..b'.\tAN=2;DP=146;MQ=60.00;MQ0=0\tGT:DP\t0/0:146\n-gi|15829254|ref|NC_002695.1|\t49922\t.\tT\t.\t434.23\t.\tAN=2;DP=144;MQ=60.00;MQ0=0\tGT:DP\t0/0:144\n-gi|15829254|ref|NC_002695.1|\t49923\t.\tC\t.\t434.23\t.\tAN=2;DP=143;MQ=60.00;MQ0=0\tGT:DP\t0/0:143\n-gi|15829254|ref|NC_002695.1|\t49924\t.\tA\t.\t428.23\t.\tAN=2;DP=141;MQ=60.00;MQ0=0\tGT:DP\t0/0:141\n-gi|15829254|ref|NC_002695.1|\t49925\t.\tA\t.\t440.23\t.\tAN=2;DP=149;MQ=60.00;MQ0=0\tGT:DP\t0/0:149\n-gi|15829254|ref|NC_002695.1|\t49926\t.\tT\t.\t440.23\t.\tAN=2;DP=152;MQ=60.00;MQ0=0\tGT:DP\t0/0:152\n-gi|15829254|ref|NC_002695.1|\t49927\t.\tG\t.\t446.23\t.\tAN=2;DP=155;MQ=60.00;MQ0=0\tGT:DP\t0/0:155\n-gi|15829254|ref|NC_002695.1|\t49928\t.\tT\t.\t452.23\t.\tAN=2;DP=157;MQ=60.00;MQ0=0\tGT:DP\t0/0:157\n-gi|15829254|ref|NC_002695.1|\t49929\t.\tC\t.\t443.23\t.\tAN=2;DP=153;MQ=60.00;MQ0=0\tGT:DP\t0/0:153\n-gi|15829254|ref|NC_002695.1|\t49930\t.\tG\t.\t449.23\t.\tAN=2;DP=155;MQ=60.00;MQ0=0\tGT:DP\t0/0:155\n-gi|15829254|ref|NC_002695.1|\t49931\t.\tA\t.\t449.23\t.\tAN=2;DP=156;MQ=60.00;MQ0=0\tGT:DP\t0/0:156\n-gi|15829254|ref|NC_002695.1|\t49932\t.\tT\t.\t461.23\t.\tAN=2;DP=159;MQ=60.00;MQ0=0\tGT:DP\t0/0:159\n-gi|15829254|ref|NC_002695.1|\t49933\t.\tG\t.\t449.23\t.\tAN=2;DP=156;MQ=60.00;MQ0=0\tGT:DP\t0/0:156\n-gi|15829254|ref|NC_002695.1|\t49934\t.\tA\t.\t455.23\t.\tAN=2;DP=156;MQ=60.00;MQ0=0\tGT:DP\t0/0:156\n-gi|15829254|ref|NC_002695.1|\t49935\t.\tA\t.\t452.23\t.\tAN=2;DP=156;MQ=60.00;MQ0=0\tGT:DP\t0/0:156\n-gi|15829254|ref|NC_002695.1|\t49936\t.\tG\t.\t455.23\t.\tAN=2;DP=158;MQ=60.00;MQ0=0\tGT:DP\t0/0:157\n-gi|15829254|ref|NC_002695.1|\t49937\t.\tA\t.\t440.23\t.\tAN=2;DP=155;MQ=60.00;MQ0=0\tGT:DP\t0/0:155\n-gi|15829254|ref|NC_002695.1|\t49938\t.\tG\t.\t449.23\t.\tAN=2;DP=157;MQ=60.00;MQ0=0\tGT:DP\t0/0:157\n-gi|15829254|ref|NC_002695.1|\t49939\t.\tC\t.\t443.23\t.\tAN=2;DP=155;MQ=60.00;MQ0=0\tGT:DP\t0/0:155\n-gi|15829254|ref|NC_002695.1|\t49940\t.\tA\t.\t440.2
3\t.\tAN=2;DP=154;MQ=60.00;MQ0=0\tGT:DP\t0/0:154\n-gi|15829254|ref|NC_002695.1|\t49941\t.\tT\t.\t449.23\t.\tAN=2;DP=155;MQ=60.00;MQ0=0\tGT:DP\t0/0:155\n-gi|15829254|ref|NC_002695.1|\t49942\t.\tC\t.\t446.23\t.\tAN=2;DP=155;MQ=60.00;MQ0=0\tGT:DP\t0/0:155\n-gi|15829254|ref|NC_002695.1|\t49943\t.\tC\t.\t428.23\t.\tAN=2;DP=144;MQ=60.00;MQ0=0\tGT:DP\t0/0:144\n-gi|15829254|ref|NC_002695.1|\t49944\t.\tG\t.\t428.23\t.\tAN=2;DP=144;MQ=60.00;MQ0=0\tGT:DP\t0/0:144\n-gi|15829254|ref|NC_002695.1|\t49945\t.\tC\t.\t431.23\t.\tAN=2;DP=144;MQ=60.00;MQ0=0\tGT:DP\t0/0:144\n-gi|15829254|ref|NC_002695.1|\t49946\t.\tA\t.\t425.23\t.\tAN=2;DP=144;MQ=60.00;MQ0=0\tGT:DP\t0/0:144\n-gi|15829254|ref|NC_002695.1|\t49947\t.\tC\t.\t431.23\t.\tAN=2;DP=145;MQ=60.00;MQ0=0\tGT:DP\t0/0:145\n-gi|15829254|ref|NC_002695.1|\t49948\t.\tA\t.\t404.23\t.\tAN=2;DP=137;MQ=60.00;MQ0=0\tGT:DP\t0/0:137\n-gi|15829254|ref|NC_002695.1|\t49949\t.\tT\t.\t425.23\t.\tAN=2;DP=144;MQ=60.00;MQ0=0\tGT:DP\t0/0:143\n-gi|15829254|ref|NC_002695.1|\t49950\t.\tT\t.\t419.23\t.\tAN=2;DP=144;MQ=60.00;MQ0=0\tGT:DP\t0/0:144\n-gi|15829254|ref|NC_002695.1|\t49951\t.\tG\t.\t431.23\t.\tAN=2;DP=146;MQ=60.00;MQ0=0\tGT:DP\t0/0:146\n-gi|15829254|ref|NC_002695.1|\t49952\t.\tT\t.\t437.23\t.\tAN=2;DP=146;MQ=60.00;MQ0=0\tGT:DP\t0/0:146\n-gi|15829254|ref|NC_002695.1|\t49953\t.\tT\t.\t428.23\t.\tAN=2;DP=144;MQ=60.00;MQ0=0\tGT:DP\t0/0:144\n-gi|15829254|ref|NC_002695.1|\t49954\t.\tG\t.\t434.23\t.\tAN=2;DP=146;MQ=60.00;MQ0=0\tGT:DP\t0/0:146\n-gi|15829254|ref|NC_002695.1|\t49955\t.\tT\t.\t428.23\t.\tAN=2;DP=146;MQ=60.00;MQ0=0\tGT:DP\t0/0:146\n-gi|15829254|ref|NC_002695.1|\t49956\t.\tG\t.\t434.23\t.\tAN=2;DP=146;MQ=60.00;MQ0=0\tGT:DP\t0/0:146\n-gi|15829254|ref|NC_002695.1|\t49957\t.\tA\t.\t428.23\t.\tAN=2;DP=144;MQ=60.00;MQ0=0\tGT:DP\t0/0:144\n-gi|15829254|ref|NC_002695.1|\t49958\t.\tA\t.\t416.23\t.\tAN=2;DP=140;MQ=60.00;MQ0=0\tGT:DP\t0/0:140\n-gi|15829254|ref|NC_002695.1|\t49959\t.\tA\t.\t407.23\t.\tAN=2;DP=138;MQ=60.00;MQ0=0\tGT:DP\t0/0:138\n-gi|15829254
|ref|NC_002695.1|\t49960\t.\tG\t.\t395.23\t.\tAN=2;DP=136;MQ=60.00;MQ0=0\tGT:DP\t0/0:136\n-gi|15829254|ref|NC_002695.1|\t49961\t.\tC\t.\t404.23\t.\tAN=2;DP=136;MQ=60.00;MQ0=0\tGT:DP\t0/0:136\n-gi|15829254|ref|NC_002695.1|\t49962\t.\tC\t.\t392.23\t.\tAN=2;DP=134;MQ=60.00;MQ0=0\tGT:DP\t0/0:134\n-gi|15829254|ref|NC_002695.1|\t49963\t.\tG\t.\t398.23\t.\tAN=2;DP=134;MQ=60.00;MQ0=0\tGT:DP\t0/0:134\n-gi|15829254|ref|NC_002695.1|\t49964\t.\tA\t.\t398.23\t.\tAN=2;DP=136;MQ=60.00;MQ0=0\tGT:DP\t0/0:136\n'
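The data lines in these test VCFs are tab-separated single-sample records; the ones visible above are reference calls (ALT is `.`, genotype `0/0`) with a `GT:DP` sample column. The tool itself reads VCFs through PyVCF (the `vcf` import in vcfs2fasta.py); the stdlib-only parser below is a hypothetical sketch that only illustrates the column layout and does no validation.

```python
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]


def parse_vcf_line(line):
    """Parse one single-sample VCF data line into a dict (minimal, no validation)."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_COLUMNS, fields[:9]))
    record["POS"] = int(record["POS"])
    # INFO holds semicolon-separated KEY=VALUE pairs; flags without '=' map to True.
    info = {}
    for item in record["INFO"].split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    record["INFO"] = info
    # FORMAT names the colon-separated fields of the (single) sample column.
    record["SAMPLE"] = dict(zip(record["FORMAT"].split(":"), fields[9].split(":")))
    # An ALT of "." means no alternative allele, i.e. a reference (non-variant) call.
    record["is_ref_call"] = record["ALT"] == "."
    return record


if __name__ == "__main__":
    line = ("gi|15829254|ref|NC_002695.1|\t49922\t.\tT\t.\t434.23\t.\t"
            "AN=2;DP=144;MQ=60.00;MQ0=0\tGT:DP\t0/0:144")
    rec = parse_vcf_line(line)
    print(rec["POS"], rec["REF"], rec["SAMPLE"], rec["is_ref_call"])
```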
diff -r 7ac17b6d031e -r b09ffe50c378 test-data/2_short.vcf
--- a/test-data/2_short.vcf Fri Dec 18 07:31:09 2015 -0500
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
b'@@ -1,50000 +0,0 @@\n-##fileformat=VCFv4.1\n-##FILTER=<ID=LowQual,Description="Low quality">\n-##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">\n-##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">\n-##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">\n-##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n-##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">\n-##GATKCommandLine=<ID=SelectVariants,Version=2.6-5-gba531bd,Date="Thu May 22 09:15:54 BST 2014",Epoch=1400746554393,CommandLineOptions="analysis_type=SelectVariants input_file=[] read_buffer_size=null phone_home=AWS gatk_key=null tag=NA read_filter=[] intervals=null excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/phengs/galaxy/database/tmp/tmp-gatk-4mpB9s/gatk_input.fasta nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=1000 baq=OFF baqGapOpenPenalty=40.0 fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 allow_bqsr_on_reduced_bams_despite_repeated_warnings=false validation_strictness=SILENT remove_program_records=false keep_program_records=false unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false num_threads=2 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT 
allow_intervals_with_unindexed_bam=false generateShadowBCF=false logging_level=INFO log_to_file=null help=false version=false variant=(RodBinding name=variant source=/phengs/galaxy/database/tmp/tmp-gatk-4mpB9s/input_variant.vcf) discordance=(RodBinding name= source=UNBOUND) concordance=(RodBinding name= source=UNBOUND) out=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub no_cmdline_in_header=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub sites_only=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub bcf=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub sample_name=[] sample_expressions=null sample_file=null exclude_sample_name=[] exclude_sample_file=[] select_expressions=[] excludeNonVariants=false excludeFiltered=false restrictAllelesTo=ALL keepOriginalAC=false mendelianViolation=false mendelianViolationQualThreshold=0.0 select_random_fraction=0.0 remove_fraction_genotypes=0.0 selectTypeToInclude=[SNP] keepIDs=null fullyDecode=false forceGenotypesDecode=false justRead=false maxIndelSize=2147483647 ALLOW_NONOVERLAPPING_COMMAND_LINE_SAMPLES=false filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false">\n-##GATKCommandLine=<ID=UnifiedGenotyper,Version=2.6-5-gba531bd,Date="Thu May 22 09:06:54 BST 2014",Epoch=1400746014908,CommandLineOptions="analysis_type=UnifiedGenotyper input_file=[/phengs/galaxy/database/tmp/tmp-gatk-Gz3f1A/gatk_input_0.bam] read_buffer_size=null phone_home=AWS gatk_key=null tag=NA read_filter=[BadCigar] intervals=null excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/phengs/galaxy/database/tmp/tmp-gatk-Gz3f1A/gatk_input.fasta nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=NONE downsample_to_fraction=null downsample_to_coverage=null baq=OFF baqGapOpenPenalty=40.0 fix_misencoded_quality_scores=false 
allow_potentially_misencoded_quality_scores=false useOrigina'..b'AN=2;DP=48;MQ=60.00;MQ0=0\tGT:DP\t0/0:48\n-gi|15829254|ref|NC_002695.1|\t49921\t.\tT\t.\t157.23\t.\tAN=2;DP=49;MQ=60.00;MQ0=0\tGT:DP\t0/0:49\n-gi|15829254|ref|NC_002695.1|\t49922\t.\tT\t.\t157.23\t.\tAN=2;DP=48;MQ=60.00;MQ0=0\tGT:DP\t0/0:48\n-gi|15829254|ref|NC_002695.1|\t49923\t.\tC\t.\t154.23\t.\tAN=2;DP=47;MQ=60.00;MQ0=0\tGT:DP\t0/0:47\n-gi|15829254|ref|NC_002695.1|\t49924\t.\tA\t.\t151.23\t.\tAN=2;DP=45;MQ=60.00;MQ0=0\tGT:DP\t0/0:45\n-gi|15829254|ref|NC_002695.1|\t49925\t.\tA\t.\t163.23\t.\tAN=2;DP=50;MQ=60.00;MQ0=0\tGT:DP\t0/0:50\n-gi|15829254|ref|NC_002695.1|\t49926\t.\tT\t.\t160.23\t.\tAN=2;DP=50;MQ=60.00;MQ0=0\tGT:DP\t0/0:50\n-gi|15829254|ref|NC_002695.1|\t49927\t.\tG\t.\t160.23\t.\tAN=2;DP=51;MQ=60.00;MQ0=0\tGT:DP\t0/0:51\n-gi|15829254|ref|NC_002695.1|\t49928\t.\tT\t.\t160.23\t.\tAN=2;DP=52;MQ=60.00;MQ0=0\tGT:DP\t0/0:51\n-gi|15829254|ref|NC_002695.1|\t49929\t.\tC\t.\t160.23\t.\tAN=2;DP=51;MQ=60.00;MQ0=0\tGT:DP\t0/0:51\n-gi|15829254|ref|NC_002695.1|\t49930\t.\tG\t.\t163.23\t.\tAN=2;DP=53;MQ=60.00;MQ0=0\tGT:DP\t0/0:53\n-gi|15829254|ref|NC_002695.1|\t49931\t.\tA\t.\t166.23\t.\tAN=2;DP=54;MQ=60.00;MQ0=0\tGT:DP\t0/0:54\n-gi|15829254|ref|NC_002695.1|\t49932\t.\tT\t.\t169.23\t.\tAN=2;DP=55;MQ=60.00;MQ0=0\tGT:DP\t0/0:55\n-gi|15829254|ref|NC_002695.1|\t49933\t.\tG\t.\t163.23\t.\tAN=2;DP=51;MQ=60.00;MQ0=0\tGT:DP\t0/0:50\n-gi|15829254|ref|NC_002695.1|\t49934\t.\tA\t.\t166.23\t.\tAN=2;DP=51;MQ=60.00;MQ0=0\tGT:DP\t0/0:51\n-gi|15829254|ref|NC_002695.1|\t49935\t.\tA\t.\t169.23\t.\tAN=2;DP=51;MQ=60.00;MQ0=0\tGT:DP\t0/0:51\n-gi|15829254|ref|NC_002695.1|\t49936\t.\tG\t.\t169.23\t.\tAN=2;DP=51;MQ=60.00;MQ0=0\tGT:DP\t0/0:51\n-gi|15829254|ref|NC_002695.1|\t49937\t.\tA\t.\t163.23\t.\tAN=2;DP=50;MQ=60.00;MQ0=0\tGT:DP\t0/0:50\n-gi|15829254|ref|NC_002695.1|\t49938\t.\tG\t.\t160.23\t.\tAN=2;DP=50;MQ=60.00;MQ0=0\tGT:DP\t0/0:50\n-gi|15829254|ref|NC_002695.1|\t49939\t.\tC\t.\t154.23\t.\tAN=2;DP=47;MQ=60.00;MQ0=0\tGT:DP\t0
/0:47\n-gi|15829254|ref|NC_002695.1|\t49940\t.\tA\t.\t151.23\t.\tAN=2;DP=48;MQ=60.00;MQ0=0\tGT:DP\t0/0:48\n-gi|15829254|ref|NC_002695.1|\t49941\t.\tT\t.\t157.23\t.\tAN=2;DP=49;MQ=60.00;MQ0=0\tGT:DP\t0/0:49\n-gi|15829254|ref|NC_002695.1|\t49942\t.\tC\t.\t160.23\t.\tAN=2;DP=50;MQ=60.00;MQ0=0\tGT:DP\t0/0:50\n-gi|15829254|ref|NC_002695.1|\t49943\t.\tC\t.\t157.23\t.\tAN=2;DP=47;MQ=60.00;MQ0=0\tGT:DP\t0/0:47\n-gi|15829254|ref|NC_002695.1|\t49944\t.\tG\t.\t157.23\t.\tAN=2;DP=48;MQ=60.00;MQ0=0\tGT:DP\t0/0:48\n-gi|15829254|ref|NC_002695.1|\t49945\t.\tC\t.\t154.23\t.\tAN=2;DP=47;MQ=60.00;MQ0=0\tGT:DP\t0/0:47\n-gi|15829254|ref|NC_002695.1|\t49946\t.\tA\t.\t154.23\t.\tAN=2;DP=48;MQ=60.00;MQ0=0\tGT:DP\t0/0:48\n-gi|15829254|ref|NC_002695.1|\t49947\t.\tC\t.\t154.23\t.\tAN=2;DP=48;MQ=60.00;MQ0=0\tGT:DP\t0/0:48\n-gi|15829254|ref|NC_002695.1|\t49948\t.\tA\t.\t151.23\t.\tAN=2;DP=46;MQ=60.00;MQ0=0\tGT:DP\t0/0:46\n-gi|15829254|ref|NC_002695.1|\t49949\t.\tT\t.\t151.23\t.\tAN=2;DP=46;MQ=60.00;MQ0=0\tGT:DP\t0/0:46\n-gi|15829254|ref|NC_002695.1|\t49950\t.\tT\t.\t148.23\t.\tAN=2;DP=46;MQ=60.00;MQ0=0\tGT:DP\t0/0:46\n-gi|15829254|ref|NC_002695.1|\t49951\t.\tG\t.\t154.23\t.\tAN=2;DP=49;MQ=60.00;MQ0=0\tGT:DP\t0/0:49\n-gi|15829254|ref|NC_002695.1|\t49952\t.\tT\t.\t157.23\t.\tAN=2;DP=49;MQ=60.00;MQ0=0\tGT:DP\t0/0:49\n-gi|15829254|ref|NC_002695.1|\t49953\t.\tT\t.\t154.23\t.\tAN=2;DP=49;MQ=60.00;MQ0=0\tGT:DP\t0/0:49\n-gi|15829254|ref|NC_002695.1|\t49954\t.\tG\t.\t157.23\t.\tAN=2;DP=48;MQ=60.00;MQ0=0\tGT:DP\t0/0:48\n-gi|15829254|ref|NC_002695.1|\t49955\t.\tT\t.\t151.23\t.\tAN=2;DP=48;MQ=60.00;MQ0=0\tGT:DP\t0/0:48\n-gi|15829254|ref|NC_002695.1|\t49956\t.\tG\t.\t154.23\t.\tAN=2;DP=47;MQ=60.00;MQ0=0\tGT:DP\t0/0:47\n-gi|15829254|ref|NC_002695.1|\t49957\t.\tA\t.\t151.23\t.\tAN=2;DP=47;MQ=60.00;MQ0=0\tGT:DP\t0/0:47\n-gi|15829254|ref|NC_002695.1|\t49958\t.\tA\t.\t148.23\t.\tAN=2;DP=47;MQ=60.00;MQ0=0\tGT:DP\t0/0:47\n-gi|15829254|ref|NC_002695.1|\t49959\t.\tA\t.\t148.23\t.\tAN=2;DP=47;MQ=60.00;MQ0=0\tGT:DP\t0
/0:47\n-gi|15829254|ref|NC_002695.1|\t49960\t.\tG\t.\t145.23\t.\tAN=2;DP=46;MQ=60.00;MQ0=0\tGT:DP\t0/0:46\n-gi|15829254|ref|NC_002695.1|\t49961\t.\tC\t.\t145.23\t.\tAN=2;DP=46;MQ=60.00;MQ0=0\tGT:DP\t0/0:46\n-gi|15829254|ref|NC_002695.1|\t49962\t.\tC\t.\t148.23\t.\tAN=2;DP=46;MQ=60.00;MQ0=0\tGT:DP\t0/0:46\n-gi|15829254|ref|NC_002695.1|\t49963\t.\tG\t.\t145.23\t.\tAN=2;DP=46;MQ=60.00;MQ0=0\tGT:DP\t0/0:46\n-gi|15829254|ref|NC_002695.1|\t49964\t.\tA\t.\t145.23\t.\tAN=2;DP=46;MQ=60.00;MQ0=0\tGT:DP\t0/0:46\n'
diff -r 7ac17b6d031e -r b09ffe50c378 test-data/testresult.fa
--- a/test-data/testresult.fa Fri Dec 18 07:31:09 2015 -0500
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,6 +0,0 @@
->1_short
-TTCTTCATGA
->2_short
-TTTTACACGA
->reference
-CCCATAGCAG
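The expected output is a multi-FASTA alignment with one row per input sample plus the reference. A quick sanity check is that all rows have the same length and that each column varies in at least one row, since the tool description says only variant positions are written. The helper names below are hypothetical; this is a sketch, not part of the tool's test suite.

```python
def read_fasta(text):
    """Parse multi-FASTA text into an ordered list of (name, sequence) pairs."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            records[-1][1] += line
        else:
            raise ValueError("sequence line before any '>' header: %r" % line)
    return [(name, seq) for name, seq in records]


def variable_columns(records):
    """Return 0-based column indices where not all sequences agree."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length: %s" % sorted(lengths))
    seqs = [seq for _, seq in records]
    return [i for i in range(len(seqs[0])) if len({s[i] for s in seqs}) > 1]


EXPECTED = """>1_short
TTCTTCATGA
>2_short
TTTTACACGA
>reference
CCCATAGCAG
"""

if __name__ == "__main__":
    records = read_fasta(EXPECTED)
    print([name for name, _ in records], variable_columns(records))
```

For testresult.fa every one of the 10 columns differs in at least one row.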
diff -r 7ac17b6d031e -r b09ffe50c378 tool_dependencies.xml
--- a/tool_dependencies.xml Fri Dec 18 07:31:09 2015 -0500
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,21 +0,0 @@
-<?xml version="1.0"?>
-<tool_dependency>
-    <package name="python" version="2.7.10">
-        <repository changeset_revision="0339c4a9b87b" name="package_python_2_7_10" owner="iuc" prior_installation_required="True" toolshed="https://toolshed.g2.bx.psu.edu" />
-    </package>
-    <package name="pyvcf" version="0.6.8dev">
-        <repository changeset_revision="7ef691e979b5" name="package_python_2_7_pyvcf_0_6_8dev" owner="ulfschaefer" prior_installation_required="True" toolshed="https://toolshed.g2.bx.psu.edu"/>
-    </package>
-    <package name="pyyaml" version="3.11">
-        <repository changeset_revision="99267d131c05" name="package_python_2_7_pyyaml_3_11" owner="iuc" prior_installation_required="True" toolshed="https://toolshed.g2.bx.psu.edu"/>
-    </package>
-    <package name="bintrees" version="2.0.2">
-        <repository changeset_revision="1d94386c45bc" name="package_python_2_7_bintrees_2_0_2" owner="ulfschaefer" prior_installation_required="True" toolshed="https://toolshed.g2.bx.psu.edu"/>
-    </package>
-    <package name="biopython" version="1.66">
-        <repository changeset_revision="5d5355863287" name="package_python_2_7_biopython_1_66" owner="ulfschaefer" prior_installation_required="True" toolshed="https://toolshed.g2.bx.psu.edu"/>
-    </package>
-    <package name="matplotlib" version="1.4">
-        <repository changeset_revision="f7424e1cf115" name="package_python_2_7_matplotlib_1_4" owner="iuc" prior_installation_required="True" toolshed="https://toolshed.g2.bx.psu.edu"/>
-    </package>
-</tool_dependency>
diff -r 7ac17b6d031e -r b09ffe50c378 vcfs2fasta.py
--- a/vcfs2fasta.py Fri Dec 18 07:31:09 2015 -0500
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,415 +0,0 @@
-#!/usr/bin/env python
-'''
-Merge SNP data from multiple VCF files into a single fasta file.
-
-Created on 5 Oct 2015
-
-@author: alex
-'''
-import argparse
-from collections import OrderedDict
-import glob
-import itertools
-import logging
-import os
-
-from Bio import SeqIO
-from bintrees import FastRBTree
-
-# Try importing matplotlib and numpy for stats.
-try:
-    from matplotlib import pyplot as plt
-    import numpy
-    can_stats = True
-except ImportError:
-    can_stats = False
-
-import vcf
-
-from phe.variant_filters import IUPAC_CODES
-
-
-def plot_stats(pos_stats, total_samples, plots_dir="plots", discarded={}):
-    if not os.path.exists(plots_dir):
-        os.makedirs(plots_dir)
-
-    for contig in pos_stats:
-
-        plt.style.use('ggplot')
-
-        x = numpy.array([pos for pos in pos_stats[contig] if pos not in discarded.get(contig, [])])
-        y = numpy.array([ float(pos_stats[contig][pos]["mut"]) / total_samples for pos in pos_stats[contig] if pos not in discarded.get(contig, []) ])
-
-        f, (ax1, ax2, ax3, ax4) = plt.subplots(4, sharex=True, sharey=True)
-        f.set_size_inches(12, 15)
-        ax1.plot(x, y, 'ro')
-        ax1.set_title("Fraction of samples with SNPs")
-        plt.ylim(0, 1.1)
-
-        y = numpy.array([ float(pos_stats[contig][pos]["N"]) / total_samples for pos in pos_stats[contig] if pos not in discarded.get(contig, [])])
-        ax2.plot(x, y, 'bo')
-        ax2.set_title("Fraction of samples with Ns")
-
-        y = numpy.array([ float(pos_stats[contig][pos]["mix"]) / total_samples for pos in pos_stats[contig] if pos not in discarded.get(contig, [])])
-        ax3.plot(x, y, 'go')
-        ax3.set_title("Fraction of samples with mixed bases")
-
-        y = numpy.array([ float(pos_stats[contig][pos]["gap"]) / total_samples for pos in pos_stats[contig] if pos not in discarded.get(contig, [])])
-        ax4.plot(x, y, 'yo')
-        ax4.set_title("Fraction of samples with uncallable genotype (gap)")
-
-        plt.savefig(os.path.join(plots_dir, "%s.png" % contig), dpi=100)
-
-def get_mixture(record, threshold):
-    mixtures = {}
-    try:
-        if len(record.samples[0].data.AD) > 1:
-
-            total_depth = sum(record.samples[0].data.AD)
-            # Go over all combinations of tuples.
-            for comb in itertools.combinations(range(0, len(record.samples[0].data.AD)), 2):
-                i = comb[0]
-                j = comb[1]
-
-                alleles = list()
-
-                if 0 in comb:
-                    alleles.append(str(record.REF))
-
-                if i != 0:
-                    alleles.append(str(record.ALT[i - 1]))
-                    mixture = record.samples[0].data.AD[i]
-                if j != 0:
-                    alleles.append(str(record.ALT[j - 1]))
-                    mixture = record.samples[0].data.AD[j]
-
-                ratio = float(mixture) / total_depth
-                if ratio == 1.0:
-                    logging.debug("This is only designed for mixtures! %s %s %s %s", record, ratio, record.samples[0].data.AD, record.FILTER)
-
-                    if ratio not in mixtures:
-                        mixtures[ratio] = []
-                    mixtures[ratio].append(alleles.pop())
-
-                elif ratio >= threshold:
-                    try:
-                        code = IUPAC_CODES[frozenset(alleles)]
-                        if ratio not in mixtures:
-                            mixtures[ratio] = []
-                        mixtures[ratio].append(code)
-                    except KeyError:
-                        logging.warning("Could not retrieve IUPAC code for %s from %s", alleles, record)
-    except AttributeError:
-        mixtures = {}
-
-    return mixtures
-
-def print_stats(stats, pos_stats, total_vars):
-    for contig in stats:
-        for sample, info in stats[contig].items():
-            print "%s,%i,%i" % (sample, len(info.get("n_pos", [])), total_vars)
-
...
n_ratio = float(len(sample_stats[contig][sample]["n_pos"])) / len(avail_pos[contig])
-                if sample_n_ratio > args.sample_Ns:
-                    for pos in sample_stats[contig][sample]["n_pos"]:
-                        pos_stats[contig][pos]["N"] -= 1
-
-                    logging.info("Removing %s due to high Ns in sample: %s", sample , sample_n_ratio)
-
-                    delete_samples.append(sample)
-
-        samples = [sample for sample in samples if sample not in delete_samples]
-    snp_positions = []
-    with open(args.out, "w") as fp:
-
-        for sample in samples:
-            sample_seq = ""
-            for contig in contigs:
-                if contig in avail_pos:
-                    if args.reference:
-                        positions = xrange(1, len(args.reference[contig]) + 1)
-                    else:
-                        positions = avail_pos[contig].keys()
-                    for pos in positions:
-                        if pos in avail_pos[contig]:
-                            if not args.column_Ns or float(pos_stats[contig][pos]["N"]) / len(samples) < args.column_Ns and \
-                                float(pos_stats[contig][pos]["-"]) / len(samples) < args.column_Ns:
-                                sample_seq += all_data[contig][sample][pos]
-                            else:
-                                if contig not in discarded:
-                                    discarded[contig] = []
-                                discarded[contig].append(pos)
-                        elif args.reference:
-                            sample_seq += args.reference[contig][pos - 1]
-                elif args.reference:
-                    sample_seq += args.reference[contig]
-
-            fp.write(">%s\n%s\n" % (sample, sample_seq))
-        # Do the same for reference data.
-        ref_snps = ""
-
-        for contig in contigs:
-            if contig in avail_pos:
-                if args.reference:
-                    positions = xrange(1, len(args.reference[contig]) + 1)
-                else:
-                    positions = avail_pos[contig].keys()
-                for pos in positions:
-                    if pos in avail_pos[contig]:
-                        if not args.column_Ns or float(pos_stats[contig][pos]["N"]) / len(samples) < args.column_Ns and \
-                                float(pos_stats[contig][pos]["-"]) / len(samples) < args.column_Ns:
-
-                            ref_snps += str(avail_pos[contig][pos])
-                            snp_positions.append((contig, pos,))
-                    elif args.reference:
-                        ref_snps += args.reference[contig][pos - 1]
-            elif args.reference:
-                    ref_snps += args.reference[contig]
-
-        fp.write(">reference\n%s\n" % ref_snps)
-
-    if can_stats and args.with_stats:
-        with open(args.with_stats, "wb") as fp:
-            fp.write("contig\tposition\tmutations\tn_frac\n")
-            for values in snp_positions:
-                fp.write("%s\t%s\t%s\t%s\n" % (values[0],
-                                             values[1],
-                                             float(pos_stats[values[0]][values[1]]["mut"]) / len(args.input),
-                                             float(pos_stats[values[0]][values[1]]["N"]) / len(args.input)))
-        plot_stats(pos_stats, len(samples), discarded=discarded, plots_dir=os.path.abspath(args.plots_dir))
-    # print_stats(sample_stats, pos_stats, total_vars=len(avail_pos[contig]))
-
-    total_discarded = 0
-    for _, i in discarded.items():
-        total_discarded += len(i)
-    logging.info("Discarded total of %i poor quality columns", float(total_discarded) / len(args.input))
-    return 0
-
-if __name__ == '__main__':
-    import time
-
-#     with PyCallGraph(output=graphviz):
-#     T0 = time.time()
-    r = main()
-#     T1 = time.time()
-
-#     print "Time taken: %i" % (T1 - T0)
-    exit(r)
diff -r 7ac17b6d031e -r b09ffe50c378 vcfs2fasta.sh
--- a/vcfs2fasta.sh Fri Dec 18 07:31:09 2015 -0500
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,51 +0,0 @@
-#!/bin/bash
-
-echo $@
-
-OUTPUT=$1
-shift
-WITHMIXTURES=$1
-shift
-COLUMNNS=$1
-shift
-SAMPLENS=$1
-shift
-REFERENCE=$1
-shift
-INCLUDE=$1
-shift
-EXCLUDE=$1
-shift
-INPUT=$@
-
-DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
-export PATH=$PATH:$DIR
-
-CMD="vcfs2fasta.py --out $OUTPUT --input $INPUT"
-
-if [ "$WITHMIXTURES" != "NOTTHERE" ]; then
-    CMD="$CMD --with-mixtures $WITHMIXTURES"
-fi
-
-if [ "$COLUMNNS" != "NOTTHERE" ]; then
-    CMD="$CMD --column-Ns $COLUMNNS"
-fi
-
-if [ "$SAMPLENS" != "NOTTHERE" ]; then
-    CMD="$CMD --sample-Ns $SAMPLENS"
-fi
-
-if [ "$REFERENCE" != "NOTTHERE" ]; then
-    CMD="$CMD --reference $REFERENCE"
-fi
-
-if [ "$INCLUDE" != "NOTTHERE" ]; then
-    CMD="$CMD --include $INCLUDE"
-fi
-
-if [ "$EXCLUDE" != "NOTTHERE" ]; then
-    CMD="$CMD --exclude $EXCLUDE"
-fi
-
-echo $CMD
-eval $CMD
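The wrapper passes the placeholder `NOTTHERE` for every unset option and then builds a single string for `eval`, so file names containing spaces or shell metacharacters would be split or interpreted. A safer pattern is to build an argument list and run it without a shell. The sketch below reuses the flags from the script; the function name is hypothetical, and it only builds the list (running it with `subprocess.run(cmd, check=True)` is left to the caller).

```python
import shlex

NOT_SET = "NOTTHERE"  # placeholder the Galaxy wrapper passes for unset options


def build_command(output, inputs, with_mixtures=NOT_SET, column_ns=NOT_SET,
                  sample_ns=NOT_SET, reference=NOT_SET, include=NOT_SET,
                  exclude=NOT_SET):
    """Build the vcfs2fasta.py argument list, skipping options left at NOT_SET."""
    cmd = ["vcfs2fasta.py", "--out", output, "--input"] + list(inputs)
    optional = [
        ("--with-mixtures", with_mixtures),
        ("--column-Ns", column_ns),
        ("--sample-Ns", sample_ns),
        ("--reference", reference),
        ("--include", include),
        ("--exclude", exclude),
    ]
    for flag, value in optional:
        if value != NOT_SET:
            cmd.extend([flag, value])
    return cmd


if __name__ == "__main__":
    cmd = build_command("out.fa", ["sample 1.vcf", "sample2.vcf"], include="regions.bed")
    # shlex.quote only shows how each argument would look if typed into a shell.
    print(" ".join(shlex.quote(arg) for arg in cmd))
    # To execute without a shell: subprocess.run(cmd, check=True)
```

Each list element reaches the program as exactly one argument, so `sample 1.vcf` stays a single input file and no `eval` is needed.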
diff -r 7ac17b6d031e -r b09ffe50c378 vcfs2fasta.xml
--- a/vcfs2fasta.xml Fri Dec 18 07:31:09 2015 -0500
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,123 +0,0 @@
-<tool id="vcfs2fasta" name="VCFs to fasta" version="1.0">
-  <description>Takes a set of VCF files and outputs a multi-FASTA file containing only the variant positions.</description>
-  <requirements>
-    <requirement type="package" version="2.7.10">python</requirement>
-    <requirement type="package" version="0.6.8dev">pyvcf</requirement>
-    <requirement type="package" version="3.11">pyyaml</requirement>
-    <requirement type="package" version="2.0.2">bintrees</requirement>
-    <requirement type="package" version="1.66">biopython</requirement>
-    <requirement type="package" version="1.4">matplotlib</requirement>
-  </requirements>
-  <stdio>
-    <!-- Assume anything other than zero is an error -->
-    <exit_code range="1:" />
-    <exit_code range=":-1" />
-  </stdio>
-  <command interpreter="bash">
-    vcfs2fasta.sh
-    $output
-    #if str($mix_cond.mix) == "yes":
-        $mix_cond.mix_value
-    #else
-        NOTTHERE
-    #end if
-    #if str($cols_cond.cols) == "yes":
-        $cols_cond.column_ns
-    #else
-        NOTTHERE
-    #end if
-    #if str($sample_cond.sample) == "yes":
-        $sample_cond.sample_ns
-    #else
-        NOTTHERE
-    #end if
-    #if str($reference_cond.reference) == "yes":
-        $reference_cond.ref_fa
-    #else
-        NOTTHERE
-    #end if
-    #if str($include_cond.include) == "yes":
-        $include_cond.in_bed
-    #else
-        NOTTHERE
-    #end if
-    #if str($exclude_cond.exclude) == "yes":
-        $exclude_cond.ex_bed
-    #else
-        NOTTHERE
-    #end if
-    #for $i, $input_vcf in enumerate( $input_vcfs ):
-        "${input_vcf}"
-    #end for
-  </command>
-
-  <inputs>
-    <param name="input_vcfs" type="data" multiple="true" format="vcf" label="Input VCF file(s)" />
-    <conditional name="mix_cond">
-        <param name="mix" type="select" label="With Mixtures" help="Specify this option with a threshold to output mixtures above this threshold.">
-          <option value="yes">Specify</option>
-          <option value="no" selected="true">Do not specify</option>
-        </param>
-        <when value="yes">
-          <param name="mix_value" type="float" value="0.5" label="Mixture value" />
-        </when>
-    </conditional>
-    <conditional name="cols_cond">
-        <param name="cols" type="select" label="Column Ns" help="Keep only columns whose fraction of Ns is below this threshold.">
-          <option value="yes">Specify</option>
-          <option value="no" selected="true">Do not specify</option>
-        </param>
-        <when value="yes">
-          <param name="column_ns" type="float" value="0.5" label="Column Ns value" />
-        </when>
-    </conditional>
-    <conditional name="sample_cond">
-        <param name="sample" type="select" label="Sample Ns" help="Keep only samples whose fraction of Ns is below this threshold.">
-          <option value="yes">Specify</option>
-          <option value="no" selected="true">Do not specify</option>
-        </param>
-        <when value="yes">
-          <param name="sample_ns" type="float" value="0.5" label="Sample Ns value" />
-        </when>
-    </conditional>
-    <conditional name="reference_cond">
-        <param name="reference" type="select" label="Reference genome file" help="If a reference is specified, the whole genome is output rather than only the variant positions.">
-          <option value="yes">Specify</option>
-          <option value="no" selected="true">Do not specify</option>
-        </param>
-        <when value="yes">
-          <param name="ref_fa" type="data" format="fasta" label="Reference FASTA file" help="FASTA format"/>
-        </when>
-    </conditional>
-    <conditional name="include_cond">
-        <param name="include" type="select" label="Include region" help="Specify regions to include as a BED file.">
-          <option value="yes">Specify</option>
-          <option value="no" selected="true">Do not specify</option>
-        </param>
-        <when value="yes">
-          <param name="in_bed" type="data" format="bed" label="Include regions BED file" help="BED format"/>
-        </when>
-    </conditional>
-    <conditional name="exclude_cond">
-        <param name="exclude" type="select" label="Exclude region" help="Specify regions to exclude as a BED file.">
-          <option value="yes">Specify</option>
-          <option value="no" selected="true">Do not specify</option>
-        </param>
-        <when value="yes">
-          <param name="ex_bed" type="data" format="bed" label="Exclude regions BED file" help="BED format"/>
-        </when>
-    </conditional>
-  </inputs>
-
-  <outputs>
-    <data format="fasta" name="output" label="${tool.name} on ${on_string}: FASTA file" />
-  </outputs>
-  <tests>
-    <test>
-      <param name="input_vcfs" value="1_short.vcf,2_short.vcf" ftype="vcf" />
-      <output name="output" file="testresult.fa" ftype="fasta" />
-    </test>
-  </tests>
-  <help>
-
-  </help>
-</tool>
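The command template passes seven positional arguments before the list of VCFs, in the order the wrapper script shifts them off, substituting NOTTHERE for every conditional left at "Do not specify". A minimal Python model of that mapping, assuming the submitted form is available as a plain dict keyed by the conditional and parameter names from the inputs section; this dict is a hypothetical stand-in for Galaxy's own parameter objects, and positional_args is not part of Galaxy or the tool:

```python
SENTINEL = "NOTTHERE"

# (conditional, selector param, value param), in the order the wrapper
# script reads its positional arguments after the output path.
CONDITIONALS = [
    ("mix_cond", "mix", "mix_value"),
    ("cols_cond", "cols", "column_ns"),
    ("sample_cond", "sample", "sample_ns"),
    ("reference_cond", "reference", "ref_fa"),
    ("include_cond", "include", "in_bed"),
    ("exclude_cond", "exclude", "ex_bed"),
]


def positional_args(params):
    """Model the command template's output (hypothetical helper).

    params is a plain dict standing in for Galaxy's parameter objects;
    a conditional that is absent or not set to "yes" yields NOTTHERE.
    """
    args = [str(params["output"])]
    for cond, selector, value in CONDITIONALS:
        block = params.get(cond, {})
        if str(block.get(selector)) == "yes":
            args.append(str(block[value]))
        else:
            args.append(SENTINEL)
    args.extend(str(path) for path in params["input_vcfs"])
    return args


if __name__ == "__main__":
    form = {
        "output": "out.fa",
        "mix_cond": {"mix": "yes", "mix_value": 0.5},
        "cols_cond": {"cols": "no"},
        "input_vcfs": ["1_short.vcf", "2_short.vcf"],
    }
    print(positional_args(form))
```

Because the wrapper relies purely on position, adding or reordering a conditional in the XML without changing the shift sequence in vcfs2fasta.sh would silently misassign every later option.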