Repository 'vcfs2fasta'
hg clone https://toolshed.g2.bx.psu.edu/repos/ulfschaefer/vcfs2fasta

Changeset 14:f72039c5faa4 (2015-12-16)
Previous changeset 13:2e69ce9dca65 (2015-12-16) Next changeset 15:fd42dbe82c12 (2015-12-16)
Commit message:
Uploaded
added:
LICENSE
phe/__init__.py
phe/__init__.pyc
phe/metadata/__init__.py
phe/variant/GATKVariantCaller.py
phe/variant/MPileupVariantCaller.py
phe/variant/__init__.py
phe/variant/variant_factory.py
phe/variant_filters/__init__.py
phe/variant_filters/__init__.pyc
test-data/1_short.vcf
test-data/2_short.vcf
test-data/testresult.fa
tool_dependencies.xml
vcfs2fasta.py
vcfs2fasta.sh
vcfs2fasta.xml
removed:
vcfs2fasta/phe/metadata/__init__.py
vcfs2fasta/phe/variant/GATKVariantCaller.py
vcfs2fasta/phe/variant_filters/__init__.py
vcfs2fasta/test-data/1_short.vcf
b
diff -r 2e69ce9dca65 -r f72039c5faa4 LICENSE
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/LICENSE Wed Dec 16 07:29:05 2015 -0500
b
b'@@ -0,0 +1,676 @@\n+\n+\n+                    GNU GENERAL PUBLIC LICENSE\n+                       Version 3, 29 June 2007\n+\n+ Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>\n+ Everyone is permitted to copy and distribute verbatim copies\n+ of this license document, but changing it is not allowed.\n+\n+                            Preamble\n+\n+  The GNU General Public License is a free, copyleft license for\n+software and other kinds of works.\n+\n+  The licenses for most software and other practical works are designed\n+to take away your freedom to share and change the works.  By contrast,\n+the GNU General Public License is intended to guarantee your freedom to\n+share and change all versions of a program--to make sure it remains free\n+software for all its users.  We, the Free Software Foundation, use the\n+GNU General Public License for most of our software; it applies also to\n+any other work released this way by its authors.  You can apply it to\n+your programs, too.\n+\n+  When we speak of free software, we are referring to freedom, not\n+price.  Our General Public Licenses are designed to make sure that you\n+have the freedom to distribute copies of free software (and charge for\n+them if you wish), that you receive source code or can get it if you\n+want it, that you can change the software or use pieces of it in new\n+free programs, and that you know you can do these things.\n+\n+  To protect your rights, we need to prevent others from denying you\n+these rights or asking you to surrender the rights.  Therefore, you have\n+certain responsibilities if you distribute copies of the software, or if\n+you modify it: responsibilities to respect the freedom of others.\n+\n+  For example, if you distribute copies of such a program, whether\n+gratis or for a fee, you must pass on to the recipients the same\n+freedoms that you received.  You must make sure that they, too, receive\n+or can get the source code.  And you must show them these terms so they\n+know their rights.\n+\n+  Developers that use the GNU GPL protect your rights with two steps:\n+(1) assert copyright on the software, and (2) offer you this License\n+giving you legal permission to copy, distribute and/or modify it.\n+\n+  For the developers\' and authors\' protection, the GPL clearly explains\n+that there is no warranty for this free software.  For both users\' and\n+authors\' sake, the GPL requires that modified versions be marked as\n+changed, so that their problems will not be attributed erroneously to\n+authors of previous versions.\n+\n+  Some devices are designed to deny users access to install or run\n+modified versions of the software inside them, although the manufacturer\n+can do so.  This is fundamentally incompatible with the aim of\n+protecting users\' freedom to change the software.  The systematic\n+pattern of such abuse occurs in the area of products for individuals to\n+use, which is precisely where it is most unacceptable.  Therefore, we\n+have designed this version of the GPL to prohibit the practice for those\n+products.  If such problems arise substantially in other domains, we\n+stand ready to extend this provision to those domains in future versions\n+of the GPL, as needed to protect the freedom of users.\n+\n+  Finally, every program is threatened constantly by software patents.\n+States should not allow patents to restrict development and use of\n+software on general-purpose computers, but in those that do, we wish to\n+avoid the special danger that patents applied to a free program could\n+make it effectively proprietary.  To prevent this, the GPL assures that\n+patents cannot be used to render the program non-free.\n+\n+  The precise terms and conditions for copying, distribution and\n+modification follow.\n+\n+                       TERMS AND CONDITIONS\n+\n+  0. Definitions.\n+\n+  "This License" refers to version 3 of the GNU General Public License.\n+\n+  "Copyright" also means copyright-like laws that apply to other kinds of\n+works, such as semiconductor masks.\n+\n+  "The Program" refers '..b'CE OF THE PROGRAM\n+IS WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF\n+ALL NECESSARY SERVICING, REPAIR OR CORRECTION.\n+\n+  16. Limitation of Liability.\n+\n+  IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING\n+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS\n+THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY\n+GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE\n+USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF\n+DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD\n+PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),\n+EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF\n+SUCH DAMAGES.\n+\n+  17. Interpretation of Sections 15 and 16.\n+\n+  If the disclaimer of warranty and limitation of liability provided\n+above cannot be given local legal effect according to their terms,\n+reviewing courts shall apply local law that most closely approximates\n+an absolute waiver of all civil liability in connection with the\n+Program, unless a warranty or assumption of liability accompanies a\n+copy of the Program in return for a fee.\n+\n+                     END OF TERMS AND CONDITIONS\n+\n+            How to Apply These Terms to Your New Programs\n+\n+  If you develop a new program, and you want it to be of the greatest\n+possible use to the public, the best way to achieve this is to make it\n+free software which everyone can redistribute and change under these terms.\n+\n+  To do so, attach the following notices to the program.  It is safest\n+to attach them to the start of each source file to most effectively\n+state the exclusion of warranty; and each file should have at least\n+the "copyright" line and a pointer to where the full notice is found.\n+\n+    {one line to give the program\'s name and a brief idea of what it does.}\n+    Copyright (C) {year}  {name of author}\n+\n+    This program is free software: you can redistribute it and/or modify\n+    it under the terms of the GNU General Public License as published by\n+    the Free Software Foundation, either version 3 of the License, or\n+    (at your option) any later version.\n+\n+    This program is distributed in the hope that it will be useful,\n+    but WITHOUT ANY WARRANTY; without even the implied warranty of\n+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n+    GNU General Public License for more details.\n+\n+    You should have received a copy of the GNU General Public License\n+    along with this program.  If not, see <http://www.gnu.org/licenses/>.\n+\n+Also add information on how to contact you by electronic and paper mail.\n+\n+  If the program does terminal interaction, make it output a short\n+notice like this when it starts in an interactive mode:\n+\n+    {project}  Copyright (C) {year}  {fullname}\n+    This program comes with ABSOLUTELY NO WARRANTY; for details type `show w\'.\n+    This is free software, and you are welcome to redistribute it\n+    under certain conditions; type `show c\' for details.\n+\n+The hypothetical commands `show w\' and `show c\' should show the appropriate\n+parts of the General Public License.  Of course, your program\'s commands\n+might be different; for a GUI interface, you would use an "about box".\n+\n+  You should also get your employer (if you work as a programmer) or school,\n+if any, to sign a "copyright disclaimer" for the program, if necessary.\n+For more information on this, and how to apply and follow the GNU GPL, see\n+<http://www.gnu.org/licenses/>.\n+\n+  The GNU General Public License does not permit incorporating your program\n+into proprietary programs.  If your program is a subroutine library, you\n+may consider it more useful to permit linking proprietary applications with\n+the library.  If this is what you want to do, use the GNU Lesser General\n+Public License instead of this License.  But first, please read\n+<http://www.gnu.org/philosophy/why-not-lgpl.html>.\n'
b
diff -r 2e69ce9dca65 -r f72039c5faa4 phe/__init__.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/phe/__init__.py Wed Dec 16 07:29:05 2015 -0500
b
@@ -0,0 +1,4 @@
+if __name__ == "phe":
+    # If this package is added as library, append extended path.
+    from pkgutil import extend_path
+    __path__ = extend_path(__path__, __name__)
\ No newline at end of file
b
diff -r 2e69ce9dca65 -r f72039c5faa4 phe/__init__.pyc
b
Binary file phe/__init__.pyc has changed
b
diff -r 2e69ce9dca65 -r f72039c5faa4 phe/metadata/__init__.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/phe/metadata/__init__.py Wed Dec 16 07:29:05 2015 -0500
b
@@ -0,0 +1,16 @@
+"""Metadata related information."""
+
+import abc
+
+class PHEMetaData(object):
+    """Abstract class to provide interface for meta-data creation."""
+
+    __metaclass__ = abc.ABCMeta
+
+    def __init__(self):
+        pass
+
+    @abc.abstractmethod
+    def get_meta(self):
+        """Get the metadata."""
+        raise NotImplementedError("get meta has not been implemented yet.")
b
diff -r 2e69ce9dca65 -r f72039c5faa4 phe/variant/GATKVariantCaller.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/phe/variant/GATKVariantCaller.py Wed Dec 16 07:29:05 2015 -0500
[
@@ -0,0 +1,126 @@
+'''
+Created on 22 Sep 2015
+
+@author: alex
+'''
+from collections import OrderedDict
+import logging
+import os
+import subprocess
+
+from phe.variant import VariantCaller
+
+
+class GATKVariantCaller(VariantCaller):
+    """Implemetation of the Broad institute's variant caller."""
+
+    name = "gatk"
+    """Plain text name of the variant caller."""
+
+    _default_options = "--sample_ploidy 2 --genotype_likelihoods_model BOTH -rf BadCigar -out_mode EMIT_ALL_SITES -nt 1"
+    """Default options for the variant caller."""
+
+    def __init__(self, cmd_options=None):
+        """Constructor"""
+        if cmd_options is None:
+            cmd_options = self._default_options
+
+        super(GATKVariantCaller, self).__init__(cmd_options=cmd_options)
+
+        self.last_command = None
+
+    def get_info(self, plain=False):
+        d = {"name": "gatk", "version": self.get_version(), "command": self.last_command}
+
+        if plain:
+            result = "GATK(%(version)s): %(command)s" % d
+        else:
+            result = OrderedDict(d)
+
+        return result
+
+    def get_version(self):
+
+        p = subprocess.Popen(["java", "-jar", os.environ["GATK_JAR"], "-version"], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
+        (output, _) = p.communicate()
+
+        # last character is EOL.
+        version = output.split("\n")[-2]
+
+        return version
+
+    def make_vcf(self, *args, **kwargs):
+        ref = kwargs.get("ref")
+        bam = kwargs.get("bam")
+
+        if kwargs.get("vcf_file") is None:
+            kwargs["vcf_file"] = "variants.vcf"
+
+        opts = {"ref": os.path.abspath(ref),
+                "bam": os.path.abspath(bam),
+                "gatk_jar": os.environ["GATK_JAR"],
+                "all_variants_file": os.path.abspath(kwargs.get("vcf_file")),
+                "extra_cmd_options": self.cmd_options}
+
+#         if not self.create_aux_files(ref):
+#             logging.warn("Auxiliary files were not created.")
+#             return False
+
+        # Call variants
+        # FIXME: Sample ploidy = 2?
+        os.environ["GATK_JAR"]
+        cmd = "java -XX:+UseSerialGC -jar %(gatk_jar)s -T UnifiedGenotyper -R %(ref)s -I %(bam)s -o %(all_variants_file)s %(extra_cmd_options)s" % opts
+        success = os.system(cmd)
+
+        if success != 0:
+            logging.warn("Calling variants returned non-zero exit status.")
+            return False
+
+        self.last_command = cmd
+
+        return True
+
+    def create_aux_files(self, ref):
+        """Create auxiliary files needed for this variant.
+
+        Tools needed: samtools and picard tools. Picard tools is a Java
+        library that can be defined using environment variable: PICARD_JAR
+        specifying path to picard.jar or PICARD_TOOLS_PATH specifying path
+        to the directory where separate jars are (older version before jars
+        were merged into a single picard.jar).
+        Parameters:
+        -----------
+        ref: str
+            Path to the reference file.
+        
+        Returns:
+        --------
+        bool:
+            True if auxiliary files were created, False otherwise.
+        """
+
+        ref_name, _ = os.path.splitext(ref)
+
+        success = os.system("samtools faidx %s" % ref)
+
+        if success != 0:
+            logging.warn("Fasta index could not be created.")
+            return False
+
+        d = {"ref": ref, "ref_name": ref_name}
+
+        if os.environ.get("PICARD_TOOLS_PATH"):
+            d["picard_tools_path"] = os.path.join(os.environ["PICARD_TOOLS_PATH"], "CreateSequenceDictionary.jar")
+        elif os.environ.get("PICARD_JAR"):
+            # This is used in newer version of PICARD tool where multiple
+            #    jars were merged into a single jar file.
+            d["picard_tools_path"] = "%s %s" % (os.environ["PICARD_JAR"], "CreateSequenceDictionary")
+        else:
+            logging.error("Picard tools are not present in the path.")
+            return False
+
+        success = os.system("java -jar %(picard_tools_path)s R=%(ref)s O=%(ref_name)s.dict" % d)
+
+        if success != 0:
+            logging.warn("Dictionary for the %s reference could not be created", ref)
+            return False
b
diff -r 2e69ce9dca65 -r f72039c5faa4 phe/variant/MPileupVariantCaller.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/phe/variant/MPileupVariantCaller.py Wed Dec 16 07:29:05 2015 -0500
[
@@ -0,0 +1,96 @@
+'''
+Created on 22 Sep 2015
+
+@author: alex
+'''
+from collections import OrderedDict
+import logging
+import os
+import subprocess
+import tempfile
+
+from phe.variant import VariantCaller
+
+
+class MPileupVariantCaller(VariantCaller):
+    """Implemetation of the Broad institute's variant caller."""
+
+    name = "mpileup"
+    """Plain text name of the variant caller."""
+
+    _default_options = "-m -f GQ"
+    """Default options for the variant caller."""
+
+    def __init__(self, cmd_options=None):
+        """Constructor"""
+        if cmd_options is None:
+            cmd_options = self._default_options
+
+        super(MPileupVariantCaller, self).__init__(cmd_options=cmd_options)
+
+        self.last_command = None
+
+    def get_info(self, plain=False):
+        d = {"name": self.name, "version": self.get_version(), "command": self.last_command}
+
+        if plain:
+            result = "mpileup(%(version)s): %(command)s" % d
+        else:
+            result = OrderedDict(d)
+
+        return result
+
+    def get_version(self):
+
+        p = subprocess.Popen(["samtools", "--version"], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
+        (output, _) = p.communicate()
+
+        # first line is the version of the samtools
+        version = output.split("\n")[0].split(" ")[1]
+
+        return version
+
+    def make_vcf(self, *args, **kwargs):
+        ref = kwargs.get("ref")
+        bam = kwargs.get("bam")
+
+        if kwargs.get("vcf_file") is None:
+            kwargs["vcf_file"] = "variants.vcf"
+
+        opts = {"ref": os.path.abspath(ref),
+                "bam": os.path.abspath(bam),
+                "all_variants_file": os.path.abspath(kwargs.get("vcf_file")),
+                "extra_cmd_options": self.cmd_options}
+
+        with tempfile.NamedTemporaryFile(suffix=".pileup") as tmp:
+            opts["pileup_file"] = tmp.name
+            cmd = "samtools mpileup -t DP,DV,DP4,DPR,SP -Auf %(ref)s %(bam)s | bcftools call %(extra_cmd_options)s > %(all_variants_file)s" % opts
+            print cmd
+            self.last_command = cmd
+            if os.system(cmd) != 0:
+                logging.warn("Pileup creation was not successful.")
+                return False
+
+        return True
+
+    def create_aux_files(self, ref):
+        """Index reference with faidx from samtools.
+
+        Parameters:
+        -----------
+        ref: str
+            Path to the reference file.
+
+        Returns:
+        --------
+        bool:
+            True if auxiliary files were created, False otherwise.
+        """
+
+        success = os.system("samtools faidx %s" % ref)
+
+        if success != 0:
+            logging.warn("Fasta index could not be created.")
+            return False
+
+        return True
b
diff -r 2e69ce9dca65 -r f72039c5faa4 phe/variant/__init__.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/phe/variant/__init__.py Wed Dec 16 07:29:05 2015 -0500
[
b'@@ -0,0 +1,288 @@\n+"""Classes and methods to work with variants and such."""\n+import abc\n+#ulf\n+# from collections import OrderedDict\n+try:\n+    from collections import OrderedDict\n+except ImportError:\n+    from ordereddict import OrderedDict\n+    \n+import logging\n+import pickle\n+\n+from vcf import filters\n+import vcf\n+from vcf.parser import _Filter\n+\n+from phe.metadata import PHEMetaData\n+from phe.variant_filters import make_filters, PHEFilterBase, str_to_filters\n+\n+\n+class VCFTemplate(object):\n+    """This is a small hack class for the Template used in generating\n+    VCF file."""\n+\n+    def __init__(self, vcf_reader):\n+        self.infos = vcf_reader.infos\n+        self.formats = vcf_reader.formats\n+        self.filters = vcf_reader.filters\n+        self.alts = vcf_reader.alts\n+        self.contigs = vcf_reader.contigs\n+        self.metadata = vcf_reader.metadata\n+        self._column_headers = vcf_reader._column_headers\n+        self.samples = vcf_reader.samples\n+\n+class VariantSet(object):\n+    """A convenient representation of set of variants.\n+    TODO: Implement iterator and generator for the variant set.\n+    """\n+\n+    _reader = None\n+\n+    def __init__(self, vcf_in, filters=None):\n+        """Constructor of variant set.\n+\n+        Parameters:\n+        -----------\n+        vcf_in: str\n+            Path to the VCF file for loading information.\n+        filters: str or dict, optional\n+            Dictionary or string of the filter:threshold key value pairs.\n+        """\n+        self.vcf_in = vcf_in\n+        self._reader = vcf.Reader(filename=vcf_in)\n+        self.out_template = VCFTemplate(self._reader)\n+\n+        self.filters = []\n+        if filters is not None:\n+            if isinstance(filters, str):\n+                self.filters = str_to_filters(filters)\n+            elif isinstance(filters, dict):\n+                self.filters = make_filters(config=filters)\n+            elif isinstance(filters, list):\n+                self.filters = filters\n+            else:\n+                logging.warn("Could not create filters from %s", filters)\n+        else:\n+            reader = vcf.Reader(filename=self.vcf_in)\n+            filters = {}\n+            for filter_id in reader.filters:\n+                filters.update(PHEFilterBase.decode(filter_id))\n+\n+            if filters:\n+                self.filters = make_filters(config=filters)\n+\n+        self.variants = []\n+\n+    def filter_variants(self, keep_only_snps=True):\n+        """Create a variant """\n+\n+        if self._reader is None:\n+            # Create a reader class from input VCF.\n+            self._reader = vcf.Reader(filename=self.vcf_in)\n+\n+        # get list of existing filters.\n+        existing_filters = {}\n+        removed_filters = []\n+\n+        for filter_id in self._reader.filters:\n+            conf = PHEFilterBase.decode(filter_id)\n+            tuple(conf.keys())\n+            existing_filters.update({tuple(conf.keys()):filter_id})\n+\n+        # Add each filter we are going to use to the record.\n+        # This is needed for writing out proper #FILTER header in VCF.\n+        for record_filter in self.filters:\n+            # We know that each filter has short description method.\n+            short_doc = record_filter.short_desc()\n+            short_doc = short_doc.split(\'\\n\')[0].lstrip()\n+\n+            filter_name = PHEFilterBase.decode(record_filter.filter_name())\n+\n+            # Check if the sample has been filtered for this type of filter\n+            #    in the past. If so remove is, because it is going to be refiltered.\n+            if tuple(filter_name) in existing_filters:\n+                logging.info("Removing existing filter: %s", existing_filters[tuple(filter_name)])\n+                removed_filters.append(existing_filters[tuple(filter_name)])\n+                del self._reader.filters[existing_filters[tuple(filter_name)]]\n+\n+            self._reader.filters[record_filter.filter_name()] = _Filter(record_filter.filter_name(), short_doc)\n+\n+        '..b'    (default: False).\n+\n+        Returns:\n+        int:\n+            Number of records written.\n+        """\n+        written_variants = 0\n+        with open(vcf_out, "w") as out_vcf:\n+            writer = vcf.Writer(out_vcf, self.out_template)\n+            for record in self.variants:\n+\n+                if only_snps and not record.is_snp:\n+                    continue\n+\n+                if only_good and record.FILTER != "PASS" or record.FILTER is None:\n+                    continue\n+\n+                writer.write_record(record)\n+                written_variants += 1\n+\n+        return written_variants\n+\n+    def _write_bad_variants(self, vcf_out):\n+        """**PRIVATE:** Write only those records that **haven\'t** passed."""\n+        written_variants = 0\n+        with open(vcf_out, "w") as out_vcf:\n+            writer = vcf.Writer(out_vcf, self.out_template)\n+            for record in self.variants:\n+                if record.FILTER != "PASS" and record.FILTER is not None:\n+                    writer.write_record(record)\n+                    written_variants += 1\n+        return written_variants\n+\n+    def serialise(self, out_file):\n+        """Save the data in this class to a file for future use/reload.\n+\n+        Parameters:\n+        -----------\n+        out_file: str\n+            path to file where the data should be written to.\n+\n+        Returns:\n+        --------\n+        int:\n+            Number of variants written.\n+        """\n+        written_variants = 0\n+        with open(out_file, "w") as out_vcf:\n+            writer = vcf.Writer(out_vcf, self.out_template)\n+            for record in self.variants:\n+                writer.write_record(record)\n+                written_variants += 1\n+\n+        return written_variants\n+\n+    def update_filters(self, new_filters):\n+        """Update internal filters in the output template."""\n+        for new_filter, filter_data in new_filters.items():\n+            self.out_template.filters[new_filter] = filter_data\n+\n+\n+class VariantCaller(PHEMetaData):\n+    """Abstract class used for access to the implemented variant callers."""\n+\n+    __metaclass__ = abc.ABCMeta\n+\n+    def __init__(self, cmd_options=None):\n+        """Constructor for variant caller.\n+\n+        Parameters:\n+        -----------\n+        cmd_options: str, optional\n+            Command options to pass to the variant command.\n+        """\n+        self.cmd_options = cmd_options\n+\n+        super(VariantCaller, self).__init__()\n+\n+    @abc.abstractmethod\n+    def make_vcf(self, *args, **kwargs):\n+        """Create a VCF from **BAM** file.\n+\n+        Parameters:\n+        -----------\n+        ref: str\n+            Path to the reference file.\n+        bam: str\n+            Path to the indexed **BAM** file for calling variants.\n+        vcf_file: str\n+            path to the VCF file where data will be written to.\n+\n+        Returns:\n+        --------\n+        bool:\n+            True if variant calling was successful, False otherwise.\n+        """\n+        raise NotImplementedError("make_vcf is not implemented yet.")\n+\n+    @abc.abstractmethod\n+    def create_aux_files(self, ref):\n+        """Create needed (if any) auxiliary files.\n+        These files are required for proper functioning of the variant caller.\n+        """\n+        raise NotImplementedError("create_aux_files is not implemeted.")\n+\n+    @abc.abstractmethod\n+    def get_info(self, plain=False):\n+        """Get information about this variant caller."""\n+        raise NotImplementedError("Get info has not been implemented yet."\n+                                  )\n+    def get_meta(self):\n+        """Get the metadata about this variant caller."""\n+        od = self.get_info()\n+        od["ID"] = "VariantCaller"\n+        return OrderedDict({"PHEVariantMetaData": [od]})\n+\n+    @abc.abstractmethod\n+    def get_version(self):\n+        """Get the version of the underlying command used."""\n+        raise NotImplementedError("Get version has not been implemented yet.")\n'
b
diff -r 2e69ce9dca65 -r f72039c5faa4 phe/variant/variant_factory.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/phe/variant/variant_factory.py Wed Dec 16 07:29:05 2015 -0500
[
@@ -0,0 +1,92 @@
+'''Classes and functions for working with variant callers.
+
+Created on 22 Sep 2015
+
+@author: alex
+'''
+import glob
+import inspect
+import logging
+import os
+import sys
+
+from phe.variant import VariantCaller
+
+def dynamic_caller_loader():
+    """Fancy way of dynamically importing existing variant callers.
+    
+    Returns
+    -------
+    dict:
+        Available variant callers dictionary. Keys are parameters that
+        can be used to call variants.
+    """
+
+    # We assume the caller are in the same directory as THIS file.
+    variants_dir = os.path.dirname(__file__)
+    variants_dir = os.path.abspath(variants_dir)
+
+    # This is populated when the module is first imported.
+    avail_callers = {}
+
+    # Add this directory to the syspath.
+    sys.path.insert(0, variants_dir)
+
+    # Find all "py" files.
+    for caller_mod in glob.glob(os.path.join(variants_dir, "*.py")):
+
+        # Derive name of the module where caller is.
+        caller_mod_file = os.path.basename(caller_mod)
+
+        # Ignore __init__ file, only base class is there.
+        if caller_mod_file.startswith("__init__"):
+            continue
+
+        # Import the module with a caller.
+        mod = __import__(caller_mod_file.replace(".pyc", "").replace(".py", ""))
+
+        # Find all the classes contained in this module.
+        classes = inspect.getmembers(mod, inspect.isclass)
+        for cls_name, cls in classes:
+            # For each class, if it is a sublass of VariantCaller, add it.
+            if cls_name != "VariantCaller" and issubclass(cls, VariantCaller):
+                # The name is inherited and defined within each caller.
+                avail_callers[cls.name] = cls
+
+    sys.path.remove(variants_dir)
+
+    return avail_callers
+
+_avail_variant_callers = dynamic_caller_loader()
+
+def available_callers():
+    """Return list of available variant callers."""
+    return _avail_variant_callers.keys()
+
+def factory(variant=None, custom_options=None):
+    """Make an instance of a variant class.
+    
+    Parameters:
+    -----------
+    variant: str, optional
+        Name of the variant class to instantiate.
+    custom_options: str, optional
+        Custom options to be passed directly to the implementing class.
+    
+    Returns:
+    --------
+    :py:class:`phe.variant.VariantCaller`:
+        Instance of the :py:class:`phe.variant.VariantCaller` for given
+        variant name, or None if one couldn't be found.
+    """
+    if variant is not None and isinstance(variant, str):
+
+        variant = variant.lower()
+        if variant in _avail_variant_callers:
+            return _avail_variant_callers[variant](cmd_options=custom_options)
+        else:
+            logging.error("No implementation for %s mapper.")
+            return None
+
+    logging.warn("Unknown parameters. Mapper could not be initialised.")
+    return None
b
diff -r 2e69ce9dca65 -r f72039c5faa4 phe/variant_filters/__init__.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/phe/variant_filters/__init__.py Wed Dec 16 07:29:05 2015 -0500
[
@@ -0,0 +1,209 @@
+"""Classes and functions for working with variant filters."""
+
+from __builtin__ import __import__
+from abc import abstractproperty
+import abc
+import argparse
+import glob
+import inspect
+import logging
+import os
+import re
+import sys
+
+import vcf
+import vcf.filters
+from vcf.parser import _Filter
+
+IUPAC_CODES = {frozenset(["A", "G"]): "R",
+                frozenset(["C", "T"]): "Y",
+                frozenset(["G", "C"]): "S",
+                frozenset(["A", "T"]): "W",
+                frozenset(["G", "T"]): "K",
+                frozenset(["A", "C"]): "M",
+                frozenset(["C", "G", "T"]): "B",
+                frozenset(["A", "G", "T"]): "D",
+                frozenset(["A", "C", "T"]): "H",
+                frozenset(["A", "C", "G"]): "V"
+              }
+
+class PHEFilterBase(vcf.filters.Base):
+    """Base class for VCF filters."""
+    __meta__ = abc.ABCMeta
+
+    magic_sep = ":"
+    decoder_pattern = re.compile(magic_sep)
+
+    @abc.abstractproperty
+    def parameter(self):
+        """Short name of parameter being filtered."""
+        return self.parameter
+
+    @abc.abstractproperty
+    def _default_threshold(self):
+        """Default threshold for filtering."""
+        return self._default_threshold
+
+    def __init__(self, args):
+        super(PHEFilterBase, self).__init__(args)
+
+        # Change the threshold to custom gq value.
+        self.threshold = self._default_threshold
+
+        if isinstance(args, dict):
+            self.threshold = args.get(self.parameter)
+
+    def __str__(self):
+        return self.filter_name()
+
+    @abc.abstractmethod
+    def short_desc(self):
+        """Short description of the filter (included in VCF)."""
+        raise NotImplementedError("Get short description is not implemented.")
+
+    def get_config(self):
+        """This is used for reconstructing filter."""
+        return {self.parameter: self.threshold}
+
+    def filter_name(self):
+        """Create filter names by their parameter separated by magic.
+        E.g. if filter parameter is ad_ratio and threshold is 0.9 then
+        ad_ratio:0.9 if the filter name.
+        """
+        return "%s%s%s" % (self.parameter, self.magic_sep, self.threshold)
+
+    @staticmethod
+    def decode(filter_id):
+        """Decode name of filter."""
+        conf = {}
+
+        if PHEFilterBase.magic_sep in filter_id:
+            info = PHEFilterBase.decoder_pattern.split(filter_id)
+            assert len(info) == 2
+            conf[info[0]] = info[1]
+        return conf
+
+    def is_gap(self):
+        return False
+
+    def is_n(self):
+        return True
+
+    @staticmethod
+    def call_concensus(record):
+        extended_code = "N"
+        try:
+            sample_ad = set([str(c) for c in record.ALT] + [record.REF])
+
+
+            for code, cov in IUPAC_CODES.items():
+                if sample_ad == cov:
+                    extended_code = code
+                    break
+        except AttributeError:
+            extended_code = "N"
+
+        return extended_code
+
+def dynamic_filter_loader():
+    """Fancy way of dynamically importing existing filters.
+    
+    Returns
+    -------
+    dict:
+        Available filters dictionary. Keys are parameters that
+        can be supplied to the filters.
+    """
+
+    # We assume the filters are in the same directory as THIS file.
+    filter_dir = os.path.dirname(__file__)
+    filter_dir = os.path.abspath(filter_dir)
+
+    # This is populated when the module is first imported.
+    avail_filters = {}
+
+    # Add this directory to the syspath.
+    sys.path.insert(0, filter_dir)
+
+    # Find all "py" files.
+    for filter_mod in glob.glob(os.path.join(filter_dir, "*.py")):
+
+        # Derive name of the module where filter is.
+        filter_mod_file = os.path.basename(filter_mod)
+
+        # Ignore this file, obviously.
+        if filter_mod_file.startswith("__init__"):
+            continue
+
+        # Import the module with a filter.
+        mod = __import__(filter_mod_file.replace(".pyc", "").replace(".py", ""))
+
+        # Find all the classes contained in this module.
+        classes = inspect.getmembers(mod, inspect.isclass)
+        for cls_name, cls in classes:
+            # For each class, if it is a sublass of PHEFilterBase, add it.
+            if cls_name != "PHEFilterBase" and issubclass(cls, PHEFilterBase):
+                # The parameters are inherited and defined within each filter.
+                avail_filters[cls.parameter] = cls
+
+    sys.path.remove(filter_dir)
+
+    return avail_filters
+
+_avail_filters = dynamic_filter_loader()
+
+def available_filters():
+    """Return list of available filters."""
+    return _avail_filters.keys()
+
+def str_to_filters(filters):
+    """Convert from filter string to array of filters.
+    E.g. ad_ration:0.9,min_depth:5
+    
+    Parameters:
+    -----------
+    filters: str
+        String version of filters, separated by comma.
+    
+    Returns:
+    --------
+    list:
+        List of :py:class:`phe.variant_filters.PHEFilterBase` instances.
+    """
+
+    config = {}
+    for kv_pair in filters.split(","):
+        pair = kv_pair.split(":")
+        assert len(pair) == 2, "Filters should be separated by ':' %s" % kv_pair
+
+        # We don't care about casting them to correct type because Filters
+        #    will do it for us.
+        config[pair[0]] = pair[1]
+
+    return make_filters(config)
+
+def make_filters(config):
+    """Create a list of filters from *config*.
+    
+    Parameters:
+    -----------
+    config: dict, optional
+        Dictionary with parameter: value pairs. For each parameter, an
+        appropriate Filter will be found and instanciated.
+        
+    Returns:
+    --------
+    list:
+        List of :py:class:`PHEFilterBase` filters.
+    """
+    filters = []
+
+    if config:
+        for custom_filter in config:
+            if custom_filter in _avail_filters:
+                filters.append(_avail_filters[custom_filter](config))
+            else:
+                logging.warn("Could not find appropriate filter for %s",
+                             custom_filter)
+
+    return filters
b
diff -r 2e69ce9dca65 -r f72039c5faa4 phe/variant_filters/__init__.pyc
b
Binary file phe/variant_filters/__init__.pyc has changed
b
diff -r 2e69ce9dca65 -r f72039c5faa4 test-data/1_short.vcf
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/1_short.vcf Wed Dec 16 07:29:05 2015 -0500
[
b'@@ -0,0 +1,50000 @@\n+##fileformat=VCFv4.1\n+##FILTER=<ID=LowQual,Description="Low quality">\n+##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">\n+##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">\n+##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">\n+##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n+##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">\n+##GATKCommandLine=<ID=SelectVariants,Version=2.6-5-gba531bd,Date="Thu May 22 09:24:36 BST 2014",Epoch=1400747076888,CommandLineOptions="analysis_type=SelectVariants input_file=[] read_buffer_size=null phone_home=AWS gatk_key=null tag=NA read_filter=[] intervals=null excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/phengs/galaxy/database/tmp/tmp-gatk-lMh4vi/gatk_input.fasta nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=1000 baq=OFF baqGapOpenPenalty=40.0 fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 allow_bqsr_on_reduced_bams_despite_repeated_warnings=false validation_strictness=SILENT remove_program_records=false keep_program_records=false unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false num_threads=2 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false logging_level=INFO log_to_file=null help=false version=false variant=(RodBinding name=variant source=/phengs/galaxy/database/tmp/tmp-gatk-lMh4vi/input_variant.vcf) discordance=(RodBinding name= source=UNBOUND) concordance=(RodBinding name= source=UNBOUND) out=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub no_cmdline_in_header=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub sites_only=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub bcf=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub sample_name=[] sample_expressions=null sample_file=null exclude_sample_name=[] exclude_sample_file=[] select_expressions=[] excludeNonVariants=false excludeFiltered=false restrictAllelesTo=ALL keepOriginalAC=false mendelianViolation=false mendelianViolationQualThreshold=0.0 select_random_fraction=0.0 remove_fraction_genotypes=0.0 selectTypeToInclude=[SNP] keepIDs=null fullyDecode=false forceGenotypesDecode=false justRead=false maxIndelSize=2147483647 ALLOW_NONOVERLAPPING_COMMAND_LINE_SAMPLES=false filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false">\n+##GATKCommandLine=<ID=UnifiedGenotyper,Version=2.6-5-gba531bd,Date="Thu May 22 09:10:24 BST 2014",Epoch=1400746224902,CommandLineOptions="analysis_type=UnifiedGenotyper input_file=[/phengs/galaxy/database/tmp/tmp-gatk-x7S0pF/gatk_input_0.bam] read_buffer_size=null phone_home=AWS gatk_key=null tag=NA read_filter=[BadCigar] intervals=null excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/phengs/galaxy/database/tmp/tmp-gatk-x7S0pF/gatk_input.fasta nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=NONE downsample_to_fraction=null downsample_to_coverage=null baq=OFF baqGapOpenPenalty=40.0 fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOrigina'..b'.\tAN=2;DP=146;MQ=60.00;MQ0=0\tGT:DP\t0/0:146\n+gi|15829254|ref|NC_002695.1|\t49922\t.\tT\t.\t434.23\t.\tAN=2;DP=144;MQ=60.00;MQ0=0\tGT:DP\t0/0:144\n+gi|15829254|ref|NC_002695.1|\t49923\t.\tC\t.\t434.23\t.\tAN=2;DP=143;MQ=60.00;MQ0=0\tGT:DP\t0/0:143\n+gi|15829254|ref|NC_002695.1|\t49924\t.\tA\t.\t428.23\t.\tAN=2;DP=141;MQ=60.00;MQ0=0\tGT:DP\t0/0:141\n+gi|15829254|ref|NC_002695.1|\t49925\t.\tA\t.\t440.23\t.\tAN=2;DP=149;MQ=60.00;MQ0=0\tGT:DP\t0/0:149\n+gi|15829254|ref|NC_002695.1|\t49926\t.\tT\t.\t440.23\t.\tAN=2;DP=152;MQ=60.00;MQ0=0\tGT:DP\t0/0:152\n+gi|15829254|ref|NC_002695.1|\t49927\t.\tG\t.\t446.23\t.\tAN=2;DP=155;MQ=60.00;MQ0=0\tGT:DP\t0/0:155\n+gi|15829254|ref|NC_002695.1|\t49928\t.\tT\t.\t452.23\t.\tAN=2;DP=157;MQ=60.00;MQ0=0\tGT:DP\t0/0:157\n+gi|15829254|ref|NC_002695.1|\t49929\t.\tC\t.\t443.23\t.\tAN=2;DP=153;MQ=60.00;MQ0=0\tGT:DP\t0/0:153\n+gi|15829254|ref|NC_002695.1|\t49930\t.\tG\t.\t449.23\t.\tAN=2;DP=155;MQ=60.00;MQ0=0\tGT:DP\t0/0:155\n+gi|15829254|ref|NC_002695.1|\t49931\t.\tA\t.\t449.23\t.\tAN=2;DP=156;MQ=60.00;MQ0=0\tGT:DP\t0/0:156\n+gi|15829254|ref|NC_002695.1|\t49932\t.\tT\t.\t461.23\t.\tAN=2;DP=159;MQ=60.00;MQ0=0\tGT:DP\t0/0:159\n+gi|15829254|ref|NC_002695.1|\t49933\t.\tG\t.\t449.23\t.\tAN=2;DP=156;MQ=60.00;MQ0=0\tGT:DP\t0/0:156\n+gi|15829254|ref|NC_002695.1|\t49934\t.\tA\t.\t455.23\t.\tAN=2;DP=156;MQ=60.00;MQ0=0\tGT:DP\t0/0:156\n+gi|15829254|ref|NC_002695.1|\t49935\t.\tA\t.\t452.23\t.\tAN=2;DP=156;MQ=60.00;MQ0=0\tGT:DP\t0/0:156\n+gi|15829254|ref|NC_002695.1|\t49936\t.\tG\t.\t455.23\t.\tAN=2;DP=158;MQ=60.00;MQ0=0\tGT:DP\t0/0:157\n+gi|15829254|ref|NC_002695.1|\t49937\t.\tA\t.\t440.23\t.\tAN=2;DP=155;MQ=60.00;MQ0=0\tGT:DP\t0/0:155\n+gi|15829254|ref|NC_002695.1|\t49938\t.\tG\t.\t449.23\t.\tAN=2;DP=157;MQ=60.00;MQ0=0\tGT:DP\t0/0:157\n+gi|15829254|ref|NC_002695.1|\t49939\t.\tC\t.\t443.23\t.\tAN=2;DP=155;MQ=60.00;MQ0=0\tGT:DP\t0/0:155\n+gi|15829254|ref|NC_002695.1|\t49940\t.\tA\t.\t440.23\t.\tAN=2;DP=154;MQ=60.00;MQ0=0\tGT:DP\t0/0:154\n+gi|15829254|ref|NC_002695.1|\t49941\t.\tT\t.\t449.23\t.\tAN=2;DP=155;MQ=60.00;MQ0=0\tGT:DP\t0/0:155\n+gi|15829254|ref|NC_002695.1|\t49942\t.\tC\t.\t446.23\t.\tAN=2;DP=155;MQ=60.00;MQ0=0\tGT:DP\t0/0:155\n+gi|15829254|ref|NC_002695.1|\t49943\t.\tC\t.\t428.23\t.\tAN=2;DP=144;MQ=60.00;MQ0=0\tGT:DP\t0/0:144\n+gi|15829254|ref|NC_002695.1|\t49944\t.\tG\t.\t428.23\t.\tAN=2;DP=144;MQ=60.00;MQ0=0\tGT:DP\t0/0:144\n+gi|15829254|ref|NC_002695.1|\t49945\t.\tC\t.\t431.23\t.\tAN=2;DP=144;MQ=60.00;MQ0=0\tGT:DP\t0/0:144\n+gi|15829254|ref|NC_002695.1|\t49946\t.\tA\t.\t425.23\t.\tAN=2;DP=144;MQ=60.00;MQ0=0\tGT:DP\t0/0:144\n+gi|15829254|ref|NC_002695.1|\t49947\t.\tC\t.\t431.23\t.\tAN=2;DP=145;MQ=60.00;MQ0=0\tGT:DP\t0/0:145\n+gi|15829254|ref|NC_002695.1|\t49948\t.\tA\t.\t404.23\t.\tAN=2;DP=137;MQ=60.00;MQ0=0\tGT:DP\t0/0:137\n+gi|15829254|ref|NC_002695.1|\t49949\t.\tT\t.\t425.23\t.\tAN=2;DP=144;MQ=60.00;MQ0=0\tGT:DP\t0/0:143\n+gi|15829254|ref|NC_002695.1|\t49950\t.\tT\t.\t419.23\t.\tAN=2;DP=144;MQ=60.00;MQ0=0\tGT:DP\t0/0:144\n+gi|15829254|ref|NC_002695.1|\t49951\t.\tG\t.\t431.23\t.\tAN=2;DP=146;MQ=60.00;MQ0=0\tGT:DP\t0/0:146\n+gi|15829254|ref|NC_002695.1|\t49952\t.\tT\t.\t437.23\t.\tAN=2;DP=146;MQ=60.00;MQ0=0\tGT:DP\t0/0:146\n+gi|15829254|ref|NC_002695.1|\t49953\t.\tT\t.\t428.23\t.\tAN=2;DP=144;MQ=60.00;MQ0=0\tGT:DP\t0/0:144\n+gi|15829254|ref|NC_002695.1|\t49954\t.\tG\t.\t434.23\t.\tAN=2;DP=146;MQ=60.00;MQ0=0\tGT:DP\t0/0:146\n+gi|15829254|ref|NC_002695.1|\t49955\t.\tT\t.\t428.23\t.\tAN=2;DP=146;MQ=60.00;MQ0=0\tGT:DP\t0/0:146\n+gi|15829254|ref|NC_002695.1|\t49956\t.\tG\t.\t434.23\t.\tAN=2;DP=146;MQ=60.00;MQ0=0\tGT:DP\t0/0:146\n+gi|15829254|ref|NC_002695.1|\t49957\t.\tA\t.\t428.23\t.\tAN=2;DP=144;MQ=60.00;MQ0=0\tGT:DP\t0/0:144\n+gi|15829254|ref|NC_002695.1|\t49958\t.\tA\t.\t416.23\t.\tAN=2;DP=140;MQ=60.00;MQ0=0\tGT:DP\t0/0:140\n+gi|15829254|ref|NC_002695.1|\t49959\t.\tA\t.\t407.23\t.\tAN=2;DP=138;MQ=60.00;MQ0=0\tGT:DP\t0/0:138\n+gi|15829254|ref|NC_002695.1|\t49960\t.\tG\t.\t395.23\t.\tAN=2;DP=136;MQ=60.00;MQ0=0\tGT:DP\t0/0:136\n+gi|15829254|ref|NC_002695.1|\t49961\t.\tC\t.\t404.23\t.\tAN=2;DP=136;MQ=60.00;MQ0=0\tGT:DP\t0/0:136\n+gi|15829254|ref|NC_002695.1|\t49962\t.\tC\t.\t392.23\t.\tAN=2;DP=134;MQ=60.00;MQ0=0\tGT:DP\t0/0:134\n+gi|15829254|ref|NC_002695.1|\t49963\t.\tG\t.\t398.23\t.\tAN=2;DP=134;MQ=60.00;MQ0=0\tGT:DP\t0/0:134\n+gi|15829254|ref|NC_002695.1|\t49964\t.\tA\t.\t398.23\t.\tAN=2;DP=136;MQ=60.00;MQ0=0\tGT:DP\t0/0:136\n'
b
diff -r 2e69ce9dca65 -r f72039c5faa4 test-data/2_short.vcf
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/2_short.vcf Wed Dec 16 07:29:05 2015 -0500
[
b'@@ -0,0 +1,50000 @@\n+##fileformat=VCFv4.1\n+##FILTER=<ID=LowQual,Description="Low quality">\n+##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">\n+##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">\n+##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">\n+##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n+##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">\n+##GATKCommandLine=<ID=SelectVariants,Version=2.6-5-gba531bd,Date="Thu May 22 09:15:54 BST 2014",Epoch=1400746554393,CommandLineOptions="analysis_type=SelectVariants input_file=[] read_buffer_size=null phone_home=AWS gatk_key=null tag=NA read_filter=[] intervals=null excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/phengs/galaxy/database/tmp/tmp-gatk-4mpB9s/gatk_input.fasta nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=1000 baq=OFF baqGapOpenPenalty=40.0 fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 allow_bqsr_on_reduced_bams_despite_repeated_warnings=false validation_strictness=SILENT remove_program_records=false keep_program_records=false unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false num_threads=2 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false logging_level=INFO log_to_file=null help=false version=false variant=(RodBinding name=variant source=/phengs/galaxy/database/tmp/tmp-gatk-4mpB9s/input_variant.vcf) discordance=(RodBinding name= source=UNBOUND) concordance=(RodBinding name= source=UNBOUND) out=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub no_cmdline_in_header=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub sites_only=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub bcf=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub sample_name=[] sample_expressions=null sample_file=null exclude_sample_name=[] exclude_sample_file=[] select_expressions=[] excludeNonVariants=false excludeFiltered=false restrictAllelesTo=ALL keepOriginalAC=false mendelianViolation=false mendelianViolationQualThreshold=0.0 select_random_fraction=0.0 remove_fraction_genotypes=0.0 selectTypeToInclude=[SNP] keepIDs=null fullyDecode=false forceGenotypesDecode=false justRead=false maxIndelSize=2147483647 ALLOW_NONOVERLAPPING_COMMAND_LINE_SAMPLES=false filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false">\n+##GATKCommandLine=<ID=UnifiedGenotyper,Version=2.6-5-gba531bd,Date="Thu May 22 09:06:54 BST 2014",Epoch=1400746014908,CommandLineOptions="analysis_type=UnifiedGenotyper input_file=[/phengs/galaxy/database/tmp/tmp-gatk-Gz3f1A/gatk_input_0.bam] read_buffer_size=null phone_home=AWS gatk_key=null tag=NA read_filter=[BadCigar] intervals=null excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/phengs/galaxy/database/tmp/tmp-gatk-Gz3f1A/gatk_input.fasta nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=NONE downsample_to_fraction=null downsample_to_coverage=null baq=OFF baqGapOpenPenalty=40.0 fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOrigina'..b'AN=2;DP=48;MQ=60.00;MQ0=0\tGT:DP\t0/0:48\n+gi|15829254|ref|NC_002695.1|\t49921\t.\tT\t.\t157.23\t.\tAN=2;DP=49;MQ=60.00;MQ0=0\tGT:DP\t0/0:49\n+gi|15829254|ref|NC_002695.1|\t49922\t.\tT\t.\t157.23\t.\tAN=2;DP=48;MQ=60.00;MQ0=0\tGT:DP\t0/0:48\n+gi|15829254|ref|NC_002695.1|\t49923\t.\tC\t.\t154.23\t.\tAN=2;DP=47;MQ=60.00;MQ0=0\tGT:DP\t0/0:47\n+gi|15829254|ref|NC_002695.1|\t49924\t.\tA\t.\t151.23\t.\tAN=2;DP=45;MQ=60.00;MQ0=0\tGT:DP\t0/0:45\n+gi|15829254|ref|NC_002695.1|\t49925\t.\tA\t.\t163.23\t.\tAN=2;DP=50;MQ=60.00;MQ0=0\tGT:DP\t0/0:50\n+gi|15829254|ref|NC_002695.1|\t49926\t.\tT\t.\t160.23\t.\tAN=2;DP=50;MQ=60.00;MQ0=0\tGT:DP\t0/0:50\n+gi|15829254|ref|NC_002695.1|\t49927\t.\tG\t.\t160.23\t.\tAN=2;DP=51;MQ=60.00;MQ0=0\tGT:DP\t0/0:51\n+gi|15829254|ref|NC_002695.1|\t49928\t.\tT\t.\t160.23\t.\tAN=2;DP=52;MQ=60.00;MQ0=0\tGT:DP\t0/0:51\n+gi|15829254|ref|NC_002695.1|\t49929\t.\tC\t.\t160.23\t.\tAN=2;DP=51;MQ=60.00;MQ0=0\tGT:DP\t0/0:51\n+gi|15829254|ref|NC_002695.1|\t49930\t.\tG\t.\t163.23\t.\tAN=2;DP=53;MQ=60.00;MQ0=0\tGT:DP\t0/0:53\n+gi|15829254|ref|NC_002695.1|\t49931\t.\tA\t.\t166.23\t.\tAN=2;DP=54;MQ=60.00;MQ0=0\tGT:DP\t0/0:54\n+gi|15829254|ref|NC_002695.1|\t49932\t.\tT\t.\t169.23\t.\tAN=2;DP=55;MQ=60.00;MQ0=0\tGT:DP\t0/0:55\n+gi|15829254|ref|NC_002695.1|\t49933\t.\tG\t.\t163.23\t.\tAN=2;DP=51;MQ=60.00;MQ0=0\tGT:DP\t0/0:50\n+gi|15829254|ref|NC_002695.1|\t49934\t.\tA\t.\t166.23\t.\tAN=2;DP=51;MQ=60.00;MQ0=0\tGT:DP\t0/0:51\n+gi|15829254|ref|NC_002695.1|\t49935\t.\tA\t.\t169.23\t.\tAN=2;DP=51;MQ=60.00;MQ0=0\tGT:DP\t0/0:51\n+gi|15829254|ref|NC_002695.1|\t49936\t.\tG\t.\t169.23\t.\tAN=2;DP=51;MQ=60.00;MQ0=0\tGT:DP\t0/0:51\n+gi|15829254|ref|NC_002695.1|\t49937\t.\tA\t.\t163.23\t.\tAN=2;DP=50;MQ=60.00;MQ0=0\tGT:DP\t0/0:50\n+gi|15829254|ref|NC_002695.1|\t49938\t.\tG\t.\t160.23\t.\tAN=2;DP=50;MQ=60.00;MQ0=0\tGT:DP\t0/0:50\n+gi|15829254|ref|NC_002695.1|\t49939\t.\tC\t.\t154.23\t.\tAN=2;DP=47;MQ=60.00;MQ0=0\tGT:DP\t0/0:47\n+gi|15829254|ref|NC_002695.1|\t49940\t.\tA\t.\t151.23\t.\tAN=2;DP=48;MQ=60.00;MQ0=0\tGT:DP\t0/0:48\n+gi|15829254|ref|NC_002695.1|\t49941\t.\tT\t.\t157.23\t.\tAN=2;DP=49;MQ=60.00;MQ0=0\tGT:DP\t0/0:49\n+gi|15829254|ref|NC_002695.1|\t49942\t.\tC\t.\t160.23\t.\tAN=2;DP=50;MQ=60.00;MQ0=0\tGT:DP\t0/0:50\n+gi|15829254|ref|NC_002695.1|\t49943\t.\tC\t.\t157.23\t.\tAN=2;DP=47;MQ=60.00;MQ0=0\tGT:DP\t0/0:47\n+gi|15829254|ref|NC_002695.1|\t49944\t.\tG\t.\t157.23\t.\tAN=2;DP=48;MQ=60.00;MQ0=0\tGT:DP\t0/0:48\n+gi|15829254|ref|NC_002695.1|\t49945\t.\tC\t.\t154.23\t.\tAN=2;DP=47;MQ=60.00;MQ0=0\tGT:DP\t0/0:47\n+gi|15829254|ref|NC_002695.1|\t49946\t.\tA\t.\t154.23\t.\tAN=2;DP=48;MQ=60.00;MQ0=0\tGT:DP\t0/0:48\n+gi|15829254|ref|NC_002695.1|\t49947\t.\tC\t.\t154.23\t.\tAN=2;DP=48;MQ=60.00;MQ0=0\tGT:DP\t0/0:48\n+gi|15829254|ref|NC_002695.1|\t49948\t.\tA\t.\t151.23\t.\tAN=2;DP=46;MQ=60.00;MQ0=0\tGT:DP\t0/0:46\n+gi|15829254|ref|NC_002695.1|\t49949\t.\tT\t.\t151.23\t.\tAN=2;DP=46;MQ=60.00;MQ0=0\tGT:DP\t0/0:46\n+gi|15829254|ref|NC_002695.1|\t49950\t.\tT\t.\t148.23\t.\tAN=2;DP=46;MQ=60.00;MQ0=0\tGT:DP\t0/0:46\n+gi|15829254|ref|NC_002695.1|\t49951\t.\tG\t.\t154.23\t.\tAN=2;DP=49;MQ=60.00;MQ0=0\tGT:DP\t0/0:49\n+gi|15829254|ref|NC_002695.1|\t49952\t.\tT\t.\t157.23\t.\tAN=2;DP=49;MQ=60.00;MQ0=0\tGT:DP\t0/0:49\n+gi|15829254|ref|NC_002695.1|\t49953\t.\tT\t.\t154.23\t.\tAN=2;DP=49;MQ=60.00;MQ0=0\tGT:DP\t0/0:49\n+gi|15829254|ref|NC_002695.1|\t49954\t.\tG\t.\t157.23\t.\tAN=2;DP=48;MQ=60.00;MQ0=0\tGT:DP\t0/0:48\n+gi|15829254|ref|NC_002695.1|\t49955\t.\tT\t.\t151.23\t.\tAN=2;DP=48;MQ=60.00;MQ0=0\tGT:DP\t0/0:48\n+gi|15829254|ref|NC_002695.1|\t49956\t.\tG\t.\t154.23\t.\tAN=2;DP=47;MQ=60.00;MQ0=0\tGT:DP\t0/0:47\n+gi|15829254|ref|NC_002695.1|\t49957\t.\tA\t.\t151.23\t.\tAN=2;DP=47;MQ=60.00;MQ0=0\tGT:DP\t0/0:47\n+gi|15829254|ref|NC_002695.1|\t49958\t.\tA\t.\t148.23\t.\tAN=2;DP=47;MQ=60.00;MQ0=0\tGT:DP\t0/0:47\n+gi|15829254|ref|NC_002695.1|\t49959\t.\tA\t.\t148.23\t.\tAN=2;DP=47;MQ=60.00;MQ0=0\tGT:DP\t0/0:47\n+gi|15829254|ref|NC_002695.1|\t49960\t.\tG\t.\t145.23\t.\tAN=2;DP=46;MQ=60.00;MQ0=0\tGT:DP\t0/0:46\n+gi|15829254|ref|NC_002695.1|\t49961\t.\tC\t.\t145.23\t.\tAN=2;DP=46;MQ=60.00;MQ0=0\tGT:DP\t0/0:46\n+gi|15829254|ref|NC_002695.1|\t49962\t.\tC\t.\t148.23\t.\tAN=2;DP=46;MQ=60.00;MQ0=0\tGT:DP\t0/0:46\n+gi|15829254|ref|NC_002695.1|\t49963\t.\tG\t.\t145.23\t.\tAN=2;DP=46;MQ=60.00;MQ0=0\tGT:DP\t0/0:46\n+gi|15829254|ref|NC_002695.1|\t49964\t.\tA\t.\t145.23\t.\tAN=2;DP=46;MQ=60.00;MQ0=0\tGT:DP\t0/0:46\n'
b
diff -r 2e69ce9dca65 -r f72039c5faa4 test-data/testresult.fa
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/testresult.fa Wed Dec 16 07:29:05 2015 -0500
b
@@ -0,0 +1,6 @@
+>1_short
+TTCTTCATGA
+>2_short
+TTTTACACGA
+>reference
+CCCATAGCAG
b
diff -r 2e69ce9dca65 -r f72039c5faa4 tool_dependencies.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tool_dependencies.xml Wed Dec 16 07:29:05 2015 -0500
b
@@ -0,0 +1,21 @@
+<?xml version="1.0"?>
+<tool_dependency>
+ <package name="python" version="2.7.10">
+        <repository changeset_revision="0339c4a9b87b" name="package_python_2_7_10" owner="iuc" prior_installation_required="True" toolshed="https://toolshed.g2.bx.psu.edu" />
+    </package>
+    <package name="pyvcf" version="0.6.7">
+        <repository changeset_revision="f6d05f9a22bf" name="package_python_2_7_pyvcf_0_6_7" owner="ulfschaefer" prior_installation_required="True" toolshed="https://toolshed.g2.bx.psu.edu"/>
+    </package>
+    <package name="pyyaml" version="3.11">
+        <repository changeset_revision="99267d131c05" name="package_python_2_7_pyyaml_3_11" owner="iuc" prior_installation_required="True" toolshed="https://toolshed.g2.bx.psu.edu"/>
+    </package>
+ <package name="bintrees" version="2.0.2">
+        <repository changeset_revision="1d16207341c7" name="package_python_2_7_bintrees_2_0_2" owner="ulfschaefer" prior_installation_required="True" toolshed="https://toolshed.g2.bx.psu.edu"/>
+    </package>
+ <package name="biopython" version="1.66">
+        <repository changeset_revision="5d5355863287" name="package_python_2_7_biopython_1_66" owner="ulfschaefer" prior_installation_required="True" toolshed="https://toolshed.g2.bx.psu.edu"/>
+    </package>
+ <package name="matplotlib" version="1.4">
+        <repository changeset_revision="f7424e1cf115" name="package_python_2_7_matplotlib_1_4" owner="iuc" prior_installation_required="True" toolshed="https://toolshed.g2.bx.psu.edu"/>
+    </package>
+</tool_dependency>
b
diff -r 2e69ce9dca65 -r f72039c5faa4 vcfs2fasta.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/vcfs2fasta.py Wed Dec 16 07:29:05 2015 -0500
[
b'@@ -0,0 +1,415 @@\n+#!/usr/bin/env python\n+\'\'\'\n+Merge SNP data from multiple VCF files into a single fasta file.\n+\n+Created on 5 Oct 2015\n+\n+@author: alex\n+\'\'\'\n+import argparse\n+from collections import OrderedDict\n+import glob\n+import itertools\n+import logging\n+import os\n+\n+from Bio import SeqIO\n+from bintrees import FastRBTree\n+\n+# Try importing the matplotlib and numpy for stats.\n+try:\n+    from matplotlib import pyplot as plt\n+    import numpy\n+    can_stats = True\n+except ImportError:\n+    can_stats = False\n+\n+import vcf\n+\n+from phe.variant_filters import IUPAC_CODES\n+\n+\n+def plot_stats(pos_stats, total_samples, plots_dir="plots", discarded={}):\n+    if not os.path.exists(plots_dir):\n+        os.makedirs(plots_dir)\n+\n+    for contig in pos_stats:\n+\n+        plt.style.use(\'ggplot\')\n+\n+        x = numpy.array([pos for pos in pos_stats[contig] if pos not in discarded.get(contig, [])])\n+        y = numpy.array([ float(pos_stats[contig][pos]["mut"]) / total_samples for pos in pos_stats[contig] if pos not in discarded.get(contig, []) ])\n+\n+        f, (ax1, ax2, ax3, ax4) = plt.subplots(4, sharex=True, sharey=True)\n+        f.set_size_inches(12, 15)\n+        ax1.plot(x, y, \'ro\')\n+        ax1.set_title("Fraction of samples with SNPs")\n+        plt.ylim(0, 1.1)\n+\n+        y = numpy.array([ float(pos_stats[contig][pos]["N"]) / total_samples for pos in pos_stats[contig] if pos not in discarded.get(contig, [])])\n+        ax2.plot(x, y, \'bo\')\n+        ax2.set_title("Fraction of samples with Ns")\n+\n+        y = numpy.array([ float(pos_stats[contig][pos]["mix"]) / total_samples for pos in pos_stats[contig] if pos not in discarded.get(contig, [])])\n+        ax3.plot(x, y, \'go\')\n+        ax3.set_title("Fraction of samples with mixed bases")\n+\n+        y = numpy.array([ float(pos_stats[contig][pos]["gap"]) / total_samples for pos in pos_stats[contig] if pos not in discarded.get(contig, [])])\n+        ax4.plot(x, y, \'yo\')\n+        ax4.set_title("Fraction of samples with uncallable genotype (gap)")\n+\n+        plt.savefig(os.path.join(plots_dir, "%s.png" % contig), dpi=100)\n+\n+def get_mixture(record, threshold):\n+    mixtures = {}\n+    try:\n+        if len(record.samples[0].data.AD) > 1:\n+\n+            total_depth = sum(record.samples[0].data.AD)\n+            # Go over all combinations of touples.\n+            for comb in itertools.combinations(range(0, len(record.samples[0].data.AD)), 2):\n+                i = comb[0]\n+                j = comb[1]\n+\n+                alleles = list()\n+\n+                if 0 in comb:\n+                    alleles.append(str(record.REF))\n+\n+                if i != 0:\n+                    alleles.append(str(record.ALT[i - 1]))\n+                    mixture = record.samples[0].data.AD[i]\n+                if j != 0:\n+                    alleles.append(str(record.ALT[j - 1]))\n+                    mixture = record.samples[0].data.AD[j]\n+\n+                ratio = float(mixture) / total_depth\n+                if ratio == 1.0:\n+                    logging.debug("This is only designed for mixtures! %s %s %s %s", record, ratio, record.samples[0].data.AD, record.FILTER)\n+\n+                    if ratio not in mixtures:\n+                        mixtures[ratio] = []\n+                    mixtures[ratio].append(alleles.pop())\n+\n+                elif ratio >= threshold:\n+                    try:\n+                        code = IUPAC_CODES[frozenset(alleles)]\n+                        if ratio not in mixtures:\n+                            mixtures[ratio] = []\n+                            mixtures[ratio].append(code)\n+                    except KeyError:\n+                        logging.warn("Could not retrieve IUPAC code for %s from %s", alleles, record)\n+    except AttributeError:\n+        mixtures = {}\n+\n+    return mixtures\n+\n+def print_stats(stats, pos_stats, total_vars):\n+    for contig in stats:\n+        for sample, info in stats[contig].items():\n+            print "%s,%i,%i" % (sample, len(info.get("n_pos", [])), total_vars)\n+\n'..b'n_ratio = float(len(sample_stats[contig][sample]["n_pos"])) / len(avail_pos[contig])\n+                if sample_n_ratio > args.sample_Ns:\n+                    for pos in sample_stats[contig][sample]["n_pos"]:\n+                        pos_stats[contig][pos]["N"] -= 1\n+\n+                    logging.info("Removing %s due to high Ns in sample: %s", sample , sample_n_ratio)\n+\n+                    delete_samples.append(sample)\n+\n+        samples = [sample for sample in samples if sample not in delete_samples]\n+    snp_positions = []\n+    with open(args.out, "w") as fp:\n+\n+        for sample in samples:\n+            sample_seq = ""\n+            for contig in contigs:\n+                if contig in avail_pos:\n+                    if args.reference:\n+                        positions = xrange(1, len(args.reference[contig]) + 1)\n+                    else:\n+                        positions = avail_pos[contig].keys()\n+                    for pos in positions:\n+                        if pos in avail_pos[contig]:\n+                            if not args.column_Ns or float(pos_stats[contig][pos]["N"]) / len(samples) < args.column_Ns and \\\n+                                float(pos_stats[contig][pos]["-"]) / len(samples) < args.column_Ns:\n+                                sample_seq += all_data[contig][sample][pos]\n+                            else:\n+                                if contig not in discarded:\n+                                    discarded[contig] = []\n+                                discarded[contig].append(pos)\n+                        elif args.reference:\n+                            sample_seq += args.reference[contig][pos - 1]\n+                elif args.reference:\n+                    sample_seq += args.reference[contig]\n+\n+            fp.write(">%s\\n%s\\n" % (sample, sample_seq))\n+        # Do the same for reference data.\n+        ref_snps = ""\n+\n+        for contig in contigs:\n+            if contig in avail_pos:\n+                if args.reference:\n+                    positions = xrange(1, len(args.reference[contig]) + 1)\n+                else:\n+                    positions = avail_pos[contig].keys()\n+                for pos in positions:\n+                    if pos in avail_pos[contig]:\n+                        if not args.column_Ns or float(pos_stats[contig][pos]["N"]) / len(samples) < args.column_Ns and \\\n+                                float(pos_stats[contig][pos]["-"]) / len(samples) < args.column_Ns:\n+\n+                            ref_snps += str(avail_pos[contig][pos])\n+                            snp_positions.append((contig, pos,))\n+                    elif args.reference:\n+                        ref_snps += args.reference[contig][pos - 1]\n+            elif args.reference:\n+                    ref_snps += args.reference[contig]\n+\n+        fp.write(">reference\\n%s\\n" % ref_snps)\n+\n+    if can_stats and args.with_stats:\n+        with open(args.with_stats, "wb") as fp:\n+            fp.write("contig\\tposition\\tmutations\\tn_frac\\n")\n+            for values in snp_positions:\n+                fp.write("%s\\t%s\\t%s\\t%s\\n" % (values[0],\n+                                             values[1],\n+                                             float(pos_stats[values[0]][values[1]]["mut"]) / len(args.input),\n+                                             float(pos_stats[values[0]][values[1]]["N"]) / len(args.input)))\n+        plot_stats(pos_stats, len(samples), discarded=discarded, plots_dir=os.path.abspath(args.plots_dir))\n+    # print_stats(sample_stats, pos_stats, total_vars=len(avail_pos[contig]))\n+\n+    total_discarded = 0\n+    for _, i in discarded.items():\n+        total_discarded += len(i)\n+    logging.info("Discarded total of %i poor quality columns", float(total_discarded) / len(args.input))\n+    return 0\n+\n+if __name__ == \'__main__\':\n+    import time\n+\n+#     with PyCallGraph(output=graphviz):\n+#     T0 = time.time()\n+    r = main()\n+#     T1 = time.time()\n+\n+#     print "Time taken: %i" % (T1 - T0)\n+    exit(r)\n'
b
diff -r 2e69ce9dca65 -r f72039c5faa4 vcfs2fasta.sh
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/vcfs2fasta.sh Wed Dec 16 07:29:05 2015 -0500
[
@@ -0,0 +1,51 @@
+#!/bin/bash
+
+echo $@
+
+OUTPUT=$1
+shift
+WITHMIXTURES=$1
+shift
+COLUMNNS=$1
+shift
+SAMPLENS=$1
+shift
+REFERENCE=$1
+shift
+INCLUDE=$1
+shift
+EXCLUDE=$1
+shift
+INPUT=$@
+
+DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
+export PATH=$PATH:$DIR
+
+CMD="vcfs2fasta.py --out $OUTPUT --input $INPUT"
+
+if [ $WITHMIXTURES != "NOTTHERE" ]; then
+    CMD="$CMD --with-mixtures $WITHMIXTURES"
+fi
+
+if [ $COLUMNNS != "NOTTHERE" ]; then
+    CMD="$CMD --column-Ns $COLUMNNS"
+fi
+
+if [ $SAMPLENS != "NOTTHERE" ]; then
+    CMD="$CMD --sample-Ns $SAMPLENS"
+fi
+
+if [ $REFERENCE != "NOTTHERE" ]; then
+    CMD="$CMD --reference $REFERENCE"
+fi
+
+if [ $INCLUDE != "NOTTHERE" ]; then
+    CMD="$CMD --include INCLUDE"
+fi
+
+if [ $EXCLUDE != "NOTTHERE" ]; then
+    CMD="$CMD --exclude EXCLUDE"
+fi
+
+echo $CMD
+eval $CMD
b
diff -r 2e69ce9dca65 -r f72039c5faa4 vcfs2fasta.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/vcfs2fasta.xml Wed Dec 16 07:29:05 2015 -0500
b
@@ -0,0 +1,123 @@
+<tool id="vcfs2fasta" name="VCFs to fasta" version="1.0">
+  <description>Takes a set of VCF files and outputs a multi fasta file with only the variant positions.</description>
+  <requirements>
+    <requirement type="package" version="2.7.10">python</requirement>
+    <requirement type="package" version="0.6.7">pyvcf</requirement>
+    <requirement type="package" version="3.11">pyyaml</requirement>
+    <requirement type="package" version="2.0.2">bintrees</requirement>
+ <requirement type="package" version="1.66">biopython</requirement>
+ <requirement type="package" version="1.4">matplotlib</requirement>
+  </requirements>
+  <stdio>
+ <!-- Assume anything other than zero is an error -->
+    <exit_code range="1:" />
+    <exit_code range=":-1" />
+  </stdio>
+  <command interpreter="bash">
+    vcfs2fasta.sh
+ $output
+ #if str($mix_cond.mix) == "yes":
+        $mix_cond.mix_value
+    #else
+        NOTTHERE
+    #end if
+ #if str($cols_cond.cols) == "yes":
+        $cols_cond.column_ns
+    #else
+        NOTTHERE
+    #end if
+ #if str($sample_cond.sample) == "yes":
+        $sample_cond.sample_ns
+    #else
+        NOTTHERE
+    #end if
+ #if str($reference_cond.reference) == "yes":
+ $reference_cond.ref_fa
+ #else
+        NOTTHERE
+    #end if
+ #if str($include_cond.include) == "yes":
+ $include_cond.in_bed
+ #else
+        NOTTHERE
+    #end if
+ #if str($exclude_cond.exclude) == "yes":
+ $exclude_cond.ex_bed
+ #else
+        NOTTHERE
+    #end if
+ #for $i, $input_vcf in enumerate( $input_vcfs ):
+        "${input_vcf}"
+    #end for
+  </command>
+
+  <inputs>
+ <param name="input_vcfs" type="data" multiple="true" format="vcf" label="Input VCF file(s)" />
+ <conditional name="mix_cond">
+        <param name="mix" type="select" label="With Mixtures" help="Specify this option with a threshold to output mixtures above this threshold.">
+          <option value="yes">Specify</option>
+          <option value="no" selected="true">Do not specify</option>
+        </param>
+        <when value="yes">
+ <param name="mix_value" type="float" value="0.5" label="Mixture value" />
+        </when>
+    </conditional>
+ <conditional name="cols_cond">
+        <param name="cols" type="select" label="Column Ns" help="Keeps columns with fraction of Ns above specified threshold.">
+          <option value="yes">Specify</option>
+          <option value="no" selected="true">Do not specify</option>
+        </param>
+        <when value="yes">
+ <param name="column_ns" type="float" value="0.5" label="Column Ns value" />
+        </when>
+    </conditional>
+ <conditional name="sample_cond">
+        <param name="sample" type="select" label="Sample Ns" help="Keeps samples with fraction of Ns above specified threshold.">
+          <option value="yes">Specify</option>
+          <option value="no" selected="true">Do not specify</option>
+        </param>
+        <when value="yes">
+ <param name="sample_ns" type="float" value="0.5" label="Sample Ns value" />
+        </when>
+    </conditional>
+ <conditional name="reference_cond">
+        <param name="reference" type="select" label="Reference genome file" help="If path to reference specified, then whole genome will be outputted.">
+          <option value="yes">Specify</option>
+          <option value="no" selected="true">Do not specify</option>
+        </param>
+        <when value="yes">
+ <param name="ref_fa" type="data" format="fasta" label="Reference fasta file" help="Fasta format"/>
+        </when>
+    </conditional>
+ <conditional name="include_cond">
+        <param name="include" type="select" label="Include region" help="Specify regions to include in a bed file.">
+          <option value="yes">Specify</option>
+          <option value="no" selected="true">Do not specify</option>
+        </param>
+        <when value="yes">
+ <param name="in_bed" type="data" format="bed" label="Include regions bed file" help="bed format"/>
+        </when>
+    </conditional>
+ <conditional name="exclude_cond">
+        <param name="exclude" type="select" label="Exclude region" help="Specify regions to exclude in a bed file.">
+          <option value="yes">Specify</option>
+          <option value="no" selected="true">Do not specify</option>
+        </param>
+        <when value="yes">
+ <param name="ex_bed" type="data" format="bed" label="Exclude regions bed file" help="bed format"/>
+        </when>
+    </conditional>
+  </inputs>
+
+  <outputs>
+    <data format="fasta" name="output" label="${tool.name} on ${on_string}: FASTA file" />
+  </outputs>
+  <test>
+ <param name="input_vcfs" value="1_short.vcf" ftype="vcf" />
+ <param name="input_vcfs" value="2_short.vcf" ftype="vcf" />
+ <output name="output" file="testresult.fa" ftype="fasta" />
+  </test>
+  <help>
+
+  </help>
+</tool>
b
diff -r 2e69ce9dca65 -r f72039c5faa4 vcfs2fasta/phe/metadata/__init__.py
--- a/vcfs2fasta/phe/metadata/__init__.py Wed Dec 16 07:26:29 2015 -0500
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
b
@@ -1,16 +0,0 @@
-"""Metadata related information."""
-
-import abc
-
-class PHEMetaData(object):
-    """Abstract class to provide interface for meta-data creation."""
-
-    __metaclass__ = abc.ABCMeta
-
-    def __init__(self):
-        pass
-
-    @abc.abstractmethod
-    def get_meta(self):
-        """Get the metadata."""
-        raise NotImplementedError("get meta has not been implemented yet.")
b
diff -r 2e69ce9dca65 -r f72039c5faa4 vcfs2fasta/phe/variant/GATKVariantCaller.py
--- a/vcfs2fasta/phe/variant/GATKVariantCaller.py Wed Dec 16 07:26:29 2015 -0500
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
[
@@ -1,126 +0,0 @@
-'''
-Created on 22 Sep 2015
-
-@author: alex
-'''
-from collections import OrderedDict
-import logging
-import os
-import subprocess
-
-from phe.variant import VariantCaller
-
-
-class GATKVariantCaller(VariantCaller):
-    """Implemetation of the Broad institute's variant caller."""
-
-    name = "gatk"
-    """Plain text name of the variant caller."""
-
-    _default_options = "--sample_ploidy 2 --genotype_likelihoods_model BOTH -rf BadCigar -out_mode EMIT_ALL_SITES -nt 1"
-    """Default options for the variant caller."""
-
-    def __init__(self, cmd_options=None):
-        """Constructor"""
-        if cmd_options is None:
-            cmd_options = self._default_options
-
-        super(GATKVariantCaller, self).__init__(cmd_options=cmd_options)
-
-        self.last_command = None
-
-    def get_info(self, plain=False):
-        d = {"name": "gatk", "version": self.get_version(), "command": self.last_command}
-
-        if plain:
-            result = "GATK(%(version)s): %(command)s" % d
-        else:
-            result = OrderedDict(d)
-
-        return result
-
-    def get_version(self):
-
-        p = subprocess.Popen(["java", "-jar", os.environ["GATK_JAR"], "-version"], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
-        (output, _) = p.communicate()
-
-        # last character is EOL.
-        version = output.split("\n")[-2]
-
-        return version
-
-    def make_vcf(self, *args, **kwargs):
-        ref = kwargs.get("ref")
-        bam = kwargs.get("bam")
-
-        if kwargs.get("vcf_file") is None:
-            kwargs["vcf_file"] = "variants.vcf"
-
-        opts = {"ref": os.path.abspath(ref),
-                "bam": os.path.abspath(bam),
-                "gatk_jar": os.environ["GATK_JAR"],
-                "all_variants_file": os.path.abspath(kwargs.get("vcf_file")),
-                "extra_cmd_options": self.cmd_options}
-
-#         if not self.create_aux_files(ref):
-#             logging.warn("Auxiliary files were not created.")
-#             return False
-
-        # Call variants
-        # FIXME: Sample ploidy = 2?
-        os.environ["GATK_JAR"]
-        cmd = "java -XX:+UseSerialGC -jar %(gatk_jar)s -T UnifiedGenotyper -R %(ref)s -I %(bam)s -o %(all_variants_file)s %(extra_cmd_options)s" % opts
-        success = os.system(cmd)
-
-        if success != 0:
-            logging.warn("Calling variants returned non-zero exit status.")
-            return False
-
-        self.last_command = cmd
-
-        return True
-
-    def create_aux_files(self, ref):
-        """Create auxiliary files needed for this variant.
-
-        Tools needed: samtools and picard tools. Picard tools is a Java
-        library that can be defined using environment variable: PICARD_JAR
-        specifying path to picard.jar or PICARD_TOOLS_PATH specifying path
-        to the directory where separate jars are (older version before jars
-        were merged into a single picard.jar).
-        Parameters:
-        -----------
-        ref: str
-            Path to the reference file.
-        
-        Returns:
-        --------
-        bool:
-            True if auxiliary files were created, False otherwise.
-        """
-
-        ref_name, _ = os.path.splitext(ref)
-
-        success = os.system("samtools faidx %s" % ref)
-
-        if success != 0:
-            logging.warn("Fasta index could not be created.")
-            return False
-
-        d = {"ref": ref, "ref_name": ref_name}
-
-        if os.environ.get("PICARD_TOOLS_PATH"):
-            d["picard_tools_path"] = os.path.join(os.environ["PICARD_TOOLS_PATH"], "CreateSequenceDictionary.jar")
-        elif os.environ.get("PICARD_JAR"):
-            # This is used in newer version of PICARD tool where multiple
-            #    jars were merged into a single jar file.
-            d["picard_tools_path"] = "%s %s" % (os.environ["PICARD_JAR"], "CreateSequenceDictionary")
-        else:
-            logging.error("Picard tools are not present in the path.")
-            return False
-
-        success = os.system("java -jar %(picard_tools_path)s R=%(ref)s O=%(ref_name)s.dict" % d)
-
-        if success != 0:
-            logging.warn("Dictionary for the %s reference could not be created", ref)
-            return False
b
diff -r 2e69ce9dca65 -r f72039c5faa4 vcfs2fasta/phe/variant_filters/__init__.py
--- a/vcfs2fasta/phe/variant_filters/__init__.py Wed Dec 16 07:26:29 2015 -0500
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
[
@@ -1,209 +0,0 @@
-"""Classes and functions for working with variant filters."""
-
-from __builtin__ import __import__
-from abc import abstractproperty
-import abc
-import argparse
-import glob
-import inspect
-import logging
-import os
-import re
-import sys
-
-import vcf
-import vcf.filters
-from vcf.parser import _Filter
-
-IUPAC_CODES = {frozenset(["A", "G"]): "R",
-                frozenset(["C", "T"]): "Y",
-                frozenset(["G", "C"]): "S",
-                frozenset(["A", "T"]): "W",
-                frozenset(["G", "T"]): "K",
-                frozenset(["A", "C"]): "M",
-                frozenset(["C", "G", "T"]): "B",
-                frozenset(["A", "G", "T"]): "D",
-                frozenset(["A", "C", "T"]): "H",
-                frozenset(["A", "C", "G"]): "V"
-              }
-
-class PHEFilterBase(vcf.filters.Base):
-    """Base class for VCF filters."""
-    __meta__ = abc.ABCMeta
-
-    magic_sep = ":"
-    decoder_pattern = re.compile(magic_sep)
-
-    @abc.abstractproperty
-    def parameter(self):
-        """Short name of parameter being filtered."""
-        return self.parameter
-
-    @abc.abstractproperty
-    def _default_threshold(self):
-        """Default threshold for filtering."""
-        return self._default_threshold
-
-    def __init__(self, args):
-        super(PHEFilterBase, self).__init__(args)
-
-        # Change the threshold to custom gq value.
-        self.threshold = self._default_threshold
-
-        if isinstance(args, dict):
-            self.threshold = args.get(self.parameter)
-
-    def __str__(self):
-        return self.filter_name()
-
-    @abc.abstractmethod
-    def short_desc(self):
-        """Short description of the filter (included in VCF)."""
-        raise NotImplementedError("Get short description is not implemented.")
-
-    def get_config(self):
-        """This is used for reconstructing filter."""
-        return {self.parameter: self.threshold}
-
-    def filter_name(self):
-        """Create filter names by their parameter separated by magic.
-        E.g. if filter parameter is ad_ratio and threshold is 0.9 then
-        ad_ratio:0.9 if the filter name.
-        """
-        return "%s%s%s" % (self.parameter, self.magic_sep, self.threshold)
-
-    @staticmethod
-    def decode(filter_id):
-        """Decode name of filter."""
-        conf = {}
-
-        if PHEFilterBase.magic_sep in filter_id:
-            info = PHEFilterBase.decoder_pattern.split(filter_id)
-            assert len(info) == 2
-            conf[info[0]] = info[1]
-        return conf
-
-    def is_gap(self):
-        return False
-
-    def is_n(self):
-        return True
-
-    @staticmethod
-    def call_concensus(record):
-        extended_code = "N"
-        try:
-            sample_ad = set([str(c) for c in record.ALT] + [record.REF])
-
-
-            for code, cov in IUPAC_CODES.items():
-                if sample_ad == cov:
-                    extended_code = code
-                    break
-        except AttributeError:
-            extended_code = "N"
-
-        return extended_code
-
-def dynamic_filter_loader():
-    """Fancy way of dynamically importing existing filters.
-    
-    Returns
-    -------
-    dict:
-        Available filters dictionary. Keys are parameters that
-        can be supplied to the filters.
-    """
-
-    # We assume the filters are in the same directory as THIS file.
-    filter_dir = os.path.dirname(__file__)
-    filter_dir = os.path.abspath(filter_dir)
-
-    # This is populated when the module is first imported.
-    avail_filters = {}
-
-    # Add this directory to the syspath.
-    sys.path.insert(0, filter_dir)
-
-    # Find all "py" files.
-    for filter_mod in glob.glob(os.path.join(filter_dir, "*.py")):
-
-        # Derive name of the module where filter is.
-        filter_mod_file = os.path.basename(filter_mod)
-
-        # Ignore this file, obviously.
-        if filter_mod_file.startswith("__init__"):
-            continue
-
-        # Import the module with a filter.
-        mod = __import__(filter_mod_file.replace(".pyc", "").replace(".py", ""))
-
-        # Find all the classes contained in this module.
-        classes = inspect.getmembers(mod, inspect.isclass)
-        for cls_name, cls in classes:
-            # For each class, if it is a sublass of PHEFilterBase, add it.
-            if cls_name != "PHEFilterBase" and issubclass(cls, PHEFilterBase):
-                # The parameters are inherited and defined within each filter.
-                avail_filters[cls.parameter] = cls
-
-    sys.path.remove(filter_dir)
-
-    return avail_filters
-
-_avail_filters = dynamic_filter_loader()
-
-def available_filters():
-    """Return list of available filters."""
-    return _avail_filters.keys()
-
-def str_to_filters(filters):
-    """Convert from filter string to array of filters.
-    E.g. ad_ration:0.9,min_depth:5
-    
-    Parameters:
-    -----------
-    filters: str
-        String version of filters, separated by comma.
-    
-    Returns:
-    --------
-    list:
-        List of :py:class:`phe.variant_filters.PHEFilterBase` instances.
-    """
-
-    config = {}
-    for kv_pair in filters.split(","):
-        pair = kv_pair.split(":")
-        assert len(pair) == 2, "Filters should be separated by ':' %s" % kv_pair
-
-        # We don't care about casting them to correct type because Filters
-        #    will do it for us.
-        config[pair[0]] = pair[1]
-
-    return make_filters(config)
-
-def make_filters(config):
-    """Create a list of filters from *config*.
-    
-    Parameters:
-    -----------
-    config: dict, optional
-        Dictionary with parameter: value pairs. For each parameter, an
-        appropriate Filter will be found and instanciated.
-        
-    Returns:
-    --------
-    list:
-        List of :py:class:`PHEFilterBase` filters.
-    """
-    filters = []
-
-    if config:
-        for custom_filter in config:
-            if custom_filter in _avail_filters:
-                filters.append(_avail_filters[custom_filter](config))
-            else:
-                logging.warn("Could not find appropriate filter for %s",
-                             custom_filter)
-
-    return filters
b
diff -r 2e69ce9dca65 -r f72039c5faa4 vcfs2fasta/test-data/1_short.vcf
--- a/vcfs2fasta/test-data/1_short.vcf Wed Dec 16 07:26:29 2015 -0500
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
[
b'@@ -1,50000 +0,0 @@\n-##fileformat=VCFv4.1\n-##FILTER=<ID=LowQual,Description="Low quality">\n-##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">\n-##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">\n-##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">\n-##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n-##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">\n-##GATKCommandLine=<ID=SelectVariants,Version=2.6-5-gba531bd,Date="Thu May 22 09:24:36 BST 2014",Epoch=1400747076888,CommandLineOptions="analysis_type=SelectVariants input_file=[] read_buffer_size=null phone_home=AWS gatk_key=null tag=NA read_filter=[] intervals=null excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/phengs/galaxy/database/tmp/tmp-gatk-lMh4vi/gatk_input.fasta nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=1000 baq=OFF baqGapOpenPenalty=40.0 fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 allow_bqsr_on_reduced_bams_despite_repeated_warnings=false validation_strictness=SILENT remove_program_records=false keep_program_records=false unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false num_threads=2 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false logging_level=INFO log_to_file=null help=false version=false variant=(RodBinding name=variant source=/phengs/galaxy/database/tmp/tmp-gatk-lMh4vi/input_variant.vcf) discordance=(RodBinding name= source=UNBOUND) concordance=(RodBinding name= source=UNBOUND) out=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub no_cmdline_in_header=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub sites_only=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub bcf=org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub sample_name=[] sample_expressions=null sample_file=null exclude_sample_name=[] exclude_sample_file=[] select_expressions=[] excludeNonVariants=false excludeFiltered=false restrictAllelesTo=ALL keepOriginalAC=false mendelianViolation=false mendelianViolationQualThreshold=0.0 select_random_fraction=0.0 remove_fraction_genotypes=0.0 selectTypeToInclude=[SNP] keepIDs=null fullyDecode=false forceGenotypesDecode=false justRead=false maxIndelSize=2147483647 ALLOW_NONOVERLAPPING_COMMAND_LINE_SAMPLES=false filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false">\n-##GATKCommandLine=<ID=UnifiedGenotyper,Version=2.6-5-gba531bd,Date="Thu May 22 09:10:24 BST 2014",Epoch=1400746224902,CommandLineOptions="analysis_type=UnifiedGenotyper input_file=[/phengs/galaxy/database/tmp/tmp-gatk-x7S0pF/gatk_input_0.bam] read_buffer_size=null phone_home=AWS gatk_key=null tag=NA read_filter=[BadCigar] intervals=null excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/phengs/galaxy/database/tmp/tmp-gatk-x7S0pF/gatk_input.fasta nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=NONE downsample_to_fraction=null downsample_to_coverage=null baq=OFF baqGapOpenPenalty=40.0 fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOrigina'..b'.\tAN=2;DP=146;MQ=60.00;MQ0=0\tGT:DP\t0/0:146\n-gi|15829254|ref|NC_002695.1|\t49922\t.\tT\t.\t434.23\t.\tAN=2;DP=144;MQ=60.00;MQ0=0\tGT:DP\t0/0:144\n-gi|15829254|ref|NC_002695.1|\t49923\t.\tC\t.\t434.23\t.\tAN=2;DP=143;MQ=60.00;MQ0=0\tGT:DP\t0/0:143\n-gi|15829254|ref|NC_002695.1|\t49924\t.\tA\t.\t428.23\t.\tAN=2;DP=141;MQ=60.00;MQ0=0\tGT:DP\t0/0:141\n-gi|15829254|ref|NC_002695.1|\t49925\t.\tA\t.\t440.23\t.\tAN=2;DP=149;MQ=60.00;MQ0=0\tGT:DP\t0/0:149\n-gi|15829254|ref|NC_002695.1|\t49926\t.\tT\t.\t440.23\t.\tAN=2;DP=152;MQ=60.00;MQ0=0\tGT:DP\t0/0:152\n-gi|15829254|ref|NC_002695.1|\t49927\t.\tG\t.\t446.23\t.\tAN=2;DP=155;MQ=60.00;MQ0=0\tGT:DP\t0/0:155\n-gi|15829254|ref|NC_002695.1|\t49928\t.\tT\t.\t452.23\t.\tAN=2;DP=157;MQ=60.00;MQ0=0\tGT:DP\t0/0:157\n-gi|15829254|ref|NC_002695.1|\t49929\t.\tC\t.\t443.23\t.\tAN=2;DP=153;MQ=60.00;MQ0=0\tGT:DP\t0/0:153\n-gi|15829254|ref|NC_002695.1|\t49930\t.\tG\t.\t449.23\t.\tAN=2;DP=155;MQ=60.00;MQ0=0\tGT:DP\t0/0:155\n-gi|15829254|ref|NC_002695.1|\t49931\t.\tA\t.\t449.23\t.\tAN=2;DP=156;MQ=60.00;MQ0=0\tGT:DP\t0/0:156\n-gi|15829254|ref|NC_002695.1|\t49932\t.\tT\t.\t461.23\t.\tAN=2;DP=159;MQ=60.00;MQ0=0\tGT:DP\t0/0:159\n-gi|15829254|ref|NC_002695.1|\t49933\t.\tG\t.\t449.23\t.\tAN=2;DP=156;MQ=60.00;MQ0=0\tGT:DP\t0/0:156\n-gi|15829254|ref|NC_002695.1|\t49934\t.\tA\t.\t455.23\t.\tAN=2;DP=156;MQ=60.00;MQ0=0\tGT:DP\t0/0:156\n-gi|15829254|ref|NC_002695.1|\t49935\t.\tA\t.\t452.23\t.\tAN=2;DP=156;MQ=60.00;MQ0=0\tGT:DP\t0/0:156\n-gi|15829254|ref|NC_002695.1|\t49936\t.\tG\t.\t455.23\t.\tAN=2;DP=158;MQ=60.00;MQ0=0\tGT:DP\t0/0:157\n-gi|15829254|ref|NC_002695.1|\t49937\t.\tA\t.\t440.23\t.\tAN=2;DP=155;MQ=60.00;MQ0=0\tGT:DP\t0/0:155\n-gi|15829254|ref|NC_002695.1|\t49938\t.\tG\t.\t449.23\t.\tAN=2;DP=157;MQ=60.00;MQ0=0\tGT:DP\t0/0:157\n-gi|15829254|ref|NC_002695.1|\t49939\t.\tC\t.\t443.23\t.\tAN=2;DP=155;MQ=60.00;MQ0=0\tGT:DP\t0/0:155\n-gi|15829254|ref|NC_002695.1|\t49940\t.\tA\t.\t440.23\t.\tAN=2;DP=154;MQ=60.00;MQ0=0\tGT:DP\t0/0:154\n-gi|15829254|ref|NC_002695.1|\t49941\t.\tT\t.\t449.23\t.\tAN=2;DP=155;MQ=60.00;MQ0=0\tGT:DP\t0/0:155\n-gi|15829254|ref|NC_002695.1|\t49942\t.\tC\t.\t446.23\t.\tAN=2;DP=155;MQ=60.00;MQ0=0\tGT:DP\t0/0:155\n-gi|15829254|ref|NC_002695.1|\t49943\t.\tC\t.\t428.23\t.\tAN=2;DP=144;MQ=60.00;MQ0=0\tGT:DP\t0/0:144\n-gi|15829254|ref|NC_002695.1|\t49944\t.\tG\t.\t428.23\t.\tAN=2;DP=144;MQ=60.00;MQ0=0\tGT:DP\t0/0:144\n-gi|15829254|ref|NC_002695.1|\t49945\t.\tC\t.\t431.23\t.\tAN=2;DP=144;MQ=60.00;MQ0=0\tGT:DP\t0/0:144\n-gi|15829254|ref|NC_002695.1|\t49946\t.\tA\t.\t425.23\t.\tAN=2;DP=144;MQ=60.00;MQ0=0\tGT:DP\t0/0:144\n-gi|15829254|ref|NC_002695.1|\t49947\t.\tC\t.\t431.23\t.\tAN=2;DP=145;MQ=60.00;MQ0=0\tGT:DP\t0/0:145\n-gi|15829254|ref|NC_002695.1|\t49948\t.\tA\t.\t404.23\t.\tAN=2;DP=137;MQ=60.00;MQ0=0\tGT:DP\t0/0:137\n-gi|15829254|ref|NC_002695.1|\t49949\t.\tT\t.\t425.23\t.\tAN=2;DP=144;MQ=60.00;MQ0=0\tGT:DP\t0/0:143\n-gi|15829254|ref|NC_002695.1|\t49950\t.\tT\t.\t419.23\t.\tAN=2;DP=144;MQ=60.00;MQ0=0\tGT:DP\t0/0:144\n-gi|15829254|ref|NC_002695.1|\t49951\t.\tG\t.\t431.23\t.\tAN=2;DP=146;MQ=60.00;MQ0=0\tGT:DP\t0/0:146\n-gi|15829254|ref|NC_002695.1|\t49952\t.\tT\t.\t437.23\t.\tAN=2;DP=146;MQ=60.00;MQ0=0\tGT:DP\t0/0:146\n-gi|15829254|ref|NC_002695.1|\t49953\t.\tT\t.\t428.23\t.\tAN=2;DP=144;MQ=60.00;MQ0=0\tGT:DP\t0/0:144\n-gi|15829254|ref|NC_002695.1|\t49954\t.\tG\t.\t434.23\t.\tAN=2;DP=146;MQ=60.00;MQ0=0\tGT:DP\t0/0:146\n-gi|15829254|ref|NC_002695.1|\t49955\t.\tT\t.\t428.23\t.\tAN=2;DP=146;MQ=60.00;MQ0=0\tGT:DP\t0/0:146\n-gi|15829254|ref|NC_002695.1|\t49956\t.\tG\t.\t434.23\t.\tAN=2;DP=146;MQ=60.00;MQ0=0\tGT:DP\t0/0:146\n-gi|15829254|ref|NC_002695.1|\t49957\t.\tA\t.\t428.23\t.\tAN=2;DP=144;MQ=60.00;MQ0=0\tGT:DP\t0/0:144\n-gi|15829254|ref|NC_002695.1|\t49958\t.\tA\t.\t416.23\t.\tAN=2;DP=140;MQ=60.00;MQ0=0\tGT:DP\t0/0:140\n-gi|15829254|ref|NC_002695.1|\t49959\t.\tA\t.\t407.23\t.\tAN=2;DP=138;MQ=60.00;MQ0=0\tGT:DP\t0/0:138\n-gi|15829254|ref|NC_002695.1|\t49960\t.\tG\t.\t395.23\t.\tAN=2;DP=136;MQ=60.00;MQ0=0\tGT:DP\t0/0:136\n-gi|15829254|ref|NC_002695.1|\t49961\t.\tC\t.\t404.23\t.\tAN=2;DP=136;MQ=60.00;MQ0=0\tGT:DP\t0/0:136\n-gi|15829254|ref|NC_002695.1|\t49962\t.\tC\t.\t392.23\t.\tAN=2;DP=134;MQ=60.00;MQ0=0\tGT:DP\t0/0:134\n-gi|15829254|ref|NC_002695.1|\t49963\t.\tG\t.\t398.23\t.\tAN=2;DP=134;MQ=60.00;MQ0=0\tGT:DP\t0/0:134\n-gi|15829254|ref|NC_002695.1|\t49964\t.\tA\t.\t398.23\t.\tAN=2;DP=136;MQ=60.00;MQ0=0\tGT:DP\t0/0:136\n'