Previous changeset: 15:fd42dbe82c12 (2015-12-16); next changeset: 17:a2ffbf5f50c1 (2015-12-18)
Commit message: Uploaded
added:
LICENSE phe/__init__.py phe/metadata/__init__.py phe/variant/GATKVariantCaller.py phe/variant/MPileupVariantCaller.py phe/variant/__init__.py phe/variant/variant_factory.py phe/variant_filters/__init__.py test-data/1_short.vcf test-data/2_short.vcf test-data/testresult.fa tool_dependencies.xml vcfs2fasta.py vcfs2fasta.sh vcfs2fasta.xml
diff -r fd42dbe82c12 -r 1d0bc21232ec LICENSE
--- /dev/null    Thu Jan 01 00:00:00 1970 +0000
+++ b/LICENSE    Wed Dec 16 07:32:22 2015 -0500
@@ -0,0 +1,676 @@
[676 lines added: the full text of the GNU General Public License, version 3 (29 June 2007), Copyright (C) 2007 Free Software Foundation, Inc. The licence body is truncated in this view.]
diff -r fd42dbe82c12 -r 1d0bc21232ec phe/__init__.py
--- /dev/null    Thu Jan 01 00:00:00 1970 +0000
+++ b/phe/__init__.py    Wed Dec 16 07:32:22 2015 -0500
@@ -0,0 +1,4 @@
+if __name__ == "phe":
+    # If this package is added as library, append extended path.
+    from pkgutil import extend_path
+    __path__ = extend_path(__path__, __name__)
\ No newline at end of file
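A brief illustration (not part of the changeset) of what the extend_path() call above buys: if two directories on sys.path each ship a phe/ package (for example this Galaxy tool and another copy of the phe library; the paths below are invented), phe.__path__ ends up covering both, so sub-packages such as phe.variant can live in either location.

    import sys
    from pkgutil import extend_path

    # Hypothetical locations that each contain a phe/ directory.
    sys.path[:0] = ["/opt/phe-lib", "/galaxy/tools/vcfs2fasta"]

    __path__ = ["/opt/phe-lib/phe"]           # what the interpreter found first
    __path__ = extend_path(__path__, "phe")   # now lists every phe/ found on sys.path
    print(__path__)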
diff -r fd42dbe82c12 -r 1d0bc21232ec phe/metadata/__init__.py
--- /dev/null    Thu Jan 01 00:00:00 1970 +0000
+++ b/phe/metadata/__init__.py    Wed Dec 16 07:32:22 2015 -0500
@@ -0,0 +1,16 @@
+"""Metadata related information."""
+
+import abc
+
+class PHEMetaData(object):
+    """Abstract class to provide interface for meta-data creation."""
+
+    __metaclass__ = abc.ABCMeta
+
+    def __init__(self):
+        pass
+
+    @abc.abstractmethod
+    def get_meta(self):
+        """Get the metadata."""
+        raise NotImplementedError("get meta has not been implemented yet.")
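A minimal sketch (not part of the changeset) of how a concrete class satisfies the PHEMetaData contract; the DummyMeta name and the returned values are invented for illustration only.

    from collections import OrderedDict

    from phe.metadata import PHEMetaData

    class DummyMeta(PHEMetaData):
        """Hypothetical implementation used only to show the interface."""

        def get_meta(self):
            # Any mapping describing the component will do; VariantCaller.get_meta()
            # later in this changeset returns an OrderedDict in the same spirit.
            return OrderedDict([("ID", "DummyMeta"), ("version", "0.1")])

    print(DummyMeta().get_meta())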
diff -r fd42dbe82c12 -r 1d0bc21232ec phe/variant/GATKVariantCaller.py
--- /dev/null    Thu Jan 01 00:00:00 1970 +0000
+++ b/phe/variant/GATKVariantCaller.py    Wed Dec 16 07:32:22 2015 -0500
@@ -0,0 +1,126 @@
+'''
+Created on 22 Sep 2015
+
+@author: alex
+'''
+from collections import OrderedDict
+import logging
+import os
+import subprocess
+
+from phe.variant import VariantCaller
+
+
+class GATKVariantCaller(VariantCaller):
+    """Implementation of the Broad Institute's GATK variant caller."""
+
+    name = "gatk"
+    """Plain text name of the variant caller."""
+
+    _default_options = "--sample_ploidy 2 --genotype_likelihoods_model BOTH -rf BadCigar -out_mode EMIT_ALL_SITES -nt 1"
+    """Default options for the variant caller."""
+
+    def __init__(self, cmd_options=None):
+        """Constructor"""
+        if cmd_options is None:
+            cmd_options = self._default_options
+
+        super(GATKVariantCaller, self).__init__(cmd_options=cmd_options)
+
+        self.last_command = None
+
+    def get_info(self, plain=False):
+        d = {"name": "gatk", "version": self.get_version(), "command": self.last_command}
+
+        if plain:
+            result = "GATK(%(version)s): %(command)s" % d
+        else:
+            result = OrderedDict(d)
+
+        return result
+
+    def get_version(self):
+
+        p = subprocess.Popen(["java", "-jar", os.environ["GATK_JAR"], "-version"],
+                             stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
+        (output, _) = p.communicate()
+
+        # Output ends with a newline, so the version sits on the second-to-last split element.
+        version = output.split("\n")[-2]
+
+        return version
+
+    def make_vcf(self, *args, **kwargs):
+        ref = kwargs.get("ref")
+        bam = kwargs.get("bam")
+
+        if kwargs.get("vcf_file") is None:
+            kwargs["vcf_file"] = "variants.vcf"
+
+        opts = {"ref": os.path.abspath(ref),
+                "bam": os.path.abspath(bam),
+                "gatk_jar": os.environ["GATK_JAR"],
+                "all_variants_file": os.path.abspath(kwargs.get("vcf_file")),
+                "extra_cmd_options": self.cmd_options}
+
+#         if not self.create_aux_files(ref):
+#             logging.warn("Auxiliary files were not created.")
+#             return False
+
+        # Call variants.
+        # FIXME: Sample ploidy = 2?
+        cmd = "java -XX:+UseSerialGC -jar %(gatk_jar)s -T UnifiedGenotyper -R %(ref)s -I %(bam)s -o %(all_variants_file)s %(extra_cmd_options)s" % opts
+        success = os.system(cmd)
+
+        if success != 0:
+            logging.warn("Calling variants returned non-zero exit status.")
+            return False
+
+        self.last_command = cmd
+
+        return True
+
+    def create_aux_files(self, ref):
+        """Create auxiliary files needed for this variant caller.
+
+        Tools needed: samtools and Picard tools. Picard tools is a Java
+        library whose location can be given with the environment variable
+        PICARD_JAR (path to picard.jar) or PICARD_TOOLS_PATH (path to the
+        directory with the separate jars used before they were merged into
+        a single picard.jar).
+
+        Parameters:
+        -----------
+        ref: str
+            Path to the reference file.
+
+        Returns:
+        --------
+        bool:
+            True if auxiliary files were created, False otherwise.
+        """
+
+        ref_name, _ = os.path.splitext(ref)
+
+        success = os.system("samtools faidx %s" % ref)
+
+        if success != 0:
+            logging.warn("Fasta index could not be created.")
+            return False
+
+        d = {"ref": ref, "ref_name": ref_name}
+
+        if os.environ.get("PICARD_TOOLS_PATH"):
+            d["picard_tools_path"] = os.path.join(os.environ["PICARD_TOOLS_PATH"], "CreateSequenceDictionary.jar")
+        elif os.environ.get("PICARD_JAR"):
+            # Newer versions of Picard merge all tools into a single jar file.
+            d["picard_tools_path"] = "%s %s" % (os.environ["PICARD_JAR"], "CreateSequenceDictionary")
+        else:
+            logging.error("Picard tools are not present in the path.")
+            return False
+
+        success = os.system("java -jar %(picard_tools_path)s R=%(ref)s O=%(ref_name)s.dict" % d)
+
+        if success != 0:
+            logging.warn("Dictionary for the %s reference could not be created", ref)
+            return False
+
+        return True
diff -r fd42dbe82c12 -r 1d0bc21232ec phe/variant/MPileupVariantCaller.py
--- /dev/null    Thu Jan 01 00:00:00 1970 +0000
+++ b/phe/variant/MPileupVariantCaller.py    Wed Dec 16 07:32:22 2015 -0500
@@ -0,0 +1,96 @@
+'''
+Created on 22 Sep 2015
+
+@author: alex
+'''
+from collections import OrderedDict
+import logging
+import os
+import subprocess
+import tempfile
+
+from phe.variant import VariantCaller
+
+
+class MPileupVariantCaller(VariantCaller):
+    """Implementation of the samtools mpileup / bcftools call variant caller."""
+
+    name = "mpileup"
+    """Plain text name of the variant caller."""
+
+    _default_options = "-m -f GQ"
+    """Default options for the variant caller."""
+
+    def __init__(self, cmd_options=None):
+        """Constructor"""
+        if cmd_options is None:
+            cmd_options = self._default_options
+
+        super(MPileupVariantCaller, self).__init__(cmd_options=cmd_options)
+
+        self.last_command = None
+
+    def get_info(self, plain=False):
+        d = {"name": self.name, "version": self.get_version(), "command": self.last_command}
+
+        if plain:
+            result = "mpileup(%(version)s): %(command)s" % d
+        else:
+            result = OrderedDict(d)
+
+        return result
+
+    def get_version(self):
+
+        p = subprocess.Popen(["samtools", "--version"], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
+        (output, _) = p.communicate()
+
+        # The first line of the output holds the samtools version.
+        version = output.split("\n")[0].split(" ")[1]
+
+        return version
+
+    def make_vcf(self, *args, **kwargs):
+        ref = kwargs.get("ref")
+        bam = kwargs.get("bam")
+
+        if kwargs.get("vcf_file") is None:
+            kwargs["vcf_file"] = "variants.vcf"
+
+        opts = {"ref": os.path.abspath(ref),
+                "bam": os.path.abspath(bam),
+                "all_variants_file": os.path.abspath(kwargs.get("vcf_file")),
+                "extra_cmd_options": self.cmd_options}
+
+        with tempfile.NamedTemporaryFile(suffix=".pileup") as tmp:
+            opts["pileup_file"] = tmp.name
+            cmd = "samtools mpileup -t DP,DV,DP4,DPR,SP -Auf %(ref)s %(bam)s | bcftools call %(extra_cmd_options)s > %(all_variants_file)s" % opts
+            print cmd
+            self.last_command = cmd
+            if os.system(cmd) != 0:
+                logging.warn("Pileup creation was not successful.")
+                return False
+
+        return True
+
+    def create_aux_files(self, ref):
+        """Index reference with faidx from samtools.
+
+        Parameters:
+        -----------
+        ref: str
+            Path to the reference file.
+
+        Returns:
+        --------
+        bool:
+            True if auxiliary files were created, False otherwise.
+        """
+
+        success = os.system("samtools faidx %s" % ref)
+
+        if success != 0:
+            logging.warn("Fasta index could not be created.")
+            return False
+
+        return True
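The mpileup caller follows the same interface but only needs samtools and bcftools on the PATH; a short hedged sketch (file names invented) that also shows the get_meta() hook inherited from VariantCaller / PHEMetaData, which lets a pipeline record which caller and command produced a VCF.

    from phe.variant.MPileupVariantCaller import MPileupVariantCaller

    caller = MPileupVariantCaller()        # default bcftools call options: "-m -f GQ"
    caller.create_aux_files("reference.fasta")
    caller.make_vcf(ref="reference.fasta", bam="sample.bam", vcf_file="sample.vcf")
    print(caller.get_meta())               # OrderedDict keyed "PHEVariantMetaData" with name, version, command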
diff -r fd42dbe82c12 -r 1d0bc21232ec phe/variant/__init__.py
--- /dev/null    Thu Jan 01 00:00:00 1970 +0000
+++ b/phe/variant/__init__.py    Wed Dec 16 07:32:22 2015 -0500
@@ -0,0 +1,288 @@
[288 lines added: phe/variant/__init__.py ("Classes and methods to work with variants and such."). The middle of this diff is truncated in this view. The visible parts define three things: VCFTemplate, a small helper that copies header information (infos, formats, filters, alts, contigs, metadata, column headers and samples) from a vcf.Reader so it can be reused as a PyVCF writer template; VariantSet, which wraps a VCF file, builds PHE filters from a string, dict or list (via str_to_filters / make_filters) or re-creates them from existing FILTER header entries with PHEFilterBase.decode, registers each filter's short description in the output header, and provides filter_variants(keep_only_snps=True), a writer for passing records (with only_snps / only_good options), _write_bad_variants(), serialise() and update_filters(); and VariantCaller, an abstract PHEMetaData subclass whose make_vcf(), create_aux_files(), get_info() and get_version() methods are abstract and whose concrete get_meta() wraps get_info() in an OrderedDict under the "PHEVariantMetaData" key.]
diff -r fd42dbe82c12 -r 1d0bc21232ec phe/variant/variant_factory.py
--- /dev/null    Thu Jan 01 00:00:00 1970 +0000
+++ b/phe/variant/variant_factory.py    Wed Dec 16 07:32:22 2015 -0500
@@ -0,0 +1,92 @@
+'''Classes and functions for working with variant callers.
+
+Created on 22 Sep 2015
+
+@author: alex
+'''
+import glob
+import inspect
+import logging
+import os
+import sys
+
+from phe.variant import VariantCaller
+
+def dynamic_caller_loader():
+    """Fancy way of dynamically importing existing variant callers.
+
+    Returns
+    -------
+    dict:
+        Available variant callers dictionary. Keys are parameters that
+        can be used to call variants.
+    """
+
+    # We assume the callers are in the same directory as THIS file.
+    variants_dir = os.path.dirname(__file__)
+    variants_dir = os.path.abspath(variants_dir)
+
+    # This is populated when the module is first imported.
+    avail_callers = {}
+
+    # Add this directory to the syspath.
+    sys.path.insert(0, variants_dir)
+
+    # Find all "py" files.
+    for caller_mod in glob.glob(os.path.join(variants_dir, "*.py")):
+
+        # Derive name of the module where caller is.
+        caller_mod_file = os.path.basename(caller_mod)
+
+        # Ignore __init__ file, only base class is there.
+        if caller_mod_file.startswith("__init__"):
+            continue
+
+        # Import the module with a caller.
+        mod = __import__(caller_mod_file.replace(".pyc", "").replace(".py", ""))
+
+        # Find all the classes contained in this module.
+        classes = inspect.getmembers(mod, inspect.isclass)
+        for cls_name, cls in classes:
+            # For each class, if it is a subclass of VariantCaller, add it.
+            if cls_name != "VariantCaller" and issubclass(cls, VariantCaller):
+                # The name is inherited and defined within each caller.
+                avail_callers[cls.name] = cls
+
+    sys.path.remove(variants_dir)
+
+    return avail_callers
+
+_avail_variant_callers = dynamic_caller_loader()
+
+def available_callers():
+    """Return list of available variant callers."""
+    return _avail_variant_callers.keys()
+
+def factory(variant=None, custom_options=None):
+    """Make an instance of a variant caller class.
+
+    Parameters:
+    -----------
+    variant: str, optional
+        Name of the variant caller class to instantiate.
+    custom_options: str, optional
+        Custom options to be passed directly to the implementing class.
+
+    Returns:
+    --------
+    :py:class:`phe.variant.VariantCaller`:
+        Instance of the :py:class:`phe.variant.VariantCaller` for given
+        variant name, or None if one couldn't be found.
+    """
+    if variant is not None and isinstance(variant, str):
+
+        variant = variant.lower()
+        if variant in _avail_variant_callers:
+            return _avail_variant_callers[variant](cmd_options=custom_options)
+        else:
+            logging.error("No implementation for %s variant caller.", variant)
+            return None
+
+    logging.warn("Unknown parameters. Variant caller could not be initialised.")
+    return None
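A short usage sketch (not part of the changeset) for the factory; the file names are invented and the options shown are simply the mpileup caller's own defaults.

    from phe.variant import variant_factory

    print(variant_factory.available_callers())          # e.g. ["gatk", "mpileup"]

    caller = variant_factory.factory(variant="mpileup", custom_options="-m -f GQ")
    if caller is not None:
        caller.make_vcf(ref="reference.fasta", bam="sample.bam", vcf_file="sample.vcf")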
diff -r fd42dbe82c12 -r 1d0bc21232ec phe/variant_filters/__init__.py
--- /dev/null    Thu Jan 01 00:00:00 1970 +0000
+++ b/phe/variant_filters/__init__.py    Wed Dec 16 07:32:22 2015 -0500
@@ -0,0 +1,209 @@
+"""Classes and functions for working with variant filters."""
+
+from __builtin__ import __import__
+from abc import abstractproperty
+import abc
+import argparse
+import glob
+import inspect
+import logging
+import os
+import re
+import sys
+
+import vcf
+import vcf.filters
+from vcf.parser import _Filter
+
+IUPAC_CODES = {frozenset(["A", "G"]): "R",
+               frozenset(["C", "T"]): "Y",
+               frozenset(["G", "C"]): "S",
+               frozenset(["A", "T"]): "W",
+               frozenset(["G", "T"]): "K",
+               frozenset(["A", "C"]): "M",
+               frozenset(["C", "G", "T"]): "B",
+               frozenset(["A", "G", "T"]): "D",
+               frozenset(["A", "C", "T"]): "H",
+               frozenset(["A", "C", "G"]): "V"
+               }
+
+class PHEFilterBase(vcf.filters.Base):
+    """Base class for VCF filters."""
+    __meta__ = abc.ABCMeta
+
+    magic_sep = ":"
+    decoder_pattern = re.compile(magic_sep)
+
+    @abc.abstractproperty
+    def parameter(self):
+        """Short name of parameter being filtered."""
+        return self.parameter
+
+    @abc.abstractproperty
+    def _default_threshold(self):
+        """Default threshold for filtering."""
+        return self._default_threshold
+
+    def __init__(self, args):
+        super(PHEFilterBase, self).__init__(args)
+
+        # Start from the default threshold, then override with a custom value if given.
+        self.threshold = self._default_threshold
+
+        if isinstance(args, dict):
+            self.threshold = args.get(self.parameter)
+
+    def __str__(self):
+        return self.filter_name()
+
+    @abc.abstractmethod
+    def short_desc(self):
+        """Short description of the filter (included in VCF)."""
+        raise NotImplementedError("Get short description is not implemented.")
+
+    def get_config(self):
+        """This is used for reconstructing filter."""
+        return {self.parameter: self.threshold}
+
+    def filter_name(self):
+        """Create filter names from the parameter and threshold separated by the magic separator.
+        E.g. if the filter parameter is ad_ratio and the threshold is 0.9 then
+        ad_ratio:0.9 is the filter name.
+        """
+        return "%s%s%s" % (self.parameter, self.magic_sep, self.threshold)
+
+    @staticmethod
+    def decode(filter_id):
+        """Decode name of filter."""
+        conf = {}
+
+        if PHEFilterBase.magic_sep in filter_id:
+            info = PHEFilterBase.decoder_pattern.split(filter_id)
+            assert len(info) == 2
+            conf[info[0]] = info[1]
+        return conf
+
+    def is_gap(self):
+        return False
+
+    def is_n(self):
+        return True
+
+    @staticmethod
+    def call_concensus(record):
+        extended_code = "N"
+        try:
+            sample_ad = set([str(c) for c in record.ALT] + [record.REF])
+
+            for bases, code in IUPAC_CODES.items():
+                if sample_ad == bases:
+                    extended_code = code
+                    break
+        except AttributeError:
+            extended_code = "N"
+
+        return extended_code
+
+def dynamic_filter_loader():
+    """Fancy way of dynamically importing existing filters.
+
+    Returns
+    -------
+    dict:
+        Available filters dictionary. Keys are parameters that
+        can be supplied to the filters.
+    """
+
+    # We assume the filters are in the same directory as THIS file.
+    filter_dir = os.path.dirname(__file__)
+    filter_dir = os.path.abspath(filter_dir)
+
+    # This is populated when the module is first imported.
+    avail_filters = {}
+
+    # Add this directory to the syspath.
+    sys.path.insert(0, filter_dir)
+
+    # Find all "py" files.
+    for filter_mod in glob.glob(os.path.join(filter_dir, "*.py")):
+
+        # Derive name of the module where filter is.
+        filter_mod_file = os.path.basename(filter_mod)
+
+        # Ignore this file, obviously.
+        if filter_mod_file.startswith("__init__"):
+            continue
+
+        # Import the module with a filter.
+        mod = __import__(filter_mod_file.replace(".pyc", "").replace(".py", ""))
+
+        # Find all the classes contained in this module.
+        classes = inspect.getmembers(mod, inspect.isclass)
+        for cls_name, cls in classes:
+            # For each class, if it is a subclass of PHEFilterBase, add it.
+            if cls_name != "PHEFilterBase" and issubclass(cls, PHEFilterBase):
+                # The parameters are inherited and defined within each filter.
+                avail_filters[cls.parameter] = cls
+
+    sys.path.remove(filter_dir)
+
+    return avail_filters
+
+_avail_filters = dynamic_filter_loader()
+
+def available_filters():
+    """Return list of available filters."""
+    return _avail_filters.keys()
+
+def str_to_filters(filters):
+    """Convert from filter string to array of filters.
+    E.g. ad_ratio:0.9,min_depth:5
+
+    Parameters:
+    -----------
+    filters: str
+        String version of filters, separated by comma.
+
+    Returns:
+    --------
+    list:
+        List of :py:class:`phe.variant_filters.PHEFilterBase` instances.
+    """
+
+    config = {}
+    for kv_pair in filters.split(","):
+        pair = kv_pair.split(":")
+        assert len(pair) == 2, "Filters should be separated by ':' %s" % kv_pair
+
+        # We don't care about casting them to correct type because Filters
+        # will do it for us.
+        config[pair[0]] = pair[1]
+
+    return make_filters(config)
+
+def make_filters(config):
+    """Create a list of filters from *config*.
+
+    Parameters:
+    -----------
+    config: dict, optional
+        Dictionary with parameter: value pairs. For each parameter, an
+        appropriate Filter will be found and instantiated.
+
+    Returns:
+    --------
+    list:
+        List of :py:class:`PHEFilterBase` filters.
+    """
+    filters = []
+
+    if config:
+        for custom_filter in config:
+            if custom_filter in _avail_filters:
+                filters.append(_avail_filters[custom_filter](config))
+            else:
+                logging.warn("Could not find appropriate filter for %s",
+                             custom_filter)
+
+    return filters
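A hedged sketch of the filter helpers above; "min_depth" is the example parameter used in the docstrings and is assumed to be provided by one of the sibling filter modules that dynamic_filter_loader() picks up.

    from phe.variant_filters import PHEFilterBase, str_to_filters

    filters = str_to_filters("min_depth:5")
    for f in filters:
        print("%s -- %s" % (f.filter_name(), f.short_desc()))

    # Filter names round-trip through the VCF FILTER header via decode().
    print(PHEFilterBase.decode("min_depth:5"))           # {'min_depth': '5'}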
diff -r fd42dbe82c12 -r 1d0bc21232ec test-data/1_short.vcf
--- /dev/null    Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/1_short.vcf    Wed Dec 16 07:32:22 2015 -0500
@@ -0,0 +1,50000 @@
[50,000 lines added: test-data/1_short.vcf, a VCFv4.1 test file produced with GATK UnifiedGenotyper and SelectVariants (version 2.6-5-gba531bd, May 2014) against reference gi|15829254|ref|NC_002695.1|. It consists of the usual FILTER/FORMAT/GATKCommandLine header lines followed by one record per position with QUAL, INFO (AN, DP, MQ, MQ0) and a single GT:DP sample call; the records are truncated in this view.]
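A small sketch (not part of the changeset) of reading this test file with PyVCF 0.6.7, the library pinned in tool_dependencies.xml and used throughout the code above; the path assumes the repository layout shown in this changeset.

    import vcf   # PyVCF

    reader = vcf.Reader(filename="test-data/1_short.vcf")
    record = next(reader)
    # Each record carries the fields summarised above: CHROM, POS, REF, ALT,
    # INFO (AN, DP, MQ, MQ0) and a single GT:DP sample call.
    print("%s %s %s DP=%s" % (record.CHROM, record.POS, record.REF, record.INFO.get("DP")))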
diff -r fd42dbe82c12 -r 1d0bc21232ec test-data/2_short.vcf
--- /dev/null    Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/2_short.vcf    Wed Dec 16 07:32:22 2015 -0500
@@ -0,0 +1,50000 @@
[50,000 lines added: test-data/2_short.vcf, a second GATK UnifiedGenotyper/SelectVariants VCFv4.1 test file with the same structure and reference as 1_short.vcf but lower per-position depths; the records are truncated in this view.]
diff -r fd42dbe82c12 -r 1d0bc21232ec test-data/testresult.fa
--- /dev/null    Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/testresult.fa    Wed Dec 16 07:32:22 2015 -0500
@@ -0,0 +1,6 @@
+>1_short
+TTCTTCATGA
+>2_short
+TTTTACACGA
+>reference
+CCCATAGCAG
diff -r fd42dbe82c12 -r 1d0bc21232ec tool_dependencies.xml
--- /dev/null    Thu Jan 01 00:00:00 1970 +0000
+++ b/tool_dependencies.xml    Wed Dec 16 07:32:22 2015 -0500
@@ -0,0 +1,21 @@
+<?xml version="1.0"?>
+<tool_dependency>
+    <package name="python" version="2.7.10">
+        <repository changeset_revision="0339c4a9b87b" name="package_python_2_7_10" owner="iuc" prior_installation_required="True" toolshed="https://toolshed.g2.bx.psu.edu" />
+    </package>
+    <package name="pyvcf" version="0.6.7">
+        <repository changeset_revision="f6d05f9a22bf" name="package_python_2_7_pyvcf_0_6_7" owner="ulfschaefer" prior_installation_required="True" toolshed="https://toolshed.g2.bx.psu.edu"/>
+    </package>
+    <package name="pyyaml" version="3.11">
+        <repository changeset_revision="99267d131c05" name="package_python_2_7_pyyaml_3_11" owner="iuc" prior_installation_required="True" toolshed="https://toolshed.g2.bx.psu.edu"/>
+    </package>
+    <package name="bintrees" version="2.0.2">
+        <repository changeset_revision="1d16207341c7" name="package_python_2_7_bintrees_2_0_2" owner="ulfschaefer" prior_installation_required="True" toolshed="https://toolshed.g2.bx.psu.edu"/>
+    </package>
+    <package name="biopython" version="1.66">
+        <repository changeset_revision="5d5355863287" name="package_python_2_7_biopython_1_66" owner="ulfschaefer" prior_installation_required="True" toolshed="https://toolshed.g2.bx.psu.edu"/>
+    </package>
+    <package name="matplotlib" version="1.4">
+        <repository changeset_revision="f7424e1cf115" name="package_python_2_7_matplotlib_1_4" owner="iuc" prior_installation_required="True" toolshed="https://toolshed.g2.bx.psu.edu"/>
+    </package>
+</tool_dependency>
diff -r fd42dbe82c12 -r 1d0bc21232ec vcfs2fasta.py
--- /dev/null    Thu Jan 01 00:00:00 1970 +0000
+++ b/vcfs2fasta.py    Wed Dec 16 07:32:22 2015 -0500
@@ -0,0 +1,415 @@
[415 lines added: vcfs2fasta.py, the script the Galaxy tool wraps ("Merge SNP data from multiple VCF files into a single fasta file.", created 5 Oct 2015, author alex). The middle of this diff is truncated in this view. The visible parts show: imports of argparse, Bio.SeqIO, bintrees.FastRBTree, PyVCF and IUPAC_CODES from phe.variant_filters, with matplotlib and numpy imported optionally for statistics; plot_stats(), which plots per-contig fractions of samples with SNPs, Ns, mixed bases and gaps; get_mixture(), which uses the sample AD field to detect mixed positions above a threshold and translate them to IUPAC ambiguity codes; print_stats(); and a main() driven by argparse options (--out, --input, --with-mixtures, --column-Ns, --sample-Ns, --reference, --include, --exclude, --with-stats and a plots directory) that drops samples whose fraction of Ns exceeds --sample-Ns, discards columns whose fraction of Ns or gaps reaches --column-Ns, writes one fasta record per remaining sample plus a ">reference" record, and optionally writes a per-position statistics table and plots before logging the number of discarded columns.]
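The mixture handling summarised above leans on the IUPAC_CODES table from phe.variant_filters; a tiny illustration (values invented) of the lookup that get_mixture() performs when a position looks like a genuine A/G mixture rather than a clean SNP.

    from phe.variant_filters import IUPAC_CODES

    alleles = frozenset(["A", "G"])       # hypothetical reference + alternate base
    print(IUPAC_CODES[alleles])           # "R", the ambiguity code written into the fasta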
diff -r fd42dbe82c12 -r 1d0bc21232ec vcfs2fasta.sh
--- /dev/null    Thu Jan 01 00:00:00 1970 +0000
+++ b/vcfs2fasta.sh    Wed Dec 16 07:32:22 2015 -0500
@@ -0,0 +1,51 @@
+#!/bin/bash
+
+echo $@
+
+OUTPUT=$1
+shift
+WITHMIXTURES=$1
+shift
+COLUMNNS=$1
+shift
+SAMPLENS=$1
+shift
+REFERENCE=$1
+shift
+INCLUDE=$1
+shift
+EXCLUDE=$1
+shift
+INPUT=$@
+
+DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
+export PATH=$PATH:$DIR
+
+CMD="vcfs2fasta.py --out $OUTPUT --input $INPUT"
+
+if [ $WITHMIXTURES != "NOTTHERE" ]; then
+    CMD="$CMD --with-mixtures $WITHMIXTURES"
+fi
+
+if [ $COLUMNNS != "NOTTHERE" ]; then
+    CMD="$CMD --column-Ns $COLUMNNS"
+fi
+
+if [ $SAMPLENS != "NOTTHERE" ]; then
+    CMD="$CMD --sample-Ns $SAMPLENS"
+fi
+
+if [ $REFERENCE != "NOTTHERE" ]; then
+    CMD="$CMD --reference $REFERENCE"
+fi
+
+if [ $INCLUDE != "NOTTHERE" ]; then
+    CMD="$CMD --include $INCLUDE"
+fi
+
+if [ $EXCLUDE != "NOTTHERE" ]; then
+    CMD="$CMD --exclude $EXCLUDE"
+fi
+
+echo $CMD
+eval $CMD
diff -r fd42dbe82c12 -r 1d0bc21232ec vcfs2fasta.xml
--- /dev/null    Thu Jan 01 00:00:00 1970 +0000
+++ b/vcfs2fasta.xml    Wed Dec 16 07:32:22 2015 -0500
@@ -0,0 +1,123 @@
+<tool id="vcfs2fasta" name="VCFs to fasta" version="1.0">
+    <description>Takes a set of VCF files and outputs a multi fasta file with only the variant positions.</description>
+    <requirements>
+        <requirement type="package" version="2.7.10">python</requirement>
+        <requirement type="package" version="0.6.7">pyvcf</requirement>
+        <requirement type="package" version="3.11">pyyaml</requirement>
+        <requirement type="package" version="2.0.2">bintrees</requirement>
+        <requirement type="package" version="1.66">biopython</requirement>
+        <requirement type="package" version="1.4">matplotlib</requirement>
+    </requirements>
+    <stdio>
+        <!-- Assume anything other than zero is an error -->
+        <exit_code range="1:" />
+        <exit_code range=":-1" />
+    </stdio>
+    <command interpreter="bash">
+        vcfs2fasta.sh
+        $output
+        #if str($mix_cond.mix) == "yes":
+            $mix_cond.mix_value
+        #else
+            NOTTHERE
+        #end if
+        #if str($cols_cond.cols) == "yes":
+            $cols_cond.column_ns
+        #else
+            NOTTHERE
+        #end if
+        #if str($sample_cond.sample) == "yes":
+            $sample_cond.sample_ns
+        #else
+            NOTTHERE
+        #end if
+        #if str($reference_cond.reference) == "yes":
+            $reference_cond.ref_fa
+        #else
+            NOTTHERE
+        #end if
+        #if str($include_cond.include) == "yes":
+            $include_cond.in_bed
+        #else
+            NOTTHERE
+        #end if
+        #if str($exclude_cond.exclude) == "yes":
+            $exclude_cond.ex_bed
+        #else
+            NOTTHERE
+        #end if
+        #for $i, $input_vcf in enumerate( $input_vcfs ):
+            "${input_vcf}"
+        #end for
+    </command>
+
+    <inputs>
+        <param name="input_vcfs" type="data" multiple="true" format="vcf" label="Input VCF file(s)" />
+        <conditional name="mix_cond">
+            <param name="mix" type="select" label="With Mixtures" help="Specify this option with a threshold to output mixtures above this threshold.">
+                <option value="yes">Specify</option>
+                <option value="no" selected="true">Do not specify</option>
+            </param>
+            <when value="yes">
+                <param name="mix_value" type="float" value="0.5" label="Mixture value" />
+            </when>
+        </conditional>
+        <conditional name="cols_cond">
+            <param name="cols" type="select" label="Column Ns" help="Keep only columns where the fraction of Ns is below the specified threshold.">
+                <option value="yes">Specify</option>
+                <option value="no" selected="true">Do not specify</option>
+            </param>
+            <when value="yes">
+                <param name="column_ns" type="float" value="0.5" label="Column Ns value" />
+            </when>
+        </conditional>
+        <conditional name="sample_cond">
+            <param name="sample" type="select" label="Sample Ns" help="Keep only samples where the fraction of Ns is below the specified threshold.">
+                <option value="yes">Specify</option>
+                <option value="no" selected="true">Do not specify</option>
+            </param>
+            <when value="yes">
+                <param name="sample_ns" type="float" value="0.5" label="Sample Ns value" />
+            </when>
+        </conditional>
+        <conditional name="reference_cond">
+            <param name="reference" type="select" label="Reference genome file" help="If a reference is specified, the whole genome is output rather than just the variant positions.">
+                <option value="yes">Specify</option>
+                <option value="no" selected="true">Do not specify</option>
+            </param>
+            <when value="yes">
+                <param name="ref_fa" type="data" format="fasta" label="Reference fasta file" help="Fasta format"/>
+            </when>
+        </conditional>
+        <conditional name="include_cond">
+            <param name="include" type="select" label="Include region" help="Specify regions to include in a bed file.">
+                <option value="yes">Specify</option>
+                <option value="no" selected="true">Do not specify</option>
+            </param>
+            <when value="yes">
+                <param name="in_bed" type="data" format="bed" label="Include regions bed file" help="bed format"/>
+            </when>
+        </conditional>
+        <conditional name="exclude_cond">
+            <param name="exclude" type="select" label="Exclude region" help="Specify regions to exclude in a bed file.">
+                <option value="yes">Specify</option>
+                <option value="no" selected="true">Do not specify</option>
+            </param>
+            <when value="yes">
+                <param name="ex_bed" type="data" format="bed" label="Exclude regions bed file" help="bed format"/>
+            </when>
+        </conditional>
+    </inputs>
+
+    <outputs>
+        <data format="fasta" name="output" label="${tool.name} on ${on_string}: FASTA file" />
+    </outputs>
+    <tests>
+        <test>
+            <param name="input_vcfs" value="1_short.vcf,2_short.vcf" ftype="vcf" />
+            <output name="output" file="testresult.fa" ftype="fasta" />
+        </test>
+    </tests>
+    <help>
+    </help>
+</tool>