changeset 19:cc91d730cc4f draft
Fix syntax of Galaxy script for GECCO
author | althonos |
---|---|
date | Mon, 16 Jan 2023 18:35:56 +0000 |
parents | 3dd71eaa2909 |
children | 64b724dd8d04 |
files | CHANGELOG.md gecco.xml test-data/BGC0001866.1_cluster_1.gbk test-data/clusters.tsv test-data/features.tsv test-data/sideload.json |
diffstat | 6 files changed, 267 insertions(+), 125 deletions(-) |
--- a/CHANGELOG.md Wed Aug 10 12:36:38 2022 +0000 +++ b/CHANGELOG.md Mon Jan 16 18:35:56 2023 +0000 @@ -5,11 +5,31 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html). ## [Unreleased] -[Unreleased]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.5...master +[Unreleased]: https://github.com/zellerlab/GECCO/compare/v0.9.6...master + + +## [v0.9.6] - 2023-01-11 +[v0.9.6]: https://github.com/zellerlab/GECCO/compare/v0.9.5...v0.9.6 + +### Added +- Gene Ontology annotations to `gecco.interpro` local metadata. +- Reference to Gene Ontology terms and derived functions to `gecco.model.Domain` objects. +- Gene color based on predicted function in `gecco.model.Gene.to_seq_feature`. + +### Fixed +- Missing `gzip` import in the CLI preventing usage of gzip-compressed inputs. +- Invalid coordinates of domains found in reverse-strand genes. +- Detection of entry points with `importlib.metadata` on older Python versions. + +### Changed +- `bgc_id` columns of cluster tables are renamed `cluster_id`. +- `gecco.model.ProductType` is renamed to `gecco.model.ClusterType`. +- Bumped `pyrodigal` dependency to `v2.0`. +- Bumped `pyhmmer` dependency to `v0.7`. ## [v0.9.5] - 2022-08-10 -[v0.9.5]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.4...v0.9.5 +[v0.9.5]: https://github.com/zellerlab/GECCO/compare/v0.9.4...v0.9.5 ### Added - `gecco predict` command to predict BGCs from an annotated genome. @@ -21,7 +41,7 @@ ## [v0.9.4] - 2022-05-31 -[v0.9.4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.3...v0.9.4 +[v0.9.4]: https://github.com/zellerlab/GECCO/compare/v0.9.3...v0.9.4 ### Added - `classes_` property to `TypeClassifier` to access the `classes_` attribute of the `TypeBinarizer`. 
@@ -39,7 +59,7 @@ ## [v0.9.3] - 2022-05-13 -[v0.9.3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.2...v0.9.3 +[v0.9.3]: https://github.com/zellerlab/GECCO/compare/v0.9.2...v0.9.3 ### Changed - `--format` flag of `gecco annotate` and `gecco run` CLI commands is now made lowercase before giving value to `Bio.SeqIO`. @@ -49,20 +69,20 @@ ## [v0.9.2] - 2022-04-11 -[v0.9.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1...v0.9.2 +[v0.9.2]: https://github.com/zellerlab/GECCO/compare/v0.9.1...v0.9.2 ### Added - Padding of short sequences with empty genes when predicting probabilities in `ClusterCRF`. ## [v0.9.1] - 2022-04-05 -[v0.9.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1-alpha4...v0.9.1 +[v0.9.1]: https://github.com/zellerlab/GECCO/compare/v0.9.1-alpha4...v0.9.1 ### Changed - Make the `genes.tsv` and `features.tsv` table contain all genes even when they come from a contig too short to be processed by the CRF sliding window. - Replaced the `--force-clusters-tsv` flag with a `--force-tsv` flag to force writing TSV tables even when no genes or clusters were found in `gecco run` or `gecco annotate`. ## [v0.9.1-alpha4] - 2022-03-31 -[v0.9.1-alpha4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1-alpha3...v0.9.1-alpha4 +[v0.9.1-alpha4]: https://github.com/zellerlab/GECCO/compare/v0.9.1-alpha3...v0.9.1-alpha4 Retrain internal model with: ``` @@ -74,7 +94,7 @@ ``` ## [v0.9.1-alpha3] - 2022-03-23 -[v0.9.1-alpha3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1-alpha2...v0.9.1-alpha3 +[v0.9.1-alpha3]: https://github.com/zellerlab/GECCO/compare/v0.9.1-alpha2...v0.9.1-alpha3 ### Added - `gecco.model.GeneTable` class to store gene coordinates independently of protein domains. @@ -85,33 +105,33 @@ - `gecco train` expects a gene table instead of a GFF file for the gene coordinates. 
## [v0.9.1-alpha2] - 2022-03-23 -[v0.9.1-alpha2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1-alpha1...v0.9.1-alpha2 +[v0.9.1-alpha2]: https://github.com/zellerlab/GECCO/compare/v0.9.1-alpha1...v0.9.1-alpha2 ### Fixed - `TypeClassifier.trained` not being able to read unknown types from type tables. ## [v0.9.1-alpha1] - 2022-03-20 -[v0.9.1-alpha1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.10...v0.9.1-alpha1 +[v0.9.1-alpha1]: https://github.com/zellerlab/GECCO/compare/v0.8.10...v0.9.1-alpha1 Candidate release with support for a sliding window in the CRF prediction algorithm. ## [v0.8.10] - 2022-02-23 -[v0.8.10]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.9...v0.8.10 +[v0.8.10]: https://github.com/zellerlab/GECCO/compare/v0.8.9...v0.8.10 ### Fixed - `--antismash-sideload` flag of `gecco run` causing command to crash. ## [v0.8.9] - 2022-02-22 -[v0.8.9]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.8...v0.8.9 +[v0.8.9]: https://github.com/zellerlab/GECCO/compare/v0.8.8...v0.8.9 ### Removed - Prediction and support for the *Other* biosynthetic type of MIBiG clusters. ## [v0.8.8] - 2022-02-21 -[v0.8.8]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.7...v0.8.8 +[v0.8.8]: https://github.com/zellerlab/GECCO/compare/v0.8.7...v0.8.8 ### Fixed - `ClusterRefiner` filtering method for edge genes not working as intended. - `gecco run` and `gecco annotate` commands crashing on missing input files instead of nicely rendering the error. ## [v0.8.7] - 2022-02-18 -[v0.8.7]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.6...v0.8.7 +[v0.8.7]: https://github.com/zellerlab/GECCO/compare/v0.8.6...v0.8.7 ### Fixed - `interpro.json` metadata file not being included in distribution files. - Missing docstring for `Protein.with_domains` method. @@ -119,7 +139,7 @@ - Bump minimum `scikit-learn` version to `v1.0` for Python3.7+. 
## [v0.8.6] - 2022-02-17 - YANKED -[v0.8.6]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.5...v0.8.6 +[v0.8.6]: https://github.com/zellerlab/GECCO/compare/v0.8.5...v0.8.6 ### Added - CLI flag for enabling region masking for contigs processed by Prodigal. - CLI flag for controlling region distance used for edge distance filtering. @@ -133,12 +153,12 @@ - Progress bar messages are now in consistent format. ## [v0.8.5] - 2021-11-21 -[v0.8.5]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.4...v0.8.5 +[v0.8.5]: https://github.com/zellerlab/GECCO/compare/v0.8.4...v0.8.5 ### Added - Minimal compatibility support for running GECCO inside of Galaxy workflows. ## [v0.8.4] - 2021-09-26 -[v0.8.4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.3-post1...v0.8.4 +[v0.8.4]: https://github.com/zellerlab/GECCO/compare/v0.8.3-post1...v0.8.4 ### Fixed - `gecco convert gbk --format bigslice` failing to run because of outdated code ([#5](https://github.com/zellerlab/GECCO/issues/5)). - `gecco convert gbk --format bigslice` not creating files with names conforming to BiG-SLiCE expected input. @@ -146,17 +166,17 @@ - Bump minimum `pyrodigal` version to `v0.6.2` to use platform-accelerated code if supported. ## [v0.8.3-post1] - 2021-08-23 -[v0.8.3-post1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.3...v0.8.3-post1 +[v0.8.3-post1]: https://github.com/zellerlab/GECCO/compare/v0.8.3...v0.8.3-post1 ### Fixed - Wrong default value for `--threshold` being shown in `gecco run` help message. ## [v0.8.3] - 2021-08-23 -[v0.8.3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.2...v0.8.3 +[v0.8.3]: https://github.com/zellerlab/GECCO/compare/v0.8.2...v0.8.3 ### Changed - Default probability threshold for segmentation to 0.3 (from 0.4). 
## [v0.8.2] - 2021-07-31 -[v0.8.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.1...v0.8.2 +[v0.8.2]: https://github.com/zellerlab/GECCO/compare/v0.8.1...v0.8.2 ### Fixed - `gecco run` crashing on Python 3.6 because of missing `contextlib.nullcontext` class. ### Changed @@ -164,7 +184,7 @@ - `PyHMMER.run` now reports the *p-value* of each domain in addition to the *e-value* as a `/note` qualifier. ## [v0.8.1] - 2021-07-29 -[v0.8.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.0...v0.8.1 +[v0.8.1]: https://github.com/zellerlab/GECCO/compare/v0.8.0...v0.8.1 ### Changed - `gecco run` now filters out unneeded features before annotating, making it easier to analyze the results of a run with a custom `--model`. ### Fixed @@ -173,7 +193,7 @@ - Missing documentation for the `strand` attribute of `gecco.model.Gene`. ## [v0.8.0] - 2021-07-03 -[v0.8.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.7.0...v0.8.0 +[v0.8.0]: https://github.com/zellerlab/GECCO/compare/v0.7.0...v0.8.0 ### Changed - Retrain internal model using new sequence embeddings and remove broken/duplicate BGCs from MIBiG 2.0. - Bump minimum `pyhmmer` version to `v0.4.0` to improve exception handling. @@ -195,7 +215,7 @@ - Tigrfam domains, which is not improving performance on the new training data. ## [v0.7.0] - 2021-05-31 -[v0.7.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.6.3...v0.7.0 +[v0.7.0]: https://github.com/zellerlab/GECCO/compare/v0.6.3...v0.7.0 ### Added - Support for writing an AntiSMASH sideload JSON file after a `gecco run` workflow. - Code for converting GenBank files in BiG-SLiCE compatible format with the `gecco convert` subcommand. @@ -207,7 +227,7 @@ - Outdated notice about `-vvv` verbosity level in the help message of the main `gecco` command. 
## [v0.6.3] - 2021-05-10 -[v0.6.3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.6.2...v0.6.3 +[v0.6.3]: https://github.com/zellerlab/GECCO/compare/v0.6.2...v0.6.3 ### Fixed - HMMER annotation not properly handling inputs with multiple contigs. - Some progress bar totals displaying as floats in the CLI. @@ -218,7 +238,7 @@ - `multiprocessing.cpu_count` has been replaced with `os.cpu_count` where applicable. ## [v0.6.2] - 2021-05-04 -[v0.6.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.6.1...v0.6.2 +[v0.6.2]: https://github.com/zellerlab/GECCO/compare/v0.6.1...v0.6.2 ### Fixed - `gecco cv loto` crashing because of outdated code. ### Changed @@ -227,7 +247,7 @@ - GECCO bioRxiv paper reference to `Cluster.to_seq_record` output record. ## [v0.6.1] - 2021-03-15 -[v0.6.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.6.0...v0.6.1 +[v0.6.1]: https://github.com/zellerlab/GECCO/compare/v0.6.0...v0.6.1 ### Fixed - Progress bar not being disabled by `-q` flag in CLI. - Fallback to using HMM name if accession is not available in `PyHMMER`. @@ -239,7 +259,7 @@ - Unused and outdated `HMMER` and `DomainRow` classes from `gecco.hmmer`. ## [v0.6.0] - 2021-02-28 -[v0.6.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.5...v0.6.0 +[v0.6.0]: https://github.com/zellerlab/GECCO/compare/v0.5.5...v0.6.0 ### Changed - Updated internal model with a cleaned-up version of the MIBiG-2.0 Pfam-33.1/Tigrfam-15.0 embedding. @@ -250,12 +270,12 @@ protein IDs. ## [v0.5.5] - 2021-02-28 -[v0.5.5]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.4...v0.5.5 +[v0.5.5]: https://github.com/zellerlab/GECCO/compare/v0.5.4...v0.5.5 ### Fixed - `gecco cv` bug causing only the last fold to be written. ## [v0.5.4] - 2021-02-28 -[v0.5.4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.3...v0.5.4 +[v0.5.4]: https://github.com/zellerlab/GECCO/compare/v0.5.3...v0.5.4 ### Changed - Replaced `verboselogs`, `coloredlogs` and `better-exceptions` with `rich`. 
### Removed @@ -265,7 +285,7 @@ - `gecco embed` to embed BGCs into non-BGC regions using feature tables. ## [v0.5.3] - 2021-02-21 -[v0.5.3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.2...v0.5.3 +[v0.5.3]: https://github.com/zellerlab/GECCO/compare/v0.5.2...v0.5.3 ### Fixed - Coordinates of genes in output GenBank files. - Potential issue with the number of CPUs in `PyHMMER.run`. @@ -273,7 +293,7 @@ - Bump required `pyrodigal` version to `v0.4.2` to fix buffer overflow. ## [v0.5.2] - 2021-01-29 -[v0.5.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.1...v0.5.2 +[v0.5.2]: https://github.com/zellerlab/GECCO/compare/v0.5.1...v0.5.2 ### Added - Support for downloading HMM files directly from GitHub releases assets. - Validation of filtered HMMs with MD5 checksum. @@ -284,13 +304,13 @@ - Bump required `pyhmmer` version to `v0.2.1`. ## [v0.5.1] - 2021-01-15 -[v0.5.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.0...v0.5.1 +[v0.5.1]: https://github.com/zellerlab/GECCO/compare/v0.5.0...v0.5.1 ### Fixed - `--hmm` flag being ignored in in `gecco run` command. - `PyHMMER` using HMM names instead of accessions, causing issues with Pfam HMMs. ## [v0.5.0] - 2021-01-11 -[v0.5.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.5...v0.5.0 +[v0.5.0]: https://github.com/zellerlab/GECCO/compare/v0.4.5...v0.5.0 ### Added - Explicit support for Python 3.9. ### Changed @@ -300,7 +320,7 @@ - `gecco cv` now requires *training* dependencies. ## [v0.4.5] - 2020-11-23 -[v0.4.5]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.4...v0.4.5 +[v0.4.5]: https://github.com/zellerlab/GECCO/compare/v0.4.4...v0.4.5 ### Added - Additional `fold` column to cross-validation table output. ### Changed @@ -309,7 +329,7 @@ - `gecco.orf` was rewritten to extract genes from input sequences in parallel. 
## [v0.4.4] - 2020-09-30 -[v0.4.4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.3...v0.4.4 +[v0.4.4]: https://github.com/zellerlab/GECCO/compare/v0.4.3...v0.4.4 ### Added - `gecco cv loto` command to run LOTO cross-validation using BGC types for stratification. @@ -325,7 +345,7 @@ - Bumped `pandas` training dependency to `v1.0`. ## [v0.4.3] - 2020-09-07 -[v0.4.3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.2...v0.4.3 +[v0.4.3]: https://github.com/zellerlab/GECCO/compare/v0.4.2...v0.4.3 ### Fixed - GenBank files being written with invalid `/cds` feature type. ### Changed @@ -333,18 +353,18 @@ and breaks the current code. ## [v0.4.2] - 2020-08-07 -[v0.4.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.1...v0.4.2 +[v0.4.2]: https://github.com/zellerlab/GECCO/compare/v0.4.1...v0.4.2 ### Fixed - `TypeClassifier.predict_types` using inverse type probabilities when given several clusters to process. ## [v0.4.1] - 2020-08-07 -[v0.4.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.0...v0.4.1 +[v0.4.1]: https://github.com/zellerlab/GECCO/compare/v0.4.0...v0.4.1 ### Fixed - `gecco run` command crashing on input sequences not containing any genes. ## [v0.4.0] - 2020-08-06 -[v0.4.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.3.0...v0.4.0 +[v0.4.0]: https://github.com/zellerlab/GECCO/compare/v0.3.0...v0.4.0 ### Added - `gecco.model.ProductType` enum to model the biosynthetic class of a BGC. ### Removed @@ -356,7 +376,7 @@ table to know the types of the input BGCs. ## [v0.3.0] - 2020-08-03 -[v0.3.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.2.2...v0.3.0 +[v0.3.0]: https://github.com/zellerlab/GECCO/compare/v0.2.2...v0.3.0 ### Changed - Replaced Nearest-Neighbours classifier with Random Forest to perform type prediction for candidate BGCs. @@ -367,7 +387,7 @@ - `--metric` argument to the `gecco run` CLI command. 
## [v0.2.2] - 2020-07-31 -[v0.2.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.2.1...v0.2.2 +[v0.2.2]: https://github.com/zellerlab/GECCO/compare/v0.2.1...v0.2.2 ### Changed - `Domain` and `Gene` can now carry qualifiers that are used when they are translated to a sequence feature. @@ -376,7 +396,7 @@ in GenBank output files. ## [v0.2.1] - 2020-07-23 -[v0.2.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.2.0...v0.2.1 +[v0.2.1]: https://github.com/zellerlab/GECCO/compare/v0.2.0...v0.2.1 ### Fixed - Various potential crashes in `ClusterRefiner` code. ### Removed @@ -384,7 +404,7 @@ Fisher Exact Test feature selection. ## [v0.2.0] - 2020-07-23 -[v0.2.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.1.1...v0.2.0 +[v0.2.0]: https://github.com/zellerlab/GECCO/compare/v0.1.1...v0.2.0 ### Fixed - `pandas` warning about unsorted columns in `gecco run`. ### Removed @@ -397,7 +417,7 @@ contain any domain annotation. ## [v0.1.1] - 2020-07-22 -[v0.1.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.1.0...v0.1.1 +[v0.1.1]: https://github.com/zellerlab/GECCO/compare/v0.1.0...v0.1.1 ### Added - `ClusterCRF.predict_probabilities` to annotate a list of `Gene`. ### Changed @@ -410,9 +430,9 @@ - Included the `CHANGELOG.md` file to the generated docs. ## [v0.1.0] - 2020-07-17 -[v0.1.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.0.1...v0.1.0 +[v0.1.0]: https://github.com/zellerlab/GECCO/compare/v0.0.1...v0.1.0 Initial release. ## [v0.0.1] - 2018-08-13 -[v0.0.1]: https://git.embl.de/grp-zeller/GECCO/compare/37afb97...v0.0.1 +[v0.0.1]: https://github.com/zellerlab/GECCO/compare/37afb97...v0.0.1 Proof-of-concept.
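The v0.9.6 entry in the changelog diff above renames the `bgc_id` column of cluster tables to `cluster_id`. For downstream scripts that may meet tables written by either version, a minimal sketch follows. It assumes a tab-separated table with a header row; the helper name `read_cluster_ids` and the sample rows are hypothetical and not part of GECCO.

```python
import csv
import io


def read_cluster_ids(handle):
    """Return the cluster identifiers from a tab-separated cluster table.

    Hypothetical helper: accepts the newer `cluster_id` header as well as
    the older `bgc_id` header, preferring the newer name when both exist.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    fields = reader.fieldnames or []
    for column in ("cluster_id", "bgc_id"):
        if column in fields:
            return [row[column] for row in reader]
    raise KeyError("no `cluster_id` or `bgc_id` column found")


# Illustrative in-memory table; the column layout here is an assumption.
old_style = io.StringIO("sequence_id\tbgc_id\ncontig_1\tcontig_1_cluster_1\n")
print(read_cluster_ids(old_style))
```

Checking the header once, rather than catching a `KeyError` per row, keeps the fallback explicit and fails early on tables that have neither column.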
--- a/gecco.xml Wed Aug 10 12:36:38 2022 +0000
+++ b/gecco.xml Mon Jan 16 18:35:56 2023 +0000
@@ -1,8 +1,17 @@
 <?xml version='1.0' encoding='utf-8'?>
-<tool id="gecco" name="GECCO" version="0.9.1" python_template_version="3.5">
+<tool id="gecco" name="GECCO" version="0.9.6" python_template_version="3.5">
     <description>is a fast and scalable method for identifying putative novel Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data using Conditional Random Fields (CRFs).</description>
+    <creator>
+        <organization name="Zeller Team" url="https://www.embl.org/groups/zeller/"/>
+    </creator>
+    <edam_topics>
+        <edam_topic>topic_0080</edam_topic>
+    </edam_topics>
+    <edam_operations>
+        <edam_operation>operation_0415</edam_operation>
+    </edam_operations>
     <requirements>
-        <requirement type="package" version="0.9.1">gecco</requirement>
+        <requirement type="package" version="0.9.6">gecco</requirement>
     </requirements>
     <version_command>gecco --version</version_command>
     <command detect_errors="aggressive"><![CDATA[
@@ -34,6 +43,9 @@
         #if $antismash_sideload:
             --antismash-sideload
         #end if
+        #unless $pad:
+            --no-pad
+        #end unless
         && mv input_tempfile.genes.tsv '$genes'
         && mv input_tempfile.features.tsv '$features'
@@ -46,6 +58,7 @@
     <inputs>
         <param name="input" type="data" format="genbank,fasta,embl" label="Sequence file in GenBank, EMBL or FASTA format"/>
         <param argument="--mask" type="boolean" checked="false" label="Enable masking of regions with unknown nucleotides when finding ORFs"/>
+        <param argument="--pad" type="boolean" checked="true" label="Enable padding of gene sequences smaller than the CRF window length"/>
         <param argument="--cds" type="integer" min="0" value="" optional="true" label="Minimum number of genes required for a cluster"/>
         <param argument="--threshold" type="float" min="0" max="1" value="" optional="true" label="Probability threshold for cluster detection"/>
         <param argument="--postproc" type="select" label="Post-processing method for gene cluster validation">
@@ -72,10 +85,10 @@
             <output name="features" file="features.tsv"/>
             <output name="genes" file="genes.tsv"/>
             <output name="clusters" file="clusters.tsv"/>
+            <param name="edge_distance" value="10"/>
         </test>
         <test>
             <param name="input" value="BGC0001866.fna"/>
-            <param name="edge_distance" value="0"/>
             <output name="features" file="features.tsv"/>
             <output name="genes" file="genes.tsv"/>
             <output name="clusters" file="clusters.tsv"/>
@@ -86,7 +99,6 @@
         <test>
             <param name="input" value="BGC0001866.fna"/>
             <param name="antismash_sideload" value="True"/>
-            <param name="edge_distance" value="0"/>
             <output name="features" file="features.tsv"/>
             <output name="genes" file="genes.tsv"/>
             <output name="clusters" file="clusters.tsv"/>
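The `#unless $pad:` block added in the `gecco.xml` diff above appends `--no-pad` only when the new `pad` checkbox is unticked; with the default `checked="true"` nothing extra reaches the command line. The sketch below restates that flag logic, together with the neighbouring `#if $antismash_sideload:` block, in plain Python. The function name `optional_flags` is a hypothetical illustration and is neither Galaxy nor GECCO code.

```python
def optional_flags(antismash_sideload=False, pad=True):
    """Return the optional CLI flags implied by the two template blocks.

    Hypothetical mirror of the Cheetah directives in the tool wrapper:
    `#if $antismash_sideload:` adds `--antismash-sideload`, and
    `#unless $pad:` adds `--no-pad`.
    """
    flags = []
    if antismash_sideload:
        flags.append("--antismash-sideload")
    if not pad:  # `#unless` is the negation of `#if`
        flags.append("--no-pad")
    return flags


# Defaults from the wrapper: sideload off, padding on, so no extra flags.
print(optional_flags())
print(optional_flags(antismash_sideload=True, pad=False))
```

Exposing the option as a positive `--pad` checkbox while passing the negative `--no-pad` flag keeps the form label readable, at the cost of an inverted condition in the template.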
--- a/test-data/BGC0001866.1_cluster_1.gbk Wed Aug 10 12:36:38 2022 +0000 +++ b/test-data/BGC0001866.1_cluster_1.gbk Mon Jan 16 18:35:56 2023 +0000 @@ -1,4 +1,4 @@ -LOCUS BGC0001866.1_cluster_1 32633 bp DNA linear UNK 06-APR-2022 +LOCUS BGC0001866.1_cluster_1 32633 bp DNA linear UNK 16-JAN-2023 DEFINITION BGC0001866.1 Byssochlamys spectabilis strain CBS 101075 chromosome Unknown C8Q69scaffold_14, whole genome shotgun sequence. ACCESSION BGC0001866.1_cluster_1 @@ -15,19 +15,19 @@ JOURNAL bioRxiv (2021.05.03.442509) REMARK doi:10.1101/2021.05.03.442509 COMMENT ##GECCO-Data-START## - version :: GECCO v0.9.1 - creation_date :: 2022-04-06T01:08:36.965708 - biosyn_class :: Polyketide - alkaloid_probability :: 0.010000000000000009 - polyketide_probability :: 0.96 - ripp_probability :: 0.0 - saccharide_probability :: 0.0 - terpene_probability :: 0.010000000000000009 - nrp_probability :: 0.14 + version :: GECCO v0.9.6 + creation_date :: 2023-01-16T17:20:45.175113 + cluster_type :: Polyketide + alkaloid_probability :: 0.010 + nrp_probability :: 0.140 + polyketide_probability :: 0.960 + ripp_probability :: 0.000 + saccharide_probability :: 0.000 + terpene_probability :: 0.010 ##GECCO-Data-END## FEATURES Location/Qualifiers CDS complement(1..1143) - /inference="ab initio prediction:Prodigal:2.6" + /inference="ab initio prediction:Pyrodigal:2.0.4" /transl_table=11 /locus_tag="BGC0001866.1_1" /translation="MWIYEVDGHYIEPRRADTFLIWAGERYSAMIRLDKKPMDYSIRVP @@ -37,98 +37,134 @@ QPESFNMVNPPYRDTFLTEFTGAMWVVLRYQVTSPGAWLLHCHFEMHLDNGMAMAILDG VDKWPHVPPEYTQGFHGFREHELPGPAGFWGLVSKILRPESLVWAGGAAVVLLSLFIGG LWRLWQRRMQGTYYVLSQEDERDRFSMDKEAWKSEETKRM*" - misc_feature 1..189 + /function="binding" + /function="catalytic activity" + /colour="129 14 21" + /ApEinfo_fwdcolor="#810e15" + /ApEinfo_revcolor="#810e15" + misc_feature 955..1143 /inference="protein motif" /db_xref="PFAM:PF00394" /db_xref="InterPro:IPR001117" /note="e-value: 2.262067179461254e-08" /note="p-value: 8.178117062405111e-12" - 
/function="Multicopper oxidase" + /function="Multicopper oxidase, type 1" /standard_name="PF00394" - misc_feature 448..843 + misc_feature 301..696 /inference="protein motif" /db_xref="PFAM:PF07731" /db_xref="InterPro:IPR011706" + /db_xref="GO:0005507" + /db_xref="GO:0016491" /note="e-value: 4.059222969454281e-23" /note="p-value: 1.467542649838858e-26" - /function="Multicopper oxidase" + /function="Multicopper oxidase, C-terminal" /standard_name="PF07731" CDS 1179..1670 - /inference="ab initio prediction:Prodigal:2.6" + /inference="ab initio prediction:Pyrodigal:2.0.4" /transl_table=11 /locus_tag="BGC0001866.1_2" /translation="MSSLRSSSHSPSGLPGQPRLPLLDRSREHSLPGDRAGWRTRSRLR ATDLLSMVRMGSTYTIIRDMNYTDDESPGRSPFVCDSVIRPALVHERDLLVNKPLMART IDAPFAVEKNTIDATDFISQSTRNVLISVHWNHTRSAVGCLHLLLYTGSSCSSPSQKAS *" + /function="unknown" + /colour="128 128 128" + /ApEinfo_fwdcolor="#808080" + /ApEinfo_revcolor="#808080" CDS complement(2167..2376) - /inference="ab initio prediction:Prodigal:2.6" + /inference="ab initio prediction:Pyrodigal:2.0.4" /transl_table=11 /locus_tag="BGC0001866.1_3" /translation="MPAYLLLLACNVLLVLGAHVQRELVLTWEEGAPNGQSRQMIKTNG QFPSPTLIFDEGDDVEVGGISFAN*" + /function="unknown" + /colour="128 128 128" + /ApEinfo_fwdcolor="#808080" + /ApEinfo_revcolor="#808080" CDS 2559..3032 - /inference="ab initio prediction:Prodigal:2.6" + /inference="ab initio prediction:Pyrodigal:2.0.4" /transl_table=11 /locus_tag="BGC0001866.1_4" /translation="MLFNSEVGVEEHVVLWSFQETTSITMAEEIKLTPLETFAQAISAS AKTIATYCRDSGHPQLSDDNSSGLTGDVLPPSAPQAVTAARQTILEASYRLQQLVTEPS QYLPRLTVYVSVEQSPMKDQTNDRKAPAPGCLTLAVPFQNPGAHPRARHQDIL*" + /function="unknown" + /colour="128 128 128" + /ApEinfo_fwdcolor="#808080" + /ApEinfo_revcolor="#808080" CDS 3007..3576 - /inference="ab initio prediction:Prodigal:2.6" + /inference="ab initio prediction:Pyrodigal:2.0.4" /transl_table=11 /locus_tag="BGC0001866.1_5" /translation="MQGTRTYYELATEAKVPLHQLQSIARMAITGSFLREPEPNIVAHS RTSAHFVENPSLRDWTLFLAEDTAPMAMKLVEATEKWGDTRSKTETAFNLALGTDLAFF 
KYLSSNPQFTQKFSGYMKNVTASEGTSIKHLVNGFDWASLGNAIVVDVRLQSSFTPYRS HTDVIFYRLAVLLVMQALLSRNRSPI*" + /function="unknown" + /colour="128 128 128" + /ApEinfo_fwdcolor="#808080" + /ApEinfo_revcolor="#808080" CDS 3600..4043 - /inference="ab initio prediction:Prodigal:2.6" + /inference="ab initio prediction:Pyrodigal:2.0.4" /transl_table=11 /locus_tag="BGC0001866.1_6" /translation="MVTSTSKDNREKTPLPETVASRISFESHDFFKPQPVQNADVYLLR MILHDWSFKEAGEILANLVPSVKQGARILIMDTVLPRHGTVPVTEEALLRVRDMTMMET FNSHEREIDEWKDLIQGVHTGLRVQQVIQPAGSSMAIIEVVRG*" + /function="catalytic activity" + /colour="129 14 21" + /ApEinfo_fwdcolor="#810e15" + /ApEinfo_revcolor="#810e15" misc_feature 3648..3962 /inference="protein motif" /db_xref="PFAM:PF00891" /db_xref="InterPro:IPR001077" + /db_xref="GO:0008171" /note="e-value: 4.890642309934635e-16" /note="p-value: 1.7681280946979883e-19" /function="O-methyltransferase domain" /standard_name="PF00891" CDS 4337..4792 - /inference="ab initio prediction:Prodigal:2.6" + /inference="ab initio prediction:Pyrodigal:2.0.4" /transl_table=11 /locus_tag="BGC0001866.1_7" /translation="MTQIVFGIAPTLLKTFSHLTALDLWRPSAPYVFDPVTSSTYLGTI ADGVEEFLGIFYGQDTGGSNRFAPPKPYIPSRHSFINASTAGAACPQPYVPLPADPYTV LTNVSEDCLSLRIARPENTKSTAKLPVMVWLYGGAYNRLPTDLQWET*" + /function="unknown" + /colour="128 128 128" + /ApEinfo_fwdcolor="#808080" + /ApEinfo_revcolor="#808080" misc_feature 4478..4756 /inference="protein motif" /db_xref="PFAM:PF00135" /db_xref="InterPro:IPR002018" /note="e-value: 4.819217021121008e-21" /note="p-value: 1.7423055029360116e-24" - /function="Carboxylesterase family" + /function="Carboxylesterase, type B" /standard_name="PF00135" CDS 5038..5466 - /inference="ab initio prediction:Prodigal:2.6" + /inference="ab initio prediction:Pyrodigal:2.0.4" /transl_table=11 /locus_tag="BGC0001866.1_8" /translation="MQDQRLGIEWIKNHISAFGGDPDNITLFGEDEGATYIALHILSNH EVPFHRAILQSGAAITHHDVNGNRSARNFAAVAARCNCLSDGDRQVDSQDTVDCLRRVP MEDLVNATFEVAHSVDPVNGFRALYVLLHFPSHKCKQD*" + /function="unknown" + /colour="128 128 128" 
+ /ApEinfo_fwdcolor="#808080" + /ApEinfo_revcolor="#808080" misc_feature 5041..5379 /inference="protein motif" /db_xref="PFAM:PF00135" /db_xref="InterPro:IPR002018" /note="e-value: 4.0935350990176556e-30" /note="p-value: 1.4799476135277136e-33" - /function="Carboxylesterase family" + /function="Carboxylesterase, type B" /standard_name="PF00135" CDS 5477..6253 - /inference="ab initio prediction:Prodigal:2.6" + /inference="ab initio prediction:Pyrodigal:2.0.4" /transl_table=11 /locus_tag="BGC0001866.1_9" /translation="MPAVDGYMIPDEPSNLLSRGQVPANISILAGWTRDESSMSVPTSI @@ -136,16 +172,20 @@ LTLTCPTIFQAWSLRLSSNCTTPVYLYELRQSPFATALNNSGVGYLGIVHFSDVPYVFN ELERTYYITDPEENKLAQRMSASWTAFASGAFPLCERSERSLGRWEEAYGGDRVCRDRM PEHVRVKGIGDNGDQDDGDEIGKLMARCGFINRLEY*" + /function="unknown" + /colour="128 128 128" + /ApEinfo_fwdcolor="#808080" + /ApEinfo_revcolor="#808080" misc_feature 5480..6103 /inference="protein motif" /db_xref="PFAM:PF00135" /db_xref="InterPro:IPR002018" /note="e-value: 1.4624647008379705e-15" /note="p-value: 5.287291037013632e-19" - /function="Carboxylesterase family" + /function="Carboxylesterase, type B" /standard_name="PF00135" CDS 7412..8683 - /inference="ab initio prediction:Prodigal:2.6" + /inference="ab initio prediction:Pyrodigal:2.0.4" /transl_table=11 /locus_tag="BGC0001866.1_10" /translation="MTGARFDESDHKWTVEGINGSHGTIRIRCRWYILALGFASKPYIP @@ -156,39 +196,56 @@ FDTNTGALTSIHIQDTDGILLKDRWSYDGVMTTFGMSTSKFPNMFFFYGPQAPTAFSNG PSCIELQGEFVEELILDMIGKGVTRVDTTSEAEKRWKESTLSLWNQFVFSSTKGFYTGE NIPGKKAEPLNWYVLVLGLGVSKR*" + /function="binding" + /function="catalytic activity" + /colour="129 14 21" + /ApEinfo_fwdcolor="#810e15" + /ApEinfo_revcolor="#810e15" misc_feature 7448..7783 /inference="protein motif" /db_xref="PFAM:PF13434" /db_xref="InterPro:IPR025700" /note="e-value: 5.955898730893757e-08" /note="p-value: 2.153253337271785e-11" - /function="L-lysine 6-monooxygenase (NADPH-requiring)" + /function="L-lysine 6-monooxygenase/L-ornithine + 5-monooxygenase" 
/standard_name="PF13434" misc_feature 7517..7717 /inference="protein motif" /db_xref="PFAM:PF00743" /db_xref="InterPro:IPR020946" + /db_xref="GO:0004499" + /db_xref="GO:0050660" + /db_xref="GO:0050661" /note="e-value: 5.246542281818287e-07" /note="p-value: 1.8967976434628658e-10" - /function="Flavin-binding monooxygenase-like" + /function="Flavin monooxygenase-like" /standard_name="PF00743" CDS 9454..10038 - /inference="ab initio prediction:Prodigal:2.6" + /inference="ab initio prediction:Pyrodigal:2.0.4" /transl_table=11 /locus_tag="BGC0001866.1_11" /translation="MCRGRLTRTVDERGIVSTESAHAAQRHHLASHVLDARFAGSIARL GSLCLFLALLVAFVQELQKSESHHQRSGGVGLEDRRVVREGLVKPVVTHFGYVPFRRRS CGMGSQVRCGDSSVIHQEVDIPILGGDVVDDALKVSMRGNAALDRVDVAMGLSQIVSTI VIALWTWFVLNQPTAWLLLVRARAVARVCRL*" + /function="unknown" + /colour="128 128 128" + /ApEinfo_fwdcolor="#808080" + /ApEinfo_revcolor="#808080" CDS 10763..11191 - /inference="ab initio prediction:Prodigal:2.6" + /inference="ab initio prediction:Pyrodigal:2.0.4" /transl_table=11 /locus_tag="BGC0001866.1_12" /translation="MRAGQLVPLVSTPTPSCLALQIVFCCCSTFLSDPLVLQNHRKMAD EQKTPLESGQQPAVAQHTSTAELQTEKPGQMNGNGTADKPGPPGGKPFGPGMGPPIQYP TGFKLYSIMTGLYLASFLTALVGWRSITDLTDSETYIG*" + /function="unknown" + /colour="128 128 128" + /ApEinfo_fwdcolor="#808080" + /ApEinfo_revcolor="#808080" CDS 11204..12316 - /inference="ab initio prediction:Prodigal:2.6" + /inference="ab initio prediction:Pyrodigal:2.0.4" /transl_table=11 /locus_tag="BGC0001866.1_13" /translation="MLVVAIPQITDHFNSIDDIGWYGSAYLLTFCAFQLLFGKIYSFYN @@ -198,31 +255,43 @@ FVGVQLWLQDKGTIPPRVMKQRSIAAGMAFTICVTAGFMSFNYYLPIWFQAIKNASSFH SGVMMLPTVISSGVASLACGFIIHRVGYYTPFMIGGSVLMAIGAGLLTTFTPTTEHPKW IGYQVLWALGCGMSTFQPPFFARCIFVGGY*" + /function="transporter activity" + /colour="100 149 237" + /ApEinfo_fwdcolor="#6495ed" + /ApEinfo_revcolor="#6495ed" misc_feature 11204..12289 /inference="protein motif" /db_xref="PFAM:PF07690" /db_xref="InterPro:IPR011701" + /db_xref="GO:0022857" + /db_xref="GO:0055085" /note="e-value: 
6.020530714201243e-37" /note="p-value: 2.1766199255969786e-40" - /function="Major Facilitator Superfamily" + /function="Major facilitator superfamily" /standard_name="PF07690" misc_feature 11252..11935 /inference="protein motif" /db_xref="PFAM:PF06609" /db_xref="InterPro:IPR010573" + /db_xref="GO:0022857" + /db_xref="GO:0055085" /note="e-value: 9.83839354265682e-09" /note="p-value: 3.55690294383833e-12" - /function="Fungal trichothecene efflux pump (TRI12)" + /function="Major facilitator transporter Str1/Tri12-like" /standard_name="PF06609" CDS 12335..12781 - /inference="ab initio prediction:Prodigal:2.6" + /inference="ab initio prediction:Pyrodigal:2.0.4" /transl_table=11 /locus_tag="BGC0001866.1_14" /translation="MQQASLAAQTVLPKPDAPIGISLIFFSQSLGGSVFLAVDDSIYSN RLAAKLGSIPNLPQSALTNTGATNIRNLVAPQYLGRLLGGYNDALMDVFRVAVASSCAC VVAAAFMEWKNVRAAKAAGPGGPGGPGGPGGPGGPEGLRGGNKV*" + /function="unknown" + /colour="128 128 128" + /ApEinfo_fwdcolor="#808080" + /ApEinfo_revcolor="#808080" CDS 14574..15566 - /inference="ab initio prediction:Prodigal:2.6" + /inference="ab initio prediction:Pyrodigal:2.0.4" /transl_table=11 /locus_tag="BGC0001866.1_15" /translation="MTFEEMLSRPSPPPFAGPSHNSNRPTNMASTNQDQYYHDKGKHGE @@ -231,16 +300,24 @@ ENGSHPTTDQVLKANSDAMKDAADLLACPCAKDFCFPIILGITACRVLAWYQVVIDMYD PEIPMATMPTAREDIKHCPIAFGAYQLDEEVSQAMTSQFVLRNLRAMTRFVKTYVENFC SDINKNRPGSCSLIYRSLGTFMQTRLGNTIEQLEDRLAAFDGEYTKNIG*" + /function="binding" + /colour="128 128 128" + /ApEinfo_fwdcolor="#808080" + /ApEinfo_revcolor="#808080" misc_feature 14988..15245 /inference="protein motif" /db_xref="PFAM:PF08493" /db_xref="InterPro:IPR013700" + /db_xref="GO:0003677" + /db_xref="GO:0005634" + /db_xref="GO:0006355" + /db_xref="GO:0045122" /note="e-value: 2.686865976406516e-17" /note="p-value: 9.713904470016327e-21" /function="Aflatoxin regulatory protein" /standard_name="PF08493" CDS 16827..18797 - /inference="ab initio prediction:Prodigal:2.6" + /inference="ab initio prediction:Pyrodigal:2.0.4" /transl_table=11 
/locus_tag="BGC0001866.1_16" /translation="MAICGIAVRLPGGISNDAQLWDFLLAKRDARSQVPGSRYNISGYH @@ -255,13 +332,17 @@ GAQWPGMGVELFKSNATFRRSILEMDSVLQSLPDAPAWSIADEISKEHQTSMLYLSSYS QPICTALQVALVNTLFELNIRPYAVIGHSSGELAAAYAAGRLTASQAVTLAYYRGIVAG KVAQAGCMAAVGMGASEIIHF*" + /function="unknown" + /colour="128 128 128" + /ApEinfo_fwdcolor="#808080" + /ApEinfo_revcolor="#808080" misc_feature 16830..17570 /inference="protein motif" /db_xref="PFAM:PF00109" /db_xref="InterPro:IPR014030" /note="e-value: 9.30510909096118e-60" /note="p-value: 3.364103069761815e-63" - /function="Beta-ketoacyl synthase, N-terminal domain" + /function="Beta-ketoacyl synthase, N-terminal" /standard_name="PF00109" misc_feature 17595..17930 /inference="protein motif" @@ -269,7 +350,7 @@ /db_xref="InterPro:IPR014031" /note="e-value: 2.2857331200304854e-35" /note="p-value: 8.263677223537547e-39" - /function="Beta-ketoacyl synthase, C-terminal domain" + /function="Beta-ketoacyl synthase, C-terminal" /standard_name="PF02801" misc_feature 17937..18290 /inference="protein motif" @@ -277,7 +358,7 @@ /db_xref="InterPro:IPR032821" /note="e-value: 4.800730099641783e-25" /note="p-value: 1.7356218726109122e-28" - /function="Ketoacyl-synthetase C-terminal extension" + /function="Polyketide synthase, C-terminal extension" /standard_name="PF16197" misc_feature 18360..18770 /inference="protein motif" @@ -285,10 +366,10 @@ /db_xref="InterPro:IPR014043" /note="e-value: 1.113401436161595e-26" /note="p-value: 4.025312495161225e-30" - /function="Acyl transferase domain" + /function="Acyl transferase" /standard_name="PF00698" CDS 18806..22078 - /inference="ab initio prediction:Prodigal:2.6" + /inference="ab initio prediction:Pyrodigal:2.0.4" /transl_table=11 /locus_tag="BGC0001866.1_17" /translation="MVVACENSPSSVTISGDIDQVQYVMQEISLAHPEILCRQIKSDTA @@ -310,13 +391,17 @@ WVTRSIQIDCRDPRYSPTLGVARTVRSEFGLDFGTCEVDTLKYTSIGLVIDVFEAFHGR RHGQNAYPEYEYAIREDTVHIGRLSSFSVQEELRRIQKAHVETKDNRISLVAGTSGFDS LAWQADAGQQVQLLGDDEVELQVDTAGVNFLVRCSFQFQGES*" + 
/function="catalytic activity" + /colour="129 14 21" + /ApEinfo_fwdcolor="#810e15" + /ApEinfo_revcolor="#810e15" misc_feature 18809..19258 /inference="protein motif" /db_xref="PFAM:PF00698" /db_xref="InterPro:IPR014043" /note="e-value: 2.7208690154402465e-16" /note="p-value: 9.836836642950999e-20" - /function="Acyl transferase domain" + /function="Acyl transferase" /standard_name="PF00698" misc_feature 19487..20317 /inference="protein motif" @@ -324,14 +409,13 @@ /db_xref="InterPro:IPR020807" /note="e-value: 2.598574865139864e-60" /note="p-value: 9.394703055458656e-64" - /function="Polyketide synthase dehydratase" + /function="Polyketide synthase, dehydratase domain" /standard_name="PF14765" misc_feature 20786..21256 /inference="protein motif" /db_xref="PFAM:PF13489" /note="e-value: 1.04446701072283e-12" /note="p-value: 3.776091868123029e-16" - /function="Methyltransferase domain" /standard_name="PF13489" misc_feature 20801..21133 /inference="protein motif" @@ -347,7 +431,7 @@ /db_xref="InterPro:IPR041698" /note="e-value: 2.4253465299984994e-13" /note="p-value: 8.76842563267715e-17" - /function="Methyltransferase domain" + /function="Methyltransferase domain 25" /standard_name="PF13649" misc_feature 20807..21103 /inference="protein motif" @@ -355,33 +439,38 @@ /db_xref="InterPro:IPR013217" /note="e-value: 3.7410690716593694e-22" /note="p-value: 1.3525195486837923e-25" - /function="Methyltransferase domain" + /function="Methyltransferase type 12" /standard_name="PF08242" misc_feature 20807..21106 /inference="protein motif" /db_xref="PFAM:PF08241" /db_xref="InterPro:IPR013216" + /db_xref="GO:0008168" /note="e-value: 5.4075572021556884e-12" /note="p-value: 1.9550098344742185e-15" - /function="Methyltransferase domain" + /function="Methyltransferase type 11" /standard_name="PF08241" CDS 22416..22889 - /inference="ab initio prediction:Prodigal:2.6" + /inference="ab initio prediction:Pyrodigal:2.0.4" /transl_table=11 /locus_tag="BGC0001866.1_18" 
/translation="MQTVLINSASDGVGLAAIQISKMIGATIYATVIGEDKVEYLTASH GIPRDHIFNSRDSSFLDGIMRVTNGRGVDLVLTSLSADFIQASCDCVANFGKLVNLSKP TAANQGQFPIDSFHPNMSYASVDIIDYIKRRPKESKRYVITFRHSYQLCPACN*" + /function="unknown" + /colour="128 128 128" + /ApEinfo_fwdcolor="#808080" + /ApEinfo_revcolor="#808080" misc_feature 22449..22766 /inference="protein motif" /db_xref="PFAM:PF00107" /db_xref="InterPro:IPR013149" /note="e-value: 1.1299405916297285e-15" /note="p-value: 4.085106983476965e-19" - /function="Zinc-binding dehydrogenase" + /function="Alcohol dehydrogenase-like, C-terminal" /standard_name="PF00107" CDS 22922..24277 - /inference="ab initio prediction:Prodigal:2.6" + /inference="ab initio prediction:Pyrodigal:2.0.4" /transl_table=11 /locus_tag="BGC0001866.1_19" /translation="MELYKQGHIQPITPVKTFTATDIRQCFDYMQSGQHIGQLRLSLKS @@ -392,13 +481,17 @@ QTLDYSRYENPAQFITGLRDTTGMLDSTGGKSMLLDSRLAAYVGNSAAVTAPTETKTSA NKLNNFVSSAATDSAILSEPSATQFVSLEIARWVFDLLMKPVDDDSEIDLSRSLVDVGL DSLAAVEMRSWLKSSLGLDISVLEIMASPSLAAMGEHVIRELVRKFGGDNKN*" + /function="unknown" + /colour="128 128 128" + /ApEinfo_fwdcolor="#808080" + /ApEinfo_revcolor="#808080" misc_feature 23114..23638 /inference="protein motif" /db_xref="PFAM:PF08659" /db_xref="InterPro:IPR013968" /note="e-value: 1.5610077818520667e-61" /note="p-value: 5.643556695054471e-65" - /function="KR domain" + /function="Polyketide synthase, ketoreductase domain" /standard_name="PF08659" misc_feature 23123..23584 /inference="protein motif" @@ -406,7 +499,7 @@ /db_xref="InterPro:IPR002347" /note="e-value: 1.1731018314976082e-07" /note="p-value: 4.2411490654288077e-11" - /function="short chain dehydrogenase" + /function="Short-chain dehydrogenase/reductase SDR" /standard_name="PF00106" misc_feature 24071..24232 /inference="protein motif" @@ -414,25 +507,28 @@ /db_xref="InterPro:IPR009081" /note="e-value: 3.463550267794435e-10" /note="p-value: 1.2521873708584363e-13" - /function="Phosphopantetheine attachment site" + /function="Phosphopantetheine binding ACP domain" 
/standard_name="PF00550" CDS 25423..25710 - /inference="ab initio prediction:Prodigal:2.6" + /inference="ab initio prediction:Pyrodigal:2.0.4" /transl_table=11 /locus_tag="BGC0001866.1_20" /translation="MAQKLRFYLFGDQTYDYDEQLRALLTSHDPVVRSFLERAYYTLRA EVARIPNGYQARISRFSSIAELLSQRREHGVDASLEQALTVVYQLASFMR*" + /function="unknown" + /colour="128 128 128" + /ApEinfo_fwdcolor="#808080" + /ApEinfo_revcolor="#808080" misc_feature 25444..25704 /inference="protein motif" /db_xref="PFAM:PF16073" /db_xref="InterPro:IPR032088" /note="e-value: 9.422238725791962e-24" /note="p-value: 3.406449286258844e-27" - /function="Starter unit:ACP transacylase in aflatoxin - biosynthesis" + /function="Starter unit:ACP transacylase" /standard_name="PF16073" CDS 26198..29653 - /inference="ab initio prediction:Prodigal:2.6" + /inference="ab initio prediction:Pyrodigal:2.0.4" /transl_table=11 /locus_tag="BGC0001866.1_21" /translation="MSRPYISAYASGGVTISGPPSVLAELRNTPGLSKLRAKDIPIHAP @@ -455,14 +551,17 @@ GMVKSTLGNSIKALPTLQRNRNTWEVLTESVSTLYCMGFDINWTEYHRDFPSSQRVLRL PSYSWDLKSYWIPYRNDWTLYKGDIVPESSIALPTHQNKPHSTSPKQQAPTPILETTTL HRIVDEKSTEGTFSITCESDVSRPDLSPLVQGHKVEGIGLCTPV*" + /function="unknown" + /colour="128 128 128" + /ApEinfo_fwdcolor="#808080" + /ApEinfo_revcolor="#808080" misc_feature 26201..26338 /inference="protein motif" /db_xref="PFAM:PF16073" /db_xref="InterPro:IPR032088" /note="e-value: 4.380197593141013e-11" /note="p-value: 1.5835855362042708e-14" - /function="Starter unit:ACP transacylase in aflatoxin - biosynthesis" + /function="Starter unit:ACP transacylase" /standard_name="PF16073" misc_feature 26729..27475 /inference="protein motif" @@ -470,7 +569,7 @@ /db_xref="InterPro:IPR014030" /note="e-value: 2.7499815692371726e-82" /note="p-value: 9.942088102809735e-86" - /function="Beta-ketoacyl synthase, N-terminal domain" + /function="Beta-ketoacyl synthase, N-terminal" /standard_name="PF00109" misc_feature 27497..27862 /inference="protein motif" @@ -478,7 +577,7 @@ /db_xref="InterPro:IPR014031" 
/note="e-value: 2.4774456171918303e-34" /note="p-value: 8.956780973217029e-38" - /function="Beta-ketoacyl synthase, C-terminal domain" + /function="Beta-ketoacyl synthase, C-terminal" /standard_name="PF02801" misc_feature 27896..28216 /inference="protein motif" @@ -486,7 +585,7 @@ /db_xref="InterPro:IPR032821" /note="e-value: 8.475099126640419e-07" /note="p-value: 3.0640271607521397e-10" - /function="Ketoacyl-synthetase C-terminal extension" + /function="Polyketide synthase, C-terminal extension" /standard_name="PF16197" misc_feature 28322..29233 /inference="protein motif" @@ -494,10 +593,10 @@ /db_xref="InterPro:IPR014043" /note="e-value: 4.739349423268586e-38" /note="p-value: 1.7134307387088164e-41" - /function="Acyl transferase domain" + /function="Acyl transferase" /standard_name="PF00698" CDS 29804..30544 - /inference="ab initio prediction:Prodigal:2.6" + /inference="ab initio prediction:Pyrodigal:2.0.4" /transl_table=11 /locus_tag="BGC0001866.1_22" /translation="MVIEKALMPLNAGPQLLRVTASLIWSEKEASVRFYSVDVRRPSSK @@ -505,16 +604,20 @@ YRFNGPMAYNMVQALAEFHPDYRCIDETILDNETLEAACTVSFGNVKKEGVFHTHPGYI DGLTQSGGFVMNANDKTNLGVEVFVNHGWDSFQLYEPVTDDRSYQTHVRMRPAESNQWK GDVVVLSGENLVACVRGLTVSRET*" + /function="unknown" + /colour="128 128 128" + /ApEinfo_fwdcolor="#808080" + /ApEinfo_revcolor="#808080" misc_feature 29918..30535 /inference="protein motif" /db_xref="PFAM:PF14765" /db_xref="InterPro:IPR020807" /note="e-value: 8.019334685871699e-11" /note="p-value: 2.8992533209948296e-14" - /function="Polyketide synthase dehydratase" + /function="Polyketide synthase, dehydratase domain" /standard_name="PF14765" CDS 30591..32633 - /inference="ab initio prediction:Prodigal:2.6" + /inference="ab initio prediction:Pyrodigal:2.0.4" /transl_table=11 /locus_tag="BGC0001866.1_23" /translation="MLTTFQIQGVPRRVLRYILQSSAKTTQTATSSVPAPSQAPVMVPQ @@ -529,13 +632,17 @@ LAYRAAQILQKAAANPQKPVVESLLLLDSPPPTGLGKLPKHFFDYCDQIGIFGQGTAKA PEWLITHFQGTNSVLHEYHATPFSFGTAPRTGIIWASQTVFETRAVAPPPVRPDDTEDM 
KFLTERRTDFSAGSWGHMFPGTEVLIETAYGADHFSLLVSLLFRD*" + /function="unknown" + /colour="128 128 128" + /ApEinfo_fwdcolor="#808080" + /ApEinfo_revcolor="#808080" misc_feature 30789..30974 /inference="protein motif" /db_xref="PFAM:PF00550" /db_xref="InterPro:IPR009081" /note="e-value: 6.066413293337807e-14" /note="p-value: 2.193207987468477e-17" - /function="Phosphopantetheine attachment site" + /function="Phosphopantetheine binding ACP domain" /standard_name="PF00550" misc_feature 31110..31304 /inference="protein motif" @@ -543,7 +650,7 @@ /db_xref="InterPro:IPR009081" /note="e-value: 4.042537132792419e-10" /note="p-value: 1.461510170930014e-13" - /function="Phosphopantetheine attachment site" + /function="Phosphopantetheine binding ACP domain" /standard_name="PF00550" misc_feature 31485..31670 /inference="protein motif" @@ -551,15 +658,16 @@ /db_xref="InterPro:IPR009081" /note="e-value: 1.4101442109719659e-08" /note="p-value: 5.098135252971677e-12" - /function="Phosphopantetheine attachment site" + /function="Phosphopantetheine binding ACP domain" /standard_name="PF00550" misc_feature 31917..32240 /inference="protein motif" /db_xref="PFAM:PF00975" /db_xref="InterPro:IPR001031" + /db_xref="GO:0009058" /note="e-value: 6.91897478936856e-24" /note="p-value: 2.5014370171252933e-27" - /function="Thioesterase domain" + /function="Thioesterase" /standard_name="PF00975" ORIGIN 1 ttacatccgc ttagtctcct cggacttcca tgcttccttg tccattgaga aacgatccct
--- a/test-data/clusters.tsv Wed Aug 10 12:36:38 2022 +0000
+++ b/test-data/clusters.tsv Mon Jan 16 18:35:56 2023 +0000
@@ -1,2 +1,2 @@
-sequence_id bgc_id start end average_p max_p type alkaloid_probability polyketide_probability ripp_probability saccharide_probability terpene_probability nrp_probability proteins domains
-BGC0001866.1 BGC0001866.1_cluster_1 347 32979 0.9958958770931704 0.9999999976946022 Polyketide 0.010000000000000009 0.96 0.0 0.0 0.010000000000000009 0.14 BGC0001866.1_1;BGC0001866.1_2;BGC0001866.1_3;BGC0001866.1_4;BGC0001866.1_5;BGC0001866.1_6;BGC0001866.1_7;BGC0001866.1_8;BGC0001866.1_9;BGC0001866.1_10;BGC0001866.1_11;BGC0001866.1_12;BGC0001866.1_13;BGC0001866.1_14;BGC0001866.1_15;BGC0001866.1_16;BGC0001866.1_17;BGC0001866.1_18;BGC0001866.1_19;BGC0001866.1_20;BGC0001866.1_21;BGC0001866.1_22;BGC0001866.1_23 PF00106;PF00107;PF00109;PF00135;PF00394;PF00550;PF00698;PF00743;PF00891;PF00975;PF02801;PF06609;PF07690;PF07731;PF08241;PF08242;PF08493;PF08659;PF13434;PF13489;PF13649;PF13847;PF14765;PF16073;PF16197
+sequence_id cluster_id start end average_p max_p type alkaloid_probability nrp_probability polyketide_probability ripp_probability saccharide_probability terpene_probability proteins domains
+BGC0001866.1 BGC0001866.1_cluster_1 347 32979 0.9958958770931705 0.9999999976946022 Polyketide 0.010000000000000009 0.14 0.96 0.0 0.0 0.010000000000000009 BGC0001866.1_1;BGC0001866.1_2;BGC0001866.1_3;BGC0001866.1_4;BGC0001866.1_5;BGC0001866.1_6;BGC0001866.1_7;BGC0001866.1_8;BGC0001866.1_9;BGC0001866.1_10;BGC0001866.1_11;BGC0001866.1_12;BGC0001866.1_13;BGC0001866.1_14;BGC0001866.1_15;BGC0001866.1_16;BGC0001866.1_17;BGC0001866.1_18;BGC0001866.1_19;BGC0001866.1_20;BGC0001866.1_21;BGC0001866.1_22;BGC0001866.1_23 PF00106;PF00107;PF00109;PF00135;PF00394;PF00550;PF00698;PF00743;PF00891;PF00975;PF02801;PF06609;PF07690;PF07731;PF08241;PF08242;PF08493;PF08659;PF13434;PF13489;PF13649;PF13847;PF14765;PF16073;PF16197
--- a/test-data/features.tsv Wed Aug 10 12:36:38 2022 +0000
+++ b/test-data/features.tsv Mon Jan 16 18:35:56 2023 +0000
@@ -1,4 +1,4 @@
-sequence_id protein_id start end strand domain hmm i_evalue pvalue domain_start domain_end bgc_probability
+sequence_id protein_id start end strand domain hmm i_evalue pvalue domain_start domain_end cluster_probability
 BGC0001866.1 BGC0001866.1_1 347 1489 - PF00394 Pfam 2.262067179461254e-08 8.178117062405111e-12 1 63 0.9791890143072265
 BGC0001866.1 BGC0001866.1_1 347 1489 - PF07731 Pfam 4.059222969454281e-23 1.467542649838858e-26 150 281 0.9791890143072265
 BGC0001866.1 BGC0001866.1_6 3946 4389 + PF00891 Pfam 4.890642309934635e-16 1.7681280946979883e-19 17 121 0.9955095513800687
--- a/test-data/sideload.json Wed Aug 10 12:36:38 2022 +0000
+++ b/test-data/sideload.json Mon Jan 16 18:35:56 2023 +0000
@@ -27,11 +27,13 @@
  "e-filter": "None",
  "edge-distance": "0",
  "mask": "False",
+ "no-pad": "False",
+ "p-filter": "1e-09",
  "postproc": "'gecco'",
  "threshold": "0.8"
  },
  "description": "Biosynthetic Gene Cluster prediction with Conditional Random Fields.",
  "name": "GECCO",
- "version": "0.9.1"
+ "version": "0.9.6"
  }
  }
\ No newline at end of file