changeset 19:cc91d730cc4f draft

Fix syntax of Galaxy script for GECCO
author althonos
date Mon, 16 Jan 2023 18:35:56 +0000
parents 3dd71eaa2909
children 64b724dd8d04
files CHANGELOG.md gecco.xml test-data/BGC0001866.1_cluster_1.gbk test-data/clusters.tsv test-data/features.tsv test-data/sideload.json
diffstat 6 files changed, 267 insertions(+), 125 deletions(-)
--- a/CHANGELOG.md	Wed Aug 10 12:36:38 2022 +0000
+++ b/CHANGELOG.md	Mon Jan 16 18:35:56 2023 +0000
@@ -5,11 +5,31 @@
 and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
 
 ## [Unreleased]
-[Unreleased]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.5...master
+[Unreleased]: https://github.com/zellerlab/GECCO/compare/v0.9.6...master
+
+
+## [v0.9.6] - 2023-01-11
+[v0.9.6]: https://github.com/zellerlab/GECCO/compare/v0.9.5...v0.9.6
+
+### Added
+- Gene Ontology annotations to `gecco.interpro` local metadata.
+- Reference to Gene Ontology terms and derived functions to `gecco.model.Domain` objects.
+- Gene color based on predicted function in `gecco.model.Gene.to_seq_feature`.
+
+### Fixed
+- Missing `gzip` import in the CLI preventing usage of gzip-compressed inputs.
+- Invalid coordinates of domains found in reverse-strand genes.
+- Detection of entry points with `importlib.metadata` on older Python versions.
+
+### Changed
+- `bgc_id` columns of cluster tables are renamed `cluster_id`.
+- `gecco.model.ProductType` is renamed to `gecco.model.ClusterType`.
+- Bumped `pyrodigal` dependency to `v2.0`.
+- Bumped `pyhmmer` dependency to `v0.7`.
 
 
 ## [v0.9.5] - 2022-08-10
-[v0.9.5]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.4...v0.9.5
+[v0.9.5]: https://github.com/zellerlab/GECCO/compare/v0.9.4...v0.9.5
 
 ### Added
 - `gecco predict` command to predict BGCs from an annotated genome.
@@ -21,7 +41,7 @@
 
 
 ## [v0.9.4] - 2022-05-31
-[v0.9.4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.3...v0.9.4
+[v0.9.4]: https://github.com/zellerlab/GECCO/compare/v0.9.3...v0.9.4
 
 ### Added
 - `classes_` property to `TypeClassifier` to access the `classes_` attribute of the `TypeBinarizer`.
@@ -39,7 +59,7 @@
 
 
 ## [v0.9.3] - 2022-05-13
-[v0.9.3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.2...v0.9.3
+[v0.9.3]: https://github.com/zellerlab/GECCO/compare/v0.9.2...v0.9.3
 
 ### Changed
 - `--format` flag of `gecco annotate` and `gecco run` CLI commands is now made lowercase before giving value to `Bio.SeqIO`.
@@ -49,20 +69,20 @@
 
 
 ## [v0.9.2] - 2022-04-11
-[v0.9.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1...v0.9.2
+[v0.9.2]: https://github.com/zellerlab/GECCO/compare/v0.9.1...v0.9.2
 
 ### Added
 - Padding of short sequences with empty genes when predicting probabilities in `ClusterCRF`.
 
 ## [v0.9.1] - 2022-04-05
-[v0.9.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1-alpha4...v0.9.1
+[v0.9.1]: https://github.com/zellerlab/GECCO/compare/v0.9.1-alpha4...v0.9.1
 
 ### Changed
 - Make the `genes.tsv` and `features.tsv` table contain all genes even when they come from a contig too short to be processed by the CRF sliding window.
 - Replaced the `--force-clusters-tsv` flag with a `--force-tsv` flag to force writing TSV tables even when no genes or clusters were found in `gecco run` or `gecco annotate`.
 
 ## [v0.9.1-alpha4] - 2022-03-31
-[v0.9.1-alpha4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1-alpha3...v0.9.1-alpha4
+[v0.9.1-alpha4]: https://github.com/zellerlab/GECCO/compare/v0.9.1-alpha3...v0.9.1-alpha4
 
 Retrain internal model with:
 ```
@@ -74,7 +94,7 @@
 ```
 
 ## [v0.9.1-alpha3] - 2022-03-23
-[v0.9.1-alpha3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1-alpha2...v0.9.1-alpha3
+[v0.9.1-alpha3]: https://github.com/zellerlab/GECCO/compare/v0.9.1-alpha2...v0.9.1-alpha3
 
 ### Added
 - `gecco.model.GeneTable` class to store gene coordinates independently of protein domains.
@@ -85,33 +105,33 @@
 - `gecco train` expects a gene table instead of a GFF file for the gene coordinates.
 
 ## [v0.9.1-alpha2] - 2022-03-23
-[v0.9.1-alpha2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1-alpha1...v0.9.1-alpha2
+[v0.9.1-alpha2]: https://github.com/zellerlab/GECCO/compare/v0.9.1-alpha1...v0.9.1-alpha2
 
 ### Fixed
 - `TypeClassifier.trained` not being able to read unknown types from type tables.
 
 ## [v0.9.1-alpha1] - 2022-03-20
-[v0.9.1-alpha1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.10...v0.9.1-alpha1
+[v0.9.1-alpha1]: https://github.com/zellerlab/GECCO/compare/v0.8.10...v0.9.1-alpha1
 Candidate release with support for a sliding window in the CRF prediction algorithm.
 
 ## [v0.8.10] - 2022-02-23
-[v0.8.10]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.9...v0.8.10
+[v0.8.10]: https://github.com/zellerlab/GECCO/compare/v0.8.9...v0.8.10
 ### Fixed
 - `--antismash-sideload` flag of `gecco run` causing command to crash.
 
 ## [v0.8.9] - 2022-02-22
-[v0.8.9]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.8...v0.8.9
+[v0.8.9]: https://github.com/zellerlab/GECCO/compare/v0.8.8...v0.8.9
 ### Removed
 - Prediction and support for the *Other* biosynthetic type of MIBiG clusters.
 
 ## [v0.8.8] - 2022-02-21
-[v0.8.8]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.7...v0.8.8
+[v0.8.8]: https://github.com/zellerlab/GECCO/compare/v0.8.7...v0.8.8
 ### Fixed
 - `ClusterRefiner` filtering method for edge genes not working as intended.
 - `gecco run` and `gecco annotate` commands crashing on missing input files instead of nicely rendering the error.
 
 ## [v0.8.7] - 2022-02-18
-[v0.8.7]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.6...v0.8.7
+[v0.8.7]: https://github.com/zellerlab/GECCO/compare/v0.8.6...v0.8.7
 ### Fixed
 - `interpro.json` metadata file not being included in distribution files.
 - Missing docstring for `Protein.with_domains` method.
@@ -119,7 +139,7 @@
 - Bump minimum `scikit-learn` version to `v1.0` for Python3.7+.
 
 ## [v0.8.6] - 2022-02-17 - YANKED
-[v0.8.6]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.5...v0.8.6
+[v0.8.6]: https://github.com/zellerlab/GECCO/compare/v0.8.5...v0.8.6
 ### Added
 - CLI flag for enabling region masking for contigs processed by Prodigal.
 - CLI flag for controlling region distance used for edge distance filtering.
@@ -133,12 +153,12 @@
 - Progress bar messages are now in consistent format.
 
 ## [v0.8.5] - 2021-11-21
-[v0.8.5]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.4...v0.8.5
+[v0.8.5]: https://github.com/zellerlab/GECCO/compare/v0.8.4...v0.8.5
 ### Added
 - Minimal compatibility support for running GECCO inside of Galaxy workflows.
 
 ## [v0.8.4] - 2021-09-26
-[v0.8.4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.3-post1...v0.8.4
+[v0.8.4]: https://github.com/zellerlab/GECCO/compare/v0.8.3-post1...v0.8.4
 ### Fixed
 - `gecco convert gbk --format bigslice` failing to run because of outdated code ([#5](https://github.com/zellerlab/GECCO/issues/5)).
 - `gecco convert gbk --format bigslice` not creating files with names conforming to BiG-SLiCE expected input.
@@ -146,17 +166,17 @@
 - Bump minimum `pyrodigal` version to `v0.6.2` to use platform-accelerated code if supported.
 
 ## [v0.8.3-post1] - 2021-08-23
-[v0.8.3-post1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.3...v0.8.3-post1
+[v0.8.3-post1]: https://github.com/zellerlab/GECCO/compare/v0.8.3...v0.8.3-post1
 ### Fixed
 - Wrong default value for `--threshold` being shown in `gecco run` help message.
 
 ## [v0.8.3] - 2021-08-23
-[v0.8.3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.2...v0.8.3
+[v0.8.3]: https://github.com/zellerlab/GECCO/compare/v0.8.2...v0.8.3
 ### Changed
 - Default probability threshold for segmentation to 0.3 (from 0.4).
 
 ## [v0.8.2] - 2021-07-31
-[v0.8.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.1...v0.8.2
+[v0.8.2]: https://github.com/zellerlab/GECCO/compare/v0.8.1...v0.8.2
 ### Fixed
 - `gecco run` crashing on Python 3.6 because of missing `contextlib.nullcontext` class.
 ### Changed
@@ -164,7 +184,7 @@
 - `PyHMMER.run` now reports the *p-value* of each domain in addition to the *e-value* as a `/note` qualifier.
 
 ## [v0.8.1] - 2021-07-29
-[v0.8.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.0...v0.8.1
+[v0.8.1]: https://github.com/zellerlab/GECCO/compare/v0.8.0...v0.8.1
 ### Changed
 - `gecco run` now filters out unneeded features before annotating, making it easier to analyze the results of a run with a custom `--model`.
 ### Fixed
@@ -173,7 +193,7 @@
 - Missing documentation for the `strand` attribute of `gecco.model.Gene`.
 
 ## [v0.8.0] - 2021-07-03
-[v0.8.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.7.0...v0.8.0
+[v0.8.0]: https://github.com/zellerlab/GECCO/compare/v0.7.0...v0.8.0
 ### Changed
 - Retrain internal model using new sequence embeddings and remove broken/duplicate BGCs from MIBiG 2.0.
 - Bump minimum `pyhmmer` version to `v0.4.0` to improve exception handling.
@@ -195,7 +215,7 @@
 - Tigrfam domains, which is not improving performance on the new training data.
 
 ## [v0.7.0] - 2021-05-31
-[v0.7.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.6.3...v0.7.0
+[v0.7.0]: https://github.com/zellerlab/GECCO/compare/v0.6.3...v0.7.0
 ### Added
 - Support for writing an AntiSMASH sideload JSON file after a `gecco run` workflow.
 - Code for converting GenBank files in BiG-SLiCE compatible format with the `gecco convert` subcommand.
@@ -207,7 +227,7 @@
 - Outdated notice about `-vvv` verbosity level in the help message of the main `gecco` command.
 
 ## [v0.6.3] - 2021-05-10
-[v0.6.3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.6.2...v0.6.3
+[v0.6.3]: https://github.com/zellerlab/GECCO/compare/v0.6.2...v0.6.3
 ### Fixed
 - HMMER annotation not properly handling inputs with multiple contigs.
 - Some progress bar totals displaying as floats in the CLI.
@@ -218,7 +238,7 @@
 - `multiprocessing.cpu_count` has been replaced with `os.cpu_count` where applicable.
 
 ## [v0.6.2] - 2021-05-04
-[v0.6.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.6.1...v0.6.2
+[v0.6.2]: https://github.com/zellerlab/GECCO/compare/v0.6.1...v0.6.2
 ### Fixed
 - `gecco cv loto` crashing because of outdated code.
 ### Changed
@@ -227,7 +247,7 @@
 - GECCO bioRxiv paper reference to `Cluster.to_seq_record` output record.
 
 ## [v0.6.1] - 2021-03-15
-[v0.6.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.6.0...v0.6.1
+[v0.6.1]: https://github.com/zellerlab/GECCO/compare/v0.6.0...v0.6.1
 ### Fixed
 - Progress bar not being disabled by `-q` flag in CLI.
 - Fallback to using HMM name if accession is not available in `PyHMMER`.
@@ -239,7 +259,7 @@
 - Unused and outdated `HMMER` and `DomainRow` classes from `gecco.hmmer`.
 
 ## [v0.6.0] - 2021-02-28
-[v0.6.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.5...v0.6.0
+[v0.6.0]: https://github.com/zellerlab/GECCO/compare/v0.5.5...v0.6.0
 ### Changed
 - Updated internal model with a cleaned-up version of the MIBiG-2.0
   Pfam-33.1/Tigrfam-15.0 embedding.
@@ -250,12 +270,12 @@
   protein IDs.
 
 ## [v0.5.5] - 2021-02-28
-[v0.5.5]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.4...v0.5.5
+[v0.5.5]: https://github.com/zellerlab/GECCO/compare/v0.5.4...v0.5.5
 ### Fixed
 - `gecco cv` bug causing only the last fold to be written.
 
 ## [v0.5.4] - 2021-02-28
-[v0.5.4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.3...v0.5.4
+[v0.5.4]: https://github.com/zellerlab/GECCO/compare/v0.5.3...v0.5.4
 ### Changed
 - Replaced `verboselogs`, `coloredlogs` and `better-exceptions` with `rich`.
 ### Removed
@@ -265,7 +285,7 @@
 - `gecco embed` to embed BGCs into non-BGC regions using feature tables.
 
 ## [v0.5.3] - 2021-02-21
-[v0.5.3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.2...v0.5.3
+[v0.5.3]: https://github.com/zellerlab/GECCO/compare/v0.5.2...v0.5.3
 ### Fixed
 - Coordinates of genes in output GenBank files.
 - Potential issue with the number of CPUs in `PyHMMER.run`.
@@ -273,7 +293,7 @@
 - Bump required `pyrodigal` version to `v0.4.2` to fix buffer overflow.
 
 ## [v0.5.2] - 2021-01-29
-[v0.5.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.1...v0.5.2
+[v0.5.2]: https://github.com/zellerlab/GECCO/compare/v0.5.1...v0.5.2
 ### Added
 - Support for downloading HMM files directly from GitHub releases assets.
 - Validation of filtered HMMs with MD5 checksum.
@@ -284,13 +304,13 @@
 - Bump required `pyhmmer` version to `v0.2.1`.
 
 ## [v0.5.1] - 2021-01-15
-[v0.5.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.0...v0.5.1
+[v0.5.1]: https://github.com/zellerlab/GECCO/compare/v0.5.0...v0.5.1
 ### Fixed
 - `--hmm` flag being ignored in in `gecco run` command.
 - `PyHMMER` using HMM names instead of accessions, causing issues with Pfam HMMs.
 
 ## [v0.5.0] - 2021-01-11
-[v0.5.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.5...v0.5.0
+[v0.5.0]: https://github.com/zellerlab/GECCO/compare/v0.4.5...v0.5.0
 ### Added
 - Explicit support for Python 3.9.
 ### Changed
@@ -300,7 +320,7 @@
 - `gecco cv` now requires *training* dependencies.
 
 ## [v0.4.5] - 2020-11-23
-[v0.4.5]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.4...v0.4.5
+[v0.4.5]: https://github.com/zellerlab/GECCO/compare/v0.4.4...v0.4.5
 ### Added
 - Additional `fold` column to cross-validation table output.
 ### Changed
@@ -309,7 +329,7 @@
 - `gecco.orf` was rewritten to extract genes from input sequences in parallel.
 
 ## [v0.4.4] - 2020-09-30
-[v0.4.4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.3...v0.4.4
+[v0.4.4]: https://github.com/zellerlab/GECCO/compare/v0.4.3...v0.4.4
 ### Added
 - `gecco cv loto` command to run LOTO cross-validation using BGC types
   for stratification.
@@ -325,7 +345,7 @@
 - Bumped `pandas` training dependency to `v1.0`.
 
 ## [v0.4.3] - 2020-09-07
-[v0.4.3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.2...v0.4.3
+[v0.4.3]: https://github.com/zellerlab/GECCO/compare/v0.4.2...v0.4.3
 ### Fixed
 - GenBank files being written with invalid `/cds` feature type.
 ### Changed
@@ -333,18 +353,18 @@
   and breaks the current code.
 
 ## [v0.4.2] - 2020-08-07
-[v0.4.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.1...v0.4.2
+[v0.4.2]: https://github.com/zellerlab/GECCO/compare/v0.4.1...v0.4.2
 ### Fixed
 - `TypeClassifier.predict_types` using inverse type probabilities when
   given several clusters to process.
 
 ## [v0.4.1] - 2020-08-07
-[v0.4.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.0...v0.4.1
+[v0.4.1]: https://github.com/zellerlab/GECCO/compare/v0.4.0...v0.4.1
 ### Fixed
 - `gecco run` command crashing on input sequences not containing any genes.
 
 ## [v0.4.0] - 2020-08-06
-[v0.4.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.3.0...v0.4.0
+[v0.4.0]: https://github.com/zellerlab/GECCO/compare/v0.3.0...v0.4.0
 ### Added
 - `gecco.model.ProductType` enum to model the biosynthetic class of a BGC.
 ### Removed
@@ -356,7 +376,7 @@
    table to know the types of the input BGCs.
 
 ## [v0.3.0] - 2020-08-03
-[v0.3.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.2.2...v0.3.0
+[v0.3.0]: https://github.com/zellerlab/GECCO/compare/v0.2.2...v0.3.0
 ### Changed
 - Replaced Nearest-Neighbours classifier with Random Forest to perform type
   prediction for candidate BGCs.
@@ -367,7 +387,7 @@
 - `--metric` argument to the `gecco run` CLI command.
 
 ## [v0.2.2] - 2020-07-31
-[v0.2.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.2.1...v0.2.2
+[v0.2.2]: https://github.com/zellerlab/GECCO/compare/v0.2.1...v0.2.2
 ### Changed
 - `Domain` and `Gene` can now carry qualifiers that are used when they
   are translated to a sequence feature.
@@ -376,7 +396,7 @@
   in GenBank output files.
 
 ## [v0.2.1] - 2020-07-23
-[v0.2.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.2.0...v0.2.1
+[v0.2.1]: https://github.com/zellerlab/GECCO/compare/v0.2.0...v0.2.1
 ### Fixed
 - Various potential crashes in `ClusterRefiner` code.
 ### Removed
@@ -384,7 +404,7 @@
   Fisher Exact Test feature selection.
 
 ## [v0.2.0] - 2020-07-23
-[v0.2.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.1.1...v0.2.0
+[v0.2.0]: https://github.com/zellerlab/GECCO/compare/v0.1.1...v0.2.0
 ### Fixed
 - `pandas` warning about unsorted columns in `gecco run`.
 ### Removed
@@ -397,7 +417,7 @@
   contain any domain annotation.
 
 ## [v0.1.1] - 2020-07-22
-[v0.1.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.1.0...v0.1.1
+[v0.1.1]: https://github.com/zellerlab/GECCO/compare/v0.1.0...v0.1.1
 ### Added
 - `ClusterCRF.predict_probabilities` to annotate a list of `Gene`.
 ### Changed
@@ -410,9 +430,9 @@
 - Included the `CHANGELOG.md` file to the generated docs.
 
 ## [v0.1.0] - 2020-07-17
-[v0.1.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.0.1...v0.1.0
+[v0.1.0]: https://github.com/zellerlab/GECCO/compare/v0.0.1...v0.1.0
 Initial release.
 
 ## [v0.0.1] - 2018-08-13
-[v0.0.1]: https://git.embl.de/grp-zeller/GECCO/compare/37afb97...v0.0.1
+[v0.0.1]: https://github.com/zellerlab/GECCO/compare/37afb97...v0.0.1
 Proof-of-concept.
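
The `bgc_id` to `cluster_id` rename listed under v0.9.6 affects any script that reads cluster tables by column name. Below is a minimal sketch of a reader that accepts both layouts, using only the Python standard library. The helper `read_cluster_rows` is hypothetical and not part of GECCO, and the sample columns other than the renamed one are illustrative only.

```python
import csv
import io


def read_cluster_rows(handle):
    """Yield rows of a tab-separated cluster table as dicts keyed by column name.

    Hypothetical helper: rows from tables that still use the old `bgc_id`
    column are rewritten so callers can always look up `cluster_id`.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        # Older tables name the identifier column `bgc_id`.
        if "cluster_id" not in row and "bgc_id" in row:
            row["cluster_id"] = row.pop("bgc_id")
        yield row


# Illustrative tables; only the identifier column name differs.
old_table = "sequence_id\tbgc_id\tstart\tend\nBGC0001866.1\tBGC0001866.1_cluster_1\t1\t32633\n"
new_table = old_table.replace("bgc_id", "cluster_id")

for text in (old_table, new_table):
    for row in read_cluster_rows(io.StringIO(text)):
        print(row["cluster_id"], row["start"], row["end"])
```

Normalising the name at read time keeps downstream code on the new column name without requiring old result files to be regenerated.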
--- a/gecco.xml	Wed Aug 10 12:36:38 2022 +0000
+++ b/gecco.xml	Mon Jan 16 18:35:56 2023 +0000
@@ -1,8 +1,17 @@
 <?xml version='1.0' encoding='utf-8'?>
-<tool id="gecco" name="GECCO" version="0.9.1" python_template_version="3.5">
+<tool id="gecco" name="GECCO" version="0.9.6" python_template_version="3.5">
     <description>is a fast and scalable method for identifying putative novel Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data using Conditional Random Fields (CRFs).</description>
+    <creator>
+        <organization name="Zeller Team" url="https://www.embl.org/groups/zeller/"/>
+    </creator>
+    <edam_topics>
+        <edam_topic>topic_0080</edam_topic>
+    </edam_topics>
+    <edam_operations>
+        <edam_operation>operation_0415</edam_operation>
+    </edam_operations>
     <requirements>
-        <requirement type="package" version="0.9.1">gecco</requirement>
+        <requirement type="package" version="0.9.6">gecco</requirement>
     </requirements>
     <version_command>gecco --version</version_command>
     <command detect_errors="aggressive"><![CDATA[
@@ -34,6 +43,9 @@
         #if $antismash_sideload:
             --antismash-sideload
         #end if
+        #unless $pad:
+            --no-pad
+        #end unless
 
         && mv input_tempfile.genes.tsv '$genes'
         && mv input_tempfile.features.tsv '$features'
@@ -46,6 +58,7 @@
     <inputs>
         <param name="input" type="data" format="genbank,fasta,embl" label="Sequence file in GenBank, EMBL or FASTA format"/>
         <param argument="--mask" type="boolean" checked="false" label="Enable masking of regions with unknown nucleotides when finding ORFs"/>
+        <param argument="--pad" type="boolean" checked="true" label="Enable padding of gene sequences smaller than the CRF window length"/>
         <param argument="--cds" type="integer" min="0" value="" optional="true" label="Minimum number of genes required for a cluster"/>
         <param argument="--threshold" type="float" min="0" max="1" value="" optional="true" label="Probability threshold for cluster detection"/>
         <param argument="--postproc" type="select" label="Post-processing method for gene cluster validation">
@@ -72,10 +85,10 @@
             <output name="features" file="features.tsv"/>
             <output name="genes" file="genes.tsv"/>
             <output name="clusters" file="clusters.tsv"/>
+            <param name="edge_distance" value="10"/>
         </test>
         <test>
             <param name="input" value="BGC0001866.fna"/>
-            <param name="edge_distance" value="0"/>
             <output name="features" file="features.tsv"/>
             <output name="genes" file="genes.tsv"/>
             <output name="clusters" file="clusters.tsv"/>
@@ -86,7 +99,6 @@
         <test>
             <param name="input" value="BGC0001866.fna"/>
             <param name="antismash_sideload" value="True"/>
-            <param name="edge_distance" value="0"/>
             <output name="features" file="features.tsv"/>
             <output name="genes" file="genes.tsv"/>
             <output name="clusters" file="clusters.tsv"/>
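
With the `#unless $pad` block added in this file, the wrapper passes `--no-pad` only when the `pad` checkbox is unchecked, while `--antismash-sideload` is passed only when its checkbox is checked. The same mapping can be sketched in Python. The helper name `optional_flags` is hypothetical: Galaxy evaluates the Cheetah template itself and does not run this code.

```python
from typing import List


def optional_flags(antismash_sideload: bool, pad: bool = True) -> List[str]:
    """Return the optional CLI flags implied by the two boolean tool parameters.

    Hypothetical helper mirroring the Cheetah conditionals in gecco.xml;
    `pad` defaults to True because the parameter is declared checked="true".
    """
    flags = []
    if antismash_sideload:
        # `#if $antismash_sideload` branch of the template.
        flags.append("--antismash-sideload")
    if not pad:
        # `#unless $pad` branch of the template.
        flags.append("--no-pad")
    return flags


print(optional_flags(antismash_sideload=False))
print(optional_flags(antismash_sideload=True, pad=False))
```

Because padding is the default, leaving the checkbox untouched adds nothing to the command line, so existing workflows keep their current behaviour.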
--- a/test-data/BGC0001866.1_cluster_1.gbk	Wed Aug 10 12:36:38 2022 +0000
+++ b/test-data/BGC0001866.1_cluster_1.gbk	Mon Jan 16 18:35:56 2023 +0000
@@ -1,4 +1,4 @@
-LOCUS       BGC0001866.1_cluster_1 32633 bp    DNA     linear   UNK 06-APR-2022
+LOCUS       BGC0001866.1_cluster_1 32633 bp    DNA     linear   UNK 16-JAN-2023
 DEFINITION  BGC0001866.1 Byssochlamys spectabilis strain CBS 101075 chromosome
             Unknown C8Q69scaffold_14, whole genome shotgun sequence.
 ACCESSION   BGC0001866.1_cluster_1
@@ -15,19 +15,19 @@
   JOURNAL   bioRxiv (2021.05.03.442509)
   REMARK    doi:10.1101/2021.05.03.442509
 COMMENT     ##GECCO-Data-START##
-            version                :: GECCO v0.9.1
-            creation_date          :: 2022-04-06T01:08:36.965708
-            biosyn_class           :: Polyketide
-            alkaloid_probability   :: 0.010000000000000009
-            polyketide_probability :: 0.96
-            ripp_probability       :: 0.0
-            saccharide_probability :: 0.0
-            terpene_probability    :: 0.010000000000000009
-            nrp_probability        :: 0.14
+            version                :: GECCO v0.9.6
+            creation_date          :: 2023-01-16T17:20:45.175113
+            cluster_type           :: Polyketide
+            alkaloid_probability   :: 0.010
+            nrp_probability        :: 0.140
+            polyketide_probability :: 0.960
+            ripp_probability       :: 0.000
+            saccharide_probability :: 0.000
+            terpene_probability    :: 0.010
             ##GECCO-Data-END##
 FEATURES             Location/Qualifiers
      CDS             complement(1..1143)
-                     /inference="ab initio prediction:Prodigal:2.6"
+                     /inference="ab initio prediction:Pyrodigal:2.0.4"
                      /transl_table=11
                      /locus_tag="BGC0001866.1_1"
                      /translation="MWIYEVDGHYIEPRRADTFLIWAGERYSAMIRLDKKPMDYSIRVP
@@ -37,98 +37,134 @@
                      QPESFNMVNPPYRDTFLTEFTGAMWVVLRYQVTSPGAWLLHCHFEMHLDNGMAMAILDG
                      VDKWPHVPPEYTQGFHGFREHELPGPAGFWGLVSKILRPESLVWAGGAAVVLLSLFIGG
                      LWRLWQRRMQGTYYVLSQEDERDRFSMDKEAWKSEETKRM*"
-     misc_feature    1..189
+                     /function="binding"
+                     /function="catalytic activity"
+                     /colour="129 14 21"
+                     /ApEinfo_fwdcolor="#810e15"
+                     /ApEinfo_revcolor="#810e15"
+     misc_feature    955..1143
                      /inference="protein motif"
                      /db_xref="PFAM:PF00394"
                      /db_xref="InterPro:IPR001117"
                      /note="e-value: 2.262067179461254e-08"
                      /note="p-value: 8.178117062405111e-12"
-                     /function="Multicopper oxidase"
+                     /function="Multicopper oxidase, type 1"
                      /standard_name="PF00394"
-     misc_feature    448..843
+     misc_feature    301..696
                      /inference="protein motif"
                      /db_xref="PFAM:PF07731"
                      /db_xref="InterPro:IPR011706"
+                     /db_xref="GO:0005507"
+                     /db_xref="GO:0016491"
                      /note="e-value: 4.059222969454281e-23"
                      /note="p-value: 1.467542649838858e-26"
-                     /function="Multicopper oxidase"
+                     /function="Multicopper oxidase, C-terminal"
                      /standard_name="PF07731"
      CDS             1179..1670
-                     /inference="ab initio prediction:Prodigal:2.6"
+                     /inference="ab initio prediction:Pyrodigal:2.0.4"
                      /transl_table=11
                      /locus_tag="BGC0001866.1_2"
                      /translation="MSSLRSSSHSPSGLPGQPRLPLLDRSREHSLPGDRAGWRTRSRLR
                      ATDLLSMVRMGSTYTIIRDMNYTDDESPGRSPFVCDSVIRPALVHERDLLVNKPLMART
                      IDAPFAVEKNTIDATDFISQSTRNVLISVHWNHTRSAVGCLHLLLYTGSSCSSPSQKAS
                      *"
+                     /function="unknown"
+                     /colour="128 128 128"
+                     /ApEinfo_fwdcolor="#808080"
+                     /ApEinfo_revcolor="#808080"
      CDS             complement(2167..2376)
-                     /inference="ab initio prediction:Prodigal:2.6"
+                     /inference="ab initio prediction:Pyrodigal:2.0.4"
                      /transl_table=11
                      /locus_tag="BGC0001866.1_3"
                      /translation="MPAYLLLLACNVLLVLGAHVQRELVLTWEEGAPNGQSRQMIKTNG
                      QFPSPTLIFDEGDDVEVGGISFAN*"
+                     /function="unknown"
+                     /colour="128 128 128"
+                     /ApEinfo_fwdcolor="#808080"
+                     /ApEinfo_revcolor="#808080"
      CDS             2559..3032
-                     /inference="ab initio prediction:Prodigal:2.6"
+                     /inference="ab initio prediction:Pyrodigal:2.0.4"
                      /transl_table=11
                      /locus_tag="BGC0001866.1_4"
                      /translation="MLFNSEVGVEEHVVLWSFQETTSITMAEEIKLTPLETFAQAISAS
                      AKTIATYCRDSGHPQLSDDNSSGLTGDVLPPSAPQAVTAARQTILEASYRLQQLVTEPS
                      QYLPRLTVYVSVEQSPMKDQTNDRKAPAPGCLTLAVPFQNPGAHPRARHQDIL*"
+                     /function="unknown"
+                     /colour="128 128 128"
+                     /ApEinfo_fwdcolor="#808080"
+                     /ApEinfo_revcolor="#808080"
      CDS             3007..3576
-                     /inference="ab initio prediction:Prodigal:2.6"
+                     /inference="ab initio prediction:Pyrodigal:2.0.4"
                      /transl_table=11
                      /locus_tag="BGC0001866.1_5"
                      /translation="MQGTRTYYELATEAKVPLHQLQSIARMAITGSFLREPEPNIVAHS
                      RTSAHFVENPSLRDWTLFLAEDTAPMAMKLVEATEKWGDTRSKTETAFNLALGTDLAFF
                      KYLSSNPQFTQKFSGYMKNVTASEGTSIKHLVNGFDWASLGNAIVVDVRLQSSFTPYRS
                      HTDVIFYRLAVLLVMQALLSRNRSPI*"
+                     /function="unknown"
+                     /colour="128 128 128"
+                     /ApEinfo_fwdcolor="#808080"
+                     /ApEinfo_revcolor="#808080"
      CDS             3600..4043
-                     /inference="ab initio prediction:Prodigal:2.6"
+                     /inference="ab initio prediction:Pyrodigal:2.0.4"
                      /transl_table=11
                      /locus_tag="BGC0001866.1_6"
                      /translation="MVTSTSKDNREKTPLPETVASRISFESHDFFKPQPVQNADVYLLR
                      MILHDWSFKEAGEILANLVPSVKQGARILIMDTVLPRHGTVPVTEEALLRVRDMTMMET
                      FNSHEREIDEWKDLIQGVHTGLRVQQVIQPAGSSMAIIEVVRG*"
+                     /function="catalytic activity"
+                     /colour="129 14 21"
+                     /ApEinfo_fwdcolor="#810e15"
+                     /ApEinfo_revcolor="#810e15"
      misc_feature    3648..3962
                      /inference="protein motif"
                      /db_xref="PFAM:PF00891"
                      /db_xref="InterPro:IPR001077"
+                     /db_xref="GO:0008171"
                      /note="e-value: 4.890642309934635e-16"
                      /note="p-value: 1.7681280946979883e-19"
                      /function="O-methyltransferase domain"
                      /standard_name="PF00891"
      CDS             4337..4792
-                     /inference="ab initio prediction:Prodigal:2.6"
+                     /inference="ab initio prediction:Pyrodigal:2.0.4"
                      /transl_table=11
                      /locus_tag="BGC0001866.1_7"
                      /translation="MTQIVFGIAPTLLKTFSHLTALDLWRPSAPYVFDPVTSSTYLGTI
                      ADGVEEFLGIFYGQDTGGSNRFAPPKPYIPSRHSFINASTAGAACPQPYVPLPADPYTV
                      LTNVSEDCLSLRIARPENTKSTAKLPVMVWLYGGAYNRLPTDLQWET*"
+                     /function="unknown"
+                     /colour="128 128 128"
+                     /ApEinfo_fwdcolor="#808080"
+                     /ApEinfo_revcolor="#808080"
      misc_feature    4478..4756
                      /inference="protein motif"
                      /db_xref="PFAM:PF00135"
                      /db_xref="InterPro:IPR002018"
                      /note="e-value: 4.819217021121008e-21"
                      /note="p-value: 1.7423055029360116e-24"
-                     /function="Carboxylesterase family"
+                     /function="Carboxylesterase, type B"
                      /standard_name="PF00135"
      CDS             5038..5466
-                     /inference="ab initio prediction:Prodigal:2.6"
+                     /inference="ab initio prediction:Pyrodigal:2.0.4"
                      /transl_table=11
                      /locus_tag="BGC0001866.1_8"
                      /translation="MQDQRLGIEWIKNHISAFGGDPDNITLFGEDEGATYIALHILSNH
                      EVPFHRAILQSGAAITHHDVNGNRSARNFAAVAARCNCLSDGDRQVDSQDTVDCLRRVP
                      MEDLVNATFEVAHSVDPVNGFRALYVLLHFPSHKCKQD*"
+                     /function="unknown"
+                     /colour="128 128 128"
+                     /ApEinfo_fwdcolor="#808080"
+                     /ApEinfo_revcolor="#808080"
      misc_feature    5041..5379
                      /inference="protein motif"
                      /db_xref="PFAM:PF00135"
                      /db_xref="InterPro:IPR002018"
                      /note="e-value: 4.0935350990176556e-30"
                      /note="p-value: 1.4799476135277136e-33"
-                     /function="Carboxylesterase family"
+                     /function="Carboxylesterase, type B"
                      /standard_name="PF00135"
      CDS             5477..6253
-                     /inference="ab initio prediction:Prodigal:2.6"
+                     /inference="ab initio prediction:Pyrodigal:2.0.4"
                      /transl_table=11
                      /locus_tag="BGC0001866.1_9"
                      /translation="MPAVDGYMIPDEPSNLLSRGQVPANISILAGWTRDESSMSVPTSI
@@ -136,16 +172,20 @@
                      LTLTCPTIFQAWSLRLSSNCTTPVYLYELRQSPFATALNNSGVGYLGIVHFSDVPYVFN
                      ELERTYYITDPEENKLAQRMSASWTAFASGAFPLCERSERSLGRWEEAYGGDRVCRDRM
                      PEHVRVKGIGDNGDQDDGDEIGKLMARCGFINRLEY*"
+                     /function="unknown"
+                     /colour="128 128 128"
+                     /ApEinfo_fwdcolor="#808080"
+                     /ApEinfo_revcolor="#808080"
      misc_feature    5480..6103
                      /inference="protein motif"
                      /db_xref="PFAM:PF00135"
                      /db_xref="InterPro:IPR002018"
                      /note="e-value: 1.4624647008379705e-15"
                      /note="p-value: 5.287291037013632e-19"
-                     /function="Carboxylesterase family"
+                     /function="Carboxylesterase, type B"
                      /standard_name="PF00135"
      CDS             7412..8683
-                     /inference="ab initio prediction:Prodigal:2.6"
+                     /inference="ab initio prediction:Pyrodigal:2.0.4"
                      /transl_table=11
                      /locus_tag="BGC0001866.1_10"
                      /translation="MTGARFDESDHKWTVEGINGSHGTIRIRCRWYILALGFASKPYIP
@@ -156,39 +196,56 @@
                      FDTNTGALTSIHIQDTDGILLKDRWSYDGVMTTFGMSTSKFPNMFFFYGPQAPTAFSNG
                      PSCIELQGEFVEELILDMIGKGVTRVDTTSEAEKRWKESTLSLWNQFVFSSTKGFYTGE
                      NIPGKKAEPLNWYVLVLGLGVSKR*"
+                     /function="binding"
+                     /function="catalytic activity"
+                     /colour="129 14 21"
+                     /ApEinfo_fwdcolor="#810e15"
+                     /ApEinfo_revcolor="#810e15"
      misc_feature    7448..7783
                      /inference="protein motif"
                      /db_xref="PFAM:PF13434"
                      /db_xref="InterPro:IPR025700"
                      /note="e-value: 5.955898730893757e-08"
                      /note="p-value: 2.153253337271785e-11"
-                     /function="L-lysine 6-monooxygenase (NADPH-requiring)"
+                     /function="L-lysine 6-monooxygenase/L-ornithine
+                     5-monooxygenase"
                      /standard_name="PF13434"
      misc_feature    7517..7717
                      /inference="protein motif"
                      /db_xref="PFAM:PF00743"
                      /db_xref="InterPro:IPR020946"
+                     /db_xref="GO:0004499"
+                     /db_xref="GO:0050660"
+                     /db_xref="GO:0050661"
                      /note="e-value: 5.246542281818287e-07"
                      /note="p-value: 1.8967976434628658e-10"
-                     /function="Flavin-binding monooxygenase-like"
+                     /function="Flavin monooxygenase-like"
                      /standard_name="PF00743"
      CDS             9454..10038
-                     /inference="ab initio prediction:Prodigal:2.6"
+                     /inference="ab initio prediction:Pyrodigal:2.0.4"
                      /transl_table=11
                      /locus_tag="BGC0001866.1_11"
                      /translation="MCRGRLTRTVDERGIVSTESAHAAQRHHLASHVLDARFAGSIARL
                      GSLCLFLALLVAFVQELQKSESHHQRSGGVGLEDRRVVREGLVKPVVTHFGYVPFRRRS
                      CGMGSQVRCGDSSVIHQEVDIPILGGDVVDDALKVSMRGNAALDRVDVAMGLSQIVSTI
                      VIALWTWFVLNQPTAWLLLVRARAVARVCRL*"
+                     /function="unknown"
+                     /colour="128 128 128"
+                     /ApEinfo_fwdcolor="#808080"
+                     /ApEinfo_revcolor="#808080"
      CDS             10763..11191
-                     /inference="ab initio prediction:Prodigal:2.6"
+                     /inference="ab initio prediction:Pyrodigal:2.0.4"
                      /transl_table=11
                      /locus_tag="BGC0001866.1_12"
                      /translation="MRAGQLVPLVSTPTPSCLALQIVFCCCSTFLSDPLVLQNHRKMAD
                      EQKTPLESGQQPAVAQHTSTAELQTEKPGQMNGNGTADKPGPPGGKPFGPGMGPPIQYP
                      TGFKLYSIMTGLYLASFLTALVGWRSITDLTDSETYIG*"
+                     /function="unknown"
+                     /colour="128 128 128"
+                     /ApEinfo_fwdcolor="#808080"
+                     /ApEinfo_revcolor="#808080"
      CDS             11204..12316
-                     /inference="ab initio prediction:Prodigal:2.6"
+                     /inference="ab initio prediction:Pyrodigal:2.0.4"
                      /transl_table=11
                      /locus_tag="BGC0001866.1_13"
                      /translation="MLVVAIPQITDHFNSIDDIGWYGSAYLLTFCAFQLLFGKIYSFYN
@@ -198,31 +255,43 @@
                      FVGVQLWLQDKGTIPPRVMKQRSIAAGMAFTICVTAGFMSFNYYLPIWFQAIKNASSFH
                      SGVMMLPTVISSGVASLACGFIIHRVGYYTPFMIGGSVLMAIGAGLLTTFTPTTEHPKW
                      IGYQVLWALGCGMSTFQPPFFARCIFVGGY*"
+                     /function="transporter activity"
+                     /colour="100 149 237"
+                     /ApEinfo_fwdcolor="#6495ed"
+                     /ApEinfo_revcolor="#6495ed"
      misc_feature    11204..12289
                      /inference="protein motif"
                      /db_xref="PFAM:PF07690"
                      /db_xref="InterPro:IPR011701"
+                     /db_xref="GO:0022857"
+                     /db_xref="GO:0055085"
                      /note="e-value: 6.020530714201243e-37"
                      /note="p-value: 2.1766199255969786e-40"
-                     /function="Major Facilitator Superfamily"
+                     /function="Major facilitator superfamily"
                      /standard_name="PF07690"
      misc_feature    11252..11935
                      /inference="protein motif"
                      /db_xref="PFAM:PF06609"
                      /db_xref="InterPro:IPR010573"
+                     /db_xref="GO:0022857"
+                     /db_xref="GO:0055085"
                      /note="e-value: 9.83839354265682e-09"
                      /note="p-value: 3.55690294383833e-12"
-                     /function="Fungal trichothecene efflux pump (TRI12)"
+                     /function="Major facilitator transporter Str1/Tri12-like"
                      /standard_name="PF06609"
      CDS             12335..12781
-                     /inference="ab initio prediction:Prodigal:2.6"
+                     /inference="ab initio prediction:Pyrodigal:2.0.4"
                      /transl_table=11
                      /locus_tag="BGC0001866.1_14"
                      /translation="MQQASLAAQTVLPKPDAPIGISLIFFSQSLGGSVFLAVDDSIYSN
                      RLAAKLGSIPNLPQSALTNTGATNIRNLVAPQYLGRLLGGYNDALMDVFRVAVASSCAC
                      VVAAAFMEWKNVRAAKAAGPGGPGGPGGPGGPGGPEGLRGGNKV*"
+                     /function="unknown"
+                     /colour="128 128 128"
+                     /ApEinfo_fwdcolor="#808080"
+                     /ApEinfo_revcolor="#808080"
      CDS             14574..15566
-                     /inference="ab initio prediction:Prodigal:2.6"
+                     /inference="ab initio prediction:Pyrodigal:2.0.4"
                      /transl_table=11
                      /locus_tag="BGC0001866.1_15"
                      /translation="MTFEEMLSRPSPPPFAGPSHNSNRPTNMASTNQDQYYHDKGKHGE
@@ -231,16 +300,24 @@
                      ENGSHPTTDQVLKANSDAMKDAADLLACPCAKDFCFPIILGITACRVLAWYQVVIDMYD
                      PEIPMATMPTAREDIKHCPIAFGAYQLDEEVSQAMTSQFVLRNLRAMTRFVKTYVENFC
                      SDINKNRPGSCSLIYRSLGTFMQTRLGNTIEQLEDRLAAFDGEYTKNIG*"
+                     /function="binding"
+                     /colour="128 128 128"
+                     /ApEinfo_fwdcolor="#808080"
+                     /ApEinfo_revcolor="#808080"
      misc_feature    14988..15245
                      /inference="protein motif"
                      /db_xref="PFAM:PF08493"
                      /db_xref="InterPro:IPR013700"
+                     /db_xref="GO:0003677"
+                     /db_xref="GO:0005634"
+                     /db_xref="GO:0006355"
+                     /db_xref="GO:0045122"
                      /note="e-value: 2.686865976406516e-17"
                      /note="p-value: 9.713904470016327e-21"
                      /function="Aflatoxin regulatory protein"
                      /standard_name="PF08493"
      CDS             16827..18797
-                     /inference="ab initio prediction:Prodigal:2.6"
+                     /inference="ab initio prediction:Pyrodigal:2.0.4"
                      /transl_table=11
                      /locus_tag="BGC0001866.1_16"
                      /translation="MAICGIAVRLPGGISNDAQLWDFLLAKRDARSQVPGSRYNISGYH
@@ -255,13 +332,17 @@
                      GAQWPGMGVELFKSNATFRRSILEMDSVLQSLPDAPAWSIADEISKEHQTSMLYLSSYS
                      QPICTALQVALVNTLFELNIRPYAVIGHSSGELAAAYAAGRLTASQAVTLAYYRGIVAG
                      KVAQAGCMAAVGMGASEIIHF*"
+                     /function="unknown"
+                     /colour="128 128 128"
+                     /ApEinfo_fwdcolor="#808080"
+                     /ApEinfo_revcolor="#808080"
      misc_feature    16830..17570
                      /inference="protein motif"
                      /db_xref="PFAM:PF00109"
                      /db_xref="InterPro:IPR014030"
                      /note="e-value: 9.30510909096118e-60"
                      /note="p-value: 3.364103069761815e-63"
-                     /function="Beta-ketoacyl synthase, N-terminal domain"
+                     /function="Beta-ketoacyl synthase, N-terminal"
                      /standard_name="PF00109"
      misc_feature    17595..17930
                      /inference="protein motif"
@@ -269,7 +350,7 @@
                      /db_xref="InterPro:IPR014031"
                      /note="e-value: 2.2857331200304854e-35"
                      /note="p-value: 8.263677223537547e-39"
-                     /function="Beta-ketoacyl synthase, C-terminal domain"
+                     /function="Beta-ketoacyl synthase, C-terminal"
                      /standard_name="PF02801"
      misc_feature    17937..18290
                      /inference="protein motif"
@@ -277,7 +358,7 @@
                      /db_xref="InterPro:IPR032821"
                      /note="e-value: 4.800730099641783e-25"
                      /note="p-value: 1.7356218726109122e-28"
-                     /function="Ketoacyl-synthetase C-terminal extension"
+                     /function="Polyketide synthase, C-terminal extension"
                      /standard_name="PF16197"
      misc_feature    18360..18770
                      /inference="protein motif"
@@ -285,10 +366,10 @@
                      /db_xref="InterPro:IPR014043"
                      /note="e-value: 1.113401436161595e-26"
                      /note="p-value: 4.025312495161225e-30"
-                     /function="Acyl transferase domain"
+                     /function="Acyl transferase"
                      /standard_name="PF00698"
      CDS             18806..22078
-                     /inference="ab initio prediction:Prodigal:2.6"
+                     /inference="ab initio prediction:Pyrodigal:2.0.4"
                      /transl_table=11
                      /locus_tag="BGC0001866.1_17"
                      /translation="MVVACENSPSSVTISGDIDQVQYVMQEISLAHPEILCRQIKSDTA
@@ -310,13 +391,17 @@
                      WVTRSIQIDCRDPRYSPTLGVARTVRSEFGLDFGTCEVDTLKYTSIGLVIDVFEAFHGR
                      RHGQNAYPEYEYAIREDTVHIGRLSSFSVQEELRRIQKAHVETKDNRISLVAGTSGFDS
                      LAWQADAGQQVQLLGDDEVELQVDTAGVNFLVRCSFQFQGES*"
+                     /function="catalytic activity"
+                     /colour="129 14 21"
+                     /ApEinfo_fwdcolor="#810e15"
+                     /ApEinfo_revcolor="#810e15"
      misc_feature    18809..19258
                      /inference="protein motif"
                      /db_xref="PFAM:PF00698"
                      /db_xref="InterPro:IPR014043"
                      /note="e-value: 2.7208690154402465e-16"
                      /note="p-value: 9.836836642950999e-20"
-                     /function="Acyl transferase domain"
+                     /function="Acyl transferase"
                      /standard_name="PF00698"
      misc_feature    19487..20317
                      /inference="protein motif"
@@ -324,14 +409,13 @@
                      /db_xref="InterPro:IPR020807"
                      /note="e-value: 2.598574865139864e-60"
                      /note="p-value: 9.394703055458656e-64"
-                     /function="Polyketide synthase dehydratase"
+                     /function="Polyketide synthase, dehydratase domain"
                      /standard_name="PF14765"
      misc_feature    20786..21256
                      /inference="protein motif"
                      /db_xref="PFAM:PF13489"
                      /note="e-value: 1.04446701072283e-12"
                      /note="p-value: 3.776091868123029e-16"
-                     /function="Methyltransferase domain"
                      /standard_name="PF13489"
      misc_feature    20801..21133
                      /inference="protein motif"
@@ -347,7 +431,7 @@
                      /db_xref="InterPro:IPR041698"
                      /note="e-value: 2.4253465299984994e-13"
                      /note="p-value: 8.76842563267715e-17"
-                     /function="Methyltransferase domain"
+                     /function="Methyltransferase domain 25"
                      /standard_name="PF13649"
      misc_feature    20807..21103
                      /inference="protein motif"
@@ -355,33 +439,38 @@
                      /db_xref="InterPro:IPR013217"
                      /note="e-value: 3.7410690716593694e-22"
                      /note="p-value: 1.3525195486837923e-25"
-                     /function="Methyltransferase domain"
+                     /function="Methyltransferase type 12"
                      /standard_name="PF08242"
      misc_feature    20807..21106
                      /inference="protein motif"
                      /db_xref="PFAM:PF08241"
                      /db_xref="InterPro:IPR013216"
+                     /db_xref="GO:0008168"
                      /note="e-value: 5.4075572021556884e-12"
                      /note="p-value: 1.9550098344742185e-15"
-                     /function="Methyltransferase domain"
+                     /function="Methyltransferase type 11"
                      /standard_name="PF08241"
      CDS             22416..22889
-                     /inference="ab initio prediction:Prodigal:2.6"
+                     /inference="ab initio prediction:Pyrodigal:2.0.4"
                      /transl_table=11
                      /locus_tag="BGC0001866.1_18"
                      /translation="MQTVLINSASDGVGLAAIQISKMIGATIYATVIGEDKVEYLTASH
                      GIPRDHIFNSRDSSFLDGIMRVTNGRGVDLVLTSLSADFIQASCDCVANFGKLVNLSKP
                      TAANQGQFPIDSFHPNMSYASVDIIDYIKRRPKESKRYVITFRHSYQLCPACN*"
+                     /function="unknown"
+                     /colour="128 128 128"
+                     /ApEinfo_fwdcolor="#808080"
+                     /ApEinfo_revcolor="#808080"
      misc_feature    22449..22766
                      /inference="protein motif"
                      /db_xref="PFAM:PF00107"
                      /db_xref="InterPro:IPR013149"
                      /note="e-value: 1.1299405916297285e-15"
                      /note="p-value: 4.085106983476965e-19"
-                     /function="Zinc-binding dehydrogenase"
+                     /function="Alcohol dehydrogenase-like, C-terminal"
                      /standard_name="PF00107"
      CDS             22922..24277
-                     /inference="ab initio prediction:Prodigal:2.6"
+                     /inference="ab initio prediction:Pyrodigal:2.0.4"
                      /transl_table=11
                      /locus_tag="BGC0001866.1_19"
                      /translation="MELYKQGHIQPITPVKTFTATDIRQCFDYMQSGQHIGQLRLSLKS
@@ -392,13 +481,17 @@
                      QTLDYSRYENPAQFITGLRDTTGMLDSTGGKSMLLDSRLAAYVGNSAAVTAPTETKTSA
                      NKLNNFVSSAATDSAILSEPSATQFVSLEIARWVFDLLMKPVDDDSEIDLSRSLVDVGL
                      DSLAAVEMRSWLKSSLGLDISVLEIMASPSLAAMGEHVIRELVRKFGGDNKN*"
+                     /function="unknown"
+                     /colour="128 128 128"
+                     /ApEinfo_fwdcolor="#808080"
+                     /ApEinfo_revcolor="#808080"
      misc_feature    23114..23638
                      /inference="protein motif"
                      /db_xref="PFAM:PF08659"
                      /db_xref="InterPro:IPR013968"
                      /note="e-value: 1.5610077818520667e-61"
                      /note="p-value: 5.643556695054471e-65"
-                     /function="KR domain"
+                     /function="Polyketide synthase, ketoreductase domain"
                      /standard_name="PF08659"
      misc_feature    23123..23584
                      /inference="protein motif"
@@ -406,7 +499,7 @@
                      /db_xref="InterPro:IPR002347"
                      /note="e-value: 1.1731018314976082e-07"
                      /note="p-value: 4.2411490654288077e-11"
-                     /function="short chain dehydrogenase"
+                     /function="Short-chain dehydrogenase/reductase SDR"
                      /standard_name="PF00106"
      misc_feature    24071..24232
                      /inference="protein motif"
@@ -414,25 +507,28 @@
                      /db_xref="InterPro:IPR009081"
                      /note="e-value: 3.463550267794435e-10"
                      /note="p-value: 1.2521873708584363e-13"
-                     /function="Phosphopantetheine attachment site"
+                     /function="Phosphopantetheine binding ACP domain"
                      /standard_name="PF00550"
      CDS             25423..25710
-                     /inference="ab initio prediction:Prodigal:2.6"
+                     /inference="ab initio prediction:Pyrodigal:2.0.4"
                      /transl_table=11
                      /locus_tag="BGC0001866.1_20"
                      /translation="MAQKLRFYLFGDQTYDYDEQLRALLTSHDPVVRSFLERAYYTLRA
                      EVARIPNGYQARISRFSSIAELLSQRREHGVDASLEQALTVVYQLASFMR*"
+                     /function="unknown"
+                     /colour="128 128 128"
+                     /ApEinfo_fwdcolor="#808080"
+                     /ApEinfo_revcolor="#808080"
      misc_feature    25444..25704
                      /inference="protein motif"
                      /db_xref="PFAM:PF16073"
                      /db_xref="InterPro:IPR032088"
                      /note="e-value: 9.422238725791962e-24"
                      /note="p-value: 3.406449286258844e-27"
-                     /function="Starter unit:ACP transacylase in aflatoxin
-                     biosynthesis"
+                     /function="Starter unit:ACP transacylase"
                      /standard_name="PF16073"
      CDS             26198..29653
-                     /inference="ab initio prediction:Prodigal:2.6"
+                     /inference="ab initio prediction:Pyrodigal:2.0.4"
                      /transl_table=11
                      /locus_tag="BGC0001866.1_21"
                      /translation="MSRPYISAYASGGVTISGPPSVLAELRNTPGLSKLRAKDIPIHAP
@@ -455,14 +551,17 @@
                      GMVKSTLGNSIKALPTLQRNRNTWEVLTESVSTLYCMGFDINWTEYHRDFPSSQRVLRL
                      PSYSWDLKSYWIPYRNDWTLYKGDIVPESSIALPTHQNKPHSTSPKQQAPTPILETTTL
                      HRIVDEKSTEGTFSITCESDVSRPDLSPLVQGHKVEGIGLCTPV*"
+                     /function="unknown"
+                     /colour="128 128 128"
+                     /ApEinfo_fwdcolor="#808080"
+                     /ApEinfo_revcolor="#808080"
      misc_feature    26201..26338
                      /inference="protein motif"
                      /db_xref="PFAM:PF16073"
                      /db_xref="InterPro:IPR032088"
                      /note="e-value: 4.380197593141013e-11"
                      /note="p-value: 1.5835855362042708e-14"
-                     /function="Starter unit:ACP transacylase in aflatoxin
-                     biosynthesis"
+                     /function="Starter unit:ACP transacylase"
                      /standard_name="PF16073"
      misc_feature    26729..27475
                      /inference="protein motif"
@@ -470,7 +569,7 @@
                      /db_xref="InterPro:IPR014030"
                      /note="e-value: 2.7499815692371726e-82"
                      /note="p-value: 9.942088102809735e-86"
-                     /function="Beta-ketoacyl synthase, N-terminal domain"
+                     /function="Beta-ketoacyl synthase, N-terminal"
                      /standard_name="PF00109"
      misc_feature    27497..27862
                      /inference="protein motif"
@@ -478,7 +577,7 @@
                      /db_xref="InterPro:IPR014031"
                      /note="e-value: 2.4774456171918303e-34"
                      /note="p-value: 8.956780973217029e-38"
-                     /function="Beta-ketoacyl synthase, C-terminal domain"
+                     /function="Beta-ketoacyl synthase, C-terminal"
                      /standard_name="PF02801"
      misc_feature    27896..28216
                      /inference="protein motif"
@@ -486,7 +585,7 @@
                      /db_xref="InterPro:IPR032821"
                      /note="e-value: 8.475099126640419e-07"
                      /note="p-value: 3.0640271607521397e-10"
-                     /function="Ketoacyl-synthetase C-terminal extension"
+                     /function="Polyketide synthase, C-terminal extension"
                      /standard_name="PF16197"
      misc_feature    28322..29233
                      /inference="protein motif"
@@ -494,10 +593,10 @@
                      /db_xref="InterPro:IPR014043"
                      /note="e-value: 4.739349423268586e-38"
                      /note="p-value: 1.7134307387088164e-41"
-                     /function="Acyl transferase domain"
+                     /function="Acyl transferase"
                      /standard_name="PF00698"
      CDS             29804..30544
-                     /inference="ab initio prediction:Prodigal:2.6"
+                     /inference="ab initio prediction:Pyrodigal:2.0.4"
                      /transl_table=11
                      /locus_tag="BGC0001866.1_22"
                      /translation="MVIEKALMPLNAGPQLLRVTASLIWSEKEASVRFYSVDVRRPSSK
@@ -505,16 +604,20 @@
                      YRFNGPMAYNMVQALAEFHPDYRCIDETILDNETLEAACTVSFGNVKKEGVFHTHPGYI
                      DGLTQSGGFVMNANDKTNLGVEVFVNHGWDSFQLYEPVTDDRSYQTHVRMRPAESNQWK
                      GDVVVLSGENLVACVRGLTVSRET*"
+                     /function="unknown"
+                     /colour="128 128 128"
+                     /ApEinfo_fwdcolor="#808080"
+                     /ApEinfo_revcolor="#808080"
      misc_feature    29918..30535
                      /inference="protein motif"
                      /db_xref="PFAM:PF14765"
                      /db_xref="InterPro:IPR020807"
                      /note="e-value: 8.019334685871699e-11"
                      /note="p-value: 2.8992533209948296e-14"
-                     /function="Polyketide synthase dehydratase"
+                     /function="Polyketide synthase, dehydratase domain"
                      /standard_name="PF14765"
      CDS             30591..32633
-                     /inference="ab initio prediction:Prodigal:2.6"
+                     /inference="ab initio prediction:Pyrodigal:2.0.4"
                      /transl_table=11
                      /locus_tag="BGC0001866.1_23"
                      /translation="MLTTFQIQGVPRRVLRYILQSSAKTTQTATSSVPAPSQAPVMVPQ
@@ -529,13 +632,17 @@
                      LAYRAAQILQKAAANPQKPVVESLLLLDSPPPTGLGKLPKHFFDYCDQIGIFGQGTAKA
                      PEWLITHFQGTNSVLHEYHATPFSFGTAPRTGIIWASQTVFETRAVAPPPVRPDDTEDM
                      KFLTERRTDFSAGSWGHMFPGTEVLIETAYGADHFSLLVSLLFRD*"
+                     /function="unknown"
+                     /colour="128 128 128"
+                     /ApEinfo_fwdcolor="#808080"
+                     /ApEinfo_revcolor="#808080"
      misc_feature    30789..30974
                      /inference="protein motif"
                      /db_xref="PFAM:PF00550"
                      /db_xref="InterPro:IPR009081"
                      /note="e-value: 6.066413293337807e-14"
                      /note="p-value: 2.193207987468477e-17"
-                     /function="Phosphopantetheine attachment site"
+                     /function="Phosphopantetheine binding ACP domain"
                      /standard_name="PF00550"
      misc_feature    31110..31304
                      /inference="protein motif"
@@ -543,7 +650,7 @@
                      /db_xref="InterPro:IPR009081"
                      /note="e-value: 4.042537132792419e-10"
                      /note="p-value: 1.461510170930014e-13"
-                     /function="Phosphopantetheine attachment site"
+                     /function="Phosphopantetheine binding ACP domain"
                      /standard_name="PF00550"
      misc_feature    31485..31670
                      /inference="protein motif"
@@ -551,15 +658,16 @@
                      /db_xref="InterPro:IPR009081"
                      /note="e-value: 1.4101442109719659e-08"
                      /note="p-value: 5.098135252971677e-12"
-                     /function="Phosphopantetheine attachment site"
+                     /function="Phosphopantetheine binding ACP domain"
                      /standard_name="PF00550"
      misc_feature    31917..32240
                      /inference="protein motif"
                      /db_xref="PFAM:PF00975"
                      /db_xref="InterPro:IPR001031"
+                     /db_xref="GO:0009058"
                      /note="e-value: 6.91897478936856e-24"
                      /note="p-value: 2.5014370171252933e-27"
-                     /function="Thioesterase domain"
+                     /function="Thioesterase"
                      /standard_name="PF00975"
 ORIGIN
         1 ttacatccgc ttagtctcct cggacttcca tgcttccttg tccattgaga aacgatccct
--- a/test-data/clusters.tsv	Wed Aug 10 12:36:38 2022 +0000
+++ b/test-data/clusters.tsv	Mon Jan 16 18:35:56 2023 +0000
@@ -1,2 +1,2 @@
-sequence_id	bgc_id	start	end	average_p	max_p	type	alkaloid_probability	polyketide_probability	ripp_probability	saccharide_probability	terpene_probability	nrp_probability	proteins	domains
-BGC0001866.1	BGC0001866.1_cluster_1	347	32979	0.9958958770931704	0.9999999976946022	Polyketide	0.010000000000000009	0.96	0.0	0.0	0.010000000000000009	0.14	BGC0001866.1_1;BGC0001866.1_2;BGC0001866.1_3;BGC0001866.1_4;BGC0001866.1_5;BGC0001866.1_6;BGC0001866.1_7;BGC0001866.1_8;BGC0001866.1_9;BGC0001866.1_10;BGC0001866.1_11;BGC0001866.1_12;BGC0001866.1_13;BGC0001866.1_14;BGC0001866.1_15;BGC0001866.1_16;BGC0001866.1_17;BGC0001866.1_18;BGC0001866.1_19;BGC0001866.1_20;BGC0001866.1_21;BGC0001866.1_22;BGC0001866.1_23	PF00106;PF00107;PF00109;PF00135;PF00394;PF00550;PF00698;PF00743;PF00891;PF00975;PF02801;PF06609;PF07690;PF07731;PF08241;PF08242;PF08493;PF08659;PF13434;PF13489;PF13649;PF13847;PF14765;PF16073;PF16197
+sequence_id	cluster_id	start	end	average_p	max_p	type	alkaloid_probability	nrp_probability	polyketide_probability	ripp_probability	saccharide_probability	terpene_probability	proteins	domains
+BGC0001866.1	BGC0001866.1_cluster_1	347	32979	0.9958958770931705	0.9999999976946022	Polyketide	0.010000000000000009	0.14	0.96	0.0	0.0	0.010000000000000009	BGC0001866.1_1;BGC0001866.1_2;BGC0001866.1_3;BGC0001866.1_4;BGC0001866.1_5;BGC0001866.1_6;BGC0001866.1_7;BGC0001866.1_8;BGC0001866.1_9;BGC0001866.1_10;BGC0001866.1_11;BGC0001866.1_12;BGC0001866.1_13;BGC0001866.1_14;BGC0001866.1_15;BGC0001866.1_16;BGC0001866.1_17;BGC0001866.1_18;BGC0001866.1_19;BGC0001866.1_20;BGC0001866.1_21;BGC0001866.1_22;BGC0001866.1_23	PF00106;PF00107;PF00109;PF00135;PF00394;PF00550;PF00698;PF00743;PF00891;PF00975;PF02801;PF06609;PF07690;PF07731;PF08241;PF08242;PF08493;PF08659;PF13434;PF13489;PF13649;PF13847;PF14765;PF16073;PF16197
--- a/test-data/features.tsv	Wed Aug 10 12:36:38 2022 +0000
+++ b/test-data/features.tsv	Mon Jan 16 18:35:56 2023 +0000
@@ -1,4 +1,4 @@
-sequence_id	protein_id	start	end	strand	domain	hmm	i_evalue	pvalue	domain_start	domain_end	bgc_probability
+sequence_id	protein_id	start	end	strand	domain	hmm	i_evalue	pvalue	domain_start	domain_end	cluster_probability
 BGC0001866.1	BGC0001866.1_1	347	1489	-	PF00394	Pfam	2.262067179461254e-08	8.178117062405111e-12	1	63	0.9791890143072265
 BGC0001866.1	BGC0001866.1_1	347	1489	-	PF07731	Pfam	4.059222969454281e-23	1.467542649838858e-26	150	281	0.9791890143072265
 BGC0001866.1	BGC0001866.1_6	3946	4389	+	PF00891	Pfam	4.890642309934635e-16	1.7681280946979883e-19	17	121	0.9955095513800687
--- a/test-data/sideload.json	Wed Aug 10 12:36:38 2022 +0000
+++ b/test-data/sideload.json	Mon Jan 16 18:35:56 2023 +0000
@@ -27,11 +27,13 @@
             "e-filter": "None",
             "edge-distance": "0",
             "mask": "False",
+            "no-pad": "False",
+            "p-filter": "1e-09",
             "postproc": "'gecco'",
             "threshold": "0.8"
         },
         "description": "Biosynthetic Gene Cluster prediction with Conditional Random Fields.",
         "name": "GECCO",
-        "version": "0.9.1"
+        "version": "0.9.6"
     }
 }
\ No newline at end of file