Repository 'gecco'
hg clone https://toolshed.g2.bx.psu.edu/repos/althonos/gecco

Changeset 19:cc91d730cc4f (2023-01-16)
Previous changeset 18:3dd71eaa2909 (2022-08-10) Next changeset 20:64b724dd8d04 (2023-03-27)
Commit message:
Fix syntax of Galaxy script for GECCO
modified:
CHANGELOG.md
gecco.xml
test-data/BGC0001866.1_cluster_1.gbk
test-data/clusters.tsv
test-data/features.tsv
test-data/sideload.json
diff -r 3dd71eaa2909 -r cc91d730cc4f CHANGELOG.md
--- a/CHANGELOG.md Wed Aug 10 12:36:38 2022 +0000
+++ b/CHANGELOG.md Mon Jan 16 18:35:56 2023 +0000
@@ -5,11 +5,31 @@
 and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
 
 ## [Unreleased]
-[Unreleased]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.5...master
+[Unreleased]: https://github.com/zellerlab/GECCO/compare/v0.9.6...master
+
+
+## [v0.9.6] - 2023-01-11
+[v0.9.6]: https://github.com/zellerlab/GECCO/compare/v0.9.5...v0.9.6
+
+### Added
+- Gene Ontology annotations to `gecco.interpro` local metadata.
+- Reference to Gene Ontology terms and derived functions to `gecco.model.Domain` objects.
+- Gene color based on predicted function in `gecco.model.Gene.to_seq_feature`.
+
+### Fixed
+- Missing `gzip` import in the CLI preventing usage of gzip-compressed inputs.
+- Invalid coordinates of domains found in reverse-strand genes.
+- Detection of entry points with `importlib.metadata` on older Python versions.
+
+### Changed
+- `bgc_id` columns of cluster tables are renamed `cluster_id`.
+- `gecco.model.ProductType` is renamed to `gecco.model.ClusterType`.
+- Bumped `pyrodigal` dependency to `v2.0`.
+- Bumped `pyhmmer` dependency to `v0.7`.
 
 
 ## [v0.9.5] - 2022-08-10
-[v0.9.5]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.4...v0.9.5
+[v0.9.5]: https://github.com/zellerlab/GECCO/compare/v0.9.4...v0.9.5
 
 ### Added
 - `gecco predict` command to predict BGCs from an annotated genome.
@@ -21,7 +41,7 @@
 
 
 ## [v0.9.4] - 2022-05-31
-[v0.9.4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.3...v0.9.4
+[v0.9.4]: https://github.com/zellerlab/GECCO/compare/v0.9.3...v0.9.4
 
 ### Added
 - `classes_` property to `TypeClassifier` to access the `classes_` attribute of the `TypeBinarizer`.
@@ -39,7 +59,7 @@
 
 
 ## [v0.9.3] - 2022-05-13
-[v0.9.3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.2...v0.9.3
+[v0.9.3]: https://github.com/zellerlab/GECCO/compare/v0.9.2...v0.9.3
 
 ### Changed
 - `--format` flag of `gecco annotate` and `gecco run` CLI commands is now made lowercase before giving value to `Bio.SeqIO`.
@@ -49,20 +69,20 @@
 
 
 ## [v0.9.2] - 2022-04-11
-[v0.9.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1...v0.9.2
+[v0.9.2]: https://github.com/zellerlab/GECCO/compare/v0.9.1...v0.9.2
 
 ### Added
 - Padding of short sequences with empty genes when predicting probabilities in `ClusterCRF`.
 
 ## [v0.9.1] - 2022-04-05
-[v0.9.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1-alpha4...v0.9.1
+[v0.9.1]: https://github.com/zellerlab/GECCO/compare/v0.9.1-alpha4...v0.9.1
 
 ### Changed
 - Make the `genes.tsv` and `features.tsv` table contain all genes even when they come from a contig too short to be processed by the CRF sliding window.
 - Replaced the `--force-clusters-tsv` flag with a `--force-tsv` flag to force writing TSV tables even when no genes or clusters were found in `gecco run` or `gecco annotate`.
 
 ## [v0.9.1-alpha4] - 2022-03-31
-[v0.9.1-alpha4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1-alpha3...v0.9.1-alpha4
+[v0.9.1-alpha4]: https://github.com/zellerlab/GECCO/compare/v0.9.1-alpha3...v0.9.1-alpha4
 
 Retrain internal model with:
 ```
@@ -74,7 +94,7 @@
 ```
 
 ## [v0.9.1-alpha3] - 2022-03-23
-[v0.9.1-alpha3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1-alpha2...v0.9.1-alpha3
+[v0.9.1-alpha3]: https://github.com/zellerlab/GECCO/compare/v0.9.1-alpha2...v0.9.1-alpha3
 
 ### Added
 - `gecco.model.GeneTable` class to store gene coordinates independently of protein domains.
@@ -85,33 +105,33 @@
 - `gecco train` expects a gene table instead of a GFF file for the gene coordinates.
 
 ## [v0.9.1-alpha2] - 2022-03-23
-[v0.9.1-alpha2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1-alpha1...v0.9.1-alpha2
+[v0.9.1-alpha2]: https://github.com/zellerlab/GECCO/compare/v0.9.1-alpha1...v0.9.1-alpha2
 
 ### Fixed
 - `TypeClassifier.trained` not being able to read unknown types from type tables.
 
 ## [v0.9.1-alpha1] - 2022-03-20
-[v0.9.1-alpha1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.10...v0.9.1-alpha1
+[v0.9.1-alpha1]: https://github.co
[...]
`gecco cv` now requires *training* dependencies.
 
 ## [v0.4.5] - 2020-11-23
-[v0.4.5]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.4...v0.4.5
+[v0.4.5]: https://github.com/zellerlab/GECCO/compare/v0.4.4...v0.4.5
 ### Added
 - Additional `fold` column to cross-validation table output.
 ### Changed
@@ -309,7 +329,7 @@
 - `gecco.orf` was rewritten to extract genes from input sequences in parallel.
 
 ## [v0.4.4] - 2020-09-30
-[v0.4.4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.3...v0.4.4
+[v0.4.4]: https://github.com/zellerlab/GECCO/compare/v0.4.3...v0.4.4
 ### Added
 - `gecco cv loto` command to run LOTO cross-validation using BGC types
   for stratification.
@@ -325,7 +345,7 @@
 - Bumped `pandas` training dependency to `v1.0`.
 
 ## [v0.4.3] - 2020-09-07
-[v0.4.3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.2...v0.4.3
+[v0.4.3]: https://github.com/zellerlab/GECCO/compare/v0.4.2...v0.4.3
 ### Fixed
 - GenBank files being written with invalid `/cds` feature type.
 ### Changed
@@ -333,18 +353,18 @@
   and breaks the current code.
 
 ## [v0.4.2] - 2020-08-07
-[v0.4.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.1...v0.4.2
+[v0.4.2]: https://github.com/zellerlab/GECCO/compare/v0.4.1...v0.4.2
 ### Fixed
 - `TypeClassifier.predict_types` using inverse type probabilities when
   given several clusters to process.
 
 ## [v0.4.1] - 2020-08-07
-[v0.4.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.0...v0.4.1
+[v0.4.1]: https://github.com/zellerlab/GECCO/compare/v0.4.0...v0.4.1
 ### Fixed
 - `gecco run` command crashing on input sequences not containing any genes.
 
 ## [v0.4.0] - 2020-08-06
-[v0.4.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.3.0...v0.4.0
+[v0.4.0]: https://github.com/zellerlab/GECCO/compare/v0.3.0...v0.4.0
 ### Added
 - `gecco.model.ProductType` enum to model the biosynthetic class of a BGC.
 ### Removed
@@ -356,7 +376,7 @@
    table to know the types of the input BGCs.
 
 ## [v0.3.0] - 2020-08-03
-[v0.3.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.2.2...v0.3.0
+[v0.3.0]: https://github.com/zellerlab/GECCO/compare/v0.2.2...v0.3.0
 ### Changed
 - Replaced Nearest-Neighbours classifier with Random Forest to perform type
   prediction for candidate BGCs.
@@ -367,7 +387,7 @@
 - `--metric` argument to the `gecco run` CLI command.
 
 ## [v0.2.2] - 2020-07-31
-[v0.2.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.2.1...v0.2.2
+[v0.2.2]: https://github.com/zellerlab/GECCO/compare/v0.2.1...v0.2.2
 ### Changed
 - `Domain` and `Gene` can now carry qualifiers that are used when they
   are translated to a sequence feature.
@@ -376,7 +396,7 @@
   in GenBank output files.
 
 ## [v0.2.1] - 2020-07-23
-[v0.2.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.2.0...v0.2.1
+[v0.2.1]: https://github.com/zellerlab/GECCO/compare/v0.2.0...v0.2.1
 ### Fixed
 - Various potential crashes in `ClusterRefiner` code.
 ### Removed
@@ -384,7 +404,7 @@
   Fisher Exact Test feature selection.
 
 ## [v0.2.0] - 2020-07-23
-[v0.2.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.1.1...v0.2.0
+[v0.2.0]: https://github.com/zellerlab/GECCO/compare/v0.1.1...v0.2.0
 ### Fixed
 - `pandas` warning about unsorted columns in `gecco run`.
 ### Removed
@@ -397,7 +417,7 @@
   contain any domain annotation.
 
 ## [v0.1.1] - 2020-07-22
-[v0.1.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.1.0...v0.1.1
+[v0.1.1]: https://github.com/zellerlab/GECCO/compare/v0.1.0...v0.1.1
 ### Added
 - `ClusterCRF.predict_probabilities` to annotate a list of `Gene`.
 ### Changed
@@ -410,9 +430,9 @@
 - Included the `CHANGELOG.md` file to the generated docs.
 
 ## [v0.1.0] - 2020-07-17
-[v0.1.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.0.1...v0.1.0
+[v0.1.0]: https://github.com/zellerlab/GECCO/compare/v0.0.1...v0.1.0
 Initial release.
 
 ## [v0.0.1] - 2018-08-13
-[v0.0.1]: https://git.embl.de/grp-zeller/GECCO/compare/37afb97...v0.0.1
+[v0.0.1]: https://github.com/zellerlab/GECCO/compare/37afb97...v0.0.1
 Proof-of-concept.
diff -r 3dd71eaa2909 -r cc91d730cc4f gecco.xml
--- a/gecco.xml Wed Aug 10 12:36:38 2022 +0000
+++ b/gecco.xml Mon Jan 16 18:35:56 2023 +0000
@@ -1,8 +1,17 @@
 <?xml version='1.0' encoding='utf-8'?>
-<tool id="gecco" name="GECCO" version="0.9.1" python_template_version="3.5">
+<tool id="gecco" name="GECCO" version="0.9.6" python_template_version="3.5">
     <description>is a fast and scalable method for identifying putative novel Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data using Conditional Random Fields (CRFs).</description>
+    <creator>
+        <organization name="Zeller Team" url="https://www.embl.org/groups/zeller/"/>
+    </creator>
+    <edam_topics>
+        <edam_topic>topic_0080</edam_topic>
+    </edam_topics>
+    <edam_operations>
+        <edam_operation>operation_0415</edam_operation>
+    </edam_operations>
     <requirements>
-        <requirement type="package" version="0.9.1">gecco</requirement>
+        <requirement type="package" version="0.9.6">gecco</requirement>
     </requirements>
     <version_command>gecco --version</version_command>
     <command detect_errors="aggressive"><![CDATA[
@@ -34,6 +43,9 @@
         #if $antismash_sideload:
             --antismash-sideload
         #end if
+        #unless $pad:
+            --no-pad
+        #end unless
 
         && mv input_tempfile.genes.tsv '$genes'
         && mv input_tempfile.features.tsv '$features'
@@ -46,6 +58,7 @@
     <inputs>
         <param name="input" type="data" format="genbank,fasta,embl" label="Sequence file in GenBank, EMBL or FASTA format"/>
         <param argument="--mask" type="boolean" checked="false" label="Enable masking of regions with unknown nucleotides when finding ORFs"/>
+        <param argument="--pad" type="boolean" checked="true" label="Enable padding of gene sequences smaller than the CRF window length"/>
         <param argument="--cds" type="integer" min="0" value="" optional="true" label="Minimum number of genes required for a cluster"/>
         <param argument="--threshold" type="float" min="0" max="1" value="" optional="true" label="Probability threshold for cluster detection"/>
         <param argument="--postproc" type="select" label="Post-processing method for gene cluster validation">
@@ -72,10 +85,10 @@
             <output name="features" file="features.tsv"/>
             <output name="genes" file="genes.tsv"/>
             <output name="clusters" file="clusters.tsv"/>
+            <param name="edge_distance" value="10"/>
         </test>
         <test>
             <param name="input" value="BGC0001866.fna"/>
-            <param name="edge_distance" value="0"/>
             <output name="features" file="features.tsv"/>
             <output name="genes" file="genes.tsv"/>
             <output name="clusters" file="clusters.tsv"/>
@@ -86,7 +99,6 @@
         <test>
             <param name="input" value="BGC0001866.fna"/>
             <param name="antismash_sideload" value="True"/>
-            <param name="edge_distance" value="0"/>
             <output name="features" file="features.tsv"/>
             <output name="genes" file="genes.tsv"/>
             <output name="clusters" file="clusters.tsv"/>
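The `#unless $pad:` block added to the command template inverts the new Galaxy boolean: padding is on by default (`checked="true"`), and the tool only emits `--no-pad` when the user unchecks it. A minimal Python sketch of that mapping, assuming a hypothetical helper named `pad_flags` that is not part of the wrapper or of GECCO itself:

```python
def pad_flags(pad: bool) -> list[str]:
    """Return the extra CLI flags implied by the Galaxy `pad` boolean.

    Mirrors the template logic `#unless $pad: --no-pad`: nothing is added
    when padding is enabled, `--no-pad` is added when it is disabled.
    `pad_flags` is a hypothetical name used only for illustration.
    """
    return [] if pad else ["--no-pad"]


# Default wrapper state (pad checked): no flag is appended.
default_flags = pad_flags(True)
# User disabled padding: the opt-out flag is appended.
disabled_flags = pad_flags(False)
print(default_flags, disabled_flags)
```

Because the wrapper exposes the positive option (`--pad`) while the flag it emits is the negative one (`--no-pad`), the inversion in the template is the part worth checking during review.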
diff -r 3dd71eaa2909 -r cc91d730cc4f test-data/BGC0001866.1_cluster_1.gbk
--- a/test-data/BGC0001866.1_cluster_1.gbk Wed Aug 10 12:36:38 2022 +0000
+++ b/test-data/BGC0001866.1_cluster_1.gbk Mon Jan 16 18:35:56 2023 +0000
@@ -1,4 +1,4 @@
-LOCUS       BGC0001866.1_cluster_1 32633 bp    DNA     linear   UNK 06-APR-2022
+LOCUS       BGC0001866.1_cluster_1 32633 bp    DNA     linear   UNK 16-JAN-2023
 DEFINITION  BGC0001866.1 Byssochlamys spectabilis strain CBS 101075 chromosome
             Unknown C8Q69scaffold_14, whole genome shotgun sequence.
 ACCESSION   BGC0001866.1_cluster_1
@@ -15,19 +15,19 @@
   JOURNAL   bioRxiv (2021.05.03.442509)
   REMARK    doi:10.1101/2021.05.03.442509
 COMMENT     ##GECCO-Data-START##
-            version                :: GECCO v0.9.1
-            creation_date          :: 2022-04-06T01:08:36.965708
-            biosyn_class           :: Polyketide
-            alkaloid_probability   :: 0.010000000000000009
-            polyketide_probability :: 0.96
-            ripp_probability       :: 0.0
-            saccharide_probability :: 0.0
-            terpene_probability    :: 0.010000000000000009
-            nrp_probability        :: 0.14
+            version                :: GECCO v0.9.6
+            creation_date          :: 2023-01-16T17:20:45.175113
+            cluster_type           :: Polyketide
+            alkaloid_probability   :: 0.010
+            nrp_probability        :: 0.140
+            polyketide_probability :: 0.960
+            ripp_probability       :: 0.000
+            saccharide_probability :: 0.000
+            terpene_probability    :: 0.010
             ##GECCO-Data-END##
 FEATURES             Location/Qualifiers
      CDS             complement(1..1143)
-                     /inference="ab initio prediction:Prodigal:2.6"
+                     /inference="ab initio prediction:Pyrodigal:2.0.4"
                      /transl_table=11
                      /locus_tag="BGC0001866.1_1"
                      /translation="MWIYEVDGHYIEPRRADTFLIWAGERYSAMIRLDKKPMDYSIRVP
@@ -37,98 +37,134 @@
                      QPESFNMVNPPYRDTFLTEFTGAMWVVLRYQVTSPGAWLLHCHFEMHLDNGMAMAILDG
                      VDKWPHVPPEYTQGFHGFREHELPGPAGFWGLVSKILRPESLVWAGGAAVVLLSLFIGG
                      LWRLWQRRMQGTYYVLSQEDERDRFSMDKEAWKSEETKRM*"
-     misc_feature    1..189
+                     /function="binding"
+                     /function="catalytic activity"
+                     /colour="129 14 21"
+                     /ApEinfo_fwdcolor="#810e15"
+                     /ApEinfo_revcolor="#810e15"
+     misc_feature    955..1143
                      /inference="protein motif"
                      /db_xref="PFAM:PF00394"
                      /db_xref="InterPro:IPR001117"
                      /note="e-value: 2.262067179461254e-08"
                      /note="p-value: 8.178117062405111e-12"
-                     /function="Multicopper oxidase"
+                     /function="Multicopper oxidase, type 1"
                      /standard_name="PF00394"
-     misc_feature    448..843
+     misc_feature    301..696
                      /inference="protein motif"
                      /db_xref="PFAM:PF07731"
                      /db_xref="InterPro:IPR011706"
+                     /db_xref="GO:0005507"
+                     /db_xref="GO:0016491"
                      /note="e-value: 4.059222969454281e-23"
                      /note="p-value: 1.467542649838858e-26"
-                     /function="Multicopper oxidase"
+                     /function="Multicopper oxidase, C-terminal"
                      /standard_name="PF07731"
      CDS             1179..1670
-                     /inference="ab initio prediction:Prodigal:2.6"
+                     /inference="ab initio prediction:Pyrodigal:2.0.4"
                      /transl_table=11
                      /locus_tag="BGC0001866.1_2"
                      /translation="MSSLRSSSHSPSGLPGQPRLPLLDRSREHSLPGDRAGWRTRSRLR
                      ATDLLSMVRMGSTYTIIRDMNYTDDESPGRSPFVCDSVIRPALVHERDLLVNKPLMART
                      IDAPFAVEKNTIDATDFISQSTRNVLISVHWNHTRSAVGCLHLLLYTGSSCSSPSQKAS
                      *"
+                     /function="unknown"
+                     /colo
[...]
ference="ab initio prediction:Prodigal:2.6"
+                     /inference="ab initio prediction:Pyrodigal:2.0.4"
                      /transl_table=11
                      /locus_tag="BGC0001866.1_22"
                      /translation="MVIEKALMPLNAGPQLLRVTASLIWSEKEASVRFYSVDVRRPSSK
@@ -505,16 +604,20 @@
                      YRFNGPMAYNMVQALAEFHPDYRCIDETILDNETLEAACTVSFGNVKKEGVFHTHPGYI
                      DGLTQSGGFVMNANDKTNLGVEVFVNHGWDSFQLYEPVTDDRSYQTHVRMRPAESNQWK
                      GDVVVLSGENLVACVRGLTVSRET*"
+                     /function="unknown"
+                     /colour="128 128 128"
+                     /ApEinfo_fwdcolor="#808080"
+                     /ApEinfo_revcolor="#808080"
      misc_feature    29918..30535
                      /inference="protein motif"
                      /db_xref="PFAM:PF14765"
                      /db_xref="InterPro:IPR020807"
                      /note="e-value: 8.019334685871699e-11"
                      /note="p-value: 2.8992533209948296e-14"
-                     /function="Polyketide synthase dehydratase"
+                     /function="Polyketide synthase, dehydratase domain"
                      /standard_name="PF14765"
      CDS             30591..32633
-                     /inference="ab initio prediction:Prodigal:2.6"
+                     /inference="ab initio prediction:Pyrodigal:2.0.4"
                      /transl_table=11
                      /locus_tag="BGC0001866.1_23"
                      /translation="MLTTFQIQGVPRRVLRYILQSSAKTTQTATSSVPAPSQAPVMVPQ
@@ -529,13 +632,17 @@
                      LAYRAAQILQKAAANPQKPVVESLLLLDSPPPTGLGKLPKHFFDYCDQIGIFGQGTAKA
                      PEWLITHFQGTNSVLHEYHATPFSFGTAPRTGIIWASQTVFETRAVAPPPVRPDDTEDM
                      KFLTERRTDFSAGSWGHMFPGTEVLIETAYGADHFSLLVSLLFRD*"
+                     /function="unknown"
+                     /colour="128 128 128"
+                     /ApEinfo_fwdcolor="#808080"
+                     /ApEinfo_revcolor="#808080"
      misc_feature    30789..30974
                      /inference="protein motif"
                      /db_xref="PFAM:PF00550"
                      /db_xref="InterPro:IPR009081"
                      /note="e-value: 6.066413293337807e-14"
                      /note="p-value: 2.193207987468477e-17"
-                     /function="Phosphopantetheine attachment site"
+                     /function="Phosphopantetheine binding ACP domain"
                      /standard_name="PF00550"
      misc_feature    31110..31304
                      /inference="protein motif"
@@ -543,7 +650,7 @@
                      /db_xref="InterPro:IPR009081"
                      /note="e-value: 4.042537132792419e-10"
                      /note="p-value: 1.461510170930014e-13"
-                     /function="Phosphopantetheine attachment site"
+                     /function="Phosphopantetheine binding ACP domain"
                      /standard_name="PF00550"
      misc_feature    31485..31670
                      /inference="protein motif"
@@ -551,15 +658,16 @@
                      /db_xref="InterPro:IPR009081"
                      /note="e-value: 1.4101442109719659e-08"
                      /note="p-value: 5.098135252971677e-12"
-                     /function="Phosphopantetheine attachment site"
+                     /function="Phosphopantetheine binding ACP domain"
                      /standard_name="PF00550"
      misc_feature    31917..32240
                      /inference="protein motif"
                      /db_xref="PFAM:PF00975"
                      /db_xref="InterPro:IPR001031"
+                     /db_xref="GO:0009058"
                      /note="e-value: 6.91897478936856e-24"
                      /note="p-value: 2.5014370171252933e-27"
-                     /function="Thioesterase domain"
+                     /function="Thioesterase"
                      /standard_name="PF00975"
 ORIGIN
         1 ttacatccgc ttagtctcct cggacttcca tgcttccttg tccattgaga aacgatccct
diff -r 3dd71eaa2909 -r cc91d730cc4f test-data/clusters.tsv
--- a/test-data/clusters.tsv Wed Aug 10 12:36:38 2022 +0000
+++ b/test-data/clusters.tsv Mon Jan 16 18:35:56 2023 +0000
@@ -1,2 +1,2 @@
-sequence_id bgc_id start end average_p max_p type alkaloid_probability polyketide_probability ripp_probability saccharide_probability terpene_probability nrp_probability proteins domains
-BGC0001866.1 BGC0001866.1_cluster_1 347 32979 0.9958958770931704 0.9999999976946022 Polyketide 0.010000000000000009 0.96 0.0 0.0 0.010000000000000009 0.14 BGC0001866.1_1;BGC0001866.1_2;BGC0001866.1_3;BGC0001866.1_4;BGC0001866.1_5;BGC0001866.1_6;BGC0001866.1_7;BGC0001866.1_8;BGC0001866.1_9;BGC0001866.1_10;BGC0001866.1_11;BGC0001866.1_12;BGC0001866.1_13;BGC0001866.1_14;BGC0001866.1_15;BGC0001866.1_16;BGC0001866.1_17;BGC0001866.1_18;BGC0001866.1_19;BGC0001866.1_20;BGC0001866.1_21;BGC0001866.1_22;BGC0001866.1_23 PF00106;PF00107;PF00109;PF00135;PF00394;PF00550;PF00698;PF00743;PF00891;PF00975;PF02801;PF06609;PF07690;PF07731;PF08241;PF08242;PF08493;PF08659;PF13434;PF13489;PF13649;PF13847;PF14765;PF16073;PF16197
+sequence_id cluster_id start end average_p max_p type alkaloid_probability nrp_probability polyketide_probability ripp_probability saccharide_probability terpene_probability proteins domains
+BGC0001866.1 BGC0001866.1_cluster_1 347 32979 0.9958958770931705 0.9999999976946022 Polyketide 0.010000000000000009 0.14 0.96 0.0 0.0 0.010000000000000009 BGC0001866.1_1;BGC0001866.1_2;BGC0001866.1_3;BGC0001866.1_4;BGC0001866.1_5;BGC0001866.1_6;BGC0001866.1_7;BGC0001866.1_8;BGC0001866.1_9;BGC0001866.1_10;BGC0001866.1_11;BGC0001866.1_12;BGC0001866.1_13;BGC0001866.1_14;BGC0001866.1_15;BGC0001866.1_16;BGC0001866.1_17;BGC0001866.1_18;BGC0001866.1_19;BGC0001866.1_20;BGC0001866.1_21;BGC0001866.1_22;BGC0001866.1_23 PF00106;PF00107;PF00109;PF00135;PF00394;PF00550;PF00698;PF00743;PF00891;PF00975;PF02801;PF06609;PF07690;PF07731;PF08241;PF08242;PF08493;PF08659;PF13434;PF13489;PF13649;PF13847;PF14765;PF16073;PF16197
diff -r 3dd71eaa2909 -r cc91d730cc4f test-data/features.tsv
--- a/test-data/features.tsv Wed Aug 10 12:36:38 2022 +0000
+++ b/test-data/features.tsv Mon Jan 16 18:35:56 2023 +0000
@@ -1,4 +1,4 @@
-sequence_id protein_id start end strand domain hmm i_evalue pvalue domain_start domain_end bgc_probability
+sequence_id protein_id start end strand domain hmm i_evalue pvalue domain_start domain_end cluster_probability
 BGC0001866.1 BGC0001866.1_1 347 1489 - PF00394 Pfam 2.262067179461254e-08 8.178117062405111e-12 1 63 0.9791890143072265
 BGC0001866.1 BGC0001866.1_1 347 1489 - PF07731 Pfam 4.059222969454281e-23 1.467542649838858e-26 150 281 0.9791890143072265
 BGC0001866.1 BGC0001866.1_6 3946 4389 + PF00891 Pfam 4.890642309934635e-16 1.7681280946979883e-19 17 121 0.9955095513800687
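The header changes in `clusters.tsv` and `features.tsv` rename `bgc_id` to `cluster_id` and `bgc_probability` to `cluster_probability`, and `clusters.tsv` also moves `nrp_probability` to a different position. Downstream scripts that read these tables by column position or by the old names may therefore break. The following is a minimal sketch of a reader that accepts both header variants, assuming tab-separated files (as the `.tsv` extension suggests); `read_gecco_table` and `RENAMED_COLUMNS` are hypothetical names and not part of GECCO:

```python
import csv
import io

# Old-to-new column names, taken from the header changes in the test-data diffs.
RENAMED_COLUMNS = {
    "bgc_id": "cluster_id",
    "bgc_probability": "cluster_probability",
}


def read_gecco_table(handle):
    """Read a TSV table and return rows keyed by the v0.9.6 column names.

    Columns are looked up by name, so the reordering of the
    `*_probability` columns does not matter to the caller.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        {RENAMED_COLUMNS.get(key, key): value for key, value in row.items()}
        for row in reader
    ]


# Usage with a small in-memory table that still uses the pre-v0.9.6 header.
old_table = io.StringIO(
    "sequence_id\tbgc_id\tstart\tend\n"
    "BGC0001866.1\tBGC0001866.1_cluster_1\t347\t32979\n"
)
rows = read_gecco_table(old_table)
print(rows[0]["cluster_id"])
```

Reading by name rather than by index keeps the same code working on tables written before and after this changeset.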
diff -r 3dd71eaa2909 -r cc91d730cc4f test-data/sideload.json
--- a/test-data/sideload.json Wed Aug 10 12:36:38 2022 +0000
+++ b/test-data/sideload.json Mon Jan 16 18:35:56 2023 +0000
@@ -27,11 +27,13 @@
             "e-filter": "None",
             "edge-distance": "0",
             "mask": "False",
+            "no-pad": "False",
+            "p-filter": "1e-09",
             "postproc": "'gecco'",
             "threshold": "0.8"
         },
         "description": "Biosynthetic Gene Cluster prediction with Conditional Random Fields.",
         "name": "GECCO",
-        "version": "0.9.1"
+        "version": "0.9.6"
     }
 }
\ No newline at end of file