Previous changeset 18:3dd71eaa2909 (2022-08-10) Next changeset 20:64b724dd8d04 (2023-03-27)
Commit message:
Fix syntax of Galaxy script for GECCO
modified:
CHANGELOG.md gecco.xml test-data/BGC0001866.1_cluster_1.gbk test-data/clusters.tsv test-data/features.tsv test-data/sideload.json
diff -r 3dd71eaa2909 -r cc91d730cc4f CHANGELOG.md
--- a/CHANGELOG.md Wed Aug 10 12:36:38 2022 +0000
+++ b/CHANGELOG.md Mon Jan 16 18:35:56 2023 +0000
@@ -5,11 +5,31 @@
 and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
 
 ## [Unreleased]
-[Unreleased]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.5...master
+[Unreleased]: https://github.com/zellerlab/GECCO/compare/v0.9.6...master
+
+
+## [v0.9.6] - 2023-01-11
+[v0.9.6]: https://github.com/zellerlab/GECCO/compare/v0.9.5...v0.9.6
+
+### Added
+- Gene Ontology annotations to `gecco.interpro` local metadata.
+- Reference to Gene Ontology terms and derived functions to `gecco.model.Domain` objects.
+- Gene color based on predicted function in `gecco.model.Gene.to_seq_feature`.
+
+### Fixed
+- Missing `gzip` import in the CLI preventing usage of gzip-compressed inputs.
+- Invalid coordinates of domains found in reverse-strand genes.
+- Detection of entry points with `importlib.metadata` on older Python versions.
+
+### Changed
+- `bgc_id` columns of cluster tables are renamed `cluster_id`.
+- `gecco.model.ProductType` is renamed to `gecco.model.ClusterType`.
+- Bumped `pyrodigal` dependency to `v2.0`.
+- Bumped `pyhmmer` dependency to `v0.7`.
 
 
 ## [v0.9.5] - 2022-08-10
-[v0.9.5]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.4...v0.9.5
+[v0.9.5]: https://github.com/zellerlab/GECCO/compare/v0.9.4...v0.9.5
 
 ### Added
 - `gecco predict` command to predict BGCs from an annotated genome.
@@ -21,7 +41,7 @@
 
 
 ## [v0.9.4] - 2022-05-31
-[v0.9.4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.3...v0.9.4
+[v0.9.4]: https://github.com/zellerlab/GECCO/compare/v0.9.3...v0.9.4
 
 ### Added
 - `classes_` property to `TypeClassifier` to access the `classes_` attribute of the `TypeBinarizer`.
@@ -39,7 +59,7 @@
 
 
 ## [v0.9.3] - 2022-05-13
-[v0.9.3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.2...v0.9.3
+[v0.9.3]: https://github.com/zellerlab/GECCO/compare/v0.9.2...v0.9.3
 
 ### Changed
 - `--format` flag of `gecco annotate` and `gecco run` CLI commands is now made lowercase before giving value to `Bio.SeqIO`.
@@ -49,20 +69,20 @@
 
 
 ## [v0.9.2] - 2022-04-11
-[v0.9.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1...v0.9.2
+[v0.9.2]: https://github.com/zellerlab/GECCO/compare/v0.9.1...v0.9.2
 
 ### Added
 - Padding of short sequences with empty genes when predicting probabilities in `ClusterCRF`.
 
 ## [v0.9.1] - 2022-04-05
-[v0.9.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1-alpha4...v0.9.1
+[v0.9.1]: https://github.com/zellerlab/GECCO/compare/v0.9.1-alpha4...v0.9.1
 
 ### Changed
 - Make the `genes.tsv` and `features.tsv` table contain all genes even when they come from a contig too short to be processed by the CRF sliding window.
 - Replaced the `--force-clusters-tsv` flag with a `--force-tsv` flag to force writing TSV tables even when no genes or clusters were found in `gecco run` or `gecco annotate`.
 
 ## [v0.9.1-alpha4] - 2022-03-31
-[v0.9.1-alpha4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1-alpha3...v0.9.1-alpha4
+[v0.9.1-alpha4]: https://github.com/zellerlab/GECCO/compare/v0.9.1-alpha3...v0.9.1-alpha4
 
 Retrain internal model with:
 ```
@@ -74,7 +94,7 @@
 ```
 
 ## [v0.9.1-alpha3] - 2022-03-23
-[v0.9.1-alpha3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1-alpha2...v0.9.1-alpha3
+[v0.9.1-alpha3]: https://github.com/zellerlab/GECCO/compare/v0.9.1-alpha2...v0.9.1-alpha3
 
 ### Added
 - `gecco.model.GeneTable` class to store gene coordinates independently of protein domains.
@@ -85,33 +105,33 @@
 - `gecco train` expects a gene table instead of a GFF file for the gene coordinates.
 
 ## [v0.9.1-alpha2] - 2022-03-23
-[v0.9.1-alpha2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1-alpha1...v0.9.1-alpha2
+[v0.9.1-alpha2]: https://github.com/zellerlab/GECCO/compare/v0.9.1-alpha1...v0.9.1-alpha2
 
 ### Fixed
 - `TypeClassifier.trained` not being able to read unknown types from type tables.
 
 ## [v0.9.1-alpha1] - 2022-03-20
-[v0.9.1-alpha1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.10...v0.9.1-alpha1
+[v0.9.1-alpha1]: https://github.co
[...]
`gecco cv` now requires *training* dependencies.
 
 ## [v0.4.5] - 2020-11-23
-[v0.4.5]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.4...v0.4.5
+[v0.4.5]: https://github.com/zellerlab/GECCO/compare/v0.4.4...v0.4.5
 ### Added
 - Additional `fold` column to cross-validation table output.
 ### Changed
@@ -309,7 +329,7 @@
 - `gecco.orf` was rewritten to extract genes from input sequences in parallel.
 
 ## [v0.4.4] - 2020-09-30
-[v0.4.4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.3...v0.4.4
+[v0.4.4]: https://github.com/zellerlab/GECCO/compare/v0.4.3...v0.4.4
 ### Added
 - `gecco cv loto` command to run LOTO cross-validation using BGC types
 for stratification.
@@ -325,7 +345,7 @@
 - Bumped `pandas` training dependency to `v1.0`.
 
 ## [v0.4.3] - 2020-09-07
-[v0.4.3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.2...v0.4.3
+[v0.4.3]: https://github.com/zellerlab/GECCO/compare/v0.4.2...v0.4.3
 ### Fixed
 - GenBank files being written with invalid `/cds` feature type.
 ### Changed
@@ -333,18 +353,18 @@
 and breaks the current code.
 
 ## [v0.4.2] - 2020-08-07
-[v0.4.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.1...v0.4.2
+[v0.4.2]: https://github.com/zellerlab/GECCO/compare/v0.4.1...v0.4.2
 ### Fixed
 - `TypeClassifier.predict_types` using inverse type probabilities when
 given several clusters to process.
 
 ## [v0.4.1] - 2020-08-07
-[v0.4.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.0...v0.4.1
+[v0.4.1]: https://github.com/zellerlab/GECCO/compare/v0.4.0...v0.4.1
 ### Fixed
 - `gecco run` command crashing on input sequences not containing any genes.
 
 ## [v0.4.0] - 2020-08-06
-[v0.4.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.3.0...v0.4.0
+[v0.4.0]: https://github.com/zellerlab/GECCO/compare/v0.3.0...v0.4.0
 ### Added
 - `gecco.model.ProductType` enum to model the biosynthetic class of a BGC.
 ### Removed
@@ -356,7 +376,7 @@
 table to know the types of the input BGCs.
 
 ## [v0.3.0] - 2020-08-03
-[v0.3.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.2.2...v0.3.0
+[v0.3.0]: https://github.com/zellerlab/GECCO/compare/v0.2.2...v0.3.0
 ### Changed
 - Replaced Nearest-Neighbours classifier with Random Forest to perform type
 prediction for candidate BGCs.
@@ -367,7 +387,7 @@
 - `--metric` argument to the `gecco run` CLI command.
 
 ## [v0.2.2] - 2020-07-31
-[v0.2.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.2.1...v0.2.2
+[v0.2.2]: https://github.com/zellerlab/GECCO/compare/v0.2.1...v0.2.2
 ### Changed
 - `Domain` and `Gene` can now carry qualifiers that are used when they
 are translated to a sequence feature.
@@ -376,7 +396,7 @@
 in GenBank output files.
 
 ## [v0.2.1] - 2020-07-23
-[v0.2.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.2.0...v0.2.1
+[v0.2.1]: https://github.com/zellerlab/GECCO/compare/v0.2.0...v0.2.1
 ### Fixed
 - Various potential crashes in `ClusterRefiner` code.
 ### Removed
@@ -384,7 +404,7 @@
 Fisher Exact Test feature selection.
 
 ## [v0.2.0] - 2020-07-23
-[v0.2.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.1.1...v0.2.0
+[v0.2.0]: https://github.com/zellerlab/GECCO/compare/v0.1.1...v0.2.0
 ### Fixed
 - `pandas` warning about unsorted columns in `gecco run`.
 ### Removed
@@ -397,7 +417,7 @@
 contain any domain annotation.
 
 ## [v0.1.1] - 2020-07-22
-[v0.1.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.1.0...v0.1.1
+[v0.1.1]: https://github.com/zellerlab/GECCO/compare/v0.1.0...v0.1.1
 ### Added
 - `ClusterCRF.predict_probabilities` to annotate a list of `Gene`.
 ### Changed
@@ -410,9 +430,9 @@
 - Included the `CHANGELOG.md` file to the generated docs.
 
 ## [v0.1.0] - 2020-07-17
-[v0.1.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.0.1...v0.1.0
+[v0.1.0]: https://github.com/zellerlab/GECCO/compare/v0.0.1...v0.1.0
 Initial release.
 
 ## [v0.0.1] - 2018-08-13
-[v0.0.1]: https://git.embl.de/grp-zeller/GECCO/compare/37afb97...v0.0.1
+[v0.0.1]: https://github.com/zellerlab/GECCO/compare/37afb97...v0.0.1
 Proof-of-concept.
diff -r 3dd71eaa2909 -r cc91d730cc4f gecco.xml
--- a/gecco.xml Wed Aug 10 12:36:38 2022 +0000
+++ b/gecco.xml Mon Jan 16 18:35:56 2023 +0000
@@ -1,8 +1,17 @@
 <?xml version='1.0' encoding='utf-8'?>
-<tool id="gecco" name="GECCO" version="0.9.1" python_template_version="3.5">
+<tool id="gecco" name="GECCO" version="0.9.6" python_template_version="3.5">
 <description>is a fast and scalable method for identifying putative novel Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data using Conditional Random Fields (CRFs).</description>
+ <creator>
+ <organization name="Zeller Team" url="https://www.embl.org/groups/zeller/"/>
+ </creator>
+ <edam_topics>
+ <edam_topic>topic_0080</edam_topic>
+ </edam_topics>
+ <edam_operations>
+ <edam_operation>operation_0415</edam_operation>
+ </edam_operations>
 <requirements>
- <requirement type="package" version="0.9.1">gecco</requirement>
+ <requirement type="package" version="0.9.6">gecco</requirement>
 </requirements>
 <version_command>gecco --version</version_command>
 <command detect_errors="aggressive"><![CDATA[
@@ -34,6 +43,9 @@
 #if $antismash_sideload:
 --antismash-sideload
 #end if
+ #unless $pad:
+ --no-pad
+ #end unless
 && mv input_tempfile.genes.tsv '$genes'
 && mv input_tempfile.features.tsv '$features'
@@ -46,6 +58,7 @@
 <inputs>
 <param name="input" type="data" format="genbank,fasta,embl" label="Sequence file in GenBank, EMBL or FASTA format"/>
 <param argument="--mask" type="boolean" checked="false" label="Enable masking of regions with unknown nucleotides when finding ORFs"/>
+ <param argument="--pad" type="boolean" checked="true" label="Enable padding of gene sequences smaller than the CRF window length"/>
 <param argument="--cds" type="integer" min="0" value="" optional="true" label="Minimum number of genes required for a cluster"/>
 <param argument="--threshold" type="float" min="0" max="1" value="" optional="true" label="Probability threshold for cluster detection"/>
 <param argument="--postproc" type="select" label="Post-processing method for gene cluster validation">
@@ -72,10 +85,10 @@
 <output name="features" file="features.tsv"/>
 <output name="genes" file="genes.tsv"/>
 <output name="clusters" file="clusters.tsv"/>
+ <param name="edge_distance" value="10"/>
 </test>
 <test>
 <param name="input" value="BGC0001866.fna"/>
- <param name="edge_distance" value="0"/>
 <output name="features" file="features.tsv"/>
 <output name="genes" file="genes.tsv"/>
 <output name="clusters" file="clusters.tsv"/>
@@ -86,7 +99,6 @@
 <test>
 <param name="input" value="BGC0001866.fna"/>
 <param name="antismash_sideload" value="True"/>
- <param name="edge_distance" value="0"/>
 <output name="features" file="features.tsv"/>
 <output name="genes" file="genes.tsv"/>
 <output name="clusters" file="clusters.tsv"/>
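The new `--pad` input is a checkbox that defaults to checked, while the command template only emits a flag (`--no-pad`) when it is unchecked. The inversion can be sketched in Python as follows. `optional_flags` is a hypothetical helper that mirrors only the two conditionals visible in this hunk; it is not part of the wrapper and does not build the full command line.

```python
def optional_flags(pad: bool = True, antismash_sideload: bool = False) -> list[str]:
    """Mirror the two template conditionals shown in the gecco.xml hunk (illustrative only)."""
    flags = []
    # '#if $antismash_sideload:' emits the flag when the option is set.
    if antismash_sideload:
        flags.append("--antismash-sideload")
    # '#unless $pad:' emits the negative flag only when padding is disabled.
    if not pad:
        flags.append("--no-pad")
    return flags


print(optional_flags())            # default form values: no extra flags
print(optional_flags(pad=False))   # padding disabled in the form
```

With the default form values nothing is appended, so existing workflows keep the padding behaviour without passing any new flag.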
diff -r 3dd71eaa2909 -r cc91d730cc4f test-data/BGC0001866.1_cluster_1.gbk
--- a/test-data/BGC0001866.1_cluster_1.gbk Wed Aug 10 12:36:38 2022 +0000
+++ b/test-data/BGC0001866.1_cluster_1.gbk Mon Jan 16 18:35:56 2023 +0000
@@ -1,4 +1,4 @@
-LOCUS BGC0001866.1_cluster_1 32633 bp DNA linear UNK 06-APR-2022
+LOCUS BGC0001866.1_cluster_1 32633 bp DNA linear UNK 16-JAN-2023
 DEFINITION BGC0001866.1 Byssochlamys spectabilis strain CBS 101075 chromosome
 Unknown C8Q69scaffold_14, whole genome shotgun sequence.
 ACCESSION BGC0001866.1_cluster_1
@@ -15,19 +15,19 @@
 JOURNAL bioRxiv (2021.05.03.442509)
 REMARK doi:10.1101/2021.05.03.442509
 COMMENT ##GECCO-Data-START##
- version :: GECCO v0.9.1
- creation_date :: 2022-04-06T01:08:36.965708
- biosyn_class :: Polyketide
- alkaloid_probability :: 0.010000000000000009
- polyketide_probability :: 0.96
- ripp_probability :: 0.0
- saccharide_probability :: 0.0
- terpene_probability :: 0.010000000000000009
- nrp_probability :: 0.14
+ version :: GECCO v0.9.6
+ creation_date :: 2023-01-16T17:20:45.175113
+ cluster_type :: Polyketide
+ alkaloid_probability :: 0.010
+ nrp_probability :: 0.140
+ polyketide_probability :: 0.960
+ ripp_probability :: 0.000
+ saccharide_probability :: 0.000
+ terpene_probability :: 0.010
 ##GECCO-Data-END##
 FEATURES Location/Qualifiers
 CDS complement(1..1143)
- /inference="ab initio prediction:Prodigal:2.6"
+ /inference="ab initio prediction:Pyrodigal:2.0.4"
 /transl_table=11
 /locus_tag="BGC0001866.1_1"
 /translation="MWIYEVDGHYIEPRRADTFLIWAGERYSAMIRLDKKPMDYSIRVP
@@ -37,98 +37,134 @@
 QPESFNMVNPPYRDTFLTEFTGAMWVVLRYQVTSPGAWLLHCHFEMHLDNGMAMAILDG
 VDKWPHVPPEYTQGFHGFREHELPGPAGFWGLVSKILRPESLVWAGGAAVVLLSLFIGG
 LWRLWQRRMQGTYYVLSQEDERDRFSMDKEAWKSEETKRM*"
- misc_feature 1..189
+ /function="binding"
+ /function="catalytic activity"
+ /colour="129 14 21"
+ /ApEinfo_fwdcolor="#810e15"
+ /ApEinfo_revcolor="#810e15"
+ misc_feature 955..1143
 /inference="protein motif"
 /db_xref="PFAM:PF00394"
 /db_xref="InterPro:IPR001117"
 /note="e-value: 2.262067179461254e-08"
 /note="p-value: 8.178117062405111e-12"
- /function="Multicopper oxidase"
+ /function="Multicopper oxidase, type 1"
 /standard_name="PF00394"
- misc_feature 448..843
+ misc_feature 301..696
 /inference="protein motif"
 /db_xref="PFAM:PF07731"
 /db_xref="InterPro:IPR011706"
+ /db_xref="GO:0005507"
+ /db_xref="GO:0016491"
 /note="e-value: 4.059222969454281e-23"
 /note="p-value: 1.467542649838858e-26"
- /function="Multicopper oxidase"
+ /function="Multicopper oxidase, C-terminal"
 /standard_name="PF07731"
 CDS 1179..1670
- /inference="ab initio prediction:Prodigal:2.6"
+ /inference="ab initio prediction:Pyrodigal:2.0.4"
 /transl_table=11
 /locus_tag="BGC0001866.1_2"
 /translation="MSSLRSSSHSPSGLPGQPRLPLLDRSREHSLPGDRAGWRTRSRLR
 ATDLLSMVRMGSTYTIIRDMNYTDDESPGRSPFVCDSVIRPALVHERDLLVNKPLMART
 IDAPFAVEKNTIDATDFISQSTRNVLISVHWNHTRSAVGCLHLLLYTGSSCSSPSQKAS
 *"
+ /function="unknown"
+ /colo
[...]
ference="ab initio prediction:Prodigal:2.6"
+ /inference="ab initio prediction:Pyrodigal:2.0.4"
 /transl_table=11
 /locus_tag="BGC0001866.1_22"
 /translation="MVIEKALMPLNAGPQLLRVTASLIWSEKEASVRFYSVDVRRPSSK
@@ -505,16 +604,20 @@
 YRFNGPMAYNMVQALAEFHPDYRCIDETILDNETLEAACTVSFGNVKKEGVFHTHPGYI
 DGLTQSGGFVMNANDKTNLGVEVFVNHGWDSFQLYEPVTDDRSYQTHVRMRPAESNQWK
 GDVVVLSGENLVACVRGLTVSRET*"
+ /function="unknown"
+ /colour="128 128 128"
+ /ApEinfo_fwdcolor="#808080"
+ /ApEinfo_revcolor="#808080"
 misc_feature 29918..30535
 /inference="protein motif"
 /db_xref="PFAM:PF14765"
 /db_xref="InterPro:IPR020807"
 /note="e-value: 8.019334685871699e-11"
 /note="p-value: 2.8992533209948296e-14"
- /function="Polyketide synthase dehydratase"
+ /function="Polyketide synthase, dehydratase domain"
 /standard_name="PF14765"
 CDS 30591..32633
- /inference="ab initio prediction:Prodigal:2.6"
+ /inference="ab initio prediction:Pyrodigal:2.0.4"
 /transl_table=11
 /locus_tag="BGC0001866.1_23"
 /translation="MLTTFQIQGVPRRVLRYILQSSAKTTQTATSSVPAPSQAPVMVPQ
@@ -529,13 +632,17 @@
 LAYRAAQILQKAAANPQKPVVESLLLLDSPPPTGLGKLPKHFFDYCDQIGIFGQGTAKA
 PEWLITHFQGTNSVLHEYHATPFSFGTAPRTGIIWASQTVFETRAVAPPPVRPDDTEDM
 KFLTERRTDFSAGSWGHMFPGTEVLIETAYGADHFSLLVSLLFRD*"
+ /function="unknown"
+ /colour="128 128 128"
+ /ApEinfo_fwdcolor="#808080"
+ /ApEinfo_revcolor="#808080"
 misc_feature 30789..30974
 /inference="protein motif"
 /db_xref="PFAM:PF00550"
 /db_xref="InterPro:IPR009081"
 /note="e-value: 6.066413293337807e-14"
 /note="p-value: 2.193207987468477e-17"
- /function="Phosphopantetheine attachment site"
+ /function="Phosphopantetheine binding ACP domain"
 /standard_name="PF00550"
 misc_feature 31110..31304
 /inference="protein motif"
@@ -543,7 +650,7 @@
 /db_xref="InterPro:IPR009081"
 /note="e-value: 4.042537132792419e-10"
 /note="p-value: 1.461510170930014e-13"
- /function="Phosphopantetheine attachment site"
+ /function="Phosphopantetheine binding ACP domain"
 /standard_name="PF00550"
 misc_feature 31485..31670
 /inference="protein motif"
@@ -551,15 +658,16 @@
 /db_xref="InterPro:IPR009081"
 /note="e-value: 1.4101442109719659e-08"
 /note="p-value: 5.098135252971677e-12"
- /function="Phosphopantetheine attachment site"
+ /function="Phosphopantetheine binding ACP domain"
 /standard_name="PF00550"
 misc_feature 31917..32240
 /inference="protein motif"
 /db_xref="PFAM:PF00975"
 /db_xref="InterPro:IPR001031"
+ /db_xref="GO:0009058"
 /note="e-value: 6.91897478936856e-24"
 /note="p-value: 2.5014370171252933e-27"
- /function="Thioesterase domain"
+ /function="Thioesterase"
 /standard_name="PF00975"
 ORIGIN
 1 ttacatccgc ttagtctcct cggacttcca tgcttccttg tccattgaga aacgatccct
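The `misc_feature` coordinate changes under the first gene (`complement(1..1143)`) are consistent with the changelog entry "Invalid coordinates of domains found in reverse-strand genes". Taking the protein-level domain positions from `test-data/features.tsv` (1..63 and 150..281), the old values are what a forward-strand mapping produces, and the new values are the same spans counted back from the gene end. The sketch below reproduces that arithmetic. `domain_to_nucleotide` is a hypothetical helper written for illustration, not GECCO's own code, and it assumes 1-based inclusive coordinates and three nucleotides per residue.

```python
def domain_to_nucleotide(gene_start, gene_end, strand, domain_start, domain_end):
    """Map 1-based inclusive protein coordinates of a domain onto nucleotide coordinates.

    Hypothetical illustration of the arithmetic implied by the diff above.
    """
    offset_start = (domain_start - 1) * 3  # nucleotides before the first residue
    offset_end = domain_end * 3            # nucleotides up to the last residue
    if strand == "+":
        # Forward strand: count from the gene start.
        return gene_start + offset_start, gene_start + offset_end - 1
    if strand == "-":
        # Reverse strand: the protein starts at the gene end, so count backwards.
        return gene_end - offset_end + 1, gene_end - offset_start
    raise ValueError(f"unknown strand: {strand!r}")


# Gene BGC0001866.1_1 spans 1..1143 on the reverse strand in the cluster record.
print(domain_to_nucleotide(1, 1143, "-", 1, 63))     # (955, 1143), the new PF00394 location
print(domain_to_nucleotide(1, 1143, "-", 150, 281))  # (301, 696), the new PF07731 location

# Treating the same gene as forward-strand gives the old, incorrect locations.
print(domain_to_nucleotide(1, 1143, "+", 1, 63))     # (1, 189)
print(domain_to_nucleotide(1, 1143, "+", 150, 281))  # (448, 843)
```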
diff -r 3dd71eaa2909 -r cc91d730cc4f test-data/clusters.tsv
--- a/test-data/clusters.tsv Wed Aug 10 12:36:38 2022 +0000
+++ b/test-data/clusters.tsv Mon Jan 16 18:35:56 2023 +0000
@@ -1,2 +1,2 @@
-sequence_id bgc_id start end average_p max_p type alkaloid_probability polyketide_probability ripp_probability saccharide_probability terpene_probability nrp_probability proteins domains
-BGC0001866.1 BGC0001866.1_cluster_1 347 32979 0.9958958770931704 0.9999999976946022 Polyketide 0.010000000000000009 0.96 0.0 0.0 0.010000000000000009 0.14 BGC0001866.1_1;BGC0001866.1_2;BGC0001866.1_3;BGC0001866.1_4;BGC0001866.1_5;BGC0001866.1_6;BGC0001866.1_7;BGC0001866.1_8;BGC0001866.1_9;BGC0001866.1_10;BGC0001866.1_11;BGC0001866.1_12;BGC0001866.1_13;BGC0001866.1_14;BGC0001866.1_15;BGC0001866.1_16;BGC0001866.1_17;BGC0001866.1_18;BGC0001866.1_19;BGC0001866.1_20;BGC0001866.1_21;BGC0001866.1_22;BGC0001866.1_23 PF00106;PF00107;PF00109;PF00135;PF00394;PF00550;PF00698;PF00743;PF00891;PF00975;PF02801;PF06609;PF07690;PF07731;PF08241;PF08242;PF08493;PF08659;PF13434;PF13489;PF13649;PF13847;PF14765;PF16073;PF16197
+sequence_id cluster_id start end average_p max_p type alkaloid_probability nrp_probability polyketide_probability ripp_probability saccharide_probability terpene_probability proteins domains
+BGC0001866.1 BGC0001866.1_cluster_1 347 32979 0.9958958770931705 0.9999999976946022 Polyketide 0.010000000000000009 0.14 0.96 0.0 0.0 0.010000000000000009 BGC0001866.1_1;BGC0001866.1_2;BGC0001866.1_3;BGC0001866.1_4;BGC0001866.1_5;BGC0001866.1_6;BGC0001866.1_7;BGC0001866.1_8;BGC0001866.1_9;BGC0001866.1_10;BGC0001866.1_11;BGC0001866.1_12;BGC0001866.1_13;BGC0001866.1_14;BGC0001866.1_15;BGC0001866.1_16;BGC0001866.1_17;BGC0001866.1_18;BGC0001866.1_19;BGC0001866.1_20;BGC0001866.1_21;BGC0001866.1_22;BGC0001866.1_23 PF00106;PF00107;PF00109;PF00135;PF00394;PF00550;PF00698;PF00743;PF00891;PF00975;PF02801;PF06609;PF07690;PF07731;PF08241;PF08242;PF08493;PF08659;PF13434;PF13489;PF13649;PF13847;PF14765;PF16073;PF16197
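Two things change in this table: `bgc_id` becomes `cluster_id`, and the probability columns are reordered (`nrp_probability` moves from last to second). Downstream code that reads columns by position, or by the old name, would break. A minimal sketch of a tolerant reader is shown below. `read_gecco_table` and `RENAMED_COLUMNS` are hypothetical names, and the sample rows are trimmed to a handful of columns for brevity; only the column names themselves come from the diffs in this changeset.

```python
import csv
import io

# Old header name -> new header name, as seen in the clusters.tsv and
# features.tsv diffs of this changeset.
RENAMED_COLUMNS = {
    "bgc_id": "cluster_id",
    "bgc_probability": "cluster_probability",
}


def read_gecco_table(handle):
    """Read a tab-separated table into dicts keyed by the new column names."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        {RENAMED_COLUMNS.get(name, name): value for name, value in row.items()}
        for row in reader
    ]


# Trimmed samples (assumption: only a subset of the real columns is shown).
old_table = (
    "sequence_id\tbgc_id\ttype\tpolyketide_probability\tnrp_probability\n"
    "BGC0001866.1\tBGC0001866.1_cluster_1\tPolyketide\t0.96\t0.14\n"
)
new_table = (
    "sequence_id\tcluster_id\ttype\tnrp_probability\tpolyketide_probability\n"
    "BGC0001866.1\tBGC0001866.1_cluster_1\tPolyketide\t0.14\t0.96\n"
)

old_rows = read_gecco_table(io.StringIO(old_table))
new_rows = read_gecco_table(io.StringIO(new_table))

# Access by name is unaffected by the rename or by the column reordering.
assert old_rows == new_rows
print(new_rows[0]["cluster_id"], new_rows[0]["nrp_probability"])
```

Keying rows by header name keeps the reader independent of column order, which is the part of this change that a positional parser would silently get wrong.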
diff -r 3dd71eaa2909 -r cc91d730cc4f test-data/features.tsv
--- a/test-data/features.tsv Wed Aug 10 12:36:38 2022 +0000
+++ b/test-data/features.tsv Mon Jan 16 18:35:56 2023 +0000
@@ -1,4 +1,4 @@
-sequence_id protein_id start end strand domain hmm i_evalue pvalue domain_start domain_end bgc_probability
+sequence_id protein_id start end strand domain hmm i_evalue pvalue domain_start domain_end cluster_probability
 BGC0001866.1 BGC0001866.1_1 347 1489 - PF00394 Pfam 2.262067179461254e-08 8.178117062405111e-12 1 63 0.9791890143072265
 BGC0001866.1 BGC0001866.1_1 347 1489 - PF07731 Pfam 4.059222969454281e-23 1.467542649838858e-26 150 281 0.9791890143072265
 BGC0001866.1 BGC0001866.1_6 3946 4389 + PF00891 Pfam 4.890642309934635e-16 1.7681280946979883e-19 17 121 0.9955095513800687
diff -r 3dd71eaa2909 -r cc91d730cc4f test-data/sideload.json
--- a/test-data/sideload.json Wed Aug 10 12:36:38 2022 +0000
+++ b/test-data/sideload.json Mon Jan 16 18:35:56 2023 +0000
@@ -27,11 +27,13 @@
 "e-filter": "None",
 "edge-distance": "0",
 "mask": "False",
+ "no-pad": "False",
+ "p-filter": "1e-09",
 "postproc": "'gecco'",
 "threshold": "0.8"
 },
 "description": "Biosynthetic Gene Cluster prediction with Conditional Random Fields.",
 "name": "GECCO",
- "version": "0.9.1"
+ "version": "0.9.6"
 }
 }
\ No newline at end of file