diff CHANGELOG.md @ 19:cc91d730cc4f draft
Fix syntax of Galaxy script for GECCO
author | althonos |
---|---|
date | Mon, 16 Jan 2023 18:35:56 +0000 |
parents | 3dd71eaa2909 |
children | 6ba37b7dea42 |
--- a/CHANGELOG.md Wed Aug 10 12:36:38 2022 +0000
+++ b/CHANGELOG.md Mon Jan 16 18:35:56 2023 +0000
@@ -5,11 +5,31 @@
 and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
 ## [Unreleased]
-[Unreleased]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.5...master
+[Unreleased]: https://github.com/zellerlab/GECCO/compare/v0.9.6...master
+
+
+## [v0.9.6] - 2023-01-11
+[v0.9.6]: https://github.com/zellerlab/GECCO/compare/v0.9.5...v0.9.6
+
+### Added
+- Gene Ontology annotations to `gecco.interpro` local metadata.
+- Reference to Gene Ontology terms and derived functions to `gecco.model.Domain` objects.
+- Gene color based on predicted function in `gecco.model.Gene.to_seq_feature`.
+
+### Fixed
+- Missing `gzip` import in the CLI preventing usage of gzip-compressed inputs.
+- Invalid coordinates of domains found in reverse-strand genes.
+- Detection of entry points with `importlib.metadata` on older Python versions.
+
+### Changed
+- `bgc_id` columns of cluster tables are renamed `cluster_id`.
+- `gecco.model.ProductType` is renamed to `gecco.model.ClusterType`.
+- Bumped `pyrodigal` dependency to `v2.0`.
+- Bumped `pyhmmer` dependency to `v0.7`.
 ## [v0.9.5] - 2022-08-10
-[v0.9.5]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.4...v0.9.5
+[v0.9.5]: https://github.com/zellerlab/GECCO/compare/v0.9.4...v0.9.5
 ### Added
 - `gecco predict` command to predict BGCs from an annotated genome.
@@ -21,7 +41,7 @@
 ## [v0.9.4] - 2022-05-31
-[v0.9.4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.3...v0.9.4
+[v0.9.4]: https://github.com/zellerlab/GECCO/compare/v0.9.3...v0.9.4
 ### Added
 - `classes_` property to `TypeClassifier` to access the `classes_` attribute of the `TypeBinarizer`.
@@ -39,7 +59,7 @@
 ## [v0.9.3] - 2022-05-13
-[v0.9.3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.2...v0.9.3
+[v0.9.3]: https://github.com/zellerlab/GECCO/compare/v0.9.2...v0.9.3
 ### Changed
 - `--format` flag of `gecco annotate` and `gecco run` CLI commands is now made lowercase before giving value to `Bio.SeqIO`.
@@ -49,20 +69,20 @@
 ## [v0.9.2] - 2022-04-11
-[v0.9.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1...v0.9.2
+[v0.9.2]: https://github.com/zellerlab/GECCO/compare/v0.9.1...v0.9.2
 ### Added
 - Padding of short sequences with empty genes when predicting probabilities in `ClusterCRF`.
 ## [v0.9.1] - 2022-04-05
-[v0.9.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1-alpha4...v0.9.1
+[v0.9.1]: https://github.com/zellerlab/GECCO/compare/v0.9.1-alpha4...v0.9.1
 ### Changed
 - Make the `genes.tsv` and `features.tsv` table contain all genes even when they come from a contig too short to be processed by the CRF sliding window.
 - Replaced the `--force-clusters-tsv` flag with a `--force-tsv` flag to force writing TSV tables even when no genes or clusters were found in `gecco run` or `gecco annotate`.
 ## [v0.9.1-alpha4] - 2022-03-31
-[v0.9.1-alpha4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1-alpha3...v0.9.1-alpha4
+[v0.9.1-alpha4]: https://github.com/zellerlab/GECCO/compare/v0.9.1-alpha3...v0.9.1-alpha4
 Retrain internal model with:
 ```
@@ -74,7 +94,7 @@
 ```
 ## [v0.9.1-alpha3] - 2022-03-23
-[v0.9.1-alpha3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1-alpha2...v0.9.1-alpha3
+[v0.9.1-alpha3]: https://github.com/zellerlab/GECCO/compare/v0.9.1-alpha2...v0.9.1-alpha3
 ### Added
 - `gecco.model.GeneTable` class to store gene coordinates independently of protein domains.
@@ -85,33 +105,33 @@
 - `gecco train` expects a gene table instead of a GFF file for the gene coordinates.
 ## [v0.9.1-alpha2] - 2022-03-23
-[v0.9.1-alpha2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1-alpha1...v0.9.1-alpha2
+[v0.9.1-alpha2]: https://github.com/zellerlab/GECCO/compare/v0.9.1-alpha1...v0.9.1-alpha2
 ### Fixed
 - `TypeClassifier.trained` not being able to read unknown types from type tables.
 ## [v0.9.1-alpha1] - 2022-03-20
-[v0.9.1-alpha1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.10...v0.9.1-alpha1
+[v0.9.1-alpha1]: https://github.com/zellerlab/GECCO/compare/v0.8.10...v0.9.1-alpha1
 Candidate release with support for a sliding window in the CRF prediction algorithm.
 ## [v0.8.10] - 2022-02-23
-[v0.8.10]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.9...v0.8.10
+[v0.8.10]: https://github.com/zellerlab/GECCO/compare/v0.8.9...v0.8.10
 ### Fixed
 - `--antismash-sideload` flag of `gecco run` causing command to crash.
 ## [v0.8.9] - 2022-02-22
-[v0.8.9]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.8...v0.8.9
+[v0.8.9]: https://github.com/zellerlab/GECCO/compare/v0.8.8...v0.8.9
 ### Removed
 - Prediction and support for the *Other* biosynthetic type of MIBiG clusters.
 ## [v0.8.8] - 2022-02-21
-[v0.8.8]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.7...v0.8.8
+[v0.8.8]: https://github.com/zellerlab/GECCO/compare/v0.8.7...v0.8.8
 ### Fixed
 - `ClusterRefiner` filtering method for edge genes not working as intended.
 - `gecco run` and `gecco annotate` commands crashing on missing input files instead of nicely rendering the error.
 ## [v0.8.7] - 2022-02-18
-[v0.8.7]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.6...v0.8.7
+[v0.8.7]: https://github.com/zellerlab/GECCO/compare/v0.8.6...v0.8.7
 ### Fixed
 - `interpro.json` metadata file not being included in distribution files.
 - Missing docstring for `Protein.with_domains` method.
@@ -119,7 +139,7 @@
 - Bump minimum `scikit-learn` version to `v1.0` for Python3.7+.
 ## [v0.8.6] - 2022-02-17 - YANKED
-[v0.8.6]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.5...v0.8.6
+[v0.8.6]: https://github.com/zellerlab/GECCO/compare/v0.8.5...v0.8.6
 ### Added
 - CLI flag for enabling region masking for contigs processed by Prodigal.
 - CLI flag for controlling region distance used for edge distance filtering.
@@ -133,12 +153,12 @@
 - Progress bar messages are now in consistent format.
 ## [v0.8.5] - 2021-11-21
-[v0.8.5]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.4...v0.8.5
+[v0.8.5]: https://github.com/zellerlab/GECCO/compare/v0.8.4...v0.8.5
 ### Added
 - Minimal compatibility support for running GECCO inside of Galaxy workflows.
 ## [v0.8.4] - 2021-09-26
-[v0.8.4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.3-post1...v0.8.4
+[v0.8.4]: https://github.com/zellerlab/GECCO/compare/v0.8.3-post1...v0.8.4
 ### Fixed
 - `gecco convert gbk --format bigslice` failing to run because of outdated code ([#5](https://github.com/zellerlab/GECCO/issues/5)).
 - `gecco convert gbk --format bigslice` not creating files with names conforming to BiG-SLiCE expected input.
@@ -146,17 +166,17 @@
 - Bump minimum `pyrodigal` version to `v0.6.2` to use platform-accelerated code if supported.
 ## [v0.8.3-post1] - 2021-08-23
-[v0.8.3-post1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.3...v0.8.3-post1
+[v0.8.3-post1]: https://github.com/zellerlab/GECCO/compare/v0.8.3...v0.8.3-post1
 ### Fixed
 - Wrong default value for `--threshold` being shown in `gecco run` help message.
 ## [v0.8.3] - 2021-08-23
-[v0.8.3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.2...v0.8.3
+[v0.8.3]: https://github.com/zellerlab/GECCO/compare/v0.8.2...v0.8.3
 ### Changed
 - Default probability threshold for segmentation to 0.3 (from 0.4).
 ## [v0.8.2] - 2021-07-31
-[v0.8.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.1...v0.8.2
+[v0.8.2]: https://github.com/zellerlab/GECCO/compare/v0.8.1...v0.8.2
 ### Fixed
 - `gecco run` crashing on Python 3.6 because of missing `contextlib.nullcontext` class.
 ### Changed
@@ -164,7 +184,7 @@
 - `PyHMMER.run` now reports the *p-value* of each domain in addition to the *e-value* as a `/note` qualifier.
 ## [v0.8.1] - 2021-07-29
-[v0.8.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.0...v0.8.1
+[v0.8.1]: https://github.com/zellerlab/GECCO/compare/v0.8.0...v0.8.1
 ### Changed
 - `gecco run` now filters out unneeded features before annotating, making it easier to analyze the results of a run with a custom `--model`.
 ### Fixed
@@ -173,7 +193,7 @@
 - Missing documentation for the `strand` attribute of `gecco.model.Gene`.
 ## [v0.8.0] - 2021-07-03
-[v0.8.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.7.0...v0.8.0
+[v0.8.0]: https://github.com/zellerlab/GECCO/compare/v0.7.0...v0.8.0
 ### Changed
 - Retrain internal model using new sequence embeddings and remove broken/duplicate BGCs from MIBiG 2.0.
 - Bump minimum `pyhmmer` version to `v0.4.0` to improve exception handling.
@@ -195,7 +215,7 @@
 - Tigrfam domains, which is not improving performance on the new training data.
 ## [v0.7.0] - 2021-05-31
-[v0.7.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.6.3...v0.7.0
+[v0.7.0]: https://github.com/zellerlab/GECCO/compare/v0.6.3...v0.7.0
 ### Added
 - Support for writing an AntiSMASH sideload JSON file after a `gecco run` workflow.
 - Code for converting GenBank files in BiG-SLiCE compatible format with the `gecco convert` subcommand.
@@ -207,7 +227,7 @@
 - Outdated notice about `-vvv` verbosity level in the help message of the main `gecco` command.
 ## [v0.6.3] - 2021-05-10
-[v0.6.3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.6.2...v0.6.3
+[v0.6.3]: https://github.com/zellerlab/GECCO/compare/v0.6.2...v0.6.3
 ### Fixed
 - HMMER annotation not properly handling inputs with multiple contigs.
 - Some progress bar totals displaying as floats in the CLI.
@@ -218,7 +238,7 @@
 - `multiprocessing.cpu_count` has been replaced with `os.cpu_count` where applicable.
 ## [v0.6.2] - 2021-05-04
-[v0.6.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.6.1...v0.6.2
+[v0.6.2]: https://github.com/zellerlab/GECCO/compare/v0.6.1...v0.6.2
 ### Fixed
 - `gecco cv loto` crashing because of outdated code.
 ### Changed
@@ -227,7 +247,7 @@
 - GECCO bioRxiv paper reference to `Cluster.to_seq_record` output record.
 ## [v0.6.1] - 2021-03-15
-[v0.6.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.6.0...v0.6.1
+[v0.6.1]: https://github.com/zellerlab/GECCO/compare/v0.6.0...v0.6.1
 ### Fixed
 - Progress bar not being disabled by `-q` flag in CLI.
 - Fallback to using HMM name if accession is not available in `PyHMMER`.
@@ -239,7 +259,7 @@
 - Unused and outdated `HMMER` and `DomainRow` classes from `gecco.hmmer`.
 ## [v0.6.0] - 2021-02-28
-[v0.6.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.5...v0.6.0
+[v0.6.0]: https://github.com/zellerlab/GECCO/compare/v0.5.5...v0.6.0
 ### Changed
 - Updated internal model with a cleaned-up version of the MIBiG-2.0 Pfam-33.1/Tigrfam-15.0 embedding.
@@ -250,12 +270,12 @@
 protein IDs.
 ## [v0.5.5] - 2021-02-28
-[v0.5.5]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.4...v0.5.5
+[v0.5.5]: https://github.com/zellerlab/GECCO/compare/v0.5.4...v0.5.5
 ### Fixed
 - `gecco cv` bug causing only the last fold to be written.
 ## [v0.5.4] - 2021-02-28
-[v0.5.4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.3...v0.5.4
+[v0.5.4]: https://github.com/zellerlab/GECCO/compare/v0.5.3...v0.5.4
 ### Changed
 - Replaced `verboselogs`, `coloredlogs` and `better-exceptions` with `rich`.
 ### Removed
@@ -265,7 +285,7 @@
 - `gecco embed` to embed BGCs into non-BGC regions using feature tables.
 ## [v0.5.3] - 2021-02-21
-[v0.5.3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.2...v0.5.3
+[v0.5.3]: https://github.com/zellerlab/GECCO/compare/v0.5.2...v0.5.3
 ### Fixed
 - Coordinates of genes in output GenBank files.
 - Potential issue with the number of CPUs in `PyHMMER.run`.
@@ -273,7 +293,7 @@
 - Bump required `pyrodigal` version to `v0.4.2` to fix buffer overflow.
 ## [v0.5.2] - 2021-01-29
-[v0.5.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.1...v0.5.2
+[v0.5.2]: https://github.com/zellerlab/GECCO/compare/v0.5.1...v0.5.2
 ### Added
 - Support for downloading HMM files directly from GitHub releases assets.
 - Validation of filtered HMMs with MD5 checksum.
@@ -284,13 +304,13 @@
 - Bump required `pyhmmer` version to `v0.2.1`.
 ## [v0.5.1] - 2021-01-15
-[v0.5.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.0...v0.5.1
+[v0.5.1]: https://github.com/zellerlab/GECCO/compare/v0.5.0...v0.5.1
 ### Fixed
 - `--hmm` flag being ignored in in `gecco run` command.
 - `PyHMMER` using HMM names instead of accessions, causing issues with Pfam HMMs.
 ## [v0.5.0] - 2021-01-11
-[v0.5.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.5...v0.5.0
+[v0.5.0]: https://github.com/zellerlab/GECCO/compare/v0.4.5...v0.5.0
 ### Added
 - Explicit support for Python 3.9.
 ### Changed
@@ -300,7 +320,7 @@
 - `gecco cv` now requires *training* dependencies.
 ## [v0.4.5] - 2020-11-23
-[v0.4.5]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.4...v0.4.5
+[v0.4.5]: https://github.com/zellerlab/GECCO/compare/v0.4.4...v0.4.5
 ### Added
 - Additional `fold` column to cross-validation table output.
 ### Changed
@@ -309,7 +329,7 @@
 - `gecco.orf` was rewritten to extract genes from input sequences in parallel.
 ## [v0.4.4] - 2020-09-30
-[v0.4.4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.3...v0.4.4
+[v0.4.4]: https://github.com/zellerlab/GECCO/compare/v0.4.3...v0.4.4
 ### Added
 - `gecco cv loto` command to run LOTO cross-validation using BGC types for stratification.
@@ -325,7 +345,7 @@
 - Bumped `pandas` training dependency to `v1.0`.
 ## [v0.4.3] - 2020-09-07
-[v0.4.3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.2...v0.4.3
+[v0.4.3]: https://github.com/zellerlab/GECCO/compare/v0.4.2...v0.4.3
 ### Fixed
 - GenBank files being written with invalid `/cds` feature type.
 ### Changed
@@ -333,18 +353,18 @@
 and breaks the current code.
 ## [v0.4.2] - 2020-08-07
-[v0.4.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.1...v0.4.2
+[v0.4.2]: https://github.com/zellerlab/GECCO/compare/v0.4.1...v0.4.2
 ### Fixed
 - `TypeClassifier.predict_types` using inverse type probabilities when given several clusters to process.
 ## [v0.4.1] - 2020-08-07
-[v0.4.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.0...v0.4.1
+[v0.4.1]: https://github.com/zellerlab/GECCO/compare/v0.4.0...v0.4.1
 ### Fixed
 - `gecco run` command crashing on input sequences not containing any genes.
 ## [v0.4.0] - 2020-08-06
-[v0.4.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.3.0...v0.4.0
+[v0.4.0]: https://github.com/zellerlab/GECCO/compare/v0.3.0...v0.4.0
 ### Added
 - `gecco.model.ProductType` enum to model the biosynthetic class of a BGC.
 ### Removed
@@ -356,7 +376,7 @@
 table to know the types of the input BGCs.
 ## [v0.3.0] - 2020-08-03
-[v0.3.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.2.2...v0.3.0
+[v0.3.0]: https://github.com/zellerlab/GECCO/compare/v0.2.2...v0.3.0
 ### Changed
 - Replaced Nearest-Neighbours classifier with Random Forest to perform type prediction for candidate BGCs.
@@ -367,7 +387,7 @@
 - `--metric` argument to the `gecco run` CLI command.
 ## [v0.2.2] - 2020-07-31
-[v0.2.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.2.1...v0.2.2
+[v0.2.2]: https://github.com/zellerlab/GECCO/compare/v0.2.1...v0.2.2
 ### Changed
 - `Domain` and `Gene` can now carry qualifiers that are used when they are translated to a sequence feature.
@@ -376,7 +396,7 @@
 in GenBank output files.
 ## [v0.2.1] - 2020-07-23
-[v0.2.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.2.0...v0.2.1
+[v0.2.1]: https://github.com/zellerlab/GECCO/compare/v0.2.0...v0.2.1
 ### Fixed
 - Various potential crashes in `ClusterRefiner` code.
 ### Removed
@@ -384,7 +404,7 @@
 Fisher Exact Test feature selection.
 ## [v0.2.0] - 2020-07-23
-[v0.2.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.1.1...v0.2.0
+[v0.2.0]: https://github.com/zellerlab/GECCO/compare/v0.1.1...v0.2.0
 ### Fixed
 - `pandas` warning about unsorted columns in `gecco run`.
 ### Removed
@@ -397,7 +417,7 @@
 contain any domain annotation.
 ## [v0.1.1] - 2020-07-22
-[v0.1.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.1.0...v0.1.1
+[v0.1.1]: https://github.com/zellerlab/GECCO/compare/v0.1.0...v0.1.1
 ### Added
 - `ClusterCRF.predict_probabilities` to annotate a list of `Gene`.
 ### Changed
@@ -410,9 +430,9 @@
 - Included the `CHANGELOG.md` file to the generated docs.
 ## [v0.1.0] - 2020-07-17
-[v0.1.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.0.1...v0.1.0
+[v0.1.0]: https://github.com/zellerlab/GECCO/compare/v0.0.1...v0.1.0
 Initial release.
 ## [v0.0.1] - 2018-08-13
-[v0.0.1]: https://git.embl.de/grp-zeller/GECCO/compare/37afb97...v0.0.1
+[v0.0.1]: https://github.com/zellerlab/GECCO/compare/37afb97...v0.0.1
 Proof-of-concept.
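
The v0.9.6 entry in this diff renames the `bgc_id` columns of cluster tables to `cluster_id`, which can break downstream scripts that look the column up by name. The following is a minimal sketch of one way to migrate an older table, assuming the table is tab-separated text with a header row. The function name `rename_cluster_column` and the sample columns `sequence_id` and `start` are hypothetical and not part of GECCO.

```python
import csv
import io


def rename_cluster_column(tsv_text: str) -> str:
    """Return the TSV text with a `bgc_id` header cell renamed to `cluster_id`.

    Hypothetical helper for illustration only; it is not a GECCO function.
    Only the header row is modified, data rows are written back unchanged.
    """
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows:
        return tsv_text
    # Rewrite only the header cell that carries the old column name.
    rows[0] = ["cluster_id" if name == "bgc_id" else name for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    # Made-up sample data in the assumed pre-v0.9.6 layout.
    old_table = "sequence_id\tbgc_id\tstart\nctg1\tctg1_cluster_1\t10\n"
    print(rename_cluster_column(old_table), end="")
```

A table that already uses `cluster_id` passes through unchanged, so the helper is safe to run on a mix of old and new files.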