comparison CHANGELOG.md @ 19:cc91d730cc4f draft

Fix syntax of Galaxy script for GECCO
author althonos
date Mon, 16 Jan 2023 18:35:56 +0000
parents 3dd71eaa2909
children 6ba37b7dea42
18:3dd71eaa2909 19:cc91d730cc4f
3 3
4 The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) 4 The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
5 and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html). 5 and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
6 6
7 ## [Unreleased] 7 ## [Unreleased]
8 [Unreleased]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.5...master 8 [Unreleased]: https://github.com/zellerlab/GECCO/compare/v0.9.6...master
9
10
11 ## [v0.9.6] - 2023-01-11
12 [v0.9.6]: https://github.com/zellerlab/GECCO/compare/v0.9.5...v0.9.6
13
14 ### Added
15 - Gene Ontology annotations to `gecco.interpro` local metadata.
16 - Reference to Gene Ontology terms and derived functions to `gecco.model.Domain` objects.
17 - Gene color based on predicted function in `gecco.model.Gene.to_seq_feature`.
18
19 ### Fixed
20 - Missing `gzip` import in the CLI preventing usage of gzip-compressed inputs.
21 - Invalid coordinates of domains found in reverse-strand genes.
22 - Detection of entry points with `importlib.metadata` on older Python versions.
23
24 ### Changed
25 - `bgc_id` columns of cluster tables are renamed `cluster_id`.
26 - `gecco.model.ProductType` is renamed to `gecco.model.ClusterType`.
27 - Bumped `pyrodigal` dependency to `v2.0`.
28 - Bumped `pyhmmer` dependency to `v0.7`.
9 29
10 30
11 ## [v0.9.5] - 2022-08-10 31 ## [v0.9.5] - 2022-08-10
12 [v0.9.5]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.4...v0.9.5 32 [v0.9.5]: https://github.com/zellerlab/GECCO/compare/v0.9.4...v0.9.5
13 33
14 ### Added 34 ### Added
15 - `gecco predict` command to predict BGCs from an annotated genome. 35 - `gecco predict` command to predict BGCs from an annotated genome.
16 - `Protein.with_seq` function to assign a new sequence to a protein object. 36 - `Protein.with_seq` function to assign a new sequence to a protein object.
17 37
19 - Issue with antiSMASH sideload JSON file generation in `gecco run` and `gecco predict`. 39 - Issue with antiSMASH sideload JSON file generation in `gecco run` and `gecco predict`.
20 - Make `gecco.orf` handle STOP codons consistently ([#9](https://github.com/zellerlab/GECCO/issues/9)). 40 - Make `gecco.orf` handle STOP codons consistently ([#9](https://github.com/zellerlab/GECCO/issues/9)).
21 41
22 42
23 ## [v0.9.4] - 2022-05-31 43 ## [v0.9.4] - 2022-05-31
24 [v0.9.4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.3...v0.9.4 44 [v0.9.4]: https://github.com/zellerlab/GECCO/compare/v0.9.3...v0.9.4
25 45
26 ### Added 46 ### Added
27 - `classes_` property to `TypeClassifier` to access the `classes_` attribute of the `TypeBinarizer`. 47 - `classes_` property to `TypeClassifier` to access the `classes_` attribute of the `TypeBinarizer`.
28 - Alternative ORF finder `CDSFinder` which simply extracts CDS features from input sequences ([#8](https://github.com/zellerlab/GECCO/issues/8)). 48 - Alternative ORF finder `CDSFinder` which simply extracts CDS features from input sequences ([#8](https://github.com/zellerlab/GECCO/issues/8)).
29 - Support for annotating domains with "exclusive" HMMs to annotate genes with *at most* one HMM from the library. 49 - Support for annotating domains with "exclusive" HMMs to annotate genes with *at most* one HMM from the library.
37 ### Fixed 57 ### Fixed
38 - Broken MyPy type annotations in the `gecco.model` and `gecco.cli` modules. 58 - Broken MyPy type annotations in the `gecco.model` and `gecco.cli` modules.
39 59
40 60
41 ## [v0.9.3] - 2022-05-13 61 ## [v0.9.3] - 2022-05-13
42 [v0.9.3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.2...v0.9.3 62 [v0.9.3]: https://github.com/zellerlab/GECCO/compare/v0.9.2...v0.9.3
43 63
44 ### Changed 64 ### Changed
45 - `--format` flag of `gecco annotate` and `gecco run` CLI commands is now made lowercase before giving value to `Bio.SeqIO`. 65 - `--format` flag of `gecco annotate` and `gecco run` CLI commands is now made lowercase before giving value to `Bio.SeqIO`.
46 66
47 ### Fixed 67 ### Fixed
48 - Genes with duplicate IDs being silently ignored in `HMMER.run`. 68 - Genes with duplicate IDs being silently ignored in `HMMER.run`.
49 69
50 70
51 ## [v0.9.2] - 2022-04-11 71 ## [v0.9.2] - 2022-04-11
52 [v0.9.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1...v0.9.2 72 [v0.9.2]: https://github.com/zellerlab/GECCO/compare/v0.9.1...v0.9.2
53 73
54 ### Added 74 ### Added
55 - Padding of short sequences with empty genes when predicting probabilities in `ClusterCRF`. 75 - Padding of short sequences with empty genes when predicting probabilities in `ClusterCRF`.
56 76
57 ## [v0.9.1] - 2022-04-05 77 ## [v0.9.1] - 2022-04-05
58 [v0.9.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1-alpha4...v0.9.1 78 [v0.9.1]: https://github.com/zellerlab/GECCO/compare/v0.9.1-alpha4...v0.9.1
59 79
60 ### Changed 80 ### Changed
61 - Make the `genes.tsv` and `features.tsv` table contain all genes even when they come from a contig too short to be processed by the CRF sliding window. 81 - Make the `genes.tsv` and `features.tsv` table contain all genes even when they come from a contig too short to be processed by the CRF sliding window.
62 - Replaced the `--force-clusters-tsv` flag with a `--force-tsv` flag to force writing TSV tables even when no genes or clusters were found in `gecco run` or `gecco annotate`. 82 - Replaced the `--force-clusters-tsv` flag with a `--force-tsv` flag to force writing TSV tables even when no genes or clusters were found in `gecco run` or `gecco annotate`.
63 83
64 ## [v0.9.1-alpha4] - 2022-03-31 84 ## [v0.9.1-alpha4] - 2022-03-31
65 [v0.9.1-alpha4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1-alpha3...v0.9.1-alpha4 85 [v0.9.1-alpha4]: https://github.com/zellerlab/GECCO/compare/v0.9.1-alpha3...v0.9.1-alpha4
66 86
67 Retrain internal model with: 87 Retrain internal model with:
68 ``` 88 ```
69 $ python -m gecco -vv train --c1 0.4 --c2 0 --select 0.25 --window-size 20 \ 89 $ python -m gecco -vv train --c1 0.4 --c2 0 --select 0.25 --window-size 20 \
70 -f mibig-2.0.proG2.Pfam-v35.0.features.tsv \ 90 -f mibig-2.0.proG2.Pfam-v35.0.features.tsv \
72 -g GECCO-data/data/embeddings/mibig-2.0.proG2.genes.tsv \ 92 -g GECCO-data/data/embeddings/mibig-2.0.proG2.genes.tsv \
73 -o models/v0.9.1-alpha4 93 -o models/v0.9.1-alpha4
74 ``` 94 ```
75 95
76 ## [v0.9.1-alpha3] - 2022-03-23 96 ## [v0.9.1-alpha3] - 2022-03-23
77 [v0.9.1-alpha3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1-alpha2...v0.9.1-alpha3 97 [v0.9.1-alpha3]: https://github.com/zellerlab/GECCO/compare/v0.9.1-alpha2...v0.9.1-alpha3
78 98
79 ### Added 99 ### Added
80 - `gecco.model.GeneTable` class to store gene coordinates independently of protein domains. 100 - `gecco.model.GeneTable` class to store gene coordinates independently of protein domains.
81 101
82 ### Changed 102 ### Changed
83 - Refactored implementation of `load` and `dump` methods for `Table` classes into a dedicated base class. 103 - Refactored implementation of `load` and `dump` methods for `Table` classes into a dedicated base class.
84 - `gecco run` and `gecco annotate` now output a gene table in addition to the feature and cluster tables. 104 - `gecco run` and `gecco annotate` now output a gene table in addition to the feature and cluster tables.
85 - `gecco train` expects a gene table instead of a GFF file for the gene coordinates. 105 - `gecco train` expects a gene table instead of a GFF file for the gene coordinates.
86 106
87 ## [v0.9.1-alpha2] - 2022-03-23 107 ## [v0.9.1-alpha2] - 2022-03-23
88 [v0.9.1-alpha2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.9.1-alpha1...v0.9.1-alpha2 108 [v0.9.1-alpha2]: https://github.com/zellerlab/GECCO/compare/v0.9.1-alpha1...v0.9.1-alpha2
89 109
90 ### Fixed 110 ### Fixed
91 - `TypeClassifier.trained` not being able to read unknown types from type tables. 111 - `TypeClassifier.trained` not being able to read unknown types from type tables.
92 112
93 ## [v0.9.1-alpha1] - 2022-03-20 113 ## [v0.9.1-alpha1] - 2022-03-20
94 [v0.9.1-alpha1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.10...v0.9.1-alpha1 114 [v0.9.1-alpha1]: https://github.com/zellerlab/GECCO/compare/v0.8.10...v0.9.1-alpha1
95 Candidate release with support for a sliding window in the CRF prediction algorithm. 115 Candidate release with support for a sliding window in the CRF prediction algorithm.
96 116
97 ## [v0.8.10] - 2022-02-23 117 ## [v0.8.10] - 2022-02-23
98 [v0.8.10]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.9...v0.8.10 118 [v0.8.10]: https://github.com/zellerlab/GECCO/compare/v0.8.9...v0.8.10
99 ### Fixed 119 ### Fixed
100 - `--antismash-sideload` flag of `gecco run` causing command to crash. 120 - `--antismash-sideload` flag of `gecco run` causing command to crash.
101 121
102 ## [v0.8.9] - 2022-02-22 122 ## [v0.8.9] - 2022-02-22
103 [v0.8.9]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.8...v0.8.9 123 [v0.8.9]: https://github.com/zellerlab/GECCO/compare/v0.8.8...v0.8.9
104 ### Removed 124 ### Removed
105 - Prediction and support for the *Other* biosynthetic type of MIBiG clusters. 125 - Prediction and support for the *Other* biosynthetic type of MIBiG clusters.
106 126
107 ## [v0.8.8] - 2022-02-21 127 ## [v0.8.8] - 2022-02-21
108 [v0.8.8]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.7...v0.8.8 128 [v0.8.8]: https://github.com/zellerlab/GECCO/compare/v0.8.7...v0.8.8
109 ### Fixed 129 ### Fixed
110 - `ClusterRefiner` filtering method for edge genes not working as intended. 130 - `ClusterRefiner` filtering method for edge genes not working as intended.
111 - `gecco run` and `gecco annotate` commands crashing on missing input files instead of nicely rendering the error. 131 - `gecco run` and `gecco annotate` commands crashing on missing input files instead of nicely rendering the error.
112 132
113 ## [v0.8.7] - 2022-02-18 133 ## [v0.8.7] - 2022-02-18
114 [v0.8.7]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.6...v0.8.7 134 [v0.8.7]: https://github.com/zellerlab/GECCO/compare/v0.8.6...v0.8.7
115 ### Fixed 135 ### Fixed
116 - `interpro.json` metadata file not being included in distribution files. 136 - `interpro.json` metadata file not being included in distribution files.
117 - Missing docstring for `Protein.with_domains` method. 137 - Missing docstring for `Protein.with_domains` method.
118 ### Changed 138 ### Changed
119 - Bump minimum `scikit-learn` version to `v1.0` for Python3.7+. 139 - Bump minimum `scikit-learn` version to `v1.0` for Python3.7+.
120 140
121 ## [v0.8.6] - 2022-02-17 - YANKED 141 ## [v0.8.6] - 2022-02-17 - YANKED
122 [v0.8.6]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.5...v0.8.6 142 [v0.8.6]: https://github.com/zellerlab/GECCO/compare/v0.8.5...v0.8.6
123 ### Added 143 ### Added
124 - CLI flag for enabling region masking for contigs processed by Prodigal. 144 - CLI flag for enabling region masking for contigs processed by Prodigal.
125 - CLI flag for controlling region distance used for edge distance filtering. 145 - CLI flag for controlling region distance used for edge distance filtering.
126 ### Changed 146 ### Changed
127 - `gecco.model.Gene` and `gecco.model.Protein` are now immutable data classes. 147 - `gecco.model.Gene` and `gecco.model.Protein` are now immutable data classes.
131 ### Fixed 151 ### Fixed
132 - Mark `BGC0000930` as `Terpene` in the type classifier data. 152 - Mark `BGC0000930` as `Terpene` in the type classifier data.
133 - Progress bar messages are now in consistent format. 153 - Progress bar messages are now in consistent format.
134 154
135 ## [v0.8.5] - 2021-11-21 155 ## [v0.8.5] - 2021-11-21
136 [v0.8.5]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.4...v0.8.5 156 [v0.8.5]: https://github.com/zellerlab/GECCO/compare/v0.8.4...v0.8.5
137 ### Added 157 ### Added
138 - Minimal compatibility support for running GECCO inside of Galaxy workflows. 158 - Minimal compatibility support for running GECCO inside of Galaxy workflows.
139 159
140 ## [v0.8.4] - 2021-09-26 160 ## [v0.8.4] - 2021-09-26
141 [v0.8.4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.3-post1...v0.8.4 161 [v0.8.4]: https://github.com/zellerlab/GECCO/compare/v0.8.3-post1...v0.8.4
142 ### Fixed 162 ### Fixed
143 - `gecco convert gbk --format bigslice` failing to run because of outdated code ([#5](https://github.com/zellerlab/GECCO/issues/5)). 163 - `gecco convert gbk --format bigslice` failing to run because of outdated code ([#5](https://github.com/zellerlab/GECCO/issues/5)).
144 - `gecco convert gbk --format bigslice` not creating files with names conforming to BiG-SLiCE expected input. 164 - `gecco convert gbk --format bigslice` not creating files with names conforming to BiG-SLiCE expected input.
145 ### Changed 165 ### Changed
146 - Bump minimum `pyrodigal` version to `v0.6.2` to use platform-accelerated code if supported. 166 - Bump minimum `pyrodigal` version to `v0.6.2` to use platform-accelerated code if supported.
147 167
148 ## [v0.8.3-post1] - 2021-08-23 168 ## [v0.8.3-post1] - 2021-08-23
149 [v0.8.3-post1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.3...v0.8.3-post1 169 [v0.8.3-post1]: https://github.com/zellerlab/GECCO/compare/v0.8.3...v0.8.3-post1
150 ### Fixed 170 ### Fixed
151 - Wrong default value for `--threshold` being shown in `gecco run` help message. 171 - Wrong default value for `--threshold` being shown in `gecco run` help message.
152 172
153 ## [v0.8.3] - 2021-08-23 173 ## [v0.8.3] - 2021-08-23
154 [v0.8.3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.2...v0.8.3 174 [v0.8.3]: https://github.com/zellerlab/GECCO/compare/v0.8.2...v0.8.3
155 ### Changed 175 ### Changed
156 - Default probability threshold for segmentation to 0.3 (from 0.4). 176 - Default probability threshold for segmentation to 0.3 (from 0.4).
157 177
158 ## [v0.8.2] - 2021-07-31 178 ## [v0.8.2] - 2021-07-31
159 [v0.8.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.1...v0.8.2 179 [v0.8.2]: https://github.com/zellerlab/GECCO/compare/v0.8.1...v0.8.2
160 ### Fixed 180 ### Fixed
161 - `gecco run` crashing on Python 3.6 because of missing `contextlib.nullcontext` class. 181 - `gecco run` crashing on Python 3.6 because of missing `contextlib.nullcontext` class.
162 ### Changed 182 ### Changed
163 - `gecco run` and `gecco annotate` will not try to count the number of profiles when given an external HMM file with the `--hmm` flag. 183 - `gecco run` and `gecco annotate` will not try to count the number of profiles when given an external HMM file with the `--hmm` flag.
164 - `PyHMMER.run` now reports the *p-value* of each domain in addition to the *e-value* as a `/note` qualifier. 184 - `PyHMMER.run` now reports the *p-value* of each domain in addition to the *e-value* as a `/note` qualifier.
165 185
166 ## [v0.8.1] - 2021-07-29 186 ## [v0.8.1] - 2021-07-29
167 [v0.8.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.8.0...v0.8.1 187 [v0.8.1]: https://github.com/zellerlab/GECCO/compare/v0.8.0...v0.8.1
168 ### Changed 188 ### Changed
169 - `gecco run` now filters out unneeded features before annotating, making it easier to analyze the results of a run with a custom `--model`. 189 - `gecco run` now filters out unneeded features before annotating, making it easier to analyze the results of a run with a custom `--model`.
170 ### Fixed 190 ### Fixed
171 - `gecco` reporting about using Pfam `v33.1` while actually using `v34.0` because of an outdated field in `gecco/hmmer/Pfam.ini`. 191 - `gecco` reporting about using Pfam `v33.1` while actually using `v34.0` because of an outdated field in `gecco/hmmer/Pfam.ini`.
172 ### Added 192 ### Added
173 - Missing documentation for the `strand` attribute of `gecco.model.Gene`. 193 - Missing documentation for the `strand` attribute of `gecco.model.Gene`.
174 194
175 ## [v0.8.0] - 2021-07-03 195 ## [v0.8.0] - 2021-07-03
176 [v0.8.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.7.0...v0.8.0 196 [v0.8.0]: https://github.com/zellerlab/GECCO/compare/v0.7.0...v0.8.0
177 ### Changed 197 ### Changed
178 - Retrain internal model using new sequence embeddings and remove broken/duplicate BGCs from MIBiG 2.0. 198 - Retrain internal model using new sequence embeddings and remove broken/duplicate BGCs from MIBiG 2.0.
179 - Bump minimum `pyhmmer` version to `v0.4.0` to improve exception handling. 199 - Bump minimum `pyhmmer` version to `v0.4.0` to improve exception handling.
180 - Bump minimum `pyrodigal` version to `v0.5.0` to fix sequence decoding on some platforms. 200 - Bump minimum `pyrodigal` version to `v0.5.0` to fix sequence decoding on some platforms.
181 - Use p-values instead of e-values to filter domains obtained with HMMER. 201 - Use p-values instead of e-values to filter domains obtained with HMMER.
193 - Outdated `gecco embed` command. 213 - Outdated `gecco embed` command.
194 - Unused `--truncate` flag from the `gecco train` CLI. 214 - Unused `--truncate` flag from the `gecco train` CLI.
195 - Tigrfam domains, which are not improving performance on the new training data. 215 - Tigrfam domains, which are not improving performance on the new training data.
196 216
197 ## [v0.7.0] - 2021-05-31 217 ## [v0.7.0] - 2021-05-31
198 [v0.7.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.6.3...v0.7.0 218 [v0.7.0]: https://github.com/zellerlab/GECCO/compare/v0.6.3...v0.7.0
199 ### Added 219 ### Added
200 - Support for writing an AntiSMASH sideload JSON file after a `gecco run` workflow. 220 - Support for writing an AntiSMASH sideload JSON file after a `gecco run` workflow.
201 - Code for converting GenBank files in BiG-SLiCE compatible format with the `gecco convert` subcommand. 221 - Code for converting GenBank files in BiG-SLiCE compatible format with the `gecco convert` subcommand.
202 - Documentation about using GECCO in combination with AntiSMASH or BiG-SLiCE. 222 - Documentation about using GECCO in combination with AntiSMASH or BiG-SLiCE.
203 ### Changed 223 ### Changed
205 - Internal domain composition shipped in the `gecco.types` with newer composition array obtained directly from MIBiG files. 225 - Internal domain composition shipped in the `gecco.types` with newer composition array obtained directly from MIBiG files.
206 ### Removed 226 ### Removed
207 - Outdated notice about `-vvv` verbosity level in the help message of the main `gecco` command. 227 - Outdated notice about `-vvv` verbosity level in the help message of the main `gecco` command.
208 228
209 ## [v0.6.3] - 2021-05-10 229 ## [v0.6.3] - 2021-05-10
210 [v0.6.3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.6.2...v0.6.3 230 [v0.6.3]: https://github.com/zellerlab/GECCO/compare/v0.6.2...v0.6.3
211 ### Fixed 231 ### Fixed
212 - HMMER annotation not properly handling inputs with multiple contigs. 232 - HMMER annotation not properly handling inputs with multiple contigs.
213 - Some progress bar totals displaying as floats in the CLI. 233 - Some progress bar totals displaying as floats in the CLI.
214 ### Changed 234 ### Changed
215 - `PyHMMER` now sets the `Z` and `domZ` values from the number of proteins given to the search pipeline. 235 - `PyHMMER` now sets the `Z` and `domZ` values from the number of proteins given to the search pipeline.
216 - `gecco.cli` delegates imports to make CLI more responsive. 236 - `gecco.cli` delegates imports to make CLI more responsive.
217 - `pkg_resources` has been replaced with `importlib.resources` and `importlib.metadata` where applicable. 237 - `pkg_resources` has been replaced with `importlib.resources` and `importlib.metadata` where applicable.
218 - `multiprocessing.cpu_count` has been replaced with `os.cpu_count` where applicable. 238 - `multiprocessing.cpu_count` has been replaced with `os.cpu_count` where applicable.
219 239
220 ## [v0.6.2] - 2021-05-04 240 ## [v0.6.2] - 2021-05-04
221 [v0.6.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.6.1...v0.6.2 241 [v0.6.2]: https://github.com/zellerlab/GECCO/compare/v0.6.1...v0.6.2
222 ### Fixed 242 ### Fixed
223 - `gecco cv loto` crashing because of outdated code. 243 - `gecco cv loto` crashing because of outdated code.
224 ### Changed 244 ### Changed
225 - Logging-style prompt will only display if GECCO is running with `-vv` flag. 245 - Logging-style prompt will only display if GECCO is running with `-vv` flag.
226 ### Added 246 ### Added
227 - GECCO bioRxiv paper reference to `Cluster.to_seq_record` output record. 247 - GECCO bioRxiv paper reference to `Cluster.to_seq_record` output record.
228 248
229 ## [v0.6.1] - 2021-03-15 249 ## [v0.6.1] - 2021-03-15
230 [v0.6.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.6.0...v0.6.1 250 [v0.6.1]: https://github.com/zellerlab/GECCO/compare/v0.6.0...v0.6.1
231 ### Fixed 251 ### Fixed
232 - Progress bar not being disabled by `-q` flag in CLI. 252 - Progress bar not being disabled by `-q` flag in CLI.
233 - Fallback to using HMM name if accession is not available in `PyHMMER`. 253 - Fallback to using HMM name if accession is not available in `PyHMMER`.
234 - Group genes by source contig and process them separately in `PyHMMER` to avoid bogus E-values. 254 - Group genes by source contig and process them separately in `PyHMMER` to avoid bogus E-values.
235 ### Added 255 ### Added
237 - Support for using an arbitrary mapping of positives to negatives in `gecco embed`. 257 - Support for using an arbitrary mapping of positives to negatives in `gecco embed`.
238 ### Removed 258 ### Removed
239 - Unused and outdated `HMMER` and `DomainRow` classes from `gecco.hmmer`. 259 - Unused and outdated `HMMER` and `DomainRow` classes from `gecco.hmmer`.
240 260
241 ## [v0.6.0] - 2021-02-28 261 ## [v0.6.0] - 2021-02-28
242 [v0.6.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.5...v0.6.0 262 [v0.6.0]: https://github.com/zellerlab/GECCO/compare/v0.5.5...v0.6.0
243 ### Changed 263 ### Changed
244 - Updated internal model with a cleaned-up version of the MIBiG-2.0 264 - Updated internal model with a cleaned-up version of the MIBiG-2.0
245 Pfam-33.1/Tigrfam-15.0 embedding. 265 Pfam-33.1/Tigrfam-15.0 embedding.
246 - Updated internal InterPro catalog. 266 - Updated internal InterPro catalog.
247 ### Fixed 267 ### Fixed
248 - Features not being grouped together in `gecco cv` and `gecco train` 268 - Features not being grouped together in `gecco cv` and `gecco train`
249 when provided with a feature table where rows were not sorted by 269 when provided with a feature table where rows were not sorted by
250 protein IDs. 270 protein IDs.
251 271
252 ## [v0.5.5] - 2021-02-28 272 ## [v0.5.5] - 2021-02-28
253 [v0.5.5]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.4...v0.5.5 273 [v0.5.5]: https://github.com/zellerlab/GECCO/compare/v0.5.4...v0.5.5
254 ### Fixed 274 ### Fixed
255 - `gecco cv` bug causing only the last fold to be written. 275 - `gecco cv` bug causing only the last fold to be written.
256 276
257 ## [v0.5.4] - 2021-02-28 277 ## [v0.5.4] - 2021-02-28
258 [v0.5.4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.3...v0.5.4 278 [v0.5.4]: https://github.com/zellerlab/GECCO/compare/v0.5.3...v0.5.4
259 ### Changed 279 ### Changed
260 - Replaced `verboselogs`, `coloredlogs` and `better-exceptions` with `rich`. 280 - Replaced `verboselogs`, `coloredlogs` and `better-exceptions` with `rich`.
261 ### Removed 281 ### Removed
262 - `tqdm` training dependency. 282 - `tqdm` training dependency.
263 ### Added 283 ### Added
264 - `gecco annotate` command to produce a feature table from a genomic file. 284 - `gecco annotate` command to produce a feature table from a genomic file.
265 - `gecco embed` to embed BGCs into non-BGC regions using feature tables. 285 - `gecco embed` to embed BGCs into non-BGC regions using feature tables.
266 286
267 ## [v0.5.3] - 2021-02-21 287 ## [v0.5.3] - 2021-02-21
268 [v0.5.3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.2...v0.5.3 288 [v0.5.3]: https://github.com/zellerlab/GECCO/compare/v0.5.2...v0.5.3
269 ### Fixed 289 ### Fixed
270 - Coordinates of genes in output GenBank files. 290 - Coordinates of genes in output GenBank files.
271 - Potential issue with the number of CPUs in `PyHMMER.run`. 291 - Potential issue with the number of CPUs in `PyHMMER.run`.
272 ### Changed 292 ### Changed
273 - Bump required `pyrodigal` version to `v0.4.2` to fix buffer overflow. 293 - Bump required `pyrodigal` version to `v0.4.2` to fix buffer overflow.
274 294
275 ## [v0.5.2] - 2021-01-29 295 ## [v0.5.2] - 2021-01-29
276 [v0.5.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.1...v0.5.2 296 [v0.5.2]: https://github.com/zellerlab/GECCO/compare/v0.5.1...v0.5.2
277 ### Added 297 ### Added
278 - Support for downloading HMM files directly from GitHub releases assets. 298 - Support for downloading HMM files directly from GitHub releases assets.
279 - Validation of filtered HMMs with MD5 checksum. 299 - Validation of filtered HMMs with MD5 checksum.
280 ### Fixed 300 ### Fixed
281 - Invalid coordinates of protein domains in GenBank output files. 301 - Invalid coordinates of protein domains in GenBank output files.
282 - `gecco.interpro` module not being added to wheel distribution. 302 - `gecco.interpro` module not being added to wheel distribution.
283 ### Changed 303 ### Changed
284 - Bump required `pyhmmer` version to `v0.2.1`. 304 - Bump required `pyhmmer` version to `v0.2.1`.
285 305
286 ## [v0.5.1] - 2021-01-15 306 ## [v0.5.1] - 2021-01-15
287 [v0.5.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.5.0...v0.5.1 307 [v0.5.1]: https://github.com/zellerlab/GECCO/compare/v0.5.0...v0.5.1
288 ### Fixed 308 ### Fixed
289 - `--hmm` flag being ignored in `gecco run` command. 309 - `--hmm` flag being ignored in `gecco run` command.
290 - `PyHMMER` using HMM names instead of accessions, causing issues with Pfam HMMs. 310 - `PyHMMER` using HMM names instead of accessions, causing issues with Pfam HMMs.
291 311
292 ## [v0.5.0] - 2021-01-11 312 ## [v0.5.0] - 2021-01-11
293 [v0.5.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.5...v0.5.0 313 [v0.5.0]: https://github.com/zellerlab/GECCO/compare/v0.4.5...v0.5.0
294 ### Added 314 ### Added
295 - Explicit support for Python 3.9. 315 - Explicit support for Python 3.9.
296 ### Changed 316 ### Changed
297 - [`pyhmmer`](https://pypi.org/project/pyhmmer) is used to annotate protein sequences instead of HMMER3 binary `hmmsearch`. 317 - [`pyhmmer`](https://pypi.org/project/pyhmmer) is used to annotate protein sequences instead of HMMER3 binary `hmmsearch`.
298 - HMM files are stored in binary format to speed up parsing and reduce storage size. 318 - HMM files are stored in binary format to speed up parsing and reduce storage size.
299 - `tqdm` is now a *training*-only dependency. 319 - `tqdm` is now a *training*-only dependency.
300 - `gecco cv` now requires *training* dependencies. 320 - `gecco cv` now requires *training* dependencies.
301 321
302 ## [v0.4.5] - 2020-11-23 322 ## [v0.4.5] - 2020-11-23
303 [v0.4.5]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.4...v0.4.5 323 [v0.4.5]: https://github.com/zellerlab/GECCO/compare/v0.4.4...v0.4.5
304 ### Added 324 ### Added
305 - Additional `fold` column to cross-validation table output. 325 - Additional `fold` column to cross-validation table output.
306 ### Changed 326 ### Changed
307 - Use sequence ID instead of protein ID to extract type from cluster in `gecco cv`. 327 - Use sequence ID instead of protein ID to extract type from cluster in `gecco cv`.
308 - Install HMM data in pre-pressed format to make `hmmsearch` runs faster on short sequences. 328 - Install HMM data in pre-pressed format to make `hmmsearch` runs faster on short sequences.
309 - `gecco.orf` was rewritten to extract genes from input sequences in parallel. 329 - `gecco.orf` was rewritten to extract genes from input sequences in parallel.
310 330
311 ## [v0.4.4] - 2020-09-30 331 ## [v0.4.4] - 2020-09-30
312 [v0.4.4]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.3...v0.4.4 332 [v0.4.4]: https://github.com/zellerlab/GECCO/compare/v0.4.3...v0.4.4
313 ### Added 333 ### Added
314 - `gecco cv loto` command to run LOTO cross-validation using BGC types 334 - `gecco cv loto` command to run LOTO cross-validation using BGC types
315 for stratification. 335 for stratification.
316 - `header` keyword argument to `FeatureTable.dump` and `ClusterTable.dump` 336 - `header` keyword argument to `FeatureTable.dump` and `ClusterTable.dump`
317 to write the table without the column header allowing to append to an 337 to write the table without the column header allowing to append to an
323 the tables for every fold in memory. 343 the tables for every fold in memory.
324 ### Changed 344 ### Changed
325 - Bumped `pandas` training dependency to `v1.0`. 345 - Bumped `pandas` training dependency to `v1.0`.
326 346
327 ## [v0.4.3] - 2020-09-07 347 ## [v0.4.3] - 2020-09-07
328 [v0.4.3]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.2...v0.4.3 348 [v0.4.3]: https://github.com/zellerlab/GECCO/compare/v0.4.2...v0.4.3
329 ### Fixed 349 ### Fixed
330 - GenBank files being written with invalid `/cds` feature type. 350 - GenBank files being written with invalid `/cds` feature type.
331 ### Changed 351 ### Changed
332 - Blocked installation of Biopython `v1.78` or newer as it removes `Bio.Alphabet` 352 - Blocked installation of Biopython `v1.78` or newer as it removes `Bio.Alphabet`
333 and breaks the current code. 353 and breaks the current code.
334 354
335 ## [v0.4.2] - 2020-08-07 355 ## [v0.4.2] - 2020-08-07
336 [v0.4.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.1...v0.4.2 356 [v0.4.2]: https://github.com/zellerlab/GECCO/compare/v0.4.1...v0.4.2
337 ### Fixed 357 ### Fixed
338 - `TypeClassifier.predict_types` using inverse type probabilities when 358 - `TypeClassifier.predict_types` using inverse type probabilities when
339 given several clusters to process. 359 given several clusters to process.
340 360
341 ## [v0.4.1] - 2020-08-07 361 ## [v0.4.1] - 2020-08-07
342 [v0.4.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.4.0...v0.4.1 362 [v0.4.1]: https://github.com/zellerlab/GECCO/compare/v0.4.0...v0.4.1
343 ### Fixed 363 ### Fixed
344 - `gecco run` command crashing on input sequences not containing any genes. 364 - `gecco run` command crashing on input sequences not containing any genes.
345 365
346 ## [v0.4.0] - 2020-08-06 366 ## [v0.4.0] - 2020-08-06
347 [v0.4.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.3.0...v0.4.0 367 [v0.4.0]: https://github.com/zellerlab/GECCO/compare/v0.3.0...v0.4.0
348 ### Added 368 ### Added
349 - `gecco.model.ProductType` enum to model the biosynthetic class of a BGC. 369 - `gecco.model.ProductType` enum to model the biosynthetic class of a BGC.
350 ### Removed 370 ### Removed
351 - `pandas` interaction from internal data model. 371 - `pandas` interaction from internal data model.
352 - `ClusterCRF` code specific to cross-validation. 372 - `ClusterCRF` code specific to cross-validation.
354 - `pandas`, `fisher` and `statsmodels` dependencies are now optional. 374 - `pandas`, `fisher` and `statsmodels` dependencies are now optional.
355 - `gecco train` command expects a cluster table in addition to the feature 375 - `gecco train` command expects a cluster table in addition to the feature
356 table to know the types of the input BGCs. 376 table to know the types of the input BGCs.
357 377
358 ## [v0.3.0] - 2020-08-03 378 ## [v0.3.0] - 2020-08-03
359 [v0.3.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.2.2...v0.3.0 379 [v0.3.0]: https://github.com/zellerlab/GECCO/compare/v0.2.2...v0.3.0
360 ### Changed 380 ### Changed
361 - Replaced Nearest-Neighbours classifier with Random Forest to perform type 381 - Replaced Nearest-Neighbours classifier with Random Forest to perform type
362 prediction for candidate BGCs. 382 prediction for candidate BGCs.
363 - `gecco.knn` module was renamed to implementation-agnostic name `gecco.types`. 383 - `gecco.knn` module was renamed to implementation-agnostic name `gecco.types`.
364 ### Fixed 384 ### Fixed
365 - Extraction of domain composition taking a long time in `gecco train` command. 385 - Extraction of domain composition taking a long time in `gecco train` command.
366 ### Removed 386 ### Removed
367 - `--metric` argument to the `gecco run` CLI command. 387 - `--metric` argument to the `gecco run` CLI command.
368 388
369 ## [v0.2.2] - 2020-07-31 389 ## [v0.2.2] - 2020-07-31
370 [v0.2.2]: https://git.embl.de/grp-zeller/GECCO/compare/v0.2.1...v0.2.2 390 [v0.2.2]: https://github.com/zellerlab/GECCO/compare/v0.2.1...v0.2.2
371 ### Changed 391 ### Changed
372 - `Domain` and `Gene` can now carry qualifiers that are used when they 392 - `Domain` and `Gene` can now carry qualifiers that are used when they
373 are translated to a sequence feature. 393 are translated to a sequence feature.
374 ### Added 394 ### Added
375 - InterPro names, accessions, and HMMER e-value for each annotated domain 395 - InterPro names, accessions, and HMMER e-value for each annotated domain
376 in GenBank output files. 396 in GenBank output files.
377 397
378 ## [v0.2.1] - 2020-07-23 398 ## [v0.2.1] - 2020-07-23
379 [v0.2.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.2.0...v0.2.1 399 [v0.2.1]: https://github.com/zellerlab/GECCO/compare/v0.2.0...v0.2.1
380 ### Fixed 400 ### Fixed
381 - Various potential crashes in `ClusterRefiner` code. 401 - Various potential crashes in `ClusterRefiner` code.
382 ### Removed 402 ### Removed
383 - Unneeded feature dictionary filtering in `ClusterCRF` for models with 403 - Unneeded feature dictionary filtering in `ClusterCRF` for models with
384 Fisher Exact Test feature selection. 404 Fisher Exact Test feature selection.
385 405
386 ## [v0.2.0] - 2020-07-23 406 ## [v0.2.0] - 2020-07-23
387 [v0.2.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.1.1...v0.2.0 407 [v0.2.0]: https://github.com/zellerlab/GECCO/compare/v0.1.1...v0.2.0
388 ### Fixed 408 ### Fixed
389 - `pandas` warning about unsorted columns in `gecco run`. 409 - `pandas` warning about unsorted columns in `gecco run`.
390 ### Removed 410 ### Removed
391 - `Gene.probability` property, replaced by `Gene.maximum_probability` and 411 - `Gene.probability` property, replaced by `Gene.maximum_probability` and
392 `Gene.average_probability` properties to be explicit. 412 `Gene.average_probability` properties to be explicit.
395 selected with Fisher's Exact Test. 415 selected with Fisher's Exact Test.
396 - `ClusterRefiner` now removes genes on `Cluster` edges if they do not 416 - `ClusterRefiner` now removes genes on `Cluster` edges if they do not
397 contain any domain annotation. 417 contain any domain annotation.
398 418
399 ## [v0.1.1] - 2020-07-22 419 ## [v0.1.1] - 2020-07-22
400 [v0.1.1]: https://git.embl.de/grp-zeller/GECCO/compare/v0.1.0...v0.1.1 420 [v0.1.1]: https://github.com/zellerlab/GECCO/compare/v0.1.0...v0.1.1
401 ### Added 421 ### Added
402 - `ClusterCRF.predict_probabilities` to annotate a list of `Gene`. 422 - `ClusterCRF.predict_probabilities` to annotate a list of `Gene`.
403 ### Changed 423 ### Changed
404 - BGC probability is now stored at the `Domain` level instead of at the `Gene` 424 - BGC probability is now stored at the `Domain` level instead of at the `Gene`
405 level, independently of the feature extraction level used by the CRF. 425 level, independently of the feature extraction level used by the CRF.
408 - Added this changelog file to document changes in the code. 428 - Added this changelog file to document changes in the code.
409 - Added documentation to `gecco` submodules missing some. 429 - Added documentation to `gecco` submodules missing some.
410 - Included the `CHANGELOG.md` file to the generated docs. 430 - Included the `CHANGELOG.md` file to the generated docs.
411 431
412 ## [v0.1.0] - 2020-07-17 432 ## [v0.1.0] - 2020-07-17
413 [v0.1.0]: https://git.embl.de/grp-zeller/GECCO/compare/v0.0.1...v0.1.0 433 [v0.1.0]: https://github.com/zellerlab/GECCO/compare/v0.0.1...v0.1.0
414 Initial release. 434 Initial release.
415 435
416 ## [v0.0.1] - 2018-08-13 436 ## [v0.0.1] - 2018-08-13
417 [v0.0.1]: https://git.embl.de/grp-zeller/GECCO/compare/37afb97...v0.0.1 437 [v0.0.1]: https://github.com/zellerlab/GECCO/compare/37afb97...v0.0.1
418 Proof-of-concept. 438 Proof-of-concept.