Repository 'gecco'
hg clone https://toolshed.g2.bx.psu.edu/repos/althonos/gecco

Changeset 0:1625927fc16f (2021-11-21)
Next changeset 1:0699939e6dd6 (2021-11-21)
Commit message:
"Release v0.8.4"
added:
README.rst
gecco.xml
test-data/BGC0001866.1_cluster_1.gbk
test-data/BGC0001866.fna
test-data/clusters.tsv
test-data/features.tsv
diff -r 000000000000 -r 1625927fc16f README.rst
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/README.rst Sun Nov 21 16:53:12 2021 +0000
@@ -0,0 +1,136 @@
+Hi, I’m GECCO!
+==============
+
+🦎 Overview
+---------------
+
+GECCO (Gene Cluster prediction with Conditional Random Fields) is a fast
+and scalable method for identifying putative novel Biosynthetic Gene
+Clusters (BGCs) in genomic and metagenomic data using Conditional Random
+Fields (CRFs).
+
+|GitLabCI| |License| |Coverage| |Docs| |Source| |Mirror| |Changelog|
+|Issues| |Preprint| |PyPI| |Bioconda| |Versions| |Wheel|
+
+🔧 Installing GECCO
+-------------------
+
+GECCO is implemented in `Python <https://www.python.org/>`__, and
+supports `all versions <https://endoflife.date/python>`__ from Python
+3.6. It requires additional libraries that can be installed directly
+from `PyPI <https://pypi.org>`__, the Python Package Index.
+
+Use `pip <https://pip.pypa.io/en/stable/>`__ to install GECCO on
+your machine:
+
+.. code:: console
+
+   $ pip install gecco-tool
+
+If you’d rather use `Conda <https://conda.io>`__, a package is available
+in the `bioconda <https://bioconda.github.io/>`__ channel. You can
+install it with:
+
+.. code:: console
+
+   $ conda install -c bioconda gecco
+
+This will install GECCO, its dependencies, and the data needed to run
+predictions. Around 100 MB of data must be downloaded, so installation
+could take some time depending on your Internet connection. Once done,
+you will have a ``gecco`` command available in your ``$PATH``.
+
+*Note that GECCO uses*\ `HMMER3 <http://hmmer.org/>`__\ *, which can
+only run on PowerPC and recent x86-64 machines running a POSIX operating
+system. Therefore, Linux and OSX are supported platforms, but GECCO will
+not be able to run on Windows.*
+
+🧬 Running GECCO
+-----------------
+
+Once ``gecco`` is installed, you can run it from the terminal by giving
+it a FASTA or GenBank file with the genomic sequence you want to
+analyze, as well as an output directory:
+
+.. code:: console
+
+   $ gecco run --genome some_genome.fna -o some_output_dir
+
+Additional parameters of interest are:
+
+-  ``--jobs``, which controls the number of threads that will be spawned
+   by GECCO whenever a step can be parallelized. The default, *0*, will
+   autodetect the number of CPUs on the machine using
+   `os.cpu_count <https://docs.python.org/3/library/os.html#os.cpu_count>`__.
+-  ``--cds``, controlling the minimum number of consecutive genes a BGC
+   region must have to be detected by GECCO (default is 3).
+-  ``--threshold``, controlling the minimum probability for a gene to be
+   considered part of a BGC region. Using a lower number will increase
+   the number (and possibly length) of predictions, but reduce accuracy.
+
+🔖 Reference
+-------------
+
+GECCO can be cited using the following preprint:
+
+   **Accurate de novo identification of biosynthetic gene clusters with
+   GECCO**. Laura M Carroll, Martin Larralde, Jonas Simon Fleck, Ruby
+   Ponnudurai, Alessio Milanese, Elisa Cappio Barazzone, Georg Zeller.
+   bioRxiv 2021.05.03.442509;
+   `doi:10.1101/2021.05.03.442509 <https://doi.org/10.1101/2021.05.03.442509>`__
+
+💭 Feedback
+------------
+
+⚠️ Issue Tracker
+~~~~~~~~~~~~~~~~
+
+Found a bug? Have an enhancement request? Head over to the `GitHub
+issue tracker <https://github.com/zellerlab/GECCO/issues>`__ if you need
+to report or ask something. If you are filing a bug report, please
+include as much information as you can about the issue, and try to
+recreate the same bug in a simple, easily reproducible situation.
+
+🏗️ Contributing
+~~~~~~~~~~~~~~~~
+
+Contributions are more than welcome! See
+`CONTRIBUTING.md <https://github.com/althonos/pyhmmer/blob/master/CONTRIBUTING.md>`__
+for more details.
+
+⚖️ License
+----------
+
+This software is provided under the `GNU General Public License v3.0 or
+later <https://choosealicense.com/licenses/gpl-3.0/>`__. GECCO is
+developed by the `Zeller
+Team <https://www.embl.de/research/units/scb/zeller/index.html>`__ at
+the `European Molecular Biology Laboratory <https://www.embl.de/>`__ in
+Heidelberg.
+
+.. |GitLabCI| image:: https://img.shields.io/gitlab/pipeline/grp-zeller/GECCO/master?gitlab_url=https%3A%2F%2Fgit.embl.de&style=flat-square&maxAge=600
+   :target: https://git.embl.de/grp-zeller/GECCO/-/pipelines/
+.. |License| image:: https://img.shields.io/badge/license-GPLv3-blue.svg?style=flat-square&maxAge=2678400
+   :target: https://choosealicense.com/licenses/gpl-3.0/
+.. |Coverage| image:: https://img.shields.io/codecov/c/gh/zellerlab/GECCO?style=flat-square&maxAge=600
+   :target: https://codecov.io/gh/zellerlab/GECCO/
+.. |Docs| image:: https://img.shields.io/badge/docs-gecco.embl.de-green.svg?maxAge=2678400&style=flat-square
+   :target: https://gecco.embl.de
+.. |Source| image:: https://img.shields.io/badge/source-GitHub-303030.svg?maxAge=2678400&style=flat-square
+   :target: https://github.com/zellerlab/GECCO/
+.. |Mirror| image:: https://img.shields.io/badge/mirror-EMBL-009f4d?style=flat-square&maxAge=2678400
+   :target: https://git.embl.de/grp-zeller/GECCO/
+.. |Changelog| image:: https://img.shields.io/badge/keep%20a-changelog-8A0707.svg?maxAge=2678400&style=flat-square
+   :target: https://github.com/zellerlab/GECCO/blob/master/CHANGELOG.md
+.. |Issues| image:: https://img.shields.io/github/issues/zellerlab/GECCO.svg?style=flat-square&maxAge=600
+   :target: https://github.com/zellerlab/GECCO/issues
+.. |Preprint| image:: https://img.shields.io/badge/preprint-bioRxiv-darkblue?style=flat-square&maxAge=2678400
+   :target: https://www.biorxiv.org/content/10.1101/2021.05.03.442509v1
+.. |PyPI| image:: https://img.shields.io/pypi/v/gecco-tool.svg?style=flat-square&maxAge=3600
+   :target: https://pypi.python.org/pypi/gecco-tool
+.. |Bioconda| image:: https://img.shields.io/conda/vn/bioconda/gecco?style=flat-square&maxAge=3600
+   :target: https://anaconda.org/bioconda/gecco
+.. |Versions| image:: https://img.shields.io/pypi/pyversions/gecco-tool.svg?style=flat-square&maxAge=3600
+   :target: https://pypi.org/project/gecco-tool/#files
+.. |Wheel| image:: https://img.shields.io/pypi/wheel/gecco-tool?style=flat-square&maxAge=3600
+   :target: https://pypi.org/project/gecco-tool/#files
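The `gecco run` invocation and the three options described in the README above can also be assembled from a script. The following is a minimal sketch, not part of GECCO: the helper name `build_gecco_command` is hypothetical, and it uses only the flags the README names (`--genome`, `-o`, `--jobs`, `--cds`, `--threshold`). The README does not state the default threshold, so the flag is omitted unless a value is given.

```python
import shlex
from typing import List, Optional


def build_gecco_command(
    genome: str,
    output_dir: str,
    jobs: int = 0,
    cds: int = 3,
    threshold: Optional[float] = None,
) -> List[str]:
    """Assemble a `gecco run` argument list (hypothetical helper)."""
    cmd = [
        "gecco", "run",
        "--genome", genome,
        "-o", output_dir,
        "--jobs", str(jobs),  # 0 lets GECCO autodetect the CPU count
        "--cds", str(cds),    # minimum number of consecutive genes per region
    ]
    if threshold is not None:
        # The README describes the threshold as a probability.
        if not 0.0 <= threshold <= 1.0:
            raise ValueError("threshold must be in [0, 1]")
        cmd += ["--threshold", str(threshold)]
    return cmd


# Print the command line instead of running it, so the sketch works
# even on a machine where GECCO is not installed.
print(shlex.join(build_gecco_command("some_genome.fna", "some_output_dir", jobs=4)))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would launch the actual prediction, provided `gecco` is available in `$PATH`.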
diff -r 000000000000 -r 1625927fc16f gecco.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/gecco.xml Sun Nov 21 16:53:12 2021 +0000
@@ -0,0 +1,86 @@
+<?xml version='1.0' encoding='utf-8'?>
+<tool id="gecco" name="GECCO" version="0.8.4" python_template_version="3.5">
+    <description>GECCO (Gene Cluster prediction with Conditional Random Fields) is a fast and scalable method for identifying putative novel Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data using Conditional Random Fields (CRFs).</description>
+    <requirements>
+        <requirement type="package" version="0.8.4">gecco</requirement>
+    </requirements>
+    <version_command>gecco --version</version_command>
+    <command detect_errors="aggressive"><![CDATA[
+
+        #if str($input.ext) == 'genbank':
+            #set $file_extension = 'gbk'
+        #else:
+            #set $file_extension = $input.ext
+        #end if
+        ln -s '$input' input_tempfile.$file_extension &&
+
+        gecco -vv run -g input_tempfile.$file_extension &&
+        mv input_tempfile.features.tsv '$features' &&
+        mv input_tempfile.clusters.tsv '$clusters'
+
+    ]]></command>
+    <inputs>
+        <param name="input" type="data" format="genbank,fasta" label="Sequence file in GenBank or FASTA format"/>
+    </inputs>
+    <outputs>
+        <collection name="records" type="list" label="${tool.name} detected Biosynthetic Gene Clusters on ${on_string} (GenBank)">
+            <discover_datasets pattern="(?P&lt;designation&gt;.*)\.gbk" ext="genbank" visible="false" />
+        </collection>
+        <data name="features" format="tabular" label="${tool.name} summary of detected features on ${on_string} (TSV)"/>
+        <data name="clusters" format="tabular" label="${tool.name} summary of detected BGCs on ${on_string} (TSV)"/>
+    </outputs>
+    <tests>
+        <test>
+            <param name="input" value="BGC0001866.fna"/>
+            <output name="features" file="features.tsv"/>
+            <output name="clusters" file="clusters.tsv"/>
+            <output_collection name="records" type="list">
+                <element name="BGC0001866.1_cluster_1" file="BGC0001866.1_cluster_1.gbk" ftype="genbank" lines_diff="2"/>
+            </output_collection>
+        </test>
+    </tests>
+    <help>
+<![CDATA[
+
+**Overview**
+
+GECCO is a fast and scalable method for identifying putative novel Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data using Conditional Random Fields (CRFs).
+It is developed in the Zeller group and is part of the suite of computational microbiome analysis tools hosted at EMBL.
+
+**Input**
+
+GECCO works with DNA sequences, and loads them using Biopython, allowing it to support a large variety of formats, including the common FASTA and GenBank files.
+
+**Output**
+
+GECCO will create the following files once done (using the same prefix as the input file):
+
+- features.tsv: The features file, containing the identified proteins and domains in the input sequences.
+- clusters.tsv: If any were found, a clusters file, containing the coordinates of the predicted clusters, along with their putative biosynthetic type.
+- {sequence}_cluster_{N}.gbk: If any BGCs were found, a GenBank file per cluster, containing the cluster sequence annotated with its member proteins and domains.
+
+**Contact**
+
+If you have any questions about GECCO, if you run into any issue, or if you would like to make a feature request, please create an issue in the GitHub repository.
+You can also directly contact Martin Larralde via email. If you want to contribute to GECCO, please have a look at the contribution guide first, and feel free to
+open a pull request on the GitHub repository.
+
+]]>
+    </help>
+    <citations>
+        <citation type="bibtex">
+@article {Carroll2021.05.03.442509,
+ author = {Carroll, Laura M. and Larralde, Martin and Fleck, Jonas Simon and Ponnudurai, Ruby and Milanese, Alessio and Cappio Barazzone, Elisa and Zeller, Georg},
+ title = {Accurate de novo identification of biosynthetic gene clusters with GECCO},
+ elocation-id = {2021.05.03.442509},
+ year = {2021},
+ doi = {10.1101/2021.05.03.442509},
+ publisher = {Cold Spring Harbor Laboratory},
+ abstract = {Biosynthetic gene clusters (BGCs) are enticing targets for (meta)genomic mining efforts, as they may encode novel, specialized metabolites with potential uses in medicine and biotechnology. Here, we describe GECCO (GEne Cluster prediction with COnditional random fields; https://gecco.embl.de), a high-precision, scalable method for identifying novel BGCs in (meta)genomic data using conditional random fields (CRFs). Based on an extensive evaluation of de novo BGC prediction, we found GECCO to be more accurate and over 3x faster than a state-of-the-art deep learning approach. When applied to over 12,000 genomes, GECCO identified nearly twice as many BGCs compared to a rule-based approach, while achieving higher accuracy than other machine learning approaches. Introspection of the GECCO CRF revealed that its predictions rely on protein domains with both known and novel associations to secondary metabolism. The method developed here represents a scalable, interpretable machine learning approach, which can identify BGCs de novo with high precision. Competing Interest Statement: The authors have declared no competing interest.},
+ URL = {https://www.biorxiv.org/content/early/2021/05/04/2021.05.03.442509},
+ eprint = {https://www.biorxiv.org/content/early/2021/05/04/2021.05.03.442509.full.pdf},
+ journal = {bioRxiv}
+}
+        </citation>
+    </citations>
+</tool>
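The `discover_datasets` pattern in the `records` collection decides which files in the job directory become collection elements. The sketch below illustrates it with Python's `re` module, after decoding the XML entities (`&lt;`, `&gt;`). It assumes the pattern must match the whole file name; how Galaxy anchors the expression internally is not shown in this changeset, and the helper name `designation_for` is hypothetical.

```python
import re

# Pattern from the <discover_datasets> element, with XML entities decoded.
PATTERN = re.compile(r"(?P<designation>.*)\.gbk")


def designation_for(filename):
    """Return the element name for a matching file, or None otherwise.

    Assumption: the whole file name has to match the pattern.
    """
    match = PATTERN.fullmatch(filename)
    return match.group("designation") if match else None


for name in ("BGC0001866.1_cluster_1.gbk", "input_tempfile.features.tsv"):
    print(name, "->", designation_for(name))
```

Under the same assumption, a GenBank input symlinked as `input_tempfile.gbk` by the command block would match the pattern as well, which may be worth checking when running the tool on GenBank inputs.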
diff -r 000000000000 -r 1625927fc16f test-data/BGC0001866.1_cluster_1.gbk
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/BGC0001866.1_cluster_1.gbk Sun Nov 21 16:53:12 2021 +0000
b'@@ -0,0 +1,1110 @@\n+LOCUS       BGC0001866.1_cluster_1 32633 bp    DNA     linear   UNK 21-NOV-2021\n+DEFINITION  BGC0001866.1 Byssochlamys spectabilis strain CBS 101075 chromosome\n+            Unknown C8Q69scaffold_14, whole genome shotgun sequence.\n+ACCESSION   BGC0001866.1_cluster_1\n+VERSION     BGC0001866.1_cluster_1\n+KEYWORDS    .\n+SOURCE      .\n+  ORGANISM  .\n+            .\n+REFERENCE   1\n+  AUTHORS   Laura M Carroll, Martin Larralde, Jonas Simon Fleck, Ruby\n+            Ponnudurai, Alessio Milanese, Elisa Cappio Barazzone, Georg Zeller\n+  TITLE     Accurate de novo identification of biosynthetic gene clusters with\n+            GECCO\n+  JOURNAL   bioRxiv (2021.05.03.442509)\n+  REMARK    doi:10.1101/2021.05.03.442509\n+COMMENT     ##GECCO-Data-START##\n+            version                :: GECCO v0.8.4\n+            creation_date          :: 2021-11-21T16:33:58.470847\n+            biosyn_class           :: Polyketide\n+            alkaloid_probability   :: 0.0\n+            polyketide_probability :: 0.98\n+            ripp_probability       :: 0.0\n+            saccharide_probability :: 0.0\n+            terpene_probability    :: 0.0\n+            nrp_probability        :: 0.14\n+            other_probability      :: 0.0\n+            ##GECCO-Data-END##\n+FEATURES             Location/Qualifiers\n+     CDS             complement(1..1143)\n+                     /inference="ab initio prediction:Prodigal:2.6"\n+                     /transl_table=11\n+                     /locus_tag="BGC0001866.1_1"\n+                     /translation="MWIYEVDGHYIEPRRADTFLIWAGERYSAMIRLDKKPMDYSIRVP\n+                     DGGYSQMIAAFGILRYKNGDPNARQKPDRFGVTTISKPYFDYNAWPMRDAVFLDKLDLP\n+                     PWPRKVPAAHGDDMHVLYLGKANSTWEFTLSGKKKYPPDRSAYEPLLYNVNSEQAHDDD\n+                     LIIRTQNGTWQDIVLQVGHSPLWPVDFPHAVHKHANKYWRIGGGQGLWNYSSVEEAMAD\n+                     QPESFNMVNPPYRDTFLTEFTGAMWVVLRYQVTSPGAWLLHCHFEMHLDNGMAMAILDG\n+                     
VDKWPHVPPEYTQGFHGFREHELPGPAGFWGLVSKILRPESLVWAGGAAVVLLSLFIGG\n+                     LWRLWQRRMQGTYYVLSQEDERDRFSMDKEAWKSEETKRM*"\n+     misc_feature    1..189\n+                     /inference="protein motif"\n+                     /db_xref="PFAM:PF00394"\n+                     /db_xref="InterPro:IPR001117"\n+                     /note="e-value: 2.1941888078432915e-08"\n+                     /note="p-value: 8.178117062405111e-12"\n+                     /function="Multicopper oxidase"\n+                     /standard_name="PF00394"\n+     misc_feature    448..843\n+                     /inference="protein motif"\n+                     /db_xref="PFAM:PF07731"\n+                     /db_xref="InterPro:IPR011706"\n+                     /note="e-value: 3.9374169295176556e-23"\n+                     /note="p-value: 1.467542649838858e-26"\n+                     /function="Multicopper oxidase"\n+                     /standard_name="PF07731"\n+     CDS             1179..1670\n+                     /inference="ab initio prediction:Prodigal:2.6"\n+                     /transl_table=11\n+                     /locus_tag="BGC0001866.1_2"\n+                     /translation="MSSLRSSSHSPSGLPGQPRLPLLDRSREHSLPGDRAGWRTRSRLR\n+                     ATDLLSMVRMGSTYTIIRDMNYTDDESPGRSPFVCDSVIRPALVHERDLLVNKPLMART\n+                     IDAPFAVEKNTIDATDFISQSTRNVLISVHWNHTRSAVGCLHLLLYTGSSCSSPSQKAS\n+                     *"\n+     CDS             complement(2167..2376)\n+                     /inference="ab initio prediction:Prodigal:2.6"\n+                     /transl_table=11\n+                     /locus_tag="BGC0001866.1_3"\n+                     /translation="MPAYLLLLACNVLLVLGAHVQRELVLTWEEGAPNGQSRQMIKTNG\n+                     QFPSPTLIFDEGDDVEVGGISFAN*"\n+     CDS             2559..3032\n+                     /inference="ab initio prediction:Prodigal:2.6"\n+                     /transl_table=11\n+                     /locus_tag="BGC0001866.1_4"\n+                     
/translation="MLFNSEVGVEEHVVLWSFQETTSITMAEEIKLTPLETFAQAISAS\n+                     AKTIATYCRDSGHPQLSDDNSSGLTGDVLPPSAPQAVTA'..b'   29521 ccggattgtg gacgagaagt ccaccgaagg gacgttttca atcacatgcg agtcagatgt\n+    29581 atcccgacca gacctcagcc ctctggttca gggccataag gtcgaaggga tcggactttg\n+    29641 tacaccggta tgaatctcca cactcatgtt cgctgcgcag cataatcact gactccttct\n+    29701 gcagtccgtt tatgccgata taggattcac gctgggaaat taccttctag atcgtttccc\n+    29761 aactcgattc ggaccggata ctaaagttgt ggatgtcacg gacatggtga ttgaaaaggc\n+    29821 tcttatgccg ttgaatgcgg gaccacaatt actgcgagtc acggcttcat taatctggtc\n+    29881 cgagaaagag gcttctgtcc ggttctacag cgtggatgta agacgtccct cttctaaatc\n+    29941 tcagatgaat actaatattc ataattccca ggaaaatcac accgaaacag tacaacattc\n+    30001 ccactgccgc attaaattca gcgaccgttc aacgtaccaa gcctatcaag agcaaatctc\n+    30061 cgccgttaag gctcgtatgt ttgagatgaa gaccaactcc tcatcgggta gaacctaccg\n+    30121 attcaacgga ccaatggcat acaatatggt gcaggcgttg gcggaattcc acccggatta\n+    30181 ccggtgtatt gacgagacga ttctcgacaa cgagacactc gaagcagcct gtacagtcag\n+    30241 cttcgggaat gtcaagaagg agggtgtatt ccacacacat cctggctata tagatggact\n+    30301 cacgcagtcg ggcgggtttg tgatgaacgc taacgacaag actaatctcg gagtagaagt\n+    30361 gttcgttaat catgggtggg actcgttcca gttgtacgag cctgtcactg atgatcgttc\n+    30421 gtatcagact catgttcgga tgaggccggc ggagtcgaat cagtggaagg gtgatgtggt\n+    30481 cgttctaagt ggggagaatt tggtcgcttg tgttcgagga ttgacggtaa gtcgagagac\n+    30541 ctaagtaaca atctcctgtt tagaggagaa aaaagaaaga gaaagcggat ttgctgacta\n+    30601 ccttccagat ccaaggagta cccaggcgag tcctgcggta tatcctgcaa agcagtgcaa\n+    30661 aaaccacaca gacagccact tcgagcgtgc ctgccccgtc tcaagctccg gtgatggtgc\n+    30721 cacagattgt ccaagtacca aaagctaagc ctatctccca aatttccggg accctgacag\n+    30781 aggctctccg gattatttgt gaacaaagtg gtgtgcctct agcagagctc acggatgatg\n+    30841 caactttcgc gaacatcggc gtagactctc tcctagcgct gactatcaca agtgcatttg\n+    30901 ttgaggagct ggatctagac gtcgattctt ccttgttcat ggactatcct actgtggcgg\n+  
  30961 acctgaagcg gttcttcgac aagatcaaca cgcagcatgc tccggcacca gccccggtat\n+    31021 cagacgcgcc aaagcaatta caaccaagca gtagcccagt tgcatctgct actccgtctg\n+    31081 cacccatcca tggcagatcg aaatttgaat cagttcttaa catccttacc gaggaaagtg\n+    31141 gtgttgaaat ggcaggtctt ccggactcta ctgcgcttgc agacataggt atcgattcgc\n+    31201 tcttgtccct ggtagtcacg agccggctga acgatgagtt agagctagat gtgtcgtctg\n+    31261 aagacttcaa tgactgtctg actatccggg atctcaaggc acatttcatg tccaagaact\n+    31321 ccgacaatgg ttcgtctgcg gttcttactc ctcagccatc tcgggactcc gcactccctg\n+    31381 agcgcacgag acctagggtc gctgatacaa gcgatgagga ggatgcaccg gtttcagcaa\n+    31441 atgaattcac aaccagtgcc cgctctacat ctaagtatat ggctgtgctc aacataattt\n+    31501 ccgaagaaag cggcatggca atcgaagact tcaccgacaa tgtaatgttc gcagatatcg\n+    31561 gaatagactc gctgctgtcc ttggtcattg gaggtagaat acgggaagag ctatctttcg\n+    31621 acctcgaggt ggactctctt ttcgtggact acccagatgt caagggactg aggtcatttt\n+    31681 tcggatttga gagcaacaag acggcgacaa atccaactgc gagtcaatcg tcttcgtcca\n+    31741 tttcaagcgg cacttcggtc ttcgatacat caccttctcc cacagactta gacatcctaa\n+    31801 ctccagaatc cagcctctca caagaggagt tcgagcaacc gctcacaata gcaacaaagc\n+    31861 cacttccacc cgcaacttca gtcactctgc agggtttacc ctccaaggca cacaagatac\n+    31921 ttttcctttt cccagatggc tctggctcag caacatcata cgcgaaactc ccccgactcg\n+    31981 gtgcggacgt agccattatc ggcctgaact caccctacct gatggacggc gccaacatga\n+    32041 cctgcacctt cgacgagctc gttacactgt acctcacaga aatccagcga cgtcaacccg\n+    32101 caggcccata ccacttgggc ggctggtccg ccggtggcat tctcgcttac cgcgctgcgc\n+    32161 aaatcctcca aaaagccgcc gccaaccccc agaaaccagt agtagaatcc ctgctcctcc\n+    32221 tcgactctcc accaccaaca gggctcggca agctccccaa acatttcttt gactactgtg\n+    32281 accaaattgg cattttcggg caagggacag ccaaggcccc ggagtggctg atcacccatt\n+    32341 tccagggcac gaactccgtt ctgcacgaat accacgccac gccgttctca ttcggtacag\n+    32401 cacccagaac tgggatcatc tgggcttcgc agacagtgtt cgagacgagg gccgtggcgc\n+    32461 ccccacctgt acgtcctgac gatacggagg 
acatgaagtt tttgacggag cgacggacag\n+    32521 atttctcggc cgggtcttgg ggacatatgt ttcctggtac agaggtattg attgagacgg\n+    32581 cctatggggc ggatcatttt agtttgctgg tgagtcttct cttccgtgat taa\n+//\n'
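The GenBank record above stores its cluster-level predictions in a structured `COMMENT` block delimited by `##GECCO-Data-START##` and `##GECCO-Data-END##`, with one `key :: value` pair per line. The following sketch extracts those pairs from raw text using only the standard library; the function name `parse_gecco_comment` is hypothetical, and the sample is an abbreviated copy of the block in the test file.

```python
from typing import Dict

START = "##GECCO-Data-START##"
END = "##GECCO-Data-END##"


def parse_gecco_comment(text: str) -> Dict[str, str]:
    """Collect `key :: value` pairs found between the GECCO-Data markers."""
    data: Dict[str, str] = {}
    inside = False
    for line in text.splitlines():
        line = line.strip()
        if line.endswith(START):
            # The first line also carries the GenBank "COMMENT" keyword.
            inside = True
        elif line.endswith(END):
            break
        elif inside and "::" in line:
            key, value = line.split("::", 1)
            data[key.strip()] = value.strip()
    return data


# Abbreviated copy of the COMMENT block from the test record.
SAMPLE = """\
COMMENT     ##GECCO-Data-START##
            version                :: GECCO v0.8.4
            creation_date          :: 2021-11-21T16:33:58.470847
            biosyn_class           :: Polyketide
            polyketide_probability :: 0.98
            nrp_probability        :: 0.14
            ##GECCO-Data-END##
"""

print(parse_gecco_comment(SAMPLE))
```

Values are kept as strings so that the caller decides which ones to convert to numbers.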
diff -r 000000000000 -r 1625927fc16f test-data/BGC0001866.fna
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/BGC0001866.fna Sun Nov 21 16:53:12 2021 +0000
b'@@ -0,0 +1,556 @@\n+>BGC0001866.1 Byssochlamys spectabilis strain CBS 101075 chromosome Unknown C8Q69scaffold_14, whole genome shotgun sequence\n+GTGCTAATTCTATAAGGCTCTCTATATAAGGCTCTATATACTATAGAAGCTGCGGATAGC\n+TATTTTTCTTATAGTCAAAATGAATTTTAATTAATAGTTATTACGAGGAGGGAACTAGAG\n+GATCTAAATACACGATTCAAATGACGCGGTAAATAGCCTCCGTGGTCGTCGTTTTCGGCG\n+TCATTTTCTCACATTCTATGTGTAAGTAAACTAGACACCATTTTGTACCGTGGCTTCTTG\n+CATCCTACCTTAATCTGCTCCATGGCAATCCACTCAACCTTTCATGCTAGTTGATATATT\n+CAATGTCGTTATTGGTGGAGGCGTATGTTATGAACTATGAAAGTGCTTACATCCGCTTAG\n+TCTCCTCGGACTTCCATGCTTCCTTGTCCATTGAGAAACGATCCCTCTCATCTTCTTGGG\n+ACAAAACATAATATGTTCCTTGCATCCTTCGTTGCCACAGACGCCACAGACCCCCAATGA\n+AAAGCGACAGTAGAACAACCGCAGCTCCTCCCGCCCATACCAAACTCTCCGGCCGCAGAA\n+TCTTCGATACAAGACCCCAAAACCCCGCTGGGCCTGGCAGTTCGTGTTCCCGAAACCCAT\n+GGAAACCCTGCGTATACTCCGGCGGAACGTGGGGCCACTTATCTACACCGTCTAGGATAG\n+CCATTGCCATTCCATTATCCAGATGCATCTCGAAATGGCAGTGTAACAGCCACGCGCCTG\n+GCGACGTAACCTGGTACCGCAGCACAACCCACATCGCTCCCGTGAACTCTGTCAGGAACG\n+TGTCTCGGTATGGGGGATTCACCATGTTAAAGCTTTCGGGCTGGTCGGCCATTGCTTCTT\n+CTACCGAGGAATAGTTCCACAGGCCTTGCCCCCCTCCAATCCGCCAGTATTTGTTCGCGT\n+GTTTATGGACCGCGTGCGGGAAGTCGACTGGCCATAATGGTGAATGACCTACCTGCAGTA\n+CGATGTCTTGCCAGGTGCCGTTCTGGGTTCGGATGATCAAATCGTCATCATGGGCCTGCT\n+CGGAATTGACGTTGTAAAGAAGGGGTTCGTAGGCTGAGCGGTCTGGGGGATATTTTTTCT\n+TCCCACTCAGAGTAAATTCCCAAGTGGAGTTAGCCTTGCCCAAGTACAACACATGCATAT\n+CATCTCCGTGGGCAGCTGGAACTTTTCTCGGCCAAGGAGGCAGGTCAAGCTTGTCTAGGA\n+AAACAGCATCTCGCATCGGCCAAGCGTTGTAATCGAAATAGGGCTTAGATATAGTCGTGA\n+CACCAAATCGGTCTGGTTTCTGACGGGCGTTGGGGTCTCCATTTTTGTATCGGAGGATCC\n+CGAATGCTGCGATCATCTGGGAATATCCACCATCTGGGACACGGATCGAGTAGTCCATGG\n+GCTTCTTATCCAGTCTGATCATGGCTGAATATCGCTCTCCGGCCCAAATCAGGAAGGTGT\n+CTGCCCGGCGAGGCTCGATGTAATGGCCGTCGACTTCATAGATCCACATTTCGTGCTCAT\n+CGATTGTTGGCTGGAGGGTCTTGAATGTCGAGCCTCCGATCCAGTTCACACTCGCCCAGC\n+GGTCTGCCGGGTCAACCTCGATTGCCTCTGTTGGACCGGAGTAGGGAACACAGCCTTCCT\n+GGAGACCGGGCGGGATGGCGGACACGTTCCCGTCTGCGAGCCACGGACCTTCTGTCGATG\n+GTACGAATGGGAAGCACCTATACGATAATCAGAGACATGAACTATACCGATGATGAATCT\n+CCTGGCCGCTCACCCTTTGTCTGTG
ATAGTGTCATTCGGCCAGCTCTTGTGCATGAACGG\n+GATCTGCTTGTCAATAAGCCATTGATGGCCAGGACAATAGACGCTCCCTTTGCCGTTGAG\n+AAGAATACTATCGACGCAACTGATTTTATAAGTCAGTCGACTAGAAACGTACTCATATCA\n+GTACACTGGAATCATACAAGATCTGCAGTCGGCTGTTTGCATCTGCTTCTTTATACTGGG\n+TCGAGTTGTAGTAGTCCCAGTCAGAAAGCATCATGATGTGAGGATTGTTGCTTGCGCGCT\n+CCATGGCAGCAATGTCCTCGGGATCTTCGCTGATCATTGCCCAGGGTCCAGCGGTGCCTG\n+GCTTTCGGCTGCCTTCTGTGAGCTACATCCAACAATAGCAGAGAGAAAACAAACCGAATG\n+AACAATGCACCATATAAGCCATCCAGCAGAGTAGCCCGAGAATGAGAGTGGTACCTGATT\n+CTGTCAGTCTACTTCTGACTGGGTGTGTTTCTCATGACGACCTACCAGTATTGACCAGGG\n+GGGTAGGCAGTAAAGCGATAGACGTAGCTCTCACCAGGCTCTATAGGCTTCTGACTCAGA\n+CCTGGAACTCCATCTGACCACGGCGTATCTTGCATCCTTCATTACACCATTAGCACCTGG\n+CGAAATATCGAAGGTTGAATAGGTCTACTCACAAAATTCCATGCCAGTGGATTGTGGTGT\n+TCTCATGCATGTAATTACGAACAACAATCTGTTCTTCGTCAGTCCTGCAGCCTCAATTGG\n+CGAAAGATATGCCACCGACTTCAACGTCGTCGCCCTCGTCGAATATAAGCGTCGGAGATG\n+GAAATTGACCATTTGTCTTGATCATCTGCCGAGATTGGCCGTTGGGAGCGCCCTCTTCCC\n+AGGTGAGCACTAGCTCGCGTTGGACATGGGCGCCCAGGACGAGCAATACATTGCAGGCCA\n+GCAGTAAAAGATAGGCCGGCATTGTGAAAGAATGGATAGCCTGATTAGTTCAGTTGCGGT\n+TATGCCTGGTATTAATATCACGGTTGGGATACTTGGGGCTGAAGAACTCATTAAGCCCCG\n+GTTCTTCGGGGACCGAAAACATTGCCAACCTCACTGCCACAATACTCGTACTCTTCGGAT\n+TGAAACAGTTACTGAGTGAAATTAATGTTATTTAATAGCGAAGTTGGAGTCGAAGAACAT\n+GTCGTACTCTGGTCATTCCAAGAAACAACGTCGATTACTATGGCAGAGGAAATCAAGCTG\n+ACTCCCCTGGAGACCTTCGCACAGGCAATCAGTGCCTCTGCGAAGACTATTGCAACTTAC\n+TGCAGAGACTCCGGTCATCCTCAACTGTCCGATGATAATTCTAGCGGCCTCACTGGGGAT\n+GTTCTCCCCCCTTCCGCACCACAGGCAGTCACCGCCGCCAGACAGACCATCTTGGAGGCA\n+TCGTACCGACTACAGCAATTGGTCACTGAGCCTAGCCAATACCTGCCGCGACTGACCGTT\n+TACGTGAGTGTTGAACAATCTCCCATGAAAGATCAAACTAACGACAGAAAAGCCCCAGCA\n+CCTGGCTGCCTTACGCTGGCTGTGCCATTTCAGAATCCCGGAGCTCATCCCCGTGCAAGG\n+CACCAGGACATACTATGAGCTGGCTACAGAAGCCAAAGTTCCTCTTCATCAACTGCAGAG\n+CATTGCAAGAATGGCAATTACTGGGAGCTTTCTCCGAGAGCCGGAGCCCAATATCGTCGC\n+CCACAGCAGGACGTCAGCCCATTTTGTTGAGAATCCTTCGCTCCGTGACTGGACACTATT\n+CCTGGCAGAGGATACCGCGCCCATGGCGATGAAGCTTGTTGAGGCGACTGAAAAGTGGGG\n+AGACACGAGGAGCAAGACAGAGACGGCCTTTAACCTGGCGCTGGGCACGGATCTGGCCTT\n+CTTCAAGTA
TCTTTCCAGCAACCCGCAGTTCACCCAGAAATTCTCGGGATATATGAAAAA\n+TGTGACAGCGA'..b'ATACTACAAGCACTTCGCCAAGCCAGCACAATGAATCTTG\n+TCCATGACAGCAGCGTAGTCATGGAGTTTGGACCACATCCTGTCGTATCAGGCATGGTGA\n+AATCAACGCTGGGGAACAGCATCAAGGCACTTCCCACTCTGCAACGGAACCGAAACACCT\n+GGGAAGTACTCACGGAGAGCGTGTCAACACTATACTGTATGGGATTCGACATCAACTGGA\n+CCGAGTACCATCGAGATTTTCCATCATCGCAGCGTGTCTTGCGACTCCCATCGTACTCCT\n+GGGATCTGAAGTCGTACTGGATTCCGTACCGGAATGATTGGACTCTGTACAAGGGCGATA\n+TTGTGCCTGAATCAAGCATCGCGCTGCCAACCCACCAAAACAAGCCACACAGTACATCGC\n+CGAAACAGCAAGCACCGACACCAATCCTGGAGACGACAACATTACACCGGATTGTGGACG\n+AGAAGTCCACCGAAGGGACGTTTTCAATCACATGCGAGTCAGATGTATCCCGACCAGACC\n+TCAGCCCTCTGGTTCAGGGCCATAAGGTCGAAGGGATCGGACTTTGTACACCGGTATGAA\n+TCTCCACACTCATGTTCGCTGCGCAGCATAATCACTGACTCCTTCTGCAGTCCGTTTATG\n+CCGATATAGGATTCACGCTGGGAAATTACCTTCTAGATCGTTTCCCAACTCGATTCGGAC\n+CGGATACTAAAGTTGTGGATGTCACGGACATGGTGATTGAAAAGGCTCTTATGCCGTTGA\n+ATGCGGGACCACAATTACTGCGAGTCACGGCTTCATTAATCTGGTCCGAGAAAGAGGCTT\n+CTGTCCGGTTCTACAGCGTGGATGTAAGACGTCCCTCTTCTAAATCTCAGATGAATACTA\n+ATATTCATAATTCCCAGGAAAATCACACCGAAACAGTACAACATTCCCACTGCCGCATTA\n+AATTCAGCGACCGTTCAACGTACCAAGCCTATCAAGAGCAAATCTCCGCCGTTAAGGCTC\n+GTATGTTTGAGATGAAGACCAACTCCTCATCGGGTAGAACCTACCGATTCAACGGACCAA\n+TGGCATACAATATGGTGCAGGCGTTGGCGGAATTCCACCCGGATTACCGGTGTATTGACG\n+AGACGATTCTCGACAACGAGACACTCGAAGCAGCCTGTACAGTCAGCTTCGGGAATGTCA\n+AGAAGGAGGGTGTATTCCACACACATCCTGGCTATATAGATGGACTCACGCAGTCGGGCG\n+GGTTTGTGATGAACGCTAACGACAAGACTAATCTCGGAGTAGAAGTGTTCGTTAATCATG\n+GGTGGGACTCGTTCCAGTTGTACGAGCCTGTCACTGATGATCGTTCGTATCAGACTCATG\n+TTCGGATGAGGCCGGCGGAGTCGAATCAGTGGAAGGGTGATGTGGTCGTTCTAAGTGGGG\n+AGAATTTGGTCGCTTGTGTTCGAGGATTGACGGTAAGTCGAGAGACCTAAGTAACAATCT\n+CCTGTTTAGAGGAGAAAAAAGAAAGAGAAAGCGGATTTGCTGACTACCTTCCAGATCCAA\n+GGAGTACCCAGGCGAGTCCTGCGGTATATCCTGCAAAGCAGTGCAAAAACCACACAGACA\n+GCCACTTCGAGCGTGCCTGCCCCGTCTCAAGCTCCGGTGATGGTGCCACAGATTGTCCAA\n+GTACCAAAAGCTAAGCCTATCTCCCAAATTTCCGGGACCCTGACAGAGGCTCTCCGGATT\n+ATTTGTGAACAAAGTGGTGTGCCTCTAGCAGAGCTCACGGATGATGCAACTTTCGCGAAC\n+ATCGGCGTAGACTCTCTCCTAGCGCTGACTATCACAAGTGCATTTGTTGAGGAGCTGGAT
\n+CTAGACGTCGATTCTTCCTTGTTCATGGACTATCCTACTGTGGCGGACCTGAAGCGGTTC\n+TTCGACAAGATCAACACGCAGCATGCTCCGGCACCAGCCCCGGTATCAGACGCGCCAAAG\n+CAATTACAACCAAGCAGTAGCCCAGTTGCATCTGCTACTCCGTCTGCACCCATCCATGGC\n+AGATCGAAATTTGAATCAGTTCTTAACATCCTTACCGAGGAAAGTGGTGTTGAAATGGCA\n+GGTCTTCCGGACTCTACTGCGCTTGCAGACATAGGTATCGATTCGCTCTTGTCCCTGGTA\n+GTCACGAGCCGGCTGAACGATGAGTTAGAGCTAGATGTGTCGTCTGAAGACTTCAATGAC\n+TGTCTGACTATCCGGGATCTCAAGGCACATTTCATGTCCAAGAACTCCGACAATGGTTCG\n+TCTGCGGTTCTTACTCCTCAGCCATCTCGGGACTCCGCACTCCCTGAGCGCACGAGACCT\n+AGGGTCGCTGATACAAGCGATGAGGAGGATGCACCGGTTTCAGCAAATGAATTCACAACC\n+AGTGCCCGCTCTACATCTAAGTATATGGCTGTGCTCAACATAATTTCCGAAGAAAGCGGC\n+ATGGCAATCGAAGACTTCACCGACAATGTAATGTTCGCAGATATCGGAATAGACTCGCTG\n+CTGTCCTTGGTCATTGGAGGTAGAATACGGGAAGAGCTATCTTTCGACCTCGAGGTGGAC\n+TCTCTTTTCGTGGACTACCCAGATGTCAAGGGACTGAGGTCATTTTTCGGATTTGAGAGC\n+AACAAGACGGCGACAAATCCAACTGCGAGTCAATCGTCTTCGTCCATTTCAAGCGGCACT\n+TCGGTCTTCGATACATCACCTTCTCCCACAGACTTAGACATCCTAACTCCAGAATCCAGC\n+CTCTCACAAGAGGAGTTCGAGCAACCGCTCACAATAGCAACAAAGCCACTTCCACCCGCA\n+ACTTCAGTCACTCTGCAGGGTTTACCCTCCAAGGCACACAAGATACTTTTCCTTTTCCCA\n+GATGGCTCTGGCTCAGCAACATCATACGCGAAACTCCCCCGACTCGGTGCGGACGTAGCC\n+ATTATCGGCCTGAACTCACCCTACCTGATGGACGGCGCCAACATGACCTGCACCTTCGAC\n+GAGCTCGTTACACTGTACCTCACAGAAATCCAGCGACGTCAACCCGCAGGCCCATACCAC\n+TTGGGCGGCTGGTCCGCCGGTGGCATTCTCGCTTACCGCGCTGCGCAAATCCTCCAAAAA\n+GCCGCCGCCAACCCCCAGAAACCAGTAGTAGAATCCCTGCTCCTCCTCGACTCTCCACCA\n+CCAACAGGGCTCGGCAAGCTCCCCAAACATTTCTTTGACTACTGTGACCAAATTGGCATT\n+TTCGGGCAAGGGACAGCCAAGGCCCCGGAGTGGCTGATCACCCATTTCCAGGGCACGAAC\n+TCCGTTCTGCACGAATACCACGCCACGCCGTTCTCATTCGGTACAGCACCCAGAACTGGG\n+ATCATCTGGGCTTCGCAGACAGTGTTCGAGACGAGGGCCGTGGCGCCCCCACCTGTACGT\n+CCTGACGATACGGAGGACATGAAGTTTTTGACGGAGCGACGGACAGATTTCTCGGCCGGG\n+TCTTGGGGACATATGTTTCCTGGTACAGAGGTATTGATTGAGACGGCCTATGGGGCGGAT\n+CATTTTAGTTTGCTGGTGAGTCTTCTCTTCCGTGATTAAGTTGCGAATACTAATAGAGGC\n+TATAGCAGGAGGAACCCTATAAGGGTGCCGTCAGGGCGTTCATGTCTCGAGTTTTGCAGT\n+TATAAGGGCTAGAGGAGCAGAGGTTGGTGGCAATAAAGTCGTCCTCACTGCTGGGTACAT\n+TCATTTGGATGAATTCTTCTTTTTTCGTCGTGTTTTCATTACTG
TATGTATTTTGATGTT\n+GGGTTATACCTCTAGGTCGGGATAACGCTTTTCGGCTGTGGCATGACAACCGGAATATAT\n+ATAATAGAACAATCCTATGTACATCTTTGCTGTGCTTACACGACGCACAG\n'
diff -r 000000000000 -r 1625927fc16f test-data/clusters.tsv
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/clusters.tsv Sun Nov 21 16:53:12 2021 +0000
@@ -0,0 +1,2 @@
+sequence_id bgc_id start end average_p max_p type alkaloid_probability polyketide_probability ripp_probability saccharide_probability terpene_probability nrp_probability other_probability proteins domains
+BGC0001866.1 BGC0001866.1_cluster_1 347 32979 0.9969495815733557 0.9999999447224028 Polyketide 0.0 0.98 0.0 0.0 0.0 0.14 0.0 BGC0001866.1_1;BGC0001866.1_2;BGC0001866.1_3;BGC0001866.1_4;BGC0001866.1_5;BGC0001866.1_6;BGC0001866.1_7;BGC0001866.1_8;BGC0001866.1_9;BGC0001866.1_10;BGC0001866.1_11;BGC0001866.1_12;BGC0001866.1_13;BGC0001866.1_14;BGC0001866.1_15;BGC0001866.1_16;BGC0001866.1_17;BGC0001866.1_18;BGC0001866.1_19;BGC0001866.1_20;BGC0001866.1_21;BGC0001866.1_22;BGC0001866.1_23 PF00106;PF00107;PF00109;PF00135;PF00394;PF00550;PF00698;PF00743;PF00891;PF00975;PF02801;PF06609;PF07690;PF07731;PF08241;PF08242;PF08493;PF08659;PF13434;PF13489;PF13649;PF13847;PF14765;PF16073;PF16197
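The clusters table above has one row per predicted cluster, with the `proteins` and `domains` columns holding semicolon-separated lists. A minimal sketch for loading it with the standard `csv` module follows, assuming the file is tab-separated as its `.tsv` extension suggests. The function name `read_clusters` is hypothetical, and the sample row is abbreviated to two proteins and two domains.

```python
import csv
import io

HEADER = [
    "sequence_id", "bgc_id", "start", "end", "average_p", "max_p", "type",
    "alkaloid_probability", "polyketide_probability", "ripp_probability",
    "saccharide_probability", "terpene_probability", "nrp_probability",
    "other_probability", "proteins", "domains",
]
# Abbreviated copy of the single data row in test-data/clusters.tsv.
ROW = [
    "BGC0001866.1", "BGC0001866.1_cluster_1", "347", "32979",
    "0.9969495815733557", "0.9999999447224028", "Polyketide",
    "0.0", "0.98", "0.0", "0.0", "0.0", "0.14", "0.0",
    "BGC0001866.1_1;BGC0001866.1_2", "PF00106;PF00107",
]
SAMPLE = "\t".join(HEADER) + "\n" + "\t".join(ROW) + "\n"


def read_clusters(handle):
    """Return one dict per cluster, converting coordinates and list columns."""
    clusters = []
    for row in csv.DictReader(handle, delimiter="\t"):
        row["start"] = int(row["start"])
        row["end"] = int(row["end"])
        row["proteins"] = row["proteins"].split(";")
        row["domains"] = row["domains"].split(";")
        clusters.append(row)
    return clusters


for cluster in read_clusters(io.StringIO(SAMPLE)):
    length = cluster["end"] - cluster["start"] + 1
    print(cluster["bgc_id"], cluster["type"], length)
```

For the test cluster, `end - start + 1` gives 32633, which agrees with the length on the `LOCUS` line of the corresponding GenBank record.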
diff -r 000000000000 -r 1625927fc16f test-data/features.tsv
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/features.tsv Sun Nov 21 16:53:12 2021 +0000
@@ -0,0 +1,38 @@
+sequence_id protein_id start end strand domain hmm i_evalue pvalue domain_start domain_end bgc_probability
+BGC0001866.1 BGC0001866.1_1 347 1489 - PF00394 Pfam 2.1941888078432915e-08 8.178117062405111e-12 1 63 0.9852038761627908
+BGC0001866.1 BGC0001866.1_1 347 1489 - PF07731 Pfam 3.9374169295176556e-23 1.467542649838858e-26 150 281 0.9852038761627908
+BGC0001866.1 BGC0001866.1_6 3946 4389 + PF00891 Pfam 4.743887678074703e-16 1.7681280946979883e-19 17 121 0.9910535094227727
+BGC0001866.1 BGC0001866.1_7 4683 5138 + PF00135 Pfam 4.674605664377319e-21 1.7423055029360116e-24 48 140 0.9913598896683397
+BGC0001866.1 BGC0001866.1_8 5384 5812 + PF00135 Pfam 3.9706994470948554e-30 1.4799476135277136e-33 2 114 0.9925093258822111
+BGC0001866.1 BGC0001866.1_9 5823 6599 + PF00135 Pfam 1.4185801852307574e-15 5.287291037013632e-19 2 209 0.9946019708257335
+BGC0001866.1 BGC0001866.1_10 7758 9029 + PF13434 Pfam 5.777178703900199e-08 2.153253337271785e-11 13 124 0.9978201609931655
+BGC0001866.1 BGC0001866.1_10 7758 9029 + PF00743 Pfam 5.089108077410868e-07 1.8967976434628658e-10 36 102 0.9978201609931655
+BGC0001866.1 BGC0001866.1_13 11550 12662 + PF07690 Pfam 5.839871260376694e-37 2.1766199255969786e-40 1 362 0.9990971143689635
+BGC0001866.1 BGC0001866.1_13 11550 12662 + PF06609 Pfam 9.543170598318239e-09 3.55690294383833e-12 17 244 0.9990971143689635
+BGC0001866.1 BGC0001866.1_15 14920 15912 + PF08493 Pfam 2.6165794251055913e-17 9.752439154325723e-21 139 224 0.9999977987864139
+BGC0001866.1 BGC0001866.1_16 17173 19143 + PF00109 Pfam 9.025888536170949e-60 3.364103069761815e-63 2 248 0.9999994272691842
+BGC0001866.1 BGC0001866.1_16 17173 19143 + PF02801 Pfam 2.2171445990751238e-35 8.263677223537547e-39 257 368 0.9999994272691842
+BGC0001866.1 BGC0001866.1_16 17173 19143 + PF16197 Pfam 3.8698172759236842e-25 1.4423471024687604e-28 371 487 0.9999994272691842
+BGC0001866.1 BGC0001866.1_16 17173 19143 + PF00698 Pfam 1.0799913424517567e-26 4.025312495161225e-30 512 648 0.9999994272691842
+BGC0001866.1 BGC0001866.1_17 19152 22424 + PF00698 Pfam 2.639223271303753e-16 9.836836642950999e-20 2 151 0.9999940983719267
+BGC0001866.1 BGC0001866.1_17 19152 22424 + PF14765 Pfam 2.520598829779557e-60 9.394703055458656e-64 228 504 0.9999940983719267
+BGC0001866.1 BGC0001866.1_17 19152 22424 + PF13489 Pfam 1.0131254482174088e-12 3.776091868123029e-16 661 817 0.9999940983719267
+BGC0001866.1 BGC0001866.1_17 19152 22424 + PF13847 Pfam 8.939870258494623e-11 3.332042586095648e-14 666 776 0.9999940983719267
+BGC0001866.1 BGC0001866.1_17 19152 22424 + PF13649 Pfam 2.319131521369124e-13 8.643799930559537e-17 667 764 0.9999940983719267
+BGC0001866.1 BGC0001866.1_17 19152 22424 + PF08242 Pfam 3.6288099491186147e-22 1.3525195486837923e-25 668 766 0.9999940983719267
+BGC0001866.1 BGC0001866.1_17 19152 22424 + PF08241 Pfam 5.245291385894328e-12 1.9550098344742185e-15 668 767 0.9999940983719267
+BGC0001866.1 BGC0001866.1_18 22762 23235 + PF00107 Pfam 1.0960342036668699e-15 4.085106983476965e-19 12 117 0.9999176675645223
+BGC0001866.1 BGC0001866.1_19 23268 24623 + PF08659 Pfam 1.5141662612831146e-61 5.643556695054471e-65 65 239 0.9999724741067139
+BGC0001866.1 BGC0001866.1_19 23268 24623 + PF00106 Pfam 1.1379002942545491e-07 4.2411490654288077e-11 68 221 0.9999724741067139
+BGC0001866.1 BGC0001866.1_19 23268 24623 + PF00550 Pfam 3.359618716013185e-10 1.2521873708584363e-13 384 437 0.9999724741067139
+BGC0001866.1 BGC0001866.1_20 25769 26056 + PF16073 Pfam 1.3071857188363548e-23 4.872104803713585e-27 8 94 0.999988513111687
+BGC0001866.1 BGC0001866.1_21 26544 29999 + PF16073 Pfam 8.208876065249628e-11 3.059588544632735e-14 2 47 0.9999999447224028
+BGC0001866.1 BGC0001866.1_21 26544 29999 + PF00109 Pfam 2.667462237983852e-82 9.942088102809735e-86 178 426 0.9999999447224028
+BGC0001866.1 BGC0001866.1_21 26544 29999 + PF02801 Pfam 2.4031043351141288e-34 8.956780973217029e-38 434 555 0.9999999447224028
+BGC0001866.1 BGC0001866.1_21 26544 29999 + PF16197 Pfam 2.535893425129411e-07 9.451708628883381e-11 567 673 0.9999999447224028
+BGC0001866.1 BGC0001866.1_21 26544 29999 + PF00698 Pfam 4.597134671955754e-38 1.7134307387088164e-41 709 1012 0.9999999447224028
+BGC0001866.1 BGC0001866.1_22 30150 30890 + PF14765 Pfam 7.778696660229127e-11 2.8992533209948296e-14 39 244 0.9999460955852995
+BGC0001866.1 BGC0001866.1_23 30937 32979 + PF00550 Pfam 5.884377030377924e-14 2.193207987468477e-17 67 128 0.9997314383315643
+BGC0001866.1 BGC0001866.1_23 30937 32979 + PF00550 Pfam 3.9212317886052276e-10 1.461510170930014e-13 174 238 0.9997314383315643
+BGC0001866.1 BGC0001866.1_23 30937 32979 + PF00550 Pfam 1.367829688372301e-08 5.098135252971677e-12 299 360 0.9997314383315643
+BGC0001866.1 BGC0001866.1_23 30937 32979 + PF00975 Pfam 6.711355516947163e-24 2.5014370171252933e-27 443 550 0.9997314383315643
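The features table above has one row per domain annotation, so a protein with several domains appears on several rows with the same `bgc_probability`. The sketch below groups domain accessions by protein and optionally keeps only proteins at or above a probability cutoff. It assumes a tab-separated file. The function name `domains_by_protein` is hypothetical, and the cutoff is a post-hoc filter for illustration only, not a reimplementation of GECCO's `--threshold` option.

```python
import csv
import io
from collections import defaultdict

HEADER = [
    "sequence_id", "protein_id", "start", "end", "strand", "domain", "hmm",
    "i_evalue", "pvalue", "domain_start", "domain_end", "bgc_probability",
]
# First three data rows of test-data/features.tsv.
ROWS = [
    ["BGC0001866.1", "BGC0001866.1_1", "347", "1489", "-", "PF00394", "Pfam",
     "2.1941888078432915e-08", "8.178117062405111e-12", "1", "63",
     "0.9852038761627908"],
    ["BGC0001866.1", "BGC0001866.1_1", "347", "1489", "-", "PF07731", "Pfam",
     "3.9374169295176556e-23", "1.467542649838858e-26", "150", "281",
     "0.9852038761627908"],
    ["BGC0001866.1", "BGC0001866.1_6", "3946", "4389", "+", "PF00891", "Pfam",
     "4.743887678074703e-16", "1.7681280946979883e-19", "17", "121",
     "0.9910535094227727"],
]
SAMPLE = "\n".join("\t".join(fields) for fields in [HEADER] + ROWS) + "\n"


def domains_by_protein(handle, min_probability=0.0):
    """Map each protein ID to its domain accessions, in file order."""
    grouped = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        if float(row["bgc_probability"]) >= min_probability:
            grouped[row["protein_id"]].append(row["domain"])
    return dict(grouped)


print(domains_by_protein(io.StringIO(SAMPLE)))
print(domains_by_protein(io.StringIO(SAMPLE), min_probability=0.99))
```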