annotate blast_annotations_processor.py @ 2:9ca209477dfd draft default tip

planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
author onnodg
date Mon, 15 Dec 2025 16:43:36 +0000
parents a3989edf0a4a
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
1 """Galaxy-compatible BLAST annotation processor.
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
2
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
3 This script processes a single annotated BLAST file along with a FASTA file
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
4 containing unannotated reads. It generates multiple types of outputs for
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
5 integration with Galaxy workflows:
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
6
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
7 - E-value distribution plots (PNG)
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
8 - Taxonomic composition reports (text)
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
9 - Circular taxonomy diagram data (JSON)
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
10 - Header annotations with merged and per-read information (Excel)
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
11 - Annotation statistics summary (text)
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
12
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
13 Main workflow:
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
14 1. Parse command-line arguments.
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
15 2. Load annotated BLAST results and unannotated FASTA headers.
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
16 3. Group BLAST hits per query (q_id), filter by thresholds.
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
17 4. Resolve taxonomic conflicts with uncertainty rules.
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
18 5. Generate requested outputs.
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
19
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
20 Notes:
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
21 - Headers in BLAST and FASTA should correspond.
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
22 - Uses matplotlib, pandas, and openpyxl for visualization and reporting.
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
23 """
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
24
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
25 import argparse
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
26 import json
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
27 import os
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
28 from collections import OrderedDict
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
29 from collections import defaultdict
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
30
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
31 import matplotlib.pyplot as plt
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
32 import numpy as np
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
33 import pandas as pd
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
34
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
35 # Default taxonomic levels
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
36 TAXONOMIC_LEVELS = ["K", "P", "C", "O", "F", "G", "S"]
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
37
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
38
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
39 def parse_arguments(arg_list=None):
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
40 """Parse command line arguments for cluster processing."""
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
41 parser = argparse.ArgumentParser(description='Process BLAST annotation results for Galaxy')
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
42
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
43 parser.add_argument('--input-anno', required=True, help='Annotated BLAST output file (tabular format)')
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
44 parser.add_argument('--input-unanno', required=True, help='Unannotated sequences file (FASTA format)')
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
45
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
46 parser.add_argument('--eval-plot', help='Output path for E-value plot (PNG)')
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
47 parser.add_argument('--taxa-output', help='Output path for taxa report (tabular)')
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
48 parser.add_argument('--circle-data', help='Output path for circular taxonomy data (txt)')
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
49 parser.add_argument('--header-anno', help='Output path for header annotations (tabular/xlsx)')
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
50 parser.add_argument('--log', help='Output path for log file (txt)', required=True)
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
51 parser.add_argument('--filtered-fasta', required=True,
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
52 help='Filtered fasta file (fasta format) for downstream analysis')
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
53
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
54 parser.add_argument('--uncertain-threshold', type=float, default=0.9, required=True,
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
55 help='Threshold for resolving taxonomic conflicts (default: 0.9)')
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
56 parser.add_argument('--eval-threshold', default='1e-10', type=float, required=True,
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
57 help='E-value threshold for filtering results (default: 1e-10)')
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
58 parser.add_argument('--use-counts', action='store_true', default=False, required=False,
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
59 help='Use read counts in circular data')
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
60 parser.add_argument('--ignore-rank', default='unknown', required=False,
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
61 help='Ignore rank when containing this text')
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
62 parser.add_argument('--ignore-taxonomy', default='environmental', required=False,
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
63 help="Don't use taxonomy containing this taxonomy")
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
64 parser.add_argument('--bitscore-perc-cutoff', type=float, default=8, required=True,
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
65 help='Bitscore percentage cutoff for considered hits')
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
66 parser.add_argument('--min-bitscore', type=int, default=0, required=True,
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
67 help='Minimum bitscore threshold for hits')
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
68 parser.add_argument('-iot', '--ignore-obiclean-type', type=str, default='singleton', required=False,
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
69 help='Ignore sequences with this obiclean type')
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
70 parser.add_argument('-iit', '--ignore-illuminapairend-type', type=str, default='pairend', required=False,
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
71 help='Ignore sequences with this illumina paired end output type')
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
72 parser.add_argument('--min-identity', type=int, default=0, required=True,
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
73 help='Minimum sequence identity to consider a hit')
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
74 parser.add_argument('--min-coverage', type=int, default=0, required=True,
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
75 help='Minimum sequence coverage to consider a hit')
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
76 parser.add_argument('--ignore-seqids', type=str, default='', required=False,
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
77 help='Ignore sequences with this sequence identifier')
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
78 parser.add_argument('--min-support', type=int, default=0, required=True,
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
79 help='A taxon is kept only if it (or its descendants) have at least N reads assigned.')
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
80
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
81 return parser.parse_args(arg_list)
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
82
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
83
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
84 def log_message(log_messages, msg):
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
85 """
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
86 Helper to both print and collect log messages.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
87 """
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
88 if log_messages is not None:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
89 log_messages.append(msg)
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
90
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
91
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
92 def list_to_string(x):
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
93 """
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
94 Convert a list, pandas Series, or numpy array to a comma-separated string.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
95 """
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
96 if isinstance(x, (list, pd.Series, np.ndarray)):
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
97 return ", ".join(map(str, x))
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
98 return str(x)
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
99
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
100
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
101 def make_eval_plot(e_val_sets, output_path, log_messages):
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
102 """
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
103 Generate an E-value distribution plot.
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
104
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
105 This function aggregates E-values per read, transforms them onto a log axis
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
106 and produces a visual summary of the distribution of best hits across reads.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
107 If no E-values are available, no plot is created.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
108
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
109 :param e_val_sets: Set of E-values per read.
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
110 :type e_val_sets: list[set[str]]
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
111 :param output_path: Output PNG file.
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
112 :type output_path: str
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
113 :param log_messages: Log collection list.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
114 :type log_messages: list[str]
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
115 :return: None
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
116 :rtype: None
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
117 """
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
118 if not e_val_sets:
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
119 log_message(log_messages, "No E-values to plot")
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
120 return None
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
121
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
122 processed_sets = sorted([sorted(float(e_val) for e_val in e_val_set) for e_val_set in e_val_sets])
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
123
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
124 bar_positions = []
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
125 bar_heights = []
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
126
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
127 for i, e_set in enumerate(processed_sets):
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
128 if len(e_set) == 1:
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
129 e_set.append(0)
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
130 for j, e_val in enumerate(e_set):
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
131 bar_positions.append(i + j * 0.29)
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
132 if e_val != 0:
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
133 bar_heights.append(1 / e_val)
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
134 else:
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
135 bar_heights.append(e_val)
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
136
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
137 plt.figure(figsize=(21, 9))
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
138 plt.bar(bar_positions, bar_heights, width=0.29, color=['blue', 'red'])
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
139
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
140 plt.yscale('log')
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
141 plt.grid(axis='y', linestyle='--', color='gray', alpha=0.7)
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
142 plt.ylabel("e-values")
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
143 plt.xticks(ticks=range(len(processed_sets)), labels=[f'{i + 1}' for i in range(len(processed_sets))])
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
144
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
145 plt.tight_layout()
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
146 plt.savefig(output_path, format='png')
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
147 plt.close()
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
148 return None
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
149
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
150
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
151 def calculate_annotation_stats(anno_count, unanno_file_path, unique_anno_count, total_unique_count):
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
152 """
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
153 Compute annotation statistics for a dataset.
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
154
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
155 This function calculates summary statistics for annotated and unannotated
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
156 sequences in a dataset. It counts the total number of sequences in the
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
157 unannotated file (FASTA-style, based on lines starting with '>'), and
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
158 computes the percentage of annotated sequences and unique annotated
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
159 sequences relative to their totals.
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
160
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
161 :param anno_count: Total number of annotated sequences.
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
162 :type anno_count: int
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
163 :param unanno_file_path: Path to a FASTA file containing unannotated sequences.
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
164 :type unanno_file_path: str
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
165 :param unique_anno_count: Number of unique annotated sequences.
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
166 :type unique_anno_count: int
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
167 :param total_unique_count: Total number of unique sequences in the dataset.
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
168 :type total_unique_count: int
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
169 :return: Dictionary containing:
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
170 - 'percentage_annotated' (float): Percentage of annotated sequences.
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
171 - 'annotated_sequences' (int): Number of annotated sequences.
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
172 - 'total_sequences' (int): Total number of sequences (annotated + unannotated).
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
173 - 'percentage_unique_annotated' (float): Percentage of unique annotated sequences.
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
174 - 'unique_annotated' (int): Number of unique annotated sequences.
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
175 - 'total_unique' (int): Total number of unique sequences.
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
176 :rtype: dict[str, float | int]
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
177 """
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
178 total_sequences = 0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
179 with open(unanno_file_path, 'r') as f:
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
180 total_sequences = sum(1 for line in f if line.startswith('>'))
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
181
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
182 percentage_annotated = (anno_count / total_sequences * 100) if total_sequences > 0 else 0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
183 percentage_unique_annotated = (unique_anno_count / total_unique_count * 100) if total_unique_count > 0 else 0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
184
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
185 return {'percentage_annotated': percentage_annotated, 'annotated_sequences': anno_count,
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
186 'total_sequences': total_sequences, 'percentage_unique_annotated': percentage_unique_annotated,
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
187 'unique_annotated': unique_anno_count, 'total_unique': total_unique_count}
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
188
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
189
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
190 def resolve_tax_majority(taxon_counts, uncertain_threshold):
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
191 total_count = sum(taxon_counts.values())
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
192 if total_count == 0:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
193 return "Uncertain taxa", "Uncertain taxa"
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
194 most_common_taxon, count = max(taxon_counts.items(), key=lambda x: x[1])
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
195 # Use most common if above threshold
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
196 if count / total_count >= uncertain_threshold:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
197 return most_common_taxon, most_common_taxon
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
198 conflicting_taxa = list(taxon_counts.keys())
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
199 conflicting_levels = [taxon.split(" / ") for taxon in conflicting_taxa]
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
200
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
201 # Resolve uncertainty per level
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
202 for level_idx in range(min(len(level) for level in conflicting_levels)):
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
203 current_level_names = {level[level_idx] for level in conflicting_levels}
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
204 if len(current_level_names) > 1:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
205 lowest_common_path = conflicting_levels[0][:level_idx + 1]
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
206 lowest_common_path[-1] = 'Uncertain taxa'
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
207 small_output = " / ".join(lowest_common_path)
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
208 min_conflicting_level = min(len(level) for level in conflicting_levels)
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
209 uncertain_path = conflicting_levels[0][:level_idx + 1]
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
210 uncertain_path[-1] = 'Uncertain taxa'
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
211 for _ in range(level_idx + 1, min_conflicting_level):
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
212 uncertain_path.append('Uncertain taxa')
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
213 return small_output, " / ".join(uncertain_path)
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
214 return conflicting_taxa[0], conflicting_taxa[0]
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
215
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
216
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
217 def process_taxa_output(taxa_dicts, output_path, uncertain_threshold):
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
218 """
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
219 Generate a hierarchical taxa summary file.
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
220
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
221 This function resolves best taxonomic assignments per read and aggregates
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
222 counts into taxonomic levels, producing a readable text report listing
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
223 percentages, counts and resolved taxonomy.
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
224
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
225 :param taxa_dicts: Best taxa dictionaries per read.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
226 :type taxa_dicts: list[dict]
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
227 :param output_path: Path of output report.
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
228 :type output_path: str
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
229 :param uncertain_threshold: Confidence value for majority voting.
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
230 :type uncertain_threshold: float
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
231 :return: None
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
232 :rtype: None
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
233 """
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
234 uncertain_dict = {level: 0 for level in TAXONOMIC_LEVELS}
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
235 aggregated_counts = defaultdict(int)
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
236
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
237 for read_taxa in taxa_dicts:
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
238 if not read_taxa:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
239 continue
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
240 resolved_taxon, _ = resolve_tax_majority(read_taxa, uncertain_threshold)
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
241
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
242 # Add counts for resolved taxon
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
243 levels = resolved_taxon.split(" / ")
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
244 for i in range(1, len(levels) + 1):
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
245 aggregated_counts[" / ".join(levels[:i])] += 1
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
246
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
247 total_count = sum(value for key, value in aggregated_counts.items() if len(key.split(" / ")) == 1)
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
248
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
249 report_lines = []
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
250
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
251 for taxonomy, count in sorted(aggregated_counts.items(), key=lambda x: x[0]):
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
252 levels = taxonomy.split(" / ")
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
253 indent = " " * (len(levels) - 1) * 2
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
254 taxon_name = levels[-1]
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
255 taxon_level_code = TAXONOMIC_LEVELS[len(levels) - 1] if len(levels) <= len(TAXONOMIC_LEVELS) else "U"
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
256 percentage = (count / total_count) * 100 if total_count > 0 else 0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
257
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
258 report_lines.append(f"{percentage:.2f}\t{count}\t{total_count}\t{taxon_level_code}\t{indent}{taxon_name}")
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
259
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
260 if taxon_name == 'Uncertain taxa':
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
261 uncertain_dict[taxon_level_code] += count
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
262
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
263 with open(output_path, 'w') as f:
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
264 f.write("Uncertain count per taxonomic level" + str(uncertain_dict) + '\n')
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
265 f.write('percentage_rooted\tnumber_rooted\ttotal_num\ttaxon_level\tidentification\n')
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
266 f.write("\n".join(report_lines))
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
267
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
268
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
269 def process_header_annotations(taxa_dicts, headers, output_path, uncertain_threshold, source_list, seq_id_list,
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
270 log_messages):
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
271 """
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
272 Create an Excel report summarizing individual and merged read annotations.
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
273
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
274 This function converts best taxonomy assignments into a per-read table,
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
275 generates an aggregated table grouped by taxon, and saves both tables to
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
276 separate sheets in a single Excel file.
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
277
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
278 :param taxa_dicts: Taxonomy assignments per read.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
279 :type taxa_dicts: list[dict]
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
280 :param headers: Original FASTA headers.
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
281 :type headers: list[str]
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
282 :param output_path: Output Excel file name.
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
283 :type output_path: str
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
284 :param uncertain_threshold: Confidence level for taxonomy conflict resolution.
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
285 :type uncertain_threshold: float
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
286 :param source_list: Source identifiers per read.
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
287 :type source_list: list[list[str]]
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
288 :param seq_id_list: Sequence identifiers per read.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
289 :type seq_id_list: list[list[str]]
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
290 :param log_messages: Log collection list.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
291 :type log_messages: list[str]
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
292 :return: None
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
293 :rtype: None
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
294 """
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
295 report_lines = []
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
296 for i, read_taxa in enumerate(taxa_dicts):
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
297 if not read_taxa:
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
298 continue
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
299 _, resolved_taxon_long = resolve_tax_majority(read_taxa, uncertain_threshold)
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
300 source = source_list[i][0] if source_list[i] else "N/A"
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
301 seq_id = seq_id_list[i][0] if seq_id_list[i] else "N/A"
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
302 header = headers[i] if i < len(headers) else f"Header missing"
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
303
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
304 # Extract count
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
305 try:
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
306 header_base, count_str = header.rsplit("(", 1)
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
307 count = int(count_str.rstrip(")"))
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
308 except ValueError as e:
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
309 log_message(log_messages, f'Failed extracting count: {e}')
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
310 header_base = header
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
311 count = 1
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
312
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
313 report_lines.append(f'{header_base}\t{seq_id}\t{source}\t{count}\t{resolved_taxon_long}')
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
314 # temp path is needed to write to a xlsx from galaxy
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
315 temp_tsv_path = 'temp.tsv'
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
316 try:
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
317 with open(temp_tsv_path, 'w') as f:
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
318 f.write('header\tseq_id\tsource\tcount\ttaxa\n')
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
319 f.write("\n".join(report_lines))
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
320 except PermissionError as e:
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
321 log_message(log_messages, f"Unable to write to file, error: {e} file might be opened")
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
322
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
323 df = pd.read_csv(temp_tsv_path, sep='\t', encoding="latin1")
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
324 if not df.empty:
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
325 taxa_split = df["taxa"].str.split(" / ", expand=True)
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
326 max_levels = taxa_split.shape[1] if taxa_split.shape[1] is not None else 7
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
327 level_names = ["kingdom", "phylum", "class", "order", "family", "genus", "species"][:max_levels]
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
328 taxa_split.columns = level_names
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
329
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
330 df_individual = pd.concat([df, taxa_split], axis=1)
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
331 df_individual = df_individual.sort_values(['species', 'genus', 'family'], ascending=[True, True, True])
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
332 df_for_merge = df_individual.copy()
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
333
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
334 group_cols = ["taxa"]
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
335 agg_dict = {"seq_id": lambda x: list_to_string(list(x.unique())), "count": "sum",
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
336 "source": lambda x: list_to_string(list(x.unique())), }
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
337
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
338 for level in level_names:
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
339 if level in df_for_merge.columns:
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
340 agg_dict[level] = "first"
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
341
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
342 df_merged = df_for_merge.groupby(group_cols, as_index=False).agg(agg_dict)
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
343
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
344 sort_columns = []
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
345 sort_ascending = []
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
346
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
347 for level in ['species', 'genus', 'family']:
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
348 if level in df_merged.columns:
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
349 sort_columns.append(level)
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
350 sort_ascending.append(True)
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
351
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
352 df_merged = df_merged.sort_values(sort_columns, ascending=sort_ascending)
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
353
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
354 temp_path = output_path + ".xlsx"
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
355 os.makedirs(os.path.dirname(temp_path), exist_ok=True)
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
356 with pd.ExcelWriter(temp_path, engine='openpyxl', mode='w') as writer:
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
357 df_individual.to_excel(writer, sheet_name='Individual_Reads', index=False)
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
358 df_merged.to_excel(writer, sheet_name='Merged_by_Taxa', index=False)
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
359
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
360 for sheet_name in writer.sheets:
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
361 worksheet = writer.sheets[sheet_name]
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
362 for column in worksheet.columns:
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
363 max_length = 0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
364 column_letter = column[0].column_letter
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
365 for cell in column:
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
366 try:
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
367 if len(str(cell.value)) > max_length:
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
368 max_length = len(str(cell.value))
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
369 except AttributeError as e:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
370 log_message(log_messages, f"Error {e}, {cell} has no value, something probably went wrong")
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
371 pass
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
372 adjusted_width = min(max_length + 2, 50)
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
373 worksheet.column_dimensions[column_letter].width = adjusted_width
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
374 os.replace(temp_path, output_path)
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
375 # Clean up temporary file
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
376 os.remove(temp_tsv_path)
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
377 else:
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
378 log_message(log_messages, "Dataframe empty, no annotation results")
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
379
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
380
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
381 def create_circle_data(taxa_dicts, output_path, use_counts, uncertain_threshold):
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
382 """
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
383 Generate circular taxonomy layer data.
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
384
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
385 The function converts resolved taxonomy per read into hierarchical
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
386 cumulative counts that can be visualized as concentric circles.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
387
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
388 :param taxa_dicts: Per-read taxonomy assignments.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
389 :type taxa_dicts: list[dict]
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
390 :param output_path: JSON destination file.
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
391 :type output_path: str
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
392 :param use_counts: Whether to count total reads or unique taxa.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
393 :type use_counts: bool
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
394 :param uncertain_threshold: Majority resolution threshold.
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
395 :type uncertain_threshold: float
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
396 :return: None
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
397 :rtype: None
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
398 """
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
399 aggregated_counts = defaultdict(int)
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
400 seen_taxa = set()
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
401
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
402 for read_taxa in taxa_dicts:
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
403 if not read_taxa:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
404 continue
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
405 _, resolved_taxon_long = resolve_tax_majority(read_taxa, uncertain_threshold)
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
406
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
407 levels = resolved_taxon_long.split(" / ")
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
408
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
409 if use_counts:
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
410 for i in range(1, len(levels) + 1):
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
411 aggregated_counts[" / ".join(levels[:i])] += 1
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
412 else:
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
413 if resolved_taxon_long not in seen_taxa:
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
414 for i in range(1, len(levels) + 1):
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
415 aggregated_counts[" / ".join(levels[:i])] += 1
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
416 seen_taxa.add(resolved_taxon_long)
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
417
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
418 circle_data = []
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
419 for level in range(len(TAXONOMIC_LEVELS)):
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
420 circle_data.append({"labels": [], "sizes": []})
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
421
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
422 for taxonomy, count in sorted(aggregated_counts.items(), key=lambda x: x[0]):
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
423 levels = taxonomy.split(" / ")
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
424 if len(levels) <= len(TAXONOMIC_LEVELS):
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
425 circle_data[len(levels) - 1]["labels"].append(levels[-1].strip())
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
426 circle_data[len(levels) - 1]["sizes"].append(count)
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
427 with open(output_path, "w") as f:
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
428 json.dump(circle_data, f, indent=2)
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
429
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
430
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
431 def read_unannotated_fasta(unanno_file_path, ignore_illuminapairend_type, ignore_obiclean_type, min_support,
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
432 filtered_fasta_path, log_messages):
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
433 """
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
434 Read, filter and optionally rewrite an unannotated FASTA file.
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
435
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
436 This function extracts dereplicated sequence counts, applies header filters
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
437 and a minimum support threshold, collects statistics and, if requested,
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
438 writes a filtered FASTA file for downstream processing.
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
439
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
440 :param unanno_file_path: Path to FASTA file.
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
441 :type unanno_file_path: str
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
442 :param ignore_illuminapairend_type: Illumina pair-end status to remove.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
443 :type ignore_illuminapairend_type: str
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
444 :param ignore_obiclean_type: Obiclean label to remove.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
445 :type ignore_obiclean_type: str
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
446 :param min_support: Minimum dereplicated read count retained.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
447 :type min_support: int
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
448 :param filtered_fasta_path: Optional path for filtered FASTA output.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
449 :type filtered_fasta_path: str | None
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
450 :param log_messages: Log collection list.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
451 :type log_messages: list[str]
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
452 :return: (headers, total_unique_count, invalid_count)
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
453 :rtype: tuple[list[str], int, int]
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
454 """
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
455 headers = []
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
456 total_unique_count = 0
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
457 invalid_count = 0
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
458
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
459 total_headers = 0
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
460 low_support_count = 0
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
461 header_filter_count = 0
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
462 written_fasta_count = 0
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
463
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
464 if not os.path.exists(unanno_file_path):
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
465 log_message(log_messages, f"Warning: Unannotated file {unanno_file_path} not found")
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
466 return headers, total_unique_count, invalid_count
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
467
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
468 write_fasta = filtered_fasta_path is not None
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
469 filtered_fasta = None
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
470 if write_fasta:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
471 try:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
472 filtered_fasta = open(filtered_fasta_path, "w")
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
473 except OSError as e:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
474 log_message(log_messages, f"Error: Cannot open {filtered_fasta_path} for writing: {e}")
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
475 write_fasta = False
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
476
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
477 write_flag = False
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
478
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
479 with open(unanno_file_path, 'r') as f:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
480 for line in f:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
481 if line.startswith('>'):
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
482 total_headers += 1
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
483 header_line = line.rstrip("\n")
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
484 header = header_line.split()[0].strip('>')
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
485 if "count=" in header_line:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
486 count = int(header_line.split("count=")[1].split(";")[0])
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
487 else:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
488 count = 0
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
489
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
490 passes_header_filter = check_header_string(header_line, ignore_illuminapairend_type,
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
491 ignore_obiclean_type)
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
492
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
493 if passes_header_filter and int(count) >= int(min_support):
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
494 headers.append(header)
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
495 total_unique_count += count
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
496
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
497 write_flag = True
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
498 if write_fasta:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
499 filtered_fasta.write(line)
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
500 written_fasta_count += 1
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
501 else:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
502 # Invalid header (either header filter or low support)
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
503 if not passes_header_filter:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
504 header_filter_count += 1
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
505 elif int(count) < int(min_support):
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
506 low_support_count += 1
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
507
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
508 invalid_count += 1
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
509 write_flag = False
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
510 else:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
511 if write_flag and write_fasta:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
512 filtered_fasta.write(line)
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
513
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
514 if write_fasta and filtered_fasta is not None:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
515 filtered_fasta.close()
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
516 log_message(log_messages, f"Filtered FASTA written succesfully"
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
517 f"({written_fasta_count} sequences)")
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
518
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
519 log_message(log_messages, f"FASTA: total headers: {total_headers}")
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
520 log_message(log_messages, f"FASTA: headers kept after filters and min_support={min_support}: {len(headers)}")
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
521 log_message(log_messages, f"FASTA: removed due to header filters (illumina/obiclean/etc.): {header_filter_count}")
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
522 log_message(log_messages, f"FASTA: removed due to low dereplicated count (<{min_support}): {low_support_count}")
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
523 log_message(log_messages, f"FASTA: total invalid (header filter + low support): {invalid_count}")
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
524
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
525 return headers, total_unique_count, invalid_count
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
526
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
527
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
528 def parse_blast_output(anno_file_path, eval_threshold, ignore_taxonomy, ignore_rank, min_coverage, min_identity,
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
529 min_bitscore, ignore_seqids, log_messages):
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
530 """
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
531 Parse a BLAST tabular file and retain high-quality hits.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
532
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
533 This function reads BLAST annotations, applies E-value and quality filters,
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
534 excludes invalid taxa, groups remaining hits by query identifier, and logs
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
535 summary statistics.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
536
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
537 :param anno_file_path: Input BLAST result file.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
538 :type anno_file_path: str
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
539 :param eval_threshold: Maximum allowed E-value.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
540 :type eval_threshold: float
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
541 :param ignore_taxonomy: Taxonomy substring filter.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
542 :type ignore_taxonomy: str
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
543 :param ignore_rank: Rank substring filter.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
544 :type ignore_rank: str
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
545 :param min_coverage: Minimum coverage threshold.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
546 :type min_coverage: int
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
547 :param min_identity: Minimum identity threshold.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
548 :type min_identity: int
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
549 :param min_bitscore: Minimum bitscore threshold.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
550 :type min_bitscore: int
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
551 :param ignore_seqids: Comma-separated seqids to ignore.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
552 :type ignore_seqids: str
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
553 :param log_messages: Log collection list.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
554 :type log_messages: list[str]
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
555 :return: Mapping of q_id → hits
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
556 :rtype: OrderedDict[str, list[dict]]
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
557 """
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
558 if not os.path.exists(anno_file_path):
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
559 log_message(log_messages, f"Error: Input file {anno_file_path} not found")
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
560 return OrderedDict()
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
561
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
562 blast_groups = OrderedDict()
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
563
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
564 total_hits = 0
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
565 kept_hits = 0
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
566 filtered_hits = 0
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
567 invalid_taxon_count = 0
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
568 ignored_seqid_count = 0
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
569
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
570 try:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
571 with open(anno_file_path, 'r') as f:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
572 for line in f:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
573 if line.startswith("#"):
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
574 continue
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
575
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
576 parts = line.strip().split('\t')
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
577 if len(parts) < 7:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
578 log_message(log_messages, f"less than 7 parts in line, skipped faulty line: {line}")
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
579 continue
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
580
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
581 q_id = parts[0]
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
582 total_hits += 1
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
583
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
584 try:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
585 e_val = float(parts[6])
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
586 except ValueError as e:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
587 log_message(log_messages, f"Error {e}, while extracting evalue")
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
588 filtered_hits += 1
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
589 continue
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
590
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
591 taxon = parts[-1].strip()
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
592 if is_invalid_taxon(taxon, ignore_taxonomy, ignore_rank):
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
593 invalid_taxon_count += 1
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
594 filtered_hits += 1
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
595 continue
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
596
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
597 seq_id = parts[2] if len(parts) > 2 else ''
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
598 ignore_seqid_set = set(parse_list_param(ignore_seqids))
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
599 if seq_id in ignore_seqid_set:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
600 ignored_seqid_count += 1
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
601 filtered_hits += 1
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
602 continue
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
603
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
604 identity = parts[4] if len(parts) > 4 else ''
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
605 cov = parts[5] if len(parts) > 5 else ''
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
606 bitscore = float(parts[7]) if len(parts) > 7 and parts[7] else 0.0
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
607 source = parts[8] if len(parts) > 8 else ''
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
608
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
609 hit = {'e_val': e_val, 'identity': identity, 'cov': cov, 'bitscore': bitscore, 'source': source,
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
610 'taxon': taxon, 'seq_id': seq_id, }
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
611
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
612 if not is_insufficient_hit(hit, min_coverage, min_identity, min_bitscore, eval_threshold):
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
613 blast_groups.setdefault(q_id, []).append(hit)
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
614 kept_hits += 1
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
615 else:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
616 filtered_hits += 1
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
617 except Exception as e:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
618 log_message(log_messages, f"Error reading BLAST file {anno_file_path}: {e}")
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
619 return OrderedDict()
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
620
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
621 log_message(log_messages, f"BLAST: total hits read: {total_hits}")
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
622 log_message(log_messages, f"BLAST: hits kept after quality filters: {kept_hits}")
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
623 log_message(log_messages, f"BLAST: hits filtered (evalue/coverage/identity/bitscore): {filtered_hits}")
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
624 log_message(log_messages, f"BLAST: hits removed due to invalid taxon: {invalid_taxon_count}")
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
625 log_message(log_messages, f"BLAST: hits removed due to ignored seqids: {ignored_seqid_count}")
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
626
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
627 return blast_groups
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
628
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
629
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
630 def process_single_query(header, hits, bitscore_perc_cutoff):
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
631 """
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
632 Process all BLAST hits for a single query/header and return a summary dict.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
633 """
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
634 current_read_info = {'taxa_dict': {}, 'header': header, 'e_values': set(), 'identities': set(), 'coverages': set(),
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
635 'sources': set(), 'bitscores': set(), 'seq_id': set(), 'best_taxa_dict': {}}
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
636
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
637 max_bitscore = max(h['bitscore'] for h in hits)
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
638 if bitscore_perc_cutoff > 0:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
639 top_bitscore = float(max_bitscore) * (1 - (float(bitscore_perc_cutoff) / 100.0))
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
640 else:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
641 top_bitscore = float(max_bitscore)
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
642 current_read_info['best_taxa_dict'] = defaultdict(int)
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
643
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
644 for hit in hits:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
645 e_val = hit['e_val']
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
646 identity = hit['identity']
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
647 cov = hit['cov']
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
648 bitscore = hit['bitscore']
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
649 source = hit['source']
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
650 taxon = hit['taxon']
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
651 seq_id = hit['seq_id']
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
652
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
653 # Keep track of hits that pass bitscore threshold
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
654 if bitscore >= top_bitscore:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
655 current_read_info['best_taxa_dict'][taxon] += 1
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
656 current_read_info['e_values'].add(str(e_val))
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
657 current_read_info['identities'].add(identity)
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
658 current_read_info['coverages'].add(cov)
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
659 current_read_info['sources'].add(source)
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
660 current_read_info['bitscores'].add(bitscore)
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
661 current_read_info['seq_id'].add(seq_id)
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
662
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
663 return current_read_info
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
664
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
665
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
666 def process_single_file(anno_file_path, unanno_file_path, args, log_messages):
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
667 """
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
668 Execute the complete BLAST annotation processing workflow.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
669
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
670 This function performs FASTA parsing, BLAST parsing, taxonomy resolution,
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
671 per-read and aggregated reporting, summary statistics computation, and
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
672 generation of optional plot and table outputs.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
673
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
674 All validations, errors and counters are written into the log collector.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
675
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
676 :param anno_file_path: BLAST annotation file.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
677 :type anno_file_path: str
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
678 :param unanno_file_path: Input FASTA file.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
679 :type unanno_file_path: str
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
680 :param args: Parsed arguments namespace.
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
681 :type args: argparse.Namespace
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
682 :param log_messages: Log collection list.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
683 :type log_messages: list[str]
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
684 :return: None
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
685 :rtype: None
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
686 """
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
687 log_message(log_messages, f"Starting processing for FASTA")
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
688 log_message(log_messages, "=== PARAMETERS USED ===")
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
689
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
690 skip_keys = {
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
691 "input_anno",
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
692 "input_unanno",
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
693 "eval_plot",
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
694 "taxa_output",
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
695 "circle_data",
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
696 "header_anno",
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
697 "log",
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
698 "filtered_fasta",
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
699 }
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
700
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
701 for key, value in vars(args).items():
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
702 if key in skip_keys:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
703 continue
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
704 log_message(log_messages, f"{key}: {value}")
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
705
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
706 log_message(log_messages, "=== END PARAMETERS ===")
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
707 unanno_headers_ordered, total_unique_count, invalid_count = read_unannotated_fasta(unanno_file_path,
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
708 args.ignore_illuminapairend_type, args.ignore_obiclean_type, args.min_support, args.filtered_fasta,
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
709 log_messages)
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
710
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
711 unanno_set = set(unanno_headers_ordered)
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
712
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
713 if not os.path.exists(anno_file_path):
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
714 log_message(log_messages, f"Error: Input file {anno_file_path} not found")
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
715 with open(args.log, 'w') as f:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
716 print('gaat nog niet goed hoor')
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
717 f.write("\n".join(log_messages))
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
718 return
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
719
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
720 log_message(log_messages, f"Reading BLAST annotations")
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
721 blast_groups = parse_blast_output(anno_file_path, args.eval_threshold, args.ignore_taxonomy, args.ignore_rank,
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
722 args.min_coverage, args.min_identity, args.min_bitscore, args.ignore_seqids, log_messages)
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
723
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
724 extra_blast_qids = [q for q in blast_groups.keys() if q not in unanno_set]
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
725 if extra_blast_qids:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
726 log_message(log_messages,
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
727 f"Note: {len(extra_blast_qids)} BLAST q_ids not in FASTA (showing up to 10): {extra_blast_qids[:10]}")
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
728
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
729 read_data = []
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
730 headers = []
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
731 seen_reads = set()
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
732 annotated_unique_count = 0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
733
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
734 total_headers_for_annotation = len(unanno_headers_ordered)
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
735 reads_with_hits = 0
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
736
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
737 for header in unanno_headers_ordered:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
738 if header not in blast_groups:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
739 continue
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
740 hits = blast_groups[header]
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
741 current_read_info = process_single_query(header, hits, args.bitscore_perc_cutoff)
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
742 read_data.append(current_read_info)
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
743 reads_with_hits += 1
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
744
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
745 if header not in seen_reads:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
746 seen_reads.add(header)
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
747 headers.append(header)
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
748 try:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
749 count = int(header.split('(')[1].split(')')[0])
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
750 except (IndexError, ValueError, AttributeError) as e:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
751 log_message(log_messages, f"Error {e}: could not parse count in header: {header}")
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
752 count = 0
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
753 annotated_unique_count += count
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
754
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
755 reads_without_hits = total_headers_for_annotation - reads_with_hits
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
756 log_message(log_messages, f"ANNOTATION: total FASTA headers considered: {total_headers_for_annotation}")
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
757 log_message(log_messages, f"ANNOTATION: reads with BLAST hits: {reads_with_hits}")
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
758 log_message(log_messages, f"ANNOTATION: reads without BLAST hits: {reads_without_hits}")
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
759 log_message(log_messages, f"ANNOTATION: unique annotated count (from header counts): {annotated_unique_count}")
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
760 log_message(log_messages, f"ANNOTATION: total unique count (from FASTA): {total_unique_count}")
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
761
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
762 taxa_dicts = [read['taxa_dict'] for read in read_data]
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
763 td = [read['best_taxa_dict'] for read in read_data]
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
764
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
765 if args.eval_plot:
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
766 e_val_sets = [read['e_values'] for read in read_data]
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
767 make_eval_plot(e_val_sets, args.eval_plot, log_messages)
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
768 log_message(log_messages, f"E-value plot written succesfully")
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
769
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
770 if args.taxa_output:
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
771 process_taxa_output(td, args.taxa_output, args.uncertain_threshold)
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
772 log_message(log_messages, f"Taxa summary written succesfully")
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
773
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
774 if args.header_anno:
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
775 source_for_headers = [[read['sources']] for read in read_data]
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
776 seq_id_for_headers = [[read['seq_id']] for read in read_data]
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
777 process_header_annotations(td, headers, args.header_anno, args.uncertain_threshold, source_for_headers,
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
778 seq_id_for_headers, log_messages)
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
779 log_message(log_messages, f"Header annotations written succesfully")
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
780
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
781 if args.circle_data:
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
782 create_circle_data(td, args.circle_data, args.use_counts, args.uncertain_threshold)
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
783 log_message(log_messages, f"Circle diagram JSON written succesfully")
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
784
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
785 stats = calculate_annotation_stats(len(taxa_dicts), unanno_file_path, annotated_unique_count,
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
786 int(total_unique_count))
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
787
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
788 log_message(log_messages, "=== ANNOTATION STATISTICS ===")
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
789 for key, value in stats.items():
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
790 log_message(log_messages, f"{key}: {value}")
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
791
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
792 with open(args.log, 'w') as f:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
793 f.write("\n".join(log_messages))
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
794
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
795
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
796 def check_header_string(header_line, invalid_header_string, invalid_line_string):
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
797 """
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
798 Checks the header input for string that it needs to be filtered on
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
799 """
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
800 invalid_string_list = parse_list_param(invalid_header_string) + parse_list_param(invalid_line_string)
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
801 for string in invalid_string_list:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
802 if string.lower() == 'singleton':
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
803 string = "obiclean_status={'XXX': 's'}"
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
804 elif string.lower() == 'variant':
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
805 string = "obiclean_status={'XXX': 'i'}"
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
806 elif string.lower() == 'head':
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
807 string = "obiclean_status={'XXX': 'h'}"
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
808 elif string.lower() == 'pairend' or string.lower() == 'paired end':
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
809 string = "PairEnd"
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
810 elif string.lower() == 'consensus' or string.lower() == 'cons':
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
811 string = "CONS"
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
812 if string in header_line:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
813 return False
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
814 return True
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
815
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
816
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
817 def is_insufficient_hit(hit, min_cov, min_id, min_bit, min_eval):
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
818 """
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
819 Checks if hit has values that pass the thresholds
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
820 """
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
821 if float(hit['cov']) < min_cov:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
822 return True
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
823 elif float(hit['identity']) < min_id:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
824 return True
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
825 elif float(hit['bitscore']) < min_bit:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
826 return True
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
827 elif int(hit['e_val']) > min_eval:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
828 return True
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
829 else:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
830 return False
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
831
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
832
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
833 def is_invalid_taxon(taxon: str, invalid_taxa, invalid_ranks) -> bool:
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
834 """
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
835 Determine whether a given taxonomic path should be considered invalid and excluded from analysis.
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
836 """
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
837 invalid_string_list = parse_list_param(invalid_ranks) + parse_list_param(invalid_taxa)
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
838 taxon_lower = taxon.lower()
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
839 for string in invalid_string_list:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
840 if string in taxon_lower:
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
841 return True
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
842 return False
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
843
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
844
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
845 def parse_list_param(param):
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
846 """
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
847 Safely convert a comma-separated Galaxy text parameter into a list.
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
848 """
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
849 if not param:
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
850 return []
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
851 return [x.strip() for x in param.strip(', \n').split(',') if x.strip()]
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
852
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
853
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
854 def main(arg_list=None):
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
855 """
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
856 Entry point for Galaxy-compatible BLAST annotation processing.
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
857
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
858 :param arg_list: Optional list of command-line arguments to override
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
859 ``sys.argv``. Primarily used for testing or programmatic execution.
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
860 If ``None``, arguments are read directly from the command line.
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
861 :type arg_list: list[str] | None
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
862 :return: None
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
863 :rtype: None
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
864
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
865 Notes:
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
866 Calls `process_single_file` with parsed arguments.
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
867 """
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
868 log_messages = []
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
869 args = parse_arguments(arg_list)
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
870 process_single_file(args.input_anno, args.input_unanno, args, log_messages)
0
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
871 print("Processing completed successfully")
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
872
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
873
a3989edf0a4a planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff changeset
874 if __name__ == "__main__":
2
9ca209477dfd planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents: 0
diff changeset
875 main()