annotate blast_annotations_processor.py @ 2:9ca209477dfd draft default tip
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
| author | onnodg |
|---|---|
| date | Mon, 15 Dec 2025 16:43:36 +0000 |
| parents | a3989edf0a4a |
| children |
| rev | line source |
|---|---|
|
0
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
"""Galaxy-compatible BLAST annotation processor.

This script processes a single annotated BLAST file along with a FASTA file
containing unannotated reads. It generates multiple types of outputs for
integration with Galaxy workflows:

    - E-value distribution plots (PNG)
    - Taxonomic composition reports (text)
    - Circular taxonomy diagram data (JSON)
    - Header annotations with merged and per-read information (Excel)
    - Annotation statistics summary (text)

Main workflow:
    1. Parse command-line arguments.
    2. Load annotated BLAST results and unannotated FASTA headers.
    3. Group BLAST hits per query (q_id), filter by thresholds.
    4. Resolve taxonomic conflicts with uncertainty rules.
    5. Generate requested outputs.

Notes:
    - Headers in BLAST and FASTA should correspond.
    - Uses matplotlib, pandas, and openpyxl for visualization and reporting.
"""
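Steps 3 and 4 of the workflow in the module docstring can be illustrated with a minimal sketch. It is not the script's actual implementation: the record fields (`q_id`, `evalue`, `bitscore`, `genus`), the helper names `group_and_filter` and `resolve_level`, the majority-share rule, and the `"Uncertain taxa"` label are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical hit records; the real script's column layout is not shown here.
hits = [
    {"q_id": "read1", "evalue": 1e-50, "bitscore": 200.0, "genus": "Quercus"},
    {"q_id": "read1", "evalue": 1e-48, "bitscore": 195.0, "genus": "Quercus"},
    {"q_id": "read1", "evalue": 1e-3, "bitscore": 40.0, "genus": "Fagus"},
    {"q_id": "read2", "evalue": 1e-30, "bitscore": 120.0, "genus": "Pinus"},
    {"q_id": "read2", "evalue": 1e-29, "bitscore": 118.0, "genus": "Picea"},
]


def group_and_filter(hits, eval_threshold=1e-10, min_bitscore=0):
    """Group hits per query and drop hits that fail either threshold."""
    grouped = defaultdict(list)
    for hit in hits:
        if hit["evalue"] <= eval_threshold and hit["bitscore"] >= min_bitscore:
            grouped[hit["q_id"]].append(hit)
    return dict(grouped)


def resolve_level(hits_for_query, level, uncertain_threshold=0.9):
    """Assumed rule: accept the most common taxon only if its share of the
    hits reaches the threshold; otherwise report the level as uncertain."""
    counts = Counter(hit[level] for hit in hits_for_query)
    taxon, count = counts.most_common(1)[0]
    if count / len(hits_for_query) >= uncertain_threshold:
        return taxon
    return "Uncertain taxa"


grouped = group_and_filter(hits)
calls = {q_id: resolve_level(q_hits, "genus") for q_id, q_hits in grouped.items()}
print(calls)
```

In this toy input the weak `Fagus` hit is removed by the E-value filter, so `read1` resolves cleanly, while `read2` is split evenly between two genera and falls below the 0.9 share.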

import argparse
import json
import os
from collections import OrderedDict, defaultdict

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Default taxonomic levels: kingdom, phylum, class, order, family, genus, species
TAXONOMIC_LEVELS = ["K", "P", "C", "O", "F", "G", "S"]


def parse_arguments(arg_list=None):
    """Parse command-line arguments for BLAST annotation processing."""
    parser = argparse.ArgumentParser(description='Process BLAST annotation results for Galaxy')

    parser.add_argument('--input-anno', required=True, help='Annotated BLAST output file (tabular format)')
    parser.add_argument('--input-unanno', required=True, help='Unannotated sequences file (FASTA format)')

    parser.add_argument('--eval-plot', help='Output path for E-value plot (PNG)')
    parser.add_argument('--taxa-output', help='Output path for taxa report (tabular)')
    parser.add_argument('--circle-data', help='Output path for circular taxonomy data (txt)')
    parser.add_argument('--header-anno', help='Output path for header annotations (tabular/xlsx)')
    parser.add_argument('--log', help='Output path for log file (txt)', required=True)
    parser.add_argument('--filtered-fasta', required=True,
                        help='Filtered FASTA file (FASTA format) for downstream analysis')

    # argparse never falls back to `default` for an option marked required=True,
    # so the defaults on the required options below are effectively documentation.
    parser.add_argument('--uncertain-threshold', type=float, default=0.9, required=True,
                        help='Threshold for resolving taxonomic conflicts (default: 0.9)')
    parser.add_argument('--eval-threshold', default=1e-10, type=float, required=True,
                        help='E-value threshold for filtering results (default: 1e-10)')
    parser.add_argument('--use-counts', action='store_true', default=False, required=False,
                        help='Use read counts in circular data')
    parser.add_argument('--ignore-rank', default='unknown', required=False,
                        help='Ignore a rank when it contains this text')
    parser.add_argument('--ignore-taxonomy', default='environmental', required=False,
                        help='Skip hits whose taxonomy contains this text')
    parser.add_argument('--bitscore-perc-cutoff', type=float, default=8, required=True,
                        help='Bitscore percentage cutoff for considered hits')
    parser.add_argument('--min-bitscore', type=int, default=0, required=True,
                        help='Minimum bitscore threshold for hits')
    parser.add_argument('-iot', '--ignore-obiclean-type', type=str, default='singleton', required=False,
                        help='Ignore sequences with this obiclean type')
    parser.add_argument('-iit', '--ignore-illuminapairend-type', type=str, default='pairend', required=False,
                        help='Ignore sequences with this illumina paired end output type')
    parser.add_argument('--min-identity', type=int, default=0, required=True,
                        help='Minimum sequence identity to consider a hit')
    parser.add_argument('--min-coverage', type=int, default=0, required=True,
                        help='Minimum sequence coverage to consider a hit')
    parser.add_argument('--ignore-seqids', type=str, default='', required=False,
                        help='Ignore sequences with this sequence identifier')
    parser.add_argument('--min-support', type=int, default=0, required=True,
                        help='A taxon is kept only if it (or its descendants) has at least N reads assigned.')

    return parser.parse_args(arg_list)


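Because `parse_arguments` accepts an optional `arg_list`, it can be exercised without touching `sys.argv`. The sketch below uses a trimmed, hypothetical stand-in (`build_parser`, with only three of the options above) so that it runs on its own; the file name `hits.tabular` is also made up. It additionally shows that an option declared with both `default=` and `required=True` still has to be supplied.

```python
import argparse


def build_parser():
    """Trimmed stand-in for parse_arguments(); only three of its options."""
    parser = argparse.ArgumentParser(description="Demo parser")
    parser.add_argument("--input-anno", required=True)
    parser.add_argument("--eval-threshold", type=float, default=1e-10, required=True)
    parser.add_argument("--use-counts", action="store_true", default=False)
    return parser


# Passing an explicit list mirrors parse_arguments(arg_list=[...]).
args = build_parser().parse_args(
    ["--input-anno", "hits.tabular", "--eval-threshold", "1e-5", "--use-counts"]
)
print(args.input_anno, args.eval_threshold, args.use_counts)

# Omitting a required option makes argparse exit, even though a default is set.
try:
    build_parser().parse_args(["--input-anno", "hits.tabular"])
    exited = False
except SystemExit:
    exited = True
print("exited:", exited)
```

Dashes in option names become underscores on the namespace, so `--input-anno` is read back as `args.input_anno`.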
def log_message(log_messages, msg):
    """
    Helper to collect log messages; does nothing when log_messages is None.
    """
    if log_messages is not None:
        log_messages.append(msg)


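A short usage sketch for `log_message` follows. The helper is copied so the example is self-contained; the message texts are illustrative, and writing the collected list to a file at the end is an assumption about how the `--log` path is used, not something this part of the source shows.

```python
import os
import tempfile


def log_message(log_messages, msg):
    """Append msg when a list is given; do nothing for None."""
    if log_messages is not None:
        log_messages.append(msg)


log_messages = []
log_message(log_messages, "Example message one")  # illustrative text only
log_message(log_messages, "No E-values to plot")
log_message(None, "dropped silently")             # None disables collection

# Assumed final step: write everything to the log path in one go.
log_path = os.path.join(tempfile.mkdtemp(), "log.txt")
with open(log_path, "w") as handle:
    handle.write("\n".join(log_messages) + "\n")
```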
def list_to_string(x):
    """
    Convert a list, pandas Series, or numpy array to a comma-separated string.
    Any other value is returned as str(x).
    """
    if isinstance(x, (list, pd.Series, np.ndarray)):
        return ", ".join(map(str, x))
    return str(x)


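The behaviour of `list_to_string` on the container types it recognises, and on values it does not, can be checked with a few calls. The function is repeated here so the snippet runs standalone; the sample values are arbitrary.

```python
import numpy as np
import pandas as pd


def list_to_string(x):
    """Join list-like inputs with ', '; fall back to str() otherwise."""
    if isinstance(x, (list, pd.Series, np.ndarray)):
        return ", ".join(map(str, x))
    return str(x)


from_list = list_to_string([1, 2, 3])
from_array = list_to_string(np.array(["a", "b"]))
from_series = list_to_string(pd.Series([0.5, 1.5]))
from_scalar = list_to_string(42)
from_tuple = list_to_string(("a", "b"))  # tuples are not in the isinstance check

print(from_list, from_array, from_series, from_scalar, from_tuple, sep=" | ")
```

Note that tuples and sets take the `str(x)` branch, so they come back in their Python literal form rather than as a joined string.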
def make_eval_plot(e_val_sets, output_path, log_messages):
    """
    Generate an E-value distribution plot.

    This function aggregates E-values per read, transforms them onto a log axis
    and produces a visual summary of the distribution of best hits across reads.
    If no E-values are available, no plot is created.

    :param e_val_sets: Set of E-values per read.
    :type e_val_sets: list[set[str]]
    :param output_path: Output PNG file.
    :type output_path: str
    :param log_messages: Log collection list.
    :type log_messages: list[str]
    :return: None
    :rtype: None
    """
    if not e_val_sets:
        log_message(log_messages, "No E-values to plot")
        return None

    processed_sets = sorted([sorted(float(e_val) for e_val in e_val_set) for e_val_set in e_val_sets])

    bar_positions = []
    bar_heights = []

|
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
127 for i, e_set in enumerate(processed_sets): |
|
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
128 if len(e_set) == 1: |
|
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
129 e_set.append(0) |
|
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
130 for j, e_val in enumerate(e_set): |
|
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
131 bar_positions.append(i + j * 0.29) |
|
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
132 if e_val != 0: |
|
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
133 bar_heights.append(1 / e_val) |
|
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
134 else: |
|
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
135 bar_heights.append(e_val) |
|
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
136 |
|
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
137 plt.figure(figsize=(21, 9)) |
|
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
138 plt.bar(bar_positions, bar_heights, width=0.29, color=['blue', 'red']) |
|
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
139 |
|
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
140 plt.yscale('log') |
|
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
141 plt.grid(axis='y', linestyle='--', color='gray', alpha=0.7) |
|
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
142 plt.ylabel("e-values") |
|
2
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
143 plt.xticks(ticks=range(len(processed_sets)), labels=[f'{i + 1}' for i in range(len(processed_sets))]) |
|
0
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
144 |
|
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
145 plt.tight_layout() |
|
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
146 plt.savefig(output_path, format='png') |
|
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
147 plt.close() |
|
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
148 return None |
|
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
149 |
|
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
150 |
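The plotting loop above places one group of bars per E-value set and plots the reciprocal of each non-zero E-value, so smaller E-values show up as taller bars on the log axis. A minimal sketch of just that layout step, without matplotlib, might look like the following. The helper name `bar_layout` and the sample E-values are illustrative assumptions and are not part of the tool.

```python
import math


def bar_layout(e_val_sets, width=0.29):
    """Hypothetical helper: derive bar positions and heights like the loop above."""
    # Convert each set of E-value strings to a sorted list of floats,
    # then sort the groups themselves.
    processed = sorted(sorted(float(v) for v in s) for s in e_val_sets)
    positions, heights = [], []
    for i, values in enumerate(processed):
        if len(values) == 1:
            values.append(0)  # a single-value group gets a second, zero-height bar
        for j, value in enumerate(values):
            positions.append(i + j * width)
            # Reciprocal of the E-value; a zero is kept as zero to avoid dividing by it.
            heights.append(1 / value if value != 0 else value)
    return positions, heights


positions, heights = bar_layout([{"1e-50", "1e-20"}, {"1e-5"}])
print(positions)
print(heights)
```

Each group starts at an integer x position and its bars are offset by the bar width, which is why the tick labels in the original code sit at `range(len(processed_sets))`.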
def calculate_annotation_stats(anno_count, unanno_file_path, unique_anno_count, total_unique_count):
    """
    Compute annotation statistics for a dataset.

    This function calculates summary statistics for annotated and unannotated
    sequences in a dataset. It counts the total number of sequences in the
    unannotated file (FASTA-style, based on lines starting with '>'), and
    computes the percentage of annotated sequences and unique annotated
    sequences relative to their totals.

    :param anno_count: Total number of annotated sequences.
    :type anno_count: int
    :param unanno_file_path: Path to a FASTA file containing unannotated sequences.
    :type unanno_file_path: str
    :param unique_anno_count: Number of unique annotated sequences.
    :type unique_anno_count: int
    :param total_unique_count: Total number of unique sequences in the dataset.
    :type total_unique_count: int
    :return: Dictionary containing:
        - 'percentage_annotated' (float): Percentage of annotated sequences.
        - 'annotated_sequences' (int): Number of annotated sequences.
        - 'total_sequences' (int): Total number of sequences (annotated + unannotated).
        - 'percentage_unique_annotated' (float): Percentage of unique annotated sequences.
        - 'unique_annotated' (int): Number of unique annotated sequences.
        - 'total_unique' (int): Total number of unique sequences.
    :rtype: dict[str, float | int]
    """
    total_sequences = 0
    with open(unanno_file_path, 'r') as f:
        total_sequences = sum(1 for line in f if line.startswith('>'))

    percentage_annotated = (anno_count / total_sequences * 100) if total_sequences > 0 else 0
    percentage_unique_annotated = (unique_anno_count / total_unique_count * 100) if total_unique_count > 0 else 0

    return {'percentage_annotated': percentage_annotated, 'annotated_sequences': anno_count,
            'total_sequences': total_sequences, 'percentage_unique_annotated': percentage_unique_annotated,
            'unique_annotated': unique_anno_count, 'total_unique': total_unique_count}


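The statistics above come down to counting FASTA header lines and guarding each division against an empty total. A small sketch of those two steps on assumed input could look like this. The helper names `count_fasta_records` and `percentage`, the four-read FASTA content, and the annotated count of 3 are all hypothetical and only serve the illustration.

```python
import os
import tempfile


def count_fasta_records(path):
    """Hypothetical helper: count records by their '>' header lines."""
    with open(path, "r") as handle:
        return sum(1 for line in handle if line.startswith(">"))


def percentage(part, total):
    """Hypothetical helper: percentage with the same zero-total guard as above."""
    return part / total * 100 if total > 0 else 0


# Write a tiny FASTA file with four records (made-up data).
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(">read1\nACGT\n>read2\nGGCC\n>read3\nTTAA\n>read4\nCCGG\n")
    fasta_path = tmp.name

total_sequences = count_fasta_records(fasta_path)
percentage_annotated = percentage(3, total_sequences)
os.remove(fasta_path)

print(total_sequences, percentage_annotated)
```

The guard matters because an empty FASTA file would otherwise raise `ZeroDivisionError`; with it, the percentage falls back to 0.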
def resolve_tax_majority(taxon_counts, uncertain_threshold):
    """
    Resolve one taxon from competing taxon counts.

    The most common taxon is used when its share of the total count reaches
    the threshold. Otherwise the taxa are compared level by level and the
    first level on which they disagree is replaced by 'Uncertain taxa'.

    :param taxon_counts: Counts per taxon path, with levels separated by " / ".
    :type taxon_counts: dict
    :param uncertain_threshold: Minimum share of the total count needed to accept the most common taxon.
    :type uncertain_threshold: float
    :return: The path cut off at the first uncertain level, and the same path
        padded with 'Uncertain taxa' up to the shortest conflicting depth.
    :rtype: tuple[str, str]
    """
    total_count = sum(taxon_counts.values())
    if total_count == 0:
        return "Uncertain taxa", "Uncertain taxa"
    most_common_taxon, count = max(taxon_counts.items(), key=lambda x: x[1])
    # Use most common if at or above threshold
    if count / total_count >= uncertain_threshold:
        return most_common_taxon, most_common_taxon
    conflicting_taxa = list(taxon_counts.keys())
    conflicting_levels = [taxon.split(" / ") for taxon in conflicting_taxa]

    # Resolve uncertainty per level
    for level_idx in range(min(len(level) for level in conflicting_levels)):
        current_level_names = {level[level_idx] for level in conflicting_levels}
        if len(current_level_names) > 1:
            lowest_common_path = conflicting_levels[0][:level_idx + 1]
            lowest_common_path[-1] = 'Uncertain taxa'
            small_output = " / ".join(lowest_common_path)
            min_conflicting_level = min(len(level) for level in conflicting_levels)
            uncertain_path = conflicting_levels[0][:level_idx + 1]
            uncertain_path[-1] = 'Uncertain taxa'
            for _ in range(level_idx + 1, min_conflicting_level):
                uncertain_path.append('Uncertain taxa')
            return small_output, " / ".join(uncertain_path)
    return conflicting_taxa[0], conflicting_taxa[0]


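To see how the majority vote and the per-level fallback in `resolve_tax_majority` behave, the following self-contained sketch re-implements the same logic in compact form and runs it on made-up counts. The name `resolve_majority`, the taxon paths, and the thresholds are assumptions chosen for illustration and do not come from the tool's data.

```python
UNCERTAIN = "Uncertain taxa"


def resolve_majority(taxon_counts, threshold):
    """Compact, hypothetical restatement of the majority/fallback logic above."""
    total = sum(taxon_counts.values())
    if total == 0:
        return UNCERTAIN, UNCERTAIN
    top, count = max(taxon_counts.items(), key=lambda item: item[1])
    if count / total >= threshold:
        return top, top  # clear majority: keep the full path
    paths = [taxon.split(" / ") for taxon in taxon_counts]
    depth = min(len(path) for path in paths)
    for idx in range(depth):
        if len({path[idx] for path in paths}) > 1:
            # First level of disagreement: keep the shared prefix, mark the rest uncertain.
            short = paths[0][:idx] + [UNCERTAIN]
            padded = short + [UNCERTAIN] * (depth - idx - 1)
            return " / ".join(short), " / ".join(padded)
    first = next(iter(taxon_counts))
    return first, first


class_conflict = {"Animalia / Chordata / Aves": 6, "Animalia / Chordata / Mammalia": 4}
phylum_conflict = {"Animalia / Chordata / Aves": 1, "Animalia / Arthropoda / Insecta": 1}

majority = resolve_majority(class_conflict, 0.5)
fallback = resolve_majority(class_conflict, 0.9)
early_split = resolve_majority(phylum_conflict, 0.9)
print(majority, fallback, early_split, sep="\n")
```

With a threshold of 0.5 the 60% majority wins outright. With 0.9 the two paths agree down to the phylum, so only the last level becomes uncertain. When the paths split at the phylum, the second return value pads the remaining level as well.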
|
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
217 def process_taxa_output(taxa_dicts, output_path, uncertain_threshold): |
|
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
218 """ |
|
2
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
219 Generate a hierarchical taxa summary file. |
|
0
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
220 |
|
2
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
221 This function resolves best taxonomic assignments per read and aggregates |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
222 counts into taxonomic levels, producing a readable text report listing |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
223 percentages, counts and resolved taxonomy. |
|
0
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
224 |
|
2
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
225 :param taxa_dicts: Best taxa dictionaries per read. |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
226 :type taxa_dicts: list[dict] |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
227 :param output_path: Path of output report. |
|
0
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
228 :type output_path: str |
|
2
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
229 :param uncertain_threshold: Confidence value for majority voting. |
|
0
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
230 :type uncertain_threshold: float |
|
2
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
231 :return: None |
|
0
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
232 :rtype: None |
|
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
233 """ |
    uncertain_dict = {level: 0 for level in TAXONOMIC_LEVELS}
    aggregated_counts = defaultdict(int)

    for read_taxa in taxa_dicts:
        if not read_taxa:
            continue
        resolved_taxon, _ = resolve_tax_majority(read_taxa, uncertain_threshold)

        # Add counts for resolved taxon
        levels = resolved_taxon.split(" / ")
        for i in range(1, len(levels) + 1):
            aggregated_counts[" / ".join(levels[:i])] += 1

    total_count = sum(value for key, value in aggregated_counts.items() if len(key.split(" / ")) == 1)

    report_lines = []

    for taxonomy, count in sorted(aggregated_counts.items(), key=lambda x: x[0]):
        levels = taxonomy.split(" / ")
        indent = " " * (len(levels) - 1) * 2
        taxon_name = levels[-1]
        taxon_level_code = TAXONOMIC_LEVELS[len(levels) - 1] if len(levels) <= len(TAXONOMIC_LEVELS) else "U"
        percentage = (count / total_count) * 100 if total_count > 0 else 0

        report_lines.append(f"{percentage:.2f}\t{count}\t{total_count}\t{taxon_level_code}\t{indent}{taxon_name}")

        if taxon_name == 'Uncertain taxa':
            # .get() avoids a KeyError for the fallback code "U", which may not be a key of uncertain_dict
            uncertain_dict[taxon_level_code] = uncertain_dict.get(taxon_level_code, 0) + count

    with open(output_path, 'w') as f:
        f.write("Uncertain count per taxonomic level" + str(uncertain_dict) + '\n')
        f.write('percentage_rooted\tnumber_rooted\ttotal_num\ttaxon_level\tidentification\n')
        f.write("\n".join(report_lines))


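The rooted counting used by the function above can be tried on toy data with the sketch below. It is a minimal illustration under stated assumptions, not the tool's own code: `LEVEL_CODES`, `count_lineage_prefixes`, `build_report` and the sample lineages are hypothetical, and the real `TAXONOMIC_LEVELS` constant is defined elsewhere in the script. Each lineage adds one count to every one of its prefixes, so a parent taxon accumulates the reads of all its children, and the total is the sum over top-level taxa only.

```python
from collections import defaultdict

# Hypothetical level codes, standing in for the script's TAXONOMIC_LEVELS constant.
LEVEL_CODES = ["K", "P", "C", "O", "F", "G", "S"]


def count_lineage_prefixes(lineages, sep=" / "):
    """Add one count to every prefix of each lineage string."""
    counts = defaultdict(int)
    for lineage in lineages:
        parts = lineage.split(sep)
        for depth in range(1, len(parts) + 1):
            counts[sep.join(parts[:depth])] += 1
    return counts


def build_report(counts, sep=" / "):
    """Return tab-separated rows, sorted by lineage so children follow parents."""
    # Only top-level entries (no separator) contribute to the total.
    total = sum(n for key, n in counts.items() if sep not in key)
    rows = []
    for lineage, n in sorted(counts.items()):
        parts = lineage.split(sep)
        code = LEVEL_CODES[len(parts) - 1] if len(parts) <= len(LEVEL_CODES) else "U"
        pct = n / total * 100 if total else 0
        rows.append(f"{pct:.2f}\t{n}\t{total}\t{code}\t{'  ' * (len(parts) - 1)}{parts[-1]}")
    return rows


# Toy input: three reads, already resolved to a single lineage each.
toy = [
    "Animalia / Chordata",
    "Animalia / Arthropoda",
    "Plantae / Tracheophyta",
]
counts = count_lineage_prefixes(toy)
rows = build_report(counts)
print("\n".join(rows))
```

Sorting on the full lineage string is what keeps each child row directly under its parent, which is why the indentation alone is enough to convey the hierarchy in the report.
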
def process_header_annotations(taxa_dicts, headers, output_path, uncertain_threshold, source_list, seq_id_list,
                               log_messages):
    """
    Create an Excel report summarizing individual and merged read annotations.

    This function converts best taxonomy assignments into a per-read table,
    generates an aggregated table grouped by taxon, and saves both tables to
    separate sheets in a single Excel file.

    :param taxa_dicts: Taxonomy assignments per read.
    :type taxa_dicts: list[dict]
    :param headers: Original FASTA headers.
    :type headers: list[str]
    :param output_path: Output Excel file name.
    :type output_path: str
    :param uncertain_threshold: Confidence level for taxonomy conflict resolution.
    :type uncertain_threshold: float
    :param source_list: Source identifiers per read.
    :type source_list: list[list[str]]
    :param seq_id_list: Sequence identifiers per read.
    :type seq_id_list: list[list[str]]
    :param log_messages: Log collection list.
    :type log_messages: list[str]
    :return: None
    :rtype: None
    """
    report_lines = []
    for i, read_taxa in enumerate(taxa_dicts):
        if not read_taxa:
            continue
        _, resolved_taxon_long = resolve_tax_majority(read_taxa, uncertain_threshold)
        source = source_list[i][0] if source_list[i] else "N/A"
        seq_id = seq_id_list[i][0] if seq_id_list[i] else "N/A"
        header = headers[i] if i < len(headers) else "Header missing"

        # Extract the read count from a trailing "(N)" in the header; fall back to 1 if absent
        try:
            header_base, count_str = header.rsplit("(", 1)
            count = int(count_str.rstrip(")"))
        except ValueError as e:
            log_message(log_messages, f'Failed extracting count: {e}')
            header_base = header
            count = 1

        report_lines.append(f'{header_base}\t{seq_id}\t{source}\t{count}\t{resolved_taxon_long}')
    # A temp path is needed to write an xlsx file from Galaxy
    temp_tsv_path = 'temp.tsv'
    try:
        with open(temp_tsv_path, 'w') as f:
            f.write('header\tseq_id\tsource\tcount\ttaxa\n')
            f.write("\n".join(report_lines))
    except PermissionError as e:
        log_message(log_messages, f"Unable to write to file, error: {e} file might be opened")

    df = pd.read_csv(temp_tsv_path, sep='\t', encoding="latin1")
    if not df.empty:
        taxa_split = df["taxa"].str.split(" / ", expand=True)
        max_levels = taxa_split.shape[1] if taxa_split.shape[1] is not None else 7
        level_names = ["kingdom", "phylum", "class", "order", "family", "genus", "species"][:max_levels]
        taxa_split.columns = level_names

        df_individual = pd.concat([df, taxa_split], axis=1)
        df_individual = df_individual.sort_values(['species', 'genus', 'family'], ascending=[True, True, True])
        df_for_merge = df_individual.copy()

        group_cols = ["taxa"]
        agg_dict = {"seq_id": lambda x: list_to_string(list(x.unique())), "count": "sum",
                    "source": lambda x: list_to_string(list(x.unique())), }

        for level in level_names:
            if level in df_for_merge.columns:
                agg_dict[level] = "first"

        df_merged = df_for_merge.groupby(group_cols, as_index=False).agg(agg_dict)

        sort_columns = []
        sort_ascending = []

        for level in ['species', 'genus', 'family']:
            if level in df_merged.columns:
                sort_columns.append(level)
                sort_ascending.append(True)

        df_merged = df_merged.sort_values(sort_columns, ascending=sort_ascending)

        temp_path = output_path + ".xlsx"
        os.makedirs(os.path.dirname(temp_path), exist_ok=True)
        with pd.ExcelWriter(temp_path, engine='openpyxl', mode='w') as writer:
            df_individual.to_excel(writer, sheet_name='Individual_Reads', index=False)
            df_merged.to_excel(writer, sheet_name='Merged_by_Taxa', index=False)

            for sheet_name in writer.sheets:
                worksheet = writer.sheets[sheet_name]
                for column in worksheet.columns:
                    max_length = 0
                    column_letter = column[0].column_letter
                    for cell in column:
                        try:
                            if len(str(cell.value)) > max_length:
                                max_length = len(str(cell.value))
parents:
0
diff
changeset
|
369 except AttributeError as e: |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
370 log_message(log_messages, f"Error {e}, {cell} has no value, something probably went wrong") |
|
0
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
371 pass |
|
2
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
372 adjusted_width = min(max_length + 2, 50) |
|
0
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
373 worksheet.column_dimensions[column_letter].width = adjusted_width |
|
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
374 os.replace(temp_path, output_path) |
|
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
375 # Clean up temporary file |
|
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
376 os.remove(temp_tsv_path) |
|
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
377 else: |
|
2
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
378 log_message(log_messages, "Dataframe empty, no annotation results") |
|
0
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
379 |
|
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
380 |
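The column sizing rule used above (longest stringified cell plus 2, capped at 50) can be tried without a workbook. The sketch below is an illustration only: `compute_column_widths` is a hypothetical helper that is not part of the script, and it works on plain row lists rather than openpyxl cells. As in the loop above, values go through `str()`, so an empty cell (`None`) counts as 4 characters.

```python
def compute_column_widths(rows, padding=2, cap=50):
    """Return one width per column: longest str(cell) + padding, capped at cap.

    Hypothetical helper mirroring the sizing rule; not part of the script.
    """
    if not rows:
        return []
    n_cols = max(len(row) for row in rows)
    widths = []
    for col in range(n_cols):
        # Longest stringified value in this column; short rows are skipped.
        longest = max(
            (len(str(row[col])) for row in rows if col < len(row)),
            default=0,
        )
        widths.append(min(longest + padding, cap))
    return widths


rows = [
    ["read_id", "taxon"],
    ["seq_1", "Animalia / Chordata"],
    ["seq_2", None],  # str(None) == "None", so this cell counts as 4 characters
]
print(compute_column_widths(rows))  # [9, 21]
```

The cap of 50 keeps one very long lineage string from making a column unreadably wide.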
def create_circle_data(taxa_dicts, output_path, use_counts, uncertain_threshold):
    """
    Generate circular taxonomy layer data.

    The function converts resolved taxonomy per read into hierarchical
    cumulative counts that can be visualized as concentric circles.

    :param taxa_dicts: Per-read taxonomy assignments.
    :type taxa_dicts: list[dict]
    :param output_path: JSON destination file.
    :type output_path: str
    :param use_counts: Whether to count total reads or unique taxa.
    :type use_counts: bool
    :param uncertain_threshold: Majority resolution threshold.
    :type uncertain_threshold: float
    :return: None
    :rtype: None
    """
    aggregated_counts = defaultdict(int)
    seen_taxa = set()

    for read_taxa in taxa_dicts:
        if not read_taxa:
            continue
        _, resolved_taxon_long = resolve_tax_majority(read_taxa, uncertain_threshold)

        levels = resolved_taxon_long.split(" / ")

        if use_counts:
            for i in range(1, len(levels) + 1):
                aggregated_counts[" / ".join(levels[:i])] += 1
        else:
            if resolved_taxon_long not in seen_taxa:
                for i in range(1, len(levels) + 1):
                    aggregated_counts[" / ".join(levels[:i])] += 1
                seen_taxa.add(resolved_taxon_long)

    circle_data = []
    for level in range(len(TAXONOMIC_LEVELS)):
        circle_data.append({"labels": [], "sizes": []})

    for taxonomy, count in sorted(aggregated_counts.items(), key=lambda x: x[0]):
        levels = taxonomy.split(" / ")
        if len(levels) <= len(TAXONOMIC_LEVELS):
            circle_data[len(levels) - 1]["labels"].append(levels[-1].strip())
            circle_data[len(levels) - 1]["sizes"].append(count)
    with open(output_path, "w") as f:
        json.dump(circle_data, f, indent=2)


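To see what the two `use_counts` modes of `create_circle_data` produce, the aggregation can be run in memory on already-resolved lineage strings. This is a minimal sketch under stated assumptions: `build_circle_layers` and `ASSUMED_LEVELS` are hypothetical names, the seven-rank list only stands in for the script's `TAXONOMIC_LEVELS` constant (defined outside this section), and the call to `resolve_tax_majority` is skipped by passing its `" / "`-joined result directly.

```python
from collections import defaultdict

# Assumed rank list; the real TAXONOMIC_LEVELS constant lives elsewhere in the script.
ASSUMED_LEVELS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def build_circle_layers(lineages, use_counts=True, n_levels=len(ASSUMED_LEVELS)):
    """Aggregate ' / '-joined lineages into one {labels, sizes} dict per rank."""
    aggregated = defaultdict(int)
    seen = set()
    for lineage in lineages:
        if not use_counts:
            # Unique-taxa mode: each distinct lineage contributes only once.
            if lineage in seen:
                continue
            seen.add(lineage)
        levels = lineage.split(" / ")
        # Every prefix of the lineage gets a count, so parent ranks are cumulative.
        for i in range(1, len(levels) + 1):
            aggregated[" / ".join(levels[:i])] += 1

    layers = [{"labels": [], "sizes": []} for _ in range(n_levels)]
    for taxonomy, count in sorted(aggregated.items()):
        levels = taxonomy.split(" / ")
        if len(levels) <= n_levels:
            layers[len(levels) - 1]["labels"].append(levels[-1].strip())
            layers[len(levels) - 1]["sizes"].append(count)
    return layers


lineages = [
    "Animalia / Chordata / Aves",
    "Animalia / Chordata / Aves",
    "Animalia / Chordata / Mammalia",
]
by_reads = build_circle_layers(lineages, use_counts=True)
by_taxa = build_circle_layers(lineages, use_counts=False)
print(by_reads[2])  # {'labels': ['Aves', 'Mammalia'], 'sizes': [2, 1]}
print(by_taxa[2])   # {'labels': ['Aves', 'Mammalia'], 'sizes': [1, 1]}
```

In read-count mode the innermost ring (`Animalia`) has size 3, while in unique-taxa mode it has size 2, because the duplicate `Aves` lineage is counted once.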
def read_unannotated_fasta(unanno_file_path, ignore_illuminapairend_type, ignore_obiclean_type, min_support,
                           filtered_fasta_path, log_messages):
    """
    Read, filter and optionally rewrite an unannotated FASTA file.

    This function extracts dereplicated sequence counts, applies header filters
    and a minimum support threshold, collects statistics and, if requested,
    writes a filtered FASTA file for downstream processing.

    :param unanno_file_path: Path to FASTA file.
    :type unanno_file_path: str
    :param ignore_illuminapairend_type: Illumina pair-end status to remove.
    :type ignore_illuminapairend_type: str
    :param ignore_obiclean_type: Obiclean label to remove.
    :type ignore_obiclean_type: str
    :param min_support: Minimum dereplicated read count retained.
    :type min_support: int
    :param filtered_fasta_path: Optional path for filtered FASTA output.
    :type filtered_fasta_path: str | None
    :param log_messages: Log collection list.
    :type log_messages: list[str]
    :return: (headers, total_unique_count, invalid_count)
    :rtype: tuple[list[str], int, int]
    """
    headers = []
    total_unique_count = 0
    invalid_count = 0

    total_headers = 0
    low_support_count = 0
    header_filter_count = 0
    written_fasta_count = 0

    if not os.path.exists(unanno_file_path):
        log_message(log_messages, f"Warning: Unannotated file {unanno_file_path} not found")
        return headers, total_unique_count, invalid_count

    write_fasta = filtered_fasta_path is not None
    filtered_fasta = None
    if write_fasta:
        try:
            filtered_fasta = open(filtered_fasta_path, "w")
        except OSError as e:
            log_message(log_messages, f"Error: Cannot open {filtered_fasta_path} for writing: {e}")
            write_fasta = False

    write_flag = False

    with open(unanno_file_path, 'r') as f:
        for line in f:
            if line.startswith('>'):
                total_headers += 1
                header_line = line.rstrip("\n")
                header = header_line.split()[0].strip('>')
                if "count=" in header_line:
                    count = int(header_line.split("count=")[1].split(";")[0])
                else:
                    count = 0

                passes_header_filter = check_header_string(header_line, ignore_illuminapairend_type,
                                                           ignore_obiclean_type)
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
492 |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
493 if passes_header_filter and int(count) >= int(min_support): |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
494 headers.append(header) |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
495 total_unique_count += count |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
496 |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
497 write_flag = True |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
498 if write_fasta: |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
499 filtered_fasta.write(line) |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
500 written_fasta_count += 1 |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
501 else: |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
502 # Invalid header (either header filter or low support) |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
503 if not passes_header_filter: |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
504 header_filter_count += 1 |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
505 elif int(count) < int(min_support): |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
506 low_support_count += 1 |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
507 |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
508 invalid_count += 1 |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
509 write_flag = False |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
510 else: |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
511 if write_flag and write_fasta: |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
512 filtered_fasta.write(line) |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
513 |
    if write_fasta and filtered_fasta is not None:
        filtered_fasta.close()
        log_message(log_messages, "Filtered FASTA written successfully "
                                  f"({written_fasta_count} sequences)")

    log_message(log_messages, f"FASTA: total headers: {total_headers}")
    log_message(log_messages, f"FASTA: headers kept after filters and min_support={min_support}: {len(headers)}")
    log_message(log_messages, f"FASTA: removed due to header filters (illumina/obiclean/etc.): {header_filter_count}")
    log_message(log_messages, f"FASTA: removed due to low dereplicated count (<{min_support}): {low_support_count}")
    log_message(log_messages, f"FASTA: total invalid (header filter + low support): {invalid_count}")

    return headers, total_unique_count, invalid_count


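# Illustrative sketch only, not part of the tool: the header handling in the
# FASTA loop above can be isolated as follows. It assumes OBITools-style
# headers carrying a "count=N;" attribute and deliberately leaves out the
# check_header_string() filter. The names parse_header and filter_fasta_lines
# are hypothetical helpers introduced for this example.

```python
def parse_header(header_line):
    """Return (sequence_id, count) for one FASTA header line.

    Hypothetical helper: a header without a 'count=' attribute gets a
    count of 0, mirroring the fallback used in the loop above.
    """
    header_line = header_line.rstrip("\n")
    # The first whitespace-delimited token, minus the leading '>', is the ID.
    seq_id = header_line.split()[0].lstrip(">")
    if "count=" in header_line:
        count = int(header_line.split("count=")[1].split(";")[0])
    else:
        count = 0
    return seq_id, count


def filter_fasta_lines(lines, min_support):
    """Yield the lines of every record whose count is at least min_support.

    Hypothetical helper: sequence lines inherit the keep/drop decision of
    the most recent header, which is what write_flag tracks above.
    """
    keep = False
    for line in lines:
        if line.startswith(">"):
            _, count = parse_header(line)
            keep = count >= min_support
        if keep:
            yield line
```

# With min_support=2, a record headed ">r1 count=5;" would be kept, while
# records headed ">r2 count=1;" or ">r3" (no count attribute) would be dropped
# together with their sequence lines.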
def parse_blast_output(anno_file_path, eval_threshold, ignore_taxonomy, ignore_rank, min_coverage, min_identity,
                       min_bitscore, ignore_seqids, log_messages):
    """
    Parse a BLAST tabular file and retain high-quality hits.

    This function reads BLAST annotations, applies E-value and quality filters,
    excludes invalid taxa and ignored sequence IDs, groups the remaining hits by
    query identifier, and logs summary statistics.

    :param anno_file_path: Input BLAST result file.
    :type anno_file_path: str
    :param eval_threshold: Maximum allowed E-value.
    :type eval_threshold: float
    :param ignore_taxonomy: Taxonomy substring filter.
    :type ignore_taxonomy: str
    :param ignore_rank: Rank substring filter.
    :type ignore_rank: str
    :param min_coverage: Minimum coverage threshold.
    :type min_coverage: int
    :param min_identity: Minimum identity threshold.
    :type min_identity: int
    :param min_bitscore: Minimum bitscore threshold.
    :type min_bitscore: int
    :param ignore_seqids: Comma-separated seqids to ignore.
    :type ignore_seqids: str
    :param log_messages: Log collection list.
    :type log_messages: list[str]
    :return: Mapping of query ID (``q_id``) to the list of hits kept for that query.
    :rtype: OrderedDict[str, list[dict]]
    """
    if not os.path.exists(anno_file_path):
        log_message(log_messages, f"Error: Input file {anno_file_path} not found")
        return OrderedDict()

    blast_groups = OrderedDict()

    total_hits = 0
    kept_hits = 0
    filtered_hits = 0
    invalid_taxon_count = 0
    ignored_seqid_count = 0

    try:
        with open(anno_file_path, 'r') as f:
            for line in f:
                if line.startswith("#"):
                    continue

                parts = line.strip().split('\t')
                if len(parts) < 7:
                    log_message(log_messages,
                                f"Fewer than 7 tab-separated fields, skipped faulty line: {line.rstrip()}")
                    continue

                q_id = parts[0]
                total_hits += 1

                try:
                    e_val = float(parts[6])
                except ValueError as e:
                    log_message(log_messages, f"Error {e}, while extracting evalue")
                    filtered_hits += 1
                    continue

                taxon = parts[-1].strip()
                if is_invalid_taxon(taxon, ignore_taxonomy, ignore_rank):
                    invalid_taxon_count += 1
                    filtered_hits += 1
                    continue

                seq_id = parts[2] if len(parts) > 2 else ''
                ignore_seqid_set = set(parse_list_param(ignore_seqids))
                if seq_id in ignore_seqid_set:
                    ignored_seqid_count += 1
                    filtered_hits += 1
                    continue

                identity = parts[4] if len(parts) > 4 else ''
                cov = parts[5] if len(parts) > 5 else ''
                bitscore = float(parts[7]) if len(parts) > 7 and parts[7] else 0.0
                source = parts[8] if len(parts) > 8 else ''

                hit = {'e_val': e_val, 'identity': identity, 'cov': cov, 'bitscore': bitscore, 'source': source,
                       'taxon': taxon, 'seq_id': seq_id, }

                if not is_insufficient_hit(hit, min_coverage, min_identity, min_bitscore, eval_threshold):
                    blast_groups.setdefault(q_id, []).append(hit)
                    kept_hits += 1
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
615 else: |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
616 filtered_hits += 1 |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
617 except Exception as e: |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
618 log_message(log_messages, f"Error reading BLAST file {anno_file_path}: {e}") |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
619 return OrderedDict() |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
620 |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
621 log_message(log_messages, f"BLAST: total hits read: {total_hits}") |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
622 log_message(log_messages, f"BLAST: hits kept after quality filters: {kept_hits}") |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
623 log_message(log_messages, f"BLAST: hits filtered (evalue/coverage/identity/bitscore): {filtered_hits}") |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
624 log_message(log_messages, f"BLAST: hits removed due to invalid taxon: {invalid_taxon_count}") |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
625 log_message(log_messages, f"BLAST: hits removed due to ignored seqids: {ignored_seqid_count}") |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
626 |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
627 return blast_groups |
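
The parsing loop above reads optional columns by index and falls back to a default when a row is too short. The following is a minimal standalone sketch of that pattern, not the tool's own code: `field_or_default` and `parse_optional_fields` are hypothetical helper names, the tab separator is an assumption, and the column indices (4 identity, 5 coverage, 7 bitscore, 8 source) simply mirror the ones used above.

```python
def field_or_default(parts, index, default=''):
    """Return parts[index] when the row is long enough, else the default."""
    return parts[index] if len(parts) > index else default


def parse_optional_fields(line, sep='\t'):
    """Split one row and read the optional columns with safe fallbacks.

    The separator is an assumption for this sketch.
    """
    parts = line.rstrip('\n').split(sep)
    raw_bitscore = field_or_default(parts, 7)
    return {
        'identity': field_or_default(parts, 4),
        'cov': field_or_default(parts, 5),
        # An empty or missing bitscore is treated as 0.0, as in the loop above.
        'bitscore': float(raw_bitscore) if raw_bitscore else 0.0,
        'source': field_or_default(parts, 8),
    }
```

Keeping the length check in one helper avoids repeating `len(parts) > n` for every column, at the cost of one extra function call per field.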


def process_single_query(header, hits, bitscore_perc_cutoff):
    """
    Process all BLAST hits for a single query/header and return a summary dict.
    """
    current_read_info = {'taxa_dict': {}, 'header': header, 'e_values': set(), 'identities': set(), 'coverages': set(),
                         'sources': set(), 'bitscores': set(), 'seq_id': set(), 'best_taxa_dict': {}}

    max_bitscore = max(h['bitscore'] for h in hits)
    if bitscore_perc_cutoff > 0:
        top_bitscore = float(max_bitscore) * (1 - (float(bitscore_perc_cutoff) / 100.0))
    else:
        top_bitscore = float(max_bitscore)
    current_read_info['best_taxa_dict'] = defaultdict(int)

    for hit in hits:
        e_val = hit['e_val']
        identity = hit['identity']
        cov = hit['cov']
        bitscore = hit['bitscore']
        source = hit['source']
        taxon = hit['taxon']
        seq_id = hit['seq_id']

        # Keep track of hits that pass bitscore threshold
        if bitscore >= top_bitscore:
            current_read_info['best_taxa_dict'][taxon] += 1
            current_read_info['e_values'].add(str(e_val))
            current_read_info['identities'].add(identity)
            current_read_info['coverages'].add(cov)
            current_read_info['sources'].add(source)
            current_read_info['bitscores'].add(bitscore)
            current_read_info['seq_id'].add(seq_id)

    return current_read_info
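
The bitscore cutoff in `process_single_query` can be checked on a small worked example. The sketch below reimplements only the threshold computation and the per-taxon counting, using made-up hits; `top_hit_taxa` is a hypothetical name for illustration and is not part of this tool.

```python
from collections import defaultdict


def top_hit_taxa(hits, bitscore_perc_cutoff):
    """Count taxa among hits within the cutoff percentage of the best bitscore."""
    max_bitscore = max(h['bitscore'] for h in hits)
    if bitscore_perc_cutoff > 0:
        # Accept hits scoring within bitscore_perc_cutoff percent of the best hit.
        threshold = max_bitscore * (1 - bitscore_perc_cutoff / 100.0)
    else:
        # A cutoff of 0 keeps only hits that tie with the best bitscore.
        threshold = max_bitscore
    counts = defaultdict(int)
    for h in hits:
        if h['bitscore'] >= threshold:
            counts[h['taxon']] += 1
    return threshold, dict(counts)


# Made-up hits: the best bitscore is 200, so a 10% cutoff gives a threshold near 180.
example_hits = [
    {'taxon': 'A', 'bitscore': 200.0},
    {'taxon': 'A', 'bitscore': 190.0},
    {'taxon': 'B', 'bitscore': 185.0},
    {'taxon': 'C', 'bitscore': 150.0},
]
threshold_10, counts_10 = top_hit_taxa(example_hits, 10)
threshold_0, counts_0 = top_hit_taxa(example_hits, 0)
```

With the 10% cutoff, the two hits for taxon A and the single hit for taxon B pass, while taxon C falls below the threshold. With a cutoff of 0, only the best-scoring hit remains.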


def process_single_file(anno_file_path, unanno_file_path, args, log_messages):
    """
    Execute the complete BLAST annotation processing workflow.

    This function performs FASTA parsing, BLAST parsing, taxonomy resolution,
    per-read and aggregated reporting, summary statistics computation, and
    generation of optional plot and table outputs.

    All validations, errors and counters are written into the log collector.

    :param anno_file_path: BLAST annotation file.
    :type anno_file_path: str
    :param unanno_file_path: Input FASTA file.
    :type unanno_file_path: str
    :param args: Parsed arguments namespace.
    :type args: argparse.Namespace
    :param log_messages: Log collection list.
    :type log_messages: list[str]
    :return: None
    :rtype: None
    """
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
687 log_message(log_messages, f"Starting processing for FASTA") |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
688 log_message(log_messages, "=== PARAMETERS USED ===") |
|
0
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
689 |
|
2
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
690 skip_keys = { |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
691 "input_anno", |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
692 "input_unanno", |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
693 "eval_plot", |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
694 "taxa_output", |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
695 "circle_data", |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
696 "header_anno", |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
697 "log", |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
698 "filtered_fasta", |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
699 } |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
700 |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
701 for key, value in vars(args).items(): |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
702 if key in skip_keys: |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
703 continue |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
704 log_message(log_messages, f"{key}: {value}") |
|
0
a3989edf0a4a
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit c944fd5685f295acba06679e85b67973c173b137
onnodg
parents:
diff
changeset
|
705 |
|
2
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
706 log_message(log_messages, "=== END PARAMETERS ===") |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
707 unanno_headers_ordered, total_unique_count, invalid_count = read_unannotated_fasta(unanno_file_path, |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
708 args.ignore_illuminapairend_type, args.ignore_obiclean_type, args.min_support, args.filtered_fasta, |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
709 log_messages) |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
710 |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
711 unanno_set = set(unanno_headers_ordered) |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
712 |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
713 if not os.path.exists(anno_file_path): |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
714 log_message(log_messages, f"Error: Input file {anno_file_path} not found") |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
715 with open(args.log, 'w') as f: |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
716 print('gaat nog niet goed hoor') |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
717 f.write("\n".join(log_messages)) |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
718 return |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
719 |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
720 log_message(log_messages, f"Reading BLAST annotations") |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
721 blast_groups = parse_blast_output(anno_file_path, args.eval_threshold, args.ignore_taxonomy, args.ignore_rank, |
                                      args.min_coverage, args.min_identity, args.min_bitscore, args.ignore_seqids, log_messages)

    extra_blast_qids = [q for q in blast_groups.keys() if q not in unanno_set]
    if extra_blast_qids:
        log_message(log_messages,
                    f"Note: {len(extra_blast_qids)} BLAST q_ids not in FASTA (showing up to 10): {extra_blast_qids[:10]}")

    read_data = []
    headers = []
    seen_reads = set()
    annotated_unique_count = 0

    total_headers_for_annotation = len(unanno_headers_ordered)
    reads_with_hits = 0

    for header in unanno_headers_ordered:
        if header not in blast_groups:
            continue
        hits = blast_groups[header]
        current_read_info = process_single_query(header, hits, args.bitscore_perc_cutoff)
        read_data.append(current_read_info)
        reads_with_hits += 1

        if header not in seen_reads:
            seen_reads.add(header)
            headers.append(header)
            try:
                count = int(header.split('(')[1].split(')')[0])
            except (IndexError, ValueError, AttributeError) as e:
                log_message(log_messages, f"Error {e}: could not parse count in header: {header}")
                count = 0
            annotated_unique_count += count

    reads_without_hits = total_headers_for_annotation - reads_with_hits
    log_message(log_messages, f"ANNOTATION: total FASTA headers considered: {total_headers_for_annotation}")
    log_message(log_messages, f"ANNOTATION: reads with BLAST hits: {reads_with_hits}")
    log_message(log_messages, f"ANNOTATION: reads without BLAST hits: {reads_without_hits}")
    log_message(log_messages, f"ANNOTATION: unique annotated count (from header counts): {annotated_unique_count}")
    log_message(log_messages, f"ANNOTATION: total unique count (from FASTA): {total_unique_count}")

    taxa_dicts = [read['taxa_dict'] for read in read_data]
    td = [read['best_taxa_dict'] for read in read_data]

    if args.eval_plot:
        e_val_sets = [read['e_values'] for read in read_data]
        make_eval_plot(e_val_sets, args.eval_plot, log_messages)
        log_message(log_messages, "E-value plot written successfully")

    if args.taxa_output:
        process_taxa_output(td, args.taxa_output, args.uncertain_threshold)
        log_message(log_messages, "Taxa summary written successfully")

    if args.header_anno:
        source_for_headers = [[read['sources']] for read in read_data]
        seq_id_for_headers = [[read['seq_id']] for read in read_data]
        process_header_annotations(td, headers, args.header_anno, args.uncertain_threshold, source_for_headers,
                                   seq_id_for_headers, log_messages)
        log_message(log_messages, "Header annotations written successfully")

    if args.circle_data:
        create_circle_data(td, args.circle_data, args.use_counts, args.uncertain_threshold)
        log_message(log_messages, "Circle diagram JSON written successfully")

    stats = calculate_annotation_stats(len(taxa_dicts), unanno_file_path, annotated_unique_count,
                                       int(total_unique_count))

    log_message(log_messages, "=== ANNOTATION STATISTICS ===")
    for key, value in stats.items():
        log_message(log_messages, f"{key}: {value}")

    with open(args.log, 'w') as f:
        f.write("\n".join(log_messages))


def check_header_string(header_line, invalid_header_string, invalid_line_string):
    """
    Return False if the header line contains any of the strings it should be filtered on, True otherwise.
    """
    invalid_string_list = parse_list_param(invalid_header_string) + parse_list_param(invalid_line_string)
    for string in invalid_string_list:
        if string.lower() == 'singleton':
            string = "obiclean_status={'XXX': 's'}"
        elif string.lower() == 'variant':
            string = "obiclean_status={'XXX': 'i'}"
        elif string.lower() == 'head':
            string = "obiclean_status={'XXX': 'h'}"
        elif string.lower() == 'pairend' or string.lower() == 'paired end':
            string = "PairEnd"
        elif string.lower() == 'consensus' or string.lower() == 'cons':
            string = "CONS"
        if string in header_line:
            return False
    return True


def is_insufficient_hit(hit, min_cov, min_id, min_bit, min_eval):
    """
    Return True if the hit fails any threshold: coverage, identity or bitscore below its
    minimum, or e-value above min_eval (which acts as the maximum allowed e-value).
    """
    if float(hit['cov']) < min_cov:
        return True
    elif float(hit['identity']) < min_id:
        return True
    elif float(hit['bitscore']) < min_bit:
        return True
    # E-values are fractional and often in scientific notation (e.g. 1e-50), so parse them as float, not int.
    elif float(hit['e_val']) > min_eval:
        return True
    else:
        return False


def is_invalid_taxon(taxon: str, invalid_taxa, invalid_ranks) -> bool:
    """
    Determine whether a given taxonomic path should be considered invalid and excluded from analysis.
    """
    invalid_string_list = parse_list_param(invalid_ranks) + parse_list_param(invalid_taxa)
    taxon_lower = taxon.lower()
    for string in invalid_string_list:
        # The taxon is lowercased, so lowercase the filter string too for a case-insensitive match.
        if string.lower() in taxon_lower:
            return True
    return False


|
2
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
845 def parse_list_param(param): |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
846 """ |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
847 Safely convert a comma-separated Galaxy text parameter into a list. |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
848 """ |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
849 if not param: |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
850 return [] |
|
9ca209477dfd
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
onnodg
parents:
0
diff
changeset
|
851 return [x.strip() for x in param.strip(', \n').split(',') if x.strip()] |
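To make the behaviour of `parse_list_param` concrete, the sketch below repeats the function so it runs on its own and feeds it a few inputs. The sample strings are invented for illustration and are not taken from the tool's test data. It shows that empty or missing values yield an empty list, and that stray commas, spaces and trailing newlines are dropped.

```python
def parse_list_param(param):
    """Same logic as the function in the listing, repeated so the sketch is self-contained."""
    if not param:
        return []
    return [x.strip() for x in param.strip(', \n').split(',') if x.strip()]


# Illustrative inputs only (assumed values, not from the tool's tests).
examples = [
    None,                      # parameter not supplied
    "",                        # empty text box
    "Bacteria, Fungi",         # ordinary comma-separated value
    " ,Bacteria,,Fungi ,\n",   # stray separators, padding and a trailing newline
]

for raw in examples:
    print(repr(raw), "->", parse_list_param(raw))
```

Note that the function assumes a string (or a falsy value); passing an already-split list would fail on `.strip`, so callers would need to hand it the raw text parameter.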


def main(arg_list=None):
    """
    Entry point for Galaxy-compatible BLAST annotation processing.

    :param arg_list: Optional list of command-line arguments to override
        ``sys.argv``. Primarily used for testing or programmatic execution.
        If ``None``, arguments are read directly from the command line.
    :type arg_list: list[str] | None
    :return: None
    :rtype: None

    Notes:
        Calls `process_single_file` with parsed arguments.
    """
    log_messages = []
    args = parse_arguments(arg_list)
    process_single_file(args.input_anno, args.input_unanno, args, log_messages)
    print("Processing completed successfully")


if __name__ == "__main__":
    main()
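The `arg_list=None` convention used by `main` can be sketched in isolation as follows. This is a minimal illustration, not the tool's actual parser: the stand-in `parse_arguments` and its two flags (`--input-anno`, `--input-unanno`) are assumptions chosen only to match the attribute names `args.input_anno` and `args.input_unanno` seen in the listing, and the file names are made up. The point is that `argparse` falls back to `sys.argv[1:]` when given `None`, so the same entry point serves both the command line and tests.

```python
import argparse


def parse_arguments(arg_list=None):
    """Hypothetical stand-in parser; the real tool defines its own options."""
    parser = argparse.ArgumentParser(description="Sketch of an arg_list-aware parser")
    # Flag names are assumptions that mirror the attribute names used in main().
    parser.add_argument("--input-anno", required=True)
    parser.add_argument("--input-unanno", required=True)
    # parse_args(None) reads sys.argv[1:], so one code path covers both uses.
    return parser.parse_args(arg_list)


def main(arg_list=None):
    args = parse_arguments(arg_list)
    # Return the parsed values instead of processing files, to keep the sketch small.
    return args.input_anno, args.input_unanno


# Programmatic call, as a test might do; file names are placeholders.
result = main(["--input-anno", "hits.tabular", "--input-unanno", "reads.fasta"])
print(result)
```

Passing an explicit list keeps tests independent of whatever arguments the test runner itself was started with.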
