annotate add_taxonomic_labels.py @ 2:f4b8ab4ed24e draft

planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/add_header_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
author onnodg
date Mon, 15 Dec 2025 16:49:00 +0000
parents abd214795fa5
children 04ec86bdac32
"""This script processes the output obtained from a curated BLAST database
and prepares the correct input format for downstream analysis.

Without this script, the taxonomic labels in the BLAST output are reported
as unknown, because the curated database only stores taxonomy information
in the headers of its sequences. The script copies that taxonomy into the
output and also moves the source and sequence ID of each hit to the correct
position in the tabular file.

To extract the correct taxonomic ranks, you need to know their exact
positions. Specify these positions with `--taxon_levels`: each one is an
index into the taxonomy part of the header (the text between
'superkingdom=' and 'markercode=') after splitting it on '='. Each selected
piece is then split on a space and only its first word is kept, except for
the last level (normally the species), whose name already contains a space.

Important:
This script is not needed for GenBank-based databases, since their output
is already usable.
`--taxon_levels` is a critical variable: do not modify it unless you
understand what you're doing.
"""

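Choosing `--taxon_levels` is easier if you first list every position in a header's taxonomy. The minimal sketch below splits the taxonomy part of a header the way the docstring above describes (on '=', then on a space, keeping the last piece whole) and prints each position with its label. The helper `list_rank_positions` and the example taxonomy string are hypothetical. Real curated headers may contain different or additional ranks, so run it on one of your own headers before changing the levels.

```python
def list_rank_positions(taxonomy: str) -> list[tuple[int, str]]:
    """Return (position, label) pairs for a header taxonomy string.

    `taxonomy` is assumed to be the text between 'superkingdom=' and
    'markercode=' in a header, as extracted by add_labels.
    """
    segments = taxonomy.split("=")
    positions = []
    for i, segment in enumerate(segments):
        if i == len(segments) - 1:
            # The final segment is assumed to be the species: keep it whole
            label = segment
        else:
            # Other segments look like "<value> <next rank name>": keep the value
            label = segment.split(" ")[0]
        positions.append((i, label))
    return positions


# Hypothetical taxonomy string, for illustration only
example = ("Eukaryota kingdom=Metazoa phylum=Chordata class=Mammalia "
           "order=Primates family=Hominidae genus=Homo species=Homo sapiens")

for position, label in list_rank_positions(example):
    print(position, label)
```

For this shortened example, position 0 is the superkingdom and `--taxon_levels 1 2 3 4 5 6 7` would select kingdom through species.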
import argparse


def parse_arguments(args_list=None):
    """Parse the command line arguments."""
    parser = argparse.ArgumentParser(description="Add taxonomic labels from curated database output")
    parser.add_argument("-i", "--input_file", required=True,
                        help="BLAST output file produced with the curated database")
    parser.add_argument("-o", "--output_file", required=True,
                        help="Tabular output file with taxonomic labels added")
    parser.add_argument("-t", "--taxon_levels", type=int, nargs="+", default=[1, 2, 4, 7, 11, 12, 13],
                        help="Positions of the taxonomic ranks in the header, after splitting on '='")
    return parser.parse_args(args_list)


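Because `--taxon_levels` is declared with `nargs="+"` and `type=int`, the positions are passed as separate space-separated integers after a single flag, and omitting the flag falls back to the default list. The sketch below rebuilds only that one option in a standalone parser to show both cases.

```python
import argparse

# Standalone parser mirroring only the --taxon_levels option of the script
parser = argparse.ArgumentParser()
parser.add_argument("-t", "--taxon_levels", type=int, nargs="+",
                    default=[1, 2, 4, 7, 11, 12, 13])

# Flag omitted: the default positions are used
defaults = parser.parse_args([])
print(defaults.taxon_levels)  # -> [1, 2, 4, 7, 11, 12, 13]

# Flag given: each following integer becomes one list element
custom = parser.parse_args(["-t", "1", "2", "3", "4", "5", "6", "7"])
print(custom.taxon_levels)  # -> [1, 2, 3, 4, 5, 6, 7]
```

On the command line this corresponds to something like `python add_taxonomic_labels.py -i blast_output.tabular -o labelled.tabular -t 1 2 3 4 5 6 7`, where the file names are placeholders.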
def add_labels(input, output, taxon_levels):
    """
    Add taxonomic labels, source and sequence ID to the correct positions in BLAST output.

    :param input: Path to BLAST output file.
    :type input: str
    :param output: Path to output file.
    :type output: str
    :param taxon_levels: Positions of the taxonomic ranks to extract, counted
        after splitting the header taxonomy on '='. The last position is
        treated as the species and kept whole.
    :type taxon_levels: list[int]
    :return: None
    :rtype: None
    :raises ValueError: if a data line lacks the superkingdom=, markercode=
        or Genbank fields.
    """
    # Open both files once: write the header first, then one rewritten
    # line per BLAST hit.
    with open(input, 'r') as infile, open(output, 'w') as outfile:
        # Make header in file
        outfile.write('#Query ID #Subject #Subject accession #Subject Taxonomy ID #Identity percentage '
                      '#Coverage #evalue #bitscore #Source #Taxonomy\n')

        for line in infile:
            # Skip header lines in the input
            if "#Query ID" in line:
                continue
            src_val = ""
            seq_id = ""
            split_index = 0
            if "superkingdom=" in line and "markercode=" in line and "Genbank" in line:
                # The taxonomy sits between 'superkingdom=' and 'markercode='
                taxonomy = line.split("superkingdom=")[-1].split("markercode=")[0].strip()
                # Drop everything from 'Genbank' onwards
                line = line.split("Genbank")[0].strip()
                parts = line.split()
            else:
                raise ValueError("Line does not contain expected fields: superkingdom, markercode, or Genbank")

            # Find the source, the sequence ID and the position of the markercode token
            for i, p in enumerate(parts):
                if p.startswith("source="):
                    src_val = p.split("=", 1)[1]
                elif p.startswith("sequenceID="):
                    seq_id = p.split("=", 1)[1]
                elif p.startswith("markercode="):
                    split_index = i
                if src_val and seq_id and split_index:
                    break

            if src_val and seq_id:
                # Split on the source, which appears twice in the line
                line1, remove, line2 = line.split(f"source={src_val}", 2)
                # Drop the taxonomy from the middle segment: keep only what
                # follows the token just before 'markercode='
                keep = remove.split(parts[split_index - 1])[-1]
                # Insert source and seq id in the line
                line = (line1.strip() + "\t" + keep.strip() + "\t" + seq_id.strip() + "\t"
                        + line2.strip() + "\t" + src_val)

            # Add taxon to line: every level but the last keeps only its first
            # word; the last one (the species) already contains a space, so it
            # is kept whole.
            segments = taxonomy.split('=')
            labels = [segments[level].split(' ')[0] for level in taxon_levels[:-1]]
            labels.append(segments[taxon_levels[-1]])
            line += '\t' + ' / '.join(labels)
            outfile.write(f'{line}\n')


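Most of the per-line work in `add_labels` comes down to two steps: scanning the whitespace-separated tokens for `source=` and `sequenceID=`, and joining the selected taxonomy positions with ' / '. The sketch below isolates those two steps as hypothetical helpers, `find_source_and_seq_id` and `build_taxonomy_label`, and runs them on a made-up header fragment. It deliberately leaves out the column rearrangement around the repeated `source=` field, which depends on the exact layout of the curated database output.

```python
def find_source_and_seq_id(tokens: list[str]) -> tuple[str, str]:
    """Return the source= and sequenceID= values found in the tokens."""
    src_val, seq_id = "", ""
    for token in tokens:
        if token.startswith("source="):
            src_val = token.split("=", 1)[1]
        elif token.startswith("sequenceID="):
            seq_id = token.split("=", 1)[1]
        # Stop scanning once both values are known
        if src_val and seq_id:
            break
    return src_val, seq_id


def build_taxonomy_label(taxonomy: str, taxon_levels: list[int]) -> str:
    """Join the selected positions with ' / '; the last one is kept whole."""
    segments = taxonomy.split("=")
    labels = [segments[level].split(" ")[0] for level in taxon_levels[:-1]]
    labels.append(segments[taxon_levels[-1]])
    return " / ".join(labels)


# Made-up header fragment, for illustration only
header = ("source=BOLD sequenceID=ABC123 superkingdom=Eukaryota kingdom=Metazoa "
          "phylum=Chordata class=Mammalia order=Primates family=Hominidae "
          "genus=Homo species=Homo sapiens markercode=COI-5P")

taxonomy = header.split("superkingdom=")[-1].split("markercode=")[0].strip()
print(find_source_and_seq_id(header.split()))  # -> ('BOLD', 'ABC123')
print(build_taxonomy_label(taxonomy, [1, 2, 3, 4, 5, 6, 7]))
# -> Metazoa / Chordata / Mammalia / Primates / Hominidae / Homo / Homo sapiens
```

Selecting the labels by their position in the list, rather than by comparing each value with the last one, keeps the ' / ' separators correct even if the same level is listed twice.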
def main(arg_list=None):
    args = parse_arguments(arg_list)
    add_labels(args.input_file, args.output_file, args.taxon_levels)


if __name__ == '__main__':
    main()