view README.md @ 4:04ec86bdac32 draft default tip
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/add_header_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
| author | onnodg |
|---|---|
| date | Mon, 15 Dec 2025 17:01:06 +0000 |
| parents | f4b8ab4ed24e |
| children | |
# Add Taxonomic Labels Script

This script processes BLAST output files from a **curated BLAST database** and prepares them for downstream taxonomic analysis. In curated BLAST results, taxonomic labels are often missing or marked as “unknown” because the taxonomy information is stored only in the sequence headers. The script extracts that information and adds it to each BLAST result, producing a fully annotated output file.

---

## Usage

Each sequence header in the curated BLAST database carries taxonomy metadata as whitespace-separated `key=value` fields (for example `kingdom=Viridiplantae phylum=Streptophyta`). The tool reads the annotation source and the taxonomic ranks from each header and writes them into the tabular rows, so that the source and taxonomy columns sit in the same positions as in BLAST output generated against a GenBank database.

The `--taxon_levels` argument specifies which header positions correspond to taxonomic ranks (e.g., kingdom, phylum, genus, species).

> ⚠️ **Important:**
> The `--taxon_levels` argument is critical. Change it only if you fully understand your database’s header structure.

### When to Use

| Database Type | Need This Script? | Reason |
|-----------------------------|-------------------|--------|
| **Curated BLAST database** | ✅ Yes | Taxonomy exists only in the headers |
| **GenBank-based BLAST** | ❌ No | Taxonomy is already included in the tabular file |

### Command Line Interface

The add_taxonomic_labels tool can be run as a Python script:

```bash
python add_taxonomic_labels.py \
  --input blast_results.tabular \
  --output labeled_results.tabular \
  --taxon_levels "1 2 4 7 11 12 13"
```

For the header layout in the example below, `"1 2 4 7 11 12 13"` yields kingdom, phylum, class, order, family, genus, and species.

#### General use case

The tool serves a single purpose. In the input example below, the taxonomic information appears only in the sequence header, while the corresponding annotation fields in the row are marked as *unknown*. The tool extracts the taxonomy from the header and writes it into those annotation fields, replacing the unknown values. It also replaces the source label (`Genbank` in the input) with the source recorded in the header (`NCBI`).
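The header parsing described above can be illustrated with a minimal Python sketch. It is not the code of `add_taxonomic_labels.py`; the helpers `parse_header`, `label_from_header`, and `relabel_row`, the `header_col` parameter, and the constant `TAXONOMY_START_KEY` are hypothetical. Two further assumptions are inferred only from the example in this README: that `--taxon_levels` positions are counted from the `superkingdom` field (position 0), and that the rows are tab-separated with the source and taxonomy in the last two columns.

```python
import re

# A value runs until the next " key=" or the end of the header, so multi-word
# values such as "species=Ranunculus repens" stay in one piece.
PAIR_RE = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")

# ASSUMPTION (inferred from the README example, not from the tool's source):
# --taxon_levels positions are counted from the superkingdom field, which is 0.
TAXONOMY_START_KEY = "superkingdom"


def parse_header(header: str) -> list[tuple[str, str]]:
    """Split a curated-database header into ordered (key, value) pairs."""
    return PAIR_RE.findall(header.strip())


def label_from_header(header: str, taxon_levels: str) -> tuple[str, str]:
    """Hypothetical helper: return (source, "rank / rank / ...") for one header."""
    pairs = parse_header(header)
    keys = [key for key, _ in pairs]
    source = dict(pairs).get("source", "unknown")
    start = keys.index(TAXONOMY_START_KEY)  # raises ValueError if the key is missing
    taxonomy = [value for _, value in pairs[start:]]
    positions = [int(p) for p in taxon_levels.split()]
    return source, " / ".join(taxonomy[p] for p in positions)


def relabel_row(line: str, header_col: int, taxon_levels: str) -> str:
    """Hypothetical: overwrite the last two tab-separated columns (source, taxonomy)."""
    fields = line.rstrip("\n").split("\t")
    source, taxonomy = label_from_header(fields[header_col], taxon_levels)
    fields[-2:] = [source, taxonomy]
    return "\t".join(fields)


if __name__ == "__main__":
    header = (
        "source=NCBI sequenceID=EU382995 superkingdom=Eukaryota kingdom=Viridiplantae "
        "phylum=Streptophyta subphylum=Streptophytina class=Magnoliopsida subclass=NA "
        "infraclass=NA order=Ranunculales suborder=NA infraorder=NA superfamily=NA "
        "family=Ranunculaceae genus=Ranunculus species=Ranunculus repens "
        "markercode=trnL lat=NA lon=NA"
    )
    source, taxonomy = label_from_header(header, "1 2 4 7 11 12 13")
    print(source)    # NCBI
    print(taxonomy)  # Viridiplantae / Streptophyta / Magnoliopsida / Ranunculales / Ranunculaceae / Ranunculus / Ranunculus repens
```

The sketch splits on `key=` boundaries rather than on every space. A plain `str.split()` would break binomial names such as `Ranunculus repens` into two tokens and shift every later position.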
```text
Input
M01687:476:000000000-LL5F5:1:1102:12299:1165_CONS(1758) source=NCBI sequenceID=EU382995 superkingdom=Eukaryota kingdom=Viridiplantae phylum=Streptophyta subphylum=Streptophytina class=Magnoliopsida subclass=NA infraclass=NA order=Ranunculales suborder=NA infraorder=NA superfamily=NA family=Ranunculaceae genus=Ranunculus species=Ranunculus repens markercode=trnL lat=NA lon=NA source=NCBI N/A 100.000 100 1.24e-38 152 Genbank unknown kingdom / unknown phylum / unknown class / unknown order / unknown family / unknown genus / unknown species

Output
M01687:476:000000000-LL5F5:1:1102:12299:1165_CONS(1758) source=NCBI sequenceID=EU382995 superkingdom=Eukaryota kingdom=Viridiplantae phylum=Streptophyta subphylum=Streptophytina class=Magnoliopsida subclass=NA infraclass=NA order=Ranunculales suborder=NA infraorder=NA superfamily=NA family=Ranunculaceae genus=Ranunculus species=Ranunculus repens markercode=trnL lat=NA lon=NA source=NCBI N/A 100.000 100 1.24e-38 152 NCBI Viridiplantae / Streptophyta / Magnoliopsida / Ranunculales / Ranunculaceae / Ranunculus / Ranunculus repens
```

### Galaxy integration

The tool is also available through the Galaxy platform:

- **Galaxy Toolshed**: The add_taxonomic_labels tool is available in the Galaxy Toolshed, so it can be installed into any Galaxy instance.
- **Web-based interface**: Users can upload BLAST tabular files, configure parameters through the GUI, run the tool, and download the labeled results.
- **Workflow integration**: The tool can be incorporated into Galaxy workflows for automated processing pipelines.

To use the tool in Galaxy:

1. Install the tool from the Galaxy Toolshed (search for "add_taxonomic_labels").
2. Upload your BLAST files to your Galaxy history.
3. Configure parameters through the GUI.
4. Run the tool.
5. View the results and use the reformatted BLAST file for downstream analysis.

## License

No license yet.

## Citation

If you use this software in your research, please cite this repository.
## Contact

For questions or issues:

- GitHub Issues: https://github.com/Onnodg/Naturalis_NLOOR/issues
- Email: onno.gorter@naturalis.nl (until February 2026)

## Acknowledgments

This tool was developed to support the New lights on old remedies project, a PhD project by Anja Fischer.
