diff README.md @ 2:f4b8ab4ed24e draft
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/add_header_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
| author | onnodg |
|---|---|
| date | Mon, 15 Dec 2025 16:49:00 +0000 |
| parents | |
| children | |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/README.md Mon Dec 15 16:49:00 2025 +0000
@@ -0,0 +1,85 @@
+# Add Taxonomic Labels Script
+
+This script processes BLAST output files from a **curated BLAST database** and prepares them for downstream taxonomic analysis.
+
+In curated BLAST results, taxonomic labels are often missing or marked as “unknown,” because taxonomy information is stored only in the sequence headers.
+This script extracts that information and appends it to each BLAST result, producing a fully annotated output file.
+
+---
+
+## Usage
+
+Each sequence header in the curated BLAST database includes taxonomy metadata in a structured format, with fields separated by `=` and whitespace. The tool identifies the source and annotations of each read and appends them to the tabular rows, so that the source and taxa positions match those of BLAST output produced with a GenBank database.
+
+Using the `--taxon_levels` argument, you can specify which header positions correspond to taxonomic ranks (e.g., kingdom, phylum, genus, species).
+
+> ⚠️ **Important:**
+> The `--taxon_levels` argument is critical — change it only if you fully understand your database’s header structure.
+
+
+
+### When to Use
+
+| Database Type               | Need This Script? | Reason |
+|-----------------------------|-------------------|--------|
+| **Curated BLAST database**  | ✅ Yes            | Taxonomy exists only in headers |
+| **GenBank-based BLAST**     | ❌ No             | Taxonomy already included in tabular file |
+
+
+
+### Command Line Interface
+The add_taxonomic_labels tool can be run as a Python script:
+
+```bash
+python add_taxonomic_labels.py \
+    --input blast_results.tabular \
+    --output labeled_results.tabular \
+    --taxon_levels "1 2 4 7 11 12 13"
+```
+
+#### General use case
+
+The tool serves a single, clear purpose. In the input example, the taxonomic information appears only in the sequence headers, while the corresponding annotation fields in the file are marked as *unknown*. The tool extracts the taxonomy data from the headers and inserts it into the appropriate annotation fields, replacing the unknown values.
+
+```text
+Input
+M01687:476:000000000-LL5F5:1:1102:12299:1165_CONS(1758) source=NCBI sequenceID=EU382995 superkingdom=Eukaryota kingdom=Viridiplantae phylum=Streptophyta subphylum=Streptophytina class=Magnoliopsida subclass=NA infraclass=NA order=Ranunculales suborder=NA infraorder=NA superfamily=NA family=Ranunculaceae genus=Ranunculus species=Ranunculus repens markercode=trnL lat=NA lon=NA source=NCBI N/A 100.000 100 1.24e-38 152 Genbank unknown kingdom / unknown phylum / unknown class / unknown order / unknown family / unknown genus / unknown species
+
+Output
+M01687:476:000000000-LL5F5:1:1102:12299:1165_CONS(1758) source=NCBI sequenceID=EU382995 superkingdom=Eukaryota kingdom=Viridiplantae phylum=Streptophyta subphylum=Streptophytina class=Magnoliopsida subclass=NA infraclass=NA order=Ranunculales suborder=NA infraorder=NA superfamily=NA family=Ranunculaceae genus=Ranunculus species=Ranunculus repens markercode=trnL lat=NA lon=NA source=NCBI N/A 100.000 100 1.24e-38 152 NCBI Viridiplantae / Streptophyta / Magnoliopsida / Ranunculales / Ranunculaceae / Ranunculus / Ranunculus repens
+```
+
+### Galaxy integration
+
+The tool is also available through the Galaxy platform:
+
+- **Galaxy Toolshed**: The add_taxonomic_labels tool is available in the Galaxy Toolshed,
+  enabling easy installation into any Galaxy instance.
+- **Web-based interface**: Users can upload sequence files, configure validation parameters through the GUI,
+  run validations, and download results.
+- **Workflow integration**: The tool can be incorporated into Galaxy workflows for automated processing pipelines.
+
+To use the tool in Galaxy:
+1. Install the tool from the Galaxy Toolshed (search for "add_taxonomic_labels")
+2. Upload your BLAST files to your Galaxy history
+3. Configure parameters through the GUI
+4. Run the tool
+5. View results and use the reformatted BLAST file for downstream analysis
+
+## License
+
+No license yet
+
+## Citation
+
+If you use this software in your research, please cite this repository.
+
+## Contact
+
+For questions or issues:
+- GitHub Issues: https://github.com/Onnodg/Naturalis_NLOOR/issues
+- Email: onno.gorter@naturalis.nl (until February 2026)
+
+## Acknowledgments
+
+This tool was developed to support the New lights on old remedies project, a PhD project by Anja Fischer.
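
The header-parsing step described under "General use case" in the README can be sketched in Python roughly as follows. This is a minimal illustration, not the code in `add_taxonomic_labels.py`: the names `FIELD_RE`, `RANKS`, `parse_header`, and `taxonomy_string` are hypothetical. It assumes header fields have the form `key=value` with whitespace-free keys, as in the README example. It also looks ranks up by field name, which is an assumption and differs from the positional `--taxon_levels` indices the real tool takes.

```python
import re

# A value runs lazily until the next " key=" or the end of the string,
# so multi-word values such as "Ranunculus repens" stay intact.
FIELD_RE = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")

# Ranks in the order shown in the README's annotation column (assumed fixed here).
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def parse_header(header: str) -> dict:
    """Return the key=value fields of a curated-database header as a dict."""
    return {key: value.strip() for key, value in FIELD_RE.findall(header)}


def taxonomy_string(header: str, ranks=RANKS) -> str:
    """Join the requested ranks with ' / ', falling back to 'unknown <rank>'."""
    fields = parse_header(header)
    return " / ".join(fields.get(rank, f"unknown {rank}") for rank in ranks)


if __name__ == "__main__":
    header = (
        "M01687:476:000000000-LL5F5:1:1102:12299:1165_CONS(1758) "
        "source=NCBI sequenceID=EU382995 superkingdom=Eukaryota "
        "kingdom=Viridiplantae phylum=Streptophyta subphylum=Streptophytina "
        "class=Magnoliopsida subclass=NA infraclass=NA order=Ranunculales "
        "suborder=NA infraorder=NA superfamily=NA family=Ranunculaceae "
        "genus=Ranunculus species=Ranunculus repens markercode=trnL lat=NA lon=NA"
    )
    print(parse_header(header)["source"])
    print(taxonomy_string(header))
```

Looking fields up by name keeps the sketch readable. A positional scheme like `--taxon_levels` would instead index into the ordered list of fields, which is why the README warns that it depends on the database's header structure.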
