diff README.md @ 2:f4b8ab4ed24e draft

planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/add_header_tool commit 4017d38cf327c48a6252e488ba792527dae97a70-dirty
author onnodg
date Mon, 15 Dec 2025 16:49:00 +0000
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/README.md	Mon Dec 15 16:49:00 2025 +0000
@@ -0,0 +1,85 @@
+# Add Taxonomic Labels Script
+
+This script processes BLAST output files from a **curated BLAST database** and prepares them for downstream taxonomic analysis.
+
+In curated BLAST results, taxonomic labels are often missing or marked as “unknown,” because taxonomy information is stored only in the sequence headers.  
+This script extracts that information and appends it to each BLAST result, producing a fully annotated output file.
+
+---
+
+## Usage
+
+Each sequence header in the curated BLAST database carries taxonomy metadata in a structured format: `key=value` fields separated by whitespace. The tool reads the source and the taxonomic annotations from each header and writes them into the corresponding tabular row, so that the source and taxonomy columns sit in the same positions as in BLAST output produced against a GenBank database.
+
+Using the `--taxon_levels` argument, you can specify which header positions correspond to taxonomic ranks (e.g., kingdom, phylum, genus, species).
+
+> ⚠️ **Important:**  
+> The `--taxon_levels` argument is critical. Change it only if you fully understand your database’s header structure.
+
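As an illustration of the header format described above, here is a minimal sketch of how such a `key=value` header could be split into named fields. The function name `parse_header` and the regular expression are assumptions made for this example and are not taken from `add_taxonomic_labels.py`, which selects fields by position through `--taxon_levels`.

```python
import re


def parse_header(header):
    """Split a 'key=value   key=value' header into an ordered dict.

    Hypothetical helper for illustration only. Values may contain single
    spaces (e.g. 'Ranunculus repens'), so a value ends only where the next
    'key=' starts or where the header ends.
    """
    pattern = r"(\w+)=(.*?)(?=\s+\w+=|\s*$)"
    return dict(re.findall(pattern, header))


header = (
    "source=NCBI   sequenceID=EU382995   genus=Ranunculus   "
    "species=Ranunculus repens   markercode=trnL"
)
fields = parse_header(header)
print(fields["source"], "|", fields["species"])
```

Looking fields up by name, as done here, is only one possible design; it does not reproduce the positional selection that `--taxon_levels` performs.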
+
+
+### When to Use
+
+| Database Type              | Need This Script? | Reason |
+|-----------------------------|-------------------|--------|
+| **Curated BLAST database**  | ✅ Yes            | Taxonomy exists only in headers |
+| **GenBank-based BLAST**     | ❌ No             | Taxonomy already included in tabular file |
+
+
+
+### Command Line Interface
+The add_taxonomic_labels tool can be run as a Python script:
+
+```bash
+python add_taxonomic_labels.py \
+  --input blast_results.tabular \
+  --output labeled_results.tabular \
+  --taxon_levels "1 2 4 7 11 12 13"
+```
+
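The command above passes `--taxon_levels` as one quoted, space-separated string. The sketch below shows one way such a value could be turned into a list of integer positions with `argparse`. The three flag names come from the command above; the helper `parse_levels` and everything else about the parser are assumptions, and the actual script may handle its arguments differently.

```python
import argparse


def parse_levels(text):
    """Turn a string such as '1 2 4' into a list of ints (hypothetical helper)."""
    return [int(token) for token in text.split()]


# Illustrative parser only; not copied from add_taxonomic_labels.py.
parser = argparse.ArgumentParser(description="Hypothetical CLI sketch")
parser.add_argument("--input", required=True)
parser.add_argument("--output", required=True)
parser.add_argument("--taxon_levels", type=parse_levels, required=True)

args = parser.parse_args([
    "--input", "blast_results.tabular",
    "--output", "labeled_results.tabular",
    "--taxon_levels", "1 2 4 7 11 12 13",
])
print(args.taxon_levels)
```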
+#### General use case
+
+The tool serves a single purpose. In the example input below, the taxonomic information appears only in the sequence header, while the source and taxonomy columns at the end of the row read *Genbank* and *unknown*. The tool extracts the source and taxonomy from the header and writes them into those columns, replacing the placeholder values.
+
+```text
+Input
+M01687:476:000000000-LL5F5:1:1102:12299:1165_CONS(1758)	source=NCBI   sequenceID=EU382995   superkingdom=Eukaryota   kingdom=Viridiplantae   phylum=Streptophyta   subphylum=Streptophytina   class=Magnoliopsida   subclass=NA   infraclass=NA   order=Ranunculales   suborder=NA   infraorder=NA   superfamily=NA   family=Ranunculaceae   genus=Ranunculus   species=Ranunculus repens   markercode=trnL   lat=NA   lon=NA	source=NCBI	N/A	100.000	100	1.24e-38	152	Genbank	unknown kingdom / unknown phylum / unknown class / unknown order / unknown family / unknown genus / unknown species
+
+Output
+M01687:476:000000000-LL5F5:1:1102:12299:1165_CONS(1758)	source=NCBI   sequenceID=EU382995   superkingdom=Eukaryota   kingdom=Viridiplantae   phylum=Streptophyta   subphylum=Streptophytina   class=Magnoliopsida   subclass=NA   infraclass=NA   order=Ranunculales   suborder=NA   infraorder=NA   superfamily=NA   family=Ranunculaceae   genus=Ranunculus   species=Ranunculus repens   markercode=trnL   lat=NA   lon=NA	source=NCBI	N/A	100.000	100	1.24e-38	152	NCBI	Viridiplantae / Streptophyta / Magnoliopsida / Ranunculales / Ranunculaceae / Ranunculus / Ranunculus repens
+```
+
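The transformation shown in the example above can be sketched as follows. This is a simplified illustration, not the tool's implementation: it assumes the header sits in the second tab-separated column and that the last two columns hold the source and the taxonomy string, as in the example row. The names `parse_header`, `relabel_row`, and `RANKS` are hypothetical, and ranks are looked up by key name rather than through the positional `--taxon_levels` argument.

```python
import re

# Assumed rank keys, chosen by name for this sketch; the real tool selects
# header positions through --taxon_levels instead.
RANKS = ("kingdom", "phylum", "class", "order", "family", "genus", "species")


def parse_header(header):
    """Return the key=value fields of a database header as a dict."""
    pattern = r"(\w+)=(.*?)(?=\s+\w+=|\s*$)"
    return dict(re.findall(pattern, header))


def relabel_row(row, ranks=RANKS):
    """Fill the last two columns of one tab-separated BLAST row."""
    columns = row.rstrip("\n").split("\t")
    fields = parse_header(columns[1])  # column 2 holds the database header
    # Keep the existing value when the header has no 'source' field.
    columns[-2] = fields.get("source", columns[-2])
    # Ranks missing from the header stay labelled as unknown.
    columns[-1] = " / ".join(fields.get(rank, f"unknown {rank}") for rank in ranks)
    return "\t".join(columns)


# Shortened, made-up row with the same column layout as the example above.
example = "\t".join([
    "read_1",
    "source=NCBI   kingdom=Viridiplantae   genus=Ranunculus   species=Ranunculus repens",
    "source=NCBI", "N/A", "100.000", "100", "1.24e-38", "152", "Genbank",
    "unknown kingdom / unknown genus / unknown species",
])
relabeled = relabel_row(example, ranks=("kingdom", "genus", "species"))
print(relabeled.split("\t")[-2:])
```

All other columns pass through unchanged, so the output keeps the column count of the input row.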
+### Galaxy integration
+
+The tool is also available through the Galaxy platform:
+
+- **Galaxy Toolshed**: The add_taxonomic_labels tool is available in the Galaxy Toolshed, 
+  enabling easy installation into any Galaxy instance.
+- **Web-based interface**: Users can upload BLAST files, configure parameters through the GUI, 
+  run the tool, and download the results.
+- **Workflow integration**: The tool can be incorporated into Galaxy workflows for automated processing pipelines.
+
+To use the tool in Galaxy:
+1. Install the tool from the Galaxy Toolshed (search for "add_taxonomic_labels")
+2. Upload your BLAST files to your Galaxy history
+3. Configure parameters through the GUI
+4. Run the tool
+5. View results and use the reformatted BLAST file for downstream analysis
+
+## License
+
+No license has been specified yet.
+
+## Citation
+
+If you use this software in your research, please cite this repository.
+
+## Contact
+
+For questions or issues:
+- GitHub Issues: https://github.com/Onnodg/Naturalis_NLOOR/issues
+- Email: onno.gorter@naturalis.nl (until February 2026)
+
+## Acknowledgments
+
+This tool was developed to support the New lights on old remedies project, a PhD project by Anja Fischer.