comparison blast_annotations_processor.xml @ 1:2acf82433aa4 draft default tip
planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit d771f9fbfd42bcdeda1623d954550882a0863847-dirty
| author | onnodg |
|---|---|
| date | Mon, 20 Oct 2025 12:26:51 +0000 |
| parents | a3989edf0a4a |
| children | |
| 0:a3989edf0a4a | 1:2acf82433aa4 |
|---|---|
| 1 <tool id="blast_annotation_processor" name="BLAST Annotation Processor" version="1.0.0"> | 1 <tool id="blast_annotation_processor" name="BLAST Annotation Processor" version="1.0.1"> |
| 2 <description>Process BLAST annotation results with taxonomic analysis</description> | 2 <description>Process BLAST annotation results with taxonomic analysis</description> |
| 3 | 3 |
| 4 <requirements> | 4 <requirements> |
| 5 <requirement type="package" version="3.12.3">python</requirement> | 5 <requirement type="package" version="3.12.3">python</requirement> |
| 6 <requirement type="package" version="3.10.6">matplotlib</requirement> | 6 <requirement type="package" version="3.10.6">matplotlib</requirement> |
| 55 <param name="outputs" type="select" multiple="true" display="checkboxes" | 55 <param name="outputs" type="select" multiple="true" display="checkboxes" |
| 56 label="Select outputs to generate" help="Choose which analysis outputs to create"> | 56 label="Select outputs to generate" help="Choose which analysis outputs to create"> |
| 57 <option value="eval_plot">E-value distribution plot</option> | 57 <option value="eval_plot">E-value distribution plot</option> |
| 58 <option value="taxa_output">Taxonomic report (Kraken2-like format)</option> | 58 <option value="taxa_output">Taxonomic report (Kraken2-like format)</option> |
| 59 <option value="circle_data">Circular taxonomic datafile</option> | 59 <option value="circle_data">Circular taxonomic datafile</option> |
| 60 <option value="header_anno">Header annotations table</option> | 60 <option value="header_anno">Annotations per header (in Excel)</option> |
| 61 <option value="anno_stats">Annotation statistics</option> | 61 <option value="anno_stats">Annotation statistics</option> |
| 62 </param> | 62 </param> |
| 63 | 63 |
| 64 <!-- Processing Parameters --> | 64 <!-- Processing Parameters --> |
| 65 <section name="advanced" title="Advanced Parameters" expanded="false"> | 65 <section name="advanced" title="Advanced Parameters" expanded="false"> |
| 163 | 163 |
| 164 - **Taxonomic report**: Kraken2-like format report showing taxonomic composition with read counts and percentages. Includes information about uncertain taxonomic assignments. | 164 - **Taxonomic report**: Kraken2-like format report showing taxonomic composition with read counts and percentages. Includes information about uncertain taxonomic assignments. |
| 165 | 165 |
| 166 - **Circular taxonomic data**: Json data to generate a circular sunburst-style diagram showing taxonomic composition across all taxonomic levels (Kingdom -> Species). | 166 - **Circular taxonomic data**: Json data to generate a circular sunburst-style diagram showing taxonomic composition across all taxonomic levels (Kingdom -> Species). |
| 167 | 167 |
| 168 - **Header annotations table**: Excel workbook listing each sequence header with its taxonomic assignment and E-value. | 168 - **Annotations per header**: Excel workbook listing each sequence header with its taxonomic assignment and E-value. |
| 169 | 169 |
| 170 - **Annotation statistics**: Summary statistics about annotation success rates and sequence counts. | 170 - **Annotation statistics**: Summary statistics about annotation success rates and sequence counts. |
| 171 | 171 |
| 172 **Parameters:** | 172 **Parameters:** |
| 173 | 173 |
| 174 - **Uncertain threshold**: When multiple conflicting taxonomic assignments exist for a sequence, this threshold determines whether to use the most common assignment (if it exceeds the threshold) or mark it as "Uncertain taxa". | 174 - **Uncertain threshold**: Threshold for LCA. When multiple conflicting taxonomic assignments exist for a sequence, this threshold determines whether to use the most common assignment (if it exceeds the threshold) or mark it as "Uncertain taxa". |
| 175 | 175 |
| 176 - **E-value threshold**: Sequences with E-values higher than this threshold are filtered out from the analysis. | 176 - **E-value threshold**: Sequences with E-values higher than this threshold are filtered out from the analysis. |
| 177 | 177 |
| 178 - **Use read counts**: Determines whether circular data reflects the abundance of reads (checked) or just count unique taxonomic assignments (unchecked). | 178 - **Use read counts**: Determines whether circular data reflects the abundance of reads (checked) or just count unique taxonomic assignments (unchecked). |
| 179 #Query ID #Subject #Subject accession #Subject Taxonomy ID #Identity percentage | 179 |
| 180 #Coverage #evalue #bitscore #Source #Taxonomy | |
| 181 **Expected Input Format:** | 180 **Expected Input Format:** |
| 182 | 181 |
| 183 The annotated BLAST file should be in tabular format with at least 7 columns: | 182 The annotated BLAST file should be in tabular format with at least 7 columns: |
| 184 1. Query ID | 183 |
| 185 2. Subject ID | 184 - 1. Query ID |
| 186 3. Subject accession | 185 |
| 187 4. Subject Taxonomy ID | 186 - 2. Subject ID |
| 188 5. Identity percentage | 187 |
| 189 6. Coverage | 188 - 3. Subject accession |
| 190 7. Evalue | 189 |
| 191 8. Bitscore | 190 - 4. Subject Taxonomy ID |
| 192 9. Source | 191 |
| 193 10. Taxonomy | 192 - 5. Identity percentage |
| | 193 |
| | 194 - 6. Coverage |
| | 195 |
| | 196 - 7. Evalue |
| | 197 |
| | 198 - 8. Bitscore |
| | 199 |
| | 200 - 9. Source |
| | 201 |
| | 202 - 10. Taxonomy |
| 194 | 203 |
| 195 **Note:** This tool processes files that have been deduplicated and contain read count information in the sequence headers in the format: `sequence_name(count_number)`. | 204 **Note:** This tool processes files that have been deduplicated and contain read count information in the sequence headers in the format: `sequence_name(count_number)`. |
| 196 | 205 |
| | 206 ------------- |
| | 207 |
| | 208 .. class:: infomark |
| | 209 |
| 197 **Credits** | 210 **Credits** |
| 198 Authors = Onno de Gorter, 2025. | 211 |
| 199 Based on a script by Nick Kortleven, translated, modified and wrapped by Onno de Gorter, | 212 Based on a script by Nick Kortleven, translated, modified and wrapped by Onno de Gorter, |
| 200 Developed for the New light on old remedies project, a PhD research by Anja Fischer | 213 Developed for the New light on old remedies project, a PhD research by Anja Fischer. |
| | 214 |
| | 215 Link to the project website: |
| | 216 |
| | 217 * https://ahm.uva.nl/funded-research-projects/new-lights-on-old-remedies/new-lights-on-old-remedies.html |
| | 218 |
| 201 ]]></help> | 219 ]]></help> |
| | 220 <creator> |
| | 221 <organization name="Naturalis Biodiversity Center" url="https://www.naturalis.nl/en/science" /> |
| | 222 <person givenName="Onno" familyName="de Gorter" url="https://github.com/Onnodg"/> |
| | 223 <person givenName="Nick" familyName="Kortleven" url="https://github.com/tombkingsts" /> |
| | 224 </creator> |
| 202 </tool> | 225 </tool> |
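
The help text in both revisions describes three behaviours: read counts embedded in headers as `sequence_name(count_number)`, filtering on an E-value threshold, and falling back to "Uncertain taxa" when the most common assignment does not exceed the uncertain threshold. The wrapped script itself is not part of this comparison, so the following is only a minimal Python sketch of how those rules might be expressed. All function names are hypothetical, and treating the threshold as a fraction of the assignments is an assumption, not something the help text states.

```python
import re
from collections import Counter

# Hypothetical pattern for headers of the form sequence_name(count_number).
HEADER_RE = re.compile(r"^(?P<name>.+)\((?P<count>\d+)\)$")


def parse_header(header):
    """Split 'sequence_name(count_number)' into (name, count).

    Headers without a trailing '(N)' are assumed to represent one read.
    """
    text = header.strip()
    match = HEADER_RE.match(text)
    if match is None:
        return text, 1
    return match.group("name"), int(match.group("count"))


def passes_evalue(evalue, threshold):
    """Keep a hit only if its E-value is not higher than the threshold."""
    return float(evalue) <= threshold


def resolve_taxon(assignments, uncertain_threshold):
    """Return the most common assignment if its share exceeds the threshold.

    Assumption: the threshold is a fraction between 0 and 1 of all
    assignments for the sequence; otherwise 'Uncertain taxa' is returned.
    """
    if not assignments:
        return "Uncertain taxa"
    taxon, n = Counter(assignments).most_common(1)[0]
    if n / len(assignments) > uncertain_threshold:
        return taxon
    return "Uncertain taxa"
```

For example, `parse_header("seq1(42)")` yields the name `seq1` with a count of 42, and `resolve_taxon(["A", "A", "A", "B"], 0.7)` keeps `A` because three of the four assignments agree.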
