comparison blast_annotations_processor.xml @ 1:2acf82433aa4 draft default tip

planemo upload for repository https://github.com/Onnodg/Naturalis_NLOOR/tree/main/NLOOR_scripts/process_annotations_tool commit d771f9fbfd42bcdeda1623d954550882a0863847-dirty
author onnodg
date Mon, 20 Oct 2025 12:26:51 +0000
parents a3989edf0a4a
children
0:a3989edf0a4a 1:2acf82433aa4
1 <tool id="blast_annotation_processor" name="BLAST Annotation Processor" version="1.0.0"> 1 <tool id="blast_annotation_processor" name="BLAST Annotation Processor" version="1.0.1">
2 <description>Process BLAST annotation results with taxonomic analysis</description> 2 <description>Process BLAST annotation results with taxonomic analysis</description>
3 3
4 <requirements> 4 <requirements>
5 <requirement type="package" version="3.12.3">python</requirement> 5 <requirement type="package" version="3.12.3">python</requirement>
6 <requirement type="package" version="3.10.6">matplotlib</requirement> 6 <requirement type="package" version="3.10.6">matplotlib</requirement>
55 <param name="outputs" type="select" multiple="true" display="checkboxes" 55 <param name="outputs" type="select" multiple="true" display="checkboxes"
56 label="Select outputs to generate" help="Choose which analysis outputs to create"> 56 label="Select outputs to generate" help="Choose which analysis outputs to create">
57 <option value="eval_plot">E-value distribution plot</option> 57 <option value="eval_plot">E-value distribution plot</option>
58 <option value="taxa_output">Taxonomic report (Kraken2-like format)</option> 58 <option value="taxa_output">Taxonomic report (Kraken2-like format)</option>
59 <option value="circle_data">Circular taxonomic datafile</option> 59 <option value="circle_data">Circular taxonomic datafile</option>
60 <option value="header_anno">Header annotations table</option> 60 <option value="header_anno">Annotations per header (in Excel)</option>
61 <option value="anno_stats">Annotation statistics</option> 61 <option value="anno_stats">Annotation statistics</option>
62 </param> 62 </param>
63 63
64 <!-- Processing Parameters --> 64 <!-- Processing Parameters -->
65 <section name="advanced" title="Advanced Parameters" expanded="false"> 65 <section name="advanced" title="Advanced Parameters" expanded="false">
163 163
164 - **Taxonomic report**: Kraken2-like format report showing taxonomic composition with read counts and percentages. Includes information about uncertain taxonomic assignments. 164 - **Taxonomic report**: Kraken2-like format report showing taxonomic composition with read counts and percentages. Includes information about uncertain taxonomic assignments.
165 165
166 - **Circular taxonomic data**: JSON data to generate a circular sunburst-style diagram showing taxonomic composition across all taxonomic levels (Kingdom -> Species). 166 - **Circular taxonomic data**: JSON data to generate a circular sunburst-style diagram showing taxonomic composition across all taxonomic levels (Kingdom -> Species).
167 167
168 - **Header annotations table**: Excel workbook listing each sequence header with its taxonomic assignment and E-value. 168 - **Annotations per header**: Excel workbook listing each sequence header with its taxonomic assignment and E-value.
169 169
170 - **Annotation statistics**: Summary statistics about annotation success rates and sequence counts. 170 - **Annotation statistics**: Summary statistics about annotation success rates and sequence counts.
171 171
172 **Parameters:** 172 **Parameters:**
173 173
174 - **Uncertain threshold**: When multiple conflicting taxonomic assignments exist for a sequence, this threshold determines whether to use the most common assignment (if it exceeds the threshold) or mark it as "Uncertain taxa". 174 - **Uncertain threshold**: Threshold for LCA (lowest common ancestor). When multiple conflicting taxonomic assignments exist for a sequence, this threshold determines whether to use the most common assignment (if it exceeds the threshold) or mark it as "Uncertain taxa".
175 175
176 - **E-value threshold**: Sequences with E-values higher than this threshold are filtered out from the analysis. 176 - **E-value threshold**: Sequences with E-values higher than this threshold are filtered out from the analysis.
177 177
178 - **Use read counts**: Determines whether circular data reflects the abundance of reads (checked) or just counts unique taxonomic assignments (unchecked). 178 - **Use read counts**: Determines whether circular data reflects the abundance of reads (checked) or just counts unique taxonomic assignments (unchecked).
179 #Query ID #Subject #Subject accession #Subject Taxonomy ID #Identity percentage 179
180 #Coverage #evalue #bitscore #Source #Taxonomy
181 **Expected Input Format:** 180 **Expected Input Format:**
182 181
183 The annotated BLAST file should be in tabular format with at least 7 columns: 182 The annotated BLAST file should be in tabular format with at least 7 columns:
184 1. Query ID 183
185 2. Subject ID 184 - 1. Query ID
186 3. Subject accession 185
187 4. Subject Taxonomy ID 186 - 2. Subject ID
188 5. Identity percentage 187
189 6. Coverage 188 - 3. Subject accession
190 7. Evalue 189
191 8. Bitscore 190 - 4. Subject Taxonomy ID
192 9. Source 191
193 10. Taxonomy 192 - 5. Identity percentage
193
194 - 6. Coverage
195
196 - 7. Evalue
197
198 - 8. Bitscore
199
200 - 9. Source
201
202 - 10. Taxonomy
194 203
195 **Note:** This tool processes files that have been deduplicated and contain read count information in the sequence headers in the format: `sequence_name(count_number)`. 204 **Note:** This tool processes files that have been deduplicated and contain read count information in the sequence headers in the format: `sequence_name(count_number)`.
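
The header format in the note above, `sequence_name(count_number)`, can be split into a name and a read count roughly as follows. This is a minimal sketch and not the tool's own code: the helper name `split_header` and the fallback to a count of 1 for headers without a count are assumptions made for illustration.

```python
import re

# Assumed layout, taken from the note above: the read count sits in
# parentheses at the very end of the header, e.g. "seq_001(42)".
HEADER_RE = re.compile(r"^(?P<name>.+)\((?P<count>\d+)\)$")


def split_header(header: str) -> tuple[str, int]:
    """Return (sequence_name, read_count) for a deduplicated header.

    Hypothetical helper: headers without a trailing "(count)" are
    returned unchanged with a count of 1 (an assumption, not tool behaviour).
    """
    header = header.strip()
    match = HEADER_RE.match(header)
    if match is None:
        return header, 1
    return match.group("name"), int(match.group("count"))
```

For example, `split_header("seq_001(42)")` yields the name `seq_001` with a count of 42.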
196 205
206 -------------
207
208 .. class:: infomark
209
197 **Credits** 210 **Credits**
198 Authors = Onno de Gorter, 2025. 211
199 Based on a script by Nick Kortleven, translated, modified and wrapped by Onno de Gorter, 212 Based on a script by Nick Kortleven, translated, modified and wrapped by Onno de Gorter,
200 developed for the New light on old remedies project, PhD research by Anja Fischer 213 developed for the New light on old remedies project, PhD research by Anja Fischer.
214
215 Link to the project website:
216
217 * https://ahm.uva.nl/funded-research-projects/new-lights-on-old-remedies/new-lights-on-old-remedies.html
218
201 ]]></help> 219 ]]></help>
220 <creator>
221 <organization name="Naturalis Biodiversity Center" url="https://www.naturalis.nl/en/science" />
222 <person givenName="Onno" familyName="de Gorter" url="https://github.com/Onnodg"/>
223 <person givenName="Nick" familyName="Kortleven" url="https://github.com/tombkingsts" />
224 </creator>
202 </tool> 225 </tool>
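
The two thresholds described in the help text (E-value threshold and Uncertain threshold) could work along these lines. This is a hedged sketch under stated assumptions, not the wrapped script itself: the function names are hypothetical, the uncertain threshold is assumed to be a fraction between 0 and 1, and "exceeds" is read as strictly greater than.

```python
from collections import Counter

UNCERTAIN_LABEL = "Uncertain taxa"  # label quoted in the help text


def passes_evalue(evalue: float, max_evalue: float) -> bool:
    """Keep a hit unless its E-value is higher than the threshold."""
    return evalue <= max_evalue


def resolve_taxon(assignments: list[str], uncertain_threshold: float) -> str:
    """Pick the most common assignment if its share exceeds the threshold.

    Assumption: the threshold is a fraction of all assignments for one
    sequence; the real tool may define it differently.
    """
    if not assignments:
        return UNCERTAIN_LABEL
    taxon, hits = Counter(assignments).most_common(1)[0]
    share = hits / len(assignments)
    return taxon if share > uncertain_threshold else UNCERTAIN_LABEL
```

With a threshold of 0.7, three of four hits agreeing on one taxon (a share of 0.75) resolve to that taxon, while an even split is marked as "Uncertain taxa".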