Mercurial > repos > devteam > kraken_filter
view kraken-filter.xml @ 4:3d6570bffdb1 draft
"planemo upload for repository https://github.com/galaxyproject/tools-iuc/blob/master/tool_collections/kraken/ commit 06345505a91f3dcfa8a37dceb6f25e95806dddc8"
author | iuc |
---|---|
date | Wed, 04 Dec 2019 06:51:27 -0500 |
parents | 5bee9adae474 |
children | 67236d921e83 |
line wrap: on
line source
<tool id="kraken-filter" name="Kraken-filter" version="@WRAPPER_VERSION@"> <description>filter classification by confidence score</description> <macros> <import>macros.xml</import> </macros> <expand macro="requirements" /> <expand macro="version_command" /> <command detect_errors="exit_code"><![CDATA[ @SET_DATABASE_PATH@ && kraken-filter @INPUT_DATABASE@ --threshold $threshold '${input}' > '$filtered_output' ]]></command> <inputs> <param name="input" type="data" format="tabular" label="Kraken output" help="Select taxonomy classification produced by kraken"/> <param argument="--threshold" type="float" value="0" min="0" max="1" label="Confidence threshold" help="A floating point number between 0 and 1; default=0"/> <expand macro="input_database" /> </inputs> <outputs> <data format="tabular" name="filtered_output" /> </outputs> <tests> <test> <param name="input" value="kraken-filter/kraken_filter_test1.tab"/> <param name="threshold" value="0"/> <param name="kraken_database" value="new_style_test_entry"/> <output name="filtered_output" file="kraken-filter/kraken_filter_test1_output.tab" ftype="tabular"/> </test> </tests> <help> <![CDATA[ .. class:: warningmark **Note**: the database used must be the same as the one used in the original Kraken run ----- **What it does** At present, we have not yet developed a confidence score with a solid probabilistic interpretation for Kraken. However, we have developed a simple scoring scheme that has yielded good results for us, and we've made that available in the kraken-filter script. The approach we use allows a user to specify a threshold score in the [0,1] interval; the ``kraken-filter`` script then will adjust labels up the tree until the label's score (described below) meets or exceeds that threshold. If a label at the root of the taxonomic tree would not have a score exceeding the threshold, the sequence is called unclassified by ``kraken-filter``. A sequence label's score is a fraction C/Q, where C is the number of k-mers mapped to LCA values in the clade rooted at the label, and Q is the number of k-mers in the sequence that lack an ambiguous nucleotide (i.e., they were queried against the database). Consider the example of the LCA mappings in Kraken's output:: 562:13 561:4 A:31 0:1 562:3 would indicate that:: the first 13 k-mers mapped to taxonomy ID #562 the next 4 k-mers mapped to taxonomy ID #561 the next 31 k-mers contained an ambiguous nucleotide the next k-mer was not in the database the last 3 k-mers mapped to taxonomy ID #562 In this case, ID #561 is the parent node of #562. Here, a label of #562 for this sequence would have a score of C/Q = (13+3)/(13+4+1+3) = 16/21. A label of #561 would have a score of C/Q = (13+4+3)/(13+4+1+3) = 20/21. If a user specified a threshold over 16/21, kraken-filter would adjust the original label from #562 to #561; if the threshold was greater than 20/21, the sequence would become unclassified. ]]> </help> <expand macro="citations" /> </tool>