
What it does

With Recentrifuge, researchers can interactively explore what organisms are in their samples and at which level of confidence, enabling robust comparative analysis of multiple samples in any metagenomic study.

Removes diverse contaminants, including crossovers, using a novel robust contamination removal algorithm.

Provides a confidence level for every result, since the calculated score propagates to all the downstream analysis and comparisons.

Unveils the generalities and specificities in the metagenomic samples, thanks to a new comparative analysis engine.

Recentrifuge is especially useful when a more reliable detection of minority organisms is needed (e.g. in the case of low microbial biomass metagenomic studies) in clinical, environmental, or forensic analysis. Beyond the standard confidence levels, Recentrifuge implements others devoted to variable length reads, very convenient for complex datasets generated by nanopore sequencers.

Input options

Recentrifuge accepts the output files of several taxonomic classifiers: Centrifuge, Kraken, CLARK, or LMAT. A generic mode accepts files from other classifiers, but it requires a description of the file content. If generic is chosen, the format option needs a string like 'TYP:csv,TID:1,LEN:3,SCO:6,UNC:0', where TYP is the file type (csv, tsv, or ssv) and the remaining fields indicate the column number (starting at 1) used for the TaxIDs assigned, the LENgth of the read, and the SCOre given to the assignment, plus the taxid code used for UNClassified reads.
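As an illustration of how such a format string breaks down, here is a minimal Python sketch. The helper `parse_generic_format` is hypothetical and is not part of Recentrifuge; it only shows how the key:value pairs could be read and checked, with the column fields treated as 1-based positions and UNC kept as plain text.

```python
def parse_generic_format(spec):
    """Split a string like 'TYP:csv,TID:1,LEN:3,SCO:6,UNC:0' into a dict.

    Hypothetical helper for illustration only; Recentrifuge's own parser
    may behave differently.
    """
    fields = {}
    for item in spec.split(","):
        key, sep, value = item.strip().partition(":")
        if not sep:
            raise ValueError(f"Malformed field (expected KEY:value): {item!r}")
        fields[key.upper()] = value

    # TYP must be one of the three file types named in the help text.
    if fields.get("TYP") not in {"csv", "tsv", "ssv"}:
        raise ValueError("TYP must be csv, tsv or ssv")

    # TID, LEN and SCO are column numbers, counted from 1.
    for key in ("TID", "LEN", "SCO"):
        if key not in fields:
            raise ValueError(f"Missing field: {key}")
        fields[key] = int(fields[key])
        if fields[key] < 1:
            raise ValueError(f"{key} must be a column number starting at 1")
    return fields


fmt = parse_generic_format("TYP:csv,TID:1,LEN:3,SCO:6,UNC:0")
print(fmt)
```

With this example string, the TaxID would be read from the first column, the read length from the third, and the score from the sixth column of a comma-separated file.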

Database for Recentrifuge

Recentrifuge first needs the taxonomic database from NCBI (nodes.dmp and names.dmp). Two sources are available:
1. cached: use a taxonomic database that is already installed
2. history: load the necessary files from your history as a dataset list
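For readers unfamiliar with the NCBI dump files, the following Python sketch shows how the first columns of nodes.dmp (taxid, parent taxid, rank) can be read, given that fields in these files are separated by a tab, a vertical bar, and a tab. The sample lines are toy data in that layout, not real taxonomy content, and the function names are hypothetical; this is not how Recentrifuge itself loads the database.

```python
def parse_nodes(lines):
    """Return {taxid: (parent_taxid, rank)} from nodes.dmp-style lines."""
    nodes = {}
    for line in lines:
        # Drop the trailing "\t|\n" terminator, then split on the separator.
        columns = line.rstrip("\t|\n").split("\t|\t")
        taxid, parent, rank = int(columns[0]), int(columns[1]), columns[2]
        nodes[taxid] = (parent, rank)
    return nodes


def lineage(nodes, taxid):
    """Walk from a taxid up to the root (the node that is its own parent)."""
    path = [taxid]
    while nodes[taxid][0] != taxid:
        taxid = nodes[taxid][0]
        path.append(taxid)
    return path


# Toy lines in the nodes.dmp layout (made-up taxids, abbreviated columns).
sample = [
    "1\t|\t1\t|\tno rank\t|\n",
    "10\t|\t1\t|\tphylum\t|\n",
    "20\t|\t10\t|\tgenus\t|\n",
]
nodes = parse_nodes(sample)
print(lineage(nodes, 20))
```

The same walk towards the root is what makes it possible to relate an assignment at one taxonomic level to its parent levels.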

Output options

1. Depending on the options provided, the output file format can be csv, tsv, or xlsx, and the results can be combined into one or more files (extra).
2. By default, an HTML file is generated to visualize the data; it can be disabled with the nohtml option.

Advanced options

Recentrifuge can integrate negative control samples, which are used to normalize the data.

Scoring selects the scoring scheme applied to the reads classified by the taxonomic tools:

SHEL (Single Hit Equivalent Length): A score value in base pairs, roughly equivalent to a single hit to the database.

KRAKEN: Only available for the Kraken classifier. It divides the k-mer hit count of the top assignment by the total k-mers in the read and multiplies the result by 100 to give a percentage of coverage (the fraction of the read k-mers covered by k-mers belonging to the final assignment of the read). This is the default scoring scheme for Kraken samples, and it supports mixing samples with different read lengths.

LENGTH: The score of a read is its length (or the combined length of mate pairs).

LOGLENGTH: Logarithm (base 10) of the length score.

NORMA: The normalized score SHEL / LENGTH as a percentage, so it takes into account both the assignment quality and the length of the read. Very useful when both the assignment scores and the lengths vary among the reads.

LMAT: Only available for the LMAT classifier.

CLARK_C: Only available for the CLARK classifier. It takes the confidence score as the score for a read, conf=h1/(h1+h2), or 1-conf=h2/(h1+h2) in case the majority of a read is not classified (1st assignment unclassified). See CLARK's README file for details on how h1 and h2 are calculated. If you use this scoring, you will probably want to filter to a minimum of 0.5 (-y 0.5) or beyond, as under 0.5 the assignments have very low confidence.

CLARK_G: Only available for the CLARK classifier. It scores every read with its CLARK gamma score.

You can filter reads by quality using the minscore option (--minscore).
You can include or exclude specific taxa using their NCBI taxid codes.
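The arithmetic behind several of the scoring schemes described above can be sketched in Python as follows. The function names are hypothetical, the functions only restate the formulas given in this help text rather than Recentrifuge's implementation, and the `passes_minscore` comparison (greater than or equal) is an assumption about how a minimum score threshold would be applied.

```python
import math


def kraken_score(top_kmer_hits, total_kmers):
    """KRAKEN: percentage of read k-mers covered by the top assignment."""
    return 100.0 * top_kmer_hits / total_kmers


def loglength_score(read_length):
    """LOGLENGTH: base-10 logarithm of the LENGTH score."""
    return math.log10(read_length)


def norma_score(shel, read_length):
    """NORMA: SHEL / LENGTH expressed as a percentage."""
    return 100.0 * shel / read_length


def clark_c_score(h1, h2, first_unclassified=False):
    """CLARK_C: conf = h1/(h1+h2), or 1-conf when the 1st assignment is unclassified."""
    conf = h1 / (h1 + h2)
    return 1.0 - conf if first_unclassified else conf


def passes_minscore(score, minscore):
    """Assumed filter rule: keep a read whose score reaches the threshold."""
    return score >= minscore


# A read with 30 of its 120 k-mers hitting the top assignment.
print(kraken_score(30, 120))
# A CLARK read with h1=3 and h2=1, filtered at the suggested minimum of 0.5.
print(passes_minscore(clark_c_score(3, 1), 0.5))
```

Because NORMA divides by the read length, two reads with the same SHEL but different lengths get different scores, which is why it suits datasets with variable read lengths.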

More advanced options

You can set a read quality filter that applies specifically to the control samples.
You can specify the minimum taxa value to avoid collapsing one level into its parent.
A summary option is available to produce a summary file.
Other options are available and described in the more advanced panel of the tool.

rcf - Release 1.8.1 - Mar 2022

Copyright (C) 2017–2022, Jose Manuel Martí Martínez

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see <https://www.gnu.org/licenses/>.