Repository bbtools_bbduk
Owner: iuc
Synopsis: BBTools: BBduk tool from the bbtools suite
BBTools is a suite of fast, multithreaded bioinformatics tools designed for analysis of DNA and RNA sequence data.
BBTools can handle common sequencing file formats such as fastq, fasta, sam, scarf, fasta+qual, compressed or raw,
with autodetection of quality encoding and interleaving. It is written in Java and works on any platform supporting
Java, including Linux, MacOS, and Microsoft Windows; there are no dependencies other than Java (version
7 or higher). Program descriptions and options are shown when running the shell scripts with no parameters.
Type: unrestricted
Revision: 10:d58c27d2c5a7
This revision can be installed: True
Times cloned / installed: 544

Repository README files - may contain important installation or license information

IMPORTANT NOTE REGARDING SYSTEM CONFIGURATION

All of the Galaxy wrappers contained herein call the respective bbtools shell wrapper, which in turn calls the underlying Java-based tool. Unlike a C-based program, Java will grab a predetermined amount of memory at the very beginning of execution.

Some of the algorithms (e.g. bbnorm) utilise a hash table, and potential collisions can decrease the numeric accuracy of the output. This problem is expected to become more pronounced as the occupied fraction of the allocated memory becomes high, i.e. when the available memory is low and/or the input file is big. If the load becomes critically high, the tool writes a warning to stderr, which is caught by the Galaxy wrapper and results in a failed job. However, a count-min sketch <https://en.wikipedia.org/wiki/Count%E2%80%93min_sketch> does not run out of memory: the loss of accuracy is gradual and will NOT trigger a fatal error unless the load reaches this critically high level. You can read more about the implications of this in the BBTools manual <https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbnorm-guide/>.

If you are administering a heterogeneous computing environment in which nodes have very different amounts of physical RAM, it is recommended to define a global cap on the RAM to be used, so that results do not vary from run to run depending on the node. This can be done by exporting an environment variable, for example: export _JAVA_OPTIONS="-Xmx2048m -Xms256m"
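As a rough check that the cap is actually honoured on a given node, the export can be followed by a query of the JVM's effective heap settings. This is only a sketch: it assumes a HotSpot-based JVM (the -XX:+PrintFlagsFinal option is HotSpot-specific) and reuses the example values from the paragraph above, which should be adjusted for your site.

```shell
# Cap the JVM heap for every Java process started from this shell.
# The values are the example ones from the README, not a recommendation.
export _JAVA_OPTIONS="-Xmx2048m -Xms256m"

# If a JVM is on the PATH, ask it which heap limits it picked up.
# Assumption: a HotSpot JVM, which reports MaxHeapSize and
# InitialHeapSize in bytes when given -XX:+PrintFlagsFinal.
if command -v java >/dev/null 2>&1; then
    java -XX:+PrintFlagsFinal -version 2>/dev/null \
        | grep -E 'MaxHeapSize|InitialHeapSize' || true
fi
```

Running the same snippet on each node and comparing the reported values is one way to confirm that all nodes share the same cap.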

The tool currently considers the following limits, in the given priority order: 1) _JAVA_OPTIONS 2) JAVA_TOOL_OPTIONS 3) GALAXY_MEMORY_MB 4) 4 GB
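The priority order above can be sketched as a small shell function. This is an illustration only, not the wrapper's actual code: the function names xmx_to_mb and resolve_memory_mb are hypothetical, and the sketch assumes that a variable which is set but contains no -Xmx flag is skipped in favour of the next entry in the list.

```shell
# Hypothetical helper: print the first -Xmx value in an option string,
# converted to megabytes. Returns non-zero if no -Xmx flag is present.
xmx_to_mb() {
    local flag number unit
    flag=$(printf '%s\n' "$1" | grep -oE -e '-Xmx[0-9]+[kKmMgG]?' | head -n 1)
    [ -n "$flag" ] || return 1
    flag=${flag#-Xmx}            # e.g. "2048m"
    number=${flag%[kKmMgG]}      # numeric part, e.g. "2048"
    unit=${flag#"$number"}       # unit suffix, e.g. "m" (empty means bytes)
    case "$unit" in
        g|G) echo $(( number * 1024 )) ;;
        m|M) echo "$number" ;;
        k|K) echo $(( number / 1024 )) ;;
        *)   echo $(( number / 1024 / 1024 )) ;;
    esac
}

# Hypothetical resolver following the documented priority order:
# 1) _JAVA_OPTIONS  2) JAVA_TOOL_OPTIONS  3) GALAXY_MEMORY_MB  4) 4 GB
resolve_memory_mb() {
    xmx_to_mb "${_JAVA_OPTIONS:-}" && return 0
    xmx_to_mb "${JAVA_TOOL_OPTIONS:-}" && return 0
    if [ -n "${GALAXY_MEMORY_MB:-}" ]; then
        echo "$GALAXY_MEMORY_MB"
        return 0
    fi
    echo 4096                    # fallback: 4 GB expressed in MB
}

# Print the limit that would apply in the current environment.
resolve_memory_mb
```

With only GALAXY_MEMORY_MB set, the function prints that value; setting _JAVA_OPTIONS with an -Xmx flag takes precedence over everything else.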

Contents of this repository

Name Description Version Minimum Galaxy Version
decontamination using kmers 39.08+galaxy1 22.01

Categories
Sequence Analysis - Tools for performing Protein and DNA/RNA analysis