Filter SFF by ID (version 0.0.2)

Deprecated

This tool is now obsolete and should not be used in future. It has been replaced by a more general tool that handles FASTA, FASTQ and SFF files.

What it does

By default this tool divides a Standard Flowgram Format (SFF) file in two: the sequences whose ID appears in the specified column(s) of the tabular file, and the sequences whose ID does not. You can opt to have a single output file of just the matching records, or just the non-matching ones.

Note that the order of sequences in the original SFF file is preserved, as is any Roche XML Manifest. Also, if any sequences share an identifier (which would be very unusual in SFF files), the duplicates are not removed.
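
The splitting behaviour described above can be sketched in plain Python. This is a minimal illustration, not the tool's actual code: the function names load_ids and partition_by_id are hypothetical, records are modelled as simple (id, payload) pairs rather than real SFF reads, and skipping blank lines and lines starting with "#" in the tabular file is an assumption.

```python
from typing import Iterable, List, Set, Tuple

Record = Tuple[str, str]  # (identifier, payload); stands in for a real SFF read


def load_ids(lines: Iterable[str], columns: Iterable[int]) -> Set[str]:
    """Collect identifiers from the given 0-based columns of tab-separated lines."""
    wanted_columns = list(columns)
    ids: Set[str] = set()
    for line in lines:
        # Assumption: blank lines and '#' comment lines carry no identifiers.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        for col in wanted_columns:
            if col < len(fields) and fields[col].strip():
                ids.add(fields[col].strip())
    return ids


def partition_by_id(
    records: Iterable[Record], wanted: Set[str]
) -> Tuple[List[Record], List[Record]]:
    """Split records into (matching, non-matching), keeping the input order.

    Records that share an identifier are all kept; nothing is de-duplicated.
    """
    matched: List[Record] = []
    unmatched: List[Record] = []
    for record in records:
        if record[0] in wanted:
            matched.append(record)
        else:
            unmatched.append(record)
    return matched, unmatched
```

Writing only the first list corresponds to the "matching records only" option, and writing only the second to the "non-matching only" option. Real SFF reads would need a parser such as Biopython's; this sketch does not handle the binary format or the Roche XML Manifest.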

Example Usage

You may have performed some kind of contamination search, for example running BLASTN against a database of cloning vectors or bacteria, giving you a tabular file containing read identifiers. You could use this tool to extract only the reads without BLAST matches (i.e. those which do not match your contaminant database).
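
As a rough sketch of that contamination workflow, the following picks out the read identifiers that have no BLAST hit. It assumes the BLAST results are tab-separated with the query (read) identifier in the first column, and the function name reads_without_hits is hypothetical; it works on identifiers only and does not read or write SFF data.

```python
from typing import Iterable, List


def reads_without_hits(read_ids: Iterable[str], blast_lines: Iterable[str]) -> List[str]:
    """Return the read IDs, in their original order, that have no BLAST hit."""
    # Assumption: the query (read) identifier is the first tab-separated column,
    # and lines starting with '#' are comments rather than hits.
    hit_ids = {
        line.split("\t", 1)[0].strip()
        for line in blast_lines
        if line.strip() and not line.startswith("#")
    }
    # A read with several hits appears on several lines but is excluded only once.
    return [read_id for read_id in read_ids if read_id not in hit_ids]
```

The returned identifiers are the reads you would keep, which corresponds to asking the tool for only the non-matching records.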

Citation

This tool uses Biopython to read and write SFF files. If you use this tool in scientific work leading to a publication, please cite the Biopython application note:

Cock et al. (2009) Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25(11), 1422-1423. http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878.