Filter sequences by mapping (version 0.0.8)
Input sequence file: FASTA, FASTQ, or SFF format.
Input mapping file: SAM or BAM format.

What it does

By default this tool splits a FASTA, FASTQ or Standard Flowgram Format (SFF) file in two: the sequences (or read pairs) which map in the provided SAM/BAM file, and those which do not. Alternatively, you can opt for a single output file containing only the mapping reads, or only the non-mapping ones.
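The default two-way split can be sketched in plain Python as follows. This is only an illustration of the idea, not the tool's actual code: the function names (mapped_read_names, parse_fasta, split_by_mapping) are hypothetical, the input is limited to FASTA and text SAM, and a read is assumed to count as mapping when at least one of its SAM records lacks the "segment unmapped" FLAG bit (0x4). Read pair handling is not attempted here.

```python
def mapped_read_names(sam_lines):
    """Return the names of reads with at least one mapped SAM record.

    Assumption for this sketch: "mapped" means the 0x4 FLAG bit is unset.
    """
    mapped = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if not flag & 0x4:
            mapped.add(qname)
    return mapped


def parse_fasta(lines):
    """Yield (identifier, sequence) tuples from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def split_by_mapping(fasta_lines, sam_lines):
    """Partition FASTA records into (mapping, non_mapping) lists."""
    mapped = mapped_read_names(sam_lines)
    mapping, non_mapping = [], []
    for name, seq in parse_fasta(fasta_lines):
        (mapping if name in mapped else non_mapping).append((name, seq))
    return mapping, non_mapping


# Tiny made-up example: r1 maps, r2 is flagged unmapped, r3 is absent from the SAM.
example_sam = [
    "@HD\tVN:1.6\n",
    "r1\t0\tref\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tIIII\n",
]
example_fasta = [">r1\n", "ACGT\n", ">r2\n", "GGCC\n", ">r3 extra text\n", "TT\n", "AA\n"]

mapping, non_mapping = split_by_mapping(example_fasta, example_sam)
```

In this sketch a sequence that never appears in the SAM input (r3) lands in the non-mapping list alongside reads explicitly flagged as unmapped; whether the real tool makes the same choice is not stated above.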

Example Usage

You might wish to perform a contamination screen by mapping your reads against known contaminant reference sequences, then use this tool to select only the unmapped reads for further analysis (e.g. de novo assembly).
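As a rough illustration of such a contamination screen, the sketch below keeps only those FASTQ reads that have no mapped record in a SAM file of alignments against the contaminant references. It is a simplified stand-in rather than the tool itself: the helper names (read_fastq, keep_unmapped) and the example data are invented for this sketch, FASTQ records are assumed to be in the simple four-line layout, and "mapped" is again taken to mean that the SAM FLAG bit 0x4 is unset.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from four-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError("Expected a FASTQ header line, got: %r" % header)
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:].split()[0], seq, qual


def keep_unmapped(fastq_handle, sam_handle, out_handle):
    """Write reads with no mapped SAM record to out_handle; return the count."""
    contaminant_hits = set()
    for line in sam_handle:
        if line.startswith("@") or not line.strip():
            continue  # skip SAM header lines and blank lines
        fields = line.split("\t")
        if not int(fields[1]) & 0x4:  # 0x4 unset: this record is mapped
            contaminant_hits.add(fields[0])
    kept = 0
    for name, seq, qual in read_fastq(fastq_handle):
        if name not in contaminant_hits:
            out_handle.write("@%s\n%s\n+\n%s\n" % (name, seq, qual))
            kept += 1
    return kept


# Made-up data: readA hits the contaminant reference, readB does not.
sam = io.StringIO(
    "@SQ\tSN:contaminant\tLN:100\n"
    "readA\t0\tcontaminant\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
    "readB\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII\n"
)
fastq = io.StringIO("@readA\nACGT\n+\nIIII\n@readB\nTTTT\n+\nIIII\n")
clean = io.StringIO()
kept = keep_unmapped(fastq, sam, clean)
```

The clean output would then be passed on to the downstream step, such as an assembler.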

Similarly, when looking for novel plasmids, you might wish to map your reads against a known bacterial reference and then take the non-mapping sequences forward for analysis.

References

If you use this Galaxy tool in work leading to a scientific publication, please cite:

Peter J.A. Cock (2014), Galaxy tool for filtering reads by mapping http://toolshed.g2.bx.psu.edu/view/peterjc/seq_filter_by_mapping

This tool uses Biopython to read and write SFF files, so you may also wish to cite the Biopython application note (and Galaxy too of course):

Cock et al. (2009). Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3. https://doi.org/10.1093/bioinformatics/btp163 PMID: 19304878.

This tool is available to install into other Galaxy instances via the Galaxy Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/seq_filter_by_mapping