view tools/filters/seq_select_by_id.xml @ 3:19e26966ed3e draft

Uploaded v0.0.6, handles Biopython dependency via the ToolShed, adopted MIT license, using reStructuredText for the README file. No functional changes.
author peterjc
date Mon, 29 Jul 2013 09:13:13 -0400
parents 28d52478ace9

<tool id="seq_select_by_id" name="Select sequences by ID" version="0.0.6">
	<description>from a tabular file</description>
	<version_command interpreter="python"> --version</version_command>
	<command interpreter="python"> $input_tabular $column $input_file $input_file.ext $output_file</command>
	<stdio>
		<!-- Anything other than zero is an error -->
		<exit_code range="1:" />
		<exit_code range=":-1" />
	</stdio>
	<inputs>
		<param name="input_file" type="data" format="fasta,qual,fastq,sff" label="Sequence file to select from" help="FASTA, QUAL, FASTQ, or SFF format." />
		<param name="input_tabular" type="data" format="tabular" label="Tabular file containing sequence identifiers"/>
		<param name="column" type="data_column" data_ref="input_tabular" multiple="False" numerical="False" label="Column containing sequence identifiers"/>
	</inputs>
	<outputs>
		<data name="output_file" format="fasta" label="Selected sequences">
			<!-- TODO - Replace this with format="input:input_fastq" if/when that works -->
			<change_format>
				<when input_dataset="input_file" attribute="extension" value="sff" format="sff" />
				<when input_dataset="input_file" attribute="extension" value="fastq" format="fastq" />
				<when input_dataset="input_file" attribute="extension" value="fastqsanger" format="fastqsanger" />
				<when input_dataset="input_file" attribute="extension" value="fastqsolexa" format="fastqsolexa" />
				<when input_dataset="input_file" attribute="extension" value="fastqillumina" format="fastqillumina" />
				<when input_dataset="input_file" attribute="extension" value="fastqcssanger" format="fastqcssanger" />
			</change_format>
		</data>
	</outputs>
	<tests>
		<test>
			<param name="input_file" value="k12_ten_proteins.fasta" ftype="fasta" />
			<param name="input_tabular" value="k12_hypothetical.tabular" ftype="tabular" />
			<param name="column" value="1" />
			<output name="output_file" file="k12_hypothetical.fasta" ftype="fasta" />
		</test>
	</tests>
	<requirements>
		<requirement type="python-module">Bio</requirement>
	</requirements>

**What it does**

Takes a FASTA, QUAL, FASTQ or Standard Flowgram Format (SFF) file and produces a
new sequence file (in the same format) containing only the records whose identifiers
are listed in the tabular file, in the order they appear in the tabular file.

WARNING: If you have any duplicates in the tabular file identifiers, you will get
duplicate sequences in the output.
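The selection logic described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the tool's own script. It handles FASTA only and uses the standard library rather than Biopython. The helper names parse_fasta and select_by_id are hypothetical. The column number is taken as 1-based, matching Galaxy's data_column parameter. Skipping blank lines and lines starting with # in the tabular file is also an assumption.

```python
# Hypothetical sketch of "select sequences by ID": FASTA only, standard
# library only. The real tool uses Biopython; these helpers are illustrative.
import io


def parse_fasta(handle):
    """Yield (identifier, record_text) pairs from a FASTA text handle.

    The identifier is the first word after '>' (assumes non-empty headers).
    """
    ident, lines = None, []
    for line in handle:
        if line.startswith(">"):
            if ident is not None:
                yield ident, "".join(lines)
            ident = line[1:].split()[0]
            lines = [line]
        elif ident is not None:
            lines.append(line)
    if ident is not None:
        yield ident, "".join(lines)


def select_by_id(fasta_handle, tabular_handle, column):
    """Return the FASTA records named in a 1-based tabular column.

    Records come out in tabular-file order, and a repeated identifier
    yields a repeated record.
    """
    # Index every record by identifier, held in memory for this sketch.
    index = dict(parse_fasta(fasta_handle))
    selected = []
    for line in tabular_handle:
        if not line.strip() or line.startswith("#"):
            continue  # assumption: skip blank lines and '#' comment lines
        fields = line.rstrip("\r\n").split("\t")
        # Raises KeyError if an identifier has no matching record.
        selected.append(index[fields[column - 1].strip()])
    return "".join(selected)


if __name__ == "__main__":
    fasta = io.StringIO(">a one\nACGT\n>b two\nGGCC\n>c\nTTTT\n")
    table = io.StringIO("c\tx\na\ty\nc\tz\n")
    print(select_by_id(fasta, table, 1), end="")
```

Running the example prints the c record, then a, then c again, because c is listed twice in the first column of the tabular input.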


This tool uses Biopython to read, write and index sequence files. If you use
this tool in scientific work leading to a publication, please cite the
Biopython application note (and Galaxy too of course):

Cock et al. 2009. Biopython: freely available Python tools for computational
molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3. pmid:19304878.

This tool is available to install into other Galaxy instances via the Galaxy
Tool Shed at