annotate tools/filters/seq_filter_by_id.xml @ 1:262f08104540 draft

Uploaded v0.0.4 which includes a unit test and is faster at filtering FASTA files with large records (e.g. whole chromosomes)
author peterjc
date Mon, 15 Apr 2013 12:27:30 -0400
parents 5844f6a450ed
children abdd608c869b
rev   line source
1
262f08104540 Uploaded v0.0.4 which includes a unit test and is faster at filtering FASTA files with large records (e.g. whole chromosomes)
peterjc
parents: 0
diff changeset
<tool id="seq_filter_by_id" name="Filter sequences by ID" version="0.0.4">
0
5844f6a450ed Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository
peterjc
parents:
diff changeset
    <description>from a tabular file</description>
    <version_command interpreter="python">seq_filter_by_id.py --version</version_command>
    <command interpreter="python">
seq_filter_by_id.py $input_tabular $columns $input_file $input_file.ext
#if $output_choice_cond.output_choice=="both"
$output_pos $output_neg
#elif $output_choice_cond.output_choice=="pos"
$output_pos -
#elif $output_choice_cond.output_choice=="neg"
- $output_neg
#end if
    </command>
    <inputs>
        <param name="input_file" type="data" format="fasta,fastq,sff" label="Sequence file to filter on the identifiers" help="FASTA, FASTQ, or SFF format." />
        <param name="input_tabular" type="data" format="tabular" label="Tabular file containing sequence identifiers"/>
        <param name="columns" type="data_column" data_ref="input_tabular" multiple="True" numerical="False" label="Column(s) containing sequence identifiers" help="Multi-select list - hold the appropriate key while clicking to select multiple columns">
            <validator type="no_options" message="Pick at least one column"/>
        </param>
        <conditional name="output_choice_cond">
            <param name="output_choice" type="select" label="Output positive matches, negative matches, or both?">
                <option value="both">Both positive matches (ID on list) and negative matches (ID not on list), as two files</option>
                <option value="pos">Just positive matches (ID on list), as a single file</option>
                <option value="neg">Just negative matches (ID not on list), as a single file</option>
            </param>
            <!-- These dummy entries seem to be needed here; compare with indels/indel_sam2interval.xml -->
            <when value="both" />
            <when value="pos" />
            <when value="neg" />
        </conditional>
    </inputs>
    <outputs>
        <data name="output_pos" format="fasta" label="With matched ID">
            <!-- TODO - Replace this with format="input:input_file" if/when that works -->
            <change_format>
                <when input_dataset="input_file" attribute="extension" value="sff" format="sff" />
                <when input_dataset="input_file" attribute="extension" value="fastq" format="fastq" />
                <when input_dataset="input_file" attribute="extension" value="fastqsanger" format="fastqsanger" />
                <when input_dataset="input_file" attribute="extension" value="fastqsolexa" format="fastqsolexa" />
                <when input_dataset="input_file" attribute="extension" value="fastqillumina" format="fastqillumina" />
                <when input_dataset="input_file" attribute="extension" value="fastqcssanger" format="fastqcssanger" />
            </change_format>
            <filter>output_choice_cond["output_choice"] != "neg"</filter>
        </data>
        <data name="output_neg" format="fasta" label="Without matched ID">
            <!-- TODO - Replace this with format="input:input_file" if/when that works -->
            <change_format>
                <when input_dataset="input_file" attribute="extension" value="sff" format="sff" />
                <when input_dataset="input_file" attribute="extension" value="fastq" format="fastq" />
                <when input_dataset="input_file" attribute="extension" value="fastqsanger" format="fastqsanger" />
                <when input_dataset="input_file" attribute="extension" value="fastqsolexa" format="fastqsolexa" />
                <when input_dataset="input_file" attribute="extension" value="fastqillumina" format="fastqillumina" />
                <when input_dataset="input_file" attribute="extension" value="fastqcssanger" format="fastqcssanger" />
            </change_format>
            <filter>output_choice_cond["output_choice"] != "pos"</filter>
        </data>
    </outputs>
    <tests>
        <test>
            <param name="input_file" value="k12_ten_proteins.fasta" ftype="fasta" />
            <param name="input_tabular" value="k12_hypothetical.tabular" ftype="tabular" />
            <param name="columns" value="1" />
            <param name="output_choice" value="pos" />
            <output name="output_pos" file="k12_hypothetical.fasta" ftype="fasta" />
        </test>
    </tests>
    <requirements>
        <requirement type="python-module">Bio</requirement>
    </requirements>
    <help>

**What it does**

By default it divides a FASTA, FASTQ or Standard Flowgram Format (SFF) file in
two: those sequences with, and those without, an ID present in the specified
tabular file column(s). You can opt to have a single output file of just the
matching records, or just the non-matching ones.

Note that the order of sequences in the original sequence file is preserved, as
is any Roche XML Manifest in an SFF file. Also, if any sequences share an
identifier (which would be very unusual in SFF files), duplicates are not removed.

**Example Usage**

You may have performed some kind of contamination search, for example running
BLASTN against a database of cloning vectors or bacteria, giving you a tabular
file containing read identifiers. You could use this tool to extract only the
reads without BLAST matches (i.e. those which do not match your contaminant
database).

You may have a file of FASTA sequences which has been used with some analysis
tool giving tabular output, which has then been filtered on some criteria.
You can then use this tool to divide the original FASTA file into those entries
matching or not matching your criteria (those with or without their identifier
in the filtered tabular file).

**Citation**

This tool uses Biopython to read and write SFF files. If you use this tool in
scientific work leading to a publication, please cite the Biopython application
note (and Galaxy too of course):

Cock et al 2009. Biopython: freely available Python tools for computational
molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3.
http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878.

    </help>
</tool>
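
The splitting behaviour described in the help text above can be illustrated with a small Python sketch. This is not the actual seq_filter_by_id.py, whose source is not shown on this page. It is a minimal illustration that assumes plain FASTA input only, takes the first word of each ">" line as the identifier, and uses 1-based column numbers as in the test case. The function names (read_ids, iter_fasta, split_fasta) are hypothetical, and passing None for an output handle stands in for the "-" placeholder used on the command line.

```python
import io


def read_ids(tabular_handle, columns):
    """Collect identifiers from the given 1-based columns of a tab-separated file."""
    ids = set()
    for line in tabular_handle:
        # Skip blank lines and comment lines (an assumption of this sketch).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        for col in columns:
            if 1 <= col <= len(fields):
                ids.add(fields[col - 1].strip())
    return ids


def iter_fasta(handle):
    """Yield (identifier, raw record text) pairs, keeping the original order."""
    ident, lines = None, []
    for line in handle:
        if line.startswith(">"):
            if ident is not None:
                yield ident, "".join(lines)
            words = line[1:].split()
            ident = words[0] if words else ""
            lines = [line]
        elif ident is not None:
            lines.append(line)
    if ident is not None:
        yield ident, "".join(lines)


def split_fasta(fasta_handle, wanted, pos_handle=None, neg_handle=None):
    """Write matching records to pos_handle and the rest to neg_handle.

    Either handle may be None to skip that output. Records that share an
    identifier are all kept; duplicates are not removed.
    Returns a (positive_count, negative_count) tuple.
    """
    pos_count = neg_count = 0
    for ident, record in iter_fasta(fasta_handle):
        if ident in wanted:
            pos_count += 1
            if pos_handle is not None:
                pos_handle.write(record)
        else:
            neg_count += 1
            if neg_handle is not None:
                neg_handle.write(record)
    return pos_count, neg_count


if __name__ == "__main__":
    table = io.StringIO("a\tx\nc\ty\n")
    fasta = io.StringIO(">a desc\nACGT\n>b\nGG\nCC\n>c\nTT\n")
    pos, neg = io.StringIO(), io.StringIO()
    counts = split_fasta(fasta, read_ids(table, [1]), pos, neg)
    print(counts)
    print(pos.getvalue(), end="")
```

Holding only the identifier set in memory and streaming the sequence file one record at a time keeps memory use proportional to the largest single record, which is one plausible way to cope with large FASTA entries such as whole chromosomes.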