annotate tools/filters/seq_filter_by_id.xml @ 1:262f08104540 draft
Uploaded v0.0.4 which includes a unit test and is faster at filtering FASTA files with large records (e.g. whole chromosomes)
author | peterjc |
---|---|
date | Mon, 15 Apr 2013 12:27:30 -0400 |
parents | 5844f6a450ed |
children | abdd608c869b |
rev | changeset | author | description | lines |
---|---|---|---|---|
0 | 5844f6a450ed | peterjc | Migrated tool version 0.0.1 from old tool shed archive to new tool shed repository | all lines not attributed to rev 1 |
1 | 262f08104540 | peterjc | Uploaded v0.0.4 (this changeset) | the `<tool>` opening tag, `<version_command>`, the `input_file` parameter, the `<test>` element and its contents, and the first line of the second "Example Usage" paragraph |

<tool id="seq_filter_by_id" name="Filter sequences by ID" version="0.0.4">
    <description>from a tabular file</description>
    <version_command interpreter="python">seq_filter_by_id.py --version</version_command>
    <command interpreter="python">
seq_filter_by_id.py $input_tabular $columns $input_file $input_file.ext
#if $output_choice_cond.output_choice=="both"
 $output_pos $output_neg
#elif $output_choice_cond.output_choice=="pos"
 $output_pos -
#elif $output_choice_cond.output_choice=="neg"
 - $output_neg
#end if
    </command>
    <inputs>
        <param name="input_file" type="data" format="fasta,fastq,sff" label="Sequence file to filter on the identifiers" help="FASTA, FASTQ, or SFF format." />
        <param name="input_tabular" type="data" format="tabular" label="Tabular file containing sequence identifiers"/>
        <param name="columns" type="data_column" data_ref="input_tabular" multiple="True" numerical="False" label="Column(s) containing sequence identifiers" help="Multi-select list - hold the appropriate key while clicking to select multiple columns">
            <validator type="no_options" message="Pick at least one column"/>
        </param>
        <conditional name="output_choice_cond">
            <param name="output_choice" type="select" label="Output positive matches, negative matches, or both?">
                <option value="both">Both positive matches (ID on list) and negative matches (ID not on list), as two files</option>
                <option value="pos">Just positive matches (ID on list), as a single file</option>
                <option value="neg">Just negative matches (ID not on list), as a single file</option>
            </param>
            <!-- Seems need these dummy entries here, compare this to indels/indel_sam2interval.xml -->
            <when value="both" />
            <when value="pos" />
            <when value="neg" />
        </conditional>
    </inputs>
    <outputs>
        <data name="output_pos" format="fasta" label="With matched ID">
            <!-- TODO - Replace this with format="input:input_fastq" if/when that works -->
            <change_format>
                <when input_dataset="input_file" attribute="extension" value="sff" format="sff" />
                <when input_dataset="input_file" attribute="extension" value="fastq" format="fastq" />
                <when input_dataset="input_file" attribute="extension" value="fastqsanger" format="fastqsanger" />
                <when input_dataset="input_file" attribute="extension" value="fastqsolexa" format="fastqsolexa" />
                <when input_dataset="input_file" attribute="extension" value="fastqillumina" format="fastqillumina" />
                <when input_dataset="input_file" attribute="extension" value="fastqcssanger" format="fastqcssanger" />
            </change_format>
            <filter>output_choice_cond["output_choice"] != "neg"</filter>
        </data>
        <data name="output_neg" format="fasta" label="Without matched ID">
            <!-- TODO - Replace this with format="input:input_fastq" if/when that works -->
            <change_format>
                <when input_dataset="input_file" attribute="extension" value="sff" format="sff" />
                <when input_dataset="input_file" attribute="extension" value="fastq" format="fastq" />
                <when input_dataset="input_file" attribute="extension" value="fastqsanger" format="fastqsanger" />
                <when input_dataset="input_file" attribute="extension" value="fastqsolexa" format="fastqsolexa" />
                <when input_dataset="input_file" attribute="extension" value="fastqillumina" format="fastqillumina" />
                <when input_dataset="input_file" attribute="extension" value="fastqcssanger" format="fastqcssanger" />
            </change_format>
            <filter>output_choice_cond["output_choice"] != "pos"</filter>
        </data>
    </outputs>
    <tests>
        <test>
            <param name="input_file" value="k12_ten_proteins.fasta" ftype="fasta" />
            <param name="input_tabular" value="k12_hypothetical.tabular" ftype="tabular" />
            <param name="columns" value="1" />
            <param name="output_choice" value="pos" />
            <output name="output_pos" file="k12_hypothetical.fasta" ftype="fasta" />
        </test>
    </tests>
    <requirements>
        <requirement type="python-module">Bio</requirement>
    </requirements>
    <help>

**What it does**

By default, this tool divides a FASTA, FASTQ, or Standard Flowgram Format (SFF)
file in two: sequences whose ID is present in the specified tabular file
column(s), and sequences whose ID is not. You can opt instead for a single
output file of just the matching records, or just the non-matching ones.

Note that the order of sequences in the original sequence file is preserved, as
is any Roche XML Manifest in an SFF file. Also, if any sequences share an
identifier (which would be very unusual in SFF files), duplicates are not removed.

**Example Usage**

You may have performed some kind of contamination search, for example running
BLASTN against a database of cloning vectors or bacteria, giving you a tabular
file containing read identifiers. You could use this tool to extract only the
reads without BLAST matches (i.e. those which do not match your contaminant
database).

You may have a file of FASTA sequences which has been used with some analysis
tool giving tabular output, which has then been filtered on some criteria.
You can then use this tool to divide the original FASTA file into those entries
matching or not matching your criteria (those with or without their identifier
in the filtered tabular file).

**Citation**

This tool uses Biopython to read and write SFF files. If you use this tool in
scientific work leading to a publication, please cite the Biopython application
note (and Galaxy too of course):

Cock et al. 2009. Biopython: freely available Python tools for computational
molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3.
http://dx.doi.org/10.1093/bioinformatics/btp163 pmid:19304878.

    </help>
</tool>
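
The behaviour described in the help text (split by ID, keep the input order, do not remove duplicates) can be illustrated with a minimal sketch. This is not the code of seq_filter_by_id.py, which is not shown here. It handles plain FASTA only, uses no Biopython, and the function names `load_ids` and `split_fasta` are hypothetical. It also assumes that column numbers are one-based, as in the test's `columns` value of `1`, that the identifier is the first word after `>`, and that blank or `#` lines in the tabular file can be skipped.

```python
import io


def load_ids(tabular_handle, columns):
    """Collect identifiers from the given one-based columns of a tab-separated file."""
    ids = set()
    for line in tabular_handle:
        # Assumption: blank lines and '#' comment lines carry no identifiers.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        for col in columns:
            if 1 <= col <= len(fields) and fields[col - 1].strip():
                ids.add(fields[col - 1].strip())
    return ids


def split_fasta(fasta_handle, ids):
    """Return (matched, unmatched) lists of FASTA records, keeping input order."""
    matched, unmatched = [], []
    current = None  # list of lines for the record being read
    for line in fasta_handle:
        if line.startswith(">"):
            # Assumption: the identifier is the first word of the header line.
            words = line[1:].split(None, 1)
            name = words[0] if words else ""
            current = [line]
            (matched if name in ids else unmatched).append(current)
        elif current is not None:
            current.append(line)
    return ["".join(r) for r in matched], ["".join(r) for r in unmatched]


# Tiny in-memory demonstration with made-up data.
table = io.StringIO("seq2\thit\nseq3\thit\n")
fasta = io.StringIO(
    ">seq1 first\nACGT\n"
    ">seq2 second\nGGCC\nTTAA\n"
    ">seq2 duplicate\nAAAA\n"
)
pos, neg = split_fasta(fasta, load_ids(table, [1]))
print("".join(pos), end="")
```

Here `pos` plays the role of `output_pos` and `neg` of `output_neg`. Both `seq2` records land in `pos`, in their original order, which mirrors the note that duplicates are not removed.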