comparison tools/mira4/mira4_bait.xml @ 0:6a88b42ce6b9 draft

Uploaded v0.0.4, previously only on the TestToolShed
author peterjc
date Fri, 21 Nov 2014 06:42:56 -0500
parents
children
-1:000000000000 0:6a88b42ce6b9
1 <tool id="mira_4_0_bait" name="MIRA v4.0 mirabait" version="0.0.3">
2 <description>Filter reads using kmer matches</description>
3 <requirements>
4 <requirement type="binary">mirabait</requirement>
5 <requirement type="package" version="4.0">MIRA</requirement>
6 </requirements>
7 <version_command interpreter="python">mira4_bait.py --version</version_command>
8 <command interpreter="python">
9 mira4_bait.py $input_reads.ext $output_choice $strand_choice $kmer_length $min_occurence "$bait_file" "$input_reads" "$output_reads"
10 </command>
11 <stdio>
12 <!-- Assume anything other than zero is an error -->
13 <exit_code range="1:" />
14 <exit_code range=":-1" />
15 </stdio>
16 <inputs>
17 <param name="bait_file" type="data" format="fasta,fastq,mira" required="true" label="Bait file (what to look for)" />
18 <param name="input_reads" type="data" format="fasta,fastq,mira" required="true" label="Reads to search" />
19 <param name="output_choice" type="select" label="Output positive matches, or negative matches?">
20 <option value="pos">Just positive matches</option>
21 <option value="neg">Just negative matches</option>
22 </param>
23 <param name="strand_choice" type="select" label="Check for matches on both strands?">
24 <option value="both">Check both strands</option>
25 <option value="fwd">Just forward strand</option>
26 </param>
27 <param name="kmer_length" type="integer" value="31" min="1" max="32"
28 label="k-mer length" help="Maximum 32" />
29 <param name="min_occurence" type="integer" value="1" min="1"
30 label="Minimum k-mer occurrence"
31 help="How many k-mer matches do you want per read? Minimum one" />
32 </inputs>
33 <outputs>
34 <data name="output_reads" format="input" metadata_source="input_reads"
35 label="$input_reads.name #if str($output_choice)=='pos' then 'matching' else 'excluding matches to' # $bait_file.name"/>
36 </outputs>
37 <tests>
38 <test>
39 <param name="bait_file" value="tvc_bait.fasta" ftype="fasta" />
40 <param name="input_reads" value="tvc_mini.fastq" ftype="fastqsanger" />
41 <output name="output_reads" file="tvc_mini_bait_pos.fastq" ftype="fastqsanger" />
42 </test>
43 <test>
44 <param name="bait_file" value="tvc_bait.fasta" ftype="fasta" />
45 <param name="input_reads" value="tvc_mini.fastq" ftype="fastqsanger" />
46 <param name="kmer_length" value="32" />
47 <param name="min_occurence" value="50" />
48 <output name="output_reads" file="tvc_mini_bait_strict.fastq" ftype="fastqsanger" />
49 </test>
50 <test>
51 <param name="bait_file" value="tvc_bait.fasta" ftype="fasta" />
52 <param name="input_reads" value="tvc_mini.fastq" ftype="fastqsanger" />
53 <param name="output_choice" value="neg" />
54 <output name="output_reads" file="tvc_mini_bait_neg.fastq" ftype="fastqsanger" />
55 </test>
56 </tests>
57 <help>
58 **What it does**
59
60 Runs the ``mirabait`` utility from MIRA v4.0 to filter your input reads
61 according to whether or not they contain perfect kmer matches to your
62 bait file. By default this looks for 31-mers (kmers or *k*-mers where
63 the fragment length *k* is 31), and only requires a single matching kmer.
64
65 The ``mirabait`` utility is useful in many applications and pipelines
66 outside of using the main MIRA tool for assembly or mapping.
67
68 .. class:: warningmark
69
70 Note ``mirabait`` cannot be used on protein (amino acid) sequences.
71
72 **Example Usage**
73
74 To remove overabundant entries such as rRNA sequences, run ``mirabait`` with
75 known rRNA sequences as the bait and select the *negative* matches.
76
77 To do a targeted assembly, fishing out the reads belonging to a gene and
78 assembling just those, run ``mirabait`` with the gene of interest as the bait
79 and select the *positive* matches.
80
81 To iteratively reconstruct mitochondria you could start by fishing out reads
82 matching any known mitochondrial sequence, assemble those, and repeat.
83
84
85 **Notes on paired reads**
86
87 .. class:: warningmark
88
89 While MIRA4 is aware of many read naming conventions to identify paired read
90 partners, the ``mirabait`` tool considers each read in isolation. Applying
91 it to paired read files may leave you with orphaned reads.
92
93
94 **Citation**
95
96 If you use this Galaxy tool in work leading to a scientific publication please
97 cite the following papers:
98
99 Peter J.A. Cock, Björn A. Grüning, Konrad Paszkiewicz and Leighton Pritchard (2013).
100 Galaxy tools and workflows for sequence analysis with applications
101 in molecular plant pathology. PeerJ 1:e167
102 http://dx.doi.org/10.7717/peerj.167
103
104 Bastien Chevreux, Thomas Wetter and Sándor Suhai (1999).
105 Genome Sequence Assembly Using Trace Signals and Additional Sequence Information.
106 Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB) 99, pp. 45-56.
107 http://www.bioinfo.de/isb/gcb99/talks/chevreux/main.html
108
109 This wrapper is available to install into other Galaxy Instances via the Galaxy
110 Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/mira4_assembler
111 </help>
112 </tool>
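
The filtering described in the help text above (perfect k-mer matches against a bait file, an optional reverse-strand check, a minimum number of matching k-mers per read, and a choice of positive or negative output) can be illustrated with a short Python sketch. This is only an approximation of the idea under simple assumptions, not how ``mirabait`` itself is implemented: the function names (``bait_filter``, ``build_bait_index``, and so on) are hypothetical, sequences are held in memory as plain strings, and any base other than A, C, G or T is treated as N.

```python
# Illustrative sketch only: NOT mirabait's actual implementation.
# All function names here are hypothetical.

def reverse_complement(seq):
    """Return the reverse complement of a DNA string (unknown bases become N)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement.get(base, "N") for base in reversed(seq.upper()))


def kmers(seq, k):
    """Yield every overlapping k-mer of seq (nothing if seq is shorter than k)."""
    seq = seq.upper()
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_bait_index(bait_seqs, k, both_strands=True):
    """Collect the set of k-mers found in the bait sequences."""
    index = set()
    for seq in bait_seqs:
        index.update(kmers(seq, k))
        if both_strands:
            # Also index the reverse strand so reads from either strand match.
            index.update(kmers(reverse_complement(seq), k))
    return index


def count_hits(read_seq, index, k):
    """Count how many k-mers of the read occur in the bait index."""
    return sum(1 for kmer in kmers(read_seq, k) if kmer in index)


def bait_filter(reads, bait_seqs, k=31, min_occurrence=1,
                both_strands=True, keep="pos"):
    """Yield (name, seq) pairs that match (keep='pos') or do not match (keep='neg')."""
    index = build_bait_index(bait_seqs, k, both_strands)
    for name, seq in reads:
        matched = count_hits(seq, index, k) >= min_occurrence
        if matched == (keep == "pos"):
            yield name, seq
```

In this sketch the defaults mirror the wrapper's defaults (``k=31``, one matching k-mer, both strands, positive matches). Each read is judged on its own, which is why applying such a filter to paired read files can leave orphaned reads, as the help text warns.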