annotate mothur/tools/mothur/pre.cluster.xml @ 33:d53b9eb16c2d

mothur_wrapper.py - convert_value needs to return a String
author Jim Johnson <jj@umn.edu>
date Wed, 11 Sep 2013 16:56:31 -0500
parents 49058b1f8d3f
children 95d75b35e4d2
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
18
697156806162 Mothur - update for Mothur version 1.23.0
Jim Johnson <jj@umn.edu>
parents: 15
diff changeset
1 <tool id="mothur_pre_cluster" name="Pre.cluster" version="1.23.0">
0
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
2 <description>Remove sequences due to pyrosequencing errors</description>
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
3 <command interpreter="python">
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
4 mothur_wrapper.py
2
e990ac8a0f58 Migrated tool version 1.19.0 from old tool shed archive to new tool shed repository
jjohnson
parents: 1
diff changeset
5 #import re, os.path
e990ac8a0f58 Migrated tool version 1.19.0 from old tool shed archive to new tool shed repository
jjohnson
parents: 1
diff changeset
6 #set results = ["'^mothur.\S+\.logfile$:'" + $logfile.__str__]
e990ac8a0f58 Migrated tool version 1.19.0 from old tool shed archive to new tool shed repository
jjohnson
parents: 1
diff changeset
7 ## adds .precluster before the last extension to the input file
e990ac8a0f58 Migrated tool version 1.19.0 from old tool shed archive to new tool shed repository
jjohnson
parents: 1
diff changeset
8 #set results = $results + ["'" + $re.sub(r'(^.*)\.(.*?)',r'\1.precluster.\2',$os.path.basename($fasta.__str__)) + ":'" + $fasta_out.__str__]
e990ac8a0f58 Migrated tool version 1.19.0 from old tool shed archive to new tool shed repository
jjohnson
parents: 1
diff changeset
9 #set results = $results + ["'" + $re.sub(r'(^.*)\.(.*?)$',r'\1.precluster.names',$os.path.basename($fasta.__str__)) + ":'" + $names_out.__str__]
18
697156806162 Mothur - update for Mothur version 1.23.0
Jim Johnson <jj@umn.edu>
parents: 15
diff changeset
10 #set results = $results + ["'" + $re.sub(r'(^.*)\.(.*?)$',r'\1.precluster.map',$os.path.basename($fasta.__str__)) + ":'" + $map_out.__str__]
0
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
11 --cmd='pre.cluster'
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
12 --outputdir='$logfile.extra_files_path'
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
13 --fasta=$fasta
2
e990ac8a0f58 Migrated tool version 1.19.0 from old tool shed archive to new tool shed repository
jjohnson
parents: 1
diff changeset
14 #if $name.__str__ != "None" and len($name.__str__) > 0:
0
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
15 --name=$name
2
e990ac8a0f58 Migrated tool version 1.19.0 from old tool shed archive to new tool shed repository
jjohnson
parents: 1
diff changeset
16 #end if
15
a6189f58fedb Mothur - updated for Mothur version 1.22.0
Jim Johnson <jj@umn.edu>
parents: 7
diff changeset
17 #if $group.__str__ != "None" and len($group.__str__) > 0:
a6189f58fedb Mothur - updated for Mothur version 1.22.0
Jim Johnson <jj@umn.edu>
parents: 7
diff changeset
18 --group=$group
a6189f58fedb Mothur - updated for Mothur version 1.22.0
Jim Johnson <jj@umn.edu>
parents: 7
diff changeset
19 #end if
0
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
20 #if 20 >= int($diffs.__str__) >= 0:
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
21 --diffs=$diffs
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
22 #end if
2
e990ac8a0f58 Migrated tool version 1.19.0 from old tool shed archive to new tool shed repository
jjohnson
parents: 1
diff changeset
23 --result=#echo ','.join($results)
18
697156806162 Mothur - update for Mothur version 1.23.0
Jim Johnson <jj@umn.edu>
parents: 15
diff changeset
24 --processors=8
0
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
25 </command>
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
26 <inputs>
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
27 <param name="fasta" type="data" format="fasta" label="fasta - Sequence Fasta"/>
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
28 <param name="name" type="data" format="names" optional="true" label="name - Sequences Name reference"/>
15
a6189f58fedb Mothur - updated for Mothur version 1.22.0
Jim Johnson <jj@umn.edu>
parents: 7
diff changeset
29 <param name="group" type="data" format="groups" optional="true" label="group - Sequences Name reference"/>
0
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
30 <param name="diffs" type="integer" value="1" label="diffs - Number of mismatched bases to allow between sequences in a group (default 1)"/>
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
31 </inputs>
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
32 <outputs>
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
33 <data format="html" name="logfile" label="${tool.name} on ${on_string}: logfile" />
4
5265aa9067e0 set output formats to match input formats
jjohnson
parents: 2
diff changeset
34 <data format_source="fasta" name="fasta_out" label="${tool.name} on ${on_string}: precluster.fasta" />
0
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
35 <data format="names" name="names_out" label="${tool.name} on ${on_string}: precluster.names" />
18
697156806162 Mothur - update for Mothur version 1.23.0
Jim Johnson <jj@umn.edu>
parents: 15
diff changeset
36 <data format="tabular" name="map_out" label="${tool.name} on ${on_string}: precluster.map" />
0
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
37 </outputs>
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
38 <requirements>
27
49058b1f8d3f Update to mothur version 1.27 and add tool_dependencies.xml to automatically install mothur
Jim Johnson <jj@umn.edu>
parents: 18
diff changeset
39 <requirement type="package" version="1.27">mothur</requirement>
0
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
40 </requirements>
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
41 <tests>
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
42 </tests>
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
43 <help>
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
44 **Mothur Overview**
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
45
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
46 Mothur_, initiated by Dr. Patrick Schloss and his software development team
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
47 in the Department of Microbiology and Immunology at The University of Michigan,
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
48 provides bioinformatics for the microbial ecology community.
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
49
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
50 .. _Mothur: http://www.mothur.org/wiki/Main_Page
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
51
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
52 **Command Documenation**
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
53
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
54 The pre.cluster_ command implements a pseudo-single linkage algorithm with the goal of removing sequences that are likely due to pyrosequencing errors. The basic idea is that abundant sequences are more likely to generate erroneous sequences than rare sequences. With that in mind, the algorithm proceeds by ranking sequences in order of their abundance. Then we walk through the list of sequences looking for rarer sequences that are within some threshold of the original sequence. Those that are within the threshold are merged with the larger sequence. The original Huse method performs this task on a distance matrix, whereas we do it based on the original sequences. The advantage of our approach is that the algorithm works on aligned sequences instead of a distance matrix. This is advantageous because by pre-clustering you remove a large number of sequences making the distance calculation much faster.
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
55
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
56 .. _pre.cluster: http://www.mothur.org/wiki/Pre.cluster
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
57
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
58
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
59 </help>
3202a38e44d9 Migrated tool version 1.15.1 from old tool shed archive to new tool shed repository
jjohnson
parents:
diff changeset
60 </tool>