view mothur/tools/mothur/pre.cluster.xml @ 2:e990ac8a0f58

Migrated tool version 1.19.0 from old tool shed archive to new tool shed repository
author jjohnson
date Tue, 07 Jun 2011 17:39:06 -0400
parents fcc0778f6987
children 5265aa9067e0
line wrap: on
line source

<tool id="mothur_pre_cluster" name="Pre.cluster" version="1.19.0">
 <description>Remove sequences due to pyrosequencing errors</description>
 <command interpreter="python">
  mothur_wrapper.py 
  #import re, os.path
  #set results = ["'^mothur.\S+\.logfile$:'" + $logfile.__str__]
  ## adds .precluster before the last extension to the input file
  #set results = $results + ["'" + $re.sub(r'(^.*)\.(.*?)',r'\1.precluster.\2',$os.path.basename($fasta.__str__)) + ":'" + $fasta_out.__str__]
  #set results = $results + ["'" + $re.sub(r'(^.*)\.(.*?)$',r'\1.precluster.names',$os.path.basename($fasta.__str__)) + ":'" + $names_out.__str__]
  --cmd='pre.cluster'
  --outputdir='$logfile.extra_files_path'
  --fasta=$fasta
  #if $name.__str__ != "None" and len($name.__str__) > 0:
   --name=$name
  #end if
  #if 20 >= int($diffs.__str__) >= 0:
   --diffs=$diffs
  #end if
  --result=#echo ','.join($results)
 </command>
 <inputs>
  <param name="fasta" type="data" format="fasta" label="fasta - Sequence Fasta"/>
  <param name="name" type="data" format="names" optional="true" label="name - Sequences Name reference"/>
  <param name="diffs" type="integer" value="1" label="diffs - Number of mismatched bases to allow between sequences in a group (default 1)"/>
 </inputs>
 <outputs>
  <data format="html" name="logfile" label="${tool.name} on ${on_string}: logfile" />
  <data format="fasta" name="fasta_out" label="${tool.name} on ${on_string}: precluster.fasta" />
  <data format="names" name="names_out" label="${tool.name} on ${on_string}: precluster.names" />
 </outputs>
 <requirements>
  <requirement type="binary">mothur</requirement>
 </requirements>
 <tests>
 </tests>
 <help>
**Mothur Overview**

Mothur_, initiated by Dr. Patrick Schloss and his software development team
in the Department of Microbiology and Immunology at The University of Michigan,
provides bioinformatics for the microbial ecology community.

.. _Mothur: http://www.mothur.org/wiki/Main_Page

**Command Documenation**

The pre.cluster_ command implements a pseudo-single linkage algorithm with the goal of removing sequences that are likely due to pyrosequencing errors. The basic idea is that abundant sequences are more likely to generate erroneous sequences than rare sequences. With that in mind, the algorithm proceeds by ranking sequences in order of their abundance. Then we walk through the list of sequences looking for rarer sequences that are within some threshold of the original sequence. Those that are within the threshold are merged with the larger sequence. The original Huse method performs this task on a distance matrix, whereas we do it based on the original sequences. The advantage of our approach is that the algorithm works on aligned sequences instead of a distance matrix. This is advantageous because by pre-clustering you remove a large number of sequences making the distance calculation much faster.

.. _pre.cluster: http://www.mothur.org/wiki/Pre.cluster


 </help>
</tool>