view mothur/tools/mothur/chimera.bellerophon.xml @ 2:e990ac8a0f58

Migrated tool version 1.19.0 from old tool shed archive to new tool shed repository
author jjohnson
date Tue, 07 Jun 2011 17:39:06 -0400
parents fcc0778f6987
children ce6e81622c6a

<tool id="mothur_chimera_bellerophon" name="Chimera.bellerophon" version="1.19.0">
 <description>Find putative chimeras using bellerophon</description>
 <command interpreter="python">
  mothur_wrapper.py 
  --cmd='chimera.bellerophon'
  --result='^mothur.\S+\.logfile$:'$logfile,'^\S+\.bellerophon\.chimeras$:'$out_file,'^\S+\.bellerophon\.accnos$:'$out_accnos
  --outputdir='$logfile.extra_files_path'
  --fasta=$fasta
  $filter
  $correction
  #if int($window.__str__) > 0:
   --window=$window
  #end if
  #if int($increment.__str__) > 0:
   --increment=$increment
  #end if
  --processors=2
 </command>
 <inputs>
  <param name="fasta" type="data" format="fasta" label="fasta - Candidate Sequences"/>
  <param name="filter" type="boolean" falsevalue="" truevalue="--filter=true" checked="false" label="filter - Apply a 50% soft vertical filter"/>
  <param name="correction" type="boolean" falsevalue="--correction=false" truevalue="" checked="true" label="correction - Use the square root of the distances instead of the distance value"/>
  <param name="window" type="integer" value="0" label="window - Length of sequence you want in each window analyzed (uses default if &lt; 1)" 
         help="Default is 25% of the sequence length."/>
  <param name="increment" type="integer" value="25" label="increment - Increment for window slide on each iteration (uses default if &lt; 1)"
         help="Default is 25, but you may set it up to sequence length minus twice the window."/>
 </inputs>
 <outputs>
  <data format="html" name="logfile" label="${tool.name} on ${on_string}: logfile" />
  <data format="txt" name="out_file" label="${tool.name} on ${on_string}: bellerophon.chimeras" />
  <data format="accnos" name="out_accnos" label="${tool.name} on ${on_string}: bellerophon.accnos" />
 </outputs>
 <requirements>
  <requirement type="binary">mothur</requirement>
 </requirements>
 <tests>
 </tests>
 <help>
**Mothur Overview**

Mothur_, initiated by Dr. Patrick Schloss and his software development team
in the Department of Microbiology and Immunology at The University of Michigan,
provides bioinformatics for the microbial ecology community.

.. _Mothur: http://www.mothur.org/wiki/Main_Page

**Command Documentation**

The chimera.bellerophon_ command identifies putative chimeras using the bellerophon_ approach.

Advantages of Bellerophon:

 1) You can process all sequences from a PCR-clone library in a single analysis and don't have to inspect outputs for every sequence in the dataset. 
 2) The approximate putative breakpoint is calculated using a sliding window, which helps when verifying the chimera manually.
 3) A chimeric sequence is not only tested against two (putative) parent sequences but rather is assessed by how well it fits into the complete phylogenetic environment of a multiple sequence alignment. Hence sequences do not become invisible to the program as is the case with CHIMERA_CHECK (see Ref 1 below). 
 4) The calculations Bellerophon uses to detect chimeric sequences are relatively cheap computationally, and results are calculated quickly for datasets with up to 50 sequences (~1 min). Larger datasets take longer - 100 sequences ~30 min, 300 sequences ~8 hours.

Tips for using Bellerophon:

 1) Bellerophon works most efficiently if the parent sequences or non-chimeric sequences closely related to the parent sequences are present in the dataset analyzed. Therefore, as many sequences as possible from the one PCR-clone library should be included in the analysis since the parent sequences of any chimera are most likely to be in that dataset. Addition of non-chimeric outgroup sequences (e.g. from isolates) may help refine an analysis by providing reference points (and a broader phylogenetic context) in the analysis, but be aware of increasing analysis time with bigger datasets. 
 2) Bellerophon is compromised by using sequences of different lengths as this can produce artificial skews in distance matrices of fragments of the alignment. Datasets containing sequences of the same length and covering the same portion of the gene should be used (usually not an issue with sequences from a PCR-clone library). The filter will automatically remove sequences too short for the window size, i.e. less than 600 bp for a window size of 300. 
 3) If possible, multiple window sizes should be used, as the number of identified chimeras can vary with the choice of window size.
 4) Re-running the dataset without the first reported chimeras may identify additional putative chimeras by reducing noise in the analysis. Ideally, the dataset should continue to be re-run removing previously reported chimeras until no chimeras are identified. 
 5) Bellerophon should be used in concert with other detection methods such as CHIMERA_CHECK, and putatively identified chimeras should always be confirmed by manual inspection of the sequences for signature shifts.


.. _bellerophon: http://comp-bio.anu.edu.au/Bellerophon/doc/doc.html
.. _chimera.bellerophon: http://www.mothur.org/wiki/Chimera.bellerophon


 </help>
</tool>