view mothur/tools/mothur/cooccurrence.xml @ 35:95d75b35e4d2

Updated tools to use Mothur 1.33. Added some misc. fixes and updates (blast repository, tool fixes)
author certain cat
date Fri, 31 Oct 2014 15:09:32 -0400
parents 49058b1f8d3f

<tool id="mothur_cooccurrence" name="Cooccurrence" version="1.26.0" >
 <description>tests whether presence-absence patterns differ from chance</description>
 <command interpreter="python">
  mothur_wrapper.py 
  #import re, os.path
  --cmd='cooccurrence'
  --result='^mothur.\S+\.logfile$:'$logfile,'^\S+\.cooccurence\.summary$:'$out_summary
  --outputdir='$logfile.extra_files_path'
  --shared=$shared
  --metric=$metric
  --matrixmodel=$matrixmodel
  #if len($iters.__str__) > 0 and int($iters.__str__) > 0:
   --iters=$iters
  #end if
  #if $label.__str__ != "None" and len($label.__str__) > 0:
   --label='$label'
  #end if
  #if $groups.__str__ != "None" and len($groups.__str__) > 0:
    --groups=$groups
  #end if
 </command>
 <inputs>
  <param name="shared" type="data" format="shared" label="shared - OTU Shared file"/>
  <param name="iters" type="integer" value="1000" optional="true" label="iters - Number of iterations to try (default 1000)">
    <validator type="in_range" message="Number of iterations must be positive" min="1"/>
  </param>
  <param name="metric" type="select" label="metric - test metric for scoring">
    <option value="cscore" selected="true">cscore - species segregation</option>
    <option value="checker">checker - species segregation</option>
    <option value="combo">combo - unique species pairs</option>
    <option value="vratio">vratio - variance</option>
  </param>
  <param name="matrixmodel" type="select" label="matrixmodel - the scoring matrix" 
         help="See the notes below on choosing a metric/matrixmodel combination">
    <option value="sim1">sim1</option>
    <option value="sim2" selected="true">sim2</option>
    <option value="sim3">sim3</option>
    <option value="sim4">sim4</option>
    <option value="sim5">sim5</option>
    <option value="sim6">sim6</option>
    <option value="sim7">sim7</option>
    <option value="sim8">sim8</option>
    <option value="sim9">sim9</option>
  </param>

  <param name="groups" type="select" optional="true" label="groups - Groups to include" multiple="true"
     help="By default all are included if no selection is made.">
   <options>
    <filter type="data_meta" ref="shared" key="groups" />
   </options>
  </param>
  <param name="label" type="select" optional="true" label="label - Select OTU Labels to include" multiple="true" 
     help="By default all are included if no selection is made.">
   <options>
    <filter type="data_meta" ref="shared" key="labels" />
   </options>
  </param>
 </inputs>
 <outputs>
  <data format="html" name="logfile" label="${tool.name} on ${on_string}: logfile" />
  <data format="tabular" name="out_summary" label="${tool.name} on ${on_string}: cooccurence.summary" />
 </outputs>
 <requirements>
  <requirement type="package" version="1.33">mothur</requirement>
 </requirements>
 <tests>
 </tests>
 <help>
**Mothur Overview**

Mothur_, initiated by Dr. Patrick Schloss and his software development team
in the Department of Microbiology and Immunology at The University of Michigan,
provides bioinformatics for the microbial ecology community.

.. _Mothur: http://www.mothur.org/wiki/Main_Page

**Command Documentation**

The cooccurrence_ command calculates four metrics and tests their significance to assess whether presence-absence patterns differ from what one would expect by chance. The input is a shared_ file. The analysis can be restricted to selected groups and labels.



**metric**

The metric parameter options are **cscore**, **checker**, **combo** and **vratio**. Default=cscore. The cscore or checkerboard score [1] is a metric that measures species segregation. It is the mean number of checkerboard units per species pair. The checker metric [2] counts the number of species pairs forming a perfect checkerboard. The combo metric [3] is the number of unique species pairs. The vratio or variance ratio [4] is a measure of the species association calculated by the ratio of the variance in total species number to the sum of the variances of the species.  ::

	[1] Stone, L., and A. Roberts. 1990. The checkerboard score and species distributions. Oecologia 85:74-79.
	[2] Diamond, J. M. 1975. Assembly of species communities. Pages 342-444 in M. L. Cody and J. M. Diamond, editors. Ecology and evolution of communities. Harvard University Press, Cambridge, Massachusetts, USA.
	[3] Pielou, D. P., and E. C. Pielou. 1968. Association among species of infrequent occurrence: the insect and spider fauna of Polyporus betulinus (Bulliard) Fries. Journal of Theoretical Biology 21:202-216.
	[4] Schluter, D. 1984. A variance test for detecting species associations, with some example applications. Ecology 65:998-1005.
	[5] Gotelli, N. J. 2000. Null model analysis of species co-occurrence patterns. Ecology 81:2606-2621.
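
As an illustration of the definitions above, the following is a minimal Python sketch of the c-score [1] and the variance ratio [4] for a presence-absence matrix with species as rows and sites as columns. It follows the published definitions rather than mothur's source code, and the function names (c_score, variance, v_ratio) are hypothetical, chosen only for this example.

```python
from itertools import combinations


def c_score(matrix):
    """Mean number of checkerboard units over all species pairs.

    matrix is a list of rows (species); each row holds 0/1 values per site.
    For a pair of species the checkerboard units are
    (r_a - shared) * (r_b - shared), where r_a and r_b are the row totals
    and shared is the number of sites where both species occur.
    """
    units = []
    for row_a, row_b in combinations(matrix, 2):
        shared = sum(a * b for a, b in zip(row_a, row_b))
        units.append((sum(row_a) - shared) * (sum(row_b) - shared))
    return sum(units) / len(units)


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def v_ratio(matrix):
    """Variance of species number per site over the sum of species variances."""
    site_totals = [sum(column) for column in zip(*matrix)]
    return variance(site_totals) / sum(variance(row) for row in matrix)


example = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 0],
]
print(c_score(example), v_ratio(example))
```

For the small example matrix the first species pair never co-occurs and contributes 4 checkerboard units, while the other two pairs contribute 1 each, giving a c-score of 2.0.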



**matrixmodel**

The matrixmodel parameter allows you to select the model you would like to use. Options are sim1, sim2, sim3, sim4, sim5, sim6, sim7, sim8 and sim9. Default=sim2.

Each sim implements a different algorithm for generating null matrices with constraints on the rows (species) and columns (sites)::

 ===================== ====================== ======================= ====================== 
  Rows                  Columns equiprobable   Columns proportional    Column totals fixed   
 ===================== ====================== ======================= ====================== 
  Rows equiprobable     sim1                   sim6                    sim3                  
  Rows proportional     sim7                   sim8                    sim5                  
  Row totals fixed      sim2                   sim4                    sim9                  
 ===================== ====================== ======================= ====================== 

Equiprobable rows or columns means that each row, column or both is not dependent on the original co-occurrence matrix. Each species or site has an equal chance of occurring in the null matrix.
Proportional rows or columns means that the proportions of occurrences in rows, columns or both in the original co-occurrence matrix are preserved but the totals may differ. The chance of each species or site occurring is proportional to its occurrence in the original co-occurrence matrix.
Fixed row or column totals preserve the total number of occurrences in rows, columns or both in the original co-occurrence matrix. Sim9 is a special case that is not probabilistic. Because both the row and column totals are preserved, the only way to randomize the matrix is with a checkerboard swap. When a checkerboard appears in the matrix, the 1s and 0s are swapped to their mirror image to preserve the species and site totals.

Checkerboard::

  10
  01

Swap::

  01
  10
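
The swap shown above can be sketched in Python as follows. This is only an illustration of the idea, assuming a simple strategy of picking two rows and two columns at random and swapping when they form a checkerboard; it is not taken from mothur's implementation, and the name checkerboard_swap and its parameters are hypothetical.

```python
import random


def checkerboard_swap(matrix, attempts, rng=None):
    """Randomize a 0/1 matrix in place while keeping row and column totals.

    Each attempt picks two rows and two columns at random. If the 2x2
    submatrix is a checkerboard (10/01 or 01/10) it is replaced by its
    mirror image. Returns the number of swaps performed.
    """
    rng = rng or random.Random()
    n_rows, n_cols = len(matrix), len(matrix[0])
    swaps = 0
    for _ in range(attempts):
        i, j = rng.sample(range(n_rows), 2)
        k, l = rng.sample(range(n_cols), 2)
        a, b = matrix[i][k], matrix[i][l]
        c, d = matrix[j][k], matrix[j][l]
        # Checkerboard: the diagonals agree with themselves but not each other.
        if a == d and b == c and a != b:
            matrix[i][k], matrix[i][l] = b, a
            matrix[j][k], matrix[j][l] = d, c
            swaps += 1
    return swaps


example = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 0],
]
checkerboard_swap(example, attempts=100, rng=random.Random(0))
print(example)
```

Because every swap exchanges a 1 and a 0 within the same two rows and the same two columns, the species (row) and site (column) totals of the randomized matrix match those of the input.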


Suggested metric/matrixmodel combinations::

    ========  ========  ======== ========
     cscore    checker   combo    vratio
    ========  ========  ======== ========
     sim9      sim9      sim9     sim2
     sim2      sim2      sim2     sim4
     -         -         sim4     sim8
     -         -         sim8     -
    ========  ========  ======== ========


Careful readers will note that none of the suggested matrixmodels have equiprobable rows (species). This is because tests of co-occurrence are quite sensitive to the frequency of species occurrence. As such, row totals should be maintained or at least kept proportional in the null models. Sim9 is well suited to co-occurrence matrices that have an "island list" structure. Island lists are often found in classical ecology datasets that contain species with well-defined habitat patches and are rarely degenerate (matrices that contain empty rows or columns). Sim2 is well suited for co-occurrence matrices that have a "sample list" structure. Sample list structured data are found where species have relatively homogeneous habitats and degenerate matrices are not uncommon. In these matrices species will often occur in only one site.
The default values of cscore and sim2 have been selected because the c-score is not very sensitive to noise in the data and, when used with sim9 or sim2, is not particularly prone to false positives. Sim2 has been chosen because of the prevalence of degenerate matrices. These are only guidelines; be sure to select a metric and matrix model that is best suited to the type of data you are analyzing.
It should be noted that sim9 cannot be used with vratio because in sim9 both the column and row totals are maintained, hence there will be no variance.
Please see [5] for more details on metric/null model selection.
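
To make the default cscore/sim2 combination concrete, here is a hedged Python sketch of a null model test: each null matrix is built by shuffling every row independently (row totals fixed, sites equiprobable), and the observed c-score is compared with the scores of the null matrices. How mothur itself reports significance is not described here, so the p-value rule below (the fraction of null scores at least as large as the observed score) is an assumption, as are the names sim2_null and null_test.

```python
import random
from itertools import combinations


def c_score(matrix):
    """Mean number of checkerboard units over all species pairs."""
    units = []
    for row_a, row_b in combinations(matrix, 2):
        shared = sum(a * b for a, b in zip(row_a, row_b))
        units.append((sum(row_a) - shared) * (sum(row_b) - shared))
    return sum(units) / len(units)


def sim2_null(matrix, rng):
    """Shuffle each row on its own: row totals stay fixed, sites are equiprobable."""
    return [rng.sample(row, len(row)) for row in matrix]


def null_test(matrix, iters=1000, seed=0):
    """Return the observed c-score and an assumed one-sided p-value."""
    rng = random.Random(seed)
    observed = c_score(matrix)
    as_extreme = sum(
        1 for _ in range(iters) if c_score(sim2_null(matrix, rng)) >= observed
    )
    return observed, as_extreme / iters


example = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 0],
]
observed, p_value = null_test(example, iters=1000)
print(observed, p_value)
```

The iters argument plays the same role as the iters parameter of the tool: more iterations give a finer-grained estimate at the cost of run time.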


.. _shared: http://www.mothur.org/wiki/Shared_file
.. _cooccurrence: http://www.mothur.org/wiki/Cooccurrence

v1.26.0: Updated to Mothur 1.33

 </help>
</tool>