view ComMet_wrapper.xml @ 0:dfdfbdd47b32 default tip

migrate from GitHub
author yutaka-saito
date Sun, 19 Apr 2015 20:55:17 +0900
parents
children
line wrap: on
line source

<tool id="ComMet" name="ComMet" version="1.0.0">
  <description>Detection of differentially methylated regions from bisulfite-seq mapping data</description>
<!--
  <version_command></version_command>
-->

  <requirements>
    <requirement type="set_environment">TOOLDIR</requirement>
  </requirements>

  <command interpreter="perl">
    ComMet_wrapper.pl TOOLDIR $intype.mapper 

    #if $intype.mapper=="bsf-call"
      $in1 $in2 
    #else if $intype.mapper=="commet"
      $in 
    #else
      
    #end if

    $outdmc $outdmr
  </command>

  <inputs>
    <conditional name="intype">
      <param name="mapper" type="select" label="input type">
        <option value="bsf-call">bsf-call</option>
        <option value="commet">commet</option>
      </param>
      <when value="bsf-call">
        <param name="in1" type="data" format="tabular" label="bsf-call file for sample 1"/>
        <param name="in2" type="data" format="tabular" label="bsf-call file for sample 2"/>
      </when>
      <when value="commet">
        <param name="in" type="data" format="tabular" label="commet input file"/>
      </when>
    </conditional> 

  </inputs>

  <outputs>
    <data name="outdmc" format="tabular" label="${tool.name} on ${on_string}: differential methylation at individual cytosine sites"/>
    <data name="outdmr" format="tabular" label="${tool.name} on ${on_string}: differentially methylated regions"/>
  </outputs>

  <help>
**ComMet**

Detection of differentially methylated regions from bisulfite-seq mapping data

------

**Input format**

Let us consider that we detect differentially methylated regions by comparing sample1 and sample2. 
Inputs are a pair of two files, each of which contain bisulfite-seq mapping data obtained from sample1 or sample2. 
Each file should be in the format supported by the bsf-call tool::

  Col.| Description
  ----+--------------------------------------
  1   | chromosome label (e.g. chr1)
  2   | genomic position (0-based)
  3   | strand (+,-)
  4   | mC context (CG, CHG, CHH)
  5   | mC rate (float)
  6   | read coverage

Alternatively, you can use one input file, which contains bisulfite-seq mapping data for both samples (commet format)::

  Col.| Description
  ----+--------------------------------------
  1   | chromosome name 
  2   | 0-based genomic position
  3   | number of reads supporting mC in sample1
  4   | number of reads not supporting mC in sample1
  5   | number of reads supporting mC in sample2
  6   | number of reads not supporting mC in sample2
  
  reads supporting mC = C-C matches
  reads not supporting mC = otherwise

Make sure chromosome names and genomic positions are sorted by "sort -k1,1 -k2,2n".

Note that input files do not contain strand information. 
Normally, you should integrate both strands by summing the read counts at two neighbor CpGs,
i.e. the 5'-CpG-3' in the plus strand, and the neighboring 3'-GpC-5' in the minus strand. 
Alternatively, if you are interested in strand-specific DMRs, you can prepare two input files 
for plus and minus strands, and apply them to ComMet separately. 


------

**Output format**

Output1 contains information of differential methylation at individual cytosine sites::
  
  Col.| Description
  ----+--------------------------------------
  1   | chromosome name 
  2   | 0-based genomic position
  3   | mC ratio in sample1
  4   | mC ratio in sample2 
  5   | prob. for hypermethylation (UP) in sample1 against sample2
  6   | prob. for hypomethylation (DOWN) in sample1 against sample2
  7   | prob. for no methylation change (NoCh) between sample1 and sample2

Output2 contains information of detected DMRs::
  
  Col.| Description
  ----+--------------------------------------
  1   | chromosome name 
  2   | 0-based genomic start position
  3   | 0-based genomic stop position
  4   | direction of differential methylation (UP/DOWN) comparing sample1 to sample2
  5   | log-likelihood ratio score 
  6   | log-likelihood ratio score divided by DMR length

Make sure output1 and output2 are used properly considering the purpose of your study. 
You should use output1 if you are interested only in differential methylation at 
individual cytosine sites (Note that it is the purpose of most existing packages for 
bisulfite sequencing data analysis developed by other groups).
ComMet is mainly designed for DMR detection, i.e. determining precise boundaries of 
regional differential methylation, even if DMRs include some cytosine sites whose 
observed methylation changes are relatively weak due to limited sequencing depth. 
Such an analysis is useful for identifying biologically important DMRs such as 
cis regulatory elements; output2 is suitable for this purpose. 

------

**FAQ**

\Q. What is the meaning of the error "distance between neighbor CpGs must not be less than 2"?
::
  
  A. 
  Your input file contains invalid genomic positions. 
  By definition of CpG, the base next to C must be G, and therefore two neighbor CpGs should be 
  separated by at least two bases. Your input file may violate this rule for several reasons.  
  First, the input file may contain two neighbor CpGs from different strands, 
  i.e. the 5'-CpG-3' in the plus strand, and the neighboring 3'-GpC-5' in the minus strand. 
  See the "Input format" section above for this issue. 
  Second, the input file may contain cytosines in non-CpG context; just remove them. 

  
\Q. The read counts in the example input file are decimals rather than integers. Why?  
::
  
  A. 
  Either decimals or integers can be used for read counts in input files. 
  The reason that the example input file contains decimals is that some alignment tools produce 
  probability-weighted read counts. Of course, you can use your favorite aligners for preparing 
  input files that may contain integers only.


\Q. Can ComMet compute statistical significance (p-values) rather than likelihood ratio scores?
::

  A.
  No. But we are planning to address this issue in the next version of ComMet. 

------

**Contact**

Yutaka Saito

yutaka.saito AT aist.go.jp
  </help>

  <citations>
    <citation type="doi">10.1093/nar/gkt1373</citation>
  </citations>

</tool>