<tool id="ComMet" name="ComMet" version="1.0.0">
  <description>Detection of differentially methylated regions from bisulfite-seq mapping data</description>
  <!--
  <version_command></version_command>
  -->

  <requirements>
    <requirement type="set_environment">TOOLDIR</requirement>
  </requirements>

  <command interpreter="perl">
    ComMet_wrapper.pl TOOLDIR $intype.mapper

    #if $intype.mapper=="bsf-call"
      $in1 $in2
    #else if $intype.mapper=="commet"
      $in
    #else

    #end if

    $outdmc $outdmr
  </command>

  <inputs>
    <conditional name="intype">
      <param name="mapper" type="select" label="input type">
        <option value="bsf-call">bsf-call</option>
        <option value="commet">commet</option>
      </param>
      <when value="bsf-call">
        <param name="in1" type="data" format="tabular" label="bsf-call file for sample 1"/>
        <param name="in2" type="data" format="tabular" label="bsf-call file for sample 2"/>
      </when>
      <when value="commet">
        <param name="in" type="data" format="tabular" label="commet input file"/>
      </when>
    </conditional>

  </inputs>

  <outputs>
    <data name="outdmc" format="tabular" label="${tool.name} on ${on_string}: differential methylation at individual cytosine sites"/>
    <data name="outdmr" format="tabular" label="${tool.name} on ${on_string}: differentially methylated regions"/>
  </outputs>

  <help>
**ComMet**

Detection of differentially methylated regions (DMRs) from bisulfite-seq mapping data

------

**Input format**

Suppose we want to detect differentially methylated regions by comparing sample1 and sample2.
The input is a pair of files, each of which contains bisulfite-seq mapping data obtained from sample1 or sample2.
Each file should be in the format supported by the bsf-call tool::

    Col.| Description
    ----+--------------------------------------
    1   | chromosome label (e.g. chr1)
    2   | genomic position (0-based)
    3   | strand (+,-)
    4   | mC context (CG, CHG, CHH)
    5   | mC rate (float)
    6   | read coverage

Alternatively, you can use a single input file that contains bisulfite-seq mapping data for both samples (commet format)::

    Col.| Description
    ----+--------------------------------------
    1   | chromosome name
    2   | 0-based genomic position
    3   | number of reads supporting mC in sample1
    4   | number of reads not supporting mC in sample1
    5   | number of reads supporting mC in sample2
    6   | number of reads not supporting mC in sample2

    reads supporting mC     = C-C matches
    reads not supporting mC = otherwise

Make sure the input is sorted by chromosome name and then by genomic position, e.g. with "sort -k1,1 -k2,2n".
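
As an illustration of the required ordering, the following Python sketch sorts commet-format records and checks whether they are already sorted. The helper names (parse_line, sort_records, is_sorted) are hypothetical and not part of ComMet. The sketch compares chromosome names as plain strings, which should correspond to "sort -k1,1" under the C locale; other locales may order names differently.

```python
# Illustrative sketch only: parse_line, sort_records and is_sorted are hypothetical helpers.
from typing import List, Tuple

# (chromosome, 0-based position, remaining columns kept as text)
Record = Tuple[str, int, Tuple[str, ...]]


def parse_line(line: str) -> Record:
    """Split one tab-separated line into chromosome, integer position, and the rest."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], int(fields[1]), tuple(fields[2:])


def sort_records(records: List[Record]) -> List[Record]:
    """Order by chromosome name (plain string comparison), then numerically by position."""
    return sorted(records, key=lambda r: (r[0], r[1]))


def is_sorted(records: List[Record]) -> bool:
    """Return True if the records are already in the order produced by sort_records."""
    keys = [(r[0], r[1]) for r in records]
    return keys == sorted(keys)


# Made-up example lines in commet format.
lines = ["chr2\t10\t1\t0\t0\t1", "chr10\t5\t2\t1\t3\t0", "chr2\t9\t0\t4\t1\t2"]
records = [parse_line(line) for line in lines]
print(is_sorted(records))
print([(r[0], r[1]) for r in sort_records(records)])
```

Note that with plain string comparison "chr10" is placed before "chr2", which is also what "sort -k1,1" does in the C locale.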

Note that the commet format does not contain strand information.
Normally, you should integrate both strands by summing the read counts at the two neighboring cytosines of each CpG,
i.e. the 5'-CpG-3' on the plus strand and the paired 3'-GpC-5' on the minus strand.
Alternatively, if you are interested in strand-specific DMRs, you can prepare two input files,
one for the plus strand and one for the minus strand, and run ComMet on each separately.
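
The strand integration described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the procedure used inside ComMet: it assumes a hypothetical per-strand record layout (chromosome, 0-based cytosine position, strand, then the four read counts of the commet format), it assumes that all records are in CpG context, and merge_strands is an illustrative name.

```python
# Illustrative sketch only: the record layout and merge_strands are assumptions.
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (chromosome, 0-based position of the cytosine, strand,
#  mC reads sample1, non-mC reads sample1, mC reads sample2, non-mC reads sample2)
StrandRecord = Tuple[str, int, str, float, float, float, float]
MergedRecord = Tuple[str, int, float, float, float, float]


def merge_strands(records: Iterable[StrandRecord]) -> List[MergedRecord]:
    """Sum the read counts of the two cytosines of each CpG, keyed by the plus-strand C."""
    totals: Dict[Tuple[str, int], List[float]] = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    for chrom, pos, strand, *counts in records:
        # The minus-strand cytosine pairs with the G one base downstream of the
        # plus-strand C, so its position is shifted back by one.
        key = (chrom, pos if strand == "+" else pos - 1)
        for i, value in enumerate(counts):
            totals[key][i] += value
    # Emit rows sorted by chromosome name and position, as ComMet expects.
    return [(chrom, pos, *totals[(chrom, pos)]) for chrom, pos in sorted(totals)]


# Made-up example: one CpG covered on both strands, one covered on the plus strand only.
records = [
    ("chr1", 100, "+", 3, 1, 0, 4),
    ("chr1", 101, "-", 2, 2, 1, 3),
    ("chr1", 250, "+", 5, 0, 5, 1),
]
for row in merge_strands(records):
    print("\t".join(str(field) for field in row))
```

For strand-specific analysis you would instead split the records by strand and skip the merging step.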


------

**Output format**

Output1 reports differential methylation at individual cytosine sites::

    Col.| Description
    ----+--------------------------------------
    1   | chromosome name
    2   | 0-based genomic position
    3   | mC ratio in sample1
    4   | mC ratio in sample2
    5   | prob. for hypermethylation (UP) in sample1 against sample2
    6   | prob. for hypomethylation (DOWN) in sample1 against sample2
    7   | prob. for no methylation change (NoCh) between sample1 and sample2

Output2 reports the detected DMRs::

    Col.| Description
    ----+--------------------------------------
    1   | chromosome name
    2   | 0-based genomic start position
    3   | 0-based genomic stop position
    4   | direction of differential methylation (UP/DOWN) comparing sample1 to sample2
    5   | log-likelihood ratio score
    6   | log-likelihood ratio score divided by DMR length

Choose between output1 and output2 according to the purpose of your study.
Use output1 if you are interested only in differential methylation at
individual cytosine sites (this is the purpose of most existing packages for
bisulfite sequencing data analysis developed by other groups).
ComMet is mainly designed for DMR detection, i.e. determining the precise boundaries of
regional differential methylation, even if a DMR includes some cytosine sites whose
observed methylation changes are relatively weak due to limited sequencing depth.
Such an analysis is useful for identifying biologically important DMRs such as
cis-regulatory elements; output2 is suitable for this purpose.

------

**FAQ**

\Q. What is the meaning of the error "distance between neighbor CpGs must not be less than 2"?
::

    A.
    Your input file contains invalid genomic positions.
    By definition of CpG, the base next to C must be G, and therefore the positions of two
    neighboring CpGs must differ by at least 2. Your input file may violate this rule for several reasons.
    First, the input file may contain the two cytosines of a single CpG from different strands,
    i.e. the 5'-CpG-3' on the plus strand and the paired 3'-GpC-5' on the minus strand.
    See the "Input format" section above for this issue.
    Second, the input file may contain cytosines in non-CpG context; just remove them.
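
To locate the offending lines before running ComMet, you could scan the sorted positions with a small script such as the Python sketch below. It is only an illustration: find_close_sites is a hypothetical helper, the threshold of 2 is taken from the error message, and the exact check performed by ComMet may differ.

```python
# Illustrative sketch only: find_close_sites is a hypothetical helper, not part of ComMet.
from typing import Iterable, List, Tuple


def find_close_sites(sites: Iterable[Tuple[str, int]], min_gap: int = 2) -> List[Tuple[str, int, int]]:
    """Return (chromosome, previous position, position) for sorted sites closer than min_gap."""
    problems = []
    prev_chrom, prev_pos = None, None
    for chrom, pos in sites:
        # Only compare consecutive sites on the same chromosome.
        if chrom == prev_chrom and min_gap > pos - prev_pos:
            problems.append((chrom, prev_pos, pos))
        prev_chrom, prev_pos = chrom, pos
    return problems


# Made-up example: (chromosome, 0-based position) pairs taken from columns 1 and 2.
sites = [("chr1", 100), ("chr1", 101), ("chr1", 150), ("chr2", 150), ("chr2", 151)]
print(find_close_sites(sites))
```

Each reported pair is either an unmerged plus/minus-strand CpG or a non-CpG cytosine, which you can then merge or remove as described in the answer above.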


\Q. The read counts in the example input file are decimals rather than integers. Why?
::

    A.
    Either decimals or integers can be used for read counts in input files.
    The example input file contains decimals because some alignment tools produce
    probability-weighted read counts. Of course, you can prepare input files with your
    favorite aligner, in which case they may contain integers only.


\Q. Can ComMet compute statistical significance (p-values) rather than likelihood ratio scores?
::

    A.
    No, but we are planning to address this in the next version of ComMet.

------

**Contact**

Yutaka Saito

yutaka.saito AT aist.go.jp
  </help>

  <citations>
    <citation type="doi">10.1093/nar/gkt1373</citation>
  </citations>

</tool>