comparison ComMet_wrapper.xml @ 0:dfdfbdd47b32 default tip

migrate from GitHub
author yutaka-saito
date Sun, 19 Apr 2015 20:55:17 +0900
-1:000000000000 0:dfdfbdd47b32
1 <tool id="ComMet" name="ComMet" version="1.0.0">
2 <description>Detection of differentially methylated regions from bisulfite-seq mapping data</description>
3 <!--
4 <version_command></version_command>
5 -->
6
7 <requirements>
8 <requirement type="set_environment">TOOLDIR</requirement>
9 </requirements>
10
11 <command interpreter="perl">
12 ComMet_wrapper.pl TOOLDIR $intype.mapper
13
14 #if $intype.mapper=="bsf-call"
15 $in1 $in2
16 #else if $intype.mapper=="commet"
17 $in
18 #else
19
20 #end if
21
22 $outdmc $outdmr
23 </command>
24
25 <inputs>
26 <conditional name="intype">
27 <param name="mapper" type="select" label="input type">
28 <option value="bsf-call">bsf-call</option>
29 <option value="commet">commet</option>
30 </param>
31 <when value="bsf-call">
32 <param name="in1" type="data" format="tabular" label="bsf-call file for sample 1"/>
33 <param name="in2" type="data" format="tabular" label="bsf-call file for sample 2"/>
34 </when>
35 <when value="commet">
36 <param name="in" type="data" format="tabular" label="commet input file"/>
37 </when>
38 </conditional>
39
40 </inputs>
41
42 <outputs>
43 <data name="outdmc" format="tabular" label="${tool.name} on ${on_string}: differential methylation at individual cytosine sites"/>
44 <data name="outdmr" format="tabular" label="${tool.name} on ${on_string}: differentially methylated regions"/>
45 </outputs>
46
47 <help>
48 **ComMet**
49
50 Detection of differentially methylated regions from bisulfite-seq mapping data
51
52 ------
53
54 **Input format**
55
56 Suppose we want to detect differentially methylated regions (DMRs) by comparing sample1 and sample2.
57 The input is a pair of files, one containing bisulfite-seq mapping data from sample1 and the other from sample2.
58 Each file should be in the format supported by the bsf-call tool::
59
60 Col.| Description
61 ----+--------------------------------------
62 1 | chromosome label (e.g. chr1)
63 2 | genomic position (0-based)
64 3 | strand (+,-)
65 4 | mC context (CG, CHG, CHH)
66 5 | mC rate (float)
67 6 | read coverage
68
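As an illustration of the six-column layout above, the following is a minimal sketch that reads one line into a typed record. It assumes the columns are tab-separated (the tool declares the inputs as tabular); the names `BsfCallRow` and `parse_bsf_call_line` and the sample values are hypothetical and are not part of bsf-call or ComMet.

```python
from dataclasses import dataclass


@dataclass
class BsfCallRow:
    chrom: str       # column 1: chromosome label, e.g. "chr1"
    pos: int         # column 2: 0-based genomic position
    strand: str      # column 3: "+" or "-"
    context: str     # column 4: "CG", "CHG" or "CHH"
    rate: float      # column 5: mC rate
    coverage: float  # column 6: read coverage (kept as float; an assumption)


def parse_bsf_call_line(line):
    """Split one tab-separated line into a typed record (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        raise ValueError("expected 6 tab-separated columns, got %d" % len(fields))
    chrom, pos, strand, context, rate, coverage = fields
    if strand not in ("+", "-"):
        raise ValueError("unexpected strand: %r" % strand)
    if context not in ("CG", "CHG", "CHH"):
        raise ValueError("unexpected mC context: %r" % context)
    return BsfCallRow(chrom, int(pos), strand, context, float(rate), float(coverage))


# Made-up example line, for illustration only.
row = parse_bsf_call_line("chr1\t10468\t+\tCG\t0.75\t8\n")
print(row)
```

Rejecting malformed lines early makes it easier to spot formatting problems before the files are handed to the tool.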
69 Alternatively, you can use a single input file that contains bisulfite-seq mapping data for both samples (commet format)::
70
71 Col.| Description
72 ----+--------------------------------------
73 1 | chromosome name
74 2 | 0-based genomic position
75 3 | number of reads supporting mC in sample1
76 4 | number of reads not supporting mC in sample1
77 5 | number of reads supporting mC in sample2
78 6 | number of reads not supporting mC in sample2
79
80 reads supporting mC = C-C matches
81 reads not supporting mC = otherwise
82
83 Make sure the input is sorted by chromosome name and then by genomic position, e.g. with "sort -k1,1 -k2,2n".
84
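The sort requirement can be checked before running the tool. The sketch below assumes that chromosome names are compared as plain strings and positions as integers, which should correspond to "sort -k1,1 -k2,2n" under the C locale; other locales may order names differently. The function name `is_sorted_for_commet` is hypothetical.

```python
def is_sorted_for_commet(rows):
    """Return True if (chromosome, position) pairs are already in sorted order.

    Chromosome names are compared as plain strings and positions as integers
    (an assumption that matches byte-wise, C-locale sorting).
    """
    keys = [(chrom, int(pos)) for chrom, pos in rows]
    return keys == sorted(keys)


# Made-up positions, for illustration only.
print(is_sorted_for_commet([("chr1", "5"), ("chr1", "10"), ("chr2", "3")]))
print(is_sorted_for_commet([("chr1", "10"), ("chr1", "5")]))
```

Note that with string comparison "chr10" comes before "chr2", which is also what a plain `sort -k1,1` produces.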
85 Note that commet-format input files do not contain strand information.
86 Normally, you should integrate both strands by summing the read counts at the two neighboring cytosines of each CpG,
87 i.e. the 5'-CpG-3' in the plus strand and the neighboring 3'-GpC-5' in the minus strand.
88 Alternatively, if you are interested in strand-specific DMRs, you can prepare two input files,
89 one for the plus strand and one for the minus strand, and run ComMet on each separately.
90
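The strand integration described above can be sketched as follows for one sample. This is not the wrapper's own code: it assumes that a minus-strand cytosine at 0-based position p pairs with the plus-strand cytosine at p - 1, and that the merged site is reported at the plus-strand position. The function name `merge_cpg_strands` and the example counts are hypothetical.

```python
from collections import defaultdict


def merge_cpg_strands(records):
    """Sum the read counts of the two cytosines of each CpG.

    `records` holds (chrom, pos, strand, n_mc, n_not_mc) tuples with 0-based
    positions. A minus-strand cytosine at pos is assumed to pair with the
    plus-strand cytosine at pos - 1, and the merged site is reported at the
    plus-strand position (both are assumptions of this sketch).
    """
    merged = defaultdict(lambda: [0.0, 0.0])
    for chrom, pos, strand, n_mc, n_not_mc in records:
        site = pos if strand == "+" else pos - 1
        merged[(chrom, site)][0] += n_mc
        merged[(chrom, site)][1] += n_not_mc
    # Sorting the keys orders sites by chromosome name, then by position.
    return [(chrom, site, counts[0], counts[1])
            for (chrom, site), counts in sorted(merged.items())]


# Made-up counts, for illustration only.
records = [
    ("chr1", 100, "+", 6, 2),  # C of a CpG on the plus strand
    ("chr1", 101, "-", 5, 3),  # C of the same CpG on the minus strand
    ("chr1", 250, "+", 1, 9),  # CpG covered on the plus strand only
]
print(merge_cpg_strands(records))
```

Running the same merge on each sample and joining the results by site would give the four count columns of the commet format.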
91
92 ------
93
94 **Output format**
95
96 Output1 reports differential methylation at individual cytosine sites::
97
98 Col.| Description
99 ----+--------------------------------------
100 1 | chromosome name
101 2 | 0-based genomic position
102 3 | mC ratio in sample1
103 4 | mC ratio in sample2
104 5 | prob. for hypermethylation (UP) in sample1 against sample2
105 6 | prob. for hypomethylation (DOWN) in sample1 against sample2
106 7 | prob. for no methylation change (NoCh) between sample1 and sample2
107
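One simple way to read output1 is to label each site with whichever of the three probabilities (columns 5 to 7) is largest. The sketch below does this for a single tab-separated line; it is only an illustrative reading of the table above, not a procedure prescribed by ComMet, and the function name `call_site` and the example values are hypothetical.

```python
def call_site(line):
    """Return (chrom, pos, label) with the most probable label for one output1 line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos = fields[0], int(fields[1])
    probs = {
        "UP": float(fields[4]),    # column 5: hypermethylation in sample1
        "DOWN": float(fields[5]),  # column 6: hypomethylation in sample1
        "NoCh": float(fields[6]),  # column 7: no methylation change
    }
    # max() over the dict keys, ranked by their probability.
    label = max(probs, key=probs.get)
    return chrom, pos, label


# Made-up line, for illustration only.
print(call_site("chr1\t100\t0.80\t0.30\t0.90\t0.02\t0.08\n"))
```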
108 Output2 reports the detected DMRs::
109
110 Col.| Description
111 ----+--------------------------------------
112 1 | chromosome name
113 2 | 0-based genomic start position
114 3 | 0-based genomic stop position
115 4 | direction of differential methylation (UP/DOWN) comparing sample1 to sample2
116 5 | log-likelihood ratio score
117 6 | log-likelihood ratio score divided by DMR length
118
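To work with output2, one might keep only DMRs above a chosen score and list the strongest first. The sketch below assumes tab-separated lines with the six columns above; the threshold is a user choice (no recommended cutoff is given here), and the function name `top_dmrs` and the example lines are hypothetical.

```python
def top_dmrs(lines, min_score, direction=None):
    """Keep DMRs whose log-likelihood ratio score is at least min_score.

    If `direction` is "UP" or "DOWN", only DMRs of that direction are kept.
    The result is ordered by score, highest first.
    """
    kept = []
    for line in lines:
        chrom, start, stop, updown, score, per_length = line.rstrip("\n").split("\t")
        if float(score) >= min_score and direction in (None, updown):
            kept.append((chrom, int(start), int(stop), updown,
                         float(score), float(per_length)))
    return sorted(kept, key=lambda dmr: dmr[4], reverse=True)


# Made-up lines, for illustration only.
lines = [
    "chr1\t1000\t1500\tUP\t42.5\t0.085\n",
    "chr1\t9000\t9100\tDOWN\t12.0\t0.12\n",
    "chr2\t300\t800\tUP\t7.5\t0.015\n",
]
for dmr in top_dmrs(lines, min_score=10.0):
    print(dmr)
```

Ranking by column 6 instead (score divided by DMR length) would favor short, strongly differential regions over long ones.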
119 Choose between output1 and output2 according to the purpose of your study.
120 Use output1 if you are interested only in differential methylation at
121 individual cytosine sites (note that this is the aim of most existing packages for
122 bisulfite sequencing data analysis developed by other groups).
123 ComMet is mainly designed for DMR detection, i.e. determining the precise boundaries of
124 regional differential methylation, even if a DMR includes some cytosine sites whose
125 observed methylation changes are relatively weak due to limited sequencing depth.
126 Such an analysis is useful for identifying biologically important DMRs such as
127 cis-regulatory elements; output2 is suitable for this purpose.
128
129 ------
130
131 **FAQ**
132
133 \Q. What does the error "distance between neighbor CpGs must not be less than 2" mean?
134 ::
135
136 A.
137 Your input file contains invalid genomic positions.
138 By definition of CpG, the base next to C must be G, and therefore two neighboring CpGs must be
139 separated by at least two bases. Your input file may violate this rule for several reasons.
140 First, the input file may contain the two neighboring cytosines of one CpG from different strands,
141 i.e. the 5'-CpG-3' in the plus strand and the neighboring 3'-GpC-5' in the minus strand.
142 See the "Input format" section above for this issue.
143 Second, the input file may contain cytosines in non-CpG context; remove them before running ComMet.
144
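To locate the lines that trigger this error, the positions can be scanned for consecutive sites on the same chromosome that are closer than two bases. The sketch below assumes the input is already sorted by chromosome and position; the function name `find_close_neighbors` and the example positions are hypothetical.

```python
def find_close_neighbors(rows, min_distance=2):
    """Return (chrom, pos_a, pos_b) for consecutive sites closer than min_distance.

    `rows` is a list of (chrom, pos) pairs, assumed sorted by chromosome name
    and then by 0-based position.
    """
    offending = []
    for (chrom_a, pos_a), (chrom_b, pos_b) in zip(rows, rows[1:]):
        same_chrom = chrom_a == chrom_b
        if same_chrom and not (pos_b - pos_a >= min_distance):
            offending.append((chrom_a, pos_a, pos_b))
    return offending


# Made-up positions, for illustration only.
rows = [("chr1", 100), ("chr1", 101), ("chr1", 250), ("chr2", 250), ("chr2", 251)]
print(find_close_neighbors(rows))
```

Pairs one base apart usually point to unmerged strands, while other short distances suggest non-CpG cytosines in the input.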
145
146 \Q. The read counts in the example input file are decimals rather than integers. Why?
147 ::
148
149 A.
150 Either decimals or integers can be used for read counts in input files.
151 The example input file contains decimals because some alignment tools produce
152 probability-weighted read counts. You can of course prepare input files with your preferred aligner,
153 in which case they may contain integers only.
154
155
156 \Q. Can ComMet compute statistical significance (p-values) rather than likelihood ratio scores?
157 ::
158
159 A.
160 No, but we plan to address this in the next version of ComMet.
161
162 ------
163
164 **Contact**
165
166 Yutaka Saito
167
168 yutaka.saito AT aist.go.jp
169 </help>
170
171 <citations>
172 <citation type="doi">10.1093/nar/gkt1373</citation>
173 </citations>
174
175 </tool>
176