ComMet_wrapper.xml @ 0:dfdfbdd47b32 default tip
migrate from GitHub
author | yutaka-saito |
---|---|
date | Sun, 19 Apr 2015 20:55:17 +0900 |
<tool id="ComMet" name="ComMet" version="1.0.0">
    <description>Detection of differentially methylated regions from bisulfite-seq mapping data</description>
    <!--
    <version_command></version_command>
    -->

    <requirements>
        <requirement type="set_environment">TOOLDIR</requirement>
    </requirements>

    <command interpreter="perl">
        ComMet_wrapper.pl TOOLDIR $intype.mapper

        #if $intype.mapper=="bsf-call"
            $in1 $in2
        #else if $intype.mapper=="commet"
            $in
        #else

        #end if

        $outdmc $outdmr
    </command>

    <inputs>
        <conditional name="intype">
            <param name="mapper" type="select" label="input type">
                <option value="bsf-call">bsf-call</option>
                <option value="commet">commet</option>
            </param>
            <when value="bsf-call">
                <param name="in1" type="data" format="tabular" label="bsf-call file for sample 1"/>
                <param name="in2" type="data" format="tabular" label="bsf-call file for sample 2"/>
            </when>
            <when value="commet">
                <param name="in" type="data" format="tabular" label="commet input file"/>
            </when>
        </conditional>

    </inputs>

    <outputs>
        <data name="outdmc" format="tabular" label="${tool.name} on ${on_string}: differential methylation at individual cytosine sites"/>
        <data name="outdmr" format="tabular" label="${tool.name} on ${on_string}: differentially methylated regions"/>
    </outputs>

    <help>
**ComMet**

Detection of differentially methylated regions from bisulfite-seq mapping data

------

**Input format**

Suppose we detect differentially methylated regions (DMRs) by comparing sample1 and sample2.
The input is a pair of files, one containing the bisulfite-seq mapping data of sample1 and the other that of sample2.
Each file should be in the format used by the bsf-call tool::

    Col.| Description
    ----+--------------------------------------
      1 | chromosome label (e.g. chr1)
      2 | genomic position (0-based)
      3 | strand (+,-)
      4 | mC context (CG, CHG, CHH)
      5 | mC rate (float)
      6 | read coverage

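If you start from bsf-call files, each record can be turned into read counts. The sketch below is a hypothetical helper. It is not part of ComMet or its Perl wrapper. It assumes that the mC rate is a fraction between 0 and 1, that reads supporting mC = mC rate x coverage, and that the remaining reads do not support mC. It keeps only sites in the requested context, which is CG by default.

```python
from typing import NamedTuple, Optional


class SiteCounts(NamedTuple):
    chrom: str
    pos: int       # 0-based genomic position
    strand: str    # "+" or "-"
    meth: float    # reads supporting mC (may be fractional)
    unmeth: float  # reads not supporting mC


def parse_bsf_call_line(line: str, context: str = "CG") -> Optional[SiteCounts]:
    """Hypothetical helper: convert one 6-column bsf-call record into read counts.

    Assumes column 5 (mC rate) is a fraction in [0, 1], so that
    reads supporting mC = rate * coverage and the remainder do not.
    Returns None for sites outside the requested mC context.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"expected 6 tab-separated columns, got {len(fields)}")
    chrom, pos, strand, mc_context, rate, coverage = fields
    if mc_context != context:
        return None  # e.g. skip CHG/CHH sites when only CpGs are wanted
    rate_f, cov_f = float(rate), float(coverage)
    return SiteCounts(chrom, int(pos), strand, rate_f * cov_f, (1.0 - rate_f) * cov_f)
```

The counts may be fractional, which ComMet accepts (see the FAQ below).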
Alternatively, you can provide a single input file (commet format) that contains the bisulfite-seq mapping data of both samples::

    Col.| Description
    ----+--------------------------------------
      1 | chromosome name
      2 | 0-based genomic position
      3 | number of reads supporting mC in sample1
      4 | number of reads not supporting mC in sample1
      5 | number of reads supporting mC in sample2
      6 | number of reads not supporting mC in sample2

    reads supporting mC     = C-C matches
    reads not supporting mC = all other reads covering the site (e.g. C-T mismatches)

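When per-site read counts for both samples are available, they can be combined into commet-format lines. This is a hypothetical sketch and not the conversion performed by ComMet_wrapper.pl. Keeping only the sites observed in both samples is an assumption of this sketch. Sorting (chromosome, position) tuples compares names character by character and positions numerically. For ASCII chromosome names, this matches "LC_ALL=C sort -k1,1 -k2,2n".

```python
from typing import Dict, List, Tuple

Key = Tuple[str, int]         # (chromosome name, 0-based position)
Counts = Tuple[float, float]  # (reads supporting mC, reads not supporting mC)


def _fmt(x: float) -> str:
    # Print whole-number counts without a trailing ".0"; keep decimals as-is.
    return str(int(x)) if float(x).is_integer() else str(x)


def to_commet_lines(sample1: Dict[Key, Counts], sample2: Dict[Key, Counts]) -> List[str]:
    """Hypothetical helper: build 6-column commet lines from two samples.

    Only sites present in both samples are written (an assumption of this
    sketch). Keys sort by chromosome name, then numerically by position.
    """
    shared = sorted(k for k in sample1 if k in sample2)
    lines = []
    for chrom, pos in shared:
        m1, u1 = sample1[(chrom, pos)]
        m2, u2 = sample2[(chrom, pos)]
        lines.append("\t".join([chrom, str(pos)] + [_fmt(c) for c in (m1, u1, m2, u2)]))
    return lines
```

Note that "chr10" sorts before "chr2" here, just as it does with sort in the C locale.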
Make sure the lines are sorted by chromosome name and then by genomic position, as done by "sort -k1,1 -k2,2n".

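You may want to confirm the sort order before running ComMet. The function below is a hypothetical checker. It compares chromosome names as byte strings, which corresponds to sorting in the C locale (LC_ALL=C). That is an assumption about how your files were sorted. Positions are compared numerically, as "-k2,2n" does.

```python
from typing import Iterable, Optional, Tuple


def first_unsorted_line(lines: Iterable[str]) -> Optional[int]:
    """Hypothetical checker: return the 1-based number of the first line that
    breaks (chromosome, numeric position) order, or None if all lines are sorted."""
    prev: Optional[Tuple[bytes, int]] = None
    for lineno, line in enumerate(lines, start=1):
        chrom, pos = line.split("\t", 2)[:2]
        key = (chrom.encode(), int(pos))  # bytes compare like the C locale
        if prev is not None and prev > key:
            return lineno
        prev = key
    return None
```

Equal keys are allowed, as they are with sort.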
Note that the commet format does not contain strand information.
Normally, you should integrate both strands by summing the read counts of the two cytosines in each CpG,
i.e. the C of the 5'-CpG-3' on the plus strand and the C of the complementary 3'-GpC-5' on the minus strand.
Alternatively, if you are interested in strand-specific DMRs, you can prepare two input files,
one for the plus strand and one for the minus strand, and run ComMet on each separately.
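The strand integration described above can be sketched as follows. The sketch assumes that both strands are reported in 0-based plus-strand coordinates, which ComMet does not specify. Under that assumption, the minus-strand C of a CpG lies one base downstream of the plus-strand C. The counts of the two cytosines are summed and reported at the plus-strand position. The helper name merge_strands is hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple

Counts = Tuple[float, float]  # (reads supporting mC, reads not supporting mC)


def merge_strands(
    sites: Iterable[Tuple[str, int, str, float, float]],
) -> Dict[Tuple[str, int], Counts]:
    """Hypothetical helper: sum counts of the two cytosines of each CpG.

    Input tuples are (chrom, 0-based pos, strand, meth, unmeth). A minus-strand
    C at position q is assumed to pair with the plus-strand C at q - 1.
    """
    merged: DefaultDict[Tuple[str, int], List[float]] = defaultdict(lambda: [0.0, 0.0])
    for chrom, pos, strand, meth, unmeth in sites:
        if strand == "+":
            key = (chrom, pos)
        elif strand == "-":
            key = (chrom, pos - 1)  # shift onto the C of the plus-strand CpG
        else:
            raise ValueError(f"unexpected strand {strand!r}")
        merged[key][0] += meth
        merged[key][1] += unmeth
    return {k: (m, u) for k, (m, u) in merged.items()}
```

Check how your upstream tool reports minus-strand positions before relying on the one-base shift.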

------

**Output format**

Output1 contains information on differential methylation at individual cytosine sites::

    Col.| Description
    ----+--------------------------------------
      1 | chromosome name
      2 | 0-based genomic position
      3 | mC ratio in sample1
      4 | mC ratio in sample2
      5 | probability of hypermethylation (UP) in sample1 relative to sample2
      6 | probability of hypomethylation (DOWN) in sample1 relative to sample2
      7 | probability of no methylation change (NoCh) between sample1 and sample2

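For site-level analysis, one simple way to use output1 is to label each site with its most probable state. This is a hypothetical post-processing sketch. The cutoff min_prob is an illustrative parameter chosen by the user. It is not a ComMet option or recommendation.

```python
from typing import Iterable, List, Tuple


def call_sites(lines: Iterable[str], min_prob: float = 0.95) -> List[Tuple[str, int, str, float]]:
    """Hypothetical helper: label each output1 site with its most probable state
    (UP, DOWN or NoCh) and keep it only if that probability reaches min_prob."""
    calls = []
    for line in lines:
        f = line.rstrip("\n").split("\t")
        chrom, pos = f[0], int(f[1])
        probs = {"UP": float(f[4]), "DOWN": float(f[5]), "NoCh": float(f[6])}
        label = max(probs, key=probs.get)  # state with the highest probability
        if probs[label] >= min_prob:
            calls.append((chrom, pos, label, probs[label]))
    return calls
```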
Output2 contains information on the detected DMRs::

    Col.| Description
    ----+--------------------------------------
      1 | chromosome name
      2 | 0-based genomic start position
      3 | 0-based genomic stop position
      4 | direction of differential methylation (UP/DOWN) in sample1 relative to sample2
      5 | log-likelihood ratio score
      6 | log-likelihood ratio score divided by DMR length

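Output2 can be filtered and ranked in the same spirit. The sketch keeps DMRs of one direction whose log-likelihood ratio score (column 5) reaches a threshold. It then sorts them from strongest to weakest. The function name and the default min_score are hypothetical, so choose a threshold suited to your own data.

```python
from typing import Iterable, List, Tuple


def top_dmrs(lines: Iterable[str], direction: str = "UP",
             min_score: float = 10.0) -> List[Tuple[str, int, int, float, float]]:
    """Hypothetical helper: keep output2 DMRs of one direction whose score
    (column 5) is at least min_score, ranked by score in descending order."""
    dmrs = []
    for line in lines:
        chrom, start, stop, dirn, score, score_per_len = line.rstrip("\n").split("\t")[:6]
        if dirn == direction and float(score) >= min_score:
            dmrs.append((chrom, int(start), int(stop), float(score), float(score_per_len)))
    dmrs.sort(key=lambda d: d[3], reverse=True)  # strongest DMR first
    return dmrs
```

Ranking by column 6 instead normalizes the score for DMR length.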
Choose between output1 and output2 according to the purpose of your study.
Use output1 if you are interested only in differential methylation at
individual cytosine sites (this is the focus of most existing packages for
bisulfite-seq data analysis developed by other groups).
ComMet is mainly designed for DMR detection, i.e. determining the precise boundaries of
regional differential methylation, even when a DMR includes some cytosine sites whose
observed methylation changes are relatively weak due to limited sequencing depth.
Such an analysis is useful for identifying biologically important DMRs such as
cis-regulatory elements; output2 is suitable for this purpose.

------

**FAQ**

\Q. What does the error "distance between neighbor CpGs must not be less than 2" mean?
::

    A.
    Your input file contains invalid genomic positions.
    By definition of CpG, the base next to the C must be a G, so two neighboring CpGs must be
    at least two bases apart. Your input file may violate this rule for one of two reasons.
    First, it may contain the two cytosines of the same CpG on different strands,
    i.e. the 5'-CpG-3' on the plus strand and the complementary 3'-GpC-5' on the minus strand.
    See the "Input format" section above for how to handle this.
    Second, it may contain cytosines in non-CpG contexts; just remove them.

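To locate the offending lines, you can scan a sorted commet-format input for consecutive positions on the same chromosome that are less than 2 apart. This is a hypothetical diagnostic and not part of ComMet. It assumes the file is already sorted as described in "Input format".

```python
from typing import Iterable, List, Optional, Tuple


def close_neighbors(lines: Iterable[str]) -> List[Tuple[int, str, int, int]]:
    """Hypothetical diagnostic: report (line number, chromosome, previous position,
    position) for consecutive sites on one chromosome that are fewer than 2 bases apart."""
    problems = []
    prev_chrom: Optional[str] = None
    prev_pos = 0
    for lineno, line in enumerate(lines, start=1):
        chrom, pos_text = line.split("\t", 2)[:2]
        pos = int(pos_text)
        if chrom == prev_chrom and 2 > pos - prev_pos:
            problems.append((lineno, chrom, prev_pos, pos))
        prev_chrom, prev_pos = chrom, pos
    return problems
```

A distance of 1 usually points to unmerged strands. A distance of 0 indicates a duplicated position.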

\Q. Why are the read counts in the example input file decimals rather than integers?
::

    A.
    Read counts in input files can be either decimals or integers.
    The example input file contains decimals because some alignment tools produce
    probability-weighted read counts. Of course, you can prepare input files with
    your preferred aligner, in which case they may contain only integers.

\Q. Can ComMet compute statistical significance (p-values) rather than likelihood ratio scores?
::

    A.
    No. We plan to address this in the next version of ComMet.

------

**Contact**

Yutaka Saito

yutaka.saito AT aist.go.jp
    </help>

    <citations>
        <citation type="doi">10.1093/nar/gkt1373</citation>
    </citations>

</tool>