MOABS: MOdel based Analysis of Bisulfite Sequencing data
MOABS is a comprehensive, accurate and efficient solution for analyzing large-scale, base-resolution DNA methylation data from bisulfite sequencing or single-molecule direct sequencing.
MOABS seamlessly integrates alignment, methylation calling, identification of hypomethylation in a single sample and of differential methylation across multiple samples, and other downstream analyses.
For more information, check https://github.com/sunnyisgalaxy/moabs.
Input files
MOABS takes as input bisulfite sequencing reads from two groups of interest, e.g. KO vs. WT. Each group may combine reads from different sequencing libraries, i.e. single-end and/or paired-end reads. Multiple replicates can be specified for each group.
Outputs
Five types of output files can be selected for reporting:
- DMC site file - the major DMC result file
- DMR region file - the major DMR result file
- Comparison file between two groups - the intermediate comparison result
- BAM files - intermediate BAM files
- Methylation BED files - intermediate methylation BED files
MOABS detects differentially methylated cytosines (DMCs) and differentially methylated regions (DMRs) from the input BS-Seq reads. The DMC and DMR output files are tab-delimited text files (not strictly in BED format).
A DMC site file has 15 columns as below.
chrom<TAB>start<TAB>end<TAB>totalC_0<TAB>nominalRatio_0<TAB>ratioCI_0<TAB>totalC_1<TAB>nominalRatio_1<TAB>ratioCI_1<TAB>nominalDif_1-0<TAB>credibleDif_1-0<TAB>difCI_1-0<TAB>p_sim_1_v_0<TAB>p_fet_1_v_0<TAB>class
- chrom - The chromosome of the CpG site.
- start - The start genomic locus of the CpG site.
- end - The end genomic locus of the CpG site.
- totalC_0 - The total read coverage of the CpG site in group 0.
- nominalRatio_0 - The nominal methylation ratio of the CpG in group 0.
- ratioCI_0 - The confidence interval (CI) of the nominal methylation ratio at the CpG site in group 0.
- totalC_1 - The total read coverage of the CpG site in group 1.
- nominalRatio_1 - The nominal methylation ratio of the CpG in group 1.
- ratioCI_1 - The confidence interval (CI) of the nominal methylation ratio at the CpG site in group 1.
- nominalDif_1-0 - The nominal methylation difference between the group 1 and the group 0.
- credibleDif_1-0 - The credible methylation difference (CDIF) between the group 1 and the group 0.
- difCI_1-0 - The confidence interval (CI) of the methylation difference between group 1 and group 0.
- p_sim_1_v_0 - Simulation-based p-value between group 1 and group 0.
- p_fet_1_v_0 - P-value according to the Fisher exact test.
- class - 5-state class labels by methylation differences and p-values.
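The relation between `credibleDif_1-0` and `difCI_1-0` can be sketched as below. This is an assumption inferred from the example DMC rows in this section, not taken from MOABS source code: the credible difference appears to be the CI bound closest to zero, and 0 when the interval spans zero (e.g. `-0.857,-0.327` gives `-0.327`, while `-0.461,0.418` gives `-0`). The helper names `credible_difference` and `parse_ci` are hypothetical.

```python
def parse_ci(field: str) -> tuple[float, float]:
    """Split a 'low,high' CI column such as '-0.857,-0.327' into two floats."""
    low, high = field.split(",")
    return float(low), float(high)


def credible_difference(ci_low: float, ci_high: float) -> float:
    """Return the CI bound closest to zero, or 0.0 if the interval spans zero.

    Assumption inferred from the example DMC rows, not MOABS's own code.
    """
    if ci_low > ci_high:
        raise ValueError("ci_low must not exceed ci_high")
    if ci_low <= 0.0 <= ci_high:
        # The interval contains zero: no credible difference.
        return 0.0
    # Both bounds share a sign: keep the one with the smaller magnitude.
    return ci_high if ci_high < 0 else ci_low


# CI values copied from the example DMC rows.
print(credible_difference(*parse_ci("-0.857,-0.327")))  # -0.327
print(credible_difference(*parse_ci("-0.461,0.418")))   # 0.0
```

Taking the bound nearest zero makes CDIF a conservative effect size: a site only receives a non-zero CDIF when the whole interval lies on one side of zero.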
For example, CpGs in the DMC file are recorded in the following format.
Example:
chr11 87 89 6 1 0.652,1 10 0.2 0,0.47 -0.8 -0.327 -0.857,-0.327 0.000318 0.00699 strongHypo
chr11 152 154 4 1 0.549,1 9 0.333 0.0575,0.609 -0.667 -0.113 -0.761,-0.113 0.0035 0.0699 hypo
chr11 258 260 3 1 0.473,1 13 0.231 0,0.466 -0.769 -0.165 -0.809,-0.165 0.00236 0.0357 hypo
chr11 331 333 6 0.667 0.341,0.992 22 0.227 0.0499,0.405 -0.439 -0.058 -0.66,-0.058 0.00636 0.0638 hypo
chr11 630 632 3 1 0.473,1 5 0 0,0.393 -1 -0.271 -1,-0.271 0.000954 0.0179 strongHypo
chr11 638 640 3 0.667 0.249,1 5 0.6 0.264,0.936 -0.0667 -0 -0.461,0.418 0.0286 1 hypo
chr11 641 643 4 1 0.549,1 5 0.4 0.064,0.736 -0.6 -0.0122 -0.748,-0.0122 0.00715 0.167 hypo
chr11 645 647 4 1 0.549,1 8 1 0.717,1 0 0 -0.183,0.374 0.067 1 hypo
chr11 666 668 4 1 0.549,1 9 0.667 0.391,0.942 -0.333 -0 -0.506,0.15 0.021 0.497 hypo
chr11 685 687 3 1 0.473,1 8 0 0,0.283 -1 -0.342 -1,-0.342 0.000364 0.00606 strongHypo
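As a sanity check on the DMC column definitions, the sketch below parses the first example line and recomputes two of its columns. It assumes, without confirmation from MOABS documentation, that per-group methylated and unmethylated counts can be recovered as `round(totalC * nominalRatio)`, and that `p_fet_1_v_0` is a two-sided Fisher's exact test on the resulting 2x2 table. Under those assumptions it reproduces the example values (`nominalDif_1-0` = -0.8, `p_fet_1_v_0` = 0.00699). The names `parse_dmc_line` and `fisher_exact_two_sided` are hypothetical.

```python
from math import comb

DMC_COLUMNS = [
    "chrom", "start", "end",
    "totalC_0", "nominalRatio_0", "ratioCI_0",
    "totalC_1", "nominalRatio_1", "ratioCI_1",
    "nominalDif_1-0", "credibleDif_1-0", "difCI_1-0",
    "p_sim_1_v_0", "p_fet_1_v_0", "class",
]
INT_COLUMNS = {"start", "end", "totalC_0", "totalC_1"}
CI_COLUMNS = {"ratioCI_0", "ratioCI_1", "difCI_1-0"}
TEXT_COLUMNS = {"chrom", "class"}


def parse_dmc_line(line: str) -> dict:
    """Parse one 15-column DMC line (tab- or space-delimited) into typed values."""
    fields = line.split()
    if len(fields) != len(DMC_COLUMNS):
        raise ValueError(f"expected {len(DMC_COLUMNS)} columns, got {len(fields)}")
    record = {}
    for name, value in zip(DMC_COLUMNS, fields):
        if name in TEXT_COLUMNS:
            record[name] = value
        elif name in INT_COLUMNS:
            record[name] = int(value)
        elif name in CI_COLUMNS:
            low, high = value.split(",")
            record[name] = (float(low), float(high))
        else:
            record[name] = float(value)
    return record


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row0, col0, n = a + b, a + c, a + b + c + d
    denom = comb(n, col0)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row0, x) * comb(n - row0, col0 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col0 - (n - row0)), min(row0, col0)
    total = 0.0
    for x in range(lo, hi + 1):
        p = prob(x)
        # Sum all tables no more probable than the observed one.
        if p <= observed * (1 + 1e-7):
            total += p
    return min(1.0, total)


row = parse_dmc_line(
    "chr11 87 89 6 1 0.652,1 10 0.2 0,0.47 -0.8 -0.327 -0.857,-0.327 "
    "0.000318 0.00699 strongHypo"
)
# Assumption: counts are recoverable as round(totalC * nominalRatio).
meth0 = round(row["totalC_0"] * row["nominalRatio_0"])
meth1 = round(row["totalC_1"] * row["nominalRatio_1"])
p = fisher_exact_two_sided(meth0, row["totalC_0"] - meth0,
                           meth1, row["totalC_1"] - meth1)
print(round(row["nominalRatio_1"] - row["nominalRatio_0"], 3))  # -0.8
print(f"{p:.3g}")  # 0.00699
```

The same check also matches other example rows, e.g. `chr11 630 632` (3 of 3 methylated vs. 0 of 5) gives p = 1/56, about 0.0179.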
A DMR result file has 12 columns as below.
chrom<TAB>start<TAB>end<TAB>meanRatio_0<TAB>totalC_0<TAB>cSites_0<TAB>meanRatio_1<TAB>totalC_1<TAB>cSites_1<TAB>methDif_1-0<TAB>p_1_v_0<TAB>class_1_v_0
- chrom - The chromosome of the region.
- start - The start genomic locus of the region.
- end - The end genomic locus of the region.
- meanRatio_0 - Mean methylation ratio of the region in group 0.
- totalC_0 - Total cytosine coverage of the region in group 0.
- cSites_0 - The number of CpG sites of the region in group 0.
- meanRatio_1 - Mean methylation ratio of the region in group 1.
- totalC_1 - Total cytosine coverage of the region in group 1.
- cSites_1 - The number of CpG sites of the region in group 1.
- methDif_1-0 - Average methylation difference of the region between group 1 and group 0.
- p_1_v_0 - P-value from Fisher exact test of the region between group 1 and group 0.
- class_1_v_0 - 5-state class labels for the DMR.
For example, four identified DMRs are reported as follows.
Example:
chr11 2529 2582 0.787 77 8 0.226 123 8 -0.204 3.79e-17 strongHypo
chr11 2976 3065 0.833 222 3 0.241 116 3 -0.339 1.69e-22 strongHypo
chr11 4327 5335 0.722 286 29 0.143 563 29 -0.201 2.3e-48 strongHypo
chr11 6544 7008 0.955 86 17 0.126 97 17 -0.223 6.5e-34 strongHypo
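A minimal sketch for loading DMR records and keeping the strongest calls. The 12 fields follow the DMR column order described above (renamed to snake_case); the cutoffs (p < 1e-20, |methDif| >= 0.2) are arbitrary illustrative values, not MOABS defaults, and `parse_dmr_line` is a hypothetical helper.

```python
from typing import NamedTuple


class DMR(NamedTuple):
    chrom: str
    start: int
    end: int
    mean_ratio_0: float
    total_c_0: int
    c_sites_0: int
    mean_ratio_1: float
    total_c_1: int
    c_sites_1: int
    meth_dif: float  # methDif_1-0 column
    p_value: float   # p_1_v_0 column
    label: str       # class_1_v_0 column


def parse_dmr_line(line: str) -> DMR:
    """Parse one 12-column DMR line (tab- or space-delimited)."""
    f = line.split()
    if len(f) != 12:
        raise ValueError(f"expected 12 columns, got {len(f)}")
    return DMR(f[0], int(f[1]), int(f[2]),
               float(f[3]), int(f[4]), int(f[5]),
               float(f[6]), int(f[7]), int(f[8]),
               float(f[9]), float(f[10]), f[11])


example = """\
chr11 2529 2582 0.787 77 8 0.226 123 8 -0.204 3.79e-17 strongHypo
chr11 2976 3065 0.833 222 3 0.241 116 3 -0.339 1.69e-22 strongHypo
chr11 4327 5335 0.722 286 29 0.143 563 29 -0.201 2.3e-48 strongHypo
chr11 6544 7008 0.955 86 17 0.126 97 17 -0.223 6.5e-34 strongHypo"""

dmrs = [parse_dmr_line(line) for line in example.splitlines()]
# Illustrative cutoffs only; pick thresholds appropriate to your study.
strong = [d for d in dmrs if d.p_value < 1e-20 and abs(d.meth_dif) >= 0.2]
for d in strong:
    print(d.chrom, d.start, d.end, d.meth_dif)
```

With these cutoffs the first example DMR (p = 3.79e-17) is filtered out and the other three are printed.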
The intermediate comparison file summarizes methylation ratio comparison results on CpG sites. It has 19 columns as below.
- chrom - The chromosome of the CpG site.
- start - The start position of the site.
- end - The end position of the site.
- single - The next three columns are attributes for the single position.
- totalC_0 - Total number of Cs in the first group.
- nominalRatio_0 - Nominal methylation ratio in the first group.
- ratioCI_0 - The confidence interval of the methylation ratio in the first group.
- single - The next three columns are attributes for the single position.
- totalC_1 - Total number of Cs in the second group.
- nominalRatio_1 - Nominal methylation ratio in the second group.
- ratioCI_1 - The confidence interval of the methylation ratio in the second group.
- pair - The next three columns are attributes for pairs of groups.
- nominalDif_1-0 - Nominal difference of methylation ratio between group 1 and group 0.
- credibleDif_1-0 - Credible methylation difference between group 1 and group 0.
- difCI_1-0 - Confidence interval of the methylation difference between group 1 and group 0.
- p_sim - The next column is the simulation p-value.
- p_sim_1_v_0 - Simulation p-value between group 1 and group 0.
- p_fet - The next column is the Fisher's exact test (FET) p-value.
- p_fet_1_v_0 - FET p-value between group 1 and group 0.
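Because the comparison file interleaves section-marker columns (`single`, `pair`, `p_sim`, `p_fet`) with the data columns, a reader has to skip them by position. The sketch below assumes the 19 columns appear exactly in the order listed above and that the marker columns carry nothing needed downstream; `parse_comparison_line` is a hypothetical helper.

```python
COMPARISON_COLUMNS = [
    "chrom", "start", "end",
    "single", "totalC_0", "nominalRatio_0", "ratioCI_0",
    "single", "totalC_1", "nominalRatio_1", "ratioCI_1",
    "pair", "nominalDif_1-0", "credibleDif_1-0", "difCI_1-0",
    "p_sim", "p_sim_1_v_0",
    "p_fet", "p_fet_1_v_0",
]
# Section-marker columns that only label the columns following them.
MARKER_COLUMNS = {"single", "pair", "p_sim", "p_fet"}


def parse_comparison_line(line: str) -> dict[str, str]:
    """Map one 19-column comparison line to {column name: raw string}.

    Marker columns are dropped; values stay as strings so the caller can
    convert them (integers, floats, or 'low,high' CI pairs) as needed.
    """
    # Split on tabs only, so empty fields (if any) keep their positions.
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COMPARISON_COLUMNS):
        raise ValueError(
            f"expected {len(COMPARISON_COLUMNS)} columns, got {len(fields)}")
    return {name: value
            for name, value in zip(COMPARISON_COLUMNS, fields)
            if name not in MARKER_COLUMNS}
```

Splitting on tabs rather than on any whitespace is deliberate: if a marker column were ever empty, whitespace splitting would silently shift every later column.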
The comparison result file can be reused for DMR calling.
BAM files are intermediate results of mapping the input reads to the reference genome. These BAM files can be reused in downstream methylation analysis.
Methylation BED files contain the intermediate methylation calls for Cs in the two groups of input reads. These methylation calls can easily be reused in downstream DMR calling and visualization. The BED file has 15 columns as below.
- chrom - The chromosome of the site.
- start - The start position of the site.
- end - The end position of the site.
- ratio - Methylation ratio at the site.
- totalC - Total number of reads covering the current C.
- methC - Number of methylated Cs.
- strand - The strand information for the previous three columns.
- next - The next base.
- Plus - The next two columns are for the forward strand.
- totalC - Total number of Cs.
- methC - Number of methylated Cs.
- Minus - The next two columns are for the reverse strand.
- totalC - Total number of Cs.
- methC - Number of methylated Cs.
- localSeq - Local sequence context around the site.
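A minimal sketch for reading one line of the methylation BED file into named fields. It assumes tab-delimited fields in the 15-column order listed above, treats `Plus` and `Minus` as marker columns to skip, and assumes `ratio` equals `methC / totalC` for the combined columns. `MethSite`, `parse_meth_bed_line` and `recomputed_ratio` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MethSite:
    chrom: str
    start: int
    end: int
    ratio: float
    total_c: int
    meth_c: int
    strand: str
    next_base: str
    plus_total_c: int
    plus_meth_c: int
    minus_total_c: int
    minus_meth_c: int
    local_seq: str


def parse_meth_bed_line(line: str) -> MethSite:
    """Parse one 15-column methylation BED line."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 15:
        raise ValueError(f"expected 15 columns, got {len(f)}")
    # f[8] and f[11] are the 'Plus' and 'Minus' marker columns (assumed).
    return MethSite(f[0], int(f[1]), int(f[2]), float(f[3]),
                    int(f[4]), int(f[5]), f[6], f[7],
                    int(f[9]), int(f[10]), int(f[12]), int(f[13]), f[14])


def recomputed_ratio(site: MethSite) -> float:
    """methC / totalC for the combined columns (assumed definition of 'ratio')."""
    return site.meth_c / site.total_c if site.total_c else float("nan")
```

Comparing `recomputed_ratio(site)` with `site.ratio` is a quick way to check that a file matches this assumed layout before using it for DMR calling or visualization.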