MOABS (version 1.3.4.6+galaxy1)

MOABS: MOdel based Analysis of Bisulfite Sequencing data

MOABS is a comprehensive, accurate and efficient solution for analyzing large-scale, base-resolution DNA methylation data from bisulfite sequencing or single-molecule direct sequencing.

MOABS seamlessly integrates alignment, methylation calling, identification of hypomethylation for one sample and differential methylation for multiple samples, and other downstream analysis.

For more information, check https://github.com/sunnyisgalaxy/moabs.


Input files

MOABS takes bisulfite sequencing reads from two groups of interest as input, e.g. KO vs WT. Each group may combine sequencing libraries, i.e. single-end reads and/or paired-end reads. Multiple replicates can be specified in each group.

Outputs

Five output files can be selected for reporting:

  1. DMC site file - the major DMC result file
  2. DMR region file - the major DMR result file
  3. Comparison file between two groups - the intermediate comparison result
  4. BAM files - intermediate BAM files
  5. Methylation BED files - intermediate methylation BED files

MOABS detects differentially methylated cytosines (DMCs) and differentially methylated regions (DMRs) using the input BS-Seq reads. The output DMC and DMR files are tab-delimited text files (not strictly in BED format) that list the DMCs and DMRs.

A DMC site file has 15 columns as below.

chrom<TAB>start<TAB>end<TAB>totalC_0<TAB>nominalRatio_0<TAB>ratioCI_0<TAB>totalC_1<TAB>nominalRatio_1<TAB>ratioCI_1<TAB>nominalDif_1-0<TAB>credibleDif_1-0<TAB>difCI_1-0<TAB>p_sim_1_v_0<TAB>p_fet_1_v_0<TAB>class

  1. chrom - The chromosome of the CpG site.
  2. start - The start genomic locus of the CpG site.
  3. end - The end genomic locus of the CpG site.
  4. totalC_0 - The total read coverage of the CpG site in group 0.
  5. nominalRatio_0 - The nominal methylation ratio of the CpG in group 0.
  6. ratioCI_0 - The confidence interval (CI) of the nominal methylation ratio at the CpG site in group 0.
  7. totalC_1 - The total read coverage of the CpG site in group 1.
  8. nominalRatio_1 - The nominal methylation ratio of the CpG in group 1.
  9. ratioCI_1 - The confidence interval (CI) of the nominal methylation ratio at the CpG site in group 1.
  10. nominalDif_1-0 - The nominal methylation difference between group 1 and group 0.
  11. credibleDif_1-0 - The credible methylation difference (CDIF) between group 1 and group 0.
  12. difCI_1-0 - The difference of ratio CIs between group 1 and group 0.
  13. p_sim_1_v_0 - P-value according to the similarity probabilities.
  14. p_fet_1_v_0 - P-value according to the Fisher exact test.
  15. class - 5-state class labels by methylation differences and p-values.
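
The column list above maps directly onto a small parser. The following is a minimal sketch, not part of MOABS: the function name parse_dmc_line and the choice of Python types are assumptions made for illustration, and the sample line reuses the first row of the example shown in this section.

```python
# Column names copied from the DMC header line in this section.
DMC_COLUMNS = [
    "chrom", "start", "end",
    "totalC_0", "nominalRatio_0", "ratioCI_0",
    "totalC_1", "nominalRatio_1", "ratioCI_1",
    "nominalDif_1-0", "credibleDif_1-0", "difCI_1-0",
    "p_sim_1_v_0", "p_fet_1_v_0", "class",
]

TEXT_COLUMNS = {"chrom", "class"}
INT_COLUMNS = {"start", "end", "totalC_0", "totalC_1"}
# Interval columns hold two comma-separated numbers, e.g. "0.652,1".
INTERVAL_COLUMNS = {"ratioCI_0", "ratioCI_1", "difCI_1-0"}


def parse_dmc_line(line):
    """Turn one tab-delimited DMC line into a dict keyed by column name.

    Hypothetical helper for illustration; MOABS does not ship it.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(DMC_COLUMNS):
        raise ValueError(f"expected {len(DMC_COLUMNS)} columns, got {len(fields)}")
    record = {}
    for name, raw in zip(DMC_COLUMNS, fields):
        if name in TEXT_COLUMNS:
            record[name] = raw
        elif name in INT_COLUMNS:
            record[name] = int(raw)
        elif name in INTERVAL_COLUMNS:
            low, high = raw.split(",")
            record[name] = (float(low), float(high))
        else:
            record[name] = float(raw)
    return record


# First row of the DMC example in this section, joined with tabs.
example_line = "\t".join([
    "chr11", "87", "89", "6", "1", "0.652,1", "10", "0.2", "0,0.47",
    "-0.8", "-0.327", "-0.857,-0.327", "0.000318", "0.00699", "strongHypo",
])
record = parse_dmc_line(example_line)
print(record["chrom"], record["start"], record["end"], record["class"])
```

Keeping the intervals as (low, high) tuples makes it easy to test later whether a confidence interval spans zero.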

For example, CpGs in the DMC file are recorded in the following format.

Example:

chr11  87      89      6       1       0.652,1 10      0.2     0,0.47  -0.8    -0.327  -0.857,-0.327   0.000318        0.00699 strongHypo
chr11  152     154     4       1       0.549,1 9       0.333   0.0575,0.609    -0.667  -0.113  -0.761,-0.113   0.0035  0.0699  hypo
chr11  258     260     3       1       0.473,1 13      0.231   0,0.466 -0.769  -0.165  -0.809,-0.165   0.00236 0.0357  hypo
chr11  331     333     6       0.667   0.341,0.992     22      0.227   0.0499,0.405    -0.439  -0.058  -0.66,-0.058    0.00636 0.0638  hypo
chr11  630     632     3       1       0.473,1 5       0       0,0.393 -1      -0.271  -1,-0.271       0.000954        0.0179  strongHypo
chr11  638     640     3       0.667   0.249,1 5       0.6     0.264,0.936     -0.0667 -0      -0.461,0.418    0.0286  1       hypo
chr11  641     643     4       1       0.549,1 5       0.4     0.064,0.736     -0.6    -0.0122 -0.748,-0.0122  0.00715 0.167   hypo
chr11  645     647     4       1       0.549,1 8       1       0.717,1 0       0       -0.183,0.374    0.067   1       hypo
chr11  666     668     4       1       0.549,1 9       0.667   0.391,0.942     -0.333  -0      -0.506,0.15     0.021   0.497   hypo
chr11  685     687     3       1       0.473,1 8       0       0,0.283 -1      -0.342  -1,-0.342       0.000364        0.00606 strongHypo
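
Two relationships between the columns can be read off the ten example rows above. First, nominalDif_1-0 equals nominalRatio_1 minus nominalRatio_0 up to rounding. Second, credibleDif_1-0 equals the bound of difCI_1-0 that lies closest to zero, and is 0 when the interval spans zero. The second point is an observation from these rows only, not a statement of how MOABS computes the value. The sketch below checks both; the helper name bound_nearest_zero and the rounding tolerance are assumptions.

```python
# Values copied from the ten example rows above:
# (nominalRatio_0, nominalRatio_1, nominalDif_1-0, credibleDif_1-0, difCI_1-0)
rows = [
    (1.0, 0.2, -0.8, -0.327, (-0.857, -0.327)),
    (1.0, 0.333, -0.667, -0.113, (-0.761, -0.113)),
    (1.0, 0.231, -0.769, -0.165, (-0.809, -0.165)),
    (0.667, 0.227, -0.439, -0.058, (-0.66, -0.058)),
    (1.0, 0.0, -1.0, -0.271, (-1.0, -0.271)),
    (0.667, 0.6, -0.0667, -0.0, (-0.461, 0.418)),
    (1.0, 0.4, -0.6, -0.0122, (-0.748, -0.0122)),
    (1.0, 1.0, 0.0, 0.0, (-0.183, 0.374)),
    (1.0, 0.667, -0.333, -0.0, (-0.506, 0.15)),
    (1.0, 0.0, -1.0, -0.342, (-1.0, -0.342)),
]

# The file prints about three significant digits, so allow for rounding.
ROUNDING_TOLERANCE = 2e-3


def bound_nearest_zero(interval):
    """Return 0 if the interval spans zero, else the bound closest to zero."""
    low, high = interval
    if low <= 0.0 <= high:
        return 0.0
    return low if abs(low) < abs(high) else high


consistent = 0
for ratio_0, ratio_1, nominal_dif, credible_dif, dif_ci in rows:
    nominal_ok = abs((ratio_1 - ratio_0) - nominal_dif) < ROUNDING_TOLERANCE
    credible_ok = abs(bound_nearest_zero(dif_ci) - credible_dif) < 1e-9
    if nominal_ok and credible_ok:
        consistent += 1

print(f"{consistent} of {len(rows)} example rows satisfy both checks")
```

A check like this is mainly useful as a sanity test after parsing, for example to catch shifted columns.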

A DMR result file has 12 columns as below.

chrom<TAB>start<TAB>end<TAB>meanRatio_0<TAB>totalC_0<TAB>cSites_0<TAB>meanRatio_1<TAB>totalC_1<TAB>cSites_1<TAB>methDif_1-0<TAB>p_1_v_0<TAB>class_1_v_0

  1. chrom - The chromosome of the region.
  2. start - The start genomic locus of the region.
  3. end - The end genomic locus of the region.
  4. meanRatio_0 - Mean methylation ratio of the region in group 0.
  5. totalC_0 - Total cytosine coverage of the region in group 0.
  6. cSites_0 - The number of CpG sites of the region in group 0.
  7. meanRatio_1 - Mean methylation ratio of the region in group 1.
  8. totalC_1 - Total cytosine coverage of the region in group 1.
  9. cSites_1 - The number of CpG sites of the region in group 1.
  10. methDif_1-0 - Average methylation difference of the region between group 1 and group 0.
  11. p_1_v_0 - P-value from Fisher exact test of the region between group 1 and group 0.
  12. class_1_v_0 - 5-state class labels for the DMR.

For example, four DMRs are identified in the following format.

Example:

chr11  2529    2582    0.787   77      8       0.226   123     8       -0.204  3.79e-17        strongHypo
chr11  2976    3065    0.833   222     3       0.241   116     3       -0.339  1.69e-22        strongHypo
chr11  4327    5335    0.722   286     29      0.143   563     29      -0.201  2.3e-48 strongHypo
chr11  6544    7008    0.955   86      17      0.126   97      17      -0.223  6.5e-34 strongHypo
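
As a minimal sketch of reading the DMR file, the snippet below parses the four example rows above into dictionaries and reports the span of each region. The helper name parse_dmr_line is hypothetical, and computing the span as end minus start is an assumption about the coordinate convention, since the file is not strictly BED.

```python
# Column names copied from the DMR header line in this section.
DMR_COLUMNS = [
    "chrom", "start", "end",
    "meanRatio_0", "totalC_0", "cSites_0",
    "meanRatio_1", "totalC_1", "cSites_1",
    "methDif_1-0", "p_1_v_0", "class_1_v_0",
]
DMR_TEXT = {"chrom", "class_1_v_0"}
DMR_INT = {"start", "end", "totalC_0", "cSites_0", "totalC_1", "cSites_1"}


def parse_dmr_line(line):
    """Turn one tab-delimited DMR line into a dict (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(DMR_COLUMNS):
        raise ValueError(f"expected {len(DMR_COLUMNS)} columns, got {len(fields)}")
    record = {}
    for name, raw in zip(DMR_COLUMNS, fields):
        if name in DMR_TEXT:
            record[name] = raw
        elif name in DMR_INT:
            record[name] = int(raw)
        else:
            record[name] = float(raw)  # handles values such as 3.79e-17
    return record


# The four example rows above, field by field.
example_rows = [
    ["chr11", "2529", "2582", "0.787", "77", "8", "0.226", "123", "8", "-0.204", "3.79e-17", "strongHypo"],
    ["chr11", "2976", "3065", "0.833", "222", "3", "0.241", "116", "3", "-0.339", "1.69e-22", "strongHypo"],
    ["chr11", "4327", "5335", "0.722", "286", "29", "0.143", "563", "29", "-0.201", "2.3e-48", "strongHypo"],
    ["chr11", "6544", "7008", "0.955", "86", "17", "0.126", "97", "17", "-0.223", "6.5e-34", "strongHypo"],
]
regions = [parse_dmr_line("\t".join(row)) for row in example_rows]

for region in regions:
    span = region["end"] - region["start"]  # assumed convention, see lead-in
    print(region["chrom"], region["start"], region["end"], span, region["cSites_0"])

longest = max(regions, key=lambda r: r["end"] - r["start"])
```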

The intermediate comparison file summarizes methylation ratio comparison results on CpG sites. It has 19 columns as below.

  1. chrom - The chromosome of the CpG site.
  2. start - The start position of the site.
  3. end - The end position of the site.
  4. single - The next three columns are attributes for the single position.
  5. totalC_0 - Total number of Cs in the first group.
  6. nominalRatio_0 - Nominal methylation ratio in the first group.
  7. ratioCI_0 - The confidence interval of the methylation ratio in the first group.
  8. single - The next three columns are attributes for the single position.
  9. totalC_1 - Total number of Cs in the second group.
  10. nominalRatio_1 - Nominal methylation ratio in the second group.
  11. ratioCI_1 - The confidence interval of the methylation ratio in the second group.
  12. pair - The next three columns are attributes for pairs of groups.
  13. nominalDif_1-0 - Nominal difference of methylation ratio between group 1 and group 0.
  14. credibleDif_1-0 - Credible methylation difference between group 1 and group 0.
  15. difCI_1-0 - Difference of confidence intervals between group 1 and group 0.
  16. p_sim - The next column is the simulation p-value.
  17. p_sim_1_v_0 - Simulation p-value between group 1 and group 0.
  18. p_fet - The next column is the FET p-value.
  19. p_fet_1_v_0 - FET p-value between group 1 and group 0.

The comparison result file can be reused for DMR calling.
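
Five of the 19 entries listed above (single, single, pair, p_sim, p_fet) describe the columns that follow them rather than carrying a measurement. The sketch below assumes that these marker entries occupy their own fields in each row, as the list describes, and keeps only the 14 data columns. The helper names and the placeholder row are made up for illustration; no real file content is shown.

```python
# Column names in the order given by the 19-entry list above.
COMPARISON_COLUMNS = [
    "chrom", "start", "end",
    "single", "totalC_0", "nominalRatio_0", "ratioCI_0",
    "single", "totalC_1", "nominalRatio_1", "ratioCI_1",
    "pair", "nominalDif_1-0", "credibleDif_1-0", "difCI_1-0",
    "p_sim", "p_sim_1_v_0",
    "p_fet", "p_fet_1_v_0",
]
# Entries that only announce the columns that follow them.
MARKER_NAMES = {"single", "pair", "p_sim", "p_fet"}


def data_column_indexes(columns):
    """Return the 0-based indexes of columns that hold data."""
    return [i for i, name in enumerate(columns) if name not in MARKER_NAMES]


def to_record(fields):
    """Map a 19-field row to a dict of its 14 data columns (hypothetical helper)."""
    if len(fields) != len(COMPARISON_COLUMNS):
        raise ValueError(f"expected {len(COMPARISON_COLUMNS)} fields, got {len(fields)}")
    return {COMPARISON_COLUMNS[i]: fields[i]
            for i in data_column_indexes(COMPARISON_COLUMNS)}


# Placeholder row: each field is labelled with its 1-based position and name,
# so the output shows which fields survive. These are not real values.
placeholder_row = [f"<{pos}:{name}>"
                   for pos, name in enumerate(COMPARISON_COLUMNS, start=1)]
record = to_record(placeholder_row)
for name, value in record.items():
    print(name, value)
```

The 14 names that remain are the same as the first 14 columns of the DMC site file, in the same order.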


BAM files are intermediate mapping results of input reads to the reference genome. These BAM files can be reused in downstream methylation analysis.


Methylation calling BED files are intermediate methylation calling results of Cs in two groups of input reads. These methylation calling results can be reused in downstream DMR calling and visualization. The BED file has 15 columns as below.

  1. chrom - The chromosome of the site.
  2. start - The start position of the site.
  3. end - The end position of the site.
  4. ratio - Methylation ratio at the site.
  5. totalC - Total number of reads covering the C at this site.
  6. methC - Methylated Cs.
  7. strand - The strand information for the previous three columns.
  8. next - The next base.
  9. Plus - Next two columns are for forward strand.
  10. totalC - Total number of Cs.
  11. methC - Methylated Cs.
  12. Minus - Next two columns are for reverse strand.
  13. totalC - Total number of Cs.
  14. methC - Methylated Cs.
  15. localSeq - Local sequences.
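
The names totalC and methC each appear three times in the list above, which is a problem when rows are loaded into a dictionary or a table keyed by column name. The following sketch derives 15 unique names by prefixing the columns that follow the Plus and Minus entries. The prefixed names (plus_totalC and so on) and the function name are assumptions for illustration, not names used by MOABS.

```python
# Column names in the order given by the 15-entry list above.
BED_COLUMNS = [
    "chrom", "start", "end", "ratio", "totalC", "methC", "strand", "next",
    "Plus", "totalC", "methC",
    "Minus", "totalC", "methC",
    "localSeq",
]


def unique_column_names(columns):
    """Prefix the totalC/methC columns that follow a Plus or Minus entry."""
    names = []
    prefix = ""  # empty until a Plus or Minus entry has been seen
    for name in columns:
        if name in ("Plus", "Minus"):
            prefix = name.lower() + "_"
            names.append(name)
        elif name in ("totalC", "methC"):
            names.append(prefix + name)
        else:
            names.append(name)
    return names


names = unique_column_names(BED_COLUMNS)
for position, name in enumerate(names, start=1):
    print(position, name)
```

With unique names in hand, a row can be paired with them using zip(names, fields) without later columns overwriting earlier ones.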