What it does
The following is a brief description of all options to control the Bismark methylation extractor. The script reads in a bisulfite read alignment results file produced by the Bismark bisulfite mapper and extracts the methylation information for individual cytosines. This information is found in the methylation call field which can contain the following characters:
- X = for methylated C in CHG context (was protected)
- x = for not methylated C in CHG context (was converted)
- H = for methylated C in CHH context (was protected)
- h = for not methylated C in CHH context (was converted)
- Z = for methylated C in CpG context (was protected)
- z = for not methylated C in CpG context (was converted)
- . = for any bases not involving cytosines
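The legend above can be turned into a small lookup table. The following is a minimal Python sketch, not part of Bismark itself: the function name `tally_calls` and the example call string are made up for illustration, and only the seven characters listed above are assumed to carry meaning.

```python
from collections import Counter

# Map each call character to (context, is_methylated), following the legend above.
CALL_LEGEND = {
    "Z": ("CpG", True), "z": ("CpG", False),
    "X": ("CHG", True), "x": ("CHG", False),
    "H": ("CHH", True), "h": ("CHH", False),
}

def tally_calls(call_string):
    """Count methylated and unmethylated calls per context in one call string."""
    counts = Counter()
    for char in call_string:
        if char in CALL_LEGEND:  # '.' (no cytosine involved) is skipped
            counts[CALL_LEGEND[char]] += 1
    return counts

# Hypothetical call string, invented for this example.
example = "..Z..x.h.Hz.Z"
counts = tally_calls(example)
print(counts[("CpG", True)], "methylated CpG calls")
print(counts[("CpG", False)], "unmethylated CpG calls")
```

Keeping the legend in a dictionary means that any character outside the legend is ignored rather than raising an error.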
The methylation extractor outputs result files for cytosines in CpG, CHG and CHH context (this distinction is actually already made in Bismark itself). Because every C analysed produces one line of output, the files can easily reach tens or even hundreds of millions of lines, so file sizes can become very large and difficult to handle. The extractor additionally splits the cytosine methylation calls according to which of the four possible strands a given bisulfite read aligned against:
- OT = original top strand
- CTOT = complementary to original top strand
- OB = original bottom strand
- CTOB = complementary to original bottom strand
Thus, by default twelve individual output files are generated per input file (unless --comprehensive is specified, see below). The output files can be imported into a genome viewer, such as SeqMonk, and re-combined into a single data group if desired. In fact, unless the bisulfite reads were generated preserving directionality, it does not make sense to look at the data in a strand-specific manner. Strand-specific output files can optionally be skipped, in which case only three output files, for CpG, CHG and CHH context, will be generated. For both the strand-specific and comprehensive outputs there is also the option to merge both non-CpG contexts (CHG and CHH) into one single non-CpG context.
Bismark is developed by Krueger F and Andrews SR at the Babraham Institute.
Krueger F, Andrews SR. (2011) Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics, 27, 1571-2.
Bismark settings
All of the options have a default value. You can change any of them. If any Bismark function is missing please contact the tool author or your Galaxy admin.
Outputs
The output files are in the following format (tab delimited):
Column   Description
-------- --------------------------------------------------------
1        seq-ID
2        strand
3        chromosome
4        position
5        methylation call

* Methylated cytosines receive a '+' orientation,
* Unmethylated cytosines receive a '-' orientation.
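Reading such a file back in takes only a few lines. The sketch below is illustrative only: the sample rows are invented, the field names are hypothetical labels for the five columns above, and it assumes that the '+'/'-' orientation is recorded in the second column. Lines that do not have exactly five tab-separated fields (for example a header line, if one is present) are skipped.

```python
import csv
import io

# Hypothetical labels for the five tab-delimited columns described above.
FIELDS = ["seq_id", "strand", "chromosome", "position", "call"]

def read_calls(handle):
    """Yield one dict per data row of a tab-delimited extractor output file."""
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != len(FIELDS):
            continue  # skip header or malformed lines
        record = dict(zip(FIELDS, row))
        try:
            record["position"] = int(record["position"])
        except ValueError:
            continue  # position column is not numeric, skip the line
        yield record

# Invented sample data standing in for a real output file.
sample = io.StringIO(
    "read_1\t+\tchr1\t1000\tZ\n"
    "read_2\t-\tchr1\t1000\tz\n"
    "read_3\t+\tchr1\t1042\tZ\n"
)
records = list(read_calls(sample))

# Assumption: '+' in the second column marks a methylated call.
methylated = sum(1 for r in records if r["strand"] == "+")
percent = 100.0 * methylated / len(records)
print(f"{methylated} of {len(records)} calls methylated ({percent:.1f}%)")
```

For a real file, replace the `io.StringIO` object with `open(path, newline="")`. Because the function is a generator, a file with hundreds of millions of lines can be processed without loading it into memory.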
OPTIONS
Input:
-s/--single-end
    Input file(s) are Bismark result file(s) generated from single-end read data. Specifying either --single-end or --paired-end is mandatory.

-p/--paired-end
    Input file(s) are Bismark result file(s) generated from paired-end read data. Specifying either --paired-end or --single-end is mandatory.

--no_overlap
    For paired-end reads it is theoretically possible that read_1 and read_2 overlap. This option avoids scoring overlapping methylation calls twice. Whilst this removes a bias towards more methylation calls towards the center of sequenced fragments it can de facto remove a good proportion of the data.

--ignore INT
    Ignore the first INT bp at the 5' end of each read when processing the methylation call string. This can remove e.g. a restriction enzyme site at the start of each read.
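The effect of --ignore and --no_overlap can be pictured with a short Python sketch. This is not the extractor's actual implementation: both function names are hypothetical, the call string is assumed to be given in read orientation (5' end first), and overlap is modelled simply as "a position already scored by read_1".

```python
def drop_five_prime(call_string, ignore):
    """Discard the first `ignore` positions of a methylation call string.

    Assumes the string is oriented with the 5' end of the read first.
    """
    if ignore < 0:
        raise ValueError("ignore must be zero or a positive integer")
    return call_string[ignore:]

def drop_overlap(read_1_positions, read_2_calls):
    """Keep only read_2 calls at positions not already scored by read_1.

    read_2_calls is a list of (position, call_character) pairs.
    """
    seen = set(read_1_positions)
    return [(pos, call) for pos, call in read_2_calls if pos not in seen]

# Invented example values.
trimmed = drop_five_prime("ZzX.h", 2)
kept = drop_overlap([100, 105], [(105, "Z"), (110, "z")])
print(trimmed)
print(kept)
```

In the second call the read_2 call at position 105 is dropped because read_1 already scored that position, which is why --no_overlap can remove a noticeable share of the data when fragments are short.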
Output:
--comprehensive
    Specifying this option will merge the methylation information from all four possible strands into context-dependent output files. The default contexts are:
    - CpG context
    - CHG context
    - CHH context

--merge_non_CpG
    This will produce two output files (in --comprehensive mode) or eight strand-specific output files (default) for Cs in
    - CpG context
    - non-CpG context

--report
    Prints out a short methylation summary as well as the parameters used to run this script.
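The file counts quoted in this help text (twelve by default, three with --comprehensive, eight with --merge_non_CpG, two with both) follow from multiplying the number of contexts by the number of strand groups. A minimal Python sketch of that arithmetic, using a hypothetical helper name:

```python
def expected_output_files(comprehensive=False, merge_non_cpg=False):
    """Number of methylation call files per input file, per the counts above."""
    contexts = 2 if merge_non_cpg else 3  # CpG + non-CpG, or CpG + CHG + CHH
    strands = 1 if comprehensive else 4   # merged, or OT/CTOT/OB/CTOB
    return contexts * strands

for comprehensive in (False, True):
    for merge in (False, True):
        n = expected_output_files(comprehensive, merge)
        print(f"comprehensive={comprehensive}, merge_non_CpG={merge}: {n} files")
```

The count covers only the methylation call files; any summary written by --report is not included.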