comparison BSseeker2/README.md @ 1:8b26adf64adc draft default tip

V2.0.5
author weilong-guo
date Tue, 05 Nov 2013 01:55:39 -0500
parents e6df770c0e58
children
comparison
equal deleted inserted replaced
0:e6df770c0e58 1:8b26adf64adc
46 3. System requirements 46 3. System requirements
47 ============ 47 ============
48 48
49 * Linux or Mac OS platform 49 * Linux or Mac OS platform
50 * One of the following Aligner 50 * One of the following Aligner
51 - [bowtie](http://bowtie-bio.sourceforge.net/) 51 - [bowtie](http://bowtie-bio.sourceforge.net/) (fast)
52 - [bowtie2](http://bowtie-bio.sourceforge.net/bowtie2/) (Recommend) 52 - [bowtie2](http://bowtie-bio.sourceforge.net/bowtie2/) (Default)
53 - [soap](http://soap.genomics.org.cn/) 53 - [soap](http://soap.genomics.org.cn/)
54 - [rmap](http://www.cmb.usc.edu/people/andrewds/rmap/)
54 * [Python](http://www.python.org/download/) (Version 2.6 +) 55 * [Python](http://www.python.org/download/) (Version 2.6 +)
55 56
56 (It is normally pre-installed in Linux. Type " python -V" to see the installed version.) 57 (It is normally pre-installed in Linux. Type " python -V" to see the installed version.)
57 58
58 * [pysam](http://code.google.com/p/pysam/) package is needed. 59 * [pysam](http://code.google.com/p/pysam/) package is needed.
72 73
73 ##Usage : 74 ##Usage :
74 75
75 $ python FilterReads.py 76 $ python FilterReads.py
76 Usage: FilterReads.py -i <input> -o <output> [-k] 77 Usage: FilterReads.py -i <input> -o <output> [-k]
77 Author : Guo, Weilong; guoweilong@gmail.com; 2012-11-10 78 Author : Guo, Weilong; 2012-11-10
78 Unique reads for qseq/fastq/fasta/sequencce, and filter 79 Unique reads for qseq/fastq/fasta/sequencce, and filter
79 low quality file in qseq file. 80 low quality file in qseq file.
80 81
81 Options: 82 Options:
82 -h, --help show this help message and exit 83 -h, --help show this help message and exit
103 104
104 Options: 105 Options:
105 -h, --help show this help message and exit 106 -h, --help show this help message and exit
106 -f FILE, --file=FILE Input your reference genome file (fasta) 107 -f FILE, --file=FILE Input your reference genome file (fasta)
107 --aligner=ALIGNER Aligner program to perform the analysis: bowtie, 108 --aligner=ALIGNER Aligner program to perform the analysis: bowtie,
108 bowtie2, soap [Default: bowtie2] 109 bowtie2, soap, rmap [Default: bowtie]
109 -p PATH, --path=PATH Path to the aligner program. Defaults: 110 -p PATH, --path=PATH Path to the aligner program. Detected:
110 bowtie: /u/home/mcdb/weilong/install/bowtie-0.12.8 111 bowtie: ~/install/bowtie
111 bowtie2: 112 bowtie2: ~/install/bowtie2
112 /u/home/mcdb/weilong/install/bowtie2-2.0.0-beta7 113 rmap: ~/install/rmap_/bin
113 soap: /u/home/mcdb/weilong/install/soap2.21release/ 114 soap: ~/install/soap/
114 -d DBPATH, --db=DBPATH 115 -d DBPATH, --db=DBPATH
115 Path to the reference genome library (generated in 116 Path to the reference genome library (generated in
116 preprocessing genome) [Default: /u/home/mcdb/weilong/i 117 preprocessing genome) [Default: ~/install
117 nstall/BSseeker2/bs_utils/reference_genomes] 118 /BSseeker2/bs_utils/reference_genomes]
118 -v, --version show version of BS-Seeker2 119 -v, --version show version of BS-Seeker2
119 120
120 Reduced Representation Bisulfite Sequencing Options: 121 Reduced Representation Bisulfite Sequencing Options:
121 Use this options with conjuction of -r [--rrbs] 122 Use this options with conjuction of -r [--rrbs]
122 123
123 -r, --rrbs Build index specially for Reduced Representation 124 -r, --rrbs Build index specially for Reduced Representation
124 Bisulfite Sequencing experiments. Genome other than 125 Bisulfite Sequencing experiments. Genome other than
125 certain fragments will be masked. [Default: False] 126 certain fragments will be masked. [Default: False]
126 -l LOW_BOUND, --low=LOW_BOUND 127 -l LOW_BOUND, --low=LOW_BOUND
127 lower bound of fragment length (excluding recognition 128 lower bound of fragment length (excluding recognition
128 sequence such as C-CGG) [Default: 40] 129 sequence such as C-CGG) [Default: 20]
129 -u UP_BOUND, --up=UP_BOUND 130 -u UP_BOUND, --up=UP_BOUND
130 upper bound of fragment length (excluding recognition 131 upper bound of fragment length (excluding recognition
131 sequence such as C-CGG ends) [Default: 500] 132 sequence such as C-CGG ends) [Default: 500]
132 -c CUT_FORMAT, --cut-site=CUT_FORMAT 133 -c CUT_FORMAT, --cut-site=CUT_FORMAT
133 Cut sites of restriction enzyme. Ex: MspI(C-CGG), 134 Cut sites of restriction enzyme. Ex: MspI(C-CGG),
134 Mael:(C-TAG), double-enzyme MspI&Mael:(C-CGG,C-TAG). 135 Mael:(C-TAG), double-enzyme MspI&Mael:(C-CGG,C-TAG).
135 [Default: C-CGG] 136 [Default: C-CGG]
136 137
138
137 ##Example 139 ##Example
138 140
139 * Build genome index for WGBS using bowtie, path of bowtie should be included in $PATH 141 * Build genome index for WGBS using bowtie, path of bowtie should be included in $PATH
140 142
141 python bs_seeker2-build.py -f genome.fa --aligner=bowtie 143 python bs_seeker2-build.py -f genome.fa --aligner=bowtie
148 150
149 python bs_seeker2-build.py -f genome.fa -r -l 40 -u 400 --aligner=bowtie2 151 python bs_seeker2-build.py -f genome.fa -r -l 40 -u 400 --aligner=bowtie2
150 152
151 * Build genome index for RRBS library for double-enzyme : MspI (C-CGG) & ApeKI (G-CWGC, where W=A|T, see [IUPAC code](http://www.bioinformatics.org/sms/iupac.html)) 153 * Build genome index for RRBS library for double-enzyme : MspI (C-CGG) & ApeKI (G-CWGC, where W=A|T, see [IUPAC code](http://www.bioinformatics.org/sms/iupac.html))
152 154
153 python bs_seeker2-build.py -f genome.fa -r -c C-CGG,G-CWGC 155 python bs_seeker2-build.py -f genome.fa -r -c C-CGG,G-CWGC --aligner=bowtie
154 156
155 ##Tips: 157 ##Tips:
156 158
157 - Index built for BS-Seeker2 is different from the index for BS-Seeker 1. 159 - Index built for BS-Seeker2 is different from the index for BS-Seeker 1.
158 For RRBS, you need to specify "-r" in the parameters. Also, you need to specify LOW_BOUND and UP_BOUND for the range of fragment lengths according your protocol. 160 For RRBS, you need to specify "-r" in the parameters. Also, you need to specify LOW_BOUND and UP_BOUND for the range of fragment lengths according your protocol.
170 Module to map reads on 3-letter converted genome. 172 Module to map reads on 3-letter converted genome.
171 173
172 ##Usage : 174 ##Usage :
173 175
174 $ python ~/install/BSseeker2/bs_seeker2-align.py -h 176 $ python ~/install/BSseeker2/bs_seeker2-align.py -h
175 Usage: bs_seeker2-align.py [options] 177 Usage: bs_seeker2-align.py {-i <single> | -1 <mate1> -2 <mate2>} -g <genome.fa> [options]
176 178
177 Options: 179 Options:
178 -h, --help show this help message and exit 180 -h, --help show this help message and exit
179 181
180 For single end reads: 182 For single end reads:
181 -i INFILE, --input=INFILE 183 -i INFILE, --input=INFILE
182 Input your read file name (FORMAT: sequences, 184 Input read file (FORMAT: sequences, qseq, fasta,
183 fastq, qseq,fasta) 185 fastq). Ex: read.fa or read.fa.gz
184 186
185 For pair end reads: 187 For pair end reads:
186 -1 FILE, --input_1=FILE 188 -1 FILE, --input_1=FILE
187 Input your read file end 1 (FORMAT: sequences, 189 Input read file, mate 1 (FORMAT: sequences, qseq,
188 qseq, fasta, fastq) 190 fasta, fastq)
189 -2 FILE, --input_2=FILE 191 -2 FILE, --input_2=FILE
190 Input your read file end 2 (FORMAT: sequences, 192 Input read file, mate 2 (FORMAT: sequences, qseq,
191 qseq, fasta, fastq) 193 fasta, fastq)
192 --minins=MIN_INSERT_SIZE 194 -I MIN_INSERT_SIZE, --minins=MIN_INSERT_SIZE
193 The minimum insert size for valid paired-end 195 The minimum insert size for valid paired-end
194 alignments [Default: -1] 196 alignments [Default: 0]
195 --maxins=MAX_INSERT_SIZE 197 -X MAX_INSERT_SIZE, --maxins=MAX_INSERT_SIZE
196 The maximum insert size for valid paired-end 198 The maximum insert size for valid paired-end
197 alignments [Default: 400] 199 alignments [Default: 500]
198 200
199 Reduced Representation Bisulfite Sequencing Options: 201 Reduced Representation Bisulfite Sequencing Options:
200 -r, --rrbs Process reads from Reduced Representation Bisulfite 202 -r, --rrbs Map reads to the Reduced Representation genome
201 Sequencing experiments
202 -c pattern, --cut-site=pattern 203 -c pattern, --cut-site=pattern
203 Cutting sites of restriction enzyme. Ex: MspI(C-CGG), 204 Cutting sites of restriction enzyme. Ex: MspI(C-CGG),
204 Mael:(C-TAG), double-enzyme MspI&Mael:(C-CGG,C-TAG). 205 Mael:(C-TAG), double-enzyme MspI&Mael:(C-CGG,C-TAG).
206 [Default: C-CGG]
205 -L RRBS_LOW_BOUND, --low=RRBS_LOW_BOUND 207 -L RRBS_LOW_BOUND, --low=RRBS_LOW_BOUND
206 lower bound of fragment length (excluding C-CGG ends) 208 Lower bound of fragment length (excluding C-CGG ends)
207 [Default: 40] 209 [Default: 20]
208 -U RRBS_UP_BOUND, --up=RRBS_UP_BOUND 210 -U RRBS_UP_BOUND, --up=RRBS_UP_BOUND
209 upper bound of fragment length (excluding C-CGG ends) 211 Upper bound of fragment length (excluding C-CGG ends)
210 [Default: 500] 212 [Default: 500]
211 213
212 General options: 214 General options:
213 -t TAG, --tag=TAG [Y]es for undirectional lib, [N]o for directional 215 -t TAG, --tag=TAG [Y]es for undirectional lib, [N]o for directional
214 [Default: N] 216 [Default: N]
215 -s CUTNUMBER1, --start_base=CUTNUMBER1 217 -s CUTNUMBER1, --start_base=CUTNUMBER1
216 The first base of your read to be mapped [Default: 1] 218 The first cycle of the read to be mapped [Default: 1]
217 -e CUTNUMBER2, --end_base=CUTNUMBER2 219 -e CUTNUMBER2, --end_base=CUTNUMBER2
218 The last cycle number of your read to be mapped 220 The last cycle of the read to be mapped [Default: 200]
219 [Default: 200]
220 -a FILE, --adapter=FILE 221 -a FILE, --adapter=FILE
221 Input text file of your adaptor sequences (to be 222 Input text file of your adaptor sequences (to be
222 trimed from the 3'end of the reads). Input 1 seq for 223 trimmed from the 3'end of the reads, ). Input one seq
223 dir. lib., 2 seqs for undir. lib. One line per 224 for dir. lib., twon seqs for undir. lib. One line per
224 sequence 225 sequence. Only the first 10bp will be used
225 --am=ADAPTER_MISMATCH 226 --am=ADAPTER_MISMATCH
226 Number of mismatches allowed in adaptor [Default: 1] 227 Number of mismatches allowed in adapter [Default: 0]
227 -g GENOME, --genome=GENOME 228 -g GENOME, --genome=GENOME
228 Name of the reference genome (the same as the 229 Name of the reference genome (should be the same as
229 reference genome file in the preprocessing step) [ex. 230 "-f" in bs_seeker2-build.py ) [ex. chr21_hg18.fa]
230 chr21_hg18.fa] 231 -m NO_MISMATCHES, --mismatches=NO_MISMATCHES
231 -m INT_NO_MISMATCHES, --mismatches=INT_NO_MISMATCHES
232 Number of mismatches in one read [Default: 4] 232 Number of mismatches in one read [Default: 4]
233 --aligner=ALIGNER Aligner program to perform the analisys: bowtie, 233 --aligner=ALIGNER Aligner program for short reads mapping: bowtie,
234 bowtie2, soap [Default: bowtie2] 234 bowtie2, soap, rmap [Default: bowtie]
235 -p PATH, --path=PATH 235 -p PATH, --path=PATH
236 Path to the aligner program. Defaults: 236 Path to the aligner program. Detected:
237 bowtie: /u/home/mcdb/weilong/install/bowtie-0.12.8 237 bowtie: ~/install/bowtie
238 bowtie2: 238 bowtie2: ~/install/bowtie2
239 /u/home/mcdb/weilong/install/bowtie2-2.0.0-beta7 239 rmap: ~/install/rmap/bin
240 soap: /u/home/mcdb/weilong/soap2.21release/ 240 soap: ~/install/soap/
241 -d DBPATH, --db=DBPATH 241 -d DBPATH, --db=DBPATH
242 Path to the reference genome library (generated in 242 Path to the reference genome library (generated in
243 preprocessing genome) [Default: /u/home/mcdb/weilong/i 243 preprocessing genome) [Default: ~/i
244 nstall/BSseeker2/bs_utils/reference_genomes] 244 nstall/BSseeker2/bs_utils/reference_genomes]
245 -l NO_SPLIT, --split_line=NO_SPLIT 245 -l NO_SPLIT, --split_line=NO_SPLIT
246 Number of lines per split (the read file will be split 246 Number of lines per split (the read file will be split
247 into small files for mapping. The result will be 247 into small files for mapping. The result will be
248 merged. [Default: 4000000] 248 merged. [Default: 4000000]
249 -o OUTFILE, --output=OUTFILE 249 -o OUTFILE, --output=OUTFILE
250 The name of output file [INFILE.bs(se|pe|rrbs)] 250 The name of output file [INFILE.bs(se|pe|rrbs)]
251 -f FORMAT, --output-format=FORMAT 251 -f FORMAT, --output-format=FORMAT
252 Output format: bam, sam, bs_seeker1 [Default: bam] 252 Output format: bam, sam, bs_seeker1 [Default: bam]
253 --no-header Suppress SAM header lines [Default: False] 253 --no-header Suppress SAM header lines [Default: False]
254 --temp_dir=PATH The path to your temporary directory [Default: /tmp] 254 --temp_dir=PATH The path to your temporary directory [Detected: /tmp]
255 --XS=XS_FILTER Filter definition for tag XS, format X,Y. X=0.8 and 255 --XS=XS_FILTER Filter definition for tag XS, format X,Y. X=0.8 and
256 y=5 indicate that for one read, if #(mCH sites)/#(all 256 y=5 indicate that for one read, if #(mCH sites)/#(all
257 CH sites)>0.8 and #(mCH sites)>5, then tag XS=1; or 257 CH sites)>0.8 and #(mCH sites)>5, then tag XS=1; or
258 else tag XS=0. [Default: 0.5,5] 258 else tag XS=0. [Default: 0.5,5]
259 --multiple-hit Output reads with multiple hits to 259 --multiple-hit Output reads with multiple hits to
261 -v, --version show version of BS-Seeker2 261 -v, --version show version of BS-Seeker2
262 262
263 Aligner Options: 263 Aligner Options:
264 You may specify any additional options for the aligner. You just have 264 You may specify any additional options for the aligner. You just have
265 to prefix them with --bt- for bowtie, --bt2- for bowtie2, --soap- for 265 to prefix them with --bt- for bowtie, --bt2- for bowtie2, --soap- for
266 soap, and BS Seeker will pass them on. For example: --bt-p 4 will 266 soap, --rmap- for rmap, and BS Seeker will pass them on. For example:
267 increase the number of threads for bowtie to 4, --bt--tryhard will 267 --bt-p 4 will increase the number of threads for bowtie to 4, --bt--
268 instruct bowtie to try as hard as possible to find valid alignments 268 tryhard will instruct bowtie to try as hard as possible to find valid
269 when they exist, and so on. Be sure that you know what you are doing 269 alignments when they exist, and so on. Be sure that you know what you
270 when using these options! Also, we don't do any validation on the 270 are doing when using these options! Also, we don't do any validation
271 values. 271 on the values.
272
273 272
274 273
275 ##Examples : 274 ##Examples :
276 275
277 * Align from fasta format with bowtie2 (local alignment) for whole genome, allowing 3 mismatches 276 * WGBS library ; alignment mode, bowtie ; map to WGBS index
278 277
279 python bs_seeker2-align.py -i WGBS.fa -m 3 --aligner=bowtie2 -o WGBS.bam -f bam -g genome.fa 278 python bs_seeker2-align.py -i WGBS.fa --aligner=bowtie -o WGBS.bam -f bam -g genome.fa
280 279
281 * Align from qseq format for RRBS with bowtie, default parameters for RRBS fragments 280 * WGBS library ; alignment mode, bowtie2-local ; map to WGBS index
282 281
283 python bs_seeker2-align.py -i RRBS.fa --aligner=bowtie -o RRBS.sam -f sam -g genome.fa -r -a adapter.txt 282 python bs_seeker2-align.py -i WGBS.fa --aligner=bowtie2 -o WGBS.bam -f bam -g genome.fa
284 283
285 * Align from qseq format for RRBS with bowtie (end-to-end), specifying lengths of fragments ranging [40bp, 400bp] 284 * WGBS library ; alignment mode, bowtie2-end-to-end ; map to WGBS index
286 285
287 python bs_seeker2-align.py -i RRBS.qseq --aligner=bowtie2 --bt2--end-to-end -o RRBS.bam -f bam -g genome.fa -r --low=40 --up=400 -a adapter.txt 286 python bs_seeker2-align.py -i WGBS.fa -m 3 --aligner=bowtie2 -o WGBS.bam -f bam -g genome.fa --bt2--end-to-end
287
288 * RRBS library ; alignment mode, bowtie ; map to RR index
289
290 python bs_seeker2-align.py -i RRBS.fa --aligner=bowtie -o RRBS.bam -g genome.fa -r -a adapter.txt
291
292 * RRBS library ; alignment mode, bowtie ; map to WG index
293
294 python bs_seeker2-align.py -i RRBS.fa --aligner=bowtie -o RRBS.bam -g genome.fa -a adapter.txt
295
296 * RRBS library ; alignment mode, bowtie2-end-to-end ; map to WG index
297
298 python bs_seeker2-align.py -i RRBS.fa --aligner=bowtie -o RRBS.bam -g genome.fa -a adapter.txt --bt2--end-to-end
299
300 * Align from qseq format for RRBS with bowtie, specifying lengths of fragments ranging [40bp, 400bp]
301
302 python bs_seeker2-align.py -i RRBS.qseq --aligner=bowtie -o RRBS.bam -f bam -g genome.fa -r --low=40 --up=400 -a adapter.txt
288 303
289 The parameters '--low' and '--up' should be the same with corresponding parameters when building the genome index 304 The parameters '--low' and '--up' should be the same with corresponding parameters when building the genome index
290 305
306 * WGBS library ; alignment mode, bowtie ; map to WGBS index; use 8 threads for alignment
307
308 python bs_seeker2-align.py -i WGBS.fa --aligner=bowtie -o WGBS.bam -f bam -g genome.fa --bt-p 4
309
310 BS-Seeker2 will run TWO bowtie instances in parallel.
291 311
292 312
293 ##Input file: 313 ##Input file:
294 314
295 - Adapter.txt (example) : 315 - Adapter.txt (example) :
339 359
340 - It's always better to use a wider range for fragment length. 360 - It's always better to use a wider range for fragment length.
341 361
342 For example, if 95% of reads come from fragments with length range [50bp, 250bp], you'd better choose [40bp, 300bp]. 362 For example, if 95% of reads come from fragments with length range [50bp, 250bp], you'd better choose [40bp, 300bp].
343 363
364 - Fewer mismatches for the 'local alignment' mode.
365
366 As the 'local alignment', the bad sequenced bases are usually trimmed, and would not be considered by the parameter "-m".
367 It is suggested to user fewer mismatches for the 'local alignment' mode.
344 368
345 (3) bs_seeker2-call_methylation.py 369 (3) bs_seeker2-call_methylation.py
346 ------------ 370 ------------
347 371
348 372
349 This module calls methylation levels from the mapping result. 373 This module calls methylation levels from the mapping result.
350 374
351 ##Usage: 375 ##Usage:
352 376
353 $ python bs_seeker2-call_methylation.py -h 377 $ python bs_seeker2-call_methylation.py -h
354 Usage: bs_seeker2-call_methylation.py [options]
355
356 Options: 378 Options:
357 -h, --help show this help message and exit 379 -h, --help show this help message and exit
358 -i INFILE, --input=INFILE 380 -i INFILE, --input=INFILE
359 BAM output from bs_seeker2-align.py 381 BAM output from bs_seeker2-align.py
360 -d DBPATH, --db=DBPATH 382 -d DBPATH, --db=DBPATH
361 Path to the reference genome library (generated in 383 Path to the reference genome library (generated in
362 preprocessing genome) [Default: /u/home/mcdb/weilong/i 384 preprocessing genome) [Default: ~/install
363 nstall/BSseeker2/bs_utils/reference_genomes] 385 /BSseeker2/bs_utils/reference_genomes]
364 -o OUTFILE, --output-prefix=OUTFILE 386 -o OUTFILE, --output-prefix=OUTFILE
365 The output prefix to create ATCGmap and wiggle files 387 The output prefix to create ATCGmap and wiggle files
366 [INFILE] 388 [INFILE]
367 --wig=OUTFILE The output .wig file [INFILE.wig] 389 --wig=OUTFILE The output .wig file [INFILE.wig]
368 --CGmap=OUTFILE The output .CGmap file [INFILE.CGmap] 390 --CGmap=OUTFILE The output .CGmap file [INFILE.CGmap]
369 --ATCGmap=OUTFILE The output .ATCGmap file [INFILE.ATCGmap] 391 --ATCGmap=OUTFILE The output .ATCGmap file [INFILE.ATCGmap]
370 -x, --rm-SX Removed reads with tag 'XS:i:1', which would be 392 -x, --rm-SX Removed reads with tag 'XS:i:1', which would be
371 considered as not fully converted by bisulfite 393 considered as not fully converted by bisulfite
372 treatment [Default: False] 394 treatment [Default: False]
395 --txt Show CGmap and ATCGmap in .gz [Default: False]
373 -r READ_NO, --read-no=READ_NO 396 -r READ_NO, --read-no=READ_NO
374 The least number of reads covering one site to be 397 The least number of reads covering one site to be
375 shown in wig file [Default: 1] 398 shown in wig file [Default: 1]
376 -v, --version show version of BS-Seeker2 399 -v, --version show version of BS-Seeker2
377 400
428 (1) chromosome 451 (1) chromosome
429 (2) nucleotide on Watson (+) strand 452 (2) nucleotide on Watson (+) strand
430 (3) position 453 (3) position
431 (4) context (CG/CHG/CHH) 454 (4) context (CG/CHG/CHH)
432 (5) dinucleotide-context (CA/CC/CG/CT) 455 (5) dinucleotide-context (CA/CC/CG/CT)
433 (6) methyltion-level = #-of-C / (#-of-C + #-of-T) 456 (6) methylation-level = #_of_C / (#_of_C + #_of_T).
434 (7) #-of-C (methylated) 457 (7) #_of_C (methylated C, the count of reads showing C here)
435 (8) (#-ofC + #-of-T) (all cytosines) 458 (8) = #_of_C + #_of_T (all Cytosines, the count of reads showing C or T here)
436 459
437 460
438 - ATCGmap file 461 - ATCGmap file
439 462
440 Sample: 463 Sample:
441 464
442 chr1 T 3009410 -- -- 0 10 0 0 0 0 0 0 0 0 na 465 chr1 T 3009410 -- -- 0 10 0 0 0 0 0 0 0 0 na
443 chr1 C 3009411 CHH CC 0 10 0 0 0 0 0 0 0 0 0.0 466 chr1 C 3009411 CHH CC 0 10 0 0 0 0 0 0 0 0 0.0
444 chr1 C 3009412 CHG CC 0 10 0 0 0 0 0 0 0 0 0.0 467 chr1 C 3009412 CHG CC 0 10 0 0 0 0 0 0 0 0 0.0
445 chr1 C 3009413 CG CG 0 10 50 0 0 0 0 0 0 0 0.833333333333 468 chr1 C 3009413 CG CG 0 10 50 0 0 0 0 0 0 0 0.83
446 469
447 470
448 Format descriptions: 471 Format descriptions:
449 472
450 (1) chromosome 473 (1) chromosome
465 (12) # of reads from Crick strand mapped here, support T on Watson strand and A on Crick strand 488 (12) # of reads from Crick strand mapped here, support T on Watson strand and A on Crick strand
466 (13) # of reads from Crick strand mapped here, support C on Watson strand and G on Crick strand 489 (13) # of reads from Crick strand mapped here, support C on Watson strand and G on Crick strand
467 (14) # of reads from Crick strand mapped here, support G on Watson strand and C on Crick strand 490 (14) # of reads from Crick strand mapped here, support G on Watson strand and C on Crick strand
468 (15) # of reads from Crick strand mapped here, support N 491 (15) # of reads from Crick strand mapped here, support N
469 492
470 (16) methylation_level = #C/(#C+#T) = (C8+C14)/(C7+C8+C11+C14); "nan" means none reads support C/T at this position. 493 (16) methylation_level = #C/(#C+#T) = C8/(C7+C8) for Watson strand, =C14/(C11+C14) for Crick strand;
494 "nan" means none reads support C/T at this position.
471 495
472 496
473 497
474 Contact Information 498 Contact Information
475 ============ 499 ============
476 500
477 If you still have questions on BS-Seeker 2, or you find bugs when using BS-Seeker 2, or you have suggestions, please write email to guoweilong@gmail.com (Weilong Guo). 501 If you still have questions on BS-Seeker 2, or you find bugs when using BS-Seeker 2, or you have suggestions, please write email to [Weilong Guo](guoweilong@gmail.com).
478 502
479 503
480 504
481 Questions & Answers 505 Questions & Answers
482 ============ 506 ============
554 mv master BSSeeker2.zip 578 mv master BSSeeker2.zip
555 unzip BSSeeker2.zip 579 unzip BSSeeker2.zip
556 cd BSseeker2-master/ 580 cd BSseeker2-master/
557 581
558 582
559 (4)Run BS-Seeker2 583 (4)Run BS-Seeker2
560 584
561 Q: Can I add the path of BS-Seeker2's *.py to the $PATH, so I can call 585 Q: Can I add the path of BS-Seeker2's *.py to the $PATH, so I can call
562 BS-Seeker2 from anywhere? 586 BS-Seeker2 from anywhere?
563 587
564 A: If you're using the "python" from path "/usr/bin/python", you can directly 588 A: If you're using the "python" from path "/usr/bin/python", you can directly
583 607
584 bs_seeker_build.py -h 608 bs_seeker_build.py -h
585 bs_seeker-align.py -h 609 bs_seeker-align.py -h
586 bs_seeker-call_methylation.py -h 610 bs_seeker-call_methylation.py -h
587 611
588 612 (5) Unique alignment
589 613
590 614 Q: If I want to only keep alignments that map uniquely, is this an argument I should supply directly
615 to Bowtie2 (via BS Seeker 2's command line option), or is this an option that's available in
616 BS Seeker 2 itself?
617
618 A: BS-Seeker2 reports unique alignment by default already. If you want to know how many reads
619 could have multiple hits, run BS-Seeker2 with parameter "--multiple-hit".
620
621
622 (6) Output
623
624 Q: In CGmap files, why some lines shown "--" but not a motif (CG/CHG/CHH), for example:
625
626 chr01 C 4303711 -- CC 0.0 0 2
627 chr01 C 4303712 -- CN 0.0 0 2
628
629 A: In this example, the site 4303713 is "N" in genome, thus we could not decide the explict pattern.
630
631 (7) Algorithm to remove the adapter.
632
633 Q: What's the algorithm to remove the adapter
634
635 A: BS-Seeker2 has built-in algorithm for removing the adapter, which is developed by [Weilong Guo](http://bioinfo.au.tsinghua.edu.cn/member/wguo/index.html).
636
637 First, if the adapter was provided as "AGATCGGAAGAGCACACGTC", only the first 10bp would be used.
638 Second, we use semi-local alignment strategy for removing the adapters.
639 Exmaple:
640
641 Read : ACCGCGTTGATCGAGTACGTACGTGGGTC
642 Adapter : ....................ACGTGGGTCCCG
643
644 no_mismatch : the maximum number allowed for mismatches
645
646 Algorithm: (allowing 1 mismatch)
647 -Step 1:
648 ACCGCGTTGATCGAGTACGTACGTGGGTC
649 ||XX
650 ACGTGGGTCCCG
651 -Step 2:
652 ACCGCGTTGATCGAGTACGTACGTGGGTC
653 X||X
654 .ACGTGGGTCCCG
655 -Step 3:
656 ACCGCGTTGATCGAGTACGTACGTGGGTC
657 XX
658 ..ACGTGGGTCCCG
659 -Step ...
660 -Step N:
661 ACCGCGTTGATCGAGTACGTACGTGGGTC
662 |||||||||
663 ....................ACGTGGGTCCCG
664 Success & return!
665
666 Third, we also removed the synthesized bases at the end of RRBS fragments.
667 Take the "C-CGG" cutting site as example,
668
669 - - C|U G G - - =>cut=> - - C =>add=> - - C|C G =>sequencing
670 - - G G C|C - - - - G G C - - G G C
671
672 In our algorithm, the "CG" in "--CCG" (upper strand) was trimmed, in order to get accurate methyaltion level.
673
674
675
676
677