annotate BSseeker2/README.md @ 0:e6df770c0e58 draft

Initial upload
author weilong-guo
date Fri, 12 Jul 2013 18:47:28 -0400
BS-Seeker2
=========

BS-Seeker2 (BS Seeker 2) performs accurate and fast mapping of bisulfite-treated short reads. BS-Seeker2 is an updated version of BS-Seeker.

0. Availability
============

Homepage of [BS-Seeker2](http://pellegrini.mcdb.ucla.edu/BS_Seeker2/).

The source code for this package is available from
[https://github.com/BSSeeker/BSseeker2](https://github.com/BSSeeker/BSseeker2).
An instance of BS-Seeker2 is also available in Galaxy at [http://galaxy.hoffman2.idre.ucla.edu](http://galaxy.hoffman2.idre.ucla.edu)
(tool label: "NGS: Methylation Mapping"/"Methylation Map with BS Seeker2").


1. Notable new features
============
* Reduced index for RRBS, which accelerates mapping and increases mappability
* Local alignment with Bowtie 2, which increases mappability

2. Other features
============
* Supported library types
    - whole-genome bisulfite sequencing (WGBS)
    - reduced representation bisulfite sequencing (RRBS)

* Supported input file formats
    - [fasta](http://en.wikipedia.org/wiki/FASTA_format)
    - [fastq](http://en.wikipedia.org/wiki/FASTQ_format)
    - [qseq](http://jumpgate.caltech.edu/wiki/QSeq)
    - pure sequence (one sequence per line)

* Supported alignment tools
    - [bowtie](http://bowtie-bio.sourceforge.net/index.shtml): single-seed
    - [bowtie2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml): multiple-seed, gapped alignment
        - [local alignment](http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#local-alignment-example)
        - [end-to-end alignment](http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#end-to-end-alignment-example)
    - [SOAP](http://soap.genomics.org.cn/)

* Supported output formats for mapping results
    - [BAM](http://genome.ucsc.edu/FAQ/FAQformat.html#format5.1)
    - [SAM](http://samtools.sourceforge.net/)
    - [BS-Seeker 1](http://pellegrini.mcdb.ucla.edu/BS_Seeker/USAGE.html)

3. System requirements
============

* Linux or Mac OS platform
* One of the following aligners:
    - [bowtie](http://bowtie-bio.sourceforge.net/)
    - [bowtie2](http://bowtie-bio.sourceforge.net/bowtie2/) (recommended)
    - [soap](http://soap.genomics.org.cn/)
* [Python](http://www.python.org/download/) (version 2.6+)

  (It is normally pre-installed on Linux. Type `python -V` to see the installed version.)

* The [pysam](http://code.google.com/p/pysam/) package

  (Read "Questions & Answers" if you have problems installing this package.)


4. Module descriptions
============

(0) FilterReads.py
------------

Optional and independent module.
Some reads may be heavily over-amplified during PCR. This script helps you obtain unique reads before mapping; whether to filter reads before mapping is up to you.

## Usage

    $ python FilterReads.py
    Usage: FilterReads.py -i <input> -o <output> [-k]
    Author : Guo, Weilong; guoweilong@gmail.com; 2012-11-10
    Unique reads for qseq/fastq/fasta/sequencce, and filter
    low quality file in qseq file.

    Options:
      -h, --help  show this help message and exit
      -i FILE     Name of the input qseq/fastq/fasta/sequence file
      -o FILE     Name of the output file
      -k          Would not filter low quality reads if specified


## Tip

- This step is not recommended for RRBS libraries. RRBS reads start at restriction sites, so many independent reads come from exactly the same locations, and identical reads are expected even without PCR over-amplification.

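A minimal sketch of read de-duplication, assuming that "unique reads" means reads with an identical sequence and that the input is well-formed 4-line FASTQ. It is not the code of FilterReads.py (which also accepts qseq/fasta/plain sequences and filters low-quality qseq reads); the names `parse_fastq` and `unique_reads` are hypothetical.

```python
def parse_fastq(lines):
    """Yield (header, sequence, quality) tuples from 4-line FASTQ records."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # skip blank lines, e.g. at the end of the file
        seq = next(it)
        next(it)  # the '+' separator line
        qual = next(it)
        yield header, seq, qual


def unique_reads(records):
    """Keep the first record seen for each distinct read sequence."""
    seen = set()
    kept = []
    for header, seq, qual in records:
        if seq not in seen:
            seen.add(seq)
            kept.append((header, seq, qual))
    return kept


if __name__ == "__main__":
    fastq = ["@r1", "ACGT", "+", "IIII",
             "@r2", "ACGT", "+", "IIII",  # same sequence as r1
             "@r3", "TTGA", "+", "IIII"]
    print([h for h, _, _ in unique_reads(parse_fastq(fastq))])  # ['@r1', '@r3']
```

For a real file, pass the open file handle to `parse_fastq`. Keeping a set of seen sequences is simple, but it holds every distinct sequence in memory.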

(1) bs_seeker2-build.py
------------

Module to build the index for BS-Seeker2.


## Usage

    $ python bs_seeker2-build.py -h
    Usage: bs_seeker2-build.py [options]

    Options:
      -h, --help            show this help message and exit
      -f FILE, --file=FILE  Input your reference genome file (fasta)
      --aligner=ALIGNER     Aligner program to perform the analysis: bowtie,
                            bowtie2, soap [Default: bowtie2]
      -p PATH, --path=PATH  Path to the aligner program. Defaults:
                            bowtie: /u/home/mcdb/weilong/install/bowtie-0.12.8
                            bowtie2:
                            /u/home/mcdb/weilong/install/bowtie2-2.0.0-beta7
                            soap: /u/home/mcdb/weilong/install/soap2.21release/
      -d DBPATH, --db=DBPATH
                            Path to the reference genome library (generated in
                            preprocessing genome) [Default:
                            /u/home/mcdb/weilong/install/BSseeker2/bs_utils/reference_genomes]
      -v, --version         show version of BS-Seeker2

      Reduced Representation Bisulfite Sequencing Options:
        Use this options with conjuction of -r [--rrbs]

        -r, --rrbs          Build index specially for Reduced Representation
                            Bisulfite Sequencing experiments. Genome other than
                            certain fragments will be masked. [Default: False]
        -l LOW_BOUND, --low=LOW_BOUND
                            lower bound of fragment length (excluding recognition
                            sequence such as C-CGG) [Default: 40]
        -u UP_BOUND, --up=UP_BOUND
                            upper bound of fragment length (excluding recognition
                            sequence such as C-CGG ends) [Default: 500]
        -c CUT_FORMAT, --cut-site=CUT_FORMAT
                            Cut sites of restriction enzyme. Ex: MspI(C-CGG),
                            Mael:(C-TAG), double-enzyme MspI&Mael:(C-CGG,C-TAG).
                            [Default: C-CGG]

## Examples

* Build a genome index for WGBS using bowtie (the bowtie directory must be in your `$PATH`):

        python bs_seeker2-build.py -f genome.fa --aligner=bowtie

* Build a genome index for RRBS with the default fragment-length parameters, specifying the path to bowtie2:

        python bs_seeker2-build.py -f genome.fa --aligner=bowtie2 -p ~/install/bowtie2-2.0.0-beta7/ -r

* Build a genome index for an RRBS library using bowtie2, with fragment lengths in the range [40 bp, 400 bp]:

        python bs_seeker2-build.py -f genome.fa -r -l 40 -u 400 --aligner=bowtie2

* Build a genome index for an RRBS library digested with two enzymes, MspI (C-CGG) and ApeKI (G-CWGC, where W = A or T; see [IUPAC code](http://www.bioinformatics.org/sms/iupac.html)):

        python bs_seeker2-build.py -f genome.fa -r -c C-CGG,G-CWGC

## Tips

- The index built for BS-Seeker2 is different from the index for BS-Seeker 1.
  For RRBS, you need to specify `-r`, as well as LOW_BOUND and UP_BOUND for the range of fragment lengths according to your protocol.

- The fragment length is different from the read length. Fragments are the DNA fragments you obtain in the size-selection step (e.g. gel cut or AMPure beads), and their lengths are expected to fall within a range, such as [50 bp, 250 bp].

- The indexes for RRBS and WGBS are different. Also, an RRBS index is specific to its fragment-length parameters (LOW_BOUND and UP_BOUND).
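To make the RRBS options concrete, the sketch below performs an in-silico digestion: it finds every cut site written in the `-c` notation (the `-` marks where the enzyme cuts, IUPAC codes allowed) and keeps the fragments whose length lies in `[low, up]`; the rest of the genome corresponds to what the help text calls "masked". Assumptions: only the forward strand is scanned (sufficient for palindromic sites such as C-CGG and G-CWGC), and length is measured between consecutive cuts, which may differ by a few bases from how bs_seeker2-build.py counts it (its help says the recognition sequence is excluded). The function names are hypothetical; this is not BS-Seeker2's indexing code.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def cut_positions(genome, cut_sites):
    """Sorted 0-based cut coordinates for a spec such as 'C-CGG,G-CWGC'.

    Each site contains one '-' marking where the enzyme cuts inside the motif.
    """
    genome = genome.upper()
    cuts = set()
    for site in cut_sites.split(","):
        offset = site.index("-")  # cut offset within the motif
        motif = site.replace("-", "").upper()
        pattern = "".join(IUPAC[base] for base in motif)
        # A zero-width lookahead also finds overlapping occurrences.
        for match in re.finditer("(?=" + pattern + ")", genome):
            cuts.add(match.start() + offset)
    return sorted(cuts)


def rrbs_fragments(genome, cut_sites="C-CGG", low=40, up=500):
    """Half-open (start, end) fragments between consecutive cuts, low <= length <= up."""
    cuts = cut_positions(genome, cut_sites)
    return [(s, e) for s, e in zip(cuts, cuts[1:]) if low <= e - s <= up]


if __name__ == "__main__":
    genome = "ACCGGTTTTCCGGAAAACCGGA"
    print(cut_positions(genome, "C-CGG"))          # [2, 10, 18]
    print(rrbs_fragments(genome, "C-CGG", 5, 10))  # [(2, 10), (10, 18)]
```

The defaults `low=40` and `up=500` mirror the defaults shown in the help output; with them, the toy genome above yields no fragments at all, which is why an RRBS index has to be rebuilt whenever the bounds change.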


(2) bs_seeker2-align.py
------------

Module to map reads to the 3-letter-converted genome.
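The idea behind the 3-letter genome fits in a few lines: after every C is converted to T in both the read and the reference, a read aligns the same way whether its cytosines were methylated (still C) or unmethylated (read as T after bisulfite treatment). This is a simplified Python 3 sketch with hypothetical helper names; BS-Seeker2's actual index construction and strand handling are not shown.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T/N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def c_to_t(seq):
    """3-letter conversion of the forward strand: every C becomes T."""
    return seq.upper().replace("C", "T")


def g_to_a(seq):
    """Every G becomes A: the C->T conversion of the reverse strand,
    expressed in forward-strand coordinates."""
    return seq.upper().replace("G", "A")


if __name__ == "__main__":
    genome = "TACGTTCAG"
    # Two reads from the same locus: the CpG cytosine (index 2) is methylated
    # in one and unmethylated in the other; the non-CpG C (index 6) is converted in both.
    methylated_read = "TACGTTTAG"
    unmethylated_read = "TATGTTTAG"
    print(c_to_t(genome))  # TATGTTTAG
    print(c_to_t(methylated_read) == c_to_t(genome) == c_to_t(unmethylated_read))  # True
```

Because both reads collapse to the same 3-letter string, the aligner never sees methylation differences as mismatches; the methylation state is recovered afterwards by comparing the original read with the unconverted reference.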

## Usage

    $ python ~/install/BSseeker2/bs_seeker2-align.py -h
    Usage: bs_seeker2-align.py [options]

    Options:
      -h, --help            show this help message and exit

      For single end reads:
        -i INFILE, --input=INFILE
                            Input your read file name (FORMAT: sequences,
                            fastq, qseq,fasta)

      For pair end reads:
        -1 FILE, --input_1=FILE
                            Input your read file end 1 (FORMAT: sequences,
                            qseq, fasta, fastq)
        -2 FILE, --input_2=FILE
                            Input your read file end 2 (FORMAT: sequences,
                            qseq, fasta, fastq)
        --minins=MIN_INSERT_SIZE
                            The minimum insert size for valid paired-end
                            alignments [Default: -1]
        --maxins=MAX_INSERT_SIZE
                            The maximum insert size for valid paired-end
                            alignments [Default: 400]

      Reduced Representation Bisulfite Sequencing Options:
        -r, --rrbs          Process reads from Reduced Representation Bisulfite
                            Sequencing experiments
        -c pattern, --cut-site=pattern
                            Cutting sites of restriction enzyme. Ex: MspI(C-CGG),
                            Mael:(C-TAG), double-enzyme MspI&Mael:(C-CGG,C-TAG).
        -L RRBS_LOW_BOUND, --low=RRBS_LOW_BOUND
                            lower bound of fragment length (excluding C-CGG ends)
                            [Default: 40]
        -U RRBS_UP_BOUND, --up=RRBS_UP_BOUND
                            upper bound of fragment length (excluding C-CGG ends)
                            [Default: 500]

      General options:
        -t TAG, --tag=TAG   [Y]es for undirectional lib, [N]o for directional
                            [Default: N]
        -s CUTNUMBER1, --start_base=CUTNUMBER1
                            The first base of your read to be mapped [Default: 1]
        -e CUTNUMBER2, --end_base=CUTNUMBER2
                            The last cycle number of your read to be mapped
                            [Default: 200]
        -a FILE, --adapter=FILE
                            Input text file of your adaptor sequences (to be
                            trimed from the 3'end of the reads). Input 1 seq for
                            dir. lib., 2 seqs for undir. lib. One line per
                            sequence
        --am=ADAPTER_MISMATCH
                            Number of mismatches allowed in adaptor [Default: 1]
        -g GENOME, --genome=GENOME
                            Name of the reference genome (the same as the
                            reference genome file in the preprocessing step) [ex.
                            chr21_hg18.fa]
        -m INT_NO_MISMATCHES, --mismatches=INT_NO_MISMATCHES
                            Number of mismatches in one read [Default: 4]
        --aligner=ALIGNER   Aligner program to perform the analisys: bowtie,
                            bowtie2, soap [Default: bowtie2]
        -p PATH, --path=PATH
                            Path to the aligner program. Defaults:
                            bowtie: /u/home/mcdb/weilong/install/bowtie-0.12.8
                            bowtie2:
                            /u/home/mcdb/weilong/install/bowtie2-2.0.0-beta7
                            soap: /u/home/mcdb/weilong/soap2.21release/
        -d DBPATH, --db=DBPATH
                            Path to the reference genome library (generated in
                            preprocessing genome) [Default:
                            /u/home/mcdb/weilong/install/BSseeker2/bs_utils/reference_genomes]
        -l NO_SPLIT, --split_line=NO_SPLIT
                            Number of lines per split (the read file will be split
                            into small files for mapping. The result will be
                            merged. [Default: 4000000]
        -o OUTFILE, --output=OUTFILE
                            The name of output file [INFILE.bs(se|pe|rrbs)]
        -f FORMAT, --output-format=FORMAT
                            Output format: bam, sam, bs_seeker1 [Default: bam]
        --no-header         Suppress SAM header lines [Default: False]
        --temp_dir=PATH     The path to your temporary directory [Default: /tmp]
        --XS=XS_FILTER      Filter definition for tag XS, format X,Y. X=0.8 and
                            y=5 indicate that for one read, if #(mCH sites)/#(all
                            CH sites)>0.8 and #(mCH sites)>5, then tag XS=1; or
                            else tag XS=0. [Default: 0.5,5]
        --multiple-hit      Output reads with multiple hits to
                            file"Multiple_hit.fa"
        -v, --version       show version of BS-Seeker2

      Aligner Options:
        You may specify any additional options for the aligner. You just have
        to prefix them with --bt- for bowtie, --bt2- for bowtie2, --soap- for
        soap, and BS Seeker will pass them on. For example: --bt-p 4 will
        increase the number of threads for bowtie to 4, --bt--tryhard will
        instruct bowtie to try as hard as possible to find valid alignments
        when they exist, and so on. Be sure that you know what you are doing
        when using these options! Also, we don't do any validation on the
        values.
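The prefix rule described under "Aligner Options" can be sketched as follows: options that start with the selected aligner's prefix lose the prefix (`--bt-p 4` becomes `-p 4`, `--bt--tryhard` becomes `--tryhard`) and are collected for the aligner, while everything else stays with BS-Seeker2. This is an illustrative reimplementation, not BS-Seeker2's option parser; the function name is hypothetical, and it assumes that a token following a prefixed option is that option's value unless it starts with `-`.

```python
# Prefixes from the "Aligner Options" help text, keyed by --aligner value.
PREFIXES = {"bowtie": "--bt-", "bowtie2": "--bt2-", "soap": "--soap-"}


def split_aligner_args(argv, aligner):
    """Split argv into (bs_seeker_args, aligner_args) for the chosen aligner."""
    prefix = PREFIXES[aligner]
    keep = len(prefix) - 1  # index of the '-' that starts the forwarded option
    own, forwarded = [], []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg.startswith(prefix):
            forwarded.append(arg[keep:])  # '--bt-p' -> '-p', '--bt--tryhard' -> '--tryhard'
            # Assumption: a following token that does not start with '-' is the value.
            if i + 1 < len(argv) and not argv[i + 1].startswith("-"):
                forwarded.append(argv[i + 1])
                i += 1
        else:
            own.append(arg)
        i += 1
    return own, forwarded


if __name__ == "__main__":
    argv = ["-i", "WGBS.fa", "--bt2-p", "4", "--bt2--end-to-end", "-g", "genome.fa"]
    print(split_aligner_args(argv, "bowtie2"))
    # (['-i', 'WGBS.fa', '-g', 'genome.fa'], ['-p', '4', '--end-to-end'])
```

Because each prefix includes its trailing `-`, `--bt2-p` is never mistaken for a bowtie option when `--aligner=bowtie` is selected. As the help text warns, nothing here validates the forwarded values.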


## Examples

* Align reads in fasta format with bowtie2 (local alignment) for WGBS, allowing 3 mismatches:

        python bs_seeker2-align.py -i WGBS.fa -m 3 --aligner=bowtie2 -o WGBS.bam -f bam -g genome.fa

* Align RRBS reads in fasta format with bowtie, using the default RRBS fragment-length parameters:

        python bs_seeker2-align.py -i RRBS.fa --aligner=bowtie -o RRBS.sam -f sam -g genome.fa -r -a adapter.txt

* Align RRBS reads in qseq format with bowtie2 (end-to-end alignment), with fragment lengths in the range [40 bp, 400 bp]:

        python bs_seeker2-align.py -i RRBS.qseq --aligner=bowtie2 --bt2--end-to-end -o RRBS.bam -f bam -g genome.fa -r --low=40 --up=400 -a adapter.txt

The parameters `--low` and `--up` must match the values used when building the genome index (`-l` and `-u` in bs_seeker2-build.py). Note that in bs_seeker2-align.py the short option `-l` means `--split_line`; the short forms of `--low` and `--up` are `-L` and `-U`.
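Since an RRBS index is tied to its fragment bounds, one way to keep the build and align steps consistent is to generate both command lines from a single pair of values. The sketch below only assembles argument lists from the options documented above; the helper names and file names are hypothetical. It deliberately uses `--low`/`--up` for bs_seeker2-align.py, where `-l` means `--split_line`.

```python
def build_cmd(genome, low, up, aligner="bowtie2"):
    """Argument list for bs_seeker2-build.py building an RRBS index."""
    return ["python", "bs_seeker2-build.py", "-f", genome, "-r",
            "-l", str(low), "-u", str(up), "--aligner=" + aligner]


def align_cmd(reads, genome, low, up, output, aligner="bowtie2"):
    """Argument list for bs_seeker2-align.py reusing the same fragment bounds.

    Long forms are used on purpose: in bs_seeker2-align.py, -l means --split_line.
    """
    return ["python", "bs_seeker2-align.py", "-i", reads, "-g", genome, "-r",
            "--low=" + str(low), "--up=" + str(up),
            "--aligner=" + aligner, "-o", output, "-f", "bam"]


if __name__ == "__main__":
    LOW, UP = 40, 400  # one source of truth for both steps
    for cmd in (build_cmd("genome.fa", LOW, UP),
                align_cmd("RRBS.qseq", "genome.fa", LOW, UP, "RRBS.bam")):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.check_call` to run the corresponding step.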


## Input file

- adapter.txt (example):

        AGATCGGAAGAGCACACGTC
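The sketch below illustrates 3' adapter trimming with a mismatch allowance, the job controlled by `-a` and `--am`. It is a hypothetical simplification, not BS-Seeker2's trimming code: it handles a single adapter (as for a directional library), allows up to `max_mismatches` mismatches only when the whole adapter fits inside the read, and requires an exact match of at least `min_overlap` bases when only the 5' part of the adapter remains at the read's 3' end.

```python
def trim_adapter(read, adapter, max_mismatches=1, min_overlap=3):
    """Cut the read at the first position where the adapter (or its 5' part) starts.

    A full-length adapter match may contain up to max_mismatches mismatches;
    a partial adapter running off the 3' end must match exactly and be at
    least min_overlap bases long.
    """
    read, adapter = read.upper(), adapter.upper()
    for start in range(len(read) - min_overlap + 1):
        segment = read[start:start + len(adapter)]
        mismatches = sum(1 for a, b in zip(segment, adapter) if a != b)
        allowed = max_mismatches if len(segment) == len(adapter) else 0
        if mismatches <= allowed:
            return read[:start]
    return read


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGCACACGTC"  # the example adapter sequence above
    print(trim_adapter("TTGTTGTTTGAGATCGGAAG", adapter))  # TTGTTGTTTG
```

Requiring exact partial matches avoids a pitfall of a fixed mismatch allowance: with one mismatch allowed, any 1-base overlap at the very end of a read would "match" and every read would lose its last base.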


## Output files

- SAM file

Sample (fields are tab-separated in the actual file):

    10918 0 chr1 133859922 255 100M * 0 0 TGGTTGTTTTTGTTATAGTTTTTTGTTGTAGAGTTTTTTTTGGAAAGTTGTGTTTATTTTTTTTTTTGTTTGGGTTTTGTTTGAAAGGGGTGGATGAGTT * XO:Z:+FW XS:i:0 NM:i:3 XM:Z:x--yx-zzzy--y--y--zz-zyx-yx-y--------z------------x--------z--zzz----y----y--x-zyx--------y--------z XG:Z:-C_CGGCCGCCCCTGCTGCAGCCTCCCGCCGCAGAGTTTTCTTTGGAAAGTTGCGTTTATTTCTTCCCTTGTCTGGGCTGCGCCCGAAAGGGGCAGATGAGTC_AC
307
e6df770c0e58 Initial upload
weilong-guo
parents:
diff changeset
308
e6df770c0e58 Initial upload
weilong-guo
parents:
diff changeset
Format descriptions:

BS-Seeker2 specific tags:

    XO : orientation, forward or reverse (e.g. +FW)
    XS : 1 when the read is recognized as not fully converted by bisulfite treatment, otherwise 0
    NM : number of mismatches (standard SAM tag)
    XM : methylation pattern, where
        X: methylated CG
        x: un-methylated CG
        Y: methylated CHG
        y: un-methylated CHG
        Z: methylated CHH
        z: un-methylated CHH
    XG : genome sequence, extended by 2 bp on both ends, from 5' to 3'
    YR : RRBS only; serial id of the mapped fragment
    YS : RRBS only; start position of the mapped fragment
    YE : RRBS only; end position of the mapped fragment

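The XM string can be summarized per context with a few lines of Python. This is a minimal sketch, not part of BS-Seeker2: the helper names `get_tag` and `count_xm` are hypothetical, fields are split on whitespace (which covers both the tab-separated SAM format and the space-separated sample above, assuming no tag value contains a space), and characters other than X/x/Y/y/Z/z (such as '-') are assumed to carry no methylation call.

```python
from collections import Counter

# XM characters -> (context, methylated?), as listed in the tag description.
XM_CODES = {
    "X": ("CG", True), "x": ("CG", False),
    "Y": ("CHG", True), "y": ("CHG", False),
    "Z": ("CHH", True), "z": ("CHH", False),
}

def get_tag(sam_line, tag):
    """Return the value of an optional SAM field (e.g. 'XM'), or None if absent."""
    # Columns 1-11 are mandatory SAM fields; optional TAG:TYPE:VALUE fields follow.
    for field in sam_line.split()[11:]:
        name, _type, value = field.split(":", 2)
        if name == tag:
            return value
    return None

def count_xm(xm):
    """Count methylated/unmethylated calls per context in an XM string.

    Assumption: characters outside X/x/Y/y/Z/z (e.g. '-') are skipped.
    """
    counts = Counter()
    for ch in xm:
        if ch in XM_CODES:
            context, methylated = XM_CODES[ch]
            counts[(context, "meth" if methylated else "unmeth")] += 1
    return counts
```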
Note:
For reads mapped to the Crick (minus) strand, the 10th column of the SAM file is not the original read but its reverse complement.

##Tips:

- Removing adapters is recommended.

If you don't know the adapter sequence, please ask the person who generated the library for you.

If you can't ask, you can use de novo motif-finding tools (such as [DME](http://cb1.utdallas.edu/dme/index.htm) and [MEME](http://meme.nbcr.net/meme/cgi-bin/meme.cgi)) to find the enriched pattern in about 1000 reads.

Of course, you can also remove adapters first with other tools (such as [cutadapt](http://code.google.com/p/cutadapt/)).

- It's better to use a fragment-length range that is wider than the observed one.

For example, if 95% of reads come from fragments with lengths in [50bp, 250bp], you should choose [40bp, 300bp].

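To illustrate what adapter removal does to a read, here is a naive Python sketch. It is not BS-Seeker2's own trimming code and is much simpler than cutadapt: it assumes exact matches only (no mismatches, no quality trimming), and the function name `trim_adapter` and the `min_overlap` threshold are hypothetical. It cuts the read at the first full adapter occurrence, or removes a partial adapter of at least `min_overlap` bases at the 3' end, trying the longest overlap first.

```python
def trim_adapter(read, adapter, min_overlap=3):
    """Naively remove a 3' adapter from a read using exact matching only."""
    # Case 1: the whole adapter occurs in the read; cut at its first occurrence.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends with the first k adapter bases (k >= min_overlap).
    # Try the longest possible overlap first.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read
```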
(3) bs_seeker2-call_methylation.py
------------

This module calls methylation levels from the mapping result.

##Usage:

    $ python bs_seeker2-call_methylation.py -h
    Usage: bs_seeker2-call_methylation.py [options]

    Options:
      -h, --help            show this help message and exit
      -i INFILE, --input=INFILE
                            BAM output from bs_seeker2-align.py
      -d DBPATH, --db=DBPATH
                            Path to the reference genome library (generated in
                            preprocessing genome) [Default: /u/home/mcdb/weilong/install/BSseeker2/bs_utils/reference_genomes]
      -o OUTFILE, --output-prefix=OUTFILE
                            The output prefix to create ATCGmap and wiggle files
                            [INFILE]
      --wig=OUTFILE         The output .wig file [INFILE.wig]
      --CGmap=OUTFILE       The output .CGmap file [INFILE.CGmap]
      --ATCGmap=OUTFILE     The output .ATCGmap file [INFILE.ATCGmap]
      -x, --rm-SX           Removed reads with tag 'XS:i:1', which would be
                            considered as not fully converted by bisulfite
                            treatment [Default: False]
      -r READ_NO, --read-no=READ_NO
                            The least number of reads covering one site to be
                            shown in wig file [Default: 1]
      -v, --version         show version of BS-Seeker2

##Example:

-For WGBS (whole-genome bisulfite sequencing):

    python bs_seeker2-call_methylation.py -i WGBS.bam -o output --db /path/to/BSseeker2/bs_utils/reference_genomes/genome.fa_bowtie/

-For RRBS:

    python bs_seeker2-call_methylation.py -i RRBS.bam -o output --db /path/to/BSseeker2/bs_utils/reference_genomes/genome.fa_rrbs_40_400_bowtie2/

-For RRBS, removing unconverted reads (with tag XS:i:1):

    python bs_seeker2-call_methylation.py -x -i RRBS.bam -o output --db /path/to/BSseeker2/bs_utils/reference_genomes/genome.fa_rrbs_75_280_bowtie2/

-For RRBS, showing only sites covered by at least 10 reads in the WIG file:

    python bs_seeker2-call_methylation.py -r 10 -i RRBS.bam -o output --db /path/to/BSseeker2/bs_utils/reference_genomes/genome.fa_rrbs_75_280_bowtie2/

The folder "genome.fa\_rrbs\_40\_400\_bowtie2" is built in the first step, when building the genome index.

##Output files:

- wig file

Sample:

    variableStep chrom=chr1
    3000419 0.000000
    3000423 -0.2
    3000440 0.000000
    3000588 0.5
    3000593 -0.000000

Format descriptions:
WIG file format. A negative value in the 2nd column indicates a cytosine on the minus strand.

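A minimal Python sketch for reading such a WIG file back into per-site records, assuming the variableStep layout of the sample above (a header line, then "position value" lines); the function name `parse_wig` is hypothetical. Note that the sample contains "-0.000000": `float("-0.000000")` is `-0.0`, which is not `< 0`, so the strand is taken from the leading "-" in the text rather than from the parsed number.

```python
def parse_wig(lines):
    """Yield (chrom, position, strand, level) tuples from variableStep WIG lines.

    A negative value marks a cytosine on the minus strand; the level is the
    absolute value. The sign is read from the text so that "-0.000000"
    is still reported on the minus strand.
    """
    chrom = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        if line.startswith("variableStep"):
            # Header such as "variableStep chrom=chr1"; remember the chromosome.
            params = dict(p.split("=", 1) for p in line.split()[1:])
            chrom = params["chrom"]
            continue
        pos_text, value_text = line.split()[:2]
        strand = "-" if value_text.startswith("-") else "+"
        yield chrom, int(pos_text), strand, abs(float(value_text))
```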
- CGmap file

Sample:

    chr1 G 3000851 CHH CC 0.1 1 10
    chr1 C 3001624 CHG CA 0.0 0 9
    chr1 C 3001631 CG CG 1.0 5 5

Format descriptions:

    (1) chromosome
    (2) nucleotide on Watson (+) strand
    (3) position
    (4) context (CG/CHG/CHH)
    (5) dinucleotide-context (CA/CC/CG/CT)
    (6) methylation-level = #-of-C / (#-of-C + #-of-T)
    (7) #-of-C (methylated)
    (8) (#-of-C + #-of-T) (all cytosines)

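Columns (7) and (8) make it easy to summarize a CGmap file per context. The sketch below computes one common summary, the pooled ratio sum(#-of-C) / sum(#-of-C + #-of-T) over sites with enough coverage; it is an illustration, not a statistic that BS-Seeker2 itself reports, and the function name `context_levels` and the `min_coverage` parameter are hypothetical.

```python
from collections import defaultdict

def context_levels(cgmap_lines, min_coverage=1):
    """Pooled methylation level per context (CG/CHG/CHH) from CGmap lines.

    For each context, returns sum(column 7) / sum(column 8) over sites whose
    coverage (column 8, #-of-C + #-of-T) is at least min_coverage.
    """
    meth = defaultdict(int)   # summed #-of-C per context
    total = defaultdict(int)  # summed (#-of-C + #-of-T) per context
    for line in cgmap_lines:
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        context = fields[3]
        n_meth, n_total = int(fields[6]), int(fields[7])
        if n_total < min_coverage:
            continue
        meth[context] += n_meth
        total[context] += n_total
    return {ctx: float(meth[ctx]) / total[ctx] for ctx in total if total[ctx] > 0}
```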
- ATCGmap file

Sample:

    chr1 T 3009410 -- -- 0 10 0 0 0 0 0 0 0 0 na
    chr1 C 3009411 CHH CC 0 10 0 0 0 0 0 0 0 0 0.0
    chr1 C 3009412 CHG CC 0 10 0 0 0 0 0 0 0 0 0.0
    chr1 C 3009413 CG CG 0 10 50 0 0 0 0 0 0 0 0.833333333333

Format descriptions:

    (1) chromosome
    (2) nucleotide on Watson (+) strand
    (3) position
    (4) context (CG/CHG/CHH)
    (5) dinucleotide-context (CA/CC/CG/CT)

    (6) - (10) plus strand
    (6) # of reads from the Watson strand mapped here, supporting A on the Watson strand
    (7) # of reads from the Watson strand mapped here, supporting T on the Watson strand
    (8) # of reads from the Watson strand mapped here, supporting C on the Watson strand
    (9) # of reads from the Watson strand mapped here, supporting G on the Watson strand
    (10) # of reads from the Watson strand mapped here, supporting N

    (11) - (15) minus strand
    (11) # of reads from the Crick strand mapped here, supporting A on the Watson strand and T on the Crick strand
    (12) # of reads from the Crick strand mapped here, supporting T on the Watson strand and A on the Crick strand
    (13) # of reads from the Crick strand mapped here, supporting C on the Watson strand and G on the Crick strand
    (14) # of reads from the Crick strand mapped here, supporting G on the Watson strand and C on the Crick strand
    (15) # of reads from the Crick strand mapped here, supporting N

    (16) methylation_level = #C/(#C+#T) = (C8+C14)/(C7+C8+C11+C14); "na" means that the position is not a cytosine or that no reads support C/T at this position.

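The formula in column (16) can be checked directly against a line of the file. The sketch below recomputes it from columns 7, 8, 11 and 14 (1-based, as in the description). Following the sample, where the non-cytosine row (context "--") shows "na", it returns `None` for such rows and when no reads support C or T; the function name `atcgmap_level` is hypothetical. For the CG row of the sample it gives 50 / (10 + 50) = 0.8333...

```python
def atcgmap_level(line):
    """Recompute column 16 of an ATCGmap line as (C8+C14)/(C7+C8+C11+C14).

    Column numbers are 1-based, as in the format description. Returns None
    (shown as "na" in the file) for non-cytosine rows (context "--") or when
    no reads support C or T.
    """
    f = line.split()
    if f[3] == "--":
        return None
    c7, c8, c11, c14 = (int(f[i - 1]) for i in (7, 8, 11, 14))
    denominator = c7 + c8 + c11 + c14
    if denominator == 0:
        return None
    return float(c8 + c14) / denominator
```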
Contact Information
============

If you still have questions about BS-Seeker 2, find bugs when using it, or have suggestions, please email guoweilong@gmail.com (Weilong Guo).

Questions & Answers
============

(1) Speed up your alignment

Q: "It takes me days to do the alignment for one lane" ...

A: Yes, alignment is time-consuming, especially as sequencing depth keeps increasing. An efficient way to align is to:

i. split the original sequence file into multiple small pieces;

    Ex: split -l 4000000 input.fq

ii. align them in parallel;

iii. merge all the BAM files into a single one before running "bs_seeker2-call_methylation.py" (use the "samtools merge" command).

    Ex: samtools merge out.bam in1.bam in2.bam in3.bam

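`split -l` counts lines, and each FASTQ record spans 4 lines, so the line count must be a multiple of 4 (4000000 lines = 1,000,000 reads) for no read to be cut in half. The same step in Python, as a minimal sketch assuming plain 4-line FASTQ records (no wrapped sequence lines); the function name `split_fastq` and the output naming scheme are hypothetical.

```python
import itertools

def split_fastq(path, reads_per_chunk, prefix="chunk_"):
    """Split a FASTQ file into pieces of at most reads_per_chunk reads.

    Each record is assumed to be exactly 4 lines, so taking 4 * reads_per_chunk
    lines at a time never splits a record. Returns the file names written.
    """
    out_names = []
    with open(path) as src:
        for i in itertools.count():
            lines = list(itertools.islice(src, 4 * reads_per_chunk))
            if not lines:
                break  # input exhausted
            name = "%s%03d.fq" % (prefix, i)
            with open(name, "w") as dst:
                dst.writelines(lines)
            out_names.append(name)
    return out_names
```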
(2) Reads in BAM/SAM

Q: Is the read sequence in the BAM/SAM file the same as my original one?

A: No. They can differ for two reasons:

i. For RRBS, some reads are shorter because the adapters were trimmed.

ii. For reads mapped to the Crick (-) strand, the reported sequence is the reverse complement of the original read, i.e. opposite in both nucleotides and direction.

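To compare a Crick-strand read from the BAM/SAM file with the original read, take the reverse complement again. A minimal sketch, assuming sequences contain only A/C/G/T/N (either case); the name `reverse_complement` is hypothetical, and any adapter trimming (reason i) is of course not undone.

```python
# Complement of each base; N (unknown base) stays N.
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence made of A/C/G/T/N."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))
```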
(3) Problems with the "pysam" package

Q: I'm a normal (non-root) user on a Linux cluster. I can't install "pysam"; I get the following error messages:

    $ python setup.py install
    running install
    error: can't create or remove files in install directory
    The following error occurred while trying to add or remove files in the
    installation directory:
    [Errno 13] Permission denied: '/usr/lib64/python2.6/site-packages/test-easy-install-26802.write-test'
    ...

A: You can ask the administrator of your cluster to install pysam. If you don't want to bother them, you can build your own Python and then install the "pysam" package. The following script may help.

    mkdir ~/install
    cd ~/install/

    # install python
    wget http://www.python.org/ftp/python/2.7.4/Python-2.7.4.tgz # download Python from the website
    tar zxvf Python-2.7.4.tgz # decompress
    cd Python-2.7.4
    ./configure --prefix=`pwd`
    make
    make install

    # Add the path of Python to $PATH:
    # please add the following line to the file ~/.bashrc
    # ("make install" puts the python binary in the bin/ directory under the prefix)

    export PATH=~/install/Python-2.7.4/bin:$PATH

    # after saving the ~/.bashrc file, reload it
    source ~/.bashrc

    # install pysam package
    cd ~/install/
    wget https://pysam.googlecode.com/files/pysam-0.7.4.tar.gz
    tar zxvf pysam-0.7.4.tar.gz
    cd pysam-0.7.4
    python setup.py build
    python setup.py install
    # log in to the shell again after pysam is installed

    # install BS-Seeker2
    cd ~/install/
    wget -O BSSeeker2.zip https://github.com/BSSeeker/BSseeker2/archive/master.zip
    unzip BSSeeker2.zip
    cd BSseeker2-master/

(4) Run BS-Seeker2

Q: Can I add the path of BS-Seeker2's *.py scripts to $PATH, so that I can call
BS-Seeker2 from anywhere?

A: If you're using the "python" at "/usr/bin/python", you can directly
add the path of BS-Seeker2 to the file "~/.bash_profile" (bash), "~/.profile"
(other shells) or "~/.bashrc" (per-interactive-shell startup).
But if you are using a python in another directory, you might need to modify
BS-Seeker2's scripts first. For example, if your python path is "/my_python/python",
please change the first line in "bs_seeker2-build.py", "bs_seeker2-align.py" and
"bs_seeker2-call_methylation.py" to

    #!/my_python/python

Then add

    export PATH=/path/to/BS-Seeker2/:$PATH

to a startup file such as "~/.bash_profile", and source the file:

    source ~/.bash_profile

Then you can use BS-Seeker2 globally by typing:

    bs_seeker2-build.py -h
    bs_seeker2-align.py -h
    bs_seeker2-call_methylation.py -h
