annotate pyPRADA_1.2/tools/samtools-0.1.16/samtools.1 @ 3:f17965495ec9 draft default tip

Uploaded
author siyuan
date Tue, 11 Mar 2014 12:14:01 -0400
parents acc2ca1a3ba4
.TH samtools 1 "21 April 2011" "samtools-0.1.16" "Bioinformatics tools"
.SH NAME
.PP
samtools - Utilities for the Sequence Alignment/Map (SAM) format
.SH SYNOPSIS
.PP
samtools view -bt ref_list.txt -o aln.bam aln.sam.gz
.PP
samtools sort aln.bam aln.sorted
.PP
samtools index aln.sorted.bam
.PP
samtools idxstats aln.sorted.bam
.PP
samtools view aln.sorted.bam chr2:20,100,000-20,200,000
.PP
samtools merge out.bam in1.bam in2.bam in3.bam
.PP
samtools faidx ref.fasta
.PP
samtools pileup -vcf ref.fasta aln.sorted.bam
.PP
samtools mpileup -C50 -gf ref.fasta -r chr3:1,000-2,000 in1.bam in2.bam
.PP
samtools tview aln.sorted.bam ref.fasta

.SH DESCRIPTION
.PP
Samtools is a set of utilities that manipulate alignments in the BAM
format. It imports from and exports to the SAM (Sequence Alignment/Map)
format, does sorting, merging and indexing, and allows reads in any
region to be retrieved swiftly.

Samtools is designed to work on a stream. It regards an input file `-'
as the standard input (stdin) and an output file `-' as the standard
output (stdout). Several commands can thus be combined with Unix
pipes. Samtools always outputs warning and error messages to the standard
error output (stderr).

Samtools is also able to open a BAM (not SAM) file on a remote FTP or
HTTP server if the BAM file name starts with `ftp://' or `http://'.
Samtools checks the current working directory for the index file and
will download the index if it is absent. Samtools does not retrieve the
entire alignment file unless it is asked to do so.

.SH COMMANDS AND OPTIONS

.TP 10
.B view
samtools view [-bchuHS] [-t in.refList] [-o output] [-f reqFlag] [-F
skipFlag] [-q minMapQ] [-l library] [-r readGroup] [-R rgFile] <in.bam>|<in.sam> [region1 [...]]

Extract/print all or a subset of alignments in SAM or BAM format. If no region
is specified, all the alignments will be printed; otherwise only
alignments overlapping the specified regions will be output. An
alignment may be given multiple times if it overlaps several
regions. A region can be presented, for example, in the following
format: `chr2' (the whole chr2), `chr2:1000000' (region starting from
1,000,000bp) or `chr2:1,000,000-2,000,000' (region between 1,000,000 and
2,000,000bp including the end points). Coordinates are 1-based.
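
The region syntax described above can be illustrated with a short Python sketch. This is not samtools' own parser: the names `parse_region` and `overlaps` are hypothetical, reporting a missing end as `None` is an assumption made for illustration, and reference names that themselves contain `:` are not handled.

```python
import re

# Hypothetical helper, not part of samtools: splits a region string such as
# 'chr2', 'chr2:1000000' or 'chr2:1,000,000-2,000,000' into its parts.
_REGION = re.compile(r"([^:]+)(?::([0-9,]+)(?:-([0-9,]+))?)?")


def parse_region(region):
    """Return (name, start, end) with 1-based, inclusive coordinates.

    A missing start is reported as 1 and a missing end as None
    (meaning "to the end of the reference"); both are assumptions.
    """
    match = _REGION.fullmatch(region)
    if match is None:
        raise ValueError("unrecognised region: %r" % region)
    name, start, end = match.groups()
    # Commas are thousands separators and carry no meaning.
    start = int(start.replace(",", "")) if start else 1
    end = int(end.replace(",", "")) if end else None
    if end is not None and end < start:
        raise ValueError("region end precedes start: %r" % region)
    return name, start, end


def overlaps(aln_start, aln_end, start, end):
    """True if the inclusive interval [aln_start, aln_end] overlaps the region."""
    return aln_end >= start and (end is None or aln_start <= end)
```

Because the end points are inclusive, an alignment whose last aligned base sits exactly at position 1,000,000 still overlaps `chr2:1,000,000-2,000,000` under this sketch.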

.B OPTIONS:
.RS
.TP 8
.B -b
Output in the BAM format.
.TP
.BI -f \ INT
Only output alignments with all bits in INT present in the FLAG
field. INT can be in hex in the format of /^0x[0-9A-F]+/ [0]
.TP
.BI -F \ INT
Skip alignments with any of the bits in INT present in the FLAG field [0]
.TP
.B -h
Include the header in the output.
.TP
.B -H
Output the header only.
.TP
.BI -l \ STR
Only output reads in library STR [null]
.TP
.BI -o \ FILE
Output file [stdout]
.TP
.BI -q \ INT
Skip alignments with MAPQ smaller than INT [0]
.TP
.BI -r \ STR
Only output reads in read group STR [null]
.TP
.BI -R \ FILE
Output reads in read groups listed in
.I FILE
[null]
.TP
.B -S
Input is in SAM. If @SQ header lines are absent, the
.B `-t'
option is required.
.TP
.B -c
Instead of printing the alignments, only count them and print the
total number. All filter options, such as
.B `-f',
.B `-F'
and
.BR `-q' ,
are taken into account.
.TP
.BI -t \ FILE
This file is TAB-delimited. Each line must contain the reference name
and the length of the reference, one line for each distinct reference;
additional fields are ignored. This file also defines the order of the
reference sequences in sorting. If you run `samtools faidx <ref.fa>',
the resultant index file
.I <ref.fa>.fai
can be used as this
.I <in.ref_list>
file.
.TP
.B -u
Output uncompressed BAM. This option saves time spent on
compression/decompression and is thus preferred when the output is piped
to another samtools command.
.RE
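
The `-f`, `-F` and `-q` filters listed above amount to simple tests on the FLAG and MAPQ fields. The following Python sketch shows that logic under stated assumptions: the function names `parse_flag_int` and `passes_filters` are hypothetical, and the code mirrors the option descriptions rather than reproducing samtools' implementation.

```python
def parse_flag_int(text):
    """Parse a decimal ('16') or hexadecimal ('0x10') INT argument."""
    return int(text, 0)


def passes_filters(flag, mapq, require=0, skip=0, min_mapq=0):
    """Decide whether an alignment would be output.

    require  -- INT given to -f: every bit must be set in FLAG
    skip     -- INT given to -F: no bit may be set in FLAG
    min_mapq -- INT given to -q: MAPQ must not be smaller
    """
    if (flag & require) != require:
        return False
    if flag & skip:
        return False
    if mapq < min_mapq:
        return False
    return True
```

With the defaults of 0 for all three options every alignment passes, which matches the `[0]` defaults shown in the option list. A count in the style of `-c` would then be the number of alignments for which `passes_filters` returns true.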

.TP
.B tview
samtools tview <in.sorted.bam> [ref.fasta]

Text alignment viewer (based on the ncurses library). In the viewer,
press `?' for help and press `g' to view the alignment starting from a
region given in a format like `chr10:10,000,000', or `=10,000,000' when
viewing the same reference sequence.

.TP
.B mpileup
samtools mpileup [-EBug] [-C capQcoef] [-r reg] [-f in.fa] [-l list] [-M capMapQ] [-Q minBaseQ] [-q minMapQ] in.bam [in2.bam [...]]

Generate BCF or pileup for one or multiple BAM files. Alignment records
are grouped by sample identifiers in @RG header lines. If sample
identifiers are absent, each input file is regarded as one sample.

.B OPTIONS:
.RS
.TP 10
.B -A
Do not skip anomalous read pairs in variant calling.
.TP
.B -B
Disable probabilistic realignment for the computation of base alignment
quality (BAQ). BAQ is the Phred-scaled probability of a read base being
misaligned. Applying BAQ greatly helps to reduce false SNPs
caused by misalignments.
.TP
.BI -C \ INT
Coefficient for downgrading mapping quality for reads containing
excessive mismatches. Given a read with a Phred-scaled probability q of
being generated from the mapped position, the new mapping quality is
about sqrt((INT-q)/INT)*INT. A zero value disables this
functionality; if enabled, the recommended value for BWA is 50. [0]
.TP
.BI -d \ INT
At a position, read at most
.I INT
reads per input BAM. [250]
.TP
.B -D
Output per-sample read depth
.TP
.BI -e \ INT
Phred-scaled gap extension sequencing error probability. Reducing
.I INT
leads to longer indels. [20]
.TP
.B -E
Extended BAQ computation. This option helps sensitivity especially for MNPs, but may hurt
specificity a little bit.
.TP
.BI -f \ FILE
The reference file [null]
.TP
.B -g
Compute genotype likelihoods and output them in the binary call format (BCF).
.TP
.BI -h \ INT
Coefficient for modeling homopolymer errors. Given an
.IR l -long
homopolymer
run, the sequencing error of an indel of size
.I s
is modeled as
.IR INT * s / l .
[100]
.TP
.B -I
Do not perform INDEL calling
.TP
.BI -l \ FILE
File containing a list of sites where pileup or BCF is output [null]
.TP
.BI -L \ INT
Skip INDEL calling if the average per-sample depth is above
.IR INT .
[250]
.TP
.BI -o \ INT
Phred-scaled gap open sequencing error probability. Reducing
.I INT
leads to more indel calls. [40]
.TP
.BI -P \ STR
Comma-delimited list of platforms (determined by
.BR @RG-PL )
from which indel candidates are obtained. It is recommended to collect
indel candidates from sequencing technologies that have a low indel error
rate such as ILLUMINA. [all]
.TP
.BI -q \ INT
Minimum mapping quality for an alignment to be used [0]
.TP
.BI -Q \ INT
Minimum base quality for a base to be considered [13]
.TP
.BI -r \ STR
Only generate pileup in region
.I STR
[all sites]
.TP
.B -S
Output per-sample Phred-scaled strand bias P-value
.TP
.B -u
Similar to
.B -g
except that the output is uncompressed BCF, which is preferred for piping.
.RE
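
The two formulas quoted in the `-C` and `-h` descriptions can be evaluated directly. The Python sketch below only restates those formulas as written in this page; the function names are hypothetical, and clamping the square-root argument at zero when q exceeds INT is an assumption rather than documented behaviour.

```python
import math


def capped_mapq(q, coef):
    """Approximate new mapping quality, sqrt((INT - q) / INT) * INT.

    q    -- Phred-scaled probability described under -C
    coef -- the INT given to -C; 0 disables the adjustment
    """
    if coef == 0:
        return None  # option disabled: no adjustment is computed
    # Assumption: clamp at zero so that q > coef does not fail in sqrt.
    ratio = max(0.0, (coef - q) / coef)
    return math.sqrt(ratio) * coef


def homopolymer_indel_error(coef, indel_size, run_length):
    """Modeled error INT * s / l for an indel of size s in an l-long run."""
    return coef * indel_size / run_length
```

For example, with the value of 50 recommended for BWA, q = 32 gives sqrt(18/50) * 50 = 30, and with the default `-h` coefficient of 100 a 1bp indel in a 4bp homopolymer run is modeled as 100 * 1 / 4 = 25.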

.TP
.B reheader
samtools reheader <in.header.sam> <in.bam>

Replace the header in
.I in.bam
with the header in
.IR in.header.sam .
This command is much faster than replacing the header with a
BAM->SAM->BAM conversion.

.TP
.B cat
samtools cat [-h header.sam] [-o out.bam] <in1.bam> <in2.bam> [ ... ]

Concatenate BAMs. The sequence dictionary of each input BAM must be identical,
although this command does not check this. This command uses a similar trick
to
.B reheader
which enables fast BAM concatenation.

.TP
.B sort
samtools sort [-no] [-m maxMem] <in.bam> <out.prefix>

Sort alignments by leftmost coordinates. File
.I <out.prefix>.bam
will be created. This command may also create temporary files
.I <out.prefix>.%d.bam
when the whole alignment cannot fit into memory (controlled by
option -m).

.B OPTIONS:
.RS
.TP 8
.B -o
Output the final alignment to the standard output.
.TP
.B -n
Sort by read names rather than by chromosomal coordinates
.TP
.BI -m \ INT
Approximately the maximum required memory. [500000000]
.RE

.TP
.B merge
samtools merge [-nur1f] [-h inh.sam] [-R reg] <out.bam> <in1.bam> <in2.bam> [...]

Merge multiple sorted alignments.
The header reference lists of all the input BAM files, and the @SQ headers of
.IR inh.sam ,
if any, must all refer to the same set of reference sequences.
The header reference list and (unless overridden by
.BR -h )
`@' headers of
.I in1.bam
will be copied to
.IR out.bam ,
and the headers of other files will be ignored.

.B OPTIONS:
.RS
.TP 8
.B -1
Use zlib compression level 1 to compress the output
.TP
.B -f
Force overwriting of the output file if present.
.TP 8
.BI -h \ FILE
Use the lines of
.I FILE
as `@' headers to be copied to
.IR out.bam ,
replacing any header lines that would otherwise be copied from
.IR in1.bam .
.RI ( FILE
is actually in SAM format, though any alignment records it may contain
are ignored.)
.TP
.B -n
The input alignments are sorted by read names rather than by chromosomal
coordinates
.TP
.BI -R \ STR
Merge files in the specified region indicated by
.I STR
[null]
.TP
.B -r
Attach an RG tag to each alignment. The tag value is inferred from file names.
.TP
.B -u
Uncompressed BAM output
.RE

.TP
.B index
samtools index <aln.bam>

Index sorted alignment for fast random access. Index file
.I <aln.bam>.bai
will be created.

.TP
.B idxstats
samtools idxstats <aln.bam>

Retrieve and print stats in the index file. The output is TAB-delimited
with each line consisting of reference sequence name, sequence length,
number of mapped reads and number of unmapped reads.
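
Since the idxstats output is a plain four-column TAB-delimited table, it is easy to post-process. The following Python sketch reads such text into records; the names `IdxStat` and `parse_idxstats` are hypothetical, and the sample text is made up for illustration rather than taken from a real BAM index.

```python
from collections import namedtuple

# One record per output line: name, sequence length, mapped and unmapped reads.
IdxStat = namedtuple("IdxStat", "name length mapped unmapped")


def parse_idxstats(text):
    """Parse TAB-delimited idxstats-style text into a list of IdxStat records."""
    stats = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name, length, mapped, unmapped = line.split("\t")[:4]
        stats.append(IdxStat(name, int(length), int(mapped), int(unmapped)))
    return stats


# Made-up sample in the documented column order.
SAMPLE = "chr1\t1000\t12\t1\nchr2\t500\t3\t0\n"

if __name__ == "__main__":
    records = parse_idxstats(SAMPLE)
    print(sum(r.mapped for r in records), "mapped reads in total")
```

In practice the text would come from capturing the standard output of `samtools idxstats aln.sorted.bam`.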

.TP
.B faidx
samtools faidx <ref.fasta> [region1 [...]]

Index reference sequence in the FASTA format or extract subsequence from
indexed reference sequence. If no region is specified,
.B faidx
will index the file and create
.I <ref.fasta>.fai
on the disk. If regions are specified, the subsequences will be
retrieved and printed to stdout in the FASTA format. The input file can
be compressed in the
.B RAZF
format.

.TP
.B fixmate
samtools fixmate <in.nameSrt.bam> <out.bam>

Fill in mate coordinates, ISIZE and mate related flags from a
name-sorted alignment.

.TP
.B rmdup
samtools rmdup [-sS] <input.srt.bam> <out.bam>

Remove potential PCR duplicates: if multiple read pairs have identical
external coordinates, only retain the pair with highest mapping quality.
In the paired-end mode, this command
.B ONLY
works with FR orientation and requires that ISIZE is correctly set. It does
not work for unpaired reads (e.g. two ends mapped to different
chromosomes or orphan reads).

.B OPTIONS:
.RS
.TP 8
.B -s
Remove duplicates for single-end reads. By default, the command works for
paired-end reads only.
.TP 8
.B -S
Treat paired-end reads as single-end reads.
.RE
acc2ca1a3ba4 Uploaded
siyuan
parents:
diff changeset
398
acc2ca1a3ba4 Uploaded
siyuan
parents:
diff changeset
.TP
.B calmd
samtools calmd [-AEeubSr] [-C capQcoef] <aln.bam> <ref.fasta>

Generate the MD tag. If the MD tag is already present, this command will
give a warning if the MD tag generated is different from the existing
tag. Output SAM by default.

.B OPTIONS:
.RS
.TP 8
.B -A
When used jointly with
.B -r
this option overwrites the original base quality.
.TP 8
.B -e
Convert the read base to = if it is identical to the aligned reference
base. The indel caller does not support the = bases at the moment.
.TP
.B -u
Output uncompressed BAM
.TP
.B -b
Output compressed BAM
.TP
.B -S
The input is SAM with header lines
.TP
.BI -C \ INT
Coefficient to cap mapping quality of poorly mapped reads. See the
.B pileup
command for details. [0]
.TP
.B -r
Compute the BQ tag (without -A) or cap base quality by BAQ (with -A).
.TP
.B -E
Extended BAQ calculation. This option trades specificity for sensitivity, though the
effect is minor.
.RE

.TP
.B targetcut
samtools targetcut [-Q minBaseQ] [-i inPenalty] [-0 em0] [-1 em1] [-2 em2] [-f ref] <in.bam>

This command identifies target regions by examining the continuity of read depth, computes
haploid consensus sequences of targets and outputs a SAM with each sequence corresponding
to a target. When option
.B -f
is in use, BAQ will be applied. This command is
.B only
designed for cutting fosmid clones from fosmid pool sequencing [Ref. Kitzman et al. (2010)].
.RE

.TP
.B phase
samtools phase [-AF] [-k len] [-b prefix] [-q minLOD] [-Q minBaseQ] <in.bam>

Call and phase heterozygous SNPs.

.B OPTIONS:
.RS
.TP 8
.B -A
Drop reads with ambiguous phase.
.TP 8
.BI -b \ STR
Prefix of BAM output. When this option is in use, phase-0 reads will be saved in file
.BR STR .0.bam
and phase-1 reads in
.BR STR .1.bam.
Phase-unknown reads will be randomly allocated to one of the two files. Chimeric reads
with switch errors will be saved in
.BR STR .chimeric.bam.
[null]
.TP
.B -F
Do not attempt to fix chimeric reads.
.TP
.BI -k \ INT
Maximum length for local phasing. [13]
.TP
.BI -q \ INT
Minimum Phred-scaled LOD to call a heterozygote. [40]
.TP
.BI -Q \ INT
Minimum base quality to be used in het calling. [13]
.RE

.TP
.B pileup
samtools pileup [-2sSBicv] [-f in.ref.fasta] [-t in.ref_list] [-l
in.site_list] [-C capMapQ] [-M maxMapQ] [-T theta] [-N nHap] [-r
pairDiffRate] [-m mask] [-d maxIndelDepth] [-G indelPrior]
<in.bam>|<in.sam>

Print the alignment in the pileup format. In the pileup format, each
line represents a genomic position, consisting of chromosome name,
coordinate, reference base, read bases, read qualities and alignment
mapping qualities. Information on match, mismatch, indel, strand,
mapping quality and start and end of a read are all encoded at the read
base column. At this column, a dot stands for a match to the reference
base on the forward strand, a comma for a match on the reverse strand,
a '>' or '<' for a reference skip, `ACGTN' for a mismatch on the forward
strand and `acgtn' for a mismatch on the reverse strand. A pattern
`\\+[0-9]+[ACGTNacgtn]+' indicates there is an insertion between this
reference position and the next reference position. The length of the
insertion is given by the integer in the pattern, followed by the
inserted sequence. Similarly, a pattern `-[0-9]+[ACGTNacgtn]+'
represents a deletion from the reference. The deleted bases will be
presented as `*' in the following lines. Also at the read base column, a
symbol `^' marks the start of a read. The ASCII code of the character
following `^' minus 33 gives the mapping quality. A symbol `$' marks the
end of a read segment.
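
The encoding of the read base column described above can be illustrated with a small tokenizer. This is only a sketch in Python, not part of samtools: the function name parse_read_bases and the event labels are made up for illustration, and the code covers just the symbols listed in the paragraph above.

```python
import re

# One alternative per symbol class, tried in order at each offset.
TOKEN = re.compile(
    r"\^(?P<mapq>.)"                 # read start; next character encodes MAPQ
    r"|(?P<end>\$)"                  # end of a read segment
    r"|(?P<indel>[+-])(?P<len>\d+)"  # indel marker followed by its length
    r"|(?P<base>[.,ACGTNacgtn*<>])"  # match, mismatch, deleted base or ref skip
)

def parse_read_bases(column):
    """Split a pileup read-base column into (kind, value) events."""
    events = []
    pos = 0
    while pos < len(column):
        m = TOKEN.match(column, pos)
        if m is None:
            raise ValueError(f"unexpected character {column[pos]!r} at offset {pos}")
        pos = m.end()
        if m.group("mapq") is not None:
            # ASCII code minus 33 gives the mapping quality.
            events.append(("read_start_mapq", ord(m.group("mapq")) - 33))
        elif m.group("end"):
            events.append(("read_end", None))
        elif m.group("indel"):
            # The integer gives the exact number of inserted/deleted bases,
            # so slice that many characters instead of matching greedily.
            n = int(m.group("len"))
            seq = column[pos:pos + n]
            if len(seq) != n:
                raise ValueError("indel sequence shorter than its stated length")
            pos += n
            kind = "insertion" if m.group("indel") == "+" else "deletion"
            events.append((kind, seq))
        else:
            events.append(("base", m.group("base")))
    return events
```

Slicing by the stated length matters: in a column such as `+2AGt' the insertion is `AG' and the trailing `t' is a separate reverse-strand mismatch.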

If option
.B -c
is applied, the consensus base, Phred-scaled consensus quality, SNP
quality (i.e. the Phred-scaled probability of the consensus being
identical to the reference) and root mean square (RMS) mapping quality
of the reads covering the site will be inserted between the `reference
base' and the `read bases' columns. An indel occupies an additional
line. Each indel line consists of chromosome name, coordinate, a star,
the genotype, consensus quality, SNP quality, RMS mapping quality, #
covering reads, the first allele, the second allele, # reads supporting
the first allele, # reads supporting the second allele and # reads
containing indels different from the top two alleles.

.B NOTE:
Since 0.1.10, the `pileup' command is deprecated in favor of `mpileup'.

.B OPTIONS:
.RS
.TP 10
.B -B
Disable the BAQ computation. See the
.B mpileup
command for details.
.TP
.B -c
Call the consensus sequence. Options
.BR -T ", " -N ", " -I " and " -r
are only effective when
.BR -c " or " -g
is in use.
.TP
.BI -C \ INT
Coefficient for downgrading the mapping quality of poorly mapped
reads. See the
.B mpileup
command for details. [0]
.TP
.BI -d \ INT
Use the first
.I INT
reads in the pileup for indel calling, to speed up processing. Zero for unlimited. [1024]
.TP
.BI -f \ FILE
The reference sequence in the FASTA format. Index file
.I FILE.fai
will be created if
absent.
.TP
.B -g
Generate genotype likelihood in the binary GLFv3 format. This option
suppresses -c, -i and -s. This option is deprecated in favor of the
.B mpileup
command.
.TP
.B -i
Only output pileup lines containing indels.
.TP
.BI -I \ INT
Phred-scaled probability of an indel in sequencing/prep. [40]
.TP
.BI -l \ FILE
List of sites at which pileup is output. This file is space
delimited. The first two columns are required to be chromosome and
1-based coordinate. Additional columns are ignored. It is
recommended to use option
.B -s
together with
.B -l
as in the default format we may not know the mapping quality.
.TP
.BI -m \ INT
Filter reads with flag containing bits in
.I INT
[1796]
.TP
.BI -M \ INT
Cap mapping quality at INT [60]
.TP
.BI -N \ INT
Number of haplotypes in the sample (>=2) [2]
.TP
.BI -r \ FLOAT
Expected fraction of differences between a pair of haplotypes [0.001]
.TP
.B -s
Print the mapping quality as the last column. This option makes the
output easier to parse, although this format is not space efficient.
.TP
.B -S
The input file is in SAM.
.TP
.BI -t \ FILE
List of reference names and sequence lengths, in the format described
for the
.B import
command. If this option is present, samtools assumes the input
.I <in.alignment>
is in SAM format; otherwise it assumes BAM format.
.TP
.BI -T \ FLOAT
The theta parameter (error dependency coefficient) in the maq consensus
calling model [0.85]
.RE

.SH SAM FORMAT

SAM is TAB-delimited. Apart from the header lines, which are started
with the `@' symbol, each alignment line consists of:

.TS
center box;
cb | cb | cb
n | l | l .
Col	Field	Description
_
1	QNAME	Query (pair) NAME
2	FLAG	bitwise FLAG
3	RNAME	Reference sequence NAME
4	POS	1-based leftmost POSition/coordinate of clipped sequence
5	MAPQ	MAPping Quality (Phred-scaled)
6	CIGAR	extended CIGAR string
7	MRNM	Mate Reference sequence NaMe (`=' if same as RNAME)
8	MPOS	1-based Mate POSition
9	ISIZE	Inferred insert SIZE
10	SEQ	query SEQuence on the same strand as the reference
11	QUAL	query QUALity (ASCII-33 gives the Phred base quality)
12	OPT	variable OPTional fields in the format TAG:VTYPE:VALUE
.TE
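
As an illustration of the column layout above, the following Python sketch splits one alignment line into named fields and converts QUAL to Phred scores. The names SamRecord, parse_sam_line and phred_qualities are hypothetical helpers, not part of samtools, and the record used in the usage note is invented.

```python
from typing import List, NamedTuple

class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int        # 1-based leftmost position
    mapq: int
    cigar: str
    mrnm: str       # "=" if the mate maps to the same reference
    mpos: int
    isize: int
    seq: str
    qual: str
    opt: List[str]  # zero or more TAG:VTYPE:VALUE strings

def parse_sam_line(line):
    """Parse one TAB-delimited alignment line (not an '@' header line)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("an alignment line needs at least 11 fields")
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10], fields[11:],
    )

def phred_qualities(qual):
    """ASCII code minus 33 gives the Phred base quality."""
    return [ord(c) - 33 for c in qual]
```

For a made-up line such as "r1<TAB>99<TAB>chr2<TAB>100<TAB>60<TAB>4M<TAB>=<TAB>200<TAB>104<TAB>ACGT<TAB>III#<TAB>NM:i:0", the record has pos 100 and mapq 60, and the QUAL string `III#' decodes to the base qualities 40, 40, 40 and 2.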

.PP
Each bit in the FLAG field is defined as:

.TS
center box;
cb | cb | cb
l | c | l .
Flag	Chr	Description
_
0x0001	p	the read is paired in sequencing
0x0002	P	the read is mapped in a proper pair
0x0004	u	the query sequence itself is unmapped
0x0008	U	the mate is unmapped
0x0010	r	strand of the query (1 for reverse)
0x0020	R	strand of the mate
0x0040	1	the read is the first read in a pair
0x0080	2	the read is the second read in a pair
0x0100	s	the alignment is not primary
0x0200	f	the read fails platform/vendor quality checks
0x0400	d	the read is either a PCR or an optical duplicate
.TE
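
Because FLAG is a bitwise field, a single integer combines several of the rows above. The Python sketch below decodes a FLAG value into the one-character codes from the Chr column; the helper name flag_to_chars is an assumption made for illustration and is not a samtools function.

```python
# (bit, Chr) pairs copied from the FLAG table.
FLAG_BITS = [
    (0x0001, "p"), (0x0002, "P"), (0x0004, "u"), (0x0008, "U"),
    (0x0010, "r"), (0x0020, "R"), (0x0040, "1"), (0x0080, "2"),
    (0x0100, "s"), (0x0200, "f"), (0x0400, "d"),
]

def flag_to_chars(flag):
    """Return the Chr codes of all bits set in a FLAG value."""
    return "".join(ch for bit, ch in FLAG_BITS if flag & bit)

# 99 = 0x1 + 0x2 + 0x20 + 0x40: paired, proper pair, mate on the
# reverse strand, first read in the pair.
print(flag_to_chars(99))

# 1796 = 0x4 + 0x100 + 0x200 + 0x400: unmapped, not primary,
# failed quality checks, duplicate.
print(flag_to_chars(1796))
```

The second value, 1796, is the default given for the pileup -m mask, which is why reads with any of those four bits set are filtered by default.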

.SH EXAMPLES
.IP o 2
Import SAM to BAM when
.B @SQ
lines are present in the header:

  samtools view -bS aln.sam > aln.bam

If
.B @SQ
lines are absent:

  samtools faidx ref.fa
  samtools view -bt ref.fa.fai aln.sam > aln.bam

where
.I ref.fa.fai
is generated automatically by the
.B faidx
command.

.IP o 2
Attach the
.B RG
tag while merging sorted alignments:

  perl -e 'print "@RG\\tID:ga\\tSM:hs\\tLB:ga\\tPL:Illumina\\n@RG\\tID:454\\tSM:hs\\tLB:454\\tPL:454\\n"' > rg.txt
  samtools merge -rh rg.txt merged.bam ga.bam 454.bam

The value in a
.B RG
tag is determined by the file name the read is coming from. In this
example, in the
.IR merged.bam ,
reads from
.I ga.bam
will be attached
.IR RG:Z:ga ,
while reads from
.I 454.bam
will be attached
.IR RG:Z:454 .

.IP o 2
Call SNPs and short indels for one diploid individual:

  samtools mpileup -ugf ref.fa aln.bam | bcftools view -bvcg - > var.raw.bcf
  bcftools view var.raw.bcf | vcfutils.pl varFilter -D 100 > var.flt.vcf

The
.B -D
option of varFilter controls the maximum read depth, which should be
adjusted to about twice the average read depth. One may consider adding
.B -C50
to
.B mpileup
if mapping quality is overestimated for reads containing excessive
mismatches. Applying this option usually helps
.B BWA-short
but may not help other mappers.

.IP o 2
Generate the consensus sequence for one diploid individual:

  samtools mpileup -uf ref.fa aln.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq

.IP o 2
Phase one individual:

  samtools calmd -AEur aln.bam ref.fa | samtools phase -b prefix - > phase.out

The
.B calmd
command is used to reduce false heterozygotes around INDELs.

740 .IP o 2
acc2ca1a3ba4 Uploaded
siyuan
parents:
diff changeset
741 Call SNPs and short indels for multiple diploid individuals:
acc2ca1a3ba4 Uploaded
siyuan
parents:
diff changeset
742
acc2ca1a3ba4 Uploaded
siyuan
parents:
diff changeset
743 samtools mpileup -P ILLUMINA -ugf ref.fa *.bam | bcftools view -bcvg - > var.raw.bcf
acc2ca1a3ba4 Uploaded
siyuan
parents:
diff changeset
744 bcftools view var.raw.bcf | vcfutils.pl varFilter -D 2000 > var.flt.vcf
acc2ca1a3ba4 Uploaded
siyuan
parents:
diff changeset
745
acc2ca1a3ba4 Uploaded
siyuan
parents:
diff changeset
746 Individuals are identified from the
acc2ca1a3ba4 Uploaded
siyuan
parents:
diff changeset
747 .B SM
acc2ca1a3ba4 Uploaded
siyuan
parents:
diff changeset
748 tags in the
acc2ca1a3ba4 Uploaded
siyuan
parents:
diff changeset
749 .B @RG
acc2ca1a3ba4 Uploaded
siyuan
parents:
diff changeset
750 header lines. Individuals can be pooled in one alignment file; one
acc2ca1a3ba4 Uploaded
siyuan
parents:
diff changeset
751 individual can also be separated into multiple files. The
acc2ca1a3ba4 Uploaded
siyuan
parents:
diff changeset
752 .B -P
acc2ca1a3ba4 Uploaded
siyuan
parents:
diff changeset
753 option specifies that indel candidates should be collected only from
acc2ca1a3ba4 Uploaded
siyuan
parents:
diff changeset
754 read groups with the
acc2ca1a3ba4 Uploaded
siyuan
parents:
diff changeset
755 .B @RG-PL
acc2ca1a3ba4 Uploaded
siyuan
parents:
diff changeset
756 tag set to
acc2ca1a3ba4 Uploaded
siyuan
parents:
diff changeset
757 .IR ILLUMINA .
acc2ca1a3ba4 Uploaded
siyuan
parents:
diff changeset
758 Collecting indel candidates from reads sequenced by an indel-prone
acc2ca1a3ba4 Uploaded
siyuan
parents:
diff changeset
759 technology may affect the performance of indel calling.
acc2ca1a3ba4 Uploaded
siyuan
parents:
diff changeset
760
.IP o 2
Derive the allele frequency spectrum (AFS) on a list of sites from multiple individuals:

    samtools mpileup -Igf ref.fa *.bam > all.bcf
    bcftools view -bl sites.list all.bcf > sites.bcf
    bcftools view -cGP cond2 sites.bcf > /dev/null 2> sites.1.afs
    bcftools view -cGP sites.1.afs sites.bcf > /dev/null 2> sites.2.afs
    bcftools view -cGP sites.2.afs sites.bcf > /dev/null 2> sites.3.afs
    ......

where
.I sites.list
contains the list of sites with each line consisting of the reference
sequence name and position. The subsequent
.B bcftools
commands estimate the AFS by expectation-maximization (EM); each run
takes the AFS written by the previous run as its prior.

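The repeated bcftools invocations follow a fixed pattern, with each run reading the previous run's output. A minimal sketch that only prints the commands for a chosen number of iterations is shown here; the function name afs_em_commands is hypothetical, and nothing is executed.

```shell
#!/bin/sh
# Sketch: print the bcftools commands for n EM iterations over sites.bcf.
# afs_em_commands is a hypothetical helper; it is a dry run only.
afs_em_commands() {
    n=$1
    prior=cond2            # the first iteration starts from the cond2 prior
    i=1
    while [ "$i" -le "$n" ]; do
        out="sites.$i.afs"
        echo "bcftools view -cGP $prior sites.bcf > /dev/null 2> $out"
        prior=$out         # the next iteration reads this estimate as its prior
        i=$((i + 1))
    done
}

afs_em_commands 3
```

With an argument of 3 this reproduces the three bcftools commands listed above. How many iterations are enough is not stated in this manual, so the count is left to the user.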
.IP o 2
Dump BAQ-applied alignments for other SNP callers:

    samtools calmd -bAr aln.bam ref.fa > aln.baq.bam

It adds and corrects the
.B NM
and
.B MD
tags at the same time. The
.B calmd
command also comes with the
.B -C
option, the same as the one in
.B pileup
and
.BR mpileup .
Use it if it improves your results.

.SH LIMITATIONS
.PP
.IP o 2
Unaligned words are used in bam_import.c, bam_endian.h, bam.c and bam_aux.c.
.IP o 2
In merging, the input files are required to have the same number of
reference sequences. The requirement can be relaxed. In addition,
merging does not reconstruct the header dictionaries
automatically. End users have to provide the correct header. Picard is
better at merging.
.IP o 2
Samtools paired-end rmdup does not work for unpaired reads (e.g. orphan
reads or ends mapped to different chromosomes). If this is a concern,
please use Picard's MarkDuplicates, which correctly handles these cases,
although it is a little slower.

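Since merging does not rebuild the header, one way to prepare a header by hand is to combine the headers of the inputs. The sketch below assumes both inputs share the same @HD and @SQ lines and differ only in their @RG lines; the function name merge_headers and the header contents are made up for illustration.

```shell
#!/bin/sh
# Sketch: combine two SAM headers, keeping the non-@RG lines of the first
# file and the union of @RG lines from both. merge_headers is hypothetical.
merge_headers() {
    # $1, $2: files holding SAM header text
    grep -v '^@RG' "$1"
    cat "$1" "$2" | grep '^@RG' | awk '!seen[$0]++'   # drop exact duplicates
}

tmp=$(mktemp -d)
# Made-up headers: same reference dictionary, overlapping read groups.
printf '@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n@RG\tID:a\tSM:s1\n' > "$tmp/h1.sam"
printf '@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n@RG\tID:b\tSM:s2\n@RG\tID:a\tSM:s1\n' > "$tmp/h2.sam"

merged=$(merge_headers "$tmp/h1.sam" "$tmp/h2.sam")
printf '%s\n' "$merged"
rm -rf "$tmp"
```

The combined text could then be saved to a file and supplied when merging, for example through the -h option of samtools merge if the installed version provides it. Read groups with the same ID but different contents are not reconciled here and would need manual attention.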
.SH AUTHOR
.PP
Heng Li from the Sanger Institute wrote the C version of samtools. Bob
Handsaker from the Broad Institute implemented the BGZF library and Jue
Ruan from Beijing Genomics Institute wrote the RAZF library. John
Marshall and Petr Danecek contribute to the source code and various
people from the 1000 Genomes Project have contributed to the SAM format
specification.

.SH SEE ALSO
.PP
Samtools website: <http://samtools.sourceforge.net>