annotate ezBAMQC/src/htslib/tabix.1 @ 0:dfa3745e5fd8

Uploaded
author youngkim
date Thu, 24 Mar 2016 17:12:52 -0400
.TH tabix 1 "3 February 2015" "htslib-1.2.1" "Bioinformatics tools"
.SH NAME
.PP
bgzip \- Block compression/decompression utility
.PP
tabix \- Generic indexer for TAB-delimited genome position files
.\"
.\" Copyright (C) 2009-2011 Broad Institute.
.\"
.\" Author: Heng Li <lh3@sanger.ac.uk>
.\"
.\" Permission is hereby granted, free of charge, to any person obtaining a
.\" copy of this software and associated documentation files (the "Software"),
.\" to deal in the Software without restriction, including without limitation
.\" the rights to use, copy, modify, merge, publish, distribute, sublicense,
.\" and/or sell copies of the Software, and to permit persons to whom the
.\" Software is furnished to do so, subject to the following conditions:
.\"
.\" The above copyright notice and this permission notice shall be included in
.\" all copies or substantial portions of the Software.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
.\" IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
.\" FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
.\" THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
.\" LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
.\" FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
.\" DEALINGS IN THE SOFTWARE.
.\"
.SH SYNOPSIS
.PP
.B bgzip
.RB [ -cdhB ]
.RB [ -b
.IR virtualOffset ]
.RB [ -s
.IR size ]
.RI [ file ]
.PP
.B tabix
.RB [ -0lf ]
.RB [ -p
gff|bed|sam|vcf]
.RB [ -s
.IR seqCol ]
.RB [ -b
.IR begCol ]
.RB [ -e
.IR endCol ]
.RB [ -S
.IR lineSkip ]
.RB [ -c
.IR metaChar ]
.I in.tab.bgz
.RI [ "region1 " [ "region2 " [ ... "]]]"

.SH DESCRIPTION
.PP
Tabix indexes a TAB-delimited genome position file
.I in.tab.bgz
and creates an index file
.RI ( in.tab.bgz.tbi
or
.IR in.tab.bgz.csi )
when
.I region
is absent from the command line. The input data file must be position
sorted and compressed by
.BR bgzip ,
which has a
.BR gzip (1)-like
interface. After indexing, tabix can quickly retrieve data
lines overlapping
.I regions
specified in the format "chr:beginPos-endPos". Fast data retrieval also
works over a network if a URI is given as the file name; in this case the
index file will be downloaded if it is not present locally.

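.PP
As a rough illustration of the "chr:beginPos-endPos" region format and of what
"overlapping" means, the shell sketch below splits a region string and tests two
1-based inclusive intervals for overlap. The helper names
.B parse_region
and
.B overlaps
are hypothetical and not part of tabix. The sketch assumes sequence names
contain no colon, and it is not how htslib itself parses regions.
.PP
.nf
```shell
# Hypothetical helpers for illustration only; not part of tabix.

# Split "chr:beginPos-endPos" into "chr begin end", dropping the
# thousands separators allowed in positions (e.g. 10,000,000).
parse_region() {
    region=$1
    chrom=${region%%:*}                         # text before the first ':'
    range=${region#*:}                          # text after the first ':'
    range=$(printf '%s' "$range" | tr -d ',')   # strip commas
    begin=${range%%-*}                          # text before the '-'
    end=${range#*-}                             # text after the '-'
    echo "$chrom $begin $end"
}

# Succeed when the 1-based inclusive intervals [b1,e1] and [b2,e2] overlap.
# Arguments: b1 e1 b2 e2
overlaps() {
    [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

parse_region chr1:10,000,000-20,000,000

if overlaps 100 200 200 300; then
    echo "overlap"
fi
```
.fi
.PP
A data line spanning positions 100 to 200 is reported for a query of 200 to 300
because both intervals include position 200.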
.SH INDEXING OPTIONS
.TP 10
.B -0, --zero-based
Specify that the position in the data file is 0-based (e.g. UCSC files)
rather than 1-based.
.TP
.BI "-b, --begin " INT
Column of start chromosomal position. [4]
.TP
.BI "-c, --comment " CHAR
Skip lines starting with character CHAR. [#]
.TP
.B "-C, --csi"
Produce a CSI format index instead of the default TBI index.
.TP
.BI "-e, --end " INT
Column of end chromosomal position. The end column can be the same as the
start column. [5]
.TP
.B "-f, --force"
Overwrite the index file if it already exists.
.TP
.BI "-m, --min-shift " INT
Set the minimal interval size for CSI indices to 2^INT. [14]
.TP
.BI "-p, --preset " STR
Input format for indexing. Valid values are: gff, bed, sam, vcf.
This option should not be applied together with any of
.BR -s ", " -b ", " -e ", " -c " and " -0 ;
it is not used for data retrieval because this setting is stored in
the index file. [gff]
.TP
.BI "-s, --sequence " INT
Column of sequence name. Options
.BR -s ", " -b ", " -e ", " -S ", " -c " and " -0
are all stored in the index file and thus not used in data retrieval. [1]
.TP
.BI "-S, --skip-lines " INT
Skip first INT lines in the data file. [0]

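.PP
Two of the options above involve small pieces of arithmetic, sketched here in
shell. The helper names are hypothetical and not part of tabix. The first
assumes that 0-based files follow the BED/UCSC half-open convention (the start
is 0-based and the end is exclusive); the second evaluates the 2^INT interval
size used by
.BR -m .
.PP
.nf
```shell
# Hypothetical helpers for illustration only; not part of tabix.

# Convert a 0-based, half-open interval (assumed BED/UCSC convention)
# to the equivalent 1-based, inclusive interval.
# Arguments: start end
zero_to_one_based() {
    echo "$(($1 + 1)) $2"
}

# Minimal CSI interval size for a given --min-shift value: 2^INT.
min_interval_size() {
    echo "$((1 << $1))"
}

zero_to_one_based 999 2000   # prints: 1000 2000
min_interval_size 14         # prints: 16384
```
.fi
.PP
With the default min-shift of 14, the smallest indexing interval is therefore
16384 bases.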
.SH QUERYING AND OTHER OPTIONS
.TP
.B "-h, --print-header"
Also print the header/meta lines.
.TP
.B "-H, --only-header"
Print only the header/meta lines.
.TP
.B "-i, --file-info"
Print file format info.
.TP
.B "-l, --list-chroms"
List the sequence names stored in the index file.
.TP
.BI "-r, --reheader " FILE
Replace the header with the content of FILE.
.TP
.BI "-R, --regions " FILE
Restrict to regions listed in FILE. FILE can be a BED file (requires a .bed, .bed.gz or .bed.bgz
file name extension) or a TAB-delimited file with CHROM, POS, and, optionally,
POS_TO columns, where positions are 1-based and inclusive. When this option is in use, the input
file may not be sorted.
.TP
.BI "-T, --targets " FILE
Similar to
.B -R
but the entire input will be read sequentially and regions not listed in FILE will be skipped.
.PP
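.PP
The sequential behaviour described for
.B -T
can be pictured with the following awk-based sketch. It is a conceptual
stand-in, not tabix code: the function name
.B filter_by_targets
is hypothetical, the targets use the CHROM, POS, optional POS_TO layout with
1-based inclusive positions, and each data line is simplified to a sequence
name in column 1 and a single position in column 2.
.PP
.nf
```shell
# Conceptual stand-in for "-T"; not part of tabix.
# Reads every data line in order and keeps those whose position (column 2)
# lies inside a target interval on the same sequence (column 1).
# Arguments: targets_file data_file
filter_by_targets() {
    awk '
        NR == FNR {                          # first file: load the targets
            n++
            chrom[n] = $1
            from[n] = $2 + 0
            to[n] = (NF >= 3 ? $3 : $2) + 0  # POS_TO defaults to POS
            next
        }
        {                                    # second file: scan sequentially
            for (i = 1; i <= n; i++)
                if ($1 == chrom[i] && $2 + 0 >= from[i] && $2 + 0 <= to[i]) {
                    print
                    break
                }
        }
    ' "$1" "$2"
}

targets=$(mktemp)
data=$(mktemp)
cat > "$targets" <<EOF
chr1 100 200
chr2 50
EOF
cat > "$data" <<EOF
chr1 99 a
chr1 150 b
chr2 50 c
chr2 51 d
EOF
filter_by_targets "$targets" "$data"
rm -f "$targets" "$data"
```
.fi
.PP
Only the lines for chr1 position 150 and chr2 position 50 are printed. The
other two lines are read but skipped, which mirrors the description of
.B -T
above.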
.SH EXAMPLE
(grep ^"#" in.gff; grep -v ^"#" in.gff | sort -k1,1 -k4,4n) | bgzip > sorted.gff.gz;

tabix -p gff sorted.gff.gz;

tabix sorted.gff.gz chr1:10,000,000-20,000,000;

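.PP
The sort step in the example matters because the input must be position
sorted. A minimal pre-flight check is sketched below, assuming GFF-style
columns (sequence name in column 1, start position in column 4) and assuming
that all lines of one sequence must be adjacent. The function
.B is_position_sorted
is hypothetical and not a tabix feature. It splits on any whitespace, so it is
only reliable when columns 1 to 4 contain no spaces.
.PP
.nf
```shell
# Hypothetical pre-flight check; not part of tabix.
# Succeeds only if the non-comment lines are grouped by sequence (column 1)
# and ordered by start position (column 4) within each sequence.
is_position_sorted() {
    awk '
        /^#/ { next }                       # skip header/meta lines
        $1 != prev_seq {
            if ($1 in seen) exit 1          # sequence reappears later
            seen[$1] = 1
            prev_seq = $1
            prev_pos = 0
        }
        ($4 + 0) < prev_pos { exit 1 }      # start position went backwards
        { prev_pos = $4 + 0 }
    ' "$1"
}

gff=$(mktemp)
cat > "$gff" <<EOF
##gff-version 3
chr1 src gene 100 500
chr1 src gene 300 400
chr2 src gene 50 80
EOF
if is_position_sorted "$gff"; then
    echo "sorted"
else
    echo "not sorted"
fi
rm -f "$gff"
```
.fi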
.SH NOTES
It is straightforward to achieve overlap queries using the standard
B-tree index (with or without binning) implemented in all SQL databases,
or the R-tree index in PostgreSQL and Oracle. But there are still many
reasons to use tabix. Firstly, tabix works directly with many widely
used TAB-delimited formats such as GFF/GTF and BED. We do not need to
design a database schema or specialized binary formats. Data do not need
to be duplicated in different formats, either. Secondly, tabix works on
compressed data files while most SQL databases do not. The GenCode
annotation GTF can be compressed down to 4% of its original size. Thirdly, tabix is
fast. The same indexing algorithm is known to work efficiently for an
alignment with a few billion short reads. SQL databases probably cannot
easily handle data at this scale. Last but not least, tabix supports
remote data retrieval. One can put the data file and the index on an FTP
or HTTP server, and other users or even web services will be able to get
a slice without downloading the entire file.

.SH AUTHOR
.PP
Tabix was written by Heng Li. The BGZF library was originally
implemented by Bob Handsaker and modified by Heng Li for remote file
access and in-memory caching.

.SH SEE ALSO
.PP
.BR samtools (1)