comparison ezBAMQC/src/htslib/tabix.1 @ 0:dfa3745e5fd8
Uploaded
author | youngkim |
---|---|
date | Thu, 24 Mar 2016 17:12:52 -0400 |
parents | |
children |
comparison
-1:000000000000 | 0:dfa3745e5fd8 |
---|---|
1 .TH tabix 1 "3 February 2015" "htslib-1.2.1" "Bioinformatics tools" | |
2 .SH NAME | |
3 .PP | |
4 bgzip \- Block compression/decompression utility | |
5 .PP | |
6 tabix \- Generic indexer for TAB-delimited genome position files | |
7 .\" | |
8 .\" Copyright (C) 2009-2011 Broad Institute. | |
9 .\" | |
10 .\" Author: Heng Li <lh3@sanger.ac.uk> | |
11 .\" | |
12 .\" Permission is hereby granted, free of charge, to any person obtaining a | |
13 .\" copy of this software and associated documentation files (the "Software"), | |
14 .\" to deal in the Software without restriction, including without limitation | |
15 .\" the rights to use, copy, modify, merge, publish, distribute, sublicense, | |
16 .\" and/or sell copies of the Software, and to permit persons to whom the | |
17 .\" Software is furnished to do so, subject to the following conditions: | |
18 .\" | |
19 .\" The above copyright notice and this permission notice shall be included in | |
20 .\" all copies or substantial portions of the Software. | |
21 .\" | |
22 .\" THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | |
23 .\" IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | |
24 .\" FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL | |
25 .\" THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | |
26 .\" LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING | |
27 .\" FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER | |
28 .\" DEALINGS IN THE SOFTWARE. | |
29 .\" | |
30 .SH SYNOPSIS | |
31 .PP | |
32 .B bgzip | |
33 .RB [ -cdhB ] | |
34 .RB [ -b | |
35 .IR virtualOffset ] | |
36 .RB [ -s | |
37 .IR size ] | |
38 .RI [ file ] | |
39 .PP | |
40 .B tabix | |
41 .RB [ -0lf ] | |
42 .RB [ -p | |
43 gff|bed|sam|vcf] | |
44 .RB [ -s | |
45 .IR seqCol ] | |
46 .RB [ -b | |
47 .IR begCol ] | |
48 .RB [ -e | |
49 .IR endCol ] | |
50 .RB [ -S | |
51 .IR lineSkip ] | |
52 .RB [ -c | |
53 .IR metaChar ] | |
54 .I in.tab.bgz | |
55 .RI [ "region1 " [ "region2 " [ ... "]]]" | |
56 | |
57 .SH DESCRIPTION | |
58 .PP | |
59 Tabix indexes a TAB-delimited genome position file | |
60 .I in.tab.bgz | |
61 and creates an index file ( | |
62 .I in.tab.bgz.tbi | |
63 or | |
64 .I in.tab.bgz.csi | |
65 ) when | |
66 .I region | |
67 is absent from the command-line. The input data file must be position | |
68 sorted and compressed by | |
69 .B bgzip | |
70 which has a | |
71 .BR gzip (1) | |
72 like interface. After indexing, tabix is able to quickly retrieve data | |
73 lines overlapping | |
74 .I regions | |
75 specified in the format "chr:beginPos-endPos". Fast data retrieval also | |
76 works over a network if a URI is given as the file name; in this case the | |
77 index file will be downloaded if it is not present locally. | |
78 | |
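To make the "chr:beginPos-endPos" region syntax concrete, here is a minimal shell sketch that splits such a string into its three parts. `parse_region` is a hypothetical helper written for illustration, not tabix's own parser. It assumes the full three-part form and treats commas as digit-group separators to be dropped, as the region in EXAMPLE suggests.

```shell
#!/bin/sh
# parse_region is a hypothetical helper, not part of tabix.
# Assumption: the region has the full "chr:beginPos-endPos" form and
# the sequence name itself contains no ':' character.
parse_region() {
    region=$(printf '%s' "$1" | tr -d ',')   # drop digit-group commas
    chr=${region%%:*}                        # text before the first ':'
    range=${region#*:}                       # text after the first ':'
    beg=${range%%-*}                         # text before the first '-'
    end=${range#*-}                          # text after the first '-'
    printf '%s\t%s\t%s\n' "$chr" "$beg" "$end"
}

# Prints the sequence name, begin and end as three TAB-separated fields.
parse_region 'chr1:10,000,000-20,000,000'
```

The sketch only illustrates the notation; real queries should pass the region string to tabix unchanged.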
79 .SH INDEXING OPTIONS | |
80 .TP 10 | |
81 .B -0, --zero-based | |
82 Specify that the position in the data file is 0-based (e.g. UCSC files) | |
83 rather than 1-based. | |
84 .TP | |
85 .BI "-b, --begin " INT | |
86 Column of start chromosomal position. [4] | |
87 .TP | |
88 .BI "-c, --comment " CHAR | |
89 Skip lines starting with character CHAR. [#] | |
90 .TP | |
91 .B "-C, --csi" | |
92 Produce a CSI format index instead of the default tabix (TBI) index. | |
93 .TP | |
94 .BI "-e, --end " INT | |
95 Column of end chromosomal position. The end column can be the same as the | |
96 start column. [5] | |
97 .TP | |
98 .B "-f, --force " | |
99 Overwrite the index file if it is already present. | |
100 .TP | |
101 .BI "-m, --min-shift " INT | |
102 Set minimal interval size for CSI indices to 2^INT. [14] | |
103 .TP | |
104 .BI "-p, --preset " STR | |
105 Input format for indexing. Valid values are: gff, bed, sam, vcf. | |
106 This option should not be applied together with any of | |
107 .BR -s ", " -b ", " -e ", " -c " and " -0 ; | |
108 it is not used for data retrieval because this setting is stored in | |
109 the index file. [gff] | |
110 .TP | |
111 .BI "-s, --sequence " INT | |
112 Column of sequence name. Option | |
113 .BR -s ", " -b ", " -e ", " -S ", " -c " and " -0 | |
114 are all stored in the index file and thus not used in data retrieval. [1] | |
115 .TP | |
116 .BI "-S, --skip-lines " INT | |
117 Skip first INT lines in the data file. [0] | |
118 | |
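The 2^INT in the description of -m can be checked with shell arithmetic. This is only a sketch of the arithmetic; `min_interval_size` is a hypothetical helper name, not an htslib command.

```shell
#!/bin/sh
# min_interval_size is a hypothetical helper used only for illustration.
min_interval_size() {
    # 2^INT computed as a left shift of 1 by INT bits.
    echo $((1 << $1))
}

# Interval size in bases for the default value of -m (14).
min_interval_size 14
```

With the default of 14 the smallest interval spans 16384 bases, and each increment of INT doubles it.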
119 .SH QUERYING AND OTHER OPTIONS | |
120 .TP | |
121 .B "-h, --print-header " | |
122 Also print the header/meta lines. | |
123 .TP | |
124 .B "-H, --only-header " | |
125 Print only the header/meta lines. | |
126 .TP | |
127 .B "-i, --file-info " | |
128 Print file format info. | |
129 .TP | |
130 .B "-l, --list-chroms " | |
131 List the sequence names stored in the index file. | |
132 .TP | |
133 .BI "-r, --reheader " FILE | |
134 Replace the header with the content of FILE. | |
135 .TP | |
136 .B "-R, --regions " FILE | |
137 Restrict to regions listed in FILE. FILE can be a BED file (requires a .bed, .bed.gz, | |
138 or .bed.bgz file name extension) or a TAB-delimited file with CHROM, POS, and, | |
139 optionally, POS_TO columns, where positions are 1-based and inclusive. When this | |
140 option is in use, the input file may not be sorted. | |
142 .TP | |
143 .BI "-T, --targets " FILE | |
144 Similar to | |
145 .B -R | |
146 but the entire input will be read sequentially and regions not listed in FILE will be skipped. | |
147 .PP | |
148 .SH EXAMPLE | |
149 (grep ^"#" in.gff; grep -v ^"#" in.gff | sort -k1,1 -k4,4n) | bgzip > sorted.gff.gz; | |
150 | |
151 tabix -p gff sorted.gff.gz; | |
152 | |
153 tabix sorted.gff.gz chr1:10,000,000-20,000,000; | |
154 | |
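The sorting step of the first EXAMPLE command can be tried without htslib installed. The sketch below runs the same grep and sort pipeline on a tiny made-up GFF file and leaves out the bgzip and tabix steps. `sort_gff` is a hypothetical helper name, the three records are invented test data, and `LC_ALL=C` is an added assumption to make the sort order reproducible across locales.

```shell
#!/bin/sh
# sort_gff is a hypothetical helper, not part of tabix or bgzip.
sort_gff() {
    # Header/meta lines first, then data lines ordered by sequence
    # name (column 1) and numeric start position (column 4).
    grep '^#' "$1"
    grep -v '^#' "$1" | LC_ALL=C sort -k1,1 -k4,4n
}

# Build a small unsorted GFF file (invented records).
tmp=$(mktemp)
printf '##gff-version 3\n' > "$tmp"
printf 'chr2\tsrc\tgene\t500\t900\t.\t+\t.\tID=g3\n' >> "$tmp"
printf 'chr1\tsrc\tgene\t2000\t3000\t.\t+\t.\tID=g2\n' >> "$tmp"
printf 'chr1\tsrc\tgene\t100\t400\t.\t-\t.\tID=g1\n' >> "$tmp"

sorted=$(sort_gff "$tmp")
rm -f "$tmp"
printf '%s\n' "$sorted"
```

In real use the output of the sort would be piped into bgzip and the result indexed with tabix, as in the commands above.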
155 .SH NOTES | |
156 It is straightforward to achieve overlap queries using the standard | |
157 B-tree index (with or without binning) implemented in all SQL databases, | |
158 or the R-tree index in PostgreSQL and Oracle. But there are still many | |
159 reasons to use tabix. Firstly, tabix directly works with a lot of widely | |
160 used TAB-delimited formats such as GFF/GTF and BED. We do not need to | |
161 design database schema or specialized binary formats. Data do not need | |
162 to be duplicated in different formats, either. Secondly, tabix works on | |
163 compressed data files while most SQL databases do not. The GENCODE | |
164 annotation GTF can be compressed down to 4% of its original size. Thirdly, tabix is | |
165 fast. The same indexing algorithm is known to work efficiently for an | |
166 alignment with a few billion short reads. SQL databases probably cannot | |
167 easily handle data at this scale. Last but not least, tabix supports | |
168 remote data retrieval. One can put the data file and the index at an FTP | |
169 or HTTP server, and other users or even web services will be able to get | |
170 a slice without downloading the entire file. | |
171 | |
172 .SH AUTHOR | |
173 .PP | |
174 Tabix was written by Heng Li. The BGZF library was originally | |
175 implemented by Bob Handsaker and modified by Heng Li for remote file | |
176 access and in-memory caching. | |
177 | |
178 .SH SEE ALSO | |
179 .PP | |
180 .BR samtools (1) |