nextgen_variant_identification: SNV/SNVMix2_source/SNVMix2-v0.12.1-rc1/samtools-0.1.6/samtools.txt comparison

comparison SNV/SNVMix2_source/SNVMix2-v0.12.1-rc1/samtools-0.1.6/samtools.txt @ 0:74f5ea818cea

Uploaded

author	ryanmorin
date	Wed, 12 Oct 2011 19:50:38 -0400
parents
children

comparison


--1:000000000000
+:74f5ea818cea
+samtools(1)                  Bioinformatics tools                  samtools(1)
+NAME
+samtools - Utilities for the Sequence Alignment/Map (SAM) format
+SYNOPSIS
+samtools view -bt ref_list.txt -o aln.bam aln.sam.gz
+samtools sort aln.bam aln.sorted
+samtools index aln.sorted.bam
+samtools view aln.sorted.bam chr2:20,100,000-20,200,000
+samtools merge out.bam in1.bam in2.bam in3.bam
+samtools faidx ref.fasta
+samtools pileup -f ref.fasta aln.sorted.bam
+samtools tview aln.sorted.bam ref.fasta
+DESCRIPTION
+Samtools  is  a  set of utilities that manipulate alignments in the BAM
+format. It imports from and exports to the SAM (Sequence Alignment/Map)
+format, does sorting, merging and indexing, and allows reads in any
+region to be retrieved swiftly.
+Samtools is designed to work on a stream. It regards an input file  `-'
+as  the  standard  input (stdin) and an output file `-' as the standard
+output (stdout). Several commands can thus be combined with Unix pipes.
+Samtools always outputs warning and error messages to the standard error
+output (stderr).
+Samtools is also able to open a BAM (not SAM) file on a remote  FTP  or
+HTTP  server  if  the  BAM file name starts with `ftp://' or `http://'.
+Samtools checks the current working directory for the  index  file  and
+will download the index if it is absent. Samtools does not retrieve the
+entire alignment file unless it is asked to do so.
+COMMANDS AND OPTIONS
+import    samtools import <in.ref_list> <in.sam> <out.bam>
+Since 0.1.4, this command is an alias of:
+samtools view -bt <in.ref_list> -o <out.bam> <in.sam>
+sort      samtools sort [-n] [-m maxMem] <in.bam> <out.prefix>
+Sort alignments by leftmost coordinates. File <out.prefix>.bam
+will be created. This command may also create temporary files
+<out.prefix>.%d.bam when the whole alignment cannot fit into
+memory (controlled by option -m).
+OPTIONS:
+-n      Sort by read names rather than by chromosomal coordi-
+nates
+-m INT  Approximately   the    maximum    required    memory.
+[500000000]
+merge     samtools   merge   [-h   inh.sam]  [-n]  <out.bam>  <in1.bam>
+<in2.bam> [...]
+Merge multiple sorted alignments.  The header reference lists
+of  all  the input BAM files, and the @SQ headers of inh.sam,
+if  any,  must  all  refer  to  the  same  set  of  reference
+sequences.   The header reference list and (unless overridden
+by -h) `@' headers of in1.bam will be copied to out.bam,  and
+the headers of other files will be ignored.
+OPTIONS:
+-h FILE Use  the lines of FILE as `@' headers to be copied to
+out.bam, replacing any header lines that would other-
+wise  be  copied  from in1.bam.  (FILE is actually in
+SAM format, though any alignment records it may  con-
+tain are ignored.)
+-n      The  input alignments are sorted by read names rather
+than by chromosomal coordinates
+index     samtools index <aln.bam>
+Index sorted alignment for fast  random  access.  Index  file
+<aln.bam>.bai will be created.
+view      samtools  view  [-bhuHS]  [-t  in.refList]  [-o  output]  [-f
+reqFlag] [-F skipFlag] [-q minMapQ] [-l  library]  [-r  read-
+Group] <in.bam>|<in.sam> [region1 [...]]
+Extract/print all alignments or a subset of them in SAM or BAM
+format. If no region is specified, all the alignments will be
+printed; otherwise only alignments overlapping the specified
+regions will be output. An alignment may be output multiple times
+if it overlaps several regions. A region can be given, for
+example, in the following formats: `chr2', `chr2:1000000' or
+`chr2:1,000,000-2,000,000'. The coordinates are 1-based.
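The three region forms listed above can be split into a name and optional 1-based coordinates. The following is a minimal Python sketch, not samtools code: `parse_region` is a hypothetical helper, and it leaves a missing start or end as `None` rather than guessing how samtools treats an open-ended region.

```python
import re

# Hypothetical helper (not part of samtools): splits a region string such as
# `chr2', `chr2:1000000' or `chr2:1,000,000-2,000,000' into its parts.
_REGION = re.compile(
    r"^(?P<name>[^:]+)(?::(?P<start>[\d,]+)(?:-(?P<end>[\d,]+))?)?$"
)


def parse_region(text):
    """Return (name, start, end); start/end are 1-based ints or None."""
    m = _REGION.match(text)
    if m is None:
        raise ValueError(f"unrecognized region: {text!r}")
    start, end = m.group("start"), m.group("end")
    # Commas are only thousands separators, so they are dropped.
    start = int(start.replace(",", "")) if start else None
    end = int(end.replace(",", "")) if end else None
    return m.group("name"), start, end
```

For example, `parse_region("chr2:1,000,000-2,000,000")` yields the name `chr2` with start 1000000 and end 2000000.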
+OPTIONS:
+-b      Output in the BAM format.
+-u      Output uncompressed BAM. This option saves time spent
+on compression/decompression  and  is  thus  preferred
+when the output is piped to another samtools command.
+-h      Include the header in the output.
+-H      Output the header only.
+-S      Input is in SAM. If @SQ header lines are absent,  the
+`-t' option is required.
+-t FILE This  file  is  TAB-delimited. Each line must contain
+the reference name and the length of  the  reference,
+one  line  for  each  distinct  reference; additional
+fields are ignored. This file also defines the  order
+of  the  reference  sequences  in sorting. If you run
+`samtools faidx <ref.fa>', the resultant  index  file
+<ref.fa>.fai  can be used as this <in.ref_list> file.
+-o FILE Output file [stdout]
+-f INT  Only output alignments with all bits in  INT  present
+in the FLAG field. INT can be in hex in the format of
+/^0x[0-9A-F]+/ [0]
+-F INT  Skip alignments with bits present in INT [0]
+-q INT  Skip alignments with MAPQ smaller than INT [0]
+-l STR  Only output reads in library STR [null]
+-r STR  Only output reads in read group STR [null]
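The -f, -F and -q options combine into a simple per-alignment test. The sketch below restates that test in Python for illustration; the function names are hypothetical, and it reads -F as "skip if any bit of INT is set", which is an interpretation of the wording above rather than something the text spells out.

```python
def parse_flag_arg(text):
    """Accept a decimal integer or a 0x-prefixed hex value, as -f allows."""
    return int(text, 16) if text.lower().startswith("0x") else int(text)


def passes_filters(flag, mapq, req_flag=0, skip_flag=0, min_mapq=0):
    """Illustrative restatement of the -f / -F / -q filters."""
    # -f: every bit of req_flag must be present in FLAG.
    if (flag & req_flag) != req_flag:
        return False
    # -F: assumed to skip when any bit of skip_flag is present in FLAG.
    if flag & skip_flag:
        return False
    # -q: skip alignments whose MAPQ is smaller than min_mapq.
    if mapq < min_mapq:
        return False
    return True
```

With the defaults of 0 for all three options, every alignment passes, which matches the `[0]` defaults listed above.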
+faidx     samtools faidx <ref.fasta> [region1 [...]]
+Index reference sequence in the FASTA format or extract  sub-
+sequence  from  indexed  reference  sequence. If no region is
+specified,   faidx   will   index   the   file   and   create
+<ref.fasta>.fai  on the disk. If regions are specified, the
+subsequences will be retrieved and printed to stdout  in  the
+FASTA  format.  The  input file can be compressed in the RAZF
+format.
+pileup    samtools  pileup  [-f  in.ref.fasta]  [-t  in.ref_list]   [-l
+in.site_list]    [-iscgS2]   [-T   theta]   [-N   nHap]   [-r
+pairDiffRate] <in.bam>|<in.sam>
+Print the alignment in the pileup format. In the pileup  for-
+mat,  each  line represents a genomic position, consisting of
+chromosome name, coordinate, reference base, read bases, read
+qualities  and  alignment  mapping  qualities. Information on
+match, mismatch, indel, strand, mapping quality and start and
+end  of  a  read  are all encoded at the read base column. At
+this column, a dot stands for a match to the  reference  base
+on  the  forward  strand,  a comma for a match on the reverse
+strand, `ACGTN' for a mismatch  on  the  forward  strand  and
+`acgtn'  for  a  mismatch  on  the  reverse strand. A pattern
+`\+[0-9]+[ACGTNacgtn]+'  indicates  there  is  an   insertion
+between  this reference position and the next reference posi-
+tion. The length of the insertion is given by the integer  in
+the  pattern, followed by the inserted sequence. Similarly, a
+pattern `-[0-9]+[ACGTNacgtn]+' represents a deletion from the
+reference.  The deleted bases will be presented as `*' in the
+following lines. Also at the read base column, a  symbol  `^'
+marks  the start of a read segment which is a contiguous sub-
+sequence on the read separated by `N/S/H'  CIGAR  operations.
+The  ASCII  of the character following `^' minus 33 gives the
+mapping quality. A symbol `$' marks the end of  a  read  seg-
+ment.
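The encoding of the read base column described above can be walked character by character. The following Python sketch is an illustration under that description only; `summarize_read_bases` and its result keys are hypothetical names, and the input string in the usage note is made up. Note that the integer after `+' or `-' decides how many sequence characters belong to the indel, so a greedy regular expression alone would consume too much.

```python
def summarize_read_bases(column):
    """Tally the symbols of one pileup read base column (illustrative)."""
    out = {
        "match_fwd": 0, "match_rev": 0,
        "mismatch_fwd": 0, "mismatch_rev": 0,
        "deleted": 0, "ends": 0,
        "start_mapqs": [], "insertions": [], "deletions": [],
    }
    i = 0
    while i < len(column):
        c = column[i]
        if c == "^":
            # The next character encodes the mapping quality as ASCII - 33.
            out["start_mapqs"].append(ord(column[i + 1]) - 33)
            i += 2
            continue
        if c in "+-":
            j = i + 1
            while j < len(column) and column[j].isdigit():
                j += 1
            length = int(column[i + 1:j])
            seq = column[j:j + length]
            out["insertions" if c == "+" else "deletions"].append(seq)
            i = j + length
            continue
        if c == "$":
            out["ends"] += 1
        elif c == ".":
            out["match_fwd"] += 1
        elif c == ",":
            out["match_rev"] += 1
        elif c in "ACGTN":
            out["mismatch_fwd"] += 1
        elif c in "acgtn":
            out["mismatch_rev"] += 1
        elif c == "*":
            out["deleted"] += 1
        else:
            raise ValueError(f"unexpected symbol {c!r} at offset {i}")
        i += 1
    return out
```

Calling it on the made-up column `^F.,$A+2AGt*` reports one read start with mapping quality 37, one read end, one insertion `AG`, and one each of forward match, reverse match, forward mismatch, reverse mismatch and deleted base.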
+If  option -c is applied, the consensus base, consensus qual-
+ity, SNP quality and RMS mapping quality of the reads  cover-
+ing  the  site  will be inserted between the `reference base'
+and the `read bases' columns. An indel occupies an additional
+line.  Each  indel  line consists of chromosome name, coordi-
+nate, a star, the genotype, consensus quality,  SNP  quality,
+RMS mapping quality, # covering reads, the first allele, the
+second allele, # reads supporting the first allele,  #  reads
+supporting  the  second  allele and # reads containing indels
+different from the top two alleles.
+OPTIONS:
+-s        Print the mapping quality as the last column.  This
+option  makes  the output easier to parse, although
+this format is not space efficient.
+-S        The input file is in SAM.
+-i        Only output pileup lines containing indels.
+-f FILE   The reference sequence in the FASTA  format.  Index
+file FILE.fai will be created if absent.
+-M INT    Cap mapping quality at INT [60]
+-t FILE   List  of  reference  names and sequence lengths, in
+the format described for  the  import  command.  If
+this  option is present, samtools assumes the input
+<in.alignment>  is  in  SAM  format;  otherwise  it
+assumes BAM format.
+-l FILE   List  of sites at which pileup is output. This file
+is space  delimited.  The  first  two  columns  are
+required  to  be chromosome and 1-based coordinate.
+Additional columns are ignored. It  is  recommended
+to use option -s together with -l as in the default
+format we may not know the mapping quality.
+-c        Call the consensus  sequence  using  MAQ  consensus
+model. Options -T, -N, -I and -r are only effective
+when -c or -g is in use.
+-g        Generate genotype likelihood in  the  binary  GLFv3
+format. This option suppresses -c, -i and -s.
+-T FLOAT  The  theta parameter (error dependency coefficient)
+in the MAQ consensus calling model [0.85]
+-N INT    Number of haplotypes in the sample (>=2) [2]
+-r FLOAT  Expected fraction of differences between a pair  of
+haplotypes [0.001]
+-I INT    Phred  probability  of an indel in sequencing/prep.
+[40]
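Several values in this page are Phred-scaled (the -I default, MAPQ, base qualities). Assuming the standard Phred definition Q = -10 * log10(P), which the text does not state explicitly, the conversion can be sketched as follows; both function names are hypothetical.

```python
import math


def phred_to_prob(q):
    """Convert a Phred-scaled value to a probability (standard definition)."""
    return 10.0 ** (-q / 10.0)


def prob_to_phred(p):
    """Convert a probability to a Phred-scaled value."""
    return -10.0 * math.log10(p)
```

Under that assumption, the default `-I 40` corresponds to an indel probability of about 1 in 10,000.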
+tview     samtools tview <in.sorted.bam> [ref.fasta]
+Text alignment viewer (based on the ncurses library). In  the
+viewer,  press `?' for help and press `g' to check the align-
+ment   start   from   a   region   in   the    format    like
+`chr10:10,000,000'.
+fixmate   samtools fixmate <in.nameSrt.bam> <out.bam>
+Fill in mate coordinates, ISIZE and mate related flags from a
+name-sorted alignment.
+rmdup     samtools rmdup <input.srt.bam> <out.bam>
+Remove potential PCR duplicates: if multiple read pairs  have
+identical  external  coordinates,  only  retain the pair with
+highest mapping quality.  This command  ONLY  works  with  FR
+orientation and requires that ISIZE be correctly set.
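The retention rule stated above (identical external coordinates, keep the highest mapping quality) can be illustrated with a short Python sketch. This is not the samtools implementation: the `Pair` record, the single per-pair `mapq` value and the tie-breaking (first pair seen wins) are all assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical record for one read pair; `start` and `end` stand for the
# pair's external coordinates and `mapq` for a single per-pair quality.
Pair = namedtuple("Pair", "name chrom start end mapq")


def retain_best_pairs(pairs):
    """Keep one pair per (chrom, start, end), preferring the highest mapq."""
    best = {}
    for pair in pairs:
        key = (pair.chrom, pair.start, pair.end)
        # Strict comparison: on a tie the first pair seen is kept.
        if key not in best or pair.mapq > best[key].mapq:
            best[key] = pair
    return list(best.values())
```

Pairs whose external coordinates differ in either end are never compared with each other, so they are all retained.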
+rmdupse   samtools rmdupse <input.srt.bam> <out.bam>
+Remove potential duplicates for single-ended reads. This command
+will treat all reads as single-ended even if they are in fact
+paired.
+fillmd    samtools fillmd [-e] <aln.bam> <ref.fasta>
+Generate  the  MD tag. If the MD tag is already present, this
+command will give a warning if the MD tag generated  is  dif-
+ferent from the existing tag.
+OPTIONS:
+-e      Convert  a  read  base to = if it is identical to
+the aligned reference base.  Indel  caller  does  not
+support the = bases at the moment.
+SAM FORMAT
+SAM  is  TAB-delimited.  Apart from the header lines, which start
+with the `@' symbol, each alignment line consists of:
++----+-------+----------------------------------------------------------+
+|Col | Field |                       Description                        |
++----+-------+----------------------------------------------------------+
+| 1  | QNAME | Query (pair) NAME                                        |
+| 2  | FLAG  | bitwise FLAG                                             |
+| 3  | RNAME | Reference sequence NAME                                  |
+| 4  | POS   | 1-based leftmost POSition/coordinate of clipped sequence |
+| 5  | MAPQ  | MAPping Quality (Phred-scaled)                           |
+| 6  | CIGAR | extended CIGAR string                                    |
+| 7  | MRNM  | Mate Reference sequence NaMe (`=' if same as RNAME)      |
+| 8  | MPOS  | 1-based Mate POSition                                    |
+| 9  | ISIZE | Inferred insert SIZE                                     |
+|10  | SEQ   | query SEQuence on the same strand as the reference       |
+|11  | QUAL  | query QUALity (ASCII-33 gives the Phred base quality)    |
+|12  | OPT   | variable OPTional fields in the format TAG:VTYPE:VALUE   |
++----+-------+----------------------------------------------------------+
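The column layout in the table above maps directly onto a TAB split. The Python sketch below is illustrative only: `Alignment`, `parse_alignment` and `base_qualities` are hypothetical names, header lines are not handled, and placeholder values for absent fields are not special-cased.

```python
from collections import namedtuple

# Field names follow the table above, lower-cased.
Alignment = namedtuple(
    "Alignment",
    "qname flag rname pos mapq cigar mrnm mpos isize seq qual opt",
)


def parse_alignment(line):
    """Split one SAM alignment line into its 11 fixed columns plus OPT."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 11:
        raise ValueError("an alignment line needs at least 11 columns")
    # MRNM `=' means the mate is on the same reference as RNAME.
    mrnm = f[2] if f[6] == "=" else f[6]
    return Alignment(
        f[0], int(f[1]), f[2], int(f[3]), int(f[4]), f[5],
        mrnm, int(f[7]), int(f[8]), f[9], f[10], f[11:],
    )


def base_qualities(qual):
    """ASCII code minus 33 gives the Phred base quality of each base."""
    return [ord(c) - 33 for c in qual]
```

Applied to a made-up record whose QUAL column is `IIII`, `base_qualities` returns four values of 40.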
+Each bit in the FLAG field is defined as:
++-------+--------------------------------------------------+
+| Flag  |                   Description                    |
++-------+--------------------------------------------------+
+|0x0001 | the read is paired in sequencing                 |
+|0x0002 | the read is mapped in a proper pair              |
+|0x0004 | the query sequence itself is unmapped            |
+|0x0008 | the mate is unmapped                             |
+|0x0010 | strand of the query (1 for reverse)              |
+|0x0020 | strand of the mate                               |
+|0x0040 | the read is the first read in a pair             |
+|0x0080 | the read is the second read in a pair            |
+|0x0100 | the alignment is not primary                     |
+|0x0200 | the read fails platform/vendor quality checks    |
+|0x0400 | the read is either a PCR or an optical duplicate |
++-------+--------------------------------------------------+
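Because each FLAG bit is independent, a FLAG value can be decoded by testing the bits in the table one at a time. A minimal Python sketch follows; the short labels are invented for this example, and `mate_reverse` assumes 0x0020 uses the same "1 for reverse" convention stated for 0x0010.

```python
# Bit values come from the table above; the labels are hypothetical.
FLAG_BITS = [
    (0x0001, "paired"),
    (0x0002, "proper_pair"),
    (0x0004, "unmapped"),
    (0x0008, "mate_unmapped"),
    (0x0010, "reverse"),
    (0x0020, "mate_reverse"),  # assumes 1 means reverse, as for 0x0010
    (0x0040, "first_in_pair"),
    (0x0080, "second_in_pair"),
    (0x0100, "not_primary"),
    (0x0200, "qc_fail"),
    (0x0400, "duplicate"),
]


def decode_flag(flag):
    """Return the labels of all bits set in a FLAG value."""
    return [label for bit, label in FLAG_BITS if flag & bit]
```

For instance, FLAG 99 is 0x0001 + 0x0002 + 0x0020 + 0x0040, so it decodes to `paired`, `proper_pair`, `mate_reverse` and `first_in_pair`.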
+LIMITATIONS
+o Unaligned  words  used  in  bam_import.c,  bam_endian.h,  bam.c   and
+bam_aux.c.
+o CIGAR operation P is not properly handled at the moment.
+o In  merging,  the input files are required to have the same number of
+reference sequences. The requirement can  be  relaxed.  In  addition,
+merging  does  not reconstruct the header dictionaries automatically.
+End users have to provide the correct  header.  Picard  is  better  at
+merging.
+o Samtools' rmdup does not work for single-end data and does not remove
+duplicates across chromosomes. Picard is better.
+AUTHOR
+Heng Li from the Sanger Institute wrote the C version of samtools.  Bob
+Handsaker from the Broad Institute implemented the BGZF library and Jue
+Ruan from Beijing Genomics Institute wrote the  RAZF  library.  Various
+people  in the 1000Genomes Project contributed to the SAM format speci-
+fication.
+SEE ALSO
+Samtools website: <http://samtools.sourceforge.net>
+samtools-0.1.6                 2 September 2009                    samtools(1)
