reformat, filter, or subsamplemacros.xml 0
-q $cond_filter.quality
#end if
#if str( $cond_filter.library ) != ''
-l '$cond_filter.library'
#end if
#if str( $cond_filter.cigarcons ) != ''
-m $cond_filter.cigarcons
#end if
#if str($cond_filter.inclusive_filter) != 'None':
#set $filter = $cond_filter.inclusive_filter
@FLAGS@
-f $flags
#end if
#if str($cond_filter.exclusive_filter) != 'None':
#set $filter = $cond_filter.exclusive_filter
@FLAGS@
-F $flags
#end if
#if str($cond_filter.exclusive_filter_all) != 'None':
#set $filter = $cond_filter.exclusive_filter_all
@FLAGS@
-G $flags
#end if
#for $i, $s in enumerate($cond_filter.readtags)
-x '${s.readtag}'
#end for
#end if
#if $cond_subsample.select_subsample == 'yes':
#set fraction=str($cond_subsample.subsample).split('.')[1]
#if str($cond_subsample.seed) == '':
-s "\${RANDOM}".$fraction
#else
-s $cond_subsample.seed.$fraction
#end if
#end if
## output options
$adv_output.header
$adv_output.collapsecigar
#if $adv_output.outputpassing == 'yes'
-U inv_outfile
#end if
-o outfile
## additional reference data
#if $reffa!=None:
-T '$reffa'
-t '$reffai'
#else:
--output-fmt-option no_ref
#end if
infile
## region filters need to be at the end
#if $cond_filter.select_filter == 'yes' and $cond_filter.cond_region.select_region == 'text':
'$cond_filter.cond_region.regions'
#end if
## if data is converted from an unsorted file (SAM, CRAM, or unsorted BAM) to BAM,
## then sort the output by coordinate
#if not $input.is_of_type('bam') and $outtype == 'bam':
&& samtools sort
-@ \$addthreads -m \${GALAXY_MEMORY_MB:-768}M -T sorttemp
-O bam
-o 'tmpsam'
outfile
&& mv tmpsam outfile
#if $adv_output.outputpassing == 'yes':
&& samtools sort
-@ \$addthreads -m \${GALAXY_MEMORY_MB:-768}M -T sorttemp
-O bam
-o 'tmpsam'
inv_outfile
&& mv tmpsam inv_outfile
#end if
#end if
]]>
Select output type. In case of counts, only the total number of alignments is returned. All filters are taken into account.
Reference data as fasta(.gz). Required for SAM input without @SQ headers and useful/required for writing CRAM output (see help).
outtype != 'count'
adv_output['outputpassing'] == 'yes' and outtype != 'count'
outtype == 'count'
**What it does**
Samtools view can:
1. filter alignments according to various criteria
2. convert between alignment formats (SAM, BAM, CRAM)
With no options or regions specified, samtools view prints all alignments in the specified input alignment file (in SAM, BAM, or CRAM format) to standard output in SAM format (with no header).
**Alignment format conversion**
Inputs of type SAM, BAM, and CRAM are accepted and can be converted to any of these formats by selecting the appropriate "Output type". Alternatively, alignment counts can be computed.
.. class:: infomark
samtools view allows specifying a reference sequence. This is required for SAM input with missing @SQ headers (which include sequence names, lengths, MD5 checksums, etc.) and useful (and sometimes necessary) for CRAM input and output. The use of reference sequences in the CRAM format is detailed in the following.
CRAM is (primarily) a reference-based compressed format, i.e. only the differences between the stored sequences and the reference are stored. As a consequence, the reference that was used to generate the alignments is always needed in order to interpret them (a checksum stored in the CRAM file is used to verify that only the correct sequence can be used), i.e. the CRAM file on its own is not useful by default. This allows for more space-efficient storage than BAM. It is also possible to use CRAM without a reference, with the disadvantage that the sequences are stored explicitly (as in SAM and BAM).
The Galaxy tool allows both possibilities using the "reference data" option:
- the default ("no reference")
- reference data chosen from the history or from the built-in genomes
The reference data is required for reading/writing reference-based CRAM.
**Filtering alignments**
samtools view can filter alignments based on various criteria, e.g. by regions (see below), alignment quality (see below), and tags or flags set in the alignments. The output will contain only alignments matching all criteria. An additional output containing the remaining alignments can also be created, see "Output alignments not passing the filter" in "output options".
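The three flag filters differ in how they combine bits. The following Python sketch illustrates the semantics documented for the samtools view options -f (require all bits), -F (exclude if any bit is set), and -G (exclude only if all bits are set). The function name and its parameter names are hypothetical and chosen for illustration; this is not how samtools implements the check.

```python
def passes_flag_filters(flag, require_all=0, exclude_any=0, exclude_all=0):
    """Return True if an alignment with the given FLAG value would be kept.

    require_all mimics -f, exclude_any mimics -F, exclude_all mimics -G.
    All names here are illustrative assumptions.
    """
    # -f: every bit in require_all must be set in the FLAG
    if (flag & require_all) != require_all:
        return False
    # -F: none of the bits in exclude_any may be set in the FLAG
    if flag & exclude_any:
        return False
    # -G: drop the alignment only if all bits in exclude_all are set
    if exclude_all and (flag & exclude_all) == exclude_all:
        return False
    return True


# FLAG 99 = paired (1) + proper pair (2) + mate reverse (32) + first in pair (64)
# FLAG 77 = paired (1) + unmapped (4) + mate unmapped (8) + first in pair (64)
# FLAG 73 = paired (1) + mate unmapped (8) + first in pair (64)
print(passes_flag_filters(99, require_all=3))    # proper pairs only
print(passes_flag_filters(77, exclude_any=4))    # drop unmapped reads
print(passes_flag_filters(73, exclude_all=12))   # drop only if read and mate are both unmapped
```

Note the difference between the last two filters: an alignment with FLAG 73 survives an "exclude all of 12" filter because only one of the two bits is set, but it would be removed by an "exclude any of 12" filter.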
**Filtering by regions**
You may specify one or more space-separated region specifications after the input filename to restrict output to only those alignments which overlap the specified region(s). Use of region specifications requires a coordinate-sorted and indexed input file (in BAM or CRAM format).
Regions can be specified as: RNAME[:STARTPOS[-ENDPOS]] and all position coordinates are 1-based.
Important note: when multiple regions are given, some alignments may be output multiple times if they overlap more than one of the specified regions.
Examples of region specifications:
- chr1 Output all alignments mapped to the reference sequence named 'chr1' (i.e. @SQ SN:chr1).
- chr2:1000000 The region on chr2 beginning at base position 1,000,000 and ending at the end of the chromosome.
- chr3:1000-2000 The 1001bp region on chr3 beginning at base position 1,000 and ending at base position 2,000 (including both end positions).
- '*' Output the unmapped reads at the end of the file. (This does not include any unmapped reads placed on a reference sequence alongside their mapped mates.)
- . Output all alignments. (Mostly unnecessary as not specifying a region at all has the same effect.)
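To make the region syntax concrete, here is a minimal Python sketch that splits a specification of the form RNAME[:STARTPOS[-ENDPOS]] into its parts and computes the length of a closed, 1-based interval. The helper names are hypothetical, and the sketch assumes that reference names contain no colon; samtools itself handles more cases than this.

```python
def parse_region(spec):
    """Split 'RNAME[:STARTPOS[-ENDPOS]]' into (name, start, end).

    Missing parts are returned as None. Assumes (for illustration only)
    that the reference name itself contains no ':'.
    """
    name, sep, coords = spec.rpartition(":")
    if not sep:
        # No colon at all: the whole reference sequence is requested
        return spec, None, None
    start, dash, end = coords.partition("-")
    return name, int(start), int(end) if dash else None


def region_length(start, end):
    """Length of a 1-based region that includes both end positions."""
    return end - start + 1


print(parse_region("chr1"))            # whole reference sequence
print(parse_region("chr2:1000000"))    # open-ended region
name, start, end = parse_region("chr3:1000-2000")
print(name, region_length(start, end))
```

Because both end positions are included, the length is the difference of the coordinates plus one, which is why chr3:1000-2000 spans 1001 bases.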
**Filtering by quality**
This filters based on the MAPQ column of the SAM format, which gives an estimate of how likely it is that the alignment is placed correctly. Note that aligners do not follow a consistent definition.
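As a rough guide to choosing a threshold, the SAM specification defines MAPQ as -10 log10 of the probability that the mapping position is wrong, rounded to the nearest integer, with 255 meaning that no mapping quality is available. The Python sketch below converts a MAPQ value under that definition and applies a minimum quality cut-off, where alignments with a MAPQ smaller than the threshold are skipped. The function names are hypothetical, and since aligners deviate from the specification, the computed probabilities should be read as an approximation at best.

```python
def mapq_to_error_probability(mapq):
    """Error probability implied by a MAPQ value under the SAM definition.

    Returns None for 255, which marks an unavailable mapping quality.
    """
    if mapq == 255:
        return None
    return 10 ** (-mapq / 10)


def passes_quality(mapq, min_quality):
    """Keep an alignment unless its MAPQ is smaller than the threshold."""
    return mapq >= min_quality


for mapq in (0, 10, 20, 30):
    print(mapq, mapq_to_error_probability(mapq), passes_quality(mapq, 20))
```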
The -x, -B, and -s options modify the data which is contained in each alignment.