# HG changeset patch
# User iuc
# Date 1579610418 18000
# Node ID b01db2684fa591f04b7a32f3b55140a5c56e88b9
# Parent ff313de5f7f4971dcbc70a20ec88e20defac5f20
"planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tool_collections/samtools/samtools_view commit 6692949aa694102abb64c67d46196a822fcb61bf"
diff -r ff313de5f7f4 -r b01db2684fa5 macros.xml
--- a/macros.xml Thu Oct 17 02:25:30 2019 -0400
+++ b/macros.xml Tue Jan 21 07:40:18 2020 -0500
@@ -77,18 +77,18 @@
- Select output type. In case of counts only the total number of alignments is returned. All filters are taken into account

@@ -234,53 +371,37 @@
- outtype != 'count'
+ mode['outtype'] == 'header' or mode['output_options']['reads_report_type'] != 'count'
- adv_output['outputpassing'] == 'yes' and outtype != 'count'
+ mode['outtype'] == 'selected_reads' and mode['output_options']['reads_report_type'] != 'count' and mode['output_options']['complementary_output']
- outtype == 'count'
+ mode['outtype'] != 'header' and mode['output_options']['reads_report_type'] == 'count'
@@ -289,7 +410,6 @@
@@ -299,79 +419,141 @@

@@ -380,76 +562,177 @@
@@ -459,20 +742,32 @@
 Samtools view can:
-1. filter alignments according to various criteria
-2. convert between alignment formats (SAM, BAM, CRAM)
+1. convert between alignment formats (SAM, BAM, CRAM)
+2. filter and subsample alignments according to user-specified criteria
+3. count the reads in the input dataset or those retained after filtering
+ and subsampling
+4. obtain just the header of the input in any supported format
+
+In addition, the tool has (limited) options to modify read records during conversion and/or filtering by:
-With no options or regions specified, prints all alignments in the specified input alignment file (in SAM, BAM, or CRAM format) to standard output in SAM format (with no header).
+- stripping them of user-specified tags
+- collapsing backward CIGAR operations if they are specified in their CIGAR
+ fields
+
+With default settings, the tool generates a BAM dataset with the header and
+reads found in the input dataset (which can be in SAM, BAM, or CRAM format).
 **Alignment format conversion**
+By changing the *Output format* it is possible to convert an input dataset to
+another format.
 Inputs of type SAM, BAM, and CRAM are accepted and can be converted to each of these formats (alternatively alignment counts can be computed) by selecting the appropriate "Output type".
 .. class:: infomark
-samtools view allows to specify a reference sequence. This is required for SAM input with missing @SQ headers (which include sequence names, length, md5, etc) and useful (and sometimes necessary) for CRAM input and output. In the following the use of reference sequence in the CRAM format is detailed.
-CRAM is (intended as a primarily) a reference-based compressed format, i.e. only differences between the stored sequences and the reference are stored. As a consequence the reference that was used to generate the alignemnts is always needed in order to interpret the alignments (a checksum stored in the CRAM file is used to verify that the only the correct sequence can be used), i.e. the CRAM file on its own is not useful per default. This allows for a more space efficient storage compared to BAM.
-But it is also possible to use CRAM without a reference with the disadvantage that the reference is stored explicitely (as in SAM and BAM).
+The tool allows you to specify a reference sequence. This is required for SAM input with missing @SQ headers (which include sequence names, length, md5, etc) and useful (and sometimes necessary) for CRAM input and output. In the following the use of the reference sequence in the CRAM format is detailed.
+CRAM is (primarily) a reference-based compressed format, i.e. only sequence differences between aligned reads and the reference are stored. As a consequence, the reference that was used during read mapping is needed in order to interpret the alignment records (a checksum stored in the CRAM file is used to verify that only the correct reference sequence can be used). This allows for more space-efficient storage than with BAM format, but such a CRAM file is not usable without its reference.
+It is also possible, however, to use CRAM without a reference with the disadvantage that the reference sequence gets stored then explicitely (as in SAM and BAM).
 The Galaxy tool **currently generates only CRAM without reference sequence**.
@@ -480,8 +775,17 @@
 **Filtering alignments**
-samtools view allows to filter alignements based on various criteria, i.e. the output will contain only alignemnts matching all criteria (an additional output containing the remaining alignments can be created additionally, see "Output alignments not passing the filter" in "output options"): e.g. by regions (see below), alignment quality (see below), and tags or flags set in the alignments.
+If you ask for *A filtered/subsampled selection of reads*, the tool will allow
+you to specify filter conditions and/or to choose a subsampling strategy, and
+the output will contain one of the following depending on your choice under
+*What would you like to have reported?*:
+- All reads retained after filtering and subsampling
+- Reads dropped during filtering and subsampling
+
+If instead you want to *split* the input reads based on your criteria and
+obtain *two* datasets, one with the retained and one with the dropped reads, check
+the *Produce extra dataset with dropped/retained reads?* option.
 **Filtering by regions**
@@ -490,21 +794,30 @@
 Regions can be specified as: RNAME[:STARTPOS[-ENDPOS]] and all position coordinates are 1-based.
-Important note: when multiple regions are given, some alignments may be output multiple times if they overlap more than one of the specified regions.
+.. class:: Warning mark
+
+When multiple regions are given, some alignments may be output multiple times if they overlap more than one of the specified regions.
 Examples of region specifications:
-- chr1 Output all alignments mapped to the reference sequence named 'chr1' (i.e. @SQ SN:chr1).
-- chr2:1000000 The region on chr2 beginning at base position 1,000,000 and ending at the end of the chromosome.
-- chr3:1000-2000 The 1001bp region on chr3 beginning at base position 1,000 and ending at base position 2,000 (including both end positions).
-- '*' Output the unmapped reads at the end of the file. (This does not include any unmapped reads placed on a reference sequence alongside their mapped mates.)
-- . Output all alignments. (Mostly unnecessary as not specifying a region at all has the same effect.)
+``chr1``
+ Output all alignments mapped to the reference sequence named 'chr1' (i.e. @SQ SN:chr1).
+
+``chr2:1000000``
+ The region on chr2 beginning at base position 1,000,000 and ending at the end of the chromosome.
+
+``chr3:1000-2000``
+ The 1001bp region on chr3 beginning at base position 1,000 and ending at base position 2,000 (including both end positions).
+
+``*``
+ Output the unmapped reads at the end of the file. (This does not include any unmapped reads placed on a reference sequence alongside their mapped mates.)
+
+``.``
+ Output all alignments. (Mostly unnecessary as not specifying a region at all has the same effect.)
 **Filtering by quality**
-This filters based on the MAPQ column of the SAM format which gives an estimate about the correct placement of the alignemnt. Note that aligners do not follow a consistent definition.
-
-The -x, -B, and -s options modify the data which is contained in each alignment.
+This filters based on the MAPQ column of the SAM format which gives an estimate about the correct placement of the alignment. Note that aligners do not follow a consistent definition.
diff -r ff313de5f7f4 -r b01db2684fa5 test-data/test.bed
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/test.bed Tue Jan 21 07:40:18 2020 -0500
@@ -0,0 +1,1 @@
+CHROMOSOME_I 1 120
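
The region syntax described in the help text above, `RNAME[:STARTPOS[-ENDPOS]]` with 1-based coordinates that include both end positions, can be illustrated with a small Python sketch. This is not how samtools itself parses regions: `Region`, `parse_region`, and `region_length` are hypothetical names introduced here, and the sketch assumes reference names without colons and plain digits for positions. The special values `*` and `.` are not given any special treatment.

```python
import re
from typing import NamedTuple, Optional


class Region(NamedTuple):
    """Hypothetical container for one parsed region specification."""
    rname: str
    start: Optional[int]  # 1-based, inclusive; None if not given
    end: Optional[int]    # 1-based, inclusive; None means "to the end of the reference"


# RNAME[:STARTPOS[-ENDPOS]] -- assumes RNAME itself contains no ':'
_REGION_RE = re.compile(r"(?P<rname>[^:]+)(?::(?P<start>\d+)(?:-(?P<end>\d+))?)?")


def parse_region(spec: str) -> Region:
    """Split a region string into reference name, start, and end."""
    match = _REGION_RE.fullmatch(spec)
    if match is None:
        raise ValueError(f"not a valid region specification: {spec!r}")
    start = int(match.group("start")) if match.group("start") else None
    end = int(match.group("end")) if match.group("end") else None
    if start is not None and start < 1:
        raise ValueError("positions are 1-based, so STARTPOS must be >= 1")
    if start is not None and end is not None and end < start:
        raise ValueError("ENDPOS must not be smaller than STARTPOS")
    return Region(match.group("rname"), start, end)


def region_length(region: Region) -> Optional[int]:
    """Number of bases covered; both end positions count."""
    if region.start is None or region.end is None:
        return None  # open-ended: depends on the reference length
    return region.end - region.start + 1


if __name__ == "__main__":
    for spec in ("chr1", "chr2:1000000", "chr3:1000-2000"):
        region = parse_region(spec)
        print(spec, region, region_length(region))
```

Because both end positions are included, `chr3:1000-2000` covers 2000 - 1000 + 1 = 1001 bases, which matches the "1001bp region" wording in the help text.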