comparison coverage_stats-1b5523d3d2c2/tools/coverage_stats/coverage_stats.xml @ 3:57b3ea22aff3 draft

Uploaded v0.1.0 which was already on the Test Tool Shed. Included Python 3 support.
author peterjc
date Tue, 11 Aug 2020 18:23:05 -0400
2:7254ece0c0ff 3:57b3ea22aff3
1 <tool id="coverage_stats" name="BAM coverage statistics" version="0.1.0">
2 <description>using samtools idxstats and depth</description>
3 <requirements>
4 <requirement type="package" version="1.4.1">samtools</requirement>
5 </requirements>
6 <version_command>
7 python $__tool_directory__/coverage_stats.py --version
8 </version_command>
9 <command detect_errors="aggressive">
10 python $__tool_directory__/coverage_stats.py
11 -b '$input_bam'
12 -i '${input_bam.metadata.bam_index}'
13 -o '$out_tabular'
14 -d '$max_depth'
15 </command>
16 <inputs>
17 <param name="input_bam" type="data" format="bam" label="Input BAM file" />
18 <param name="max_depth" type="integer" min="0" max="10000000" label="Max depth" value="8000" />
19 </inputs>
20 <outputs>
21 <data name="out_tabular" format="tabular" label="$input_bam.name (coverage stats)" />
22 </outputs>
23 <tests>
24 <test>
25 <param name="input_bam" value="ex1.bam" ftype="bam" />
26 <param name="max_depth" value="123" />
27 <output name="out_tabular" file="ex1.coverage_stats.tabular" ftype="tabular" />
28 </test>
29 <test>
30 <param name="input_bam" value="ex1.bam" ftype="bam" />
31 <param name="max_depth" value="50" />
32 <output name="out_tabular" file="ex1.coverage_stats.md50.tabular" ftype="tabular" />
33 </test>
34 <test>
35 <param name="input_bam" value="coverage_test.bam" ftype="bam" />
36 <param name="max_depth" value="123" />
37 <output name="out_tabular" file="coverage_test.coverage_stats.tabular" ftype="tabular" />
38 </test>
39 </tests>
40 <help>
41 **What it does**
42
43 This tool runs the commands ``samtools idxstats`` and ``samtools depth`` from the
44 SAMtools toolkit, and parses their output to produce a concise summary of the
45 coverage information for each reference sequence.
46
47 The input is a sorted and indexed BAM file; the output is tabular. The first four
48 columns match the output from ``samtools idxstats``; the additional columns are
49 calculated from the ``samtools depth`` output. The final row, with a star as the
50 reference identifier, represents unmapped reads and will have zeros in every
51 column except columns one and four.
52
53 ====== =================================================================================
54 Column Description
55 ------ ---------------------------------------------------------------------------------
56 1      Reference sequence identifier
57 2      Reference sequence length
58 3      Number of mapped reads
59 4      Number of placed but unmapped reads (typically unmapped partners of mapped reads)
60 5      Minimum coverage (per base of reference)
61 6      Maximum coverage (per base of reference)
62 7      Mean coverage (given to 2 dp)
63 ====== =================================================================================
64
65 Example output from a *de novo* assembly:
66
67 ========== ====== ====== ====== ======= ======= ========
68 identifier length mapped placed min_cov max_cov mean_cov
69 ---------- ------ ------ ------ ------- ------- --------
70 contig_1   833604 436112 0      1       157     71.95
71 contig_2   14820  9954   0      1       152     91.27
72 contig_3   272099 142958 0      1       150     72.31
73 contig_4   135519 73288  0      1       149     75.23
74 contig_5   91245  46759  0      1       157     70.92
75 contig_6   175604 95744  0      1       146     75.99
76 contig_7   90586  48158  0      1       151     72.93
77 contig_9   234347 126458 0      1       159     75.40
78 contig_10  121515 60211  0      1       152     68.12
79 ...        ...    ...    ...    ...     ...     ...
80 contig_604 712    85     0      1       49      21.97
81 \*         0      0      950320 0       0       0.00
82 ========== ====== ====== ====== ======= ======= ========
83
84 In this example there were 604 contigs, each with one line in the output table,
85 plus the final row (labelled with an asterisk) representing 950320 unmapped reads.
86 In this BAM file, the fourth column was otherwise zero.
87
88 .. class:: warningmark
89
90 **Note**. If using this on a mapping BAM file, beware that the coverage counting is
91 done per base of the reference. This means if your reference has any extra bases
92 compared to the reads being mapped, those bases will be skipped by CIGAR D operators
93 and these "extra" bases can have an extremely low coverage, giving potentially
94 misleading ``min_cov`` values. A sliding window coverage may be more appropriate.
95
96 **Note**. Up until samtools 1.2, there was an internal hard limit of 8000 for the
97 pileup routine, meaning the reported coverage from ``samtools depth`` would show
98 maximum coverage depths *around* 8000. This is now a run-time option.
99
100
101 **Citation**
102
103 If you use this Galaxy tool in work leading to a scientific publication please
104 cite:
105
106 Heng Li et al. (2009). The Sequence Alignment/Map format and SAMtools.
107 Bioinformatics 25(16), 2078-9.
108 https://doi.org/10.1093/bioinformatics/btp352
109
110 Peter J.A. Cock (2013), BAM coverage statistics using samtools idxstats and depth.
111 http://toolshed.g2.bx.psu.edu/view/peterjc/coverage_stats
112
113 This wrapper is available to install into other Galaxy Instances via the Galaxy
114 Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/coverage_stats
115 </help>
116 <citations>
117 <citation type="doi">10.1093/bioinformatics/btp352</citation>
118 </citations>
119 </tool>
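
The help text above says that columns 5 to 7 are calculated from the ``samtools depth`` output. The following is a minimal sketch of how such a summary could be derived; it is not the actual ``coverage_stats.py`` script, which is not shown here. The function name ``summarise_depth`` and its arguments are hypothetical. It assumes the default three-column, tab-separated ``samtools depth`` output (reference, position, depth), in which positions with zero coverage are omitted, and it takes the reference lengths (as reported by ``samtools idxstats``) as a plain dictionary.

```python
from collections import defaultdict


def summarise_depth(depth_lines, ref_lengths):
    """Return {reference: (min_cov, max_cov, mean_cov)}.

    depth_lines: iterable of "reference<TAB>position<TAB>depth" text lines
                 (assumed layout of default ``samtools depth`` output).
    ref_lengths: dict mapping reference name to its length in bases.
    """
    totals = defaultdict(int)  # sum of depths per reference
    counts = defaultdict(int)  # number of positions reported per reference
    minima = {}                # lowest depth seen per reference
    maxima = defaultdict(int)  # highest depth seen per reference

    for line in depth_lines:
        if not line.strip():
            continue
        ref, _pos, depth = line.rstrip("\n").split("\t")[:3]
        depth = int(depth)
        totals[ref] += depth
        counts[ref] += 1
        maxima[ref] = max(maxima[ref], depth)
        minima[ref] = min(minima.get(ref, depth), depth)

    summary = {}
    for ref, length in ref_lengths.items():
        if length == 0 or counts[ref] == 0:
            # Nothing reported for this reference (or the unmapped "*" row).
            summary[ref] = (0, 0, 0.0)
            continue
        # Positions missing from the depth output count as zero coverage.
        min_cov = minima[ref] if counts[ref] >= length else 0
        mean_cov = round(totals[ref] / length, 2)
        summary[ref] = (min_cov, maxima[ref], mean_cov)
    return summary


# Hypothetical example data: contig_2 has three positions with no coverage.
example = [
    "contig_1\t1\t2\n",
    "contig_1\t2\t4\n",
    "contig_1\t3\t3\n",
    "contig_2\t1\t5\n",
]
print(summarise_depth(example, {"contig_1": 3, "contig_2": 4}))
```

The mean is taken over the full reference length rather than over the reported positions, in line with the "per base of reference" wording in the column descriptions; whether the real script does exactly this is an assumption.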