view coverage_stats-1b5523d3d2c2/tools/coverage_stats/coverage_stats.xml @ 3:57b3ea22aff3 draft

Uploaded v0.1.0 which was already on the Test Tool Shed. Included Python 3 support.
author peterjc
date Tue, 11 Aug 2020 18:23:05 -0400

<tool id="coverage_stats" name="BAM coverage statistics" version="0.1.0">
    <description>using samtools idxstats and depth</description>
    <requirements>
        <requirement type="package" version="1.4.1">samtools</requirement>
    </requirements>
    <version_command>
python $__tool_directory__/coverage_stats.py --version
    </version_command>
    <command detect_errors="aggressive">
python $__tool_directory__/coverage_stats.py
-b '$input_bam'
-i '${input_bam.metadata.bam_index}'
-o '$out_tabular'
-d '$max_depth'
    </command>
    <inputs>
        <param name="input_bam" type="data" format="bam" label="Input BAM file" />
        <param name="max_depth" type="integer" min="0" max="10000000" label="Max depth" value="8000" />
    </inputs>
    <outputs>
        <data name="out_tabular" format="tabular" label="$input_bam.name (coverage stats)" />
    </outputs>
    <tests>
        <test>
            <param name="input_bam" value="ex1.bam" ftype="bam" />
            <param name="max_depth" value="123" />
            <output name="out_tabular" file="ex1.coverage_stats.tabular" ftype="tabular" />
        </test>
        <test>
            <param name="input_bam" value="ex1.bam" ftype="bam" />
            <param name="max_depth" value="50" />
            <output name="out_tabular" file="ex1.coverage_stats.md50.tabular" ftype="tabular" />
        </test>
        <test>
            <param name="input_bam" value="coverage_test.bam" ftype="bam" />
            <param name="max_depth" value="123" />
            <output name="out_tabular" file="coverage_test.coverage_stats.tabular" ftype="tabular" />
        </test>
    </tests>
    <help>
**What it does**

This tool runs the commands ``samtools idxstats`` and ``samtools depth`` from the
SAMtools toolkit, and parses their output to produce a concise summary of the
coverage information for each reference sequence.

The input is a sorted and indexed BAM file; the output is tabular. The first four
columns match the output from ``samtools idxstats``; the remaining columns are
calculated from the ``samtools depth`` output. The final row, with a star as the
reference identifier, represents unmapped reads and will have zeros in every
column except columns one and four.

====== =================================================================================
Column Description
------ ---------------------------------------------------------------------------------
     1 Reference sequence identifier
     2 Reference sequence length
     3 Number of mapped reads
     4 Number of placed but unmapped reads (typically unmapped partners of mapped reads)
     5 Minimum coverage (per base of reference)
     6 Maximum coverage (per base of reference)
     7 Mean coverage (given to 2 dp)
====== =================================================================================

Example output from a *de novo* assembly:

========== ====== ====== ====== ======= ======= ========
identifier length mapped placed min_cov max_cov mean_cov
---------- ------ ------ ------ ------- ------- --------
contig_1   833604 436112      0       1     157    71.95
contig_2    14820   9954      0       1     152    91.27
contig_3   272099 142958      0       1     150    72.31
contig_4   135519  73288      0       1     149    75.23
contig_5    91245  46759      0       1     157    70.92
contig_6   175604  95744      0       1     146    75.99
contig_7    90586  48158      0       1     151    72.93
contig_9   234347 126458      0       1     159    75.40
contig_10  121515  60211      0       1     152    68.12
...           ...    ...    ...     ...     ...      ...
contig_604    712     85      0       1      49    21.97
\*              0      0 950320       0       0     0.00
========== ====== ====== ====== ======= ======= ========

In this example there were 604 contigs, each with one line in the output table,
plus the final row (labelled with an asterisk) representing 950320 unmapped reads.
In this BAM file, the fourth column was otherwise zero.

.. class:: warningmark

**Note**. If using this on a mapping BAM file, beware that the coverage counting is
done per base of the reference. This means that if your reference has any extra bases
compared to the reads being mapped, those bases will be skipped by CIGAR D operators,
and these "extra" bases can have extremely low coverage, giving potentially
misleading ``min_cov`` values. A sliding window coverage may be more appropriate.

**Note**. Up until samtools 1.2, there was an internal hard limit of 8000 for the
pileup routine, meaning the reported coverage from ``samtools depth`` would show
maximum coverage depths *around* 8000. This is now a run-time option.


**Citation**

If you use this Galaxy tool in work leading to a scientific publication please
cite:

Heng Li et al. (2009). The Sequence Alignment/Map format and SAMtools.
Bioinformatics 25(16), 2078-9.
https://doi.org/10.1093/bioinformatics/btp352

Peter J.A. Cock (2013), BAM coverage statistics using samtools idxstats and depth.
http://toolshed.g2.bx.psu.edu/view/peterjc/coverage_stats

This wrapper is available to install into other Galaxy Instances via the Galaxy
Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/coverage_stats
    </help>
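    <!--
    Maintainer note: a minimal sketch of how columns 5 to 7 described in the
    help above could be derived from "samtools depth" rows (reference, position,
    depth, tab separated). It is not taken from coverage_stats.py. The function
    name summarise_depth is hypothetical, and two choices are assumptions that
    may differ from the real script: positions absent from the depth output are
    treated as zero coverage, and the mean is divided by the reference length
    reported by "samtools idxstats".

```python
from collections import defaultdict


def summarise_depth(depth_lines, ref_lengths):
    """Return {reference: (min_cov, max_cov, mean_cov)}.

    depth_lines: iterable of "reference TAB position TAB depth" strings.
    ref_lengths: dict of reference name to length (idxstats column 2).
    """
    totals = defaultdict(int)  # summed depth per reference
    counts = defaultdict(int)  # number of positions reported per reference
    lows = {}
    highs = {}
    for line in depth_lines:
        ref, _pos, depth = line.rstrip("\n").split("\t")
        depth = int(depth)
        totals[ref] += depth
        counts[ref] += 1
        lows[ref] = min(lows.get(ref, depth), depth)
        highs[ref] = max(highs.get(ref, depth), depth)

    summary = {}
    for ref, length in ref_lengths.items():
        if length == 0 or counts[ref] == 0:
            # Covers the final "*" row and references with no depth rows.
            summary[ref] = (0, 0, 0.0)
            continue
        # Assumption: a position missing from the depth output has zero coverage.
        low = lows[ref] if counts[ref] == length else 0
        mean = round(totals[ref] / length, 2)  # given to 2 dp, as in column 7
        summary[ref] = (low, highs[ref], mean)
    return summary
```

    For example, three rows with depths 2, 4 and 3 on a reference of length 3
    would give a minimum of 2, a maximum of 4 and a mean of 3.0.
    -->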
    <citations>
        <citation type="doi">10.1093/bioinformatics/btp352</citation>
    </citations>
</tool>