view hicMergeMatrixBins.xml @ 12:faab8edcaac9 draft

"planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/hicexplorer commit 5744259254d4254a29cb7a6687fbbfd103301064"
author bgruening
date Wed, 05 Feb 2020 20:00:33 -0500
parents 49c8055f6830
children 03dd3c16ae4e
line wrap: on
line source

<tool id="hicexplorer_hicmergematrixbins" name="@BINARY@" version="@WRAPPER_VERSION@.0">
    <description>merge adjacent bins from a Hi-C contact matrix to reduce its resolution</description>
    <macros>
        <token name="@BINARY@">hicMergeMatrixBins</token>
        <import>macros.xml</import>
    </macros>
    <expand macro="requirements" />
    <command detect_errors="exit_code"><![CDATA[
        ln -s '$matrix_h5_cooler' 'matrix.$matrix_h5_cooler.ext' &&
        @BINARY@
            --matrix 'matrix.$matrix_h5_cooler.ext'
            --numBins $numBins
            $runningWindow
            --outFileName 'matrix.$matrix_h5_cooler.ext'
        && mv 'matrix.$matrix_h5_cooler.ext' matrix

]]>
    </command>
    <inputs>
        <expand macro='matrix_h5_cooler_macro' />

        <param argument="--numBins" type="integer" min="1" value="3" label="Number of bins to merge" />
        <param argument="--runningWindow" type="boolean" falsevalue="" truevalue="--runningWindow" label="Set to merge for using a running window of length --numBins. Usually not set." />
    </inputs>
    <outputs>
        <data name="outFileName" from_work_dir="matrix" format="cool">
              <change_format>
                <when input_dataset="matrix_h5_cooler" attribute="ext" value="h5" format="h5"/>
            </change_format>
        </data>
    </outputs>
    <tests>
        <test>
            <param name="matrix_h5_cooler" value="small_test_matrix.h5"/>
            <param name="numBins" value="5" />
            <output name="outFileName" ftype="h5">
                <assert_contents>
                    <has_h5_keys keys='intervals,matrix,nan_bins'/>
                </assert_contents>
            </output>
        </test>
    </tests>
    <help><![CDATA[

Change matrix resolution
========================

**hicMergeMatrixBins** is used to decrease the resolution of a matrix. With this tool, you can for example create out of a 100 kb
contact matrix a 1000 kb one:

Number of bins to merge = 10

100 kb * 10 = 1000 kb = 1 Mb

Depending on the downstream analyses to perform on a Hi-C matrix generated with HiCExplorer, one might need different bin resolutions. For example using ``hicPlotMatrix`` to display chromatin interactions of a whole chromosome will not produce any meaningful vizualisation if it is performed on a matrix at restriction sites resolution (unmerged). Furthermore, the higher the resolution of a matrix, the more detailed it is, which can make it
difficult to interpret, especially if the read depth of the Hi-C data is not high enough. **hicMergeMatrixBins** address these issues by merging a given number of adjacent bins to reduce Hi-C matrices resolution.

_________________

Usage
-----

To limit the loss of information, it is mandatory to perform **hicMergeMatrixBins** on matrices prior to any correction and any other bin merging (direct output from ``hicBuildMatrix``). After bin merging, ``hicCorrectMatrix`` must be used for downstream analyses requiring corrected matrices.

_________________

Output
------

**hicMergeMatrixBins** outputs a Hi-C matrix with reduced resolution.

Below, we will develop the example of a Hi-C matrix in *Drosophila melanogaster* that we want to display at the whole X-chromosome scale and at the scale of a 1Mb region of the X chromosome. To do this, we performed two different bin merging using **hicMergeMatrixBins** on an uncorrected matrix built at the restiction sites resolution using ``hicBuildMatrix``.

Starting from a matrix with bins of a median length of 529bp (restriction enzyme resolution, here DpnII), running **hicMergeMatrixBins** with a number of bins to merge of 3 produced a matrix with bins of a median length of 1661bp, while **hicMergeMatrixBins** with a number of bins to merge of 50 produced a matrix with bins of a median length of 29798bp.

After the correction of these three matrices using ``hicCorrectMatrix``, we plotted them using ``hicPlotMatrix`` at the scale of the whole X-chromosome and at the scale of the X:2000000-3000000 region to see the effect of bin merging on the interactions visualization.

- **Effect of bins merging at the scale of a chromosome:**

.. image:: $PATH_TO_IMAGES/hicMergeMatrixBins_Xchr.png
   :width: 60 %

When observed altogether, the plots above show that the merging of bins by 50 is the most adequate way to plot interactions for a whole chromosome in *Drosophila melanogaster* when starting from a matrix with bins of a median length of 529bp.

- **Effect of bins merging at the scale of a specific region:**

.. image:: $PATH_TO_IMAGES/hicMergeMatrixBins_Xregion.png
   :width: 60 %

When observed altogether, the plots above show that the merging of bins by 3 is the most adequate way to plot interactions for a region of 1Mb in Drosophila melanogaster when starting from a matrix with bins of a median length of 529bp.

_________________

| For more information about HiCExplorer please consider our documentation on readthedocs.io_

.. _readthedocs.io: http://hicexplorer.readthedocs.io/en/latest/index.html
]]></help>
    <expand macro="citations" />
</tool>