view hicCorrectMatrix.xml @ 5:44919af9194b draft

planemo upload for repository https://github.com/maxplanck-ie/HiCExplorer/tree/master/galaxy/wrapper/ commit 80462804e4fd7deafbcf8e8c5283cc7a98fa7dd5
author bgruening
date Sat, 30 Dec 2017 09:24:06 -0500
parents b55d7936cbe0
children 1612e9458940
line wrap: on
line source

<tool id="hicexplorer_hiccorrectmatrix" name="@BINARY@" version="@WRAPPER_VERSION@.0">
    <description>Runs Dekker's iterative correction over a hic matrix.</description>
    <macros>
        <token name="@BINARY@">hicCorrectMatrix</token>
        <import>macros.xml</import>
    </macros>
    <expand macro="requirements" />
    <command detect_errors="exit_code"><![CDATA[

        hicCorrectMatrix
            $mode.mode_selector
            --matrix '$matrix_h5_cooler'

            ## special: --chromosomes is optional, but if given needs at least one argument
            #set chroms = '" "'.join([ str($var.chromosome) for $var in $chromosomes ])
            #if chroms
                --chromosomes "$chroms"
            #end if

            #if $mode.mode_selector == 'correct':

                --iterNum $mode.iterNum
                #if $mode.outputFormat == 'h5'
                    --outFileName matrix.h5
                #elif $mode.outputFormat == 'cool'
                    --outFileName matrix.cool
                #end if

                #if $mode.filterThreshold_low and $mode.filterThreshold_large:
                    --filterThreshold $mode.filterThreshold_low $mode.filterThreshold_large
                #end if

                #if $mode.inflationCutoff:
                    --inflationCutoff $mode.inflationCutoff
                #end if

                #if $mode.transCutoff:
                    --transCutoff $mode.transCutoff
                #end if

                #if $mode.sequencedCountCutoff:
                    --sequencedCountCutoff $mode.sequencedCountCutoff
                #end if

                $mode.skipDiagonal
                $mode.perchr

            #elif $mode.mode_selector == 'merge_failed':
                --plotName diagnostic_plot.png
                --outMatrixFile corrected_matrix.npz.h5
                #if $mode.xMax:
                    --xMax $mode.xMax
                #end if
                #if $mode.filterThreshold_low and $mode.filterThreshold_large:
                    --filterThreshold '$mode.filterThreshold_low' '$mode.filterThreshold_large'
                #end if
            #else:
                --plotName diagnostic_plot.png
                #if $mode.xMax:
                    --xMax $mode.xMax
                #end if
            #end if

]]>
    </command>
    <inputs>
        <expand macro='matrix_h5_cooler_macro' />
        <conditional name="mode">
            <param name="mode_selector" type="select" label="Range restriction (in bp)" argument="--range">
                <option value="diagnostic_plot">Diagnostic plot</option>
                <option value="correct">Correct matrix</option>
            </param>
            <when value="diagnostic_plot">
                <expand macro="xMax" />
            </when>
            <when value="correct">
                <param argument="--iterNum" name="iterNum" type="integer" optional="true" value="500"
                    label="Number of iterations" />

                <param argument="--inflationCutoff" name="inflationCutoff" type="float" optional="true"
                    label="Inflation cutoff" value=""
                    help="Value corresponding to the maximum number of times a bin can be scaled up during the iterative correction.
                    For example, a inflationCutoff of 3 will filter out all bins that were expanded 3 times or more during the iterative correction."/>

                <param argument="--transCutoff" name="transCutoff" type="float" optional="true"
                    label="Trans region cutoff" value=""
                    help="Clip high counts in the top -transcut trans regions (i.e. between chromosomes). A usual value is 0.05."/>

                <param argument="--sequencedCountCutoff" name="sequencedCountCutoff" optional="true" type="float"
                    label="Sequenced count cutoff"
                    help="Each bin receives a value indicating the fraction that is covered by reads.
                        A cutoff of 0.5 will discard all those bins that have less than half of the bin covered."/>

                <param argument="--skipDiagonal" name="skipDiagonal" type="boolean" truevalue="--skipDiagonal" falsevalue="" checked="false"
                    label="Skip diagonal counts"/>

                <param argument="--perchr" name="perchr" type="boolean" truevalue="--perchr" falsevalue="" checked="false"
                    label="Normalize each chromosome separately" />
                <expand macro="filterThreshold" />
                <param name='outputFormat' type='select' label="Output file format">
                    <option value='h5'>HiCExplorer format</option>
                    <option value="cool">cool</option>
                </param>
            </when>
        </conditional>

        <repeat name="chromosomes" min="0"
            title="Include chromosomes" help="List of chromosomes to be included in the iterative correction.
                    The order of the given chromosomes will be kept for the resulting corrected matrix">
            <param name="chromosome" type="text" value="" >
                <validator type="empty_field" />
            </param>
        </repeat>

    </inputs>
    <outputs>
        <data name="outFileName_h5" from_work_dir="matrix.h5" format="h5">
            <filter>outputFormat == 'h5'</filter>
            <filter>mode['mode_selector'] == "correct"</filter>
            
        </data>
        <data name="outFileName_cool" from_work_dir="matrix.cool" format="cool">
            <filter>outputFormat == 'cool'</filter>
            <filter>mode['mode_selector'] == "correct"</filter>
            
        </data>
       
        <data name="diagnostic_plot" from_work_dir="diagnostic_plot.png" format="png">
            <filter>mode['mode_selector'] == "diagnostic_plot"</filter>
        </data>
    </outputs>
    <tests>
        <test>
            <param name="matrix_h5_cooler" value="small_test_matrix.h5"/>
            
            <param name="mode_selector" value="correct"/>
            <repeat name="chromosomes">
                <param name="chromosome" value="chrUextra"/>
            </repeat>
            <repeat name="chromosomes">
                <param name="chromosome" value="chr3LHet"/>
            </repeat> 
            <param name='outputFormat' value='h5'/>

            <output name="outFileName_h5" file="hicCorrectMatrix_result1.npz.h5" ftype="h5" compare="sim_size"/>
        </test>
        <test>
            <param name="matrix_h5_cooler" value="small_test_matrix.h5"/>
            <param name="mode_selector" value="diagnostic_plot"/>
            <repeat name="chromosomes">
                <param name="chromosome" value="chrUextra"/>
            </repeat>
            <repeat name="chromosomes">
                <param name="chromosome" value="chr3LHet"/>
            </repeat> 
            <output name="diagnostic_plot" file="diagnostic_plot.png" ftype="png" compare="sim_size"/>
        </test>
    </tests>
    <help><![CDATA[

Matrix correction
==================

``hicCorrectMatrix`` runs Dekker's iterative correction over a Hi-C matrix (`Imakaev 2012`_.). For correcting the matrix, 
it is important to remove the unassembled scaffolds (e.g. `NT_`), mitochondrial DNA and Y chromosome and keep only 
chromosomes, as scaffolds create problems with matrix correction. Therefore 
we use the chromosome names (1-19, X, Y) here. 
 
**Important**: Use ‘chr1 chr2 chr3 etc.’ if your genome index uses chromosome names with the ‘chr’ prefix.

Matrix correction works in two steps: first a histogram containing the sum of contact  per bin (row sum) is produced. This plot needs to be inspected to decide the best threshold for removing bins with lower number of reads. The second steps removes the low scoring bins and does the correction.

Input
-----


Diagnostic plot
~~~~~~~~~~~~~~~~
Plots a histogram of the coverage per bin together with the
modified z-score based on the median absolute deviation
method.

See Boris Iglewicz and David Hoaglin 1993, Volume 16: 
How to Detect and Handle Outliers The ASQC Basic References in Quality Control: Statistical Techniques,
Edward F. Mykytka, Ph.D., Editor.

Parameters
__________
- the contact matrix
- Max value for the x-axis in counts per bin
- include chromosomes


Correct
~~~~~~~

Run the iterative correction. 

Parameters
__________
- number of iterations
- inflation cutoff
- trans region cutoff
- sequenced count cutoff
- skip diagonal counts
- normalize each chromosome separately
- remove bins of low coverage
- remove bins of large coverage
- include chromosomes

Output
------

    Diagnostic plot:

    .. image:: $PATH_TO_IMAGES/diagnostic_plot.png
        :width: 70%
    
    Correct:
    - the corrected contact matrix

| For more information about HiCExplorer please consider our documentation on readthedocs.io_

.. _readthedocs.io: http://hicexplorer.readthedocs.io/en/latest/index.html

.. _`Imakaev 2012`: http://doi.org/doi:10.1038/nmeth.2148
]]></help>
    <expand macro="citations" />
</tool>