view gemini_roh.xml @ 7:345d41f3f574 draft

"planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/gemini commit 5ea789e5342c3ad1afd2e0068c88f2b6dc4f7246"
author iuc
date Tue, 10 Mar 2020 06:12:43 -0400
parents d65ca2fa673b
children e8349da26ed5
line wrap: on
line source

<tool id="gemini_@BINARY@" name="GEMINI @BINARY@" version="@VERSION@">
    <description>Identifying runs of homozygosity</description>
    <macros>
        <import>gemini_macros.xml</import>
        <token name="@BINARY@">roh</token>
    </macros>
    <expand macro="requirements" />
    <expand macro="stdio" />
    <expand macro="version_command" />
    <command>
<![CDATA[
        gemini @BINARY@
            --min-snps $min_snps
            --min-total-depth $min_total_depth
            --min-gt-depth $min_gt_depth
            --min-size $min_size
            --max-hets $max_hets
            --max-unknowns $max_unknowns
            #if $samples.strip():
                #set $samples = ','.join([f.strip() for f in $samples.split(',')])
                -s '$samples'
            #end if
            '$infile'
            > '$outfile'
]]>
    </command>
    <inputs>
        <expand macro="infile" />
        <param argument="--min-snps" name="min_snps" type="integer" value="25" min="0"
        label="Minimum number of expected homozygous SNPs"
        help="default: 25" />
        <param argument="--min-total-depth" name="min_total_depth" type="integer" value="20" min="0"
        label="The minimum overall sequencing depth required for a SNP to be considered"
        help="default: 20" />
        <param argument="--min-gt-depth" name="min_gt_depth" type="integer" value="0" min="0"
        label="The minimum required sequencing depth underlying a given sample's genotype for a SNP to be considered"
        help="default: 0" />
        <param argument="--min-size" name="min_size" type="integer" value="100000" min="1"
        label="Minimum run size in base pairs" help="default: 100000" />
        <param argument="--max-hets" name="max_hets" type="integer" value="1" min="0"
        label="Maximum number of allowed hets in the run" help="default: 1" />
        <param argument="--max-unknowns" name="max_unknowns" type="integer" value="3" min="0"
        label="Maximum number of allowed unknowns in the run"
        help="default: 3" />
        <param argument="-s" name="samples" type="text" value=""
        label="Comma separated list of samples to screen for ROHs" help="e.g. S120,S450"/>
    </inputs>
    <outputs>
        <data name="outfile" format="tabular" />
    </outputs>
    <tests>
        <test>
            <param name="infile" value="gemini_load_result1.db" ftype="gemini.sqlite" />
            <param name="min_snps" value="3" />
            <param name="min_size" value="10" />
            <param name="min_total_depth" value="0" />
            <output name="outfile">
                <assert_contents>
                    <has_line_matching expression="chrom&#009;start&#009;end&#009;.*run_length.*" />
                </assert_contents>
            </output>
        </test>
    </tests>
    <help><![CDATA[

**What it does**

Runs of homozygosity are long stretches of homozygous genotypes that reflect
segments shared identically by descent and are a result of consanguinity or
natural selection. Consanguinity elevates the occurrence of rare recessive
diseases (e.g. cystic fibrosis) that represent homozygotes for strongly deleterious
mutations. Hence, the identification of these runs holds medical value.

The 'roh' tool in GEMINI returns runs of homozygosity identified in whole genome data.
The tool basically looks at every homozygous position on the chromosome as a possible
start site for the run and looks for those that could give rise to a potentially long
stretch of homozygous genotypes.

For e.g. for the given example allowing ``1 HET`` genotype (h) and ``2 UKW`` genotypes (u)
the possible roh runs (H) would be:


::

	genotype_run = H H H H h H H H H u H H H H H u H H H H H H H h H H H H H h H H H H H
	roh_run1     = H H H H h H H H H u H H H H H u H H H H H H H
	roh_run2     =           H H H H u H H H H H u H H H H H H H h H H H H H
	roh_run3     =                     H H H H H u H H H H H H H h H H H H H
	roh_run4     =                                 H H H H H H H h H H H H H

roh returned for --min-snps = 20 would be:

::

	roh_run1     = H H H H h H H H H u H H H H H u H H H H H H H
	roh_run2     =           H H H H u H H H H H u H H H H H H H h H H H H H


As you can see, the immediate homozygous position right of a break (h or u) would be the possible
start of a new roh run and genotypes to the left of a break are pruned since they cannot
be part of a longer run than we have seen before.


    ]]></help>
    <expand macro="citations"/>
</tool>