# HG changeset patch
# User wolma
# Date 1418509221 18000
# Node ID 3771f6c914bca263b634ab2c97df19a9a196ac69
Imported from capsule None

diff -r 000000000000 -r 3771f6c914bc deletion_predictor.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/deletion_predictor.xml Sat Dec 13 17:20:21 2014 -0500
@@ -0,0 +1,64 @@
+
+  Predicts deletions in one or more aligned read samples based on coverage of the reference genome and on insert sizes
+
+  mimodd
+
+  mimodd version -q
+
+  mimodd delcall
+  #for $l in $list_input
+  ${l.bamfile}
+  #end for
+  $covfile -o $outputfile
+  --max-cov $max_cov --min-size $min_size $include_uncovered $group_by_id --verbose
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+.. class:: infomark
+
+   **What it does**
+
+The tool predicts deletions from paired-end data in a two-step process:
+
+1) It finds regions of low coverage, i.e., candidate regions for deletions, by scanning a BCF file produced by the *Variant Calling* tool.
+
+   The *maximal coverage allowed inside a low-coverage region* and the *minimal deletion size* parameters are used at this step to define what is considered a low-coverage region.
+
+   .. class:: warningmark
+
+   The tool treats genome positions missing from the BCF input as zero coverage, so it is safe to use ONLY with BCF files produced by the *Variant Calling* tool or through other commands that keep the information for all sites.
+
+2) It assesses every low-coverage region statistically for evidence that it is a real deletion. **This step requires paired-end data** since it relies on shifts in the distribution of read pair insert sizes around real deletions.
+
+By default, the tool only reports deletions, i.e., the subset of low-coverage regions that pass the statistical test.
+If *include low-coverage regions* is selected, regions that failed the test will also be reported.
+
+With *group reads based on read group id only* selected, as it is by default, grouping of reads into samples is done strictly based on their read group IDs.
+With the option deselected, grouping is done based on sample names in the first step of the analysis, i.e., the reads of all samples with a shared sample name are used to identify low-coverage regions.
+In the second step, however, reads will be regrouped by their read group IDs again, i.e., the statistical assessment for real deletions is always done on a per-read-group basis.
+
+**TIP:**
+Deselecting *group reads based on read group id only* can be useful, for example, if you have both paired-end and single-end sequencing data for the same sample.
+
+In this case, the two sets of reads will usually share a common sample name, but differ in their read groups.
+With grouping based on sample names, the single-end data can be used together with the paired-end data to identify low-coverage regions, thus increasing overall coverage and reliability of this step.
+Still, the assessment of deletions will use only the paired-end data (auto-detecting that the single-end reads do not provide insert size information).
+
+
+
+
diff -r 000000000000 -r 3771f6c914bc tool_dependencies.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tool_dependencies.xml Sat Dec 13 17:20:21 2014 -0500
@@ -0,0 +1,6 @@
+
+
+
+
+
+