view tools/mytools/random_interval.xml @ 1:cdcb0ce84a1b

author xuebing
date Fri, 09 Mar 2012 19:45:15 -0500
parents 9071e359b9a3
line wrap: on
line source

<tool id="randominterval" name="shuffle intervals">
  <description>weight chromosome by length</description>
  <command interpreter="python"> $input $output $within $genome </command>
    <param name="input" format="interval" type="data" label="reference interval file to mimic"/>
    <param name="within" label="randomize within chromosome" help="If checked, for each original interval will move it to a random position in the SAME chromosome. The default is to move it to any chromosome (chance proportional to chromosome size)" type="boolean" truevalue="within" falsevalue="across" checked="False"/>
    <param name="genome" type="select" label="Select genome">
     <option value="/Users/xuebing/galaxy-dist/tool-data/genome/chrsize/mouse.mm9.genome" selected="true">mm9</option>
     <option value="/Users/xuebing/galaxy-dist/tool-data/genome/chrsize/mouse.mm8.genome">mm8</option>
     <option value="/Users/xuebing/galaxy-dist/tool-data/genome/chrsize/human.hg18.genome">hg18</option>
     <option value="/Users/xuebing/galaxy-dist/tool-data/genome/chrsize/human.hg19.genome">hg19</option>
    <data format="interval" name="output" />

**What it does**

This tool will generate a set of intervals randomly distributed in the genome, mimicking the size distribution of the reference set. The same number of intervals are generated.

**How it works**

For each interval in the reference set, the script picks a random position as the new start in the genome, and then pick the end such that the size of the random interval is the same as the original one. The default setting is to move the interval to any chromosome, with the probability proportional to the size/length of the chromosome. You can have it pick a random position in the same chromosome, such that in the randomized set each chromosome has the same number of intervals as the reference set. The size of the chromosome can be either learned from the reference set (chromosome size = max(interval end)) or read from a chromosome size file. When learning from the reference set, only regions spanned by reference intervals are used to generate random intervals. Regions (may be an entire chromosome) not covered by the reference set will not appear in the output.

**Chromosome size file**

Chromosome size files for hg18,hg19,mm8,and mm9 can be found in 'Shared Data'. To use those files, select the correct one and import into to the history, then the file will be listed in the drop-down menu of this tool. You can also make your own chromosme size file: each line specifies the size of a chromosome (tab-delimited):

chr1 92394392

chr2 232342342    

You can use the following script from UCSC genome browser to download chromosome size files for other genomes: