Dataset formats

The input dataset is in tabular format (which includes gd_snp and gd_genotype) and must contain chromosome, position, and score columns; these may be located in any column. The output dataset is in interval format.

What it does

The user selects a tabular dataset (such as the SNV formats gd_snp and gd_genotype). If the dataset is not in an SNV format, the user also specifies the columns containing the chromosome, position, and scores (such as an FST value for the SNP); with SNV formats, the metadata identifies the columns that hold the chromosome and position. The user also supplies a "score-shift", given either as a raw score or as a percentage, which should be greater than the average value of the scores column. A higher shift gives smaller intervals in the output. If a percentage (e.g. 95%) is specified, then that percentile of the scores is used as the shift. A percentile may not work well if many rows or SNPs have the same score; in that case, use a raw score.
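As an illustration of the two ways to give the score-shift, here is a minimal Python sketch. The function name `score_shift` and the nearest-rank percentile rule are assumptions made for this example; the tool itself may compute percentiles differently.

```python
import math

def score_shift(scores, setting):
    """Turn a setting such as '95%' or '0.75' into a raw shift value.

    Hypothetical helper: the nearest-rank percentile used here is an
    assumption, not necessarily the rule the tool applies.
    """
    if setting.endswith("%"):
        pct = float(setting[:-1])
        ordered = sorted(scores)
        # Nearest rank: smallest score with at least pct% of scores at or below it.
        rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
        return ordered[rank - 1]
    return float(setting)

scores = [0.40, 0.97, 0.72, 0.68, 0.92, 0.85, 0.34]
print(score_shift(scores, "0.75"))  # raw score, used as given: 0.75
print(score_shift(scores, "50%"))   # 50th percentile of the scores: 0.72

# Many tied scores: the percentile lands on the tied value itself.
tied = [1.0] * 9 + [0.2]
print(score_shift(tied, "95%"))     # 1.0
```

In the tied case the shift equals the largest score, so no adjusted score is positive and no interval can have a positive total. That is one way a percentile can fail when many rows share a score.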

The program subtracts the shift from every score, then finds genomic intervals (i.e., consecutive runs of SNPs) whose total adjusted score cannot be increased by adding or removing one or more SNPs at either end of the interval. Another input is the number of times the data should be randomized; only intervals whose score exceeds the maximum obtained from the randomized data are reported. If 100 shuffles are requested, then the score of any interval reported by the tool has a probability of less than 0.01 of being equaled or exceeded by chance, assuming that the scores vary independently by position.
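The idea can be sketched in Python under simplifying assumptions. This is not the tool's own code: it uses Kadane's maximum-sum algorithm to find only the single highest-scoring run, whereas the tool reports every qualifying interval. It also shuffles all adjusted scores together, which is an assumption about how the randomization is done. The names `best_run` and `shuffle_threshold` are hypothetical.

```python
import random

def best_run(adjusted):
    """Return (score, start, end) for the highest-scoring consecutive run.

    Kadane's algorithm; start and end are inclusive indices. Returns
    (0.0, None, None) when no adjusted score is positive.
    """
    best = (0.0, None, None)
    total, start = 0.0, 0
    for i, value in enumerate(adjusted):
        if total <= 0.0:
            # A non-positive running total never helps; restart the run here.
            total, start = 0.0, i
        total += value
        if total > best[0]:
            best = (total, start, i)
    return best

def shuffle_threshold(adjusted, n_shuffles, seed=0):
    """Highest best-run score seen over n_shuffles random permutations."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    work = list(adjusted)
    top = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(work)
        top = max(top, best_run(work)[0])
    return top

# Only the rows shown in the Example section, with a shift of 0.75 subtracted.
positions = [39, 103, 188, 203, 321, 1132, 1321]
adjusted = [-0.35, 0.22, -0.03, -0.07, 0.17, 0.10, -0.41]

score, start, end = best_run(adjusted)
print("chr2", positions[start], positions[end], f"{score:.2f}")  # chr2 103 1132 0.39

# The run would be reported only if its score exceeds the shuffled maximum.
threshold = shuffle_threshold(adjusted, 100)
print("reported" if score > threshold else "not reported")
```

The score of 0.39 covers only the seven rows listed in the Example section, so it differs from the score in the example output, which also depends on rows that are not shown.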

Example

Input (showing only the chromosome, position, and score columns):

```
chr2      39      0.40
chr2     103      0.97
chr2     188      0.72
chr2     203      0.68
chr2     321      0.92
...
chr2    1132      0.85
chr2    1321      0.34
...
```

Suppose the user-specified score-shift is 0.75. This value is subtracted from each score, giving:

```
chr2      39     -0.35
chr2     103      0.22
chr2     188     -0.03
chr2     203     -0.07
chr2     321      0.17
...
chr2    1132      0.10
chr2    1321     -0.41
...
```

The output reports intervals rather than individual positions. Depending on the values not shown above, it might be:
```
chr2    103    1132    1.42
```