Mercurial > repos > miller-lab > genome_diversity
diff find_intervals.xml @ 28:184d14e4270d
Update to Miller Lab devshed revision 4ede22dd5500
author | Richard Burhans <burhans@bx.psu.edu> |
---|---|
date | Wed, 17 Jul 2013 12:46:46 -0400 |
parents | 8997f2ca8c7a |
children | a631c2f6d913 |
--- a/find_intervals.xml Mon Jul 15 10:47:35 2013 -0400
+++ b/find_intervals.xml Wed Jul 17 12:46:46 2013 -0400
@@ -84,59 +84,75 @@
   </tests>
 
   <help>
-
 **Dataset formats**
 
-The input dataset is tabular_, with required columns of chromosome, position,
-and score (in any column).
-The output dataset is interval_. (`Dataset missing?`_)
+The input dataset is tabular_ (which includes gd_snp_ and gd_genotype_),
+with required columns of chromosome, position, and score (in any column).
+The output dataset is interval_. (`Dataset missing?`_)
 
+.. _tabular: ./static/formatHelp.html#tab
+.. _gd_snp: ./static/formatHelp.html#gd_snp
+.. _gd_genotype: ./static/formatHelp.html#gd_genotype
 .. _interval: ./static/formatHelp.html#interval
-.. _tabular: ./static/formatHelp.html#tab
 .. _Dataset missing?: ./static/formatHelp.html
 
 -----
 
 **What it does**
 
-The user selects a tabular dataset (such as a gd_snp dataset) and
-if the dataset is not also gd_snp format, specifies
-the columns containing chromosome, position, and scores (such as an Fst-value for the SNP).
-For gd_snp format the metadata can be used to specify the chromosome and
-position.
-Other inputs include
-a percentage or raw score for the "score-shift" which should be greater than the
-average value for the scores column. A higher value will give smaller intervals
-in the output.
-If a percentage (e.g. 95%) is specified
-then that percentile of the scores is used as the shift;
-percentile may not work well if many rows or SNPs have the same score
-(in that case use a raw score). The program subtracts the
-shift from every score, then finds genomic intervals (i.e., consecutive runs
-of SNPs) whose total score cannot be increased by adding or subtracting one
-or more adjusted scores at the ends of the interval.
-Another input is the number of times the
-data should be randomized (only intervals with score exceeding the maximum for
-the randomized data are reported).
-If 100 shuffles are requested, then any interval reported by the tool has a
-score with probability less than 0.01 of being equaled or exceeded by chance.
+The user selects a tabular dataset (such as the SNV formats gd_snp and
+gd_genotype) and if the dataset is not in an SNV format, specifies the
+columns containing chromosome, position, and scores (such as an FST-value
+for the SNP). With SNV formats, the metadata tells which columns hold the
+chromosome and position. Other inputs include a percentage or raw score
+for the "score-shift" which should be greater than the average value
+for the scores column. A higher value will give smaller intervals in
+the output. If a percentage (e.g. 95%) is specified then that percentile
+of the scores is used as the shift; percentile may not work well if many
+rows or SNPs have the same score (in that case use a raw score).
+
+The program subtracts the shift from every score, then finds genomic
+intervals (i.e., consecutive runs of SNPs) whose total score cannot be
+increased by adding or subtracting one or more adjusted scores at the
+ends of the interval. Another input is the number of times the data
+should be randomized (only intervals with score exceeding the maximum
+for the randomized data are reported). If 100 shuffles are requested,
+then any interval reported by the tool has a score with probability
+less than 0.01 of being equaled or exceeded by chance, assuming that
+the scores vary independently by position.
 
 -----
 
 **Example**
 
-- input (gd_snp)::
+- Input (showing only the chromosome, position, and score columns)::
 
-    Contig222_chr2_9817738_9818143 220 C T 888.0 chr2 9817960 C 17 0 2 78 12 0 2 63 20 0 2 87 8 0 2 51 11 0 2 60 12 0 2 63 Y 76 0.093 1
-    Contig47_chr2_25470778_25471576 126 G A 888.0 chr2 25470896 G 12 0 2 63 14 0 2 69 14 0 2 69 10 0 2 57 18 0 2 81 13 0 2 66 N 11 0.289 1
+    chr2 39 0.40
+    chr2 103 0.97
+    chr2 188 0.72
+    chr2 203 0.68
+    chr2 321 0.92
+    ...
+    chr2 1132 0.85
+    chr2 1321 0.34
     ...
-    Contig115_chr2_61631913_61632510 310 G T 999.3 chr2 61632216 G 7 0 2 48 9 0 2 54 7 0 2 48 11 0 2 60 10 0 2 57 10 0 2 57 N 13 0.184 0
-    Contig31_chr2_67331584_67331785 39 C T 999.0 chr2 67331623 C 11 0 2 60 10 0 2 57 7 0 2 48 9 0 2 54 2 0 2 33 4 0 2 39 N 110 0.647 1
-    etc.
+
+- Suppose the user-specified score-shift is 0.75. This value is subtracted from each score, giving::
 
-- output not reporting individual positions::
+    chr2 39 -0.35
+    chr2 103 0.22
+    chr2 188 -0.03
+    chr2 203 -0.07
+    chr2 321 0.17
+    ...
+    chr2 1132 0.10
+    chr2 1321 -0.41
+    ...
 
-    chr2 9817960 67331624 1272.2000
+- The output, not reporting individual positions, might be (depending on the values not shown above)::
 
+    chr2 103 1132 1.42
   </help>
 </tool>
+
+
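
The procedure described in the revised help text (subtract the score-shift, find a run of consecutive positions with the highest total, compare against shuffled data) can be sketched in Python. This is a minimal illustration and is not taken from the tool's source: the function names are hypothetical, and it uses Kadane's algorithm to find only the single highest-scoring run, whereas the tool reports every interval that passes the shuffle threshold. The input is limited to the five rows shown before the first ellipsis in the example.

```python
import random


def shift_scores(scores, shift):
    """Subtract the score-shift from every raw score."""
    return [s - shift for s in scores]


def best_interval(adjusted):
    """Return (start_index, end_index, total) for the highest-scoring
    non-empty run of consecutive entries (Kadane's algorithm)."""
    best = (0, 0, adjusted[0])
    run_start, run_total = 0, 0.0
    for i, value in enumerate(adjusted):
        if run_total <= 0:
            # A non-positive prefix cannot help; start a new run here.
            run_start, run_total = i, value
        else:
            run_total += value
        if run_total > best[2]:
            best = (run_start, i, run_total)
    return best


def shuffled_max(adjusted, shuffles, seed=0):
    """Largest best-interval total seen over `shuffles` random
    permutations of the adjusted scores (seed is an assumption,
    fixed here only to make the sketch repeatable)."""
    rng = random.Random(seed)
    work = list(adjusted)
    top = float("-inf")
    for _ in range(shuffles):
        rng.shuffle(work)
        top = max(top, best_interval(work)[2])
    return top


# The five rows shown in the help example, with score-shift 0.75.
positions = [39, 103, 188, 203, 321]
scores = [0.40, 0.97, 0.72, 0.68, 0.92]

adjusted = shift_scores(scores, 0.75)
start, end, total = best_interval(adjusted)
print("chr2 %d %d %.2f" % (positions[start], positions[end], total))
# chr2 103 321 0.29

# An interval is kept only if it beats every shuffled maximum.
threshold = shuffled_max(adjusted, 100)
print("reported:", total > threshold)
```

The result differs from the `chr2 103 1132 1.42` line in the help text because the rows hidden behind the ellipses are not available here. With so few rows the shuffle comparison is also not meaningful; it is included only to show where the threshold enters.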