view evaluate_population_numbers.xml @ 21:d6b961721037
Miller Lab Devshed version 4c04e35b18f6
author | Richard Burhans <burhans@bx.psu.edu> |
---|---|
date | Mon, 05 Nov 2012 12:44:17 -0500 |
parents | 8ae67e9fb6ff |
children |
<tool id="gd_evaluate_population_numbers" name="Population Complexity" version="1.0.0">
  <description>: Evaluate possible numbers of ancestral populations</description>

  <command interpreter="bash">
    evaluate_population_numbers.bash "${input.extra_files_path}/admix.ped" "$output" "$max_populations"
  </command>

  <inputs>
    <param name="input" type="data" format="gd_ped" label="Dataset" />
    <param name="max_populations" type="integer" min="1" value="5" label="Maximum number of populations" />
  </inputs>

  <outputs>
    <data name="output" format="txt" />
  </outputs>

  <!--
  <tests>
    <test>
      <param name="input" value="fake" ftype="gd_ped" >
        <metadata name="base_name" value="admix" />
        <composite_data value="test_out/prepare_population_structure/prepare_population_structure.html" />
        <composite_data value="test_out/prepare_population_structure/admix.ped" />
        <composite_data value="test_out/prepare_population_structure/admix.map" />
        <edit_attributes type="name" value="fake" />
      </param>
      <param name="max_populations" value="2" />
      <output name="output" file="test_out/evaluate_population_numbers/evaluate_population_numbers.txt" />
    </test>
  </tests>
  -->

  <help>
**Dataset formats**

The input dataset is in gd_ped_ format.
The output dataset is text.  (`Dataset missing?`_)

.. _gd_ped: ./static/formatHelp.html#gd_ped
.. _Dataset missing?: ./static/formatHelp.html

-----

**What it does**

The user selects a gd_ped dataset generated by the Prepare Input tool.
For all possible numbers K of ancestral populations, from 1 up to a
user-specified maximum, this tool produces values that indicate how
well the data can be explained as genotypes from individuals derived
from K ancestral populations.  These values are computed by a 5-fold
cross-validation procedure, so that a good choice for K will exhibit a
low cross-validation error (CVE) compared with other potential settings
for K.

-----

**Acknowledgments**

We use the program "Admixture", downloaded from

http://www.genetics.ucla.edu/software/admixture/

and described in the paper "Fast model-based estimation of ancestry in
unrelated individuals" by David H. Alexander, John Novembre and Kenneth
Lange, Genome Research 19 (2009), pp. 1655-1664.  Admixture is called with
the "--cv" flag to produce these values.

-----

**Example**

- output with max populations of 6::

    CVE (K=1): 1.10120
    CVE (K=2): 1.34683
    CVE (K=3): 1.80611
    CVE (K=4): 1.96339
    CVE (K=5): 1.21522
    CVE (K=6): 0.51501

  </help>
</tool>
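<!--
The tool help says that a good choice for K shows a low cross-validation
error (CVE) relative to the other settings. The sketch below shows one way a
user might pick that K from the text output. It is not part of the tool. It
assumes every output line looks like the lines in the Example section of the
help, and the helper names parse_cve and best_k are hypothetical.

```python
import re

# Assumed line format, copied from the Example in the tool help:
#   "CVE (K=3): 1.80611"
CVE_LINE = re.compile(r"CVE \(K=(\d+)\):\s*([0-9.]+)")


def parse_cve(text):
    """Return {K: CVE} for every line of text that matches the assumed format."""
    return {int(k): float(v) for k, v in CVE_LINE.findall(text)}


def best_k(cve_by_k):
    """Return the K with the lowest cross-validation error."""
    return min(cve_by_k, key=cve_by_k.get)


# The six values shown in the Example section of the help.
example = """\
CVE (K=1): 1.10120
CVE (K=2): 1.34683
CVE (K=3): 1.80611
CVE (K=4): 1.96339
CVE (K=5): 1.21522
CVE (K=6): 0.51501
"""

scores = parse_cve(example)
print(best_k(scores))  # prints 6, the K with the lowest CVE in the example
```

Taking the plain minimum is the simplest reading of "low CVE compared with
other settings". When several values of K give nearly equal errors, a user
may reasonably prefer the smaller K.
-->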