Mercurial > repos > iuc > structure
view structure.xml @ 3:4bd5e5dbc6c5 draft default tip
"planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/structure commit f03b7df7b3e4fd099ae0e1bd195c41ab2e0fa227"
author | iuc |
---|---|
date | Sun, 20 Mar 2022 10:57:34 +0000 |
parents | 0080ad37f8a0 |
children |
line wrap: on
line source
<tool id="structure" name="Structure" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="20.05"> <description>using multi-locus genotype data to investigate population structure</description> <xrefs> <xref type="bio.tools">structure</xref> </xrefs> <macros> <token name="@TOOL_VERSION@">2.3.4</token> <token name="@VERSION_SUFFIX@">1</token> </macros> <requirements> <requirement type="package" version="@TOOL_VERSION@">structure</requirement> </requirements> <version_command><![CDATA[ structure | grep -E -o 'Version.+' ]]></version_command> <command detect_errors="exit_code"><![CDATA[ cp '$mainparams' '$out_mainparams' && cp '$extraparams' '$out_extraparams' && mkdir out log #for $run in range(1, int($nb_run) + 1): && structure -i '$infile' -o outfile -m '$out_mainparams' -e '$out_extraparams' > 'log/K_${main.MAXPOPS}_run_${run}_f.log' && mv 'outfile_f' 'out/K_${main.MAXPOPS}_run_${run}_f' #end for ]]></command> <configfiles> <configfile name="mainparams"><![CDATA[ KEY PARAMETERS FOR THE PROGRAM structure. YOU WILL NEED TO SET THESE IN ORDER TO RUN THE PROGRAM. VARIOUS OPTIONS CAN BE ADJUSTED IN THE FILE extraparams. "(int)" means that this takes an integer value. "(B)" means that this variable is Boolean (ie insert 1 for True, and 0 for False) "(str)" means that this is a string (but not enclosed in quotes!) Basic Program Parameters #define MAXPOPS $main.MAXPOPS // default:2 // (int) number of populations assumed #define BURNIN $main.BURNIN // default:10000 // (int) length of burnin period #define NUMREPS $main.NUMREPS // default:20000 // (int) number of MCMC reps after burnin Input/Output files #define INFILE $infile // (str) name of input data file #define OUTFILE outfile //(str) name of output data file Data file format #define NUMINDS $main.NUMINDS // default:100 // (int) number of diploid individuals in data file #define NUMLOCI $main.NUMLOCI // default:100 // (int) number of loci in data file #define PLOIDY $main.PLOIDY // default:2 // (int) ploidy of data #define MISSING $main.MISSING // default:-9 // (int) value given to missing genotype data #define ONEROWPERIND $main.ONEROWPERIND // default:0 // (B) store data for individuals in a single line #define LABEL $main.LABEL // default:1 // (B) Input file contains individual labels #define POPDATA $main.POPDATA // default:1 // (B) Input file contains a population identifier #define POPFLAG ${extra.usepopinfo_cond.POPFLAG} // default:0 // (B) Input file contains a flag which says whether to use popinfo when USEPOPINFO==1 #define LOCDATA $main.LOCDATA // default:0 // (B) Input file contains a location identifier #define PHENOTYPE $main.PHENOTYPE // default:0 // (B) Input file contains phenotype information #define EXTRACOLS $main.EXTRACOLS // default:0 // (int) Number of additional columns of data before the genotype data start. #define MARKERNAMES $main.MARKERNAMES // default:1 // (B) data file contains row of marker names #define RECESSIVEALLELES $main.recessivealleles_cond.RECESSIVEALLELES // default:0 // (B) data file contains dominant markers (eg AFLPs) // and a row to indicate which alleles are recessive #define MAPDISTANCES $main.MAPDISTANCES // default:0 // (B) data file contains row of map distances // between loci Advanced data file options #define PHASED $main.PHASED // default:0 // (B) Data are in correct phase (relevant for linkage model only) #define PHASEINFO $main.PHASEINFO // default:0 // (B) the data for each individual contains a line indicating phase (linkage model) #define MARKOVPHASE $main.MARKOVPHASE // default:0 // (B) the phase info follows a Markov model. #define NOTAMBIGUOUS $main.recessivealleles_cond.NOTAMBIGUOUS // default:-999 // (int) for use in some analyses of polyploid data Command line options: -m mainparams -e extraparams -s stratparams -K MAXPOPS -L NUMLOCI -N NUMINDS -i input file -o output file -D SEED ]]></configfile> <configfile name="extraparams"><![CDATA[ EXTRA PARAMS FOR THE PROGRAM structure. THESE PARAMETERS CONTROL HOW THE PROGRAM RUNS. ATTRIBUTES OF THE DATAFILE AS WELL AS K AND RUNLENGTH ARE SPECIFIED IN mainparams. "(int)" means that this takes an integer value. "(d)" means that this is a double (ie, a Real number such as 3.14). "(B)" means that this variable is Boolean (ie insert 1 for True, and 0 for False). PROGRAM OPTIONS #define NOADMIX $extra.NOADMIX // default:0 // (B) Use no admixture model (0=admixture model, 1=no-admix) #define LINKAGE $extra.LINKAGE // default:0 // (B) Use the linkage model model #define USEPOPINFO $extra.usepopinfo_cond.USEPOPINFO // default:0 // (B) Use prior population information to pre-assign individuals to clusters #define LOCPRIOR $extra.LOCPRIOR // default:0 //(B) Use location information to improve weak data #define FREQSCORR $extra.FREQSCORR // default:1 // (B) allele frequencies are correlated among pops #define ONEFST $extra.ONEFST // default:0 // (B) assume same value of Fst for all subpopulations. #define INFERALPHA $extra.inferalpha_cond.INFERALPHA // default:1 // (B) Infer ALPHA (the admixture parameter) #define POPALPHAS $extra.POPALPHAS // default:0 // (B) Individual alpha for each population #define ALPHA $extra.inferalpha_cond.ALPHA // default:1.0 // (d) Dirichlet parameter for degree of admixture (this is the initial value if INFERALPHA==1). #define INFERLAMBDA $extra.inferlambda_cond.INFERLAMBDA // default:0 // (B) Infer LAMBDA (the allele frequencies parameter) #define POPSPECIFICLAMBDA $extra.inferlambda_cond.POPSPECIFICLAMBDA // default:0 //(B) infer a separate lambda for each pop (only if INFERLAMBDA=1). #define LAMBDA $extra.LAMBDA // default:1.0 // (d) Dirichlet parameter for allele frequencies PRIORS #define FPRIORMEAN $extra.FPRIORMEAN // default:0.01 // (d) Prior mean and SD of Fst for pops. #define FPRIORSD $extra.FPRIORSD // default:0.05 // (d) The prior is a Gamma distribution with these parameters #define UNIFPRIORALPHA $extra.unifprioralpha_cond.UNIFPRIORALPHA // default:1 // (B) use a uniform prior for alpha; otherwise gamma prior #define ALPHAMAX $extra.ALPHAMAX // default:10.0 // (d) max value of alpha if uniform prior #define ALPHAPRIORA $extra.unifprioralpha_cond.ALPHAPRIORA // default:1.0 // (only if UNIFPRIORALPHA==0): alpha has a gamma prior with mean A*B, and #define ALPHAPRIORB $extra.unifprioralpha_cond.ALPHAPRIORB // default:2.0 // variance A*B^2. #define LOG10RMIN $extra.LOG10RMIN // default:-4.0 //(d) Log10 of minimum allowed value of r under linkage model #define LOG10RMAX $extra.LOG10RMAX // default:1.0 //(d) Log10 of maximum allowed value of r #define LOG10RPROPSD $extra.LOG10RPROPSD // default:0.1 //(d) standard deviation of log r in update #define LOG10RSTART $extra.LOG10RSTART // default:-2.0 //(d) initial value of log10 r USING PRIOR POPULATION INFO (USEPOPINFO) #define GENSBACK $extra.GENSBACK // default:2 //(int) For use when inferring whether an indiv- idual is an immigrant, or has an immigrant an- cestor in the past GENSBACK generations. eg, if GENSBACK==2, it tests for immigrant ancestry back to grandparents. #define MIGRPRIOR $extra.usepopinfo_cond.MIGRPRIOR // default:0.01 //(d) prior prob that an individual is a migrant (used only when USEPOPINFO==1). This should be small, eg 0.01 or 0.1. #define PFROMPOPFLAGONLY $extra.PFROMPOPFLAGONLY // default:0 // (B) only use individuals with POPFLAG=1 to update P. This is to enable use of a reference set of individuals for clustering additional "test" individuals. LOCPRIOR MODEL FOR USING LOCATION INFORMATION #define LOCISPOP $extra.LOCISPOP // default:1 //(B) use POPDATA for location information #define LOCPRIORINIT $extra.LOCPRIORINIT // default:1.0 //(d) initial value for r, the location prior #define MAXLOCPRIOR $extra.MAXLOCPRIOR // default:20.0 //(d) max allowed value for r OUTPUT OPTIONS #define PRINTNET $extra.PRINTNET // default:1 // (B) Print the "net nucleotide distance" to screen during the run #define PRINTLAMBDA $extra.PRINTLAMBDA // default:1 // (B) Print current value(s) of lambda to screen #define PRINTQSUM $extra.PRINTQSUM // default:1 // (B) Print summary of current population membership to screen #define SITEBYSITE $extra.SITEBYSITE // default:0 // (B) whether or not to print site by site results. (Linkage model only) This is a large file! #define PRINTQHAT $extra.PRINTQHAT // default:0 // (B) Q-hat printed to a separate file. Turn this on before using STRAT. #define UPDATEFREQ $extra.UPDATEFREQ // default:100 // (int) frequency of printing update on the screen. Set automatically if this is 0. #define PRINTLIKES $extra.PRINTLIKES // default:0 // (B) print current likelihood to screen every rep #define INTERMEDSAVE $extra.INTERMEDSAVE // default:0 // (int) number of saves to file during run #define ECHODATA $extra.ECHODATA // default:1 // (B) Print some of data file to screen to check that the data entry is correct. (NEXT 3 ARE FOR COLLECTING DISTRIBUTION OF Q:) #define ANCESTDIST $extra.ANCESTDIST // default:0 // (B) collect data about the distribution of an- cestry coefficients (Q) for each individual #define NUMBOXES $extra.NUMBOXES // default:1000 // (int) the distribution of Q values is stored as a histogram with this number of boxes. #define ANCESTPINT $extra.ANCESTPINT // default:0.90 // (d) the size of the displayed probability interval on Q (values between 0.0--1.0) MISCELLANEOUS #define COMPUTEPROB $extra.COMPUTEPROB // default:1 // (B) Estimate the probability of the Data under the model. This is used when choosing the best number of subpopulations. #define ADMBURNIN $extra.ADMBURNIN // default:500 // (int) [only relevant for linkage model]: Initial period of burnin with admixture model (see Readme) #define ALPHAPROPSD $extra.ALPHAPROPSD // default:0.025 // (d) SD of proposal for updating alpha #define STARTATPOPINFO $extra.STARTATPOPINFO // default:0 // Use given populations as the initial condition for population origins. (Need POPDATA==1). It is assumed that the PopData in the input file are between 1 and k where k<=MAXPOPS. #define RANDOMIZE $extra.randomize_cond.RANDOMIZE // default:1 // (B) use new random seed for each run #define SEED $extra.randomize_cond.SEED // default:2245 // (int) seed value for random number generator (must set RANDOMIZE=0) #define METROFREQ $extra.METROFREQ // default:10 // (int) Frequency of using Metropolis step to update Q under admixture model (ie use the metr. move every i steps). If this is set to 0, it is never used. (Proposal for each q^(i) sampled from prior. The goal is to improve mixing for small alpha.) #define REPORTHITRATE $extra.REPORTHITRATE // default:0 // (B) report hit rate if using METROFREQ ]]></configfile> </configfiles> <inputs> <param name="infile" type="data" label="Genotype data" format="txt,tabular" /> <param name="nb_run" value="1" type="integer" label="Number of runs" min="1" max="10" help="Note that the runs are sequential. Please launch separate runs if it's too long" /> <section name="main" title="mainparams" expanded="True"> <!--Basic Program Parameters--> <param argument="MAXPOPS" value="" type="integer" label="Number of populations assumed" help="or [K]"/> <param argument="BURNIN" value="10000" type="integer" label="Length of burnin period" /> <param argument="NUMREPS" value="20000" type="integer" label="Number of MCMC reps after burnin" /> <!--Data file format--> <param argument="NUMINDS" value="" type="integer" label="Number of diploid individuals in data file" help="or [N]"/> <param argument="NUMLOCI" value="" type="integer" label="Number of loci in data file" help="or [L]"/> <param argument="PLOIDY" value="2" type="integer" label="Ploidy of data" /> <param argument="MISSING" value="-9" type="integer" label="Value given to missing genotype data" /> <param argument="ONEROWPERIND" checked="False" type="boolean" label="Store data for individuals in a single line" truevalue="1" falsevalue="0" help=" E.g., for diploid data, this would mean that the two alleles for each locus are in consecutive order in the same row, rather than being arranged in the same column, in two consecutive rows "/> <param argument="LABEL" checked="true" type="boolean" label="Input file contains individual labels" truevalue="1" falsevalue="0" /> <param argument="POPDATA" checked="true" type="boolean" label="Input file contains a user-defined population-of-origin for each individual" truevalue="1" falsevalue="0" /> <param argument="LOCDATA" checked="false" type="boolean" label="Input file contains a location identifier" truevalue="1" falsevalue="0" /> <param argument="PHENOTYPE" checked="false" type="boolean" label="Input file contains phenotype information" truevalue="1" falsevalue="0" /> <param argument="EXTRACOLS" value="0" type="integer" label="Number of additional columns of data before the genotype data start." /> <param argument="MARKERNAMES" checked="true" type="boolean" label="Data file contains row of marker names" truevalue="1" falsevalue="0" /> <conditional name="recessivealleles_cond"> <param argument="RECESSIVEALLELES" type="select" label="Data file contains dominant markers (eg AFLPs) and a row to indicate which alleles are recessive" > <option value="0" selected="True">No</option> <option value="1">Yes</option> </param> <when value="0"> <param argument="NOTAMBIGUOUS" value="-999" type="hidden" label="Defines the code indicating that genotype data at a marker are unambiguous." help="For use with polyploids when RECESSIVEALLELES=1/True. Must not match MISSING or any allele value in the data." /> </when> <when value="1"> <param argument="NOTAMBIGUOUS" value="-999" type="integer" label="Defines the code indicating that genotype data at a marker are unambiguous." help="For use with polyploids when RECESSIVEALLELES=1/True. Must not match MISSING or any allele value in the data." /> </when> </conditional> <param argument="MAPDISTANCES" checked="false" type="boolean" label="Data file contains row of map distances between loci" truevalue="1" falsevalue="0" /> <!--Advanced data file options--> <param argument="PHASED" checked="false" type="boolean" label="Data are in correct phase (relevant for linkage model only)" truevalue="1" falsevalue="0" /> <param argument="PHASEINFO" checked="false" type="boolean" label="The data for each individual contains a line indicating phase (linkage model)" truevalue="1" falsevalue="0" /> <param argument="MARKOVPHASE" checked="false" type="boolean" label="The phase info follows a Markov model." truevalue="1" falsevalue="0" /> </section> <section name="extra" title="extraparams" expanded="False"> <param argument="NOADMIX" checked="false" type="boolean" label="Use no admixture model" help="(0/False=admixture model, 1/True=no-admix)" truevalue="1" falsevalue="0" /> <param argument="LINKAGE" checked="false" type="boolean" label="Use the linkage model model" truevalue="1" falsevalue="0" /> <conditional name="usepopinfo_cond"> <param argument="USEPOPINFO" type="select" label="Use prior population information to pre-assign individuals to clusters"> <option value="0" selected="True">No</option> <option value="1">Yes</option> </param> <when value="0"> <param argument="POPFLAG" value="0" type="hidden" label="Input file contains a flag which says whether to use popinfo" help="[mainparams] when USEPOPINFO is 1/True" /> <param argument="MIGRPRIOR" value="0.01" type="hidden" label="Prior prob that an individual is a migrant" help="(used only when USEPOPINFO==1/True). This should be small, eg 0.01 or 0.1." /> </when> <when value="1"> <param argument="POPFLAG" checked="false" type="boolean" label="Input file contains a flag which says whether to use popinfo" help="[mainparams] when USEPOPINFO is 1/True" truevalue="1" falsevalue="0" /> <param argument="MIGRPRIOR" value="0.01" type="float" label="Prior prob that an individual is a migrant" help="(used only when USEPOPINFO==1/True). This should be small, eg 0.01 or 0.1." /> </when> </conditional> <param argument="LOCPRIOR" checked="false" type="boolean" label="Use location information to improve weak data" truevalue="1" falsevalue="0" /> <param argument="FREQSCORR" checked="true" type="boolean" label="Allele frequencies are correlated among pops" truevalue="1" falsevalue="0" /> <param argument="ONEFST" checked="false" type="boolean" label="Assume same value of Fst for all subpopulations" truevalue="1" falsevalue="0" /> <conditional name="inferalpha_cond"> <param argument="INFERALPHA" type="select" label="Infer ALPHA (the admixture parameter)"> <option value="1" selected="True">Yes</option> <option value="0">No</option> </param> <when value="1"> <param argument="ALPHA" value="1.0" type="float" label="Dirichlet parameter for degree of admixture" help="this is the initial value if INFERALPHA is 1/True." /> </when> <when value="0"> <param argument="ALPHA" value="1.0" type="hidden" label="Dirichlet parameter for degree of admixture" help="this is the initial value if INFERALPHA is 1/True." /> </when> </conditional> <param argument="POPALPHAS" checked="false" type="boolean" label="Individual alpha for each population" truevalue="1" falsevalue="0" /> <conditional name="inferlambda_cond"> <param argument="INFERLAMBDA" type="select" label="Infer LAMBDA (the allele frequencies parameter)"> <option value="0" selected="True">No</option> <option value="1">Yes</option> </param> <when value="0"> <param argument="POPSPECIFICLAMBDA" value="0" type="hidden" label="Infer a separate lambda for each pop" help="(only if INFERLAMBDA=1/True)." /> </when> <when value="1"> <param argument="POPSPECIFICLAMBDA" checked="false" type="boolean" label="Infer a separate lambda for each pop" help="(only if INFERLAMBDA=1/True)." truevalue="1" falsevalue="0" /> </when> </conditional> <param argument="LAMBDA" value="1.0" type="float" label="Dirichlet parameter for allele frequencies" /> <!-- PRIORS --> <param argument="FPRIORMEAN" value="0.01" type="float" label="The Prior (Gamma distribution) mean of Fst for pops." /> <param argument="FPRIORSD" value="0.05" type="float" label="The Prior (Gamma distribution) Standard Deviation of Fst for pops." /> <conditional name="unifprioralpha_cond"> <param argument="UNIFPRIORALPHA" type="select" label="Use a uniform prior for alpha; otherwise gamma prior"> <option value="1" selected="True">Yes</option> <option value="0">No</option> </param> <when value="1"> <param argument="ALPHAPRIORA" value="1.0" type="hidden" label="Alpha has a gamma prior with mean A*B, and variance A*B^2." help="(only if UNIFPRIORALPHA==0/False)" /> <param argument="ALPHAPRIORB" value="2.0" type="hidden" label="Alpha has a gamma prior with mean A*B, and variance A*B^2." help="(only if UNIFPRIORALPHA==0/False)" /> </when> <when value="0"> <param argument="ALPHAPRIORA" value="1.0" type="float" label="Alpha has a gamma prior with mean A*B, and variance A*B^2." help="(only if UNIFPRIORALPHA==0/False)"/> <param argument="ALPHAPRIORB" value="2.0" type="float" label="Alpha has a gamma prior with mean A*B, and variance A*B^2." help="(only if UNIFPRIORALPHA==0/False)"/> </when> </conditional> <param argument="ALPHAMAX" value="10.0" type="float" label="Max value of alpha if uniform prior" /> <param argument="LOG10RMIN" value="-4.0" type="float" label="Log10 of minimum allowed value of r under linkage model" /> <param argument="LOG10RMAX" value="1.0" type="float" label="Log10 of maximum allowed value of r" /> <param argument="LOG10RPROPSD" value="0.1" type="float" label="Standard deviation of log r in update" /> <param argument="LOG10RSTART" value="-2.0" type="float" label="Initial value of log10 r" /> <!-- USING PRIOR POPULATION INFO (USEPOPINFO) --> <param argument="GENSBACK" value="2" type="integer" label="For use when inferring whether an individual is an immigrant, or has an immigrant an cestor in the past GENSBACK generations." help="eg, if GENSBACK==2, it tests for immigrant ancestry back to grandparents." /> <param argument="PFROMPOPFLAGONLY" checked="false" type="boolean" label="Only use individuals with POPFLAG=1 to update P." help="This is to enable use of a reference set of individuals for clustering additional 'test' individuals." truevalue="1" falsevalue="0" /> <!-- LOCPRIOR MODEL FOR USING LOCATION INFORMATION --> <param argument="LOCISPOP" checked="true" type="boolean" label="Use POPDATA for location information" truevalue="1" falsevalue="0" /> <param argument="LOCPRIORINIT" value="1.0" type="float" label="Initial value for r, the location prior" /> <param argument="MAXLOCPRIOR" value="20.0" type="float" label="Max allowed value for r" /> <!-- OUTPUT OPTIONS --> <param argument="PRINTNET" checked="true" type="boolean" label="Print the 'net nucleotide distance' to screen during the run" truevalue="1" falsevalue="0" /> <param argument="PRINTLAMBDA" checked="true" type="boolean" label="Print current value(s) of lambda to screen" truevalue="1" falsevalue="0" /> <param argument="PRINTQSUM" checked="true" type="boolean" label="Print summary of current population membership to screen" truevalue="1" falsevalue="0" /> <param argument="SITEBYSITE" checked="false" type="boolean" label="whether or not to print site by site results." help="(Linkage model only) This is a large file!" truevalue="1" falsevalue="0" /> <param argument="PRINTQHAT" checked="false" type="boolean" label="Q-hat printed to a separate file." help="Turn this on before using STRAT." truevalue="1" falsevalue="0" /> <param argument="UPDATEFREQ" value="100" type="integer" label="Frequency of printing update on the screen." help="Set automatically if this is 0/False." /> <param argument="PRINTLIKES" checked="false" type="boolean" label="Print current likelihood to screen every rep" truevalue="1" falsevalue="0" /> <param argument="INTERMEDSAVE" value="0" type="integer" label="Number of saves to file during run" /> <param argument="ECHODATA" checked="false" type="boolean" label="Print some of data file to screen to check that the data entry is correct." help="(NEXT 3 ARE FOR COLLECTING DISTRIBUTION OF Q:)" truevalue="1" falsevalue="0" /> <param argument="ANCESTDIST" checked="false" type="boolean" label="Collect data about the distribution of ancestry coefficients (Q) for each individual" truevalue="1" falsevalue="0" /> <param argument="NUMBOXES" value="1000" type="integer" label="The distribution of Q values is stored as a histogram with this number of boxes." /> <param argument="ANCESTPINT" value="0.90" type="float" label="The size of the displayed probability interval on Q (values between 0.0--1.0)" /> <!-- MISCELLANEOUS --> <param argument="COMPUTEPROB" checked="true" type="boolean" label="Estimate the probability of the Data under the model." help="This is used when choosing the best number of subpopulations." truevalue="1" falsevalue="0" /> <param argument="ADMBURNIN" value="500" type="integer" label="Initial period of burnin with admixture model" help="[only relevant for linkage model] see Documentation" /> <param argument="ALPHAPROPSD" value="0.025" type="float" label="SD of proposal for updating alpha" /> <param argument="STARTATPOPINFO" checked="false" type="boolean" label="Use given populations as the initial condition for population origins." help="(Need POPDATA==1). It is assumed that the PopData in the input file are between 1 and k where k is less or equal MAXPOPS." truevalue="1" falsevalue="0" /> <conditional name="randomize_cond"> <param argument="RANDOMIZE" type="select" label="=use new random seed for each run"> <option value="1" selected="True">Yes</option> <option value="0">No</option> </param> <when value="1"> <param argument="SEED" value="2245" type="hidden" label="Seed value for random number generator" help="(must set RANDOMIZE=0)" /> </when> <when value="0"> <param argument="SEED" value="2245" type="integer" label="seed value for random number generator" help="(must set RANDOMIZE=0)" /> </when> </conditional> <param argument="METROFREQ" value="10" type="integer" label="Frequency of using Metropolis step to update Q under admixture model" help="(ie use the metr. move every i steps). If this is set to 0, it is never used. (Proposal for each q^(i) sampled from prior. The goal is to improve mixing for small alpha.)" /> <param argument="REPORTHITRATE" checked="false" type="boolean" label="Report hit rate if using METROFREQ" truevalue="1" falsevalue="0" /> </section> </inputs> <outputs> <data name="out_mainparams" format="txt" label="run_K_${main.MAXPOPS}.mainparams" /> <data name="out_extraparams" format="txt" label="run_K_${main.MAXPOPS}.extraparams" /> <collection name="out" type="list" label="run_K_${main.MAXPOPS}.out"> <discover_datasets pattern="__name__" format="txt" directory="out" /> </collection> <collection name="log" type="list" label="run_K_${main.MAXPOPS}.log"> <discover_datasets pattern="__name__" format="txt" directory="log" /> </collection> </outputs> <tests> <test> <!-- https://web.stanford.edu/group/pritchardlab/structure_software/release_versions/v2.3.4/html/structure-data.html --> <param name="infile" value="testdata1" /> <param name="nb_run" value="2" /> <section name="main"> <param name="NUMINDS" value="200" /> <param name="MAXPOPS" value="2" /> <param name="LABEL" value="1" /> <param name="POPDATA" value="1" /> <param name="NUMLOCI" value="5" /> <param name="LOCDATA" value="1" /> <param name="PLOIDY" value="2" /> <param name="MISSING" value="-999" /> <param name="ONEROWPERIND" value="0" /> <param name="MARKERNAMES" value="0" /> </section> <section name="extra"> <conditional name="randomize_cond"> <param name="RANDOMIZE" value="0" /> </conditional> </section> <output_collection name="out" type="list"> <element name="K_2_run_1_f" value="testdata1_f" lines_diff="6" /> <element name="K_2_run_2_f" value="testdata1_f" lines_diff="6" /> </output_collection> <output_collection name="log" type="list"> <element name="K_2_run_1_f.log"> <assert_contents> <has_line line="Final results printed to file outfile_f" /> </assert_contents> </element> <element name="K_2_run_2_f.log"> <assert_contents> <has_line line="Final results printed to file outfile_f" /> </assert_contents> </element> </output_collection> </test> </tests> <help><![CDATA[ **Introduction** The program structure_ implements a model-based clustering method for inferring population structure using genotype data consisting of unlinked markers. The method was introduced in a paper by Pritchard, Stephens and Donnelly (2000a) and extended in sequels by Falush, Stephens and Pritchard (2003a, 2007). Applications of our method include demonstrating the presence of population structure, identifying distinct genetic populations, assigning individuals to populations, and identifying migrants and admixed individuals. Briefly, we assume a model in which there are K populations (where K may be unknown), each of which is characterized by a set of allele frequencies at each locus. Individuals in the sample are assigned (probabilistically) to populations, or jointly to two or more populations if their genotypes indicate that they are admixed. It is assumed that within populations, the loci are at Hardy-Weinberg equilibrium, and linkage equilibrium. Loosely speaking, individuals are assigned to populations in such a way as to achieve this. Our model does not assume a particular mutation process, and it can be applied to most of the commonly used genetic markers including microsatellites, SNPs and RFLPs. The model assumes that markers are not in linkage disequilibrium (LD) within subpopulations, so we can’t handle markers that are extremely close together. Starting with version 2.0, we can now deal with weakly linked markers. While the computational approaches implemented here are fairly powerful, some care is needed in running the program in order to ensure sensible answers. For example, it is not possible to determine suitable run-lengths theoretically, and this requires some experimentation on the part of the user. This document describes the use and interpretation of the software and supplements the published papers, which provide more formal descriptions and evaluations of the methods. .. _structure: https://web.stanford.edu/group/pritchardlab/structure.html **Documentation** Please see the full Sructure documentation_ .. _documentation: https://web.stanford.edu/group/pritchardlab/structure_software/release_versions/v2.3.4/structure_doc.pdf **Upstream** Inputs can be produced from: - Microsatellite analysis - RADSeq analysis (eg: using populations_ from Stacks suite) .. _populations: http://catchenlab.life.illinois.edu/stacks/manual/#export **Input** ======= === ===== ===== ===== ===== ===== loc_a loc_b loc_c loc_d loc_e ======= === ===== ===== ===== ===== ===== George 1 -9 145 66 0 92 George 1 -9 -9 64 0 94 Paula 1 106 142 68 1 92 Paula 1 106 148 64 0 94 Matthew 2 110 145 -9 0 92 Matthew 2 110 148 66 1 -9 Bob 2 108 142 64 1 94 Bob 2 -9 142 -9 0 94 Anja 1 112 142 -9 1 -9 Anja 1 114 142 66 1 94 Peter 1 -9 145 66 0 -9 Peter 1 110 145 -9 1 -9 Carsten 2 108 145 62 0 -9 Carsten 2 110 145 64 1 92 ======= === ===== ===== ===== ===== ===== You will find other sample data sets: here_ .. _here: https://web.stanford.edu/group/pritchardlab/structure_software/release_versions/v2.3.4/html/structure-data.html **Downstream** - Clumpp_ - Distruct_ - Structure-harvester_ .. _Clumpp: https://rosenberglab.stanford.edu/clumpp.html .. _Distruct: https://rosenberglab.stanford.edu/distruct.html .. _Structure-harvester: http://taylor0.biology.ucla.edu/structureHarvester/ ]]></help> <citations> <citation type="doi">10.1111/j.1471-8286.2007.01758.x</citation> <citation type="doi">10.1111/j.1755-0998.2009.02591.x</citation> </citations> </tool>