Mercurial > repos > iuc > structure

<tool id="structure" name="Structure" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="20.05">
    <description>using multi-locus genotype data to investigate population structure</description>
    <xrefs>
        <xref type="bio.tools">structure</xref>
    </xrefs>
    <macros>
        <token name="@TOOL_VERSION@">2.3.4</token>
        <token name="@VERSION_SUFFIX@">1</token>
    </macros>
    <requirements>
        <requirement type="package" version="@TOOL_VERSION@">structure</requirement>
    </requirements>
    <version_command><![CDATA[
        structure | grep -E -o 'Version.+'
    ]]></version_command>
    <command detect_errors="exit_code"><![CDATA[
        cp '$mainparams' '$out_mainparams' &&
        cp '$extraparams' '$out_extraparams' &&

        mkdir out log

        #for $run in range(1, int($nb_run) + 1):
            && structure -i '$infile' -o outfile -m '$out_mainparams' -e '$out_extraparams' > 'log/K_${main.MAXPOPS}_run_${run}_f.log'
            && mv 'outfile_f' 'out/K_${main.MAXPOPS}_run_${run}_f'
        #end for

    ]]></command>
    <configfiles>
        <configfile name="mainparams"><![CDATA[
KEY PARAMETERS FOR THE PROGRAM structure.  YOU WILL NEED TO SET THESE
IN ORDER TO RUN THE PROGRAM.  VARIOUS OPTIONS CAN BE ADJUSTED IN THE
FILE extraparams.


"(int)" means that this takes an integer value.
"(B)"   means that this variable is Boolean
        (ie insert 1 for True, and 0 for False)
"(str)" means that this is a string (but not enclosed in quotes!)


Basic Program Parameters

#define MAXPOPS    $main.MAXPOPS  // default:2      // (int) number of populations assumed
#define BURNIN    $main.BURNIN  // default:10000   // (int) length of burnin period
#define NUMREPS   $main.NUMREPS  // default:20000   // (int) number of MCMC reps after burnin

Input/Output files

#define INFILE   $infile   // (str) name of input data file
#define OUTFILE  outfile  //(str) name of output data file

Data file format

#define NUMINDS    $main.NUMINDS  // default:100    // (int) number of diploid individuals in data file
#define NUMLOCI    $main.NUMLOCI  // default:100    // (int) number of loci in data file
#define PLOIDY       $main.PLOIDY  // default:2    // (int) ploidy of data
#define MISSING     $main.MISSING  // default:-9    // (int) value given to missing genotype data
#define ONEROWPERIND $main.ONEROWPERIND  // default:0    // (B) store data for individuals in a single line


#define LABEL     $main.LABEL  // default:1     // (B) Input file contains individual labels
#define POPDATA   $main.POPDATA  // default:1     // (B) Input file contains a population identifier
#define POPFLAG   ${extra.usepopinfo_cond.POPFLAG}  // default:0     // (B) Input file contains a flag which says
                              whether to use popinfo when USEPOPINFO==1
#define LOCDATA   $main.LOCDATA  // default:0     // (B) Input file contains a location identifier

#define PHENOTYPE $main.PHENOTYPE  // default:0     // (B) Input file contains phenotype information
#define EXTRACOLS $main.EXTRACOLS  // default:0     // (int) Number of additional columns of data
                             before the genotype data start.

#define MARKERNAMES      $main.MARKERNAMES  // default:1  // (B) data file contains row of marker names
#define RECESSIVEALLELES $main.recessivealleles_cond.RECESSIVEALLELES  // default:0  // (B) data file contains dominant markers (eg AFLPs)
                            // and a row to indicate which alleles are recessive
#define MAPDISTANCES     $main.MAPDISTANCES  // default:0  // (B) data file contains row of map distances
                            // between loci


Advanced data file options

#define PHASED           $main.PHASED  // default:0 // (B) Data are in correct phase (relevant for linkage model only)
#define PHASEINFO        $main.PHASEINFO  // default:0 // (B) the data for each individual contains a line
                                  indicating phase (linkage model)
#define MARKOVPHASE      $main.MARKOVPHASE  // default:0 // (B) the phase info follows a Markov model.
#define NOTAMBIGUOUS  $main.recessivealleles_cond.NOTAMBIGUOUS  // default:-999 // (int) for use in some analyses of polyploid data


Command line options:

-m mainparams
-e extraparams
-s stratparams
-K MAXPOPS
-L NUMLOCI
-N NUMINDS
-i input file
-o output file
-D SEED

        ]]></configfile>
        <configfile name="extraparams"><![CDATA[
EXTRA PARAMS FOR THE PROGRAM structure.  THESE PARAMETERS CONTROL HOW THE
PROGRAM RUNS.  ATTRIBUTES OF THE DATAFILE AS WELL AS K AND RUNLENGTH ARE
SPECIFIED IN mainparams.

"(int)" means that this takes an integer value.
"(d)"   means that this is a double (ie, a Real number such as 3.14).
"(B)"   means that this variable is Boolean
        (ie insert 1 for True, and 0 for False).

PROGRAM OPTIONS

#define NOADMIX     $extra.NOADMIX  // default:0 // (B) Use no admixture model (0=admixture model, 1=no-admix)
#define LINKAGE     $extra.LINKAGE  // default:0 // (B) Use the linkage model model
#define USEPOPINFO  $extra.usepopinfo_cond.USEPOPINFO  // default:0 // (B) Use prior population information to pre-assign individuals
                             to clusters
#define LOCPRIOR    $extra.LOCPRIOR  // default:0 //(B)  Use location information to improve weak data

#define FREQSCORR   $extra.FREQSCORR  // default:1 // (B) allele frequencies are correlated among pops
#define ONEFST      $extra.ONEFST  // default:0 // (B) assume same value of Fst for all subpopulations.

#define INFERALPHA  $extra.inferalpha_cond.INFERALPHA  // default:1 // (B) Infer ALPHA (the admixture parameter)
#define POPALPHAS   $extra.POPALPHAS  // default:0 // (B) Individual alpha for each population
#define ALPHA     $extra.inferalpha_cond.ALPHA  // default:1.0 // (d) Dirichlet parameter for degree of admixture
                             (this is the initial value if INFERALPHA==1).

#define INFERLAMBDA $extra.inferlambda_cond.INFERLAMBDA  // default:0 // (B) Infer LAMBDA (the allele frequencies parameter)
#define POPSPECIFICLAMBDA $extra.inferlambda_cond.POPSPECIFICLAMBDA  // default:0 //(B) infer a separate lambda for each pop
                    (only if INFERLAMBDA=1).
#define LAMBDA    $extra.LAMBDA  // default:1.0 // (d) Dirichlet parameter for allele frequencies


PRIORS

#define FPRIORMEAN $extra.FPRIORMEAN  // default:0.01 // (d) Prior mean and SD of Fst for pops.
#define FPRIORSD   $extra.FPRIORSD  // default:0.05  // (d) The prior is a Gamma distribution with these parameters

#define UNIFPRIORALPHA $extra.unifprioralpha_cond.UNIFPRIORALPHA  // default:1 // (B) use a uniform prior for alpha;
                                otherwise gamma prior
#define ALPHAMAX     $extra.ALPHAMAX  // default:10.0 // (d) max value of alpha if uniform prior
#define ALPHAPRIORA   $extra.unifprioralpha_cond.ALPHAPRIORA  // default:1.0 // (only if UNIFPRIORALPHA==0): alpha has a gamma
                            prior with mean A*B, and
#define ALPHAPRIORB   $extra.unifprioralpha_cond.ALPHAPRIORB  // default:2.0 // variance A*B^2.


#define LOG10RMIN     $extra.LOG10RMIN  // default:-4.0   //(d) Log10 of minimum allowed value of r under linkage model
#define LOG10RMAX      $extra.LOG10RMAX  // default:1.0   //(d) Log10 of maximum allowed value of r
#define LOG10RPROPSD   $extra.LOG10RPROPSD  // default:0.1   //(d) standard deviation of log r in update
#define LOG10RSTART   $extra.LOG10RSTART  // default:-2.0   //(d) initial value of log10 r


USING PRIOR POPULATION INFO (USEPOPINFO)

#define GENSBACK    $extra.GENSBACK  // default:2  //(int) For use when inferring whether an indiv-
                         idual is an immigrant, or has an immigrant an-
                         cestor in the past GENSBACK generations.  eg, if
                         GENSBACK==2, it tests for immigrant ancestry
                         back to grandparents.
#define MIGRPRIOR $extra.usepopinfo_cond.MIGRPRIOR  // default:0.01 //(d) prior prob that an individual is a migrant
                             (used only when USEPOPINFO==1).  This should
                             be small, eg 0.01 or 0.1.
#define PFROMPOPFLAGONLY $extra.PFROMPOPFLAGONLY  // default:0 // (B) only use individuals with POPFLAG=1 to update    P.
                                  This is to enable use of a reference set of
                                  individuals for clustering additional "test"
                                  individuals.

LOCPRIOR MODEL FOR USING LOCATION INFORMATION

#define LOCISPOP      $extra.LOCISPOP  // default:1    //(B) use POPDATA for location information
#define LOCPRIORINIT  $extra.LOCPRIORINIT  // default:1.0  //(d) initial value for r, the location prior
#define MAXLOCPRIOR  $extra.MAXLOCPRIOR  // default:20.0  //(d) max allowed value for r


OUTPUT OPTIONS

#define PRINTNET     $extra.PRINTNET  // default:1 // (B) Print the "net nucleotide distance" to screen during the run
#define PRINTLAMBDA  $extra.PRINTLAMBDA  // default:1 // (B) Print current value(s) of lambda to screen
#define PRINTQSUM    $extra.PRINTQSUM  // default:1 // (B) Print summary of current population membership to screen

#define SITEBYSITE   $extra.SITEBYSITE  // default:0  // (B) whether or not to print site by site results.
                        (Linkage model only) This is a large file!
#define PRINTQHAT    $extra.PRINTQHAT  // default:0  // (B) Q-hat printed to a separate file.  Turn this
                           on before using STRAT.
#define UPDATEFREQ   $extra.UPDATEFREQ  // default:100  // (int) frequency of printing update on the screen.
                                 Set automatically if this is 0.
#define PRINTLIKES   $extra.PRINTLIKES  // default:0  // (B) print current likelihood to screen every rep
#define INTERMEDSAVE $extra.INTERMEDSAVE  // default:0  // (int) number of saves to file during run

#define ECHODATA     $extra.ECHODATA  // default:1  // (B) Print some of data file to screen to check
                              that the data entry is correct.
(NEXT 3 ARE FOR COLLECTING DISTRIBUTION OF Q:)
#define ANCESTDIST   $extra.ANCESTDIST  // default:0  // (B) collect data about the distribution of an-
                              cestry coefficients (Q) for each individual
#define NUMBOXES   $extra.NUMBOXES  // default:1000 // (int) the distribution of Q values is stored as
                              a histogram with this number of boxes.
#define ANCESTPINT $extra.ANCESTPINT  // default:0.90 // (d) the size of the displayed probability
                              interval on Q (values between 0.0--1.0)


MISCELLANEOUS

#define COMPUTEPROB $extra.COMPUTEPROB  // default:1     // (B) Estimate the probability of the Data under
                             the model.  This is used when choosing the
                             best number of subpopulations.
#define ADMBURNIN  $extra.ADMBURNIN  // default:500    // (int) [only relevant for linkage model]:
                             Initial period of burnin with admixture model (see Readme)
#define ALPHAPROPSD $extra.ALPHAPROPSD  // default:0.025 // (d) SD of proposal for updating alpha
#define STARTATPOPINFO $extra.STARTATPOPINFO  // default:0  // Use given populations as the initial condition
                             for population origins.  (Need POPDATA==1).  It
                             is assumed that the PopData in the input file
                             are between 1 and k where k<=MAXPOPS.
#define RANDOMIZE      $extra.randomize_cond.RANDOMIZE  // default:1  // (B) use new random seed for each run
#define SEED        $extra.randomize_cond.SEED  // default:2245  // (int) seed value for random number generator
                         (must set RANDOMIZE=0)
#define METROFREQ    $extra.METROFREQ  // default:10   // (int) Frequency of using Metropolis step to update
                             Q under admixture model (ie use the metr. move every
                             i steps).  If this is set to 0, it is never used.
                             (Proposal for each q^(i) sampled from prior.  The
                             goal is to improve mixing for small alpha.)
#define REPORTHITRATE $extra.REPORTHITRATE  // default:0 //   (B) report hit rate if using METROFREQ

        ]]></configfile>
    </configfiles>
    <inputs>
        <param name="infile" type="data" label="Genotype data" format="txt,tabular" />
        <param name="nb_run" value="1" type="integer" label="Number of runs" min="1" max="10" help="Note that the runs are sequential. Please launch separate runs if it's too long" />
        <section name="main" title="mainparams" expanded="True">
            <!--Basic Program Parameters-->
            <param argument="MAXPOPS" value="" type="integer" label="Number of populations assumed" help="or [K]"/>
            <param argument="BURNIN" value="10000" type="integer" label="Length of burnin period" />
            <param argument="NUMREPS" value="20000" type="integer" label="Number of MCMC reps after burnin" />

            <!--Data file format-->
            <param argument="NUMINDS" value="" type="integer" label="Number of diploid individuals in data file" help="or [N]"/>
            <param argument="NUMLOCI" value="" type="integer" label="Number of loci in data file" help="or [L]"/>
            <param argument="PLOIDY" value="2" type="integer" label="Ploidy of data" />
            <param argument="MISSING" value="-9" type="integer" label="Value given to missing genotype data" />
            <param argument="ONEROWPERIND" checked="False" type="boolean" label="Store data for individuals in a single line" truevalue="1" falsevalue="0" help=" E.g., for diploid data, this would mean that the two alleles for each locus are in consecutive order in the same row, rather than being arranged in the same column, in two consecutive rows "/>


            <param argument="LABEL" checked="true" type="boolean" label="Input file contains individual labels" truevalue="1" falsevalue="0" />
            <param argument="POPDATA" checked="true" type="boolean" label="Input file contains a user-defined population-of-origin for each individual" truevalue="1" falsevalue="0" />
            <param argument="LOCDATA" checked="false" type="boolean" label="Input file contains a location identifier" truevalue="1" falsevalue="0" />

            <param argument="PHENOTYPE" checked="false" type="boolean" label="Input file contains phenotype information" truevalue="1" falsevalue="0" />
            <param argument="EXTRACOLS" value="0" type="integer" label="Number of additional columns of data before the genotype data start." />

            <param argument="MARKERNAMES" checked="true" type="boolean" label="Data file contains row of marker names" truevalue="1" falsevalue="0" />
            <conditional name="recessivealleles_cond">
                <param argument="RECESSIVEALLELES" type="select" label="Data file contains dominant markers (eg AFLPs) and a row to indicate which alleles are recessive" >
                    <option value="0" selected="True">No</option>
                    <option value="1">Yes</option>
                </param>
                <when value="0">
                    <param argument="NOTAMBIGUOUS" value="-999" type="hidden" label="Defines the code indicating that genotype data at a marker are unambiguous." help="For use with polyploids when RECESSIVEALLELES=1/True. Must not match MISSING or any allele value in the data." />
                </when>
                <when value="1">
                    <param argument="NOTAMBIGUOUS" value="-999" type="integer" label="Defines the code indicating that genotype data at a marker are unambiguous." help="For use with polyploids when RECESSIVEALLELES=1/True. Must not match MISSING or any allele value in the data." />
                </when>
            </conditional>
            <param argument="MAPDISTANCES" checked="false" type="boolean" label="Data file contains row of map distances between loci" truevalue="1" falsevalue="0" />


            <!--Advanced data file options-->

            <param argument="PHASED" checked="false" type="boolean" label="Data are in correct phase (relevant for linkage model only)" truevalue="1" falsevalue="0" />
            <param argument="PHASEINFO" checked="false" type="boolean" label="The data for each individual contains a line indicating phase (linkage model)" truevalue="1" falsevalue="0" />
            <param argument="MARKOVPHASE" checked="false" type="boolean" label="The phase info follows a Markov model." truevalue="1" falsevalue="0" />
        </section>
        <section name="extra" title="extraparams" expanded="False">

            <param argument="NOADMIX" checked="false" type="boolean" label="Use no admixture model" help="(0/False=admixture model, 1/True=no-admix)" truevalue="1" falsevalue="0" />
            <param argument="LINKAGE" checked="false" type="boolean" label="Use the linkage model model" truevalue="1" falsevalue="0" />
            <conditional name="usepopinfo_cond">
                <param argument="USEPOPINFO" type="select" label="Use prior population information to pre-assign individuals to clusters">
                    <option value="0" selected="True">No</option>
                    <option value="1">Yes</option>
                </param>
                <when value="0">
                    <param argument="POPFLAG" value="0" type="hidden" label="Input file contains a flag which says whether to use popinfo" help="[mainparams] when USEPOPINFO is 1/True" />
                    <param argument="MIGRPRIOR" value="0.01" type="hidden" label="Prior prob that an individual is a migrant" help="(used only when USEPOPINFO==1/True). This should be small, eg 0.01 or 0.1." />
                </when>
                <when value="1">
                    <param argument="POPFLAG" checked="false" type="boolean" label="Input file contains a flag which says whether to use popinfo" help="[mainparams] when USEPOPINFO is 1/True" truevalue="1" falsevalue="0" />
                    <param argument="MIGRPRIOR" value="0.01" type="float" label="Prior prob that an individual is a migrant" help="(used only when USEPOPINFO==1/True). This should be small, eg 0.01 or 0.1." />
                </when>
            </conditional>
            <param argument="LOCPRIOR" checked="false" type="boolean" label="Use location information to improve weak data" truevalue="1" falsevalue="0" />

            <param argument="FREQSCORR" checked="true" type="boolean" label="Allele frequencies are correlated among pops" truevalue="1" falsevalue="0" />
            <param argument="ONEFST" checked="false" type="boolean" label="Assume same value of Fst for all subpopulations" truevalue="1" falsevalue="0" />

            <conditional name="inferalpha_cond">
                <param argument="INFERALPHA" type="select" label="Infer ALPHA (the admixture parameter)">
                    <option value="1" selected="True">Yes</option>
                    <option value="0">No</option>
                </param>
                <when value="1">
                    <param argument="ALPHA" value="1.0" type="float" label="Dirichlet parameter for degree of admixture" help="this is the initial value if INFERALPHA is 1/True." />
                </when>
                <when value="0">
                    <param argument="ALPHA" value="1.0" type="hidden" label="Dirichlet parameter for degree of admixture" help="this is the initial value if INFERALPHA is 1/True." />
                </when>
            </conditional>
            <param argument="POPALPHAS" checked="false" type="boolean" label="Individual alpha for each population" truevalue="1" falsevalue="0" />

            <conditional name="inferlambda_cond">
                <param argument="INFERLAMBDA" type="select" label="Infer LAMBDA (the allele frequencies parameter)">
                    <option value="0" selected="True">No</option>
                    <option value="1">Yes</option>
                </param>
                <when value="0">
                    <param argument="POPSPECIFICLAMBDA" value="0" type="hidden" label="Infer a separate lambda for each pop" help="(only if INFERLAMBDA=1/True)." />
                </when>
                <when value="1">
                    <param argument="POPSPECIFICLAMBDA" checked="false" type="boolean" label="Infer a separate lambda for each pop" help="(only if INFERLAMBDA=1/True)." truevalue="1" falsevalue="0" />
                </when>
            </conditional>
            <param argument="LAMBDA" value="1.0" type="float" label="Dirichlet parameter for allele frequencies" />


            <!-- PRIORS -->

            <param argument="FPRIORMEAN" value="0.01" type="float" label="The Prior (Gamma distribution) mean of Fst for pops." />
            <param argument="FPRIORSD" value="0.05" type="float" label="The Prior (Gamma distribution) Standard Deviation of Fst for pops." />

            <conditional name="unifprioralpha_cond">
                <param argument="UNIFPRIORALPHA" type="select" label="Use a uniform prior for alpha; otherwise gamma prior">
                    <option value="1" selected="True">Yes</option>
                    <option value="0">No</option>
                </param>
                <when value="1">
                    <param argument="ALPHAPRIORA" value="1.0" type="hidden" label="Alpha has a gamma prior with mean A*B, and variance A*B^2." help="(only if UNIFPRIORALPHA==0/False)" />
                    <param argument="ALPHAPRIORB" value="2.0" type="hidden" label="Alpha has a gamma prior with mean A*B, and variance A*B^2." help="(only if UNIFPRIORALPHA==0/False)" />
                </when>
                <when value="0">
                    <param argument="ALPHAPRIORA" value="1.0" type="float" label="Alpha has a gamma prior with mean A*B, and variance A*B^2." help="(only if UNIFPRIORALPHA==0/False)"/>
                    <param argument="ALPHAPRIORB" value="2.0" type="float" label="Alpha has a gamma prior with mean A*B, and variance A*B^2." help="(only if UNIFPRIORALPHA==0/False)"/>
                </when>
            </conditional>
            <param argument="ALPHAMAX" value="10.0" type="float" label="Max value of alpha if uniform prior" />


            <param argument="LOG10RMIN" value="-4.0" type="float" label="Log10 of minimum allowed value of r under linkage model" />
            <param argument="LOG10RMAX" value="1.0" type="float" label="Log10 of maximum allowed value of r" />
            <param argument="LOG10RPROPSD" value="0.1" type="float" label="Standard deviation of log r in update" />
            <param argument="LOG10RSTART" value="-2.0" type="float" label="Initial value of log10 r" />


            <!-- USING PRIOR POPULATION INFO (USEPOPINFO) -->

            <param argument="GENSBACK" value="2" type="integer" label="For use when inferring whether an individual is an immigrant, or has an immigrant an cestor in the past GENSBACK generations." help="eg, if GENSBACK==2, it tests for immigrant ancestry back to grandparents." />
            <param argument="PFROMPOPFLAGONLY" checked="false" type="boolean" label="Only use individuals with POPFLAG=1 to update P." help="This is to enable use of a reference set of individuals for clustering additional 'test' individuals." truevalue="1" falsevalue="0" />

            <!-- LOCPRIOR MODEL FOR USING LOCATION INFORMATION -->

            <param argument="LOCISPOP" checked="true" type="boolean" label="Use POPDATA for location information" truevalue="1" falsevalue="0" />
            <param argument="LOCPRIORINIT" value="1.0" type="float" label="Initial value for r, the location prior" />
            <param argument="MAXLOCPRIOR" value="20.0" type="float" label="Max allowed value for r" />

            <!-- OUTPUT OPTIONS -->

            <param argument="PRINTNET" checked="true" type="boolean" label="Print the 'net nucleotide distance' to screen during the run" truevalue="1" falsevalue="0" />
            <param argument="PRINTLAMBDA" checked="true" type="boolean" label="Print current value(s) of lambda to screen" truevalue="1" falsevalue="0" />
            <param argument="PRINTQSUM" checked="true" type="boolean" label="Print summary of current population membership to screen" truevalue="1" falsevalue="0" />

            <param argument="SITEBYSITE" checked="false" type="boolean" label="whether or not to print site by site results." help="(Linkage model only) This is a large file!" truevalue="1" falsevalue="0" />
            <param argument="PRINTQHAT" checked="false" type="boolean" label="Q-hat printed to a separate file." help="Turn this on before using STRAT." truevalue="1" falsevalue="0" />
            <param argument="UPDATEFREQ" value="100" type="integer" label="Frequency of printing update on the screen." help="Set automatically if this is 0/False." />
            <param argument="PRINTLIKES" checked="false" type="boolean" label="Print current likelihood to screen every rep" truevalue="1" falsevalue="0" />
            <param argument="INTERMEDSAVE" value="0" type="integer" label="Number of saves to file during run" />

            <param argument="ECHODATA" checked="false" type="boolean" label="Print some of data file to screen to check that the data entry is correct." help="(NEXT 3 ARE FOR COLLECTING DISTRIBUTION OF Q:)" truevalue="1" falsevalue="0" />
            <param argument="ANCESTDIST" checked="false" type="boolean" label="Collect data about the distribution of ancestry coefficients (Q) for each individual" truevalue="1" falsevalue="0" />
            <param argument="NUMBOXES" value="1000" type="integer" label="The distribution of Q values is stored as a histogram with this number of boxes." />
            <param argument="ANCESTPINT" value="0.90" type="float" label="The size of the displayed probability interval on Q (values between 0.0--1.0)" />


            <!-- MISCELLANEOUS -->

            <param argument="COMPUTEPROB" checked="true" type="boolean" label="Estimate the probability of the Data under the model." help="This is used when choosing the best number of subpopulations." truevalue="1" falsevalue="0" />
            <param argument="ADMBURNIN" value="500" type="integer" label="Initial period of burnin with admixture model" help="[only relevant for linkage model] see Documentation" />
            <param argument="ALPHAPROPSD" value="0.025" type="float" label="SD of proposal for updating alpha" />
            <param argument="STARTATPOPINFO" checked="false" type="boolean" label="Use given populations as the initial condition for population origins." help="(Need POPDATA==1). It is assumed that the PopData in the input file are between 1 and k where k is less or equal MAXPOPS." truevalue="1" falsevalue="0" />
            <conditional name="randomize_cond">
                <param argument="RANDOMIZE" type="select" label="=use new random seed for each run">
                    <option value="1" selected="True">Yes</option>
                    <option value="0">No</option>
                </param>
                <when value="1">
                    <param argument="SEED" value="2245" type="hidden" label="Seed value for random number generator" help="(must set RANDOMIZE=0)" />
                </when>
                <when value="0">
                    <param argument="SEED" value="2245" type="integer" label="seed value for random number generator" help="(must set RANDOMIZE=0)" />
                </when>
            </conditional>
            <param argument="METROFREQ" value="10" type="integer" label="Frequency of using Metropolis step to update Q under admixture model" help="(ie use the metr. move every i steps).  If this is set to 0, it is never used. (Proposal for each q^(i) sampled from prior. The goal is to improve mixing for small alpha.)" />
            <param argument="REPORTHITRATE" checked="false" type="boolean" label="Report hit rate if using METROFREQ" truevalue="1" falsevalue="0" />
        </section>
    </inputs>
    <outputs>
        <data name="out_mainparams" format="txt" label="run_K_${main.MAXPOPS}.mainparams" />
        <data name="out_extraparams" format="txt" label="run_K_${main.MAXPOPS}.extraparams" />
        <collection name="out" type="list" label="run_K_${main.MAXPOPS}.out">
            <discover_datasets pattern="__name__" format="txt" directory="out" />
        </collection>
        <collection name="log" type="list" label="run_K_${main.MAXPOPS}.log">
            <discover_datasets pattern="__name__" format="txt" directory="log" />
        </collection>
    </outputs>
    <tests>
        <test>
            <!-- https://web.stanford.edu/group/pritchardlab/structure_software/release_versions/v2.3.4/html/structure-data.html -->
            <param name="infile" value="testdata1" />
            <param name="nb_run" value="2" />
            <section name="main">
                <param name="NUMINDS" value="200" />
                <param name="MAXPOPS" value="2" />
                <param name="LABEL" value="1" />
                <param name="POPDATA" value="1" />
                <param name="NUMLOCI" value="5" />
                <param name="LOCDATA" value="1" />
                <param name="PLOIDY" value="2" />
                <param name="MISSING" value="-999" />
                <param name="ONEROWPERIND" value="0" />
                <param name="MARKERNAMES" value="0" />
            </section>
            <section name="extra">
                <conditional name="randomize_cond">
                    <param name="RANDOMIZE" value="0" />
                </conditional>
            </section>
            <output_collection name="out" type="list">
                <element name="K_2_run_1_f" value="testdata1_f"  lines_diff="6" />
                <element name="K_2_run_2_f" value="testdata1_f"  lines_diff="6" />
            </output_collection>
            <output_collection name="log" type="list">
                <element name="K_2_run_1_f.log">
                    <assert_contents>
                        <has_line line="Final results printed to file outfile_f" />
                    </assert_contents>
                </element>
                <element name="K_2_run_2_f.log">
                    <assert_contents>
                        <has_line line="Final results printed to file outfile_f" />
                    </assert_contents>
                </element>
            </output_collection>
        </test>
    </tests>
    <help><![CDATA[
**Introduction**

The program structure_ implements a model-based clustering method for inferring population structure
using genotype data consisting of unlinked markers. The method was introduced in a paper
by Pritchard, Stephens and Donnelly (2000a) and extended in sequels by Falush, Stephens and
Pritchard (2003a, 2007). Applications of our method include demonstrating the presence of population
structure, identifying distinct genetic populations, assigning individuals to populations, and
identifying migrants and admixed individuals.

Briefly, we assume a model in which there are K populations (where K may be unknown),
each of which is characterized by a set of allele frequencies at each locus. Individuals in the
sample are assigned (probabilistically) to populations, or jointly to two or more populations if their
genotypes indicate that they are admixed. It is assumed that within populations, the loci are at
Hardy-Weinberg equilibrium, and linkage equilibrium. Loosely speaking, individuals are assigned
to populations in such a way as to achieve this.

Our model does not assume a particular mutation process, and it can be applied to most of the
commonly used genetic markers including microsatellites, SNPs and RFLPs. The model assumes
that markers are not in linkage disequilibrium (LD) within subpopulations, so we can’t handle
markers that are extremely close together. Starting with version 2.0, we can now deal with weakly
linked markers.

While the computational approaches implemented here are fairly powerful, some care is needed
in running the program in order to ensure sensible answers. For example, it is not possible to
determine suitable run-lengths theoretically, and this requires some experimentation on the part of
the user. This document describes the use and interpretation of the software and supplements the
published papers, which provide more formal descriptions and evaluations of the methods.

.. _structure: https://web.stanford.edu/group/pritchardlab/structure.html

**Documentation**

Please see the full Sructure documentation_

.. _documentation: https://web.stanford.edu/group/pritchardlab/structure_software/release_versions/v2.3.4/structure_doc.pdf

**Upstream**

Inputs can be produced from:

- Microsatellite analysis
- RADSeq analysis (eg: using populations_ from Stacks suite)

.. _populations: http://catchenlab.life.illinois.edu/stacks/manual/#export

**Input**

=======  ===  =====  =====  =====  =====  =====
              loc_a  loc_b  loc_c  loc_d  loc_e
=======  ===  =====  =====  =====  =====  =====
George   1    -9     145    66     0      92
George   1    -9     -9     64     0      94
Paula    1    106    142    68     1      92
Paula    1    106    148    64     0      94
Matthew  2    110    145    -9     0      92
Matthew  2    110    148    66     1      -9
Bob      2    108    142    64     1      94
Bob      2    -9     142    -9     0      94
Anja     1    112    142    -9     1      -9
Anja     1    114    142    66     1      94
Peter    1    -9     145    66     0      -9
Peter    1    110    145    -9     1      -9
Carsten  2    108    145    62     0      -9
Carsten  2    110    145    64     1      92
=======  ===  =====  =====  =====  =====  =====

You will find other sample data sets: here_

.. _here: https://web.stanford.edu/group/pritchardlab/structure_software/release_versions/v2.3.4/html/structure-data.html

**Downstream**

- Clumpp_
- Distruct_
- Structure-harvester_

.. _Clumpp: https://rosenberglab.stanford.edu/clumpp.html
.. _Distruct: https://rosenberglab.stanford.edu/distruct.html
.. _Structure-harvester: http://taylor0.biology.ucla.edu/structureHarvester/

    ]]></help>
    <citations>
        <citation type="doi">10.1111/j.1471-8286.2007.01758.x</citation>
        <citation type="doi">10.1111/j.1755-0998.2009.02591.x</citation>
    </citations>
</tool>
author	iuc
date	Sun, 20 Mar 2022 10:57:34 +0000
parents	0080ad37f8a0
children