view prepare_population_structure.xml @ 21:d6b961721037

Miller Lab Devshed version 4c04e35b18f6
author Richard Burhans <burhans@bx.psu.edu>
date Mon, 05 Nov 2012 12:44:17 -0500
parents f04f40a36cc8
children 248b06e86022
line wrap: on
line source

<tool id="gd_prepare_population_structure" name="Prepare Input" version="1.0.0">
  <description>: Filter and convert to the format needed for these tools</description>

  <command interpreter="python">
    prepare_population_structure.py "$input" "$min_reads" "$min_qual" "$min_spacing" "$output" "$output.files_path"
    #if $individuals.choice == '0'
        "all_individuals"
    #else if $individuals.choice == '1'
        #for $population in $individuals.populations
          #set $pop_arg = 'population:%s:%s' % (str($population.p_input), str($population.p_input.name))
          "$pop_arg"
        #end for
    #end if
    #for $individual, $individual_col in zip($input.dataset.metadata.individual_names, $input.dataset.metadata.individual_columns)
        #set $arg = 'individual:%s:%s' % ($individual_col, $individual)
        "$arg"
    #end for
  </command>

  <inputs>
    <param name="input" type="data" format="gd_snp" label="SNP dataset" />
    <conditional name="individuals">
      <param name="choice" type="select" label="Individuals">
        <option value="0" selected="true">All individuals</option>
        <option value="1">Specified populations</option>
      </param>
      <when value="0" />
      <when value="1">
        <repeat name="populations" title="Population" min="1">
          <param name="p_input" type="data" format="gd_indivs" label="Individuals" />
        </repeat>
      </when>
    </conditional>
    <param name="min_reads" type="integer" min="0" value="0" label="Minimum SNP coverage" />
    <param name="min_qual" type="integer" min="0" value="0" label="Minimum SNP quality" />
    <param name="min_spacing" type="integer" min="0" value="0" label="Minimum spacing between SNPs" />
  </inputs>

  <outputs>
    <data name="output" format="gd_ped">
      <actions>
        <action type="metadata" name="base_name" default="admix" />
      </actions>
    </data>
  </outputs>

  <tests>
    <test>
      <param name="input" value="test_in/sample.gd_snp" ftype="gd_snp" />
      <param name="min_reads" value="3" />
      <param name="min_qual" value="30" />
      <param name="min_spacing" value="0" />
      <param name="choice" value="0" />
      <output name="output" file="test_out/prepare_population_structure/prepare_population_structure.html" ftype="html" compare="diff" lines_diff="2">
        <extra_files type="file" name="admix.map" value="test_out/prepare_population_structure/admix.map" />
        <extra_files type="file" name="admix.ped" value="test_out/prepare_population_structure/admix.ped" />
      </output>
    </test>
  </tests>

  <help>

**Dataset formats**

The input datasets are in gd_snp_ and gd_indivs_ formats.
The output dataset is in gd_ped_ format.  (`Dataset missing?`_)

.. _gd_snp: ./static/formatHelp.html#gd_snp
.. _gd_indivs: ./static/formatHelp.html#gd_indivs
.. _gd_ped: ./static/formatHelp.html#gd_ped
.. _Dataset missing?: ./static/formatHelp.html

-----

**What it does**

This tool converts a gd_snp dataset into the format needed for estimating
the population structure.  You can select the individuals to be included,
by using "population" datasets created via the Specify Individuals tool.
(It is important for these population datasets to have distinguishable names,
since they will be stored in the output's metadata so that subsequent tools
can use them as labels.  If necessary, rename the datasets to give them
distinct and meaningful names before running this tool.)

You can also filter the SNPs, based on criteria such as minimum coverage
(a qualifying SNP must have at least this many reads for every included
individual), minimum quality score (for every included individual), and/or
minimum spacing (SNPs that are too close together on the same chromosome or
scaffold are discarded).  In addition to producing the filtered and formatted
.map and .ped files for subsequent analysis, the tool reports the number of
SNPs meeting these conditions, which can be seen by clicking on the eye icon
in the history panel after the program runs.

-----

**Example**

- input::

    Contig161_chr1_4641264_4641879   115  C  T  73.5   chr1   4641382  C   6  0  2  45   8  0  2  51   15  0  2  72   5  0  2  42   6  0  2  45  10  0  2  57   Y  54  0.323  0
    Contig48_chr1_10150253_10151311   11  A  G  94.3   chr1  10150264  A   1  0  2  30   1  0  2  30    1  0  2  30   3  0  2  36   1  0  2  30   1  0  2  30   Y  22  +99.   0
    Contig20_chr1_21313469_21313570   66  C  T  54.0   chr1  21313534  C   4  0  2  39   4  0  2  39    5  0  2  42   4  0  2  39   4  0  2  39   5  0  2  42   N   1  +99.   0
    etc.

- output cover page::

    Prepare to look for population structure Galaxy Composite Dataset
    Output completed: 2012-10-01 04:09:36 PM

    Outputs
        * admix.ped (link)
        * admix.map (link)
        * Using 222 of 400 SNPs

    Inputs
        * Minimum reads covering a SNP, per individual: 6
        * Minimum quality value, per individual: 0
        * Minimum spacing between SNPs on the same scaffold: 0

    Populations
        * Pop. A
             1. PB1
             2. PB2
        * Pop. B
             1. PB3
             2. PB4
        * Pop. C
             1. PB6
             2. PB8

  </help>
</tool>