view extract_flanking_dna.xml @ 32:03c22b722882

remove BeautifulSoup dependency
author Richard Burhans <burhans@bx.psu.edu>
date Fri, 20 Sep 2013 13:54:23 -0400
parents d6b961721037
children
line wrap: on
line source

<tool id="gd_extract_flanking_dna" name="Flanking Sequence" version="1.0.0">
  <description>: Fetch DNA sequence for intervals surrounding the given SNPs</description>

  <command interpreter="python">
    extract_flanking_dna.py "--input=$input" "--output=$output" "--snps_loc=${GALAXY_DATA_INDEX_DIR}/gd.snps.loc"
    #if $override_metadata.choice == "0":
      "--scaffold_col=${input.metadata.scaffold}" "--pos_col=${input.metadata.pos}" "--species=${input.metadata.species}"
    #else
      "--scaffold_col=$scaf_col" "--pos_col=$pos_col" "--species=$species"
    #end if
    "--output_format=$output_format"
  </command>

  <inputs>
    <param format="tabular" name="input" type="data" label="SNP dataset"/>
    <param name="output_format" type="select" format="integer" label="Output format">
        <option value="fasta" selected="true">FastA format</option>
        <option value="primer3">Boulder-IO (for Primer3)</option>
    </param>
    <conditional name="override_metadata">
      <param name="choice" type="select" format="integer" label="Choose columns" help="Datasets in gd_snp format have the columns in the metadata, all others need the columns chosen." >
        <option value="0" selected="true">No, get columns from metadata</option>
        <option value="1" >Yes, choose columns</option>
      </param>
      <when value="0" />
      <when value="1">
        <param name="scaf_col" type="data_column" data_ref="input" numerical="false" label="Column with scaffold"/>
        <param name="pos_col" type="data_column" data_ref="input" numerical="true" label="Column with position"/>
        <param name="species" type="select" label="Choose species">
          <options from_file="gd.species.txt">
            <column name="name" index="1"/>
            <column name="value" index="0"/>
          </options>
        </param>
      </when>
    </conditional>
  </inputs>

  <outputs>
    <data format="txt" name="output"/>
  </outputs>

  <!-- Need snpcalls files from Webb before uncommenting
  <tests>
    <test>
      <param name="input" value="test_out/select_snps/select_snps.gd_snp" ftype="gd_snp" />
      <param name="output_format" value="primer3" />
      <param name="choice" value="0" />
      <output name="output" file="test_out/extract_flanking_dna/extract_flanking_dna.txt" />
    </test>
  </tests>
  -->

  <help>

**Dataset formats**

The input dataset is in tabular_ format and must contain a scaffold or 
chromosome column and a position column.  The output is in fasta_ format or
Boulder-IO_ format used by Primer3.
(`Dataset missing?`_)

.. _tabular: ./static/formatHelp.html#tab
.. _fasta: ./static/formatHelp.html#fasta
.. _Boulder-IO: ./static/formatHelp.html#boulder
.. _Dataset missing?: ./static/formatHelp.html

-----

**What it does**

This tool reports a DNA segment containing each SNP, with up to 200 nucleotides 
on either side of the SNP position, which is indicated by "n". Fewer nucleotides
are reported if the SNP is near an end of the assembled genome fragment.

-----

**Example**

- input (gd_snp format)::

    chr2_75111355_75112576    314  A  C  L  F  chr2   75111676  C  F  15  4  53   2   9  48   Y  96   0.369  0.355  0.396  0
    chr8_93901796_93905612   2471  A  C  A  A  chr8   93904264  A  A  8   0  51   10  2  14   Y  961  0.016  0.534  0.114  2
    chr10_7434473_7435447    524   T  C  S  S  chr10  7435005   T  S  11  5  90   14  0  69   Y  626  0.066  0.406  0.727  0
    chr14_80021455_80022064  138   G  A  H  H  chr14  80021593  G  H  14  0  69   9   6  124  Y  377  0.118  0.997  0.195  1
    chr15_64470252_64471048  89    G  A  Y  Y  chr15  64470341  G  Y  5   6  109  14  0  69   Y  312  0.247  0.998  0.393  0
    chr18_48070585_48071386  514   C  T  E  K  chr18  48071100  T  K  7   7  46   14  0  69   Y  2    0.200  0.032  0.163  0
    chr18_50154905_50155664  304   A  G  Y  C  chr18  50155208  A  Y  4   2  17   5   1  22   Y  8    0.022  0.996  0.128  0
    chr18_57379354_57380496  315   C  T  V  V  chr18  57379669  G  V  11  0  60   9   6  62   Y  726  0.118  0.048  0.014  1
    chr19_14240610_14242055  232   C  T  A  V  chr19  14240840  C  A  18  8  56   15  5  42   Y  73   0.003  0.153  0.835  0
    chr19_39866997_39874915  3117  C  T  P  P  chr19  39870110  C  P  3   7  65   14  2  32   Y  6    0.321  0.911  0.462  4
    etc.

- output (FastA format)::

    > chr2_75111355_75112576 314 A C
    TATCTTCATTTTTATTATAGACTCTCTGAACCAATTTGCCCTGAGGCAGACTTTTTAAAGTACTGTGTAATGTATGAAGTCCTTCTGCTCAAGCAAATCATTGGCATGAAAACAGTTGCAAACTTATTGTGAGAGAAGAGTCCAAGAGTTTTAACAGTCTGTAAGTATATAGCCTGTGAGTTTGATTTCCTTCTTGTTTTTnTTCCAGAAACATGATCAGGGGCAAGTTCTATTGGATATAGTCTTCAAGCATCTTGATTTGACTGAGCGTGACTATTTTGGTTTGCAGTTGACTGACGATTCCACTGATAACCCAGTAAGTTTAAGCTGTTGTCTTTCATTGTCATTGCAATTTTTCTGTCTTTATACTAGGTCCTTTCTGATTTACATTGTTCACTGATT
    > chr8_93901796_93905612 2471 A C
    GCTGCCGCTGGATTTACTTCTGCTTGGGTCGAGAGCGGGCTGGATGGGTGAAGAGTGGGCTCCCCGGCCCCTGACCAGGCAGGTGCAGACAAGTCGGAAGAAGGCCCGCCGCATCTCCTTGCTGGCCAGCGTGTAGATGACGGGGTTCATGGCAGAGTTGAGCACGGCCAGCACGATGAACCACTGGGCCTTGAACAGGATnGCGCACTCCTTCACCTTGCAGGCCACATCCACAAGGAAAAGGATGAAGAGTGGGGACCAGCAGGCGATGAACACGCTCACCACGATCACCACGGTCCGCAGCAGGGCCATGGACCGCTCTGAGTTGTGCGGGCTGGCCACCCTGCGGCTGCTGGACTTCACCAGGAAGTAGATGCGTGCGTACAGGATCACGATGGTCAC
    > chr10_7434473_7435447 524 T C
    ATTATTAACAGAAACATTTCTTTTTCATTACCCAGGGGTTACACTGGTCGTTGATGTTAATCAGTTTTTGGAGAAGGAGAAGCAAAGTGATATTTTGTCTGTTCTGAAGCCTGCCGTTGGTAATACAAATGACGTAATCCCTGAATGTGCTGACAGGTACCATGACGCCCTGGCAAAAGCAAAAGAGCAAAAATCTAGAAGnGGTAAGCATCTTCACTGTTTAGCACAAATTAAATAGCACTTTGAATATGATGATTTCTGTGGTATTGTGTTATCTTACTTTTGAGACAAATAATCGCTTTCAAATGAATATTTCTGAATGTTTGTCATCTCTGGCAAGGAAATTTTTTAGTGTTTCTTTTCCTTTTTTGTCTTTTGGAAATCTGTGATTAACTTGGTGGC
    > chr14_80021455_80022064 138 G A
    ACCCAGGGATCAAACCCAGGTCTCCCGCATTGCAGGCGGATTCTTTACTGTCTGAGCCTCCAGGGAAGCCCTCGGGGCTGAAGGGATGGTTATGAAGGTGAGAAACAGGGGCCACCTGTCCCCAAGGTACCTTGCGACnTGCCATCTGCGCTCCACCAGTAAATGGACGTCTTCGATCCTTCTGTTGTTGGCGTAGTGCAAACGTTTGGGAAGGTGCTGTTTCAAGTAAGGCTTAAAGTGCTGGTCTGGTTTTTTACACTGAAATATAAATGGACATTGGATTTTGCAATGGAGAGTCTTCTAGAAGAGTCCAAGACATTCTCTCCAGAAAGCTGAAGG
    > chr15_64470252_64471048 89 G A
    TGTGTGTGTGTGTGTGTGTGTGTGCCTGTGTCTGTACATGCACACCACGTGGCCTCACCCAGTGCCCTCAGCTCCATGGTGATGTCCACnTAGCCGTGCTCCGCGCTGTAGTACATGGCCTCCTGGAGGGCCTTGGTGCGCGTCCGGCTCAGGCGCATGGGCCCCTCGCTGCCGCTGCCCTGGCTGGATGCATCGCTCTCTTCCACGCCCTCAGCCAGGATCTCCTCCAGGGACAGCACATCTGCTTTGGCCTGCTGTGGCTGAGTCAGGAGCTTCCTCAGGACGTTCCT
    etc.

  </help>
</tool>