<?xml version="1.0" encoding="utf-8"?>
<tool id="fasta_scan_iupac_each" name="IUPAC scan and output each match" version="@VERSION@">
  <description>Returns all matches to a given IUPAC string in GFF format</description>
  <macros>
    <import>motif_tools_macros.xml</import>
  </macros>
  <expand macro="requirements" />
  <command><![CDATA[
  perl $__tool_directory__/Scan_IUPAC_output_each_match.pl $iupac $fasta $output $label $strand
  ]]></command>
  <inputs>
    <param name="iupac" type="text" label="IUPAC string" value="e.g. WGATAR" help="Enter an IUPAC string." size="20"/>
    <param format="fasta" name="fasta" type="data" label="FASTA file" help="Select a FASTA file containing the sequences to be scanned."/>
    <param name="label" type="text" label="Attribute in GFF output" value="IUPAC_or_name" help="The label will be written to the final (attribute) field of each GFF line. This could be the IUPAC string used or the name of the motif." size="20"/>
    <param name="strand" type="select" label="Select sequence strands to scan" help="Scan either both strands or only the forward strand.">
      <option value="0">Scan both strands</option>
      <option value="1">Only scan forward strand</option>
    </param>
  </inputs>
  <outputs>
    <data format="gff" name="output" />
  </outputs>
  <tests>
    <test>
      <param name="iupac" value="WGATAR" />
      <param name="fasta" value="phix.fa" />
      <param name="label" value="IUPAC_or_name" />
      <param name="strand" value="0" />
      <output name="output" file="iupac_each_match.gff" />
    </test>
  </tests>

  <help>
.. class:: infomark

**What it does**

This tool finds all matches to a DNA pattern, given as an IUPAC string, in the input DNA sequences. Matches are non-overlapping, so searching for 'TTTT' in 'TTTTTTTT' finds two hits. The output is in GFF format, and the last ('attribute') field of each line can be set with the 'Attribute in GFF output' option.

IUPAC = Nucleotide(s):

A = A

C = C

G = G

T = T

M = A/C

R = A/G

W = A/T

S = C/G

Y = C/T

K = G/T

V = A/C/G

H = A/C/T

D = A/G/T

B = C/G/T

N = A/C/G/T
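
The non-overlapping matching described above can be illustrated with a minimal Python sketch. This is not the Perl script's implementation; it assumes a regular-expression approach with one character class per IUPAC code, and the function names (``iupac_to_regex``, ``scan_forward``) are hypothetical. It scans the forward strand only and reports 1-based inclusive coordinates, as GFF does.

```python
import re

# IUPAC code to nucleotide set, as in the table above
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}


def iupac_to_regex(iupac):
    """Build a regular expression with one character class per IUPAC code."""
    # A KeyError here signals a character outside the supported codes.
    return "".join("[" + IUPAC[code] + "]" for code in iupac.upper())


def scan_forward(iupac, sequence):
    """Return 1-based inclusive (start, end) pairs of non-overlapping hits."""
    pattern = re.compile(iupac_to_regex(iupac))
    # re.finditer resumes after the end of each hit, so hits never overlap.
    return [(m.start() + 1, m.end()) for m in pattern.finditer(sequence.upper())]


print(scan_forward("TTTT", "TTTTTTTT"))      # [(1, 4), (5, 8)]
print(scan_forward("WGATAR", "CCAGATAACC"))  # [(3, 8)]
```

Because each search resumes after the previous hit, 'TTTT' in 'TTTTTTTT' gives exactly two hits rather than five overlapping ones.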

----

.. class:: infomark

**Options**

'IUPAC string' - can be entered in upper or lower case, as the tool converts it to upper case. Only the IUPAC codes listed above are accepted.

'Attribute in GFF output' - sets the last ('attribute') field of each GFF line. It should contain only letters and numbers, with no spaces.

'Select sequence strands to scan' - scanning only the forward strand of the input sequence is useful if the IUPAC string is a palindrome (e.g. CANNTG), because such a motif matches at the same positions on both strands.
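
To check whether an IUPAC string is a palindrome, compare it with its reverse complement. The Python sketch below is illustrative only and is not taken from the Perl script; the ``COMPLEMENT`` table and the function names are assumptions introduced here, using the standard complements of the IUPAC codes listed above.

```python
# Complement of each IUPAC code (e.g. R = A/G pairs with Y = C/T)
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "M": "K", "K": "M", "R": "Y", "Y": "R", "W": "W", "S": "S",
    "V": "B", "B": "V", "H": "D", "D": "H", "N": "N",
}


def reverse_complement(iupac):
    """Return the reverse complement of an IUPAC string."""
    return "".join(COMPLEMENT[code] for code in reversed(iupac.upper()))


def is_palindrome(iupac):
    """True when the motif reads the same on both strands."""
    return iupac.upper() == reverse_complement(iupac)


print(reverse_complement("WGATAR"))  # YTATCW
print(is_palindrome("CANNTG"))       # True
print(is_palindrome("WGATAR"))       # False
```

For a motif such as WGATAR, which is not a palindrome, scanning both strands can report sites that a forward-only scan would miss.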

----

.. class:: infomark

**Credits**

This Galaxy tool has been developed within the Bioinformatics Core Facility at the University of Manchester. It runs the Scan_IUPAC_output_each_match.pl Perl script that was written by Ian Donaldson.

Please acknowledge both this Galaxy tool and Scan_IUPAC_output_each_match.pl if you use them.
  </help>

</tool>
author	pjbriggs
date	Fri, 05 Oct 2018 05:33:31 -0400
parents	b67ea47730d3
children