view Scan_IUPAC_output_each_match.xml @ 3:856008c4a5f3 draft default tip

Version 1.0.2 (updates bioperl to 1.7.2)
author pjbriggs
date Fri, 05 Oct 2018 05:33:31 -0400
parents b67ea47730d3

<?xml version="1.0" encoding="utf-8"?>
<tool id="fasta_scan_iupac_each" name="IUPAC scan and output each match" version="@VERSION@">
  <description>Returns all matches to a given IUPAC pattern in GFF format</description>
  <macros>
    <import>motif_tools_macros.xml</import>
  </macros>
  <expand macro="requirements" />
  <command><![CDATA[
  perl '$__tool_directory__/Scan_IUPAC_output_each_match.pl' '$iupac' '$fasta' '$output' '$label' '$strand'
  ]]></command>
  <inputs>
    <param name="iupac" type="text" label="IUPAC string" value="WGATAR" help="Enter an IUPAC string, e.g. WGATAR." size="20"/>
    <param format="fasta" name="fasta" type="data" label="FASTA file" help="Select a FASTA file containing the sequences to be scanned."/>
    <param name="label" type="text" label="Attribute in GFF output" value="IUPAC_or_name" help="The label is written to the final (attribute) field of each GFF line. This could be the IUPAC string used or the name of the motif." size="20"/>
    <param name="strand" type="select" label="Select sequence strands to scan" help="Scan either both strands or only the forward strand.">
      <option value="0">Scan both strands</option>
      <option value="1">Only scan forward strand</option>
    </param>
  </inputs>
  <outputs>
    <data format="gff" name="output" />
  </outputs>
  <tests>
    <test>
      <param name="iupac" value="WGATAR" />
      <param name="fasta" value="phix.fa" />
      <param name="label" value="IUPAC_or_name" />
      <param name="strand" value="0" />
      <output name="output" file="iupac_each_match.gff" />
    </test>
  </tests>

  <help>
.. class:: infomark

**What it does**

This tool finds all matches to a DNA pattern, given as an IUPAC string, in the input DNA sequences. Matches are non-overlapping, so searching for 'TTTT' in 'TTTTTTTT' finds two hits. The output is in GFF format, and the last ('attribute') field can be set using the 'Attribute in GFF output' option.

IUPAC = Nucleotide(s):

A = A
 
C = C

G = G

T = T

M = A/C

R = A/G

W = A/T

S = C/G

Y = C/T

K = G/T

V = A/C/G

H = A/C/T

D = A/G/T

B = C/G/T

N = A/C/G/T
 
----

.. class:: infomark

**Options**

'IUPAC string' - can be entered in upper or lower case, as the tool converts it to upper case. Only the IUPAC codes listed above are accepted.

'Attribute in GFF output' - sets the last ('attribute') field of each GFF line. The label should contain only letters and numbers, with no spaces.

'Select sequence strands to scan' - scanning only the forward strand of the input sequence is useful if the IUPAC pattern is palindromic (e.g. CANNTG): such a pattern is identical to its reverse complement, so the reverse strand yields no additional sites.

----

.. class:: infomark

**Credits**

This Galaxy tool has been developed within the Bioinformatics Core Facility at the University of Manchester. It runs the Scan_IUPAC_output_each_match.pl Perl script that was written by Ian Donaldson.

Please acknowledge both this Galaxy tool and Scan_IUPAC_output_each_match.pl if you use them.
  </help>

</tool>
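
The matching behaviour described in the help text (IUPAC codes expanded to sets of nucleotides, non-overlapping hits, optional scan of both strands) can be illustrated with a minimal Python sketch. This is not the code of Scan_IUPAC_output_each_match.pl, which is not shown here. The function names `iupac_to_regex`, `reverse_complement` and `scan` are hypothetical, and finding reverse-strand hits by searching the forward sequence with the reverse-complemented pattern is an assumption about how such a scan might be done.

```python
import re

# IUPAC code -> nucleotides, as listed in the tool help
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}

# Complement of each IUPAC code (e.g. R = A/G pairs with Y = C/T)
COMPLEMENT = str.maketrans("ACGTMRWSYKVHDBN", "TGCAKYWSRMBDHVN")


def iupac_to_regex(iupac):
    """Compile an IUPAC string into a regex with one character class per code."""
    iupac = iupac.upper()  # accept lower-case input, as the tool does
    unknown = sorted(set(iupac) - set(IUPAC))
    if unknown:
        raise ValueError("Not an IUPAC code: " + ", ".join(unknown))
    return re.compile("".join("[" + IUPAC[code] + "]" for code in iupac))


def reverse_complement(iupac):
    """Reverse-complement an IUPAC string, keeping ambiguity codes."""
    return iupac.upper().translate(COMPLEMENT)[::-1]


def scan(sequence, iupac, both_strands=True):
    """Return sorted (start, end, strand) hits with 1-based inclusive coordinates.

    re.finditer reports non-overlapping matches from left to right, which
    mirrors the non-overlapping behaviour described in the help text.
    Assumption: reverse-strand hits are found by searching the forward
    sequence with the reverse-complemented pattern.
    """
    sequence = sequence.upper()
    hits = [(m.start() + 1, m.end(), "+")
            for m in iupac_to_regex(iupac).finditer(sequence)]
    if both_strands:
        pattern = iupac_to_regex(reverse_complement(iupac))
        hits += [(m.start() + 1, m.end(), "-")
                 for m in pattern.finditer(sequence)]
    return sorted(hits)


if __name__ == "__main__":
    print(scan("TTTTTTTT", "TTTT", both_strands=False))
    print(reverse_complement("CANNTG"))
```

In this sketch a palindromic pattern such as CANNTG is its own reverse complement, so scanning both strands would report every site once per strand. That is the case in which the 'Only scan forward strand' option is useful.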