view tools/mytools/dreme.xml @ 1:cdcb0ce84a1b

author xuebing
date Fri, 09 Mar 2012 19:45:15 -0500
parents 9071e359b9a3

<tool id="dreme" name="DREME">
  <description>short motif discovery</description>
  <command interpreter="python">/Users/xuebing/bin/ -p $input -png -e $ethresh
    #if $background_select.bg_select == "fromfile":
        -n "${background_select.bgfile}"
    #end if
    &amp;&amp; mv dreme_out/dreme.html ${html_outfile}
    &amp;&amp; mv dreme_out/dreme.txt ${txt_outfile}
    &amp;&amp; mv dreme_out/dreme.xml ${xml_outfile}
    &amp;&amp; rm -rf dreme_out
  </command>
  <inputs>
    <param name="input" type="data" format="fasta" label="Sequence file (FASTA)"/>
    <conditional name="background_select">
      <param name="bg_select" type="select" label="Background sequence">
        <option value="shuffle" selected="true">shuffle the original sequence</option>
        <option value="fromfile">load from file</option>
      </param>
      <when value="fromfile">
        <param name="bgfile" type="data" format="fasta" label="Background sequence file (FASTA)"/>
      </when>
    </conditional>
    <param name="ethresh" size="10" type="float" value="0.05" label="E-value threshold"/>
  </inputs>

  <outputs>
    <data format="xml" name="xml_outfile" label="${} on ${on_string} (xml)"/>
    <data format="txt" name="txt_outfile" label="${} on ${on_string} (motif)"/>
    <data format="html" name="html_outfile" label="${} on ${on_string} (html)"/>
  </outputs>

**What it does**

DREME (Discriminative Regular Expression Motif Elicitation) quickly finds relatively short motifs (up to 8 bases). Given a negative set, it can also perform discriminative motif discovery: the negative set consists of sequences unlikely to contain the motif of interest, whereas the main ("positive") set is likely to contain it. If you do not provide a negative set, the program shuffles the positive set and uses the result as the background (in the role of the negative set).
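
As a rough illustration of the shuffled background, the sketch below permutes the letters within each positive sequence. The function name `shuffled_background` and the example sequences are hypothetical, and a plain per-sequence letter shuffle is an assumption: DREME's own shuffling procedure may differ.

```python
import random


def shuffled_background(sequences, seed=None):
    """Return one shuffled copy of each sequence.

    Assumption: letters are permuted independently within each sequence,
    so length and base composition are preserved but motifs are destroyed.
    """
    rng = random.Random(seed)
    background = []
    for seq in sequences:
        letters = list(seq)
        rng.shuffle(letters)
        background.append("".join(letters))
    return background


# Hypothetical positive set, for illustration only.
positive = ["ACGTACGTGG", "TTGACGTCAA"]
negative = shuffled_background(positive, seed=0)
```

Each background sequence keeps the length and base counts of the sequence it was derived from, so differences in motif counts between the two sets are not explained by composition alone.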

The input to DREME is one or two sets of DNA sequences. The program uses Fisher's exact test to determine the significance of each motif found in the positive set compared with its representation in the negative set, using a significance threshold that can be set on the command line (here, the E-value threshold, passed as -e).
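
To make the test concrete, here is a minimal sketch of a one-sided Fisher's exact test on sequence counts, written with the Python standard library only. The function name `fisher_exact_enrichment` and its arguments are hypothetical; this shows the textbook test, not DREME's internal code, and it does not reproduce how DREME turns p-values into E-values.

```python
from math import comb


def fisher_exact_enrichment(pos_hits, pos_total, neg_hits, neg_total):
    """One-sided Fisher's exact test p-value for motif enrichment.

    pos_hits / neg_hits: number of sequences containing the motif.
    pos_total / neg_total: number of sequences in each set.
    Returns the probability of seeing at least pos_hits motif-containing
    sequences in the positive set if hits were spread at random.
    """
    total = pos_total + neg_total
    hits = pos_hits + neg_hits
    denominator = comb(total, pos_total)
    p_value = 0.0
    # Sum the hypergeometric probabilities of every outcome that is
    # at least as extreme as the observed one.
    for k in range(pos_hits, min(hits, pos_total) + 1):
        p_value += comb(hits, k) * comb(total - hits, pos_total - k) / denominator
    return p_value


# Motif present in all 3 positive sequences and in none of 3 negatives.
print(fisher_exact_enrichment(3, 3, 0, 3))  # 1/20 = 0.05
```

A small p-value means the motif occurs in more positive sequences than chance placement of the hits would explain.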

DREME achieves its high speed by restricting its search to regular expressions based on the IUPAC alphabet representing bases and ambiguous characters, and by using a heuristic estimate of generalised motifs' statistical significance.
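
The restriction to IUPAC-based regular expressions can be sketched as follows: each IUPAC letter expands to a character class, so a motif becomes an ordinary regular expression. The helper names `iupac_to_regex` and `count_sequences_with_motif` and the example sequences are hypothetical, and this sketch scans only the given strand; it is an illustration of the idea rather than DREME's search procedure.

```python
import re

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Compile an IUPAC motif such as 'CAYGTG' into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()))


def count_sequences_with_motif(motif, sequences):
    """Count the sequences that contain at least one match to the motif."""
    pattern = iupac_to_regex(motif)
    return sum(1 for seq in sequences if pattern.search(seq.upper()))


# Hypothetical sequences: the first two contain CACGTG and CATGTG.
example = ["TTCACGTGAA", "GGCATGTGCC", "AAAAAAAA"]
print(count_sequences_with_motif("CAYGTG", example))  # 2
```

Counts obtained this way for the positive and negative sets are the kind of input a significance test on motif enrichment needs.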