
MUMmer MaxMatch (version 0.9.alx)
Algorithms are run with default parameters (no additional arguments). For tool-specific arguments, see the command-line options for each tool below.

Reference

Please do not use any of the command-line options that modify prefixes or file names. They are useless within Galaxy and are likely to make the run fail!

If you find these tools/wrappers useful in your research, please acknowledge our work. If you improve or modify the wrappers, please add yourself to the acknowledgement section rather than replacing the existing names :)

MUMmer Maximal exact matching

The heart of the MUMmer package is its suffix-tree-based maximal matching routines. These can be used for repeat detection within a single sequence, as is done by repeat-match and exact-tandems, or for the alignment of two or more sequences, as is done by mummer.

Mummer

mummer is a suffix tree algorithm designed to find maximal exact matches of some minimum length between two input sequences. By default, mummer only reports maximal matches that are unique in the entire set of reference sequences. The match lists produced by mummer can be used on their own to generate alignment dot plots, or passed on to the clustering algorithms to identify longer, non-exact regions of conservation. Because these match lists contain a great deal of information, they are versatile and can be passed on to other interpretation programs for clustering, analysis, searching, etc.
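To make "maximal match unique in the reference" concrete, here is a brute-force Python sketch. It is not how mummer works: mummer uses a suffix tree, while this loop compares every pair of positions, so it is only practical on toy inputs. It assumes the default behaviour described above (the match must be unique in the reference, not necessarily in the query), searches the forward strand only, and the function names naive_mums and count_occurrences are hypothetical.

```python
def count_occurrences(text: str, pattern: str) -> int:
    """Count possibly overlapping occurrences of pattern in text."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def naive_mums(ref: str, query: str, min_len: int) -> list[tuple[int, int, int]]:
    """Brute-force forward-strand maximal matches that are unique in ref.

    Returns (ref_start, query_start, length) tuples with 1-based starts.
    Illustration only: mummer itself uses a suffix tree, not this double loop.
    """
    matches = []
    for i in range(len(ref)):
        for j in range(len(query)):
            if ref[i] != query[j]:
                continue
            # Left-maximal: skip pairs that could be extended one base to the left.
            if i > 0 and j > 0 and ref[i - 1] == query[j - 1]:
                continue
            # Right-maximal: extend until a mismatch or the end of either sequence.
            length = 0
            while (i + length < len(ref) and j + length < len(query)
                   and ref[i + length] == query[j + length]):
                length += 1
            if length < min_len:
                continue
            # Keep only matches whose sequence occurs exactly once in the reference.
            if count_occurrences(ref, ref[i:i + length]) == 1:
                matches.append((i + 1, j + 1, length))
    return matches


print(naive_mums("GATTACAT", "CATTACG", 4))  # [(2, 2, 5)]
```

The match "ATTAC" starts at position 2 in both toy sequences and cannot be extended in either direction. A match that appears twice in the reference, such as "ACGT" in naive_mums("ACGTTACGT", "ACGT", 4), is discarded.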

Repeat-match

repeat-match is a suffix tree algorithm designed to find maximal exact repeats within a single input sequence. It uses a similar algorithm to mummer, but altered slightly to find maximal exact matches within a single sequence.
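The "slight alteration" for a single sequence amounts to comparing the sequence against itself and keeping only pairs with the first copy before the second. The Python sketch below shows that idea by brute force. It is an illustration under stated assumptions, not repeat-match's suffix-tree implementation: it covers forward repeats only (no reverse complements), and the function name naive_forward_repeats is hypothetical.

```python
def naive_forward_repeats(seq: str, min_len: int) -> list[tuple[int, int, int]]:
    """Brute-force maximal exact forward repeats within one sequence.

    Returns (start1, start2, length) with 1-based starts and start1 < start2.
    Reverse-complement repeats are not searched in this sketch.
    """
    repeats = []
    n = len(seq)
    for i in range(n):
        for j in range(i + 1, n):
            if seq[i] != seq[j]:
                continue
            # Left-maximal: skip pairs that could be extended one character left.
            if i > 0 and seq[i - 1] == seq[j - 1]:
                continue
            # Extend right; the two copies may overlap, as in tandem repeats.
            length = 0
            while j + length < n and seq[i + length] == seq[j + length]:
                length += 1
            if length >= min_len:
                repeats.append((i + 1, j + 1, length))
    return repeats


print(naive_forward_repeats("ACGTTACGTA", 4))  # [(1, 6, 4)]
print(naive_forward_repeats("ACGACGACG", 3))   # [(1, 4, 6), (1, 7, 3)]
```

The second call shows overlapping copies: positions 1 and 4 share six characters even though they are only three apart, which is the signature of a tandem repeat.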

Output formatting varies depending on the command line parameters and the output can be quite large. The standard output format that results from running repeat-match with default parameters is as follows:

Long Exact Matches:
   Start1     Start2    Length
  4919485    4919506r       22

The three columns are the first position of the repeat, the second position of the repeat, and the length of the repeat respectively. Reverse complement repeat positions are denoted by an 'r' following the Start2 position, and are relative to the forward strand of the sequence.
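Because the output can be large, it is often post-processed. A minimal Python parser for the default three-column format above might look like this; it assumes data rows always have exactly three whitespace-separated fields, and the names Repeat and parse_repeat_match are hypothetical.

```python
from typing import NamedTuple


class Repeat(NamedTuple):
    start1: int
    start2: int
    length: int
    reverse: bool  # True when Start2 carried the 'r' (reverse-complement) suffix


def parse_repeat_match(text: str) -> list[Repeat]:
    """Parse the default three-column repeat-match output."""
    repeats = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have three fields and start with a number; header lines do not.
        if len(fields) != 3 or not fields[0].isdigit():
            continue
        start2 = fields[1]
        repeats.append(Repeat(
            start1=int(fields[0]),
            start2=int(start2.rstrip("r")),
            length=int(fields[2]),
            reverse=start2.endswith("r"),
        ))
    return repeats


example = """Long Exact Matches:
   Start1     Start2    Length
  4919485    4919506r       22
"""
print(parse_repeat_match(example))
# [Repeat(start1=4919485, start2=4919506, length=22, reverse=True)]
```

Stripping the 'r' and keeping it as a separate boolean keeps Start2 numeric, so rows can be sorted or filtered by position regardless of orientation.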

Exact-tandems

exact-tandems is a wrapper script for the repeat-match program that lists the exact tandem repeats within a single input sequence. As with repeat-match, the sequence file should contain only one sequence in FastA format; if it contains several, only the first one is used. The sequence may contain any mix of upper- and lowercase characters, so both DNA and protein sequences are allowed, and matching is case-insensitive. The minimum match length parameter must be a positive integer; it is passed to repeat-match via the -n option.
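The input rules above (first FastA record only, case-insensitive matching) can be mimicked when preparing or checking an input file. The following Python sketch is a hypothetical helper, first_fasta_sequence, not code taken from exact-tandems or repeat-match; upper-casing is simply one way to make later comparisons case-insensitive.

```python
def first_fasta_sequence(text: str) -> str:
    """Return the first FastA record's sequence, upper-cased for case-insensitive matching."""
    seq_lines = []
    in_record = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if in_record:
                break  # a second header: every later record is ignored
            in_record = True
            continue
        if in_record and line:
            seq_lines.append(line)
    return "".join(seq_lines).upper()


fasta = ">chr1\nacgT\nAC\n>chr2\nGGGG\n"
print(first_fasta_sequence(fasta))  # ACGTAC
```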

The output format of exact-tandems is as follows:

Finding matches
Tandem repeats
   Start   Extent  UnitLen     Copies
  416173      150       45        3.3

The four columns are the first position of the tandem, the extent of the repeat region, the length of each tandem repeat unit, and the number of repeat units respectively.
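The example row is consistent with Copies = Extent / UnitLen (150 / 45 ≈ 3.3). One way to obtain such a row, offered here as an assumption rather than a description of the exact-tandems script, is to treat a forward repeat whose two copies touch or overlap as a tandem: the unit length is the distance between the copies, and the extent runs from the start of the first copy to the end of the second. The Python sketch below uses the hypothetical names Tandem and tandem_from_repeat, and the repeat-match hit (416173, 416218, 105) is likewise hypothetical, chosen only to reproduce the example row.

```python
from typing import NamedTuple, Optional


class Tandem(NamedTuple):
    start: int
    extent: int
    unit_len: int
    copies: float


def tandem_from_repeat(start1: int, start2: int, length: int) -> Optional[Tandem]:
    """Interpret a forward exact repeat as a tandem if its two copies touch or overlap."""
    unit_len = start2 - start1
    # Copies farther apart than the repeat length are separated by other sequence.
    if unit_len <= 0 or unit_len > length:
        return None
    extent = length + unit_len  # from the first copy's start to the second copy's end
    return Tandem(start1, extent, unit_len, round(extent / unit_len, 1))


print(tandem_from_repeat(416173, 416218, 105))
# Tandem(start=416173, extent=150, unit_len=45, copies=3.3)
```

A fractional copy number such as 3.3 simply means the last unit is incomplete: three full 45-character units plus 15 extra characters.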

Manuals and command-line options (specific to each tool):

Mummer

http://mummer.sourceforge.net/manual/#mummer

Repeat-match

http://mummer.sourceforge.net/manual/#repeat

Exact-tandems

http://mummer.sourceforge.net/manual/#exact