
hmmemit (version 0.1.0)
(-N)
(--seed)

What it does

The hmmemit program samples (emits) sequences from the profile HMM(s) in hmmfile, and writes them to output. Sampling sequences may be useful for a variety of purposes, including creating synthetic true positives for benchmarks or tests.

The default is to sample one unaligned sequence from the core probability model, which means that each sequence consists of one full-length domain. Alternatively, with the -c option, you can emit a simple majority-rule consensus sequence; or with the -a option, you can emit an alignment (in which case, you probably also want to set -N to something other than its default of 1 sequence per model).

As another option, with the -p option you can sample a sequence from a fully configured HMMER search profile. This means sampling a ‘homologous sequence’ by HMMER’s definition, including nonhomologous flanking sequences, local alignments, and multiple domains per sequence, depending on the length model and alignment mode chosen for the profile.

The hmmfile may contain a library of HMMs, in which case each HMM will be used in turn.
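The options described above can be combined into a command line. The following is a minimal sketch that only assembles the argument list and does not run HMMER. The flags -c, -a, -p, -N and --seed are the ones named in this help text; the function name build_hmmemit_command, the mode names, and the file name model.hmm are hypothetical and exist only for illustration. The sketch leaves out -N in consensus mode, on the assumption that a consensus is a single deterministic sequence per model.

```python
def build_hmmemit_command(hmmfile, mode="sample", n=1, seed=None):
    """Assemble an hmmemit argument list (hypothetical helper, not part of HMMER)."""
    mode_flags = {
        "sample": [],          # default: unaligned sequences from the core model
        "consensus": ["-c"],   # simple consensus sequence
        "alignment": ["-a"],   # emit an alignment
        "profile": ["-p"],     # sample from the configured search profile
    }
    if mode not in mode_flags:
        raise ValueError(f"unknown mode: {mode!r}")

    cmd = ["hmmemit"] + mode_flags[mode]
    # Assumption: -N only makes sense when sequences are sampled.
    if mode != "consensus" and n != 1:
        cmd += ["-N", str(n)]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    cmd.append(hmmfile)
    return cmd


if __name__ == "__main__":
    # model.hmm is a placeholder file name.
    print(" ".join(build_hmmemit_command("model.hmm", mode="alignment", n=10, seed=42)))
```

The returned list can be passed to subprocess.run(cmd, check=True) on a system where hmmemit is installed.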

Options

Output Formats

Several output formats are available, each with different options.

Fasta

The Fasta option is the easiest to understand: given an input model, it produces N sequences in FASTA format from that model.
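To illustrate the idea of sampling N sequences and writing them as FASTA, here is a deliberately simplified sketch. It assumes a toy model made only of per-position match emission probabilities and ignores the insert and delete states and the transition probabilities that a real profile HMM has, so it is not how hmmemit itself works. The function name sample_fasta, the record names, and the toy probabilities are hypothetical.

```python
import random


def sample_fasta(match_emissions, n=1, seed=None, name="toy_model"):
    """Sample n sequences from per-position emission probabilities.

    match_emissions: list of dicts mapping residue -> probability,
    one dict per match state (toy model, match states only).
    """
    rng = random.Random(seed)  # a fixed seed makes the output reproducible
    records = []
    for i in range(1, n + 1):
        residues = [
            rng.choices(list(probs.keys()), weights=list(probs.values()), k=1)[0]
            for probs in match_emissions
        ]
        records.append(f">{name}-sample{i}\n{''.join(residues)}")
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    toy = [
        {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
        {"A": 0.1, "C": 0.6, "G": 0.2, "T": 0.1},
        {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    ]
    print(sample_fasta(toy, n=3, seed=42), end="")
```

Reusing the same seed reproduces the same sequences, which is the purpose of a seed option when building benchmarks.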

Alignment

Produces a Stockholm alignment of the sequences that the Fasta output would have produced.

Majority-Rule Consensus Sequence

Emit a plurality-rule consensus sequence, instead of sampling a sequence from the profile HMM’s probability distribution. The consensus sequence is formed by selecting the maximum probability residue at each match state.
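The rule described above, taking the maximum probability residue at each match state, can be sketched in a few lines. The representation of the model as a list of residue-to-probability dictionaries, the function name consensus_sequence, and the toy values are assumptions made for illustration; HMMER stores and processes its models differently.

```python
def consensus_sequence(match_emissions):
    """Return the most probable residue at each match state.

    match_emissions: list of dicts mapping residue -> probability,
    one dict per match state (hypothetical toy representation).
    """
    return "".join(max(probs, key=probs.get) for probs in match_emissions)


if __name__ == "__main__":
    toy = [
        {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
        {"A": 0.1, "C": 0.6, "G": 0.2, "T": 0.1},
        {"A": 0.2, "C": 0.2, "G": 0.5, "T": 0.1},
    ]
    print(consensus_sequence(toy))
```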

Fancier Consensus Sequence

Emit a fancier plurality-rule consensus sequence than the -c option. If the maximum probability residue has p < minl, show it as a lower case ‘any’ residue (n or x); if p >= minl and p < minu, show it as a lower case residue; and if p >= minu, show it as an upper case residue. The default settings of minu and minl are both 0.0, which means -C gives the same output as -c unless you also set minu and minl to what you want.
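The thresholding rule above can be sketched as follows. As before, the toy representation of match states as residue-to-probability dictionaries, the function name fancy_consensus, and the any_residue parameter are assumptions for illustration only; the parameters minl and minu mirror the thresholds named in the text.

```python
def fancy_consensus(match_emissions, minl=0.0, minu=0.0, any_residue="x"):
    """Consensus with case encoding of confidence (hypothetical sketch).

    p <  minl          -> lower case 'any' residue (n or x)
    minl <= p < minu   -> lower case residue
    p >= minu          -> upper case residue
    """
    out = []
    for probs in match_emissions:
        residue = max(probs, key=probs.get)  # maximum probability residue
        p = probs[residue]
        if p < minl:
            out.append(any_residue.lower())
        elif p < minu:
            out.append(residue.lower())
        else:
            out.append(residue.upper())
    return "".join(out)


if __name__ == "__main__":
    toy = [
        {"A": 0.9, "C": 0.05, "G": 0.03, "T": 0.02},
        {"A": 0.2, "C": 0.5, "G": 0.2, "T": 0.1},
        {"A": 0.25, "C": 0.25, "G": 0.3, "T": 0.2},
    ]
    print(fancy_consensus(toy, minl=0.4, minu=0.8, any_residue="n"))
    print(fancy_consensus(toy))  # defaults of 0.0: every residue is upper case
```

With both thresholds left at 0.0, every position satisfies p >= minu, so the result matches the simple consensus.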

Sample

Sample unaligned sequences from the implicit search profile, not from the core model. The core model consists only of the homologous states (between the begin and end states of a HMMER Plan7 model). The profile includes the nonhomologous N, C, and J states, local/glocal and uni/multihit algorithm configuration, and the target length model. Therefore sequences sampled from a profile may include nonhomologous as well as homologous sequences, and may contain more than one homologous sequence segment. By default, the profile is in multihit local mode, and the target sequence length is configured for L=400.

Attribution

This Galaxy tool relies on HMMER3 from http://hmmer.janelia.org/. Internally, the software is cited as:

# hmmscan :: search sequence(s) against a profile database
# HMMER 3.1 (February 2013); http://hmmer.org/
# Copyright (C) 2011 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

The wrappers were written by Eric Rasche and are licensed under Apache2. The documentation is copied from the HMMER3 documentation.