
cmsearch (version 1.1.4+galaxy0)

Sequence database:

Subject covariance models:

Covariance models:

Turn on the glocal alignment algorithm:

... global with respect to the query model and local with respect to the target database.

Calculate E-values as if the search space size is 'x' megabases (Mb):

Only search the bottom (Crick) strand of target sequences:

in the sequence database

Only search the top (Watson) strand of target sequences:

in the sequence database

Use the CYK algorithm, not Inside, to determine the final score of all hits:

Use the CYK algorithm to align hits:

By default, the Durbin/Holmes optimal accuracy algorithm is used, which finds the alignment that maximizes the expected accuracy of all aligned residues.

Skip truncated hit detection:

Allow full and truncated hits anywhere within sequences:

Turn off the null3 CM score corrections for biased composition:

This correction is not used during the HMM filter stages.

Set the maximum allowable CM DP matrix size to 'x' megabytes:

Set the maximum allowable CM search DP matrix size to 'x' megabytes:

Options controlling acceleration heuristics:

These options are listed in order from least strict (slowest but most sensitive) to most strict (fastest but least sensitive).

Options controlling model-specific reporting thresholds


Inclusion thresholds:

Inclusion thresholds are stricter than reporting thresholds. Inclusion thresholds control which hits are considered to be reliable enough to be included in an output alignment or in a possible subsequent search round, or marked as significant ("!") as opposed to questionable ("?") in hit output.

Reporting thresholds:

Reporting thresholds control which hits are reported in output files.

Save a multiple alignment of all significant hits:

... those satisfying inclusion thresholds

Omit the alignment section from the main output:

This can greatly reduce the output volume.

Include extra search pipeline statistics in the main output:

They include filter survival statistics for truncated hit detection and number of envelopes discarded due to matrix size overflows.

What it does

cmsearch belongs to the INFERNAL software package that allows you to make consensus RNA secondary structure profiles, and use them to search nucleic acid sequence databases for homologous RNAs, or to create new structure-based multiple sequence alignments. You can use your model to search for new homologues of your RNA family. cmsearch is used to search one or more covariance models (CMs) against a sequence database. cmsearch searches both strands of each sequence in the target database, and returns alignments for high scoring hits.

To build CMs from multiple alignments, see cmbuild (build covariance models).

Input

The CM query file must have been calibrated for E-values with cmcalibrate. As a special exception, any models in the CM query file that have zero basepairs need not be calibrated.

Options

Turn on the glocal alignment algorithm: global with respect to the query model and local with respect to the target database. By default, the local alignment algorithm is used, which is local with respect to both the target sequence and the model. In local mode, the alignment is allowed to span two or more subsequences if necessary (e.g. if the structures of the query model and target sequence are only partially shared), allowing certain large insertions and deletions in the structure to be penalized differently than normal indels. Local mode performs better on empirical benchmarks and is significantly more sensitive for remote homology detection. Empirically, glocal searches return many fewer hits than local searches, so glocal may be desired for some applications. When the glocal alignment algorithm is turned on, all models must be calibrated, even those with zero basepairs.
Only search the bottom (Crick) strand of target sequences: Hits can occur on either the top (Watson) or bottom (Crick) strand of the target sequence. By default, both strands are searched.
Only search the top (Watson) strand of target sequences: Hits can occur on either the top (Watson) or bottom (Crick) strand of the target sequence. By default, both strands are searched.
Use the CYK algorithm, not Inside, to determine the final score of all hits: If set to "yes", the CYK algorithm is used instead of the CM Inside algorithm (the SCFG analog of the HMM Forward algorithm).
Use the CYK algorithm to align hits: By default, the Durbin/Holmes optimal accuracy algorithm is used, which finds the alignment that maximizes the expected accuracy of all aligned residues.
Turn off truncated hit detection: Turns off truncated hit detection. This reduces the running time, most significantly for target files that include many short sequences.
Turn off all filters, and run non-banded Inside on every full-length target sequence: This increases sensitivity somewhat, at an extremely large cost in speed.
Turn off all HMM filter stages: The CYK filter, using QDBs, will be run on every full-length target sequence and will enforce a P-value threshold of 0.0001. Each subsequence that survives CYK will be passed to Inside, which will also use QDBs (but a looser set). This increases sensitivity somewhat, at a very large cost in speed.
Turn off the HMM SSV and Viterbi filter stages: Sets remaining HMM filter thresholds to 0.02 by default. This may increase sensitivity, at a significant cost in speed.
Inclusion thresholds: Use E-value - Use an E-value as the hit inclusion threshold. The default is 0.01, meaning that on average, about 1 false positive would be expected in every 100 searches with different query sequences. Use Bit Score - Use a bit score, rather than an E-value, as the hit inclusion threshold. By default this option is unset.
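
As a rough illustration of how the options above might translate into a command line, the following Python sketch assembles a cmsearch argument list. The dictionary keys and the function name are hypothetical, and the mapping from form options to Infernal flags is an assumption about how this wrapper behaves; check the flags against the Infernal User's Guide before relying on them.

```python
import shlex

# Assumed mapping from the options described above to cmsearch flags.
# The keys are hypothetical names chosen for this sketch.
FLAG_FOR_OPTION = {
    "glocal": "-g",
    "bottom_only": "--bottomonly",
    "top_only": "--toponly",
    "cyk_scores": "--cyk",
    "cyk_align": "--acyk",
    "no_trunc": "--notrunc",
    "any_trunc": "--anytrunc",
    "no_null3": "--nonull3",
    "no_ali": "--noali",
}


def build_cmsearch_command(cm_file, seq_db, options=(), inc_evalue=None, rep_evalue=None):
    """Return a cmsearch argument list for the given (hypothetical) option names."""
    # Searching only the top strand and only the bottom strand at once
    # would leave nothing to search, so reject that combination here.
    if "bottom_only" in options and "top_only" in options:
        raise ValueError("choose at most one of bottom_only and top_only")
    cmd = ["cmsearch"]
    for name in options:
        cmd.append(FLAG_FOR_OPTION[name])
    if inc_evalue is not None:
        cmd += ["--incE", str(inc_evalue)]  # inclusion E-value threshold
    if rep_evalue is not None:
        cmd += ["-E", str(rep_evalue)]  # reporting E-value threshold
    # cmsearch takes the CM file first, then the sequence database.
    cmd += [cm_file, seq_db]
    return cmd


if __name__ == "__main__":
    args = build_cmsearch_command("model.cm", "db.fa", ["glocal", "top_only"], inc_evalue=0.01)
    print(shlex.join(args))
```

Building a list rather than a single string keeps file names with spaces intact if the list is later passed to subprocess.run.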

Output Options

Reporting thresholds: Hits are ranked by statistical significance (E-value). By default, all hits with an E-value <= 10 are reported. The following options allow you to change the default E-value reporting thresholds, or to use bit score thresholds instead.
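
The interplay between the reporting and inclusion thresholds can be sketched in a few lines of Python. This is only an illustration using the default E-value thresholds stated above (10 for reporting, 0.01 for inclusion); the function name is hypothetical and the code is not part of cmsearch.

```python
def classify_hit(evalue, report_evalue=10.0, include_evalue=0.01):
    """Return "!" for a significant hit, "?" for a questionable hit,
    or None if the hit would not be reported at all."""
    # Inclusion thresholds are stricter than reporting thresholds.
    if include_evalue > report_evalue:
        raise ValueError("inclusion threshold must not exceed reporting threshold")
    if evalue <= include_evalue:
        return "!"  # satisfies the inclusion threshold
    if evalue <= report_evalue:
        return "?"  # reported, but not considered reliable
    return None  # above the reporting threshold


if __name__ == "__main__":
    for e in (1.3e-18, 0.5, 50.0):
        print(e, classify_hit(e))
```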

Output columns:

rank  E-value  score  bias  sequence     start    end         mdl  trunc  gc    description
----  -------  -----  ----  -----------  -------  -------     ---  -----  ----  -----------
!     1.3e-18  71.5   0.0   NC_013790.1  362026   361955   -  cm   no     0.50  Methanobrevibacter ruminantium M1
!     3.3e-18  70.2   0.0   NC_013790.1  2585265  2585193  -  cm   no     0.60  Methanobrevibacter ruminantium M1
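
For downstream processing, hit lines like the ones above can be split into named fields. The Python sketch below assumes the whitespace-separated layout shown here, with the strand as an unlabeled column between end and mdl. It also tolerates a leading rank token such as "(1)", on the assumption that some cmsearch output includes one. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    marker: str  # "!" (significant) or "?" (questionable)
    evalue: float
    score: float
    bias: float
    sequence: str
    start: int
    end: int
    strand: str  # "+" (top/Watson) or "-" (bottom/Crick)
    mdl: str
    trunc: str
    gc: float
    description: str

    @property
    def length(self) -> int:
        # On the bottom strand start is larger than end, so use abs().
        return abs(self.end - self.start) + 1


def parse_hit_line(line: str) -> Hit:
    """Parse one hit line in the layout shown above (an assumed format)."""
    tokens = line.split()
    # Drop a leading rank token such as "(1)" if one is present.
    if tokens and tokens[0].startswith("("):
        tokens = tokens[1:]
    if len(tokens) < 11:
        raise ValueError("not a hit line: " + line)
    return Hit(
        marker=tokens[0],
        evalue=float(tokens[1]),
        score=float(tokens[2]),
        bias=float(tokens[3]),
        sequence=tokens[4],
        start=int(tokens[5]),
        end=int(tokens[6]),
        strand=tokens[7],
        mdl=tokens[8],
        trunc=tokens[9],
        gc=float(tokens[10]),
        # The description may contain spaces, so rejoin the remainder.
        description=" ".join(tokens[11:]),
    )


if __name__ == "__main__":
    row = "! 1.3e-18 71.5 0.0 NC_013790.1 362026 361955 - cm no 0.50 Methanobrevibacter ruminantium M1"
    print(parse_hit_line(row))
```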

For further questions, please refer to the Infernal User's Guide.