Mercurial > repos > bgruening > rnaz_window

<tool id="rnaz_window" name="RNAz windows" version="2.1">
    <requirements>
        <requirement type="package" version="2.1">rnaz</requirement>
    </requirements>
    <command detect_errors="exit_code"><![CDATA[
    rnazWindow.pl
        --window $window
        --slide $slide
        --max-length $maxlength
        --max-gap $maxgap
        --max-masked $maxmask
        --min-id $minid
        --min-seqs $minseqs
        --max-seqs $maxseqs
        --num-samples $numsamples
        --min-length $minlength
        --opt-id $optid
        --$forward_or_reverse
        #if $noref:
            $noref
        #end ifx
        '$input'
    > '$output'
    ]]></command>
    <inputs>
        <param format="txt" name="input" type="data" label="Input Alignment File" />
        <param argument="--window" name="window" type="integer" value="120" min="80" label="Window size, default 120" />
        <param argument="--slide" name="slide" type="integer" value="120" min="1" label="Window step size, default 120" />
        <param argument="--max-length" name="maxlength" type="integer" value="120" label="Min size of block before slicing, default is window size 120" />
        <param argument="--max-gap" name="maxgap" type="float" value="0.25" label="Maximum fraction of gaps, default 0.25" />
        <param argument="--max-masked" name="maxmask" type="float" value="0.1" label="Maximum fraction of masked letters in sequence, default 0.1" />
        <param argument="--min-id" name="minid" type="integer" value="50"
            label="Discard alignment windows with an overall mean pairwise identity smaller than X%. (Default: 50)" />
        <param argument="--min-seqs" name="minseqs" type="integer" value="2"
            label="Minimum number of sequences in an alignment. Discard any windows with less than N sequences (Default:2)" />
        <param argument="--max-seqs" name="maxseqs" type="integer" value="6"
            label="Maximum number of sequences in an alignment. Discard any windows with less than N sequences (Default:6)" />
        <param argument="--num-samples" name="numsamples" type="integer" value="1"
            label="Number of different subsets of sequences that is sampled if there are more sequences in the alignment than --max-seqs (Default: 1)" />
        <param argument="--min-length" name="minlength" type="integer" value="0"
            label="Minimum number of columns of an alignment slice. After removing sequences from the alignment, all-gap columns are removed. If the resulting alignment has fewer than N columns, the complete alignment is discarded." />
        <param argument="--opt-id" name="optid" type="integer" value="80"
            label="If the number of sequences has to be reduced (see --max-seqs) a subset of sequences is chosen which is optimized for this value of mean pairwise identity. (In percent, default: 80)" />
        <param name="forward_or_reverse" type="select" label="Scored strand">
            <option value="forward">Score forward strand (-f)</option>
            <option value="reverse">Score reverse strand (-r)</option>
            <option value="both-strands" selected="true">Score both strands (-b)</option>
        </param>
        <param argument="--no-reference" name="noref" type="boolean" checked="false" truevalue="--no-reference" falsevalue=""
            label="By default the first sequence is interpreted as reference sequence. This means, for example, that if the reference sequence is removed during filtering steps the complete alignment is discarded. Also, if there are too many sequences in the alignment, the reference sequence is never removed when choosing an appropriate subset. Having a reference sequence is crucial if you are doing screens of genomic regions. For some other applications it might not be necessary and in such cases you can change the default behaviour by setting this option." />
    </inputs>

    <outputs>
        <data name="output" format="txt" />
    </outputs>

    <tests>
        <test>
            <param name="input" value="unknown.aln"/>
            <output name="output" file="unknown.aln.window"/>
        </test>
        <test>
            <param name="input" value="tRNA.maf"/>
            <output name="output" file="tRNA.maf.window"/>
        </test>
    </tests>

    <help>
        <![CDATA[

RNAz cannot score alignments longer than 400
columns. In practice, it is generally advisable that you score
long alignments, say more than 200 columns, in shorter, overlapping
windows.  For general purpose screens we recommend a window
size of 120.  This window size appears large enough to detect
local secondary structures within long ncRNAs and, on the
other hand, small enough to find short secondary structures
without loosing the signal in a much too long window

Usage: rnazWindow.pl [options] [file]
Options:
-w, --window=N Size of the window (Default: 120)

-s, --slide=N Step size (Default: 120)

-m, --max-length Slice only alignments longer than N columns. This
means blocks longer than the window size given by --window but shorter
than N are kept intact and not sliced. Per default this length is set
to the window size given by --window (or 120 by default).

--max-gap=X Maximum fraction of gaps. If a reference sequence is used
(i.e.  "--no-reference" is not set), each sequence is compared to the
reference sequence and if in the pairwise comparison the fraction of
columns with gaps is higher than X the sequence is discarded. If no
reference sequence is used, all sequences with a fraction of gaps
higher than X are discarded. (Default: 0.25)

--max-masked=X Maximum fraction of masked (=lowercase letters) in a
sequence.  All sequences with a fraction of more than X lowercase
letters are discarded. This is usually used for excluding repeat
sequences marked by RepeatMasker but any other information can be
encoded by using lowercase letters. (Default: 0.1)

--min-id=X Discard alignment windows with an overall mean pairwise
identity smaller than X%. (Default: 50)

--min-seqs=N Minimum number of sequences in an alignment. Discard any
windows with less than N sequences (Default:2).

--max-seqs=N Maximum number of sequences in an alignment. If the
number of sequences in a window is higher than N, a subset of
sequences is used with exactly N sequences. The greedy algorithm of
the program "rnazSelectSeqs.pl" is used which optimizes for a user
specified mean pairwise identity (see "--opt-id"). (Default: 6)

--num-samples=N Number of different subsets of sequences that is
sampled if there are more sequences in the alignment than
"--max-seqs".  (Default: 1)

--min-length=N Minimum number of columns of an alignment slice. After
removing sequences from the alignment, all-gap columns are
removed. If the resulting alignment has fewer than N columns, the
complete alignment is discarded.

--opt-id=X If the number of sequences has to be reduced (see
"--max-seqs") a subset of sequences is chosen which is optimized for
this value of mean pairwise identity. (In percent, default: 80)

--max-id=X One sequence from pairs with pairwise identity higher than
X % this is removed (default: 99, i.e. only almost identical sequences
are removed) NOT IMPLEMENTED

--forward --reverse --both-strands Output forward, reverse complement
or both of the sequences in the windows. Please note: "RNAz" has the
same options, so if you use "rnazWindow.pl" for an RNAz screen, we
recommend to set the option directly in "RNAz" and leave the default
here. (Default: ---forward)

--no-reference By default the first sequence is interpreted as
reference sequence. This means, for example, that if the reference
sequence is removed during filtering steps the complete alignment is
discarded. Also, if there are too many sequences in the alignment, the
reference sequence is never removed when choosing an appropriate
subset. Having a reference sequence is crucial if you are doing
screens of genomic regions. For some other applications it might not
be necessary and in such cases you can change the default behaviour by
setting this option.

        ]]>
    </help>
    <citations>
        <citation type="doi">10.1142/9789814295291_0009</citation>
    </citations>
</tool>
author	bgruening
date	Wed, 30 Jan 2019 04:12:58 -0500
parents
children