Mercurial > repos > bgruening > rnaz_window
view rnazWindow.xml @ 0:fdfe3dcf8fc4 draft default tip
planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/rna_team/rnaz commit d261ddb93500e1ea309845fa3989c87c6312583d-dirty
author | bgruening |
---|---|
date | Wed, 30 Jan 2019 04:12:58 -0500 |
parents | |
children |
line wrap: on
line source
<tool id="rnaz_window" name="RNAz windows" version="2.1"> <requirements> <requirement type="package" version="2.1">rnaz</requirement> </requirements> <command detect_errors="exit_code"><![CDATA[ rnazWindow.pl --window $window --slide $slide --max-length $maxlength --max-gap $maxgap --max-masked $maxmask --min-id $minid --min-seqs $minseqs --max-seqs $maxseqs --num-samples $numsamples --min-length $minlength --opt-id $optid --$forward_or_reverse #if $noref: $noref #end ifx '$input' > '$output' ]]></command> <inputs> <param format="txt" name="input" type="data" label="Input Alignment File" /> <param argument="--window" name="window" type="integer" value="120" min="80" label="Window size, default 120" /> <param argument="--slide" name="slide" type="integer" value="120" min="1" label="Window step size, default 120" /> <param argument="--max-length" name="maxlength" type="integer" value="120" label="Min size of block before slicing, default is window size 120" /> <param argument="--max-gap" name="maxgap" type="float" value="0.25" label="Maximum fraction of gaps, default 0.25" /> <param argument="--max-masked" name="maxmask" type="float" value="0.1" label="Maximum fraction of masked letters in sequence, default 0.1" /> <param argument="--min-id" name="minid" type="integer" value="50" label="Discard alignment windows with an overall mean pairwise identity smaller than X%. (Default: 50)" /> <param argument="--min-seqs" name="minseqs" type="integer" value="2" label="Minimum number of sequences in an alignment. Discard any windows with less than N sequences (Default:2)" /> <param argument="--max-seqs" name="maxseqs" type="integer" value="6" label="Maximum number of sequences in an alignment. Discard any windows with less than N sequences (Default:6)" /> <param argument="--num-samples" name="numsamples" type="integer" value="1" label="Number of different subsets of sequences that is sampled if there are more sequences in the alignment than --max-seqs (Default: 1)" /> <param argument="--min-length" name="minlength" type="integer" value="0" label="Minimum number of columns of an alignment slice. After removing sequences from the alignment, all-gap columns are removed. If the resulting alignment has fewer than N columns, the complete alignment is discarded." /> <param argument="--opt-id" name="optid" type="integer" value="80" label="If the number of sequences has to be reduced (see --max-seqs) a subset of sequences is chosen which is optimized for this value of mean pairwise identity. (In percent, default: 80)" /> <param name="forward_or_reverse" type="select" label="Scored strand"> <option value="forward">Score forward strand (-f)</option> <option value="reverse">Score reverse strand (-r)</option> <option value="both-strands" selected="true">Score both strands (-b)</option> </param> <param argument="--no-reference" name="noref" type="boolean" checked="false" truevalue="--no-reference" falsevalue="" label="By default the first sequence is interpreted as reference sequence. This means, for example, that if the reference sequence is removed during filtering steps the complete alignment is discarded. Also, if there are too many sequences in the alignment, the reference sequence is never removed when choosing an appropriate subset. Having a reference sequence is crucial if you are doing screens of genomic regions. For some other applications it might not be necessary and in such cases you can change the default behaviour by setting this option." /> </inputs> <outputs> <data name="output" format="txt" /> </outputs> <tests> <test> <param name="input" value="unknown.aln"/> <output name="output" file="unknown.aln.window"/> </test> <test> <param name="input" value="tRNA.maf"/> <output name="output" file="tRNA.maf.window"/> </test> </tests> <help> <![CDATA[ RNAz cannot score alignments longer than 400 columns. In practice, it is generally advisable that you score long alignments, say more than 200 columns, in shorter, overlapping windows. For general purpose screens we recommend a window size of 120. This window size appears large enough to detect local secondary structures within long ncRNAs and, on the other hand, small enough to find short secondary structures without loosing the signal in a much too long window Usage: rnazWindow.pl [options] [file] Options: -w, --window=N Size of the window (Default: 120) -s, --slide=N Step size (Default: 120) -m, --max-length Slice only alignments longer than N columns. This means blocks longer than the window size given by --window but shorter than N are kept intact and not sliced. Per default this length is set to the window size given by --window (or 120 by default). --max-gap=X Maximum fraction of gaps. If a reference sequence is used (i.e. "--no-reference" is not set), each sequence is compared to the reference sequence and if in the pairwise comparison the fraction of columns with gaps is higher than X the sequence is discarded. If no reference sequence is used, all sequences with a fraction of gaps higher than X are discarded. (Default: 0.25) --max-masked=X Maximum fraction of masked (=lowercase letters) in a sequence. All sequences with a fraction of more than X lowercase letters are discarded. This is usually used for excluding repeat sequences marked by RepeatMasker but any other information can be encoded by using lowercase letters. (Default: 0.1) --min-id=X Discard alignment windows with an overall mean pairwise identity smaller than X%. (Default: 50) --min-seqs=N Minimum number of sequences in an alignment. Discard any windows with less than N sequences (Default:2). --max-seqs=N Maximum number of sequences in an alignment. If the number of sequences in a window is higher than N, a subset of sequences is used with exactly N sequences. The greedy algorithm of the program "rnazSelectSeqs.pl" is used which optimizes for a user specified mean pairwise identity (see "--opt-id"). (Default: 6) --num-samples=N Number of different subsets of sequences that is sampled if there are more sequences in the alignment than "--max-seqs". (Default: 1) --min-length=N Minimum number of columns of an alignment slice. After removing sequences from the alignment, all-gap columns are removed. If the resulting alignment has fewer than N columns, the complete alignment is discarded. --opt-id=X If the number of sequences has to be reduced (see "--max-seqs") a subset of sequences is chosen which is optimized for this value of mean pairwise identity. (In percent, default: 80) --max-id=X One sequence from pairs with pairwise identity higher than X % this is removed (default: 99, i.e. only almost identical sequences are removed) NOT IMPLEMENTED --forward --reverse --both-strands Output forward, reverse complement or both of the sequences in the windows. Please note: "RNAz" has the same options, so if you use "rnazWindow.pl" for an RNAz screen, we recommend to set the option directly in "RNAz" and leave the default here. (Default: ---forward) --no-reference By default the first sequence is interpreted as reference sequence. This means, for example, that if the reference sequence is removed during filtering steps the complete alignment is discarded. Also, if there are too many sequences in the alignment, the reference sequence is never removed when choosing an appropriate subset. Having a reference sequence is crucial if you are doing screens of genomic regions. For some other applications it might not be necessary and in such cases you can change the default behaviour by setting this option. ]]> </help> <citations> <citation type="doi">10.1142/9789814295291_0009</citation> </citations> </tool>