--- a/cmbuild.xml	Fri Mar 04 07:24:53 2016 -0500
+++ b/cmbuild.xml	Mon Dec 19 15:27:06 2016 -0500
@@ -1,4 +1,4 @@
-<tool id="infernal_cmbuild" name="Build covariance models" version="1.1.0.1">
+<tool id="infernal_cmbuild" name="Build covariance models" version="1.1.0.2">
     <description>from sequence alignments (cmbuild)</description>
     <parallelism method="multi" split_inputs="alignment_infile" split_mode="to_size" split_size="10" shared_inputs="" merge_outputs="cmfile_outfile"></parallelism>
     <requirements>
@@ -8,14 +8,11 @@
     </requirements>
     <command>
 <![CDATA[
-        cmbuild
+        cmbuild -F
             #if $is_summery_output:
-                -o $summary_outfile
+                -o '$summary_outfile'
             #end if

-            ## to many outputs, is that one really needed?
-            ##-O $annotated_source_alignment_outfile
-
             $model_construction_opts.model_construction_opts_selector
             #if $model_construction_opts.model_construction_opts_selector == '--fast':
                 --symfrac $model_construction_opts.symfrac
@@ -54,9 +51,12 @@
                 $cyk
             #end if

-            $cmfile_outfile
-            $alignment_infile
-
+            '$cmfile_outfile'
+            '$alignment_infile'
+      &&
+      cmcalibrate
+        -L 0.01 --cpu \${GALAXY_SLOTS:-2}
+        '$cmfile_outfile'
 ]]>
     </command>
         <inputs>
@@ -98,11 +98,10 @@
                 </when>
             </conditional>

-
             <conditional name="effective_opts">
                 <param name="effective_opts_selector" type="select" label="Options controlling effective sequence number" help="">
-                    <option value="--eent" selected="true">entropy weighting strategy (--eent)</option>
-                    <option value="--enone">Turn off the entropy weighting strategy (--enone)</option>
+                    <option value="--eent" >entropy weighting strategy (--eent)</option>
+                    <option value="--enone" selected="true">Turn off the entropy weighting strategy (--enone)</option>
                 </param>
                 <when value="--enone"/>
                 <when value="--eent">
@@ -120,7 +119,6 @@
                 </when>
             </conditional>

-
             <conditional name="refining_opts">
                 <param name="refining_opts_selector" type="select" label="Options for refining the input alignment" help="">
                     <option value="" selected="true">No refinement</option>
@@ -157,13 +155,11 @@
                 </when>
             </conditional>

-
             <param name="is_summery_output" truevalue="" falsevalue="" checked="False" type="boolean"
                 label="Output a summery file?" help=""/>

         </inputs>
     <outputs>
-
         <data format="text" name="summary_outfile" label="cmbuild summary on ${on_string}">
             <filter>is_summery_output is True</filter>
         </data>
@@ -183,14 +179,12 @@
     <help>
 <![CDATA[

-
 **What it does**

 cmbuild belongs to the INFERNAL software package that allows you to make consensus RNA secondary structure profiles, and use them to search nucleic acid sequence databases for homologous RNAs, or to create new structure-based multiple sequence alignments.

 cm build builds a covariance model of an RNA multiple alignment. cmbuild uses the consensus structure to determine the architecture of the CM.

-
 **Input**

 Input file is a multiple sequence alignment file in Stockholm or SELEX format, and must contain consensus secondary structure annotation.
@@ -199,11 +193,17 @@
 Example: simple example of a multiple RNA sequence alignment with secondary structure annotation

 # STOCKHOLM 1.0
+
 tRNA1             GCGGAUUUAGCUCAGUUGGG.AGAGCGCCAGACUGAAGAUCUGGAGGUCC
+
 tRNA2             UCCGAUAUAGUGUAAC.GGCUAUCACAUCACGCUUUCACCGUGGAGA.CC
+
 tRNA3             UCCGUGAUAGUUUAAU.GGUCAGAAUGGGCGCUUGUCGCGUGCCAGA.UC
+
 tRNA4             GCUCGUAUGGCGCAGU.GGU.AGCGCAGCAGAUUGCAAAUCUGUUGGUCC
+
 tRNA5             GGGCACAUGGCGCAGUUGGU.AGCGCGCUUCCCUUGCAAGGAAGAGGUCA
+
 #=GC SS_cons      <<<<<<<..<<<<.........>>>>.<<<<<.......>>>>>.....<


@@ -212,9 +212,9 @@
 The output of cmbuild contains information about the size of your input alignment (in aligned columns
 and # of sequences), and about the size of the resulting model.

-In addition to writing CM(s) to the output file, cmbuild also outputs a single line for each model created to stdout.
-Each line has the following fields:
-- aln: the index of the alignment used to build the CM
+In addition to writing CM(s) to the output file, cmbuild also outputs a single line for each model created to stdout.
+Each line has the following fields:
+- aln: the index of the alignment used to build the CM
 - idx: the index of the CM in the output file
 - name: the name of the CM
 - nseq: the number of sequences in the alignment used to build the CM
@@ -230,7 +230,6 @@

 **Options controlling model construction**

-
 These options control how consensus columns are defined in an alignment.

   - *--fast*: Define consensus columns automatically as those that have a fraction >= symfrac of residues as opposed to gaps. (See below for the --symfrac option.) This is the default.
@@ -263,8 +262,6 @@
   - *--ehmmre*: Set the target HMM mean match state relative entropy. Entropy for basepairing match states is calculated using marginalized basepair emission probabilities.
   - *--eset*: Set the effective sequence number for entropy weighting.

-
-
 **Options for refining the input alignment**

   - *--refine*: Attempt to refine the alignment before building the CM using expectation-maximization (EM). A CM is first built from the initial alignment as usual. Then, the sequences in the alignment are realigned optimally (with the HMM banded CYK algorithm, optimal means optimal given the bands) to the CM, and a new CM is built from the resulting alignment. The sequences are then realigned to the new CM, and a new CM is built from that alignment. This is continued until convergence, specifically when the alignments for two successive iterations are not significantly different (the summed bit scores of all the sequences in the alignment changes less than 1% between two successive iterations).
@@ -273,17 +270,13 @@
   - *--Random seed*: Seed the random number generator with an integer >= 0. This option can only be used in combination with --gibbs. If the given number is nonzero, stochastic sampling of alignments will be reproducible; the same command will give the same results. If the given number is 0, the random number generator is seeded arbitrarily, and stochastic samplings may vary from run to run of the same command. The default seed is 0.
    - *--Turn off the truncated alignment algorithm*: With --refine, turn off the truncated alignment algorithm. There is more information on this in the cmalign manual page.
   - *--cyk algorithm*: With --refine, align with the CYK algorithm. By default the optimal accuracy algorithm is used. There is more information on this in the cmalign manual page.
-
-

 For further questions please refere to the Infernal Userguide_.

 .. _Userguide: http://selab.janelia.org/software/infernal/Userguide.pdf

-
 ]]>
     </help>
-
     <citations>
         <citation type="doi">10.1093/bioinformatics/btt509</citation>
         <citation type="bibtex">
@@ -295,5 +288,4 @@
             }
         </citation>
     </citations>
-
 </tool>