comparison lastz_d.xml @ 8:e7f19d6a9af8 draft

planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/lastz commit a7e9d5b3906b7ebb35b1c29c3a8e8203b2cefccd
author iuc
date Fri, 18 May 2018 16:58:38 -0400
parents 10aca14c2332
children 2ff111fac1d7
comparison
7:10aca14c2332 8:e7f19d6a9af8
1 <tool id="lastz_d_wrapper" name="LASTZ_D" version="1.3.1"> 1 <tool id="lastz_d_wrapper" name="LASTZ_D" version="1.3.2">
2 <description>: estimate substitution scores matrix</description> 2 <description>: estimate substitution scores matrix</description>
3 <macros> 3 <macros>
4 <import>lastz_macros.xml</import> 4 <import>lastz_macros.xml</import>
5 </macros> 5 </macros>
6 <requirements> 6 <requirements>
43 </test> 43 </test>
44 </tests> 44 </tests>
45 45
46 <help><![CDATA[ 46 <help><![CDATA[
47 47
48 **What it does** 48 **What it does**
49 49
50 LASTZ_D is a non-integer (**D** stands for Double) version of LASTZ that can be used to estimate a substitution matrix for scoring alignments. It was developed by `Bob Harris <http://www.bx.psu.edu/~rsharris/>`_ in the lab of Webb Miller at Penn State as a part of LASTZ. The matrix computed by this tool is intended as input to LASTZ (see below). 50 LASTZ_D is a non-integer (**D** stands for Double) version of LASTZ that can be used to estimate a substitution matrix for scoring alignments. It was developed by `Bob Harris <http://www.bx.psu.edu/~rsharris/>`_ in the lab of Webb Miller at Penn State as a part of LASTZ. The matrix computed by this tool is intended as input to LASTZ (see below).
51 51
52 .. class:: warningmark 52 .. class:: warningmark
53 53
54 **Read the documentation** before proceeding. LASTZ is a complex tool with many parameter options. Fortunately, there is a `great manual <https://lastz.github.io/lastz/>`_ maintained by its author. The two sections most relevant to inferring a substitution matrix are `Inferring Score Sets <http://www.bx.psu.edu/~rsharris/lastz/README.lastz-1.04.00.html#adv_inference>`_ and `Inference Control File <http://www.bx.psu.edu/~rsharris/lastz/README.lastz-1.04.00.html#fmt_inference>`_. 54 **Read the documentation** before proceeding. LASTZ is a complex tool with many parameter options. Fortunately, there is a `great manual <https://lastz.github.io/lastz/>`_ maintained by its author. The two sections most relevant to inferring a substitution matrix are `Inferring Score Sets <http://www.bx.psu.edu/~rsharris/lastz/README.lastz-1.04.00.html#adv_inference>`_ and `Inference Control File <http://www.bx.psu.edu/~rsharris/lastz/README.lastz-1.04.00.html#fmt_inference>`_.
55 55
56 **Notes on the inference** 56 **Notes on the inference**
57 57
58 Inference is achieved by computing the probability of each of the 18 different alignment events (gap open, gap extend, and 16 substitutions). These probabilities are estimated from alignments of the sequences. Of course, at first we don't have alignments, so the process begins by using a generic scoring set to create alignments, infer scores from those, then realign, and so on, until the scores stabilize or "converge". Ungapped alignments are performed until the substitution scores converge, then gapped alignments are performed (holding the substitution scores constant) until the gap penalties converge. In the end you get a matrix like this:: 58 Inference is achieved by computing the probability of each of the 18 different alignment events (gap open, gap extend, and 16 substitutions). These probabilities are estimated from alignments of the sequences. Of course, at first we don't have alignments, so the process begins by using a generic scoring set to create alignments, infer scores from those, then realign, and so on, until the scores stabilize or "converge". Ungapped alignments are performed until the substitution scores converge, then gapped alignments are performed (holding the substitution scores constant) until the gap penalties converge. In the end you get a matrix like this::
59 59
60 # (a LASTZ scoring set, created by "LASTZ --infer") 60 # (a LASTZ scoring set, created by "LASTZ --infer")
61 61
62 bad_score = X:-1781 # used for sub[X][*] and sub[*][X] 62 bad_score = X:-1781 # used for sub[X][*] and sub[*][X]
63 fill_score = -178 # used when sub[*][*] not otherwise defined 63 fill_score = -178 # used when sub[*][*] not otherwise defined
64 gap_open_penalty = 400 64 gap_open_penalty = 400
65 gap_extend_penalty = 30 65 gap_extend_penalty = 30
66 66
67 A C G T 67 A C G T
68 A 72 -79 -49 -97 68 A 72 -79 -49 -97
69 C -79 100 -178 -49 69 C -79 100 -178 -49
70 G -49 -178 100 -79 70 G -49 -178 100 -79
71 T -97 -49 -79 72 71 T -97 -49 -79 72
72 72
73 This dataset can then be used as an input to the **Read the substitution scores** parameter of LASTZ (Parameter section *Scoring*). 73 This dataset can then be used as an input to the **Read the substitution scores** parameter of LASTZ (Parameter section *Scoring*).
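Before passing a scoring set to LASTZ, it can help to load it and check it programmatically. The following is a minimal sketch, not part of the tool: it assumes the file looks like the example above (``name = value`` settings, ``#`` comments, then a header row and one row per base). The function name ``parse_score_set`` is hypothetical, and the column spacing in the embedded example is a guess.

```python
# Example scoring set copied from the help text above (spacing assumed).
SCORE_TEXT = """\
# (a LASTZ scoring set, created by "LASTZ --infer")

bad_score          = X:-1781  # used for sub[X][*] and sub[*][X]
fill_score         = -178     # used when sub[*][*] not otherwise defined
gap_open_penalty   = 400
gap_extend_penalty = 30

     A     C     G     T
A   72   -79   -49   -97
C  -79   100  -178   -49
G  -49  -178   100   -79
T  -97   -49   -79    72
"""


def parse_score_set(text):
    """Split a scoring set into (settings, matrix).

    settings maps names such as "gap_open_penalty" to their raw string value;
    matrix maps (row_base, column_base) to a numeric score.
    """
    settings, matrix, columns = {}, {}, None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        if "=" in line:
            name, value = (part.strip() for part in line.split("=", 1))
            settings[name] = value
        elif columns is None:
            columns = line.split()  # header row: the column symbols
        else:
            row, *scores = line.split()
            for col, score in zip(columns, scores):
                # float() accepts the integer values shown here and would
                # also accept non-integer scores, should a file contain them.
                matrix[(row, col)] = float(score)
    return settings, matrix


settings, matrix = parse_score_set(SCORE_TEXT)

# Sanity check: in this example the score for x aligned to y equals y to x.
is_symmetric = all(matrix[(a, b)] == matrix[(b, a)] for (a, b) in matrix)
```

Settings are kept as raw strings because values such as ``X:-1781`` are not plain numbers; convert them only where needed.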
74 74
75 The iterative process can fail if there is not much sequence to align. For example, if after the 4th iteration there is nothing in the central 50%, the denominators go to zero and the process fails. 75 The iterative process can fail if there is not much sequence to align. For example, if after the 4th iteration there is nothing in the central 50%, the denominators go to zero and the process fails.
76 76
77 If the sequences you are aligning have GC content different from the usual ACGT 30-20-20-30 split, scoring inference should discover this and give you better alignments. 77 If the sequences you are aligning have GC content different from the usual ACGT 30-20-20-30 split, scoring inference should discover this and give you better alignments.
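To judge whether your sequences depart from the 30-20-20-30 split (which corresponds to 40% GC), you can measure their base composition directly. This is a minimal sketch, independent of LASTZ: the helper ``base_composition`` and the toy sequence are hypothetical, and ambiguous characters such as ``N`` are simply ignored.

```python
from collections import Counter


def base_composition(sequence):
    """Return the fraction of A, C, G and T among the unambiguous bases."""
    counts = Counter(base for base in sequence.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: counts[base] / total for base in "ACGT"}


# Hypothetical toy sequence, for illustration only.
composition = base_composition("ATGCGCGGCCATNNGCGC")
gc_content = composition["G"] + composition["C"]

# Compare against the 20% + 20% GC implied by the 30-20-20-30 split.
differs_from_usual = abs(gc_content - 0.40) > 0.05  # 0.05 is an arbitrary tolerance
```

For real data you would run this over the full target and query sequences rather than a short string.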
78 78
79 79
80 ]]> 80 ]]>
81 </help> 81 </help>
82 <expand macro="citations"/> 82 <expand macro="citations"/>
83 </tool> 83 </tool>