<tool id="MAF_To_Interval1" name="MAF to Interval" force_history_refresh="True">
  <description>Converts a MAF formatted file to the Interval format</description>
  <command interpreter="python">maf_to_interval.py $input1 $out_file1 $out_file1.id $__new_file_path__ $input1.dbkey $species $input1.metadata.species $complete_blocks $remove_gaps</command>
  <inputs>
    <param format="maf" name="input1" type="data" label="MAF file to convert"/>
    <param name="species" type="select" label="Select additional species" display="checkboxes" multiple="true" help="The species matching the dbkey of the alignment is always included. A separate history item will be created for each species.">
      <options>
        <filter type="data_meta" ref="input1" key="species" />
        <filter type="remove_value" meta_ref="input1" key="dbkey" />
      </options>
    </param>
    <param name="complete_blocks" type="select" label="Exclude blocks which have a species missing">
      <option value="partial_allowed">include blocks with missing species</option>
      <option value="partial_disallowed">exclude blocks with missing species</option>
    </param>
    <param name="remove_gaps" type="select" label="Remove Gap characters from sequences">
      <option value="keep_gaps">keep gaps</option>
      <option value="remove_gaps">remove gaps</option>
    </param>
  </inputs>
  <outputs>
    <data format="interval" name="out_file1" />
  </outputs>
  <tests>
    <test>
      <param name="input1" value="4.maf" dbkey="hg17"/>
      <param name="complete_blocks" value="partial_disallowed"/>
      <param name="remove_gaps" value="keep_gaps"/>
      <param name="species" value="panTro1" />
      <output name="out_file1" file="maf_to_interval_out_hg17.interval"/>
      <output name="out_file1" file="maf_to_interval_out_panTro1.interval"/>
    </test>
  </tests>
  <help>

**What it does**

This tool converts every MAF block to a set of genomic intervals describing the position of that alignment block within a corresponding genome. Sequences from aligning species are also included in the output.

The interface for this tool contains several options:

* **MAF file to convert**. Choose a multiple alignment dataset from the history to be converted to Interval format.
* **Select additional species**. Choose additional species from the alignment to be included in the output. The species matching the dbkey of the alignment is always included.
* **Exclude blocks which have a species missing**. If an alignment block is missing any one of the species found in the alignment set and this option is set to **exclude blocks with missing species**, then the coordinates of that block **will not** be included in the output (see **Example 2** below).
* **Remove Gap characters from sequences**. Gaps can be removed from sequences before they are output.
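
As a rough illustration of the last two options, the following Python sketch filters out blocks that lack a requested species and strips gap characters. It is not the code in maf_to_interval.py: the block representation (a dict mapping species to aligned text) and the helper names ``keep_block`` and ``strip_gaps`` are assumptions made for this example, and it assumes that "missing species" refers to the species selected for output, as Example 2 suggests.

```python
# Hypothetical helpers; each block is modelled as a dict of species to aligned text.

def keep_block(block, species, complete_blocks):
    """Return True if the block should be reported for the given species list."""
    if complete_blocks == "partial_disallowed":
        # Assumption: every requested species must be present in the block.
        return all(spec in block for spec in species)
    return True


def strip_gaps(text, remove_gaps):
    """Drop '-' gap characters when the remove_gaps option is selected."""
    if remove_gaps == "remove_gaps":
        return text.replace("-", "")
    return text


block_1 = {"hg18": "GACAGG---CC", "mm8": "AGAAGG---TG"}
block_2 = {"hg18": "ATGTGCAG"}  # no mm8 in this block

kept = [b for b in (block_1, block_2)
        if keep_block(b, ["hg18", "mm8"], "partial_disallowed")]
print(len(kept))
print(strip_gaps(block_1["hg18"], "remove_gaps"))
```

With **include blocks with missing species** (``partial_allowed``), both blocks in this sketch would be kept.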

-----

**Example 1**: **Include only reference genome** (hg18 in this case) and **include blocks with missing species**:

For the following alignment::

  ##maf version=1
  a score=68686.000000
  s hg18.chr20     56827368 75 +  62435964 GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  s panTro2.chr20  56528685 75 +  62293572 GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  s rheMac2.chr10  89144112 69 -  94855758 GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------
  s mm8.chr2      173910832 61 + 181976762 AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC-------
  s canFam2.chr24  46551822 67 +  50763139 CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C

  a score=10289.000000
  s hg18.chr20     56827443 37 +  62435964 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  s panTro2.chr20  56528760 37 +  62293572 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  s rheMac2.chr10  89144181 37 -  94855758 ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG

the tool will create **a single** history item containing the output shown below. **Note** that the name field is numbered iteratively (hg18_0_0, hg18_1_0, etc.): the first number is the block number and the second is the iteration through the block (if a species appears twice in a block, that interval is repeated). Sequences for each species are included in the order specified in the header, and the field is left empty when no sequence is available for that species::

  #chrom start end strand score name canFam2 hg18 mm8 panTro2 rheMac2
  chr20 56827368 56827443 + 68686.0 hg18_0_0 CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC- AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC------- GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC- GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------
  chr20 56827443 56827480 + 10289.0 hg18_1_0 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG
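
The coordinates and names in this output can be derived from each "s" line: the end is the start plus the aligned size, and the name combines species, block number and iteration. The Python sketch below illustrates that arithmetic under stated assumptions and is not the tool's own code: ``s_line_to_interval`` is a hypothetical helper, and the minus-strand conversion follows the MAF convention that minus-strand starts are counted on the reverse-complemented source sequence. Whether the tool reports minus-strand intervals in exactly this way is not stated here.

```python
def s_line_to_interval(s_line, block_num, iteration=0):
    """Convert one MAF "s" line into (chrom, start, end, strand, name)."""
    _, src, start, size, strand, src_size, _text = s_line.split()
    start, size, src_size = int(start), int(size), int(src_size)
    species, chrom = src.split(".", 1)
    if strand == "-":
        # Assumption: translate a minus-strand start to forward-strand coordinates.
        start = src_size - start - size
    end = start + size
    name = "%s_%d_%d" % (species, block_num, iteration)
    return chrom, start, end, strand, name


line = "s hg18.chr20 56827443 37 + 62435964 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG"
print(s_line_to_interval(line, block_num=1))
```

For the hg18 line of the second block this yields chr20, 56827443, 56827480, + and hg18_1_0, matching the second output row above.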

-----

**Example 2**: **Include hg18 and mm8** and **exclude blocks with missing species**:

For the following alignment::

  ##maf version=1
  a score=68686.000000
  s hg18.chr20     56827368 75 +  62435964 GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  s panTro2.chr20  56528685 75 +  62293572 GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  s rheMac2.chr10  89144112 69 -  94855758 GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------
  s mm8.chr2      173910832 61 + 181976762 AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC-------
  s canFam2.chr24  46551822 67 +  50763139 CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C

  a score=10289.000000
  s hg18.chr20     56827443 37 +  62435964 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  s panTro2.chr20  56528760 37 +  62293572 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  s rheMac2.chr10  89144181 37 -  94855758 ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG

the tool will create **two** history items (one for hg18 and one for mm8). **Note** that both history items contain only one line, describing the first alignment block; the second MAF block is not included in the output because it does not contain mm8.

History item **1** (for hg18)::

  #chrom start end strand score name canFam2 hg18 mm8 panTro2 rheMac2
  chr20 56827368 56827443 + 68686.0 hg18_0_0 CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC- AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC------- GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC- GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------

History item **2** (for mm8)::

  #chrom start end strand score name canFam2 hg18 mm8 panTro2 rheMac2
  chr2 173910832 173910893 + 68686.0 mm8_0_0 CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC- AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC------- GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC- GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------

-------

.. class:: infomark

**About formats**

**MAF format** multiple alignment format file. This format stores multiple alignments at the DNA level between entire genomes.

- The .maf format is line-oriented. Each multiple alignment ends with a blank line.
- Each sequence in an alignment is on a single line.
- Lines starting with # are considered to be comments.
- Each multiple alignment is in a separate paragraph that begins with an "a" line and contains an "s" line for each sequence in the multiple alignment.
- Some MAF files may contain two optional line types:

  - An "i" line containing information about what is in the aligned species DNA before and after the immediately preceding "s" line;
  - An "e" line containing information about the size of the gap between the alignments that span the current block.
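
A minimal reader for the layout described above might look like the following Python sketch. It is only an illustration: ``read_maf_blocks`` is a hypothetical name, "i" and "e" lines are skipped, and real MAF files are better handled by a dedicated parser.

```python
def read_maf_blocks(lines):
    """Yield (score, s_lines) for each alignment paragraph in a MAF file."""
    score, s_lines, in_block = None, [], False
    for raw in lines:
        line = raw.strip()
        if line.startswith("#"):
            continue  # comment or header line
        if not line:
            # A blank line ends the current alignment paragraph.
            if in_block:
                yield score, s_lines
            score, s_lines, in_block = None, [], False
            continue
        fields = line.split()
        if fields[0] == "a":
            in_block = True
            for item in fields[1:]:
                key, _, value = item.partition("=")
                if key == "score":
                    score = float(value)
        elif fields[0] == "s" and in_block:
            s_lines.append(fields[1:])
        # "i", "e" and any other line types are ignored in this sketch.
    if in_block:
        yield score, s_lines  # the input did not end with a blank line


example = """##maf version=1
a score=10289.000000
s hg18.chr20 56827443 37 + 62435964 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
s panTro2.chr20 56528760 37 + 62293572 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
"""
blocks = list(read_maf_blocks(example.splitlines()))
print(len(blocks), blocks[0][0], [s[0] for s in blocks[0][1]])
```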

------

**Citation**

If you use this tool, please cite `Blankenberg D, Taylor J, Nekrutenko A; The Galaxy Team. Making whole genome multiple alignments usable for biologists. Bioinformatics. 2011 Sep 1;27(17):2426-2428. &lt;http://www.ncbi.nlm.nih.gov/pubmed/21775304&gt;`_

  </help>
</tool>