comparison join_files_on_column_fuzzy.xml @ 3:22ec3c1a20cd draft default tip
planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/text_processing/join_files_on_column_fuzzy commit 3419a5a5e19a93369c8c20a39babe5636a309292
author | bgruening |
---|---|
date | Tue, 29 May 2018 15:34:31 -0400 |
parents | f2068690addc |
children |
2:f2068690addc | 3:22ec3c1a20cd |
---|---|
34 <param argument="--header" type="boolean" checked="false" truevalue="--header" falsevalue="" label="Does the input files contain a header line" /> | 34 <param argument="--header" type="boolean" checked="false" truevalue="--header" falsevalue="" label="Does the input files contain a header line" /> |
35 <param argument="--add_distance" type="boolean" checked="false" truevalue="--add_distance" falsevalue="" label="Add an addional column with the calculated distance." /> | 35 <param argument="--add_distance" type="boolean" checked="false" truevalue="--add_distance" falsevalue="" label="Add an addional column with the calculated distance." /> |
36 | 36 |
37 <param name="merge_mode_select" type="select" label="Choose the mode of merging."> | 37 <param name="merge_mode_select" type="select" label="Choose the mode of merging."> |
38 <option value="closest" selected="True">Best match (in case of multiple best matches, only the first one is reported)</option> | 38 <option value="closest" selected="True">Best match (in case of multiple best matches, only the first one is reported)</option> |
39 <option value="distance">Matching with a defined distance</option> | 39 <option value="distance">All matches within the defined distance</option> |
40 </param> | 40 </param> |
41 <param name="units" display="radio" type="select" value="ppm_value" label="Choose the metrics of your distance" | 41 <param name="units" display="radio" type="select" value="ppm_value" label="Choose the metrics of your distance" |
42 help="ppm is useful for very small differences"> | 42 help="ppm is useful for very small differences"> |
43 <option value="absolute" selected="True">Absolute distance</option> | 43 <option value="absolute" selected="True">Absolute distance</option> |
44 <option value="ppm" >Distance in ppm</option> | 44 <option value="ppm" >Distance in ppm</option> |
115 </test> | 115 </test> |
116 </tests> | 116 </tests> |
117 <help> | 117 <help> |
118 <![CDATA[ | 118 <![CDATA[ |
119 | 119 |
120 Join two files on a common column. It is possible to provide an allowed difference between both values (currently only numbers) | 120 Join two files on a common column. It is necessary to provide an allowed difference between both values as the maximum absolute difference or maximum parts per million (ppm) for matching. |
121 as the absolute differece or as PPM. | |
122 | 121 |
123 Two modes are available: | 122 Two modes are available: |
124 | 123 |
125 1. In the **best match** mode only the rows are merged for the most similar (or identical) values. In case of multiple best matches, only the first one is reported. | 124 1) In the **best match** mode: For each value in file 1 only the best matching value of file 2 is reported. In case of multiple best matches, only the closest match is reported. |
125 2) In the **all matches** mode: All matches within the defined distance are reported. | |
126 | |
127 Be aware that file 1 is the template file and therefore the same value in file 2 can be matched to multiple values in file 1 | |
126 | 128 |
127 2. The **Matching with a defined distance** option will offer you the possibility | 129 |
128 to provide a distance between the two values of the columns. Is the calculates distance smaller or equal than the given distance the columns will be joined. You can specify the allowed distance as an absolute distance or as PPM. | 130 ------ |
131 | |
132 **Example** | |
133 | |
134 **Input file 1** :: | |
135 | |
136 1 | |
137 2 | |
138 3 | |
139 4 | |
140 5 | |
141 | |
142 | |
143 **Input file 2** :: | |
144 | |
145 1.1 | |
146 1.2 | |
147 2.2 | |
148 3.3 | |
149 4.4 | |
150 | |
151 | |
152 **Joined file1 and 2** with best match and absolute distance 0.3:: | |
153 | |
154 1 1.1 0.1 | |
155 2 2.2 0.2 | |
156 3 3.3 0.3 | |
157 | |
158 **Joined file1 and 2** with all matches and absolute distance 0.3:: | |
159 | |
160 1 1.1 0.1 | |
161 1 1.2 0.2 | |
162 2 2.2 0.2 | |
163 3 3.3 0.3 | |
129 | 164 |
130 | 165 |
131 ]]> | 166 ]]> |
132 </help> | 167 </help> |
133 <citations> | 168 <citations> |
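
The revised help text describes two merge modes and two distance units. The sketch below is one way to read those semantics and reproduces the help text's example. It is not the tool's actual implementation: the function names `distance` and `fuzzy_join` are hypothetical, and the ppm formula (difference relative to the file 1 value) is an assumption, since the diff does not define it.

```python
def distance(a, b, units="absolute"):
    """Distance between a file 1 value `a` and a file 2 value `b`."""
    diff = abs(a - b)
    if units == "ppm":
        # Assumption: ppm is taken relative to the file 1 value.
        # This fails for a == 0; the real tool may handle that differently.
        return diff / abs(a) * 1e6
    return diff


def fuzzy_join(file1, file2, max_distance, mode="closest", units="absolute"):
    """Join values of file1 (the template) against values of file2.

    mode="closest":  report only the best match within max_distance
                     (the first one wins if several are equally close).
    mode="distance": report all matches within max_distance.
    """
    rows = []
    for a in file1:
        # Keep every file 2 value whose distance is within the allowed maximum.
        hits = [(distance(a, b, units), b) for b in file2]
        hits = [(d, b) for d, b in hits if d <= max_distance]
        if mode == "closest" and hits:
            # min() returns the first minimal element, so ties keep file order.
            hits = [min(hits, key=lambda h: h[0])]
        # Round only for display; the comparison above uses the raw distance.
        rows.extend((a, b, round(d, 6)) for d, b in hits)
    return rows


file1 = [1, 2, 3, 4, 5]
file2 = [1.1, 1.2, 2.2, 3.3, 4.4]

for row in fuzzy_join(file1, file2, 0.3, mode="closest"):
    print(*row, sep="\t")
```

Because file 1 is the template, each file 1 value is matched independently, so the same file 2 value can appear in several output rows. Values of file 1 with no match inside the allowed distance (4 and 5 in the example) produce no row.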