changeset 3:22ec3c1a20cd draft default tip
planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/text_processing/join_files_on_column_fuzzy commit 3419a5a5e19a93369c8c20a39babe5636a309292
author | bgruening |
---|---|
date | Tue, 29 May 2018 15:34:31 -0400 |
parents | f2068690addc |
children | |
files | join_files_on_column_fuzzy.xml |
diffstat | 1 files changed, 41 insertions(+), 6 deletions(-) |
    --- a/join_files_on_column_fuzzy.xml Fri Apr 13 03:30:12 2018 -0400
    +++ b/join_files_on_column_fuzzy.xml Tue May 29 15:34:31 2018 -0400
    @@ -36,7 +36,7 @@
     <param name="merge_mode_select" type="select" label="Choose the mode of merging.">
     <option value="closest" selected="True">Best match (in case of multiple best matches, only the first one is reported)</option>
    -<option value="distance">Matching with a defined distance</option>
    +<option value="distance">All matches within the defined distance</option>
     </param>
     <param name="units" display="radio" type="select" value="ppm_value" label="Choose the metrics of your distance" help="ppm is useful for very small differences">
    @@ -117,15 +117,50 @@
     <help>
     <![CDATA[
    -Join two files on a common column. It is possible to provide an allowed difference between both values (currently only numbers)
    -as the absolute differece or as PPM.
    +Join two files on a common column. It is necessary to provide an allowed difference between both values as the maximum absolute difference or maximum parts per million (ppm) for matching.
     Two modes are available:

    - 1. In the **best match** mode only the rows are merged for the most similar (or identical) values. In case of multiple best matches, only the first one is reported.
    +1) In the **best match** mode: For each value in file 1 only the best matching value of file 2 is reported. In case of multiple best matches, only the closest match is reported.
    +2) In the **all matches** mode: All matches within the defined distance are reported.
    +
    +Be aware that file 1 is the template file and therefore the same value in file 2 can be matched to multiple values in file 1
    +
    +
    +------
    +
    +**Example**
    +
    +**Input file 1** ::
    +
    + 1
    + 2
    + 3
    + 4
    + 5
    +

    - 2. The **Matching with a defined distance** option will offer you the possibility
    - to provide a distance between the two values of the columns. Is the calculates distance smaller or equal than the given distance the columns will be joined. You can specify the allowed distance as an absolute distance or as PPM.
    +**Input file 2** ::
    +
    + 1.1
    + 1.2
    + 2.2
    + 3.3
    + 4.4
    +
    +
    +**Joined file1 and 2** with best match and absolute distance 0.3::
    +
    + 1 1.1 0.1
    + 2 2.2 0.2
    + 3 3.3 0.3
    +
    +**Joined file1 and 2** with all matches and absolute distance 0.3::
    +
    + 1 1.1 0.1
    + 1 1.2 0.2
    + 2 2.2 0.2
    + 3 3.3 0.3

     ]]>
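The two merge modes described in the new help text can be illustrated with a short Python sketch that reproduces the example tables from the changeset. This is not the tool's actual implementation: the function names `distance` and `fuzzy_join` are hypothetical, the ppm formula (difference relative to the file 1 value) is an assumption, and dropping file 1 values that have no match is inferred only from the example output. The mode strings `closest` and `distance` mirror the option values in the XML.

```python
def distance(a, b, unit="absolute"):
    """Distance between a (value from file 1) and b (value from file 2)."""
    diff = abs(a - b)
    if unit == "ppm":
        # Assumption: ppm is taken relative to the file 1 value.
        # Undefined for a == 0 (raises ZeroDivisionError).
        return diff / abs(a) * 1e6
    return diff


def fuzzy_join(values1, values2, max_distance, mode="closest", unit="absolute"):
    """Join values1 (the template file) against values2 within max_distance."""
    rows = []
    for a in values1:
        # Keep every file 2 value that lies within the allowed distance.
        hits = [(b, distance(a, b, unit)) for b in values2]
        hits = [(b, d) for b, d in hits if d <= max_distance]
        if not hits:
            continue  # no match: the file 1 value is not reported
        if mode == "closest":
            # min() returns the first minimal element, so ties go to the
            # earliest file 2 row.
            hits = [min(hits, key=lambda hit: hit[1])]
        # Round only for display; the comparison above uses the raw distance.
        rows.extend((a, b, round(d, 6)) for b, d in hits)
    return rows


file1 = [1, 2, 3, 4, 5]
file2 = [1.1, 1.2, 2.2, 3.3, 4.4]

best = fuzzy_join(file1, file2, 0.3, mode="closest")
all_hits = fuzzy_join(file1, file2, 0.3, mode="distance")

for row in all_hits:
    print(*row, sep="\t")
```

With an absolute distance of 0.3, `best` contains three rows and `all_hits` four, matching the two joined tables in the help text. Because the outer loop runs over file 1, the same file 2 value can appear in several output rows, which is the behaviour the "template file" note warns about.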