comparison easyjoin.xml @ 0:5314e5d6f040 draft
Imported from capsule None
author | bgruening |
---|---|
date | Thu, 29 Jan 2015 07:53:17 -0500 |
parents | |
children | 43b1f073b693 |
-1:000000000000 | 0:5314e5d6f040 |
---|---|
1 <tool id="tp_easyjoin_tool" name="Join" version="@BASE_VERSION@.0"> | |
2 <description>two files</description> | |
3 <macros> | |
4 <import>macros.xml</import> | |
5 </macros> | |
6 <expand macro="requirements"> | |
7 <requirement type="set_environment">TP_SCRIPT_PATH</requirement> | |
8 </expand> | |
9 <version_command>join --version | head -n 1</version_command> | |
10 <command> | |
11 <![CDATA[ | |
12 cp \$TP_SCRIPT_PATH/sort-header ./ && | |
13 chmod +x sort-header && | |
14 perl \$TP_SCRIPT_PATH/easyjoin | |
15 $jointype | |
16 -t ' ' | |
17 $header | |
18 -e '$empty_string_filler' | |
19 -o auto | |
20 $ignore_case | |
21 -1 '$column1' | |
22 -2 '$column2' | |
23 "$infile1" | |
24 "$infile2" | |
25 > '$output' | |
26 ]]> | |
27 </command> | |
28 <inputs> | |
29 <param name="infile1" format="tabular" type="data" label="1st file" /> | |
30 <param name="column1" label="Column to use from 1st file" type="data_column" data_ref="infile1" accept_default="true" /> | |
31 | |
32 <param name="infile2" format="txt" type="data" label="2nd file" /> | |
33 <param name="column2" label="Column to use from 2nd file" type="data_column" data_ref="infile2" accept_default="true" /> | |
34 | |
35 <param name="jointype" type="select" label="Output lines appearing in"> | |
36 <option value=" " selected="True">Both 1st &amp; 2nd file.</option> | |
37 <option value="-v 1">1st but not in 2nd file. (-v 1)</option> | |
38 <option value="-v 2">2nd but not in 1st file. (-v 2)</option> | |
39 <option value="-a 1">Both 1st &amp; 2nd file, plus unpairable lines from 1st file. (-a 1)</option> | |
40 <option value="-a 2">Both 1st &amp; 2nd file, plus unpairable lines from 2nd file. (-a 2)</option> | |
41 <option value="-a 1 -a 2">All lines [-a 1 -a 2]</option> | |
42 <option value="-v 1 -v 2">All unpairable lines [-v 1 -v 2]</option> | |
43 </param> | |
44 | |
45 <param name="header" type="boolean" checked="false" truevalue="--header" falsevalue="" | |
46 label="First line is a header line" help="Use if first line contains column headers. It will not be sorted." /> | |
47 <param name="ignore_case" type="boolean" checked="false" truevalue="-i" falsevalue="" | |
48 label="Ignore case" help="Sort and join on the key column values without regard to upper/lower case." /> | |
49 <param name="empty_string_filler" type="text" size="20" value="0" label="Value to put in unpaired (empty) fields"> | |
50 <sanitizer> | |
51 <valid initial="string.printable"> | |
52 <remove value="'"/> | |
53 </valid> | |
54 </sanitizer> | |
55 </param> | |
56 </inputs> | |
57 <outputs> | |
58 <data name="output" format_source="infile1" metadata_source="infile1"/> | |
59 </outputs> | |
60 <tests> | |
61 <test> | |
62 <param name="infile1" value="easyjoin1.tabular" /> | |
63 <param name="column1" value="1" /> | |
64 <param name="infile2" value="easyjoin2.tabular" /> | |
65 <param name="column2" value="1" /> | |
66 <param name="header" value="True" /> | |
67 <param name="jointype" value="-a 1 -a 2" /> | |
68 <output name="output" file="easyjoin_result1.tabular" /> | |
69 </test> | |
70 </tests> | |
71 <help> | |
72 <![CDATA[ | |
73 **What it does** | |
74 | |
75 This tool joins two tabular files based on a common key column. | |
76 | |
77 ----- | |
78 | |
79 **Example** | |
80 | |
81 **First file**:: | |
82 | |
83 Fruit Color | |
84 Apple red | |
85 Banana yellow | |
86 Orange orange | |
87 Melon green | |
88 | |
89 **Second file**:: | |
90 | |
91 Fruit Price | |
92 Orange 7 | |
93 Avocado 8 | |
94 Apple 4 | |
95 Banana 3 | |
96 | |
97 **Joining** both files, using **key column 1** and a **header line**, will return:: | |
98 | |
99 Fruit Color Price | |
100 Apple red 4 | |
101 Avocado . 8 | |
102 Banana yellow 3 | |
103 Melon green . | |
104 Orange orange 7 | |
105 | |
106 .. class:: infomark | |
107 | |
108 * Input files need not be sorted. | |
109 * The header line (**Fruit Color Price**) was joined and kept as the first line. | |
110 * Missing values (e.g. Avocado's color, which is absent from the first file) are replaced with the filler value, a period character in this example. | |
111 | |
112 @REFERENCES@ | |
113 ]]> | |
114 </help> | |
115 </tool> |
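
The join described in the help text can be illustrated with a small Python sketch. This is not the tool's implementation (the tool calls the Perl `easyjoin` script, which wraps `sort` and `join`); it only mimics the "All lines" mode (`-a 1 -a 2`) with a header line and `.` as the filler. The function name `outer_join` and its parameters are hypothetical, and the sketch assumes each key occurs at most once per file.

```python
def outer_join(rows1, rows2, key1=0, key2=0, filler="."):
    """Full outer join of two tables (lists of rows), each with a header row.

    Illustrative only: assumes unique keys per table, unlike join(1),
    which pairs every matching line from both files.
    """
    header1, *body1 = rows1
    header2, *body2 = rows2

    # Positions of the non-key columns in each table.
    rest1 = [i for i in range(len(header1)) if i != key1]
    rest2 = [i for i in range(len(header2)) if i != key2]

    # Index data rows by key; the header rows are kept out of the sort.
    index1 = {row[key1]: row for row in body1}
    index2 = {row[key2]: row for row in body2}

    # The joined header comes first, as with the tool's --header option.
    out = [[header1[key1]]
           + [header1[i] for i in rest1]
           + [header2[i] for i in rest2]]

    # Emit one row per key, in sorted key order, filling unpaired fields.
    for key in sorted(index1.keys() | index2.keys()):
        left = index1.get(key)
        right = index2.get(key)
        out.append([key]
                   + [left[i] if left else filler for i in rest1]
                   + [right[i] if right else filler for i in rest2])
    return out


first = [["Fruit", "Color"], ["Apple", "red"], ["Banana", "yellow"],
         ["Orange", "orange"], ["Melon", "green"]]
second = [["Fruit", "Price"], ["Orange", "7"], ["Avocado", "8"],
          ["Apple", "4"], ["Banana", "3"]]

result = outer_join(first, second)
for row in result:
    print("\t".join(row))
```

With the two example files from the help text as input, the printed rows correspond to the joined table shown there, including the `.` placeholders for Avocado's color and Melon's price.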