annotate tools/unix_tools/awk_tool.xml @ 0:9071e359b9a3

Uploaded
author xuebing
date Fri, 09 Mar 2012 19:37:19 -0500

<tool id="cshl_awk_tool" name="awk">
  <description></description>
  <command interpreter="sh">awk_wrapper.sh $input $output '$file_data' '$FS' '$OFS'</command>
  <inputs>
    <param format="txt" name="input" type="data" label="File to process" />

    <param name="FS" type="select" label="Input field-separator">
      <option value=",">comma (,)</option>
      <option value=":">colons (:) </option>
      <option value=" ">single space</option>
      <option value=".">dot (.)</option>
      <option value="-">dash (-)</option>
      <option value="|">pipe (|)</option>
      <option value="_">underscore (_)</option>
      <option selected="True" value="tab">tab</option>
    </param>

    <param name="OFS" type="select" label="Output field-separator">
      <option value=",">comma (,)</option>
      <option value=":">colons (:)</option>
      <option value=" ">space ( )</option>
      <option value="-">dash (-)</option>
      <option value=".">dot (.)</option>
      <option value="|">pipe (|)</option>
      <option value="_">underscore (_)</option>
      <option selected="True" value="tab">tab</option>
    </param>

    <!-- Note: the parameter name MUST BE 'url_paste' -
         This is a hack in the galaxy library (see ./lib/galaxy/util/__init__.py line 142)
         If the name is 'url_paste' the string won't be sanitized, and all the non-alphanumeric characters
         will be passed to the shell script -->
    <param name="file_data" type="text" area="true" size="5x35" label="AWK Program" help="">
      <validator type="expression" message="Invalid Program!">value.find('\'')==-1</validator>
    </param>

  </inputs>
  <tests>
    <test>
      <param name="input" value="unix_awk_input1.txt" />
      <output name="output" file="unix_awk_output1.txt" />
      <param name="FS" value="tab" />
      <param name="OFS" value="tab" />
      <param name="file_data" value="$2>0.5 { print $2*9, $1 }" />
    </test>
  </tests>
  <outputs>
    <data format="input" name="output" metadata_source="input" />
  </outputs>
  <help>

**What it does**

This tool runs the Unix **awk** command on the selected data file.

.. class:: infomark

**TIP:** This tool uses the **extended regular expression** syntax (not the Perl syntax).


**Further reading**

- Awk by Example (http://www.ibm.com/developerworks/linux/library/l-awk1.html)
- Long AWK tutorial (http://www.grymoire.com/Unix/Awk.html)
- Learn AWK in 1 hour (http://www.selectorweb.com/awk.html)
- awk cheat-sheet (http://cbi.med.harvard.edu/people/peshkin/sb302/awk_cheatsheets.pdf)
- Collection of useful awk one-liners (http://student.northpark.edu/pemente/awk/awk1line.txt)

-----

**AWK programs**

Most AWK programs consist of **patterns** (i.e. rules that match lines of text) and **actions** (i.e. commands to execute when a pattern matches a line).

The basic form of an AWK program is::

    pattern { action 1; action 2; action 3; }
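
As a rough sketch of this form, the program from this tool's own test case can be tried directly in a shell. The sample rows are made up for illustration, and passing the separators with ``-v FS`` and ``-v OFS`` is only an assumption about how a wrapper might call awk, not a description of ``awk_wrapper.sh``:

```shell
# Hypothetical tab-separated input: a name in column 1, a score in column 2.
sample=$(printf 'geneA\t0.9\ngeneB\t0.2\ngeneC\t1.5\n')

# pattern: $2>0.5    action: print column 2 times 9, then column 1.
# FS and OFS are both set to a tab, matching the tool's default selections.
basic_form=$(printf '%s\n' "$sample" | awk -v FS='\t' -v OFS='\t' '$2>0.5 { print $2*9, $1 }')

printf '%s\n' "$basic_form"
```

With these rows the command prints two tab-separated lines, one for geneA (8.1) and one for geneC (13.5). The geneB row is skipped because 0.2 is not larger than 0.5.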


**Pattern Examples**

- **$2 == "chr3"** will match lines whose second column is the string 'chr3'.
- **$5-$4>23** will match lines where subtracting the value of the fourth column from the value of the fifth column gives a value larger than 23.
- **/AG..AG/** will match lines that contain the regular expression **AG..AG** (meaning the characters AG followed by any two characters followed by AG). (This is the way to specify regular expressions on the entire line, similar to GREP.)
- **$7 ~ /A{4}U/** will match lines whose seventh column contains 4 consecutive A's followed by a U. (This is the way to specify regular expressions on a specific field.)
- **10000 &lt; $4 &amp;&amp; $4 &lt; 20000** will match lines whose fourth column value is larger than 10,000 but smaller than 20,000.
- If no pattern is specified, all lines match (meaning the **action** part will be executed on all lines).
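
A few of the patterns above can be checked in a shell with a small, made-up input. This is only a sketch: the rows, the column layout and the variable names are hypothetical, and awk is called directly rather than through this tool:

```shell
# Hypothetical tab-separated rows: name, chromosome, sequence, start, end.
rows=$(printf 'r1\tchr3\tTTAGCCAGTT\t100\t150\nr2\tchr1\tAGTAG\t200\t210\nr3\tchr3\tCCCC\t300\t310\n')

# Second column is exactly the string chr3.
by_string=$(printf '%s\n' "$rows" | awk -F '\t' '$2 == "chr3" { print $1 }')

# Fifth column minus fourth column is larger than 23.
by_arith=$(printf '%s\n' "$rows" | awk -F '\t' '$5-$4>23 { print $1 }')

# The line contains AG, then any two characters, then AG.
by_regex=$(printf '%s\n' "$rows" | awk '/AG..AG/ { print $1 }')

printf 'string match:\n%s\narithmetic match:\n%s\nregex match:\n%s\n' "$by_string" "$by_arith" "$by_regex"
```

Only r1 and r3 have chr3 in the second column, only r1 spans more than 23 positions, and only r1 contains AG, two characters, then AG (r2 has a single character between its two AG pairs).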


**Action Examples**

- **{ print }** or **{ print $0 }** will print the entire input line (the line that matched the **pattern**). **$0** is a special marker meaning 'the entire line'.
- **{ print $1, $4, $5 }** will print only the first, fourth and fifth fields of the input line.
- **{ print $4, $5-$4 }** will print the fourth column and the difference between the fifth and fourth columns. (If the fourth column is the start position in the input file and the fifth column is the end position, the output file will contain the start position and the length.)
- If no action part is specified (not even the curly brackets), the default action is to print the entire line.
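
The actions above can be sketched the same way. The input rows and variable names are hypothetical, and a tab is assumed for both the input and output field separators:

```shell
# Hypothetical tab-separated rows: name, chromosome, strand, start, end.
rows=$(printf 'r1\tchr3\t+\t100\t150\nr2\tchr1\t-\t200\t210\n')

# Print only the first, fourth and fifth fields.
picked=$(printf '%s\n' "$rows" | awk -F '\t' -v OFS='\t' '{ print $1, $4, $5 }')

# Print the start position and the length (end minus start).
lengths=$(printf '%s\n' "$rows" | awk -F '\t' -v OFS='\t' '{ print $4, $5-$4 }')

# A pattern with no action part prints every matching line unchanged.
default_action=$(printf '%s\n' "$rows" | awk -F '\t' '$2 == "chr1"')

printf '%s\n---\n%s\n---\n%s\n' "$picked" "$lengths" "$default_action"
```

The commas inside **print** are what insert the output field separator between values; without them the values would be joined with nothing in between.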


**AWK's Regular Expression Syntax**

This tool can select lines that contain, or do not contain, a match to a given pattern. A regular expression is a pattern describing a certain amount of text.

- **( ) { } [ ] . * ? + \ ^ $** are all special characters. **\\** can be used to "escape" a special character, allowing that special character to be searched for.
- **^** matches the beginning of a string (but not an internal line).
- **(** .. **)** groups a particular pattern.
- **{** n or n, or n,m **}** specifies an expected number of repetitions of the preceding pattern.

  - **{n}** The preceding item is matched exactly n times.
  - **{n,}** The preceding item is matched n or more times.
  - **{n,m}** The preceding item is matched at least n times but not more than m times.

- **[** ... **]** creates a character class. Within the brackets, single characters can be placed. A dash (-) may be used to indicate a range such as **a-z**.
- **.** Matches any single character.
- ***** The preceding item will be matched zero or more times.
- **?** The preceding item is optional and matched at most once.
- **+** The preceding item will be matched one or more times.
- **^** has two meanings:

  - matches the beginning of a line or string.
  - indicates negation in a character class. For example, [^...] matches every character except the ones inside brackets.

- **$** matches the end of a line or string.
- **\|** Separates alternate possibilities.


**Note**: AWK uses extended regular expression syntax, not Perl syntax. **\\d**, **\\w**, **\\s** etc. are **not** supported.
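
As an illustration of working without the Perl shorthands, a bracket expression such as **[0-9]** can stand in for a digit class. The sketch below uses a made-up one-column input and hypothetical variable names, and sticks to anchors, character classes, **+** and alternation:

```shell
# Hypothetical one-column input.
ids=$(printf 'chr1\nchrX\nchr22\nscaffold_7\nchr\n')

# [0-9]+ replaces the unsupported Perl-style digit shorthand;
# ^ and $ anchor the match to the whole line.
numbered=$(printf '%s\n' "$ids" | awk '/^chr[0-9]+$/')

# Alternation inside a group: chrX or chrY only.
sex_chroms=$(printf '%s\n' "$ids" | awk '/^chr(X|Y)$/')

# [^c] is a negated character class: lines whose first character is not c.
not_chr=$(printf '%s\n' "$ids" | awk '/^[^c]/')

printf '%s\n---\n%s\n---\n%s\n' "$numbered" "$sex_chroms" "$not_chr"
```

Here the bare line chr is rejected by the first pattern because **+** requires at least one digit.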

  </help>
</tool>