What it does
This tool runs the Unix awk command on the selected data file.
TIP:
This tool uses the extended regular expression syntax (not the Perl syntax).
Perl shorthand classes such as \d, \w and \s are not supported.
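As a small sketch of working around the missing shorthand classes, an explicit bracket expression such as [0-9] can stand in for Perl's \d. The sample lines and the shell variable names `input` and `result` below are invented for illustration; the AWK program itself is the part between the single quotes.

```shell
# Invented sample input, one value per line.
input='id42
idx
7up'

# [0-9] is used where a Perl regular expression would use \d.
# The rule has no action, so matching lines are printed unchanged.
result=$(printf '%s\n' "$input" | awk '/^id[0-9]+$/')
echo "$result"
```

Only the line made of "id" followed by one or more digits is printed.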
Further reading
AWK programs
Most AWK programs consist of patterns (i.e. rules that match lines of text) and actions (i.e. commands to execute when a pattern matches a line).
The basic form of an AWK program is:
pattern { action 1; action 2; action 3; }
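A minimal sketch of this form, run from a shell, might look like the following. The sample data and the names `input`, `result` and `total` are assumptions made for the example, not part of the tool.

```shell
# Invented sample input: a name and a score separated by whitespace.
input='alice 12
bob 7
carol 30'

# One rule: the pattern ($2 > 10) selects lines, and the two actions
# inside the braces run, in order, for every selected line.
result=$(printf '%s\n' "$input" | awk '$2 > 10 { total += $2; print $1, total; }')
echo "$result"
```

Here the line for bob does not match the pattern, so neither action runs for it and the running total skips its score.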
Pattern Examples
- $2 == "chr3" will match lines whose second column is the string 'chr3'
- $5-$4>23 will match lines where subtracting the value of the fourth column from the value of the fifth column gives a value larger than 23.
- /AG..AG/ will match lines that contain the regular expression AG..AG (meaning the characters AG followed by any two characters followed by AG). (This is the way to specify regular expressions on the entire line, similar to grep.)
- $7 ~ /A{4}U/ will match lines whose seventh column contains 4 consecutive A's followed by a U. (This is the way to specify regular expressions on a specific field.)
- 10000 < $4 && $4 < 20000 will match lines whose fourth column value is larger than 10,000 but smaller than 20,000.
- BEGIN is a special pattern: its action is executed only once, before the first input record is read.
- If no pattern is specified, all lines match (meaning the action part will be executed on all lines).
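The pattern types listed above can be tried out with a short shell sketch. The tab-separated sample records and all shell variable names are invented for the example; each awk call prints the first column of the lines its pattern selects.

```shell
# Invented tab-separated sample: name, chromosome, strand, start, end.
input=$(printf 'g1\tchr3\t+\t15000\t15010\ng2\tchr1\t-\t500\t600\ng3\tchr3\t+\t30000\t30100\n')

# String comparison on a field: second column equals "chr3".
by_chrom=$(printf '%s\n' "$input" | awk '$2 == "chr3" { print $1 }')

# Arithmetic comparison: end minus start is larger than 23.
by_length=$(printf '%s\n' "$input" | awk '$5 - $4 > 23 { print $1 }')

# Two comparisons combined with &&: start between 10,000 and 20,000.
by_range=$(printf '%s\n' "$input" | awk '10000 < $4 && $4 < 20000 { print $1 }')

# BEGIN runs once before any input; the rule without a pattern runs on every line.
with_header=$(printf '%s\n' "$input" | awk 'BEGIN { print "name" } { print $1 }')

printf '%s\n' "$by_chrom" "$by_length" "$by_range" "$with_header"
```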
Action Examples
- { print } or { print $0 } will print the entire input line (the line that matched the pattern). $0 is a special marker meaning 'the entire line'.
- { print $1, $4, $5 } will print only the first, fourth and fifth fields of the input line.
- { print $4, $5-$4 } will print the fourth column and the difference between the fifth and fourth columns. (If the fourth column is the start position in the input file and the fifth column is the end position, the output file will contain the start position and the length.)
- { FS = "," } can be used to change the field separator (delimiter) for parsing the input file. A new FS value applies to lines read after the assignment, so it is normally set in a BEGIN rule.
- If no action part is specified (not even the curly brackets) - the default action is to print the entire line.
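A short sketch of these actions, assuming an invented comma-separated input with a name, a start position and an end position (the shell variable names are likewise made up for the example):

```shell
# Invented comma-separated sample: name, start, end.
input='a,100,150
b,200,260'

# FS is set in a BEGIN rule so that it already applies to the first line.
# The action prints the start position and the length (end minus start).
lengths=$(printf '%s\n' "$input" | awk 'BEGIN { FS = "," } { print $2, $3 - $2 }')

# A pattern with no action part: the default action prints the whole line.
whole_line=$(printf '%s\n' "$input" | awk 'BEGIN { FS = "," } $2 > 150')

printf '%s\n' "$lengths" "$whole_line"
```

Note that print separates its comma-separated arguments with a space by default, while the default action reproduces the input line unchanged, commas included.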
AWK's Regular Expression Syntax
AWK patterns can select lines that contain, or do not contain, a match to a given regular expression. A regular expression is a pattern describing a certain amount of text.
- ( ) { } [ ] . * ? + ^ $ are all special characters. \ can be used to "escape" a special character, allowing that special character to be searched for.
- ^ matches the beginning of a string (but not the beginning of an internal line within it).
- ( .. ) groups a particular pattern.
- {n}, {n,} or {n,m} specifies an expected number of repetitions of the preceding pattern.
  - {n} The preceding item is matched exactly n times.
  - {n,} The preceding item is matched n or more times.
  - {n,m} The preceding item is matched at least n times but not more than m times.
- [ ... ] creates a character class. Within the brackets, single characters can be placed. A dash (-) may be used to indicate a range such as a-z.
- . Matches any single character except a newline.
- * The preceding item will be matched zero or more times.
- ? The preceding item is optional and matched at most once.
- + The preceding item will be matched one or more times.
- ^ has two meanings:
  - matches the beginning of a line or string.
  - indicates negation in a character class. For example, [^...] matches every character except the ones inside brackets.
- $ matches the end of a line or string.
- | Separates alternate possibilities.
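Several of these operators can be seen at work in the sketch below. The sample lines and shell variable names are invented for illustration, and each rule has no action, so matching lines are printed as they are. Repetition counts in braces are left out of the sketch on the assumption that some older awk implementations do not support them.

```shell
# Invented sample input, one string per line.
input='AGTTAG
AGAG
cat
caat
ct'

# . matches any single character: AG, any two characters, then AG.
dots=$(printf '%s\n' "$input" | awk '/AG..AG/')

# ? makes the preceding item optional; ^ and $ anchor the match to the whole line.
optional=$(printf '%s\n' "$input" | awk '/^ca?t$/')

# + requires the preceding item one or more times.
plus=$(printf '%s\n' "$input" | awk '/^ca+t$/')

# [^...] is a negated character class: lines made only of characters other than A, G and T.
negated=$(printf '%s\n' "$input" | awk '/^[^AGT]+$/')

# ( ) groups a pattern and | separates the alternatives inside the group.
either=$(printf '%s\n' "$input" | awk '/^(AGAG|ct)$/')

printf '%s\n' "$dots" "$optional" "$plus" "$negated" "$either"
```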