Csvtk advanced filter (also called filter2) outputs rows that satisfy the given awk-like arithmetic/string expressions. See the documentation for further details and examples of how to write expressions.
Single quotes are not allowed in text inputs!
If the column header you want contains a space, use the column number instead. Example: use $1 if column #1 is called "Colony Counts".
Supported operators and types:
- Modifiers: + - / * & | ^ ** % >> <<
- Comparators: > >= < <= == != =~ !~
- Logical ops: || &&
- Numeric constants, as 64-bit floating point (12345.678)
- String constants (double quotes: "foobar")
- Date constants (double quotes)
- Boolean constants: true false
- Parentheses to control order of evaluation ( )
- Arrays (anything separated by , within parentheses: (1, 2, "foo"))
- Prefixes: ! - ~
- Ternary conditional: ? :
- Null coalescence: ??
**Limitations of Input Data**
1. The CSV parser requires all lines to have the same number of fields/columns. If your file has illegal rows, set the "Illegal Rows" parameter to "Yes" to pass your data through. Even lines with spaces will cause an error. See the example bad table below.
2. By default, csvtk assumes files have a header row. If your file does not, set the global parameter "Has Header Row" to "No".
3. Column names should be unique and are case sensitive!
4. Lines starting with "#" or "$" will be ignored, if in the header row.
5. If " exists in tab-delimited files, set the "Lazy quotes" global parameter to "Yes".
Example bad table:
Head 1 | Head 2 | Head 3 | Head 3 |
---|---|---|---|
1 | 2 | 3 | |
this | will | break |
Bad tables may work if both the "Ignore Illegal Rows" and "Ignore Empty Rows" global parameters are set to "Yes", but there is no guarantee of that!
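To check a file for uneven rows before running the tool, a short Python sketch like the one below can help. It is not part of csvtk or this tool; the function name find_ragged_rows and the comma-delimited sample are assumptions made for illustration, and the sample only mirrors the shape of the bad table above.

```python
import csv
import io


def find_ragged_rows(text, delimiter=","):
    """Return (row_number, field_count) for rows whose width differs from the first row."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return []
    expected = len(rows[0])  # the first row sets the expected number of fields
    return [
        (number, len(row))
        for number, row in enumerate(rows, start=1)
        if len(row) != expected
    ]


# Hypothetical sample: row 3 has three fields instead of four.
sample = "Head 1,Head 2,Head 3,Head 4\n1,2,3,\nthis,will,break\n"
print(find_ragged_rows(sample))  # [(3, 3)]
```

An empty result means every row has the same number of fields as the first row. Note that a trailing empty field (as in row 2 of the sample) still counts as a field.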
Ex1. Filter2 on one column:
Suppose we had the following table:
Culture Label | Cell Count | Dilution |
---|---|---|
ECo-1 | 2523 | 1000 |
LPn-1 | 100 | 1000000 |
LPn-2 | 4 | 1000 |
If we wanted to find all samples with the label LPn, we could use the filter expression '$1 =~ "LPn*"' to get the following output:
Culture Label | Cell Count | Dilution |
---|---|---|
LPn-1 | 100 | 1000000 |
LPn-2 | 4 | 1000 |
Note how $1 was used to select column 1 because its name contains a space.
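For readers who want to see what this filter does row by row, here is a rough Python equivalent. It is a sketch only, assuming that =~ performs an unanchored regular-expression match; the function name filter_first_column is made up for this example and csvtk itself is not involved.

```python
import csv
import io
import re

TABLE = """Culture Label,Cell Count,Dilution
ECo-1,2523,1000
LPn-1,100,1000000
LPn-2,4,1000
"""


def filter_first_column(text, pattern):
    """Keep the header plus every row whose first field matches the pattern."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    regex = re.compile(pattern)
    # search() is unanchored: the pattern may match anywhere in the field (assumed =~ behavior)
    kept = [row for row in reader if regex.search(row[0])]
    return [header] + kept


result = filter_first_column(TABLE, "LPn*")
for row in result:
    print(",".join(row))
```

Under that assumption, note that in regular-expression syntax "LPn*" means "LP" followed by zero or more "n", so it would also match a label such as "LP-3". A pattern like "^LPn" is stricter if only labels beginning with LPn are wanted.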
Ex2. Filter2 with multiple inputs:
Using the same input table:
Culture Label | Cell Count | Dilution |
---|---|---|
ECo-1 | 2523 | 1000 |
LPn-1 | 100 | 1000000 |
LPn-2 | 4 | 1000 |
Now if we use the expression '$1 =~ "LPn*" && $Dilution > 1000' to filter on, we would pull out the only row that satisfies both conditions:
Culture Label | Cell Count | Dilution |
---|---|---|
LPn-1 | 100 | 1000000 |
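The combined condition can be pictured with a small Python sketch. This is only an illustration of the logic, assuming =~ is an unanchored regular-expression match and that Dilution is compared as a number; the function name filter_rows is hypothetical.

```python
import csv
import io
import re

TABLE = """Culture Label,Cell Count,Dilution
ECo-1,2523,1000
LPn-1,100,1000000
LPn-2,4,1000
"""


def filter_rows(text):
    """Keep rows where column 1 matches "LPn*" AND Dilution is greater than 1000."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        row
        for row in reader
        if re.search("LPn*", row["Culture Label"])  # $1 =~ "LPn*"
        and float(row["Dilution"]) > 1000           # $Dilution > 1000
    ]


kept = filter_rows(TABLE)
print([row["Culture Label"] for row in kept])  # ['LPn-1']
```

LPn-2 matches the label pattern but is dropped because its dilution is exactly 1000, which is not greater than 1000.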
Multiple names can be given if separated by a ' , '.
- ex. 'ID,Organism' would target the columns named ID and Organism for the function
Column names are case sensitive.
Column numbers can also be given:
- ex. '1,2,3' or '1-3' to select columns 1-3.
You can also specify all but unwanted column(s) with a ' - '.
- ex. '-ID' would target all columns but the ID column
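The selection rules above can be summarized in a short Python sketch. This is not how csvtk parses its field argument; resolve_columns is a hypothetical helper written only to show how names, numbers, ranges and exclusions could map onto column positions, and it assumes a header name is never purely numeric.

```python
def resolve_columns(spec, header):
    """Resolve 'ID,Organism', '1,2,3', '1-3' or '-ID' to 0-based column indices."""
    include, exclude = [], []
    for raw in spec.split(","):
        token = raw.strip()
        if not token:
            continue
        negate = token.startswith("-")  # a leading '-' marks a column to leave out
        if negate:
            token = token[1:]
        parts = token.split("-")
        if len(parts) == 2 and all(p.isdigit() for p in parts):
            # numeric range such as '1-3' (1-based, inclusive)
            indices = list(range(int(parts[0]) - 1, int(parts[1])))
        elif token.isdigit():
            indices = [int(token) - 1]
        else:
            # name lookup is case sensitive; an unknown name raises ValueError
            indices = [header.index(token)]
        (exclude if negate else include).extend(indices)
    if exclude and not include:
        include = list(range(len(header)))  # "all but the unwanted columns"
    return [i for i in include if i not in exclude]


header = ["ID", "Organism", "Count"]
print(resolve_columns("ID,Organism", header))  # [0, 1]
print(resolve_columns("1-3", header))          # [0, 1, 2]
print(resolve_columns("-ID", header))          # [1, 2]
```

Because the name lookup is case sensitive, a selection such as 'id' fails against a header named ID in this sketch.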
For information from the creators of csvtk, please visit their site at: https://bioinf.shenwei.me/csvtk/
Be aware, however, that some features may not be available and that some small changes were made to work with Galaxy.
Notable changes from their documentation: