Csvtk-replace is a tool that uses regular expressions (regex) to match data in the specified column and replace it with the replacement string. Non-matched columns can be kept or filled with the regex key or an input string.
The regex input for this tool is structured so that your regular expression does not need to start with quotes or brackets. You can start your expression with a ^ or just go straight into it.
For example:
Using `.+` as an input would be used in the code as '(.+)'.
Using `^(.+)$` as an input would be used in the code as '(^(.+)$)'.
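The wrapping described above can be sketched in Python. This is only an illustration: `wrap_pattern` is a hypothetical helper, not part of csvtk or the Galaxy wrapper, and Python's `re` module is assumed to behave like the tool's regex engine for these simple patterns.

```python
import re


def wrap_pattern(user_input: str) -> str:
    """Hypothetical helper: wrap the user's regex in one outer capture group."""
    return f"({user_input})"


# The outer group means group 1 always holds the whole match.
pattern = wrap_pattern(".+")
match = re.search(pattern, "Dog")
print(pattern)         # (.+)
print(match.group(1))  # Dog

# A pattern that already has its own group keeps it as group 2.
print(wrap_pattern("^(.+)$"))  # (^(.+)$)
```

Because the outer group is added for you, `$1` in a replacement string refers to the entire matched text even when your input contains no parentheses.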
Single quotes are not allowed in text inputs!
**Limitations of Input Data**
1. The CSV parser requires that all lines have the same number of fields/columns. Even lines containing only spaces will cause an error. If your file has illegal rows, set the "Illegal Rows" parameter to "Yes" to pass your data through. An example bad table is shown below.
2. By default, csvtk assumes that files have a header row. If your file does not, set the global parameter "Has Header Row" to "No".
3. Column names should be unique and are case sensitive!
4. Lines starting with "#" or "$" will be ignored if they are in the header row.
5. If a double quote (") exists in a tab-delimited file, set the "Lazy quotes" global parameter to "Yes".
Example bad table:
Head 1 | Head 2 | Head 3 | Head 3 |
---|---|---|---|
1 | 2 | 3 | |
this | will | break |
Bad tables may work if both the "Ignore Illegal Rows" and "Ignore Empty Rows" global parameters are set to "Yes", but there is no guarantee of that!
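If you want to check a file before uploading it, a rough pre-check along the following lines may help. It is a minimal sketch using Python's standard `csv` module, not csvtk's own parser, so edge cases such as quoting may be handled differently. `find_ragged_rows` and the sample data are made up for illustration.

```python
import csv
import io


def find_ragged_rows(text: str, delimiter: str = ","):
    """Return (line_number, field_count) for rows whose field count
    differs from the first row. Hypothetical helper, not part of csvtk."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return []
    expected = len(rows[0])
    return [
        (number, len(row))
        for number, row in enumerate(rows, start=1)
        if len(row) != expected
    ]


# Illustrative data: the last row is one field short.
sample = "Head 1,Head 2,Head 3,Head 4\n1,2,3,\nthis,will,break\n"
print(find_ragged_rows(sample))
```

An empty result means every row has the same number of fields as the first row. Any row reported here is likely to be treated as illegal by the parser.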
You can use csvtk replace to replace any text matched by your regular expression with your input replacement string.
The replacement string has some unique properties that you can also use to better replace your data:
A good regular expression cheat sheet to help you build regular expressions can be found at: https://regexr.com/
Replace Examples
Input file:
Name | Animal |
---|---|
Bud | Dog |
Mittens | Cat |
Now if our regex was set to '.*' on column 2 and our replacement string was set to '{nr}-$1', the following output would be observed:
Name | Animal |
---|---|
Bud | 1-Dog |
Mittens | 2-Cat |
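To make the behaviour of '{nr}-$1' concrete, here is a small Python emulation of the example above. It is a sketch under assumptions, not csvtk's implementation: `replace_column` is a hypothetical function, `{nr}` is taken to be the 1-based record number (as the output table suggests), and `$1` is translated into Python's `\g<1>` syntax.

```python
import re


def replace_column(rows, column, pattern, replacement):
    """Emulate the example on data rows (no header); column is 0-based.
    Hypothetical helper for illustration only."""
    regex = re.compile(f"({pattern})")  # outer group added, as the tool does
    result = []
    for nr, row in enumerate(rows, start=1):
        # Substitute the record number, then convert $N to Python's \g<N>.
        repl = replacement.replace("{nr}", str(nr))
        repl = re.sub(r"\$(\d+)", r"\\g<\1>", repl)
        new_row = list(row)
        # count=1: Python would otherwise also replace the empty match
        # that '.*' finds at the end of the string.
        new_row[column] = regex.sub(repl, row[column], count=1)
        result.append(new_row)
    return result


rows = [["Bud", "Dog"], ["Mittens", "Cat"]]
for row in replace_column(rows, 1, ".*", "{nr}-$1"):
    print(" | ".join(row))
```

Under these assumptions the loop prints the same two data rows as the output table above.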
Suppose you set up a key-value TAB separated file that looked as such:
Key | Value |
---|---|
Dog | Big |
Cat | Small |
And had a similar input file:
Name | Animal |
---|---|
Bud | Dog |
Mittens | Cat |
Fuzzy | Gerbil |
Now, if the regex was '.*' on column 2 with the replacement string set to '{kv}', your output would look like this with 'No' fill specified:
Name | Animal |
---|---|
Bud | Big |
Mittens | Small |
Fuzzy | |
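The '{kv}' lookup in this example can be emulated with a plain dictionary. The sketch below is illustrative only: `replace_with_kv` and its `key_miss_fill` argument are hypothetical names, and the captured text is assumed to be used as the lookup key, with an empty string standing in for the 'No' fill setting.

```python
import re


def replace_with_kv(values, pattern, kv, key_miss_fill=""):
    """Replace the first match in each value with its key-value lookup.
    Hypothetical helper; key_miss_fill is used when the key is not found."""
    regex = re.compile(f"({pattern})")  # outer group added, as the tool does
    result = []
    for value in values:
        match = regex.search(value)
        if match is None:
            result.append(value)  # no match: leave the cell unchanged
            continue
        key = match.group(1)
        looked_up = kv.get(key, key_miss_fill)
        result.append(value[:match.start()] + looked_up + value[match.end():])
    return result


kv = {"Dog": "Big", "Cat": "Small"}
print(replace_with_kv(["Dog", "Cat", "Gerbil"], ".*", kv))
# ['Big', 'Small', '']
```

"Gerbil" has no entry in the key-value data, so its cell comes back empty, matching the last row of the table above.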
If you wanted to fill the blank cell you could set it to either:
If you're having trouble with regular expressions, try experimenting with a regex builder. There are many others online, and they are great resources for improving your regex statements or testing them before use!
For information from the creators of csvtk, please visit their site at: https://bioinf.shenwei.me/csvtk/
Be aware, however, that some features may not be available and that some small changes were made to work with Galaxy.
Notable changes from their documentation: