Galaxy | Tool Preview

csvtk-replace (version 0.20.0+galaxy0)
Input a TSV or CSV file to work on
Multiple columns can be given if separated by a comma (','). Column numbers can be given too; for example, '1,2' targets columns 1 and 2. See the help section below for more detail.
Regex to search the column for. Your input is wrapped as '(YOUR_INPUT_HERE)', so a regex of just a period is passed as '(.)'.
String to replace found data. Supports capture variables and special replacement symbols:
  • Capture variables: $1 represents the text of the first submatch
  • {nr} inserts a record number starting from 1
  • {kv} uses the corresponding value of the key (captured variable $n) from a key-value file
If using the special replacement symbols, the capture variable must be specified as ${1}!
Only specify a file if using {kv} in the replacement string. The file must be tab-delimited with one key/value pair per line. An example can be found in the help section below.
ABC == abc
csvtk Global Parameters

Csvtk - Replace Help

Info

Csvtk-replace is a tool that uses regular expressions (regex) to match data in the specified column and replace it with the replacement string. When {kv} is used, cells whose key has no match in the key-value file can be left blank, or filled with the original value or an input string.

The regex input for this tool is structured so that your regular expression does not need to start with quotes or brackets. You can start your expression with a ^ or just go straight into it.

For example:

An input of `.+` is used in the code as '(.+)'

An input of `^(.+)$` is used in the code as '(^(.+)$)'
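
The wrapping described above can be sketched in Python. This is only an illustration under stated assumptions: `wrap_pattern` is a hypothetical helper, not part of csvtk or the Galaxy wrapper, and Python's `re` module stands in for the regex engine that csvtk itself uses.

```python
import re

def wrap_pattern(user_input: str) -> str:
    """Hypothetical helper: wrap the user's regex in one outer capture group."""
    return "(" + user_input + ")"

# The outer parentheses become capture group 1, so $1 in a replacement
# string refers to the whole matched text.
pattern = wrap_pattern(".+")      # "(.+)"
nested = wrap_pattern("^(.+)$")   # "(^(.+)$)"

# Group 1 is the outer (added) group; a group written by the user
# becomes group 2.
whole = re.search(pattern, "Dog").group(1)
inner = re.search(nested, "Dog").group(2)
```

Because the wrapper adds the outer group, any group you write yourself is shifted by one: in `^(.+)$` your own parentheses are the second submatch.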

Single quotes are not allowed in text inputs!


Input Data

**Limitations of Input Data**

1. The CSV parser requires all lines to have the same number of fields/columns.
    If your file has illegal rows, set the "Illegal Rows" parameter to "Yes" to pass your data through.
    Even lines that contain only spaces will cause an error.
    See the example bad table below.

2. By default, csvtk assumes files have a header row. If your file does not, set the global parameter
    "Has Header Row" to "No"

3. Column names should be unique and are case sensitive!

4. Lines starting with "#" or "$" are ignored if they are in the header row

5. If a double quote (") appears in a tab-delimited file, set the "Lazy quotes" global parameter to "Yes"

Example bad table:

Head 1 Head 2 Head 3 Head 3
1 2 3  
this will   break

Bad tables may work if both the "Ignore Illegal Rows" and "Ignore Empty Rows" global parameters are set to "Yes", but there is no guarantee of that!
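
Before uploading, the "same number of fields" requirement can be checked with a short script. The following is a minimal sketch using Python's standard `csv` module; `find_ragged_rows` is a hypothetical helper name, and the sample is a tab-delimited variant of the bad table above.

```python
import csv
import io

def find_ragged_rows(text, delimiter="\t"):
    """Return (row_number, field_count) for rows whose width differs from the header."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)          # first row defines the expected width
    expected = len(header)
    return [
        (row_no, len(row))
        for row_no, row in enumerate(reader, start=2)
        if len(row) != expected
    ]

# Assumed sample data: three header fields, one good row, one short row.
sample = "Head 1\tHead 2\tHead 3\n1\t2\t3\nthis will\tbreak\n"
problems = find_ragged_rows(sample)
```

An empty `problems` list means every row has as many fields as the header; otherwise each entry names the offending row and how many fields it has.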


Usage

You can use csvtk replace to replace any text matched by your regex with your input replacement string.

The replacement string also has some special properties that you can use to better replace your data:

  • Replacement supports capture variables, like $1 which represents the text of the first submatch of the Regex
  • {nr} can be used to assign ascending numbers, starting from 1, to each record (row)
  • {kv} can be used to get the value of the key (captured variable $n) from a key-value file

A good regular expression cheat sheet that can help you build your expressions can be found at: https://regexr.com/

Replace Examples

  1. Replacement with {nr} and $1

Input file:

Name Animal
Bud Dog
Mittens Cat

Now if our regex was set to '.*' on column 2 and our replacement string was set to '{nr}-${1}', the output would be:

Name Animal
Bud 1-Dog
Mittens 2-Cat
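
The behaviour of this example can be imitated in Python to make the roles of {nr} and the capture variable concrete. This is a sketch, not csvtk's implementation: `replace_column` and `to_python_template` are hypothetical helpers, the header row is assumed to be handled separately, and the capture variable is written as ${1} as the parameter help recommends. The sketch uses `.+` rather than `.*` because Python's `re.sub` would also replace the empty match at the end of each cell.

```python
import re

def to_python_template(replacement):
    """Translate $1 / ${1} capture variables into Python's \\g<1> form."""
    return re.sub(r"\$\{?(\d+)\}?", r"\\g<\1>", replacement)

def replace_column(rows, col, pattern, replacement):
    """Sketch: regex-replace one column; {nr} becomes the 1-based record number."""
    regex = re.compile("(" + pattern + ")")   # input is wrapped in one outer group
    out = []
    for nr, row in enumerate(rows, start=1):
        template = to_python_template(replacement.replace("{nr}", str(nr)))
        new_row = list(row)
        new_row[col] = regex.sub(template, row[col])
        out.append(new_row)
    return out

# Data rows only (header excluded); column index 1 is the "Animal" column.
rows = [["Bud", "Dog"], ["Mittens", "Cat"]]
result = replace_column(rows, 1, ".+", "{nr}-${1}")
```

Here `result` holds the same two records as the output table above, with the record number prefixed to each animal.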

  2. Replacement with {kv} file

Suppose you set up a tab-separated key-value file that looks like this:

Key     Value
Dog     Big
Cat     Small

And had a similar input file:

Name Animal
Bud Dog
Mittens Cat
Fuzzy Gerbil

Now, if the regex was '.*' on column 2 and the replacement string was '{kv}', the output with 'No' fill specified would be:

Name Animal
Bud Big
Mittens Small
Fuzzy  

If you wanted to fill the blank cell you could set it to either:

  • String - the string you input (ex. 'NA') would fill up the blank cell.
  • Original value - would change the blank cell to 'Gerbil'
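
The {kv} lookup and the fill options can be sketched the same way. This is an illustration under stated assumptions rather than the tool's actual code: `load_kv` and `kv_replace` are hypothetical helpers, the fill modes are given the made-up names "no", "string" and "original", and `.+` is used in place of `.*` to avoid Python's extra empty match at the end of a cell.

```python
import re

def load_kv(text):
    """Parse tab-delimited key/value lines into a dict (one pair per line)."""
    return dict(line.split("\t", 1) for line in text.splitlines() if line)

def kv_replace(cell, pattern, kv, fill="no", fill_string="NA"):
    """Sketch of {kv}: replace each match with the value stored for its key."""
    regex = re.compile("(" + pattern + ")")   # input is wrapped in one outer group

    def lookup(match):
        key = match.group(1)                  # captured variable $1 is the key
        if key in kv:
            return kv[key]
        if fill == "string":
            return fill_string                # fill with the user's string
        if fill == "original":
            return match.group(0)             # keep the original text
        return ""                             # no fill: leave the cell blank

    return regex.sub(lookup, cell)

kv = load_kv("Dog\tBig\nCat\tSmall\n")
animals = ["Dog", "Cat", "Gerbil"]
no_fill = [kv_replace(a, ".+", kv) for a in animals]
with_string = [kv_replace(a, ".+", kv, fill="string") for a in animals]
with_original = [kv_replace(a, ".+", kv, fill="original") for a in animals]
```

The three lists correspond to the three choices described above: a blank cell for 'Gerbil', the string 'NA', or the original value 'Gerbil'.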

If you're having trouble with regular expressions, try experimenting with an online regex builder. Many are available, and they are great resources for improving your regex statements or testing them before use!


More Information

For information from the creators of csvtk, please visit their site at: https://bioinf.shenwei.me/csvtk/

Be aware, however, that some features may not be available and that some small changes were made to work with Galaxy.

Notable changes from their documentation:

  • Cannot specify multiple file header names (i.e. "name;username" cannot be used as a valid column match)
  • No single quotes / apostrophes allowed in text inputs