Galaxy | Tool Preview

Compute (version 2.0)
Dataset missing? See TIP below
Select Yes to be able to specify names for new columns and have them added to the header line. If you select No, the first line will be treated as a regular line: If it is empty or starts with a # character it will be skipped, otherwise the tool will attempt to compute the specified expression on it.
Expressions
Expressions 0
If yes, use fully expanded decimal representation when writing new columns with floating point values. To prevent scientific notation in just specific new columns, you can use numpy's format_float_positional function in the corresponding expression.
Error handlings
Error handling 0

TIP: If your data is not TAB delimited, use Text Manipulation->Convert


What it does

This tool computes an expression on every row of a dataset and appends or inserts the result as a new column (field).

Several expressions can be specified and will be applied sequentially to each row.

Expression rules


Simple examples

If this is your input:

chr1  151077881  151077918  2  200  -
chr1  151081985  151082078  3  500  +

computing "c4 * c5" will produce:

chr1  151077881  151077918  2  200  -   400
chr1  151081985  151082078  3  500  +  1500

You can also use this tool to evaluate expressions. For example, computing "c3 >= c2" for the input above will result in the following:

chr1  151077881  151077918  2  200  -  True
chr1  151081985  151082078  3  500  +  True

Similarly, computing "type(c2) == type(c3) will return:

chr1  151077881  151077918  2  200  -  True
chr1  151081985  151082078  3  500  +  True

Error handling

The tool will always fail on syntax errors in and other unrecoverable parsing errors with any of your expressions. For other problems, however, it offers control over how they should be handled:

  1. The default for "Autodetect column types" is "Yes", which means the tool will evaluate each column value as the type that Galaxy assumes for the column. This default behavior will allow you to write simpler expressions. The arithmetic expression "c4 * c5" from the first simple example, for instance, works only because Galaxy realizes that c4 and c5 are integer columns. Occasionally, this autodetection can cause issues. A common such situation are missing values in columns that Galaxy thinks are of numeric type. If you're getting errors like "Failed to convert some of the columns in line #X ...", a solution might be to turn off column type autodetection. The price you will have to pay for doing so is that now you will have to handle type conversions yourself. In the first example you would now have to use the epression: "int(c4) * int(c5)".

  2. By default, if any expression references columns that are not existing before that expression gets computed, the tool will fail, but you can uncheck the "Fail on references to non-existent columns" option. If you do so, the result will depend on your choice for "If an expression cannot be computed for a row" (see 3.)

  3. The default for rows, for which an expression fails to compute is, again, to fail the tool run, but you can also choose to:

    • skip the row on output

      This is a simple way to only keep lines conforming to an expected standard. It is also easy to mask problems with your expressions with this option so take a look at the results and try to understand what gets skipped and for what reasons (the stdout of the tool will contain information about both).

    • keep the row unchanged

      This can be a good solution if your input contains special separator lines that don't follow the general tabular format of other lines and you would like to keep those lines

    • produce an empty column value for the row

      This will use the empty string as a substitute for non-computable items. Different from the "keep the row unchanged option" the problematic line will have a column added or changed. This option is a good choice for inputs in which all rows have the same tabular layout where you want to make sure that the same is true for the output, i.e. that all output lines still have the same number of columns.

    • fill in a replacement value

      This option is very similar to the previous one, but lets you control the replacement value.

Example

In the following input:

chr1  151077881  151077918  2  200  -
chr1  151081985  151082078  3  500  +
chr1  151090031  151090938  4  700

the last line does not have a strand column. This violates the bed file format specification, which says that unknown strand is to be encoded as . in the strand column.

You can fix the file with the following tool run:

Add expression: c6

Mode of the operation: Replace

Use new column to replace column number: 6

Fail on references to non-existent columns: No

If an expression cannot be computed for a row: Fill in a replacement value

Replacement value: .