Galaxy | Tool Preview

Table Compute (version 1.2.4+galaxy0)
Operations on single tables are better tailored towards more general use-cases. For multiple tables usage, all tables should ideally be of the same dimensions for simple operations, although complex operations are also supported for unequally sized tables (see Help section).
Advanced File Options s
Advanced File Options 0

Table Compute

This tool is a Galaxy wrapper for the Pandas Data Analysis Library in Python, for manipulating and computing expressions upon tabular data and matrices. It can perform functions on the element, row, and column basis, as well as sub-select, duplicate, replace, and perform general and custom expressions on rows, columns, and elements.

Only a single operation can be performed on the data. Multiple operations can be performed by chaining successive runs of this tool. This is to provide a more transparent workflow for complex operations.

Many of the examples given below relate to common research use-cases such as filtering large matrices for specific values, counting unique instances of elements, conditionally manipulating the data, and replacing unwanted values. Full table operations such as normalisation can be easily performed by scaling the data via mean/median/min/max (and many other) metrics, and general expressions can even be computed across multiple tables.

Examples

Example 1: Sub-selecting from a table

We have the following table:

. c1 c2 c3
g1 10 20 30
g2 3 6 9
g3 4 8 12
g4 81 6 3
and we want to duplicate c1 and remove c2. Also select g1 to g3 and add g2 at the end as well. This

would result in the output table:

. c1 c1 c3
g1 10 10 30
g2 3 3 9
g3 4 4 12
g2 3 3 9

In Galaxy we would select the following:

  • Input Single or Multiple TablesSingle Table
  • Column names on first row?Yes
  • Row names on first column?Yes
  • Type of table operationDrop, keep or duplicate rows and columns
    • List of columns to select1,1,3
    • List of rows to select1:3,2
    • Keep duplicate columnsYes
    • Keep duplicate rowsYes

Example 2: Filter for rows with row sums less than 50

We have the following table:

. c1 c2 c3
g1 10 20 30
g2 3 6 9
g3 4 8 12
g4 81 6 3

and we want:

. c1 c2 c3
g2 3 6 9
g3 4 8 12

In Galaxy we would select the following:

  • Input Single or Multiple TablesSingle Table
  • Column names on first row?Yes
  • Row names on first column?Yes
  • Type of table operationFilter rows or columns by their properties
    • FilterRows
    • Filter CriterionResult of function applied to columns/rows
      • Keep column/row if its observedSum
      • is< (Less Than)
      • this value50

Example 3: Count the number of values per row smaller than a specified value

We have the following table:

. c1 c2 c3
g1 10 20 30
g2 3 6 9
g3 4 8 12
g4 81 6 3

and we want to count how many elements in each row are smaller than 10, i.e., we want to obtain the following results table:

. vec
g1 0
g2 3
g3 2
g4 2

In Galaxy we would select the following:

  • Input Single or Multiple TablesSingle Table
  • Column names on first row?Yes
  • Row names on first column?Yes
  • Type of table operationManipulate selected table elements
    • Operation to performCustom
      • Custom Expression on 'elem'elem < 10
    • Operate on elementsAll

Note: There are actually simpler ways to achieve our purpose, but here we are demonstrating the use of a custom expression.

After executing, we would then be presented with a table like so:

. c1 c2 c3
g1 False False False
g2 True True True
g3 True True False
g4 False True True

To get to our desired table, we would then process this table with the tool again:

  • Input Single or Multiple TablesSingle Table
  • Column names on first row?Yes
  • Row names on first column?Yes
  • Type of table operationCompute Expression across Rows or Columns
    • CalculateSum
    • For eachRow

Executing this will sum all the 'True' values in each row. Note that the values must have no extra whitespace in them for this to work (e.g. 'True ' or ' True' will not be parsed correctly).

Example 4: Perform a scaled log-transformation conditionally

We want to perform a scaled log transformation on all values greater than 5, and set all other values to 1.

We have the following table:

. c1 c2 c3
g1 0 20 30
g2 3 0 9
g3 4 8 0
g4 81 0 0

and we want:

. c1 c2 c3
g1 1.00000000 0.1497866 0.1133732
g2 1.00000000 1.0000000 0.2441361
g3 1.00000000 0.2599302 1.0000000
g4 0.05425246 1.0000000 1.0000000

In Galaxy we would select the following:

  • Input Single or Multiple TablesSingle Table
  • Column names on first row?Yes
  • Row names on first column?Yes
  • Type of table operationManipulate selected table elements
    • Operation to performCustom
      • Custom Expression(math.log(elem) / elem) if (elem > 5) else 1
      • Operate on elementsAll

Example 5: Perform a Full table operation

We have the following table:

. c1 c2 c3
g1 10 20 30
g2 3 10 9
g3 4 8 10
g4 81 10 10
and we want to subtract from each column the mean of that column divided by the standard

deviation of it to yield:

. c1 c2 c3
g1 9.351737 17.784353 28.550737
g2 2.351737 7.784353 7.550737
g3 3.351737 5.784353 8.550737
g4 80.351737 7.784353 8.550737

In Galaxy we would select the following:

  • Input Single or Multiple TablesSingle Table
  • Column names on first row?Yes
  • Row names on first column?Yes
  • Type of table operationPerform a Full Table Operation
    • OperationCustom
    • Custom Expression on 'table' along axis (0 or 1)table - table.mean(0)/table.std(0)

Example 6: Perform operations on multiple tables

We have the following three input tables:

Table 1

. c1 c2 c3
g1 10 20 30
g2 3 10 9
g3 4 8 10

Table 2

. c1 c2
g1 1 2
g2 3 4
g3 6 5

Table 3

. c1 c2 c3
g1 1 2 3
g2 1 2 3

Note that the dimensions of these tables do not match.

Dimensions:
  • Table1 [3,3]
  • Table2 [3,2]
  • Table3 [2,3]

In order to perform simple operations between Tables, they must be of the same dimensions.

To add Table2 to Table3 we would have to transpose one of the tables using the in-built T method:

table2 + table3.T

or:

table2.T + table3

We can also perform more general operations using all 3 tables, such as taking the minimum value of the maximum values of Table2 and Table3, and dividing the Table1 values by it:

table1 / min(table2.values.max(), table3.values.max())

To perform these types of operations in Galaxy we would select the following:

  • Input Single or Multiple TablesMultiple Tables

  • (For each inserted table)
    • Column names on first row?Yes
    • Row names on first column?Yes
  • Custom Expression

    <insert your desired function>
    

Please note that the last example shown above was chosen to illustrate the limitations of the tool. Nested attributes like table2.values.max are disallowed in expressions in the tool so the above would have to be replaced with the harder to read workaround:

table1 / min(np.max(np.max(table2)), np.max(np.max(table3)))

Complex operations (like ones that would benefit from specifying nested attributes) can often be broken into subsequent runs ot the tool, in which the first run generates an intermediate table representing the result of the "inner" operation that the second run can then use as input to perform the "outer" operation.

Also note that, currently min(), max() and sum() are the only built-in Python functions that can be used inside expressions. If you want to use additional functions, these have to be qualified functions from the math, np or pd libraries.

Example 7: Melt

We have the following table

. A B C
0 a B 1
1 b B 3
2 c B 5

and we want:

. A variable value
0 a B B
1 b B B
2 c B B
3 a C 1
4 b C 3
5 c C 5

In Galaxy we would select the following:

  • Input Single or Multiple TablesSingle Table
  • Column names on first row?Yes
  • Row names on first column?Yes
  • Type of table operationPerform a Full Table Operation
    • OperationMelt
    • Variable IDsA
    • Unpivoted IDsB,C

This converts the "B" and "C" columns into variables.

Example 8: Pivot

We have the following table

. foo bar baz zoo
0 one A 1 x
1 one B 2 y
2 one C 3 z
3 two A 4 q
4 two B 5 w
5 two C 6 t

and we want:

. A B C
one 1 2 3
two 4 5 6

In Galaxy we would select the following:

  • Input Single or Multiple TablesSingle Table
  • Column names on first row?Yes
  • Row names on first column?Yes
  • Type of table operationPerform a Full Table Operation
    • OperationPivot
    • Indexfoo
    • Columnbar
    • Valuesbaz
This splits the matrix using "foo" and "bar" using only the values from "baz". Header values
may contain extra information.

Example 9: Replacing text in specific rows or columns

We have the following table

. c1 c2 c3
g1 10 20 30
g2 3 3 9
g3 4 8 12
g4 81 6 3

and we want to add "chr" to the elements in column 2 AND rows 2 and 4:

. c1 c2 c3
g1 10 20 30
g2 3 chr3 9
g3 4 8 12
g4 81 chr6 3

In Galaxy we would select the following:

  • Input Single or Multiple TablesSingle Table

  • Column names on first row?Yes

  • Row names on first column?Yes

  • Type of table operationManipulate selected table elements

    • Operation to performReplace values

      • Replacement valuechr{elem:.0f}

        Here, the placeholder {elem} lets us refer to each element's current value, while the :.0f part is a format specifier that makes sure numbers are printed without decimals (for a complete description of the available syntax see the Python Format Specification Mini-Language).

      • Operate on elementsSpecific Rows and/or Columns

      • List of columns to select2

      • List of rows to select2,4

      • Inclusive SelectionNo

If we wanted to instead add "chr" to the ALL elements in column 2 and rows 2 and 4, we

would repeat the steps above but set the Inclusive Selection to "Yes", to give:

. c1 c2 c3
g1 10 chr20 30
g2 chr3 chr3 chr9
g3 4 8 12
g4 chr81 chr6 chr3

Example 10: Pivot Table with unified Aggregator

For an input table of:

A B C D E
foo one small 1 2
foo one large 2 4
foo one large 2 5
foo two small 3 5
foo two small 3 6
bar one large 4 6
bar one small 5 8
bar two small 6 9
bar two large 7 9

we wish to pivot the table with the 'A' column as the new row index and use the values of the column 'C' as the new column indexes, based on the aggregated values of 'D'. By default the aggregator function is the mean, but here we will instead pick the max, to yield:

C l s
A    
bar 7 6
foo 2 3

In Galaxy we would select the following:

  • Input Single or Multiple TablesSingle Table
  • Column names on first row?Yes
  • Row names on first column?Yes
  • Type of table operationPerform a Full Table Operation
    • OperationPivot
    • IndexA
    • ColumnC
    • ValuesD
    • Aggregator Functionmax

Example 11: Pivot Table with custom Aggregrator

For an input table of:

Name Position City Age Random
Mary Manager Boston 34 0.678577
Josh Programmer New York 37 0.973168
Jon Manager Chicago 29 0.146668
Lucy Manager Los Angeles 40 0.150120
Jane Programmer Chicago 29 0.112769
Sue Programmer Boston 31 0.185198

we wish to pivot the table with the 'Position' column as the new index and transform the 'Age' and 'Random' columns to have mean and standard deviation values

Position Age Random Random
. mean mean std
Manager 34.333333 0.325122 0.306106
Programmer 32.333333 0.423712 0.477219

In Galaxy we would select the following:

  • Input Single or Multiple TablesSingle Table
  • Column names on first row?Yes
  • Row names on first column?Yes
  • Type of table operationPerform a Full Table Operation
    • OperationPivot
    • IndexPosition
    • Column-Function Mapping
      • Value ColumnAge
        • Functionmean
      • Value ColumnRandom
        • Functionmean
        • Functionstd
This splits the matrix using "foo" and "bar" using only the values from "baz". Header values
may contain extra information.