Galaxy | Tool Preview

Column Join (version 1.1.0)
Hinge: all columns to the left of the selected column (plus the selected column itself) will be used. Select 2 for pileup.

What it does

This tool joins several files that share the same column structure into a single file, optionally dropping columns. You select a 'hinge', which is the number of left-most columns to match on, and the columns to include in the join. The included columns should contain the hinge columns as well.

Note that the files are expected to have the same number of columns. If a join column is missing from a row (this can only happen for the last column or columns), the tool inserts an empty item, or the chosen fill value, for that column on that row. As a result, a row can have a hinge but only empty or filled columns: this happens when the hinge exists in at least one file but every file that contains it is missing the join column. The tool also does not distinguish between a file that lacks the hinge altogether and a file that has the hinge but lacks the column; in both cases the column is empty or filled. The section 'Example with missing columns' below illustrates this.


General Example

Given the following files:

FILE 1
chr2    1    T    6    .C...,     I$$III
chr2    2    G    6    ..N..,     III@II
chr2    3    C    7    ..C...,    I$IIIII
chr2    4    G    7    .G....,    I#IIIII
chr2    5    G    7    ...N..,    IIII#BI
chr2    6    A    7    ..T...,    I$IDIII
chr1    1    C    1    ^:.        I
chr1    2    G    2    .^:.       $I
chr1    3    A    2    ..         I%
chr1    4    C    2    ..         I$
chr1    5    T    3    ..^:.      I#I
chr1    6    G    3    ..^:,      I#I

FILE 2
chr1    3    T    1    ^:.        I
chr1    4    G    2    .^:.       $I
chr1    5    T    2    ..         I%
chr1    6    C    3    ..^:.      III
chr1    7    G    3    ..^:.      I#I
chr1    8    T    4    ...^:,     I#II
chr2    77   C    6    .G...,     I$$III
chr2    78   G    6    ..N..,     III@II
chr2    79   T    7    ..N...,    I$IIIII
chr2    80   C    7    .G....,    I#IIIII
chr2    81   G    7    ...A..,    IIII#BI
chr2    82   A    8    ...G...,   I$IDIIII
chr2    83   T    8    .A.....N   IIIIIIII
chr2    84   A    9    ......T.   I$IIIIIII

FILE 3
chr1    1    A    1    .          I
chr1    2    T    2    G.         I$
chr1    3    C    2    .,         I@
chr1    4    C    3    ..N        III
chr1    42   C    5    ...N^:.    III@I
chr1    43   C    5    .N..^:.    IIIII
chr1    44   T    5    .A..,      IA@II
chr1    45   A    6    .N...^:.   IIIII$
chr1    46   G    6    .GN..^:.   I@IIII
chr1    47   A    7    ....^:..,  IIIII$I
chr2    73   T    5    .N..,      II$II
chr2    74   A    5    ....,      IIIII
chr2    75   T    5    ....,      IIIII
chr2    76   T    5    ....,      IIIII
chr2    77   C    5    ....,      IIIBI
chr2    78   T    5    ....,      IDIII

To join columns 3 and 4, matching rows on columns 1 and 2, select columns 1-4 for the 'Include these columns' option and column 2 for the 'hinge'. With these settings, the output would be:

chr1    1    C    1              A    1
chr1    2    G    2              T    2
chr1    3    A    2    T    1    C    2
chr1    4    C    2    G    2    C    3
chr1    5    T    3    T    2
chr1    6    G    3    C    3
chr1    7              G    3
chr1    8              T    4
chr1    42                       C    5
chr1    43                       C    5
chr1    44                       T    5
chr1    45                       A    6
chr1    46                       G    6
chr1    47                       A    7
chr2    1    T    6
chr2    2    G    6
chr2    3    C    7
chr2    4    G    7
chr2    5    G    7
chr2    6    A    7
chr2    73                       T    5
chr2    74                       A    5
chr2    75                       T    5
chr2    76                       T    5
chr2    77             C    6    C    5
chr2    78             G    6    T    5
chr2    79             T    7
chr2    80             C    7
chr2    81             G    7
chr2    82             A    8
chr2    83             T    8
chr2    84             A    9
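
To make the 'hinge' and 'Include these columns' settings concrete, the sketch below shows how a single input row might be split into a matching key and the values carried into the output. It is an illustration only, not the tool's actual code: the function name split_row and its parameters are hypothetical, and rows are assumed to be already split into lists of strings.

```python
def split_row(row, hinge, columns, fill=""):
    """Split one row into (key, values) for a join.

    Hypothetical helper for illustration; not part of the tool.
    hinge   -- number of left-most columns that form the matching key
    columns -- 1-based column numbers selected for inclusion
    fill    -- placeholder for a selected column the row does not have
    """
    key = tuple(row[:hinge])
    # Hinge columns are already part of the key, so only the remaining
    # selected columns are carried over as values.
    values = [row[c - 1] if c <= len(row) else fill
              for c in columns if c > hinge]
    return key, values


# Row "chr1 3 A 2 .. I%" from FILE 1, with hinge 2 and columns 1-4 included.
row = ["chr1", "3", "A", "2", "..", "I%"]
key, values = split_row(row, hinge=2, columns=[1, 2, 3, 4])
print(key, values)  # ('chr1', '3') ['A', '2']
```

Rows from different files that produce the same key end up on the same output line, with each file contributing its own values in file order.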

Example with missing columns

Given the following input files:

FILE 1
1   A
2   B   b
4   C   c
5   D
6   E   e

FILE 2
1   M   m
2   N
3   O   o
4   P   p
5   Q
7   R   r

If we join only column 3, using column 1 as the hinge and a fill value of '0', the output is:

1   0   m
2   b   0
3   0   o
4   c   p
5   0   0
6   e   0
7   0   r

Row 5 appears in both files but is missing column 3 in each, so its output row contains nothing but fill values.
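
The example above can be reproduced with a short Python sketch. This is a minimal illustration under stated assumptions, not the tool's implementation: the function name column_join and its parameters are hypothetical, files are held in memory as lists of rows, fields are split on whitespace, and the output ordering (numeric keys sorted numerically, text keys alphabetically) is chosen only to match the examples.

```python
def column_join(files, hinge, columns, fill=""):
    """Join tables on their first `hinge` columns (illustrative sketch).

    files   -- list of tables; each table is a list of rows (lists of strings)
    hinge   -- number of left-most columns used as the matching key
    columns -- 1-based column numbers to include, hinge columns included
    fill    -- value used wherever a file has nothing for a column
    """
    value_cols = [c for c in columns if c > hinge]
    tables = []
    for rows in files:
        table = {}
        for row in rows:
            key = tuple(row[:hinge])
            # A row that is too short gets the fill value for that column.
            table[key] = [row[c - 1] if c <= len(row) else fill
                          for c in value_cols]
        tables.append(table)

    def sort_key(key):
        # Assumed ordering: numbers before text, numbers compared numerically.
        return [(0, int(p), "") if p.isdigit() else (1, 0, p) for p in key]

    all_keys = sorted(set().union(*tables), key=sort_key)
    # A file that lacks the key entirely also contributes fill values, which
    # is why the two "missing" cases look identical in the output.
    empty = [fill] * len(value_cols)
    return [list(key) + [v for t in tables for v in t.get(key, empty)]
            for key in all_keys]


file1 = [line.split() for line in ["1 A", "2 B b", "4 C c", "5 D", "6 E e"]]
file2 = [line.split() for line in ["1 M m", "2 N", "3 O o", "4 P p", "5 Q", "7 R r"]]

for row in column_join([file1, file2], hinge=1, columns=[1, 3], fill="0"):
    print("\t".join(row))
```

In the result, the '0' for FILE 1 on row 1 comes from a row that lacks column 3, while the '0' for FILE 1 on row 3 comes from a hinge that is absent from the file; nothing in the output tells the two cases apart.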