Query Tabular

Inputs

Loads tabular datasets into a SQLite database.

An existing SQLite database can be used as input, and any selected tabular datasets will be added as new tables in that database.

Input Line Filters

As a tabular file is being read, line filters may be applied.

- skip leading lines              skip the first *number* of lines
- comment char                    omit any lines that start with the specified comment character
- by regex expression matching    *include/exclude* lines that match the regular expression
- select columns                  choose to include only selected columns in the order specified
- regex replace value in column   replace a field in a column using a regex substitution (good for date reformatting)
- prepend a line number column    each line has the ordinal value of the line read by this filter as the first column
- append a line number column     each line has the ordinal value of the line read by this filter as the last column
- prepend a text column           each line has the text string as the first column
- append a text column            each line has the text string as the last column
- normalize list columns          replicates the line for each item in the specified list *columns*
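The simpler filters in this list can be pictured as small functions chained over the lines of a file. The following is a minimal Python sketch, not the tool's actual implementation: the function names and the leading title line in the sample input are hypothetical, and the data rows are borrowed from the customers example further down.

```python
from itertools import islice


def skip_leading_lines(lines, count):
    """Skip the first `count` lines."""
    return islice(lines, count, None)


def omit_comment_lines(lines, comment_char):
    """Omit any line that starts with the comment character."""
    return (line for line in lines if not line.startswith(comment_char))


def select_columns(lines, columns):
    """Keep only the given 1-based columns, in the order specified."""
    for line in lines:
        fields = line.split("\t")
        yield "\t".join(fields[c - 1] for c in columns)


raw_lines = [
    "customer export",  # hypothetical title line, added for illustration
    "#CustomerID\tFirstName\tLastName\tEmail\tDOB\tPhone",
    "1\tJohn\tSmith\tJohn.Smith@yahoo.com\t1968-02-04\t626 222-2222",
    "2\tSteven\tGoldfish\tgoldfish@fishhere.net\t1974-04-04\t323 455-4545",
]

lines = skip_leading_lines(raw_lines, 1)
lines = omit_comment_lines(lines, "#")
lines = select_columns(lines, [3, 2, 5])
filtered = list(lines)
print(filtered)
# ['Smith\tJohn\t1968-02-04', 'Goldfish\tSteven\t1974-04-04']
```

Each filter only sees the output of the one before it, which is why the order in which filters are listed matters.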

Outputs

The results of a SQL query are output to the history as a tabular file.

The SQLite database can also be saved and output as a dataset in the history.

(The SQLite to tabular tool can run additional queries on this database.)

For help in using SQLite see: http://www.sqlite.org/docs.html

NOTE: to use SQLite date functions, date fields should be in the format YYYY-MM-DD, for example: 2015-09-30

See: http://www.sqlite.org/lang_datefunc.html
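To see why the date format matters, the short Python sketch below calls SQLite's built-in date() function through the standard sqlite3 module (used here only for illustration; it is not how the tool itself is invoked). A value in YYYY-MM-DD form is understood, while a DD/MM/YYYY value is not recognized and yields NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# date() understands the ISO form; the slash-separated form is not a valid time string.
iso_result, slash_result = conn.execute(
    "SELECT date('2015-09-30', '+1 day'), date('30/09/2015', '+1 day')"
).fetchone()

print(iso_result)    # 2015-10-01
print(slash_result)  # None
conn.close()
```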

Example

Given 2 tabular datasets: customers and sales

Dataset customers

Table name: "customers"

Column names: "CustomerID,FirstName,LastName,Email,DOB,Phone"

#CustomerID  FirstName  LastName  Email                  DOB         Phone
1            John       Smith     John.Smith@yahoo.com   1968-02-04  626 222-2222
2            Steven     Goldfish  goldfish@fishhere.net  1974-04-04  323 455-4545
3            Paula      Brown     pb@herowndomain.org    1978-05-24  416 323-3232
4            James      Smith     jim@supergig.co.uk     1980-10-20  416 323-8888

Dataset sales

Table name: "sales"

Column names: "CustomerID,Date,SaleAmount"

#CustomerID  Date        SaleAmount
2            2004-05-06  100.22
1            2004-05-07  99.95
3            2004-05-07  122.95
3            2004-05-13  100.00
4            2004-05-22  555.55

The query:

SELECT FirstName,LastName,sum(SaleAmount) as "TotalSales"
FROM customers join sales on customers.CustomerID = sales.CustomerID
GROUP BY customers.CustomerID ORDER BY TotalSales DESC;

Produces this tabular output:

#FirstName  LastName  TotalSales
James       Smith     555.55
Paula       Brown     222.95
Steven      Goldfish  100.22
John        Smith     99.95
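The example can be reproduced outside the tool with Python's standard sqlite3 module. This is only a sketch: the column types in the CREATE TABLE statements are assumptions, since the text does not say how the tool types each column, and the data rows are copied from the two datasets above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumed for this sketch.
conn.execute(
    "CREATE TABLE customers (CustomerID INTEGER, FirstName TEXT, LastName TEXT,"
    " Email TEXT, DOB TEXT, Phone TEXT)"
)
conn.execute("CREATE TABLE sales (CustomerID INTEGER, Date TEXT, SaleAmount REAL)")

conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "John", "Smith", "John.Smith@yahoo.com", "1968-02-04", "626 222-2222"),
        (2, "Steven", "Goldfish", "goldfish@fishhere.net", "1974-04-04", "323 455-4545"),
        (3, "Paula", "Brown", "pb@herowndomain.org", "1978-05-24", "416 323-3232"),
        (4, "James", "Smith", "jim@supergig.co.uk", "1980-10-20", "416 323-8888"),
    ],
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (2, "2004-05-06", 100.22),
        (1, "2004-05-07", 99.95),
        (3, "2004-05-07", 122.95),
        (3, "2004-05-13", 100.00),
        (4, "2004-05-22", 555.55),
    ],
)

query = """
SELECT FirstName,LastName,sum(SaleAmount) as "TotalSales"
FROM customers join sales on customers.CustomerID = sales.CustomerID
GROUP BY customers.CustomerID ORDER BY TotalSales DESC;
"""
sales_rows = conn.execute(query).fetchall()
for first_name, last_name, total in sales_rows:
    # Round for display; floating-point sums may carry tiny rounding noise.
    print(first_name, last_name, round(total, 2))
conn.close()
```

Paula Brown's total combines her two sales (122.95 and 100.00), which is why she ranks above the customers with a single sale of similar size.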

If the optional Table name and Column names inputs are not used, the query would be:

SELECT t1.c2 as "FirstName", t1.c3 as "LastName", sum(t2.c3) as "TotalSales"
FROM t1 join t2 on t1.c1 = t2.c1
GROUP BY t1.c1 ORDER BY TotalSales DESC;

You can selectively name columns. For example, on the customers input you could name only columns 2, 3, and 5:

Column names: ,FirstName,LastName,,BirthDate

This results in the following database table:

#c1  FirstName  LastName  c4                     BirthDate   c6
1    John       Smith     John.Smith@yahoo.com   1968-02-04  626 222-2222
2    Steven     Goldfish  goldfish@fishhere.net  1974-04-04  323 455-4545
3    Paula      Brown     pb@herowndomain.org    1978-05-24  416 323-3232
4    James      Smith     jim@supergig.co.uk     1980-10-20  416 323-8888
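The naming rule implied by this example (blank or missing entries fall back to c1, c2, ... by position) can be sketched in a few lines of Python. The helper name fill_column_names is hypothetical and the logic is inferred from the example above, not taken from the tool's source.

```python
def fill_column_names(spec, column_count):
    """Return one name per column; blank or missing entries become c<position>."""
    given = spec.split(",") if spec else []
    names = []
    for index in range(column_count):
        name = given[index].strip() if index < len(given) else ""
        names.append(name or f"c{index + 1}")
    return names


column_names = fill_column_names(",FirstName,LastName,,BirthDate", 6)
print(column_names)
# ['c1', 'FirstName', 'LastName', 'c4', 'BirthDate', 'c6']
```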

Regular expression functions are included for:

matching:      re_match('pattern',column)

SELECT t1.FirstName, t1.LastName
FROM t1
WHERE re_match('^.*\.(net|org)$',c4)

Results:

#FirstName  LastName
Steven      Goldfish
Paula       Brown

searching:     re_search('pattern',column)
substituting:  re_sub('pattern','replacement',column)

SELECT t1.FirstName, t1.LastName, re_sub('^\d{2}(\d{2})-(\d\d)-(\d\d)','\3/\2/\1',BirthDate) as "DOB"
FROM t1
WHERE re_search('[hp]er',c4)

Results:

#FirstName  LastName  DOB
Steven      Goldfish  04/04/74
Paula       Brown     24/05/78
James       Smith     20/10/80
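SQLite has no regular expression functions of its own, so functions like these have to be supplied by the host program. The sketch below registers three functions with the names used above through Python's sqlite3 create_function. Treating them as thin wrappers around re.match, re.search, and re.sub is an assumption made for illustration, not a confirmed description of the tool's implementation.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed behavior: thin wrappers around Python's re module.
conn.create_function("re_match", 2, lambda pat, val: int(re.match(pat, val) is not None))
conn.create_function("re_search", 2, lambda pat, val: int(re.search(pat, val) is not None))
conn.create_function("re_sub", 3, lambda pat, repl, val: re.sub(pat, repl, val))

conn.execute("CREATE TABLE t1 (c1, FirstName, LastName, c4, BirthDate, c6)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "John", "Smith", "John.Smith@yahoo.com", "1968-02-04", "626 222-2222"),
        (2, "Steven", "Goldfish", "goldfish@fishhere.net", "1974-04-04", "323 455-4545"),
        (3, "Paula", "Brown", "pb@herowndomain.org", "1978-05-24", "416 323-3232"),
        (4, "James", "Smith", "jim@supergig.co.uk", "1980-10-20", "416 323-8888"),
    ],
)

# Raw strings keep the backslashes intact on their way to SQLite.
matched = conn.execute(
    r"SELECT t1.FirstName, t1.LastName FROM t1 WHERE re_match('^.*\.(net|org)$',c4)"
).fetchall()
reformatted = conn.execute(
    r"""SELECT t1.FirstName, t1.LastName,
               re_sub('^\d{2}(\d{2})-(\d\d)-(\d\d)','\3/\2/\1',BirthDate) as "DOB"
        FROM t1 WHERE re_search('[hp]er',c4)"""
).fetchall()

print(matched)
print(reformatted)
conn.close()
```

Note the difference between the two predicates in this sketch: re_match anchors at the start of the value, while re_search finds the pattern anywhere in it, which is why '[hp]er' matches in the middle of an email address.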

Line Filtering Example

(Six filters are applied as the following file is read)

Input Tabular File:

#People with pets
Pets FirstName           LastName   DOB       PetNames  PetType
2    Paula               Brown      24/05/78  Rex,Fluff dog,cat
1    Steven              Jones      04/04/74  Allie     cat
0    Jane                Doe        24/05/78
1    James               Smith      20/10/80  Spot


Filter 1 - append a line number column:

#People with pets                                                 1
Pets FirstName           LastName   DOB       PetNames  PetType   2
2    Paula               Brown      24/05/78  Rex,Fluff dog,cat   3
1    Steven              Jones      04/04/74  Allie     cat       4
0    Jane                Doe        24/05/78                      5
1    James               Smith      20/10/80  Spot                6

Filter 2 - by regex expression matching [include]: '^\d+' (include lines that start with a number)

2    Paula               Brown      24/05/78  Rex,Fluff dog,cat   3
1    Steven              Jones      04/04/74  Allie     cat       4
0    Jane                Doe        24/05/78                      5
1    James               Smith      20/10/80  Spot                6

Filter 3 - append a line number column:

2    Paula               Brown      24/05/78  Rex,Fluff dog,cat   3  1
1    Steven              Jones      04/04/74  Allie     cat       4  2
0    Jane                Doe        24/05/78                      5  3
1    James               Smith      20/10/80  Spot                6  4

Filter 4 - regex replace value in column[4]: '(\d+)/(\d+)/(\d+)' '19\3-\2-\1' (convert dates to sqlite format)

2    Paula               Brown      1978-05-24  Rex,Fluff dog,cat   3  1
1    Steven              Jones      1974-04-04  Allie     cat       4  2
0    Jane                Doe        1978-05-24                      5  3
1    James               Smith      1980-10-20  Spot                6  4

Filter 5 - normalize list columns[5,6]:

2    Paula               Brown      1978-05-24  Rex       dog       3  1
2    Paula               Brown      1978-05-24  Fluff     cat       3  1
1    Steven              Jones      1974-04-04  Allie     cat       4  2
0    Jane                Doe        1978-05-24                      5  3
1    James               Smith      1980-10-20  Spot                6  4

Filter 6 - append a line number column:

2    Paula               Brown      1978-05-24  Rex       dog       3  1  1
2    Paula               Brown      1978-05-24  Fluff     cat       3  1  2
1    Steven              Jones      1974-04-04  Allie     cat       4  2  3
0    Jane                Doe        1978-05-24                      5  3  4
1    James               Smith      1980-10-20  Spot                6  4  5

Table name: pets

Table columns: Pets,FirstName,LastName,BirthDate,PetNames,PetType,line_num,entry_num,row_num

Query: SELECT * FROM pets

Result:

#Pets  FirstName  LastName  BirthDate   PetNames  PetType  line_num  entry_num  row_num
2      Paula      Brown     1978-05-24  Rex       dog      3         1          1
2      Paula      Brown     1978-05-24  Fluff     cat      3         1          2
1      Steven     Jones     1974-04-04  Allie     cat      4         2          3
0      Jane       Doe       1978-05-24                     5         3          4
1      James      Smith     1980-10-20  Spot               6         4          5
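The six-filter sequence can be traced with a small Python sketch in which each filter is a generator over tab-separated lines. The function names are hypothetical and the normalization logic is one plausible reading of the example (shorter lists are padded with empty strings), not the tool's actual code. The input assumes that missing PetNames and PetType values are present as empty tab-separated fields.

```python
import re


def append_line_number(lines):
    """Append the ordinal of each line, as seen by this filter, as the last column."""
    for number, line in enumerate(lines, start=1):
        yield f"{line}\t{number}"


def include_matching(lines, pattern):
    """Keep only lines that match the regular expression."""
    regex = re.compile(pattern)
    return (line for line in lines if regex.search(line))


def replace_in_column(lines, column, pattern, replacement):
    """Apply a regex substitution to one 1-based column."""
    regex = re.compile(pattern)
    for line in lines:
        fields = line.split("\t")
        fields[column - 1] = regex.sub(replacement, fields[column - 1])
        yield "\t".join(fields)


def normalize_list_columns(lines, columns, separator=","):
    """Replicate each line once per list item in the given 1-based columns."""
    for line in lines:
        fields = line.split("\t")
        lists = {c: fields[c - 1].split(separator) for c in columns}
        for i in range(max(len(items) for items in lists.values())):
            row = list(fields)
            for c, items in lists.items():
                row[c - 1] = items[i] if i < len(items) else ""
            yield "\t".join(row)


raw_lines = [
    "#People with pets",
    "Pets\tFirstName\tLastName\tDOB\tPetNames\tPetType",
    "2\tPaula\tBrown\t24/05/78\tRex,Fluff\tdog,cat",
    "1\tSteven\tJones\t04/04/74\tAllie\tcat",
    "0\tJane\tDoe\t24/05/78\t\t",
    "1\tJames\tSmith\t20/10/80\tSpot\t",
]

lines = append_line_number(raw_lines)                                           # Filter 1
lines = include_matching(lines, r"^\d+")                                        # Filter 2
lines = append_line_number(lines)                                               # Filter 3
lines = replace_in_column(lines, 4, r"(\d+)/(\d+)/(\d+)", r"19\3-\2-\1")        # Filter 4
lines = normalize_list_columns(lines, [5, 6])                                   # Filter 5
lines = append_line_number(lines)                                               # Filter 6
pet_rows = [line.split("\t") for line in lines]
for row in pet_rows:
    print(row)
```

Because each line-number filter counts the lines it receives, the three appended columns record the position in the original file, the position among the retained entries, and the position among the normalized rows.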

Normalizing by Line Filtering into 2 Tables

Relational database operations work with single-valued column entries. To apply relational operations to tabular files that contain fields with lists of values, we need to "normalize" those fields, duplicating lines for each item in the list. In this example we create 2 tables, one for single-valued fields and a second with list-valued fields normalized. Because we add a line number column to each table before any lines are split or dropped, we can join the 2 tables on the line number column. See: https://en.wikipedia.org/wiki/First_normal_form

People Table

Filter 1 - by regex expression matching [include]: '^\d+' (include lines that start with a number)
Filter 2 - append a line number column:
Filter 3 - regex replace value in column[4]: '(\d+)/(\d+)/(\d+)' '19\3-\2-\1' (convert dates to sqlite format)
Filter 4 - select columns 7,2,3,4,1

Table: People Columns: id,FirstName,LastName,DOB,Pets

id  FirstName  LastName  DOB         Pets
1   Paula      Brown     1978-05-24  2
2   Steven     Jones     1974-04-04  1
3   Jane       Doe       1978-05-24  0
4   James      Smith     1980-10-20  1

Pet Table

Filter 1 - by regex expression matching [include]: '^\d+' (include lines that start with a number)
Filter 2 - append a line number column:
Filter 3 - by regex expression matching [exclude]: '^0\t' (exclude lines with no pets)
Filter 4 - normalize list columns[5,6]:
Filter 5 - select columns 7,5,6

Table: Pet Columns: id,PetName,PetType

id  PetName  PetType
1   Rex      dog
1   Fluff    cat
2   Allie    cat
4   Spot

Query: SELECT FirstName,LastName,PetName FROM People JOIN Pet ON People.id = Pet.id WHERE PetType = 'cat';

Result:

FirstName LastName PetName

Paula Brown Fluff

Steven Jones Allie
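The join can be checked with Python's standard sqlite3 module by loading the two filtered tables by hand. This is a sketch under assumptions: the column types are chosen for illustration, and Spot's missing PetType is stored as an empty string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumed for this sketch.
conn.execute(
    "CREATE TABLE People (id INTEGER, FirstName TEXT, LastName TEXT, DOB TEXT, Pets INTEGER)"
)
conn.execute("CREATE TABLE Pet (id INTEGER, PetName TEXT, PetType TEXT)")

conn.executemany(
    "INSERT INTO People VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Paula", "Brown", "1978-05-24", 2),
        (2, "Steven", "Jones", "1974-04-04", 1),
        (3, "Jane", "Doe", "1978-05-24", 0),
        (4, "James", "Smith", "1980-10-20", 1),
    ],
)
conn.executemany(
    "INSERT INTO Pet VALUES (?, ?, ?)",
    [
        (1, "Rex", "dog"),
        (1, "Fluff", "cat"),
        (2, "Allie", "cat"),
        (4, "Spot", ""),  # PetType is missing in the input; stored as empty here
    ],
)

cat_owners = conn.execute(
    "SELECT FirstName,LastName,PetName FROM People JOIN Pet "
    "ON People.id = Pet.id WHERE PetType = 'cat';"
).fetchall()
print(cat_owners)
conn.close()
```

The shared id works as a join key only because the line number was appended before the normalize and exclude filters ran, so both tables number the original lines identically.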