Inputs
Loads tabular datasets into a SQLite database.
An existing SQLite database can be used as input, and any selected tabular datasets will be added as new tables in that database.
Input Line Filters
As a tabular file is being read, line filters may be applied.
- skip leading lines: skip the first *number* of lines
- comment char: omit any lines that start with the specified comment character
- by regex expression matching *include/exclude*: include or exclude lines that match the regex expression
- select columns: include only the selected columns, in the order specified
- regex replace value in column: replace a field in a column using a regex substitution (good for date reformatting)
- prepend a line number column: each line gets the ordinal of the line, as read by this filter, as its first column
- append a line number column: each line gets the ordinal of the line, as read by this filter, as its last column
- prepend a text column: each line gets the text string as its first column
- append a text column: each line gets the text string as its last column
- normalize list columns: replicates the line for each item in the specified list *columns*
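Line filters compose naturally as a chain, each one consuming the lines produced by the previous one. The following is a hypothetical Python sketch of a few of them, written as generators; the function names (`skip_leading_lines`, `omit_comments`, `regex_lines`, `select_columns`, `append_line_number`) and the sample input are invented for illustration and are not the tool's implementation. It assumes tab-separated fields, 1-based column numbers, and `re.search` semantics for the include/exclude filter.

```python
import re
from typing import Iterable, Iterator, List


def skip_leading_lines(lines: Iterable[str], number: int) -> Iterator[str]:
    """Drop the first `number` lines."""
    for index, line in enumerate(lines):
        if index >= number:
            yield line


def omit_comments(lines: Iterable[str], comment_char: str) -> Iterator[str]:
    """Drop lines that start with the comment character."""
    for line in lines:
        if not line.startswith(comment_char):
            yield line


def regex_lines(lines: Iterable[str], pattern: str, include: bool = True) -> Iterator[str]:
    """Keep lines that match (include=True) or that do not match (include=False)."""
    regex = re.compile(pattern)
    for line in lines:
        if (regex.search(line) is not None) == include:
            yield line


def select_columns(lines: Iterable[str], columns: List[int], sep: str = "\t") -> Iterator[str]:
    """Keep only the given 1-based columns, in the order given."""
    for line in lines:
        fields = line.split(sep)
        yield sep.join(fields[c - 1] for c in columns)


def append_line_number(lines: Iterable[str], sep: str = "\t") -> Iterator[str]:
    """Append the 1-based ordinal of each line as this filter reads it."""
    for number, line in enumerate(lines, start=1):
        yield f"{line}{sep}{number}"


# Hypothetical input: one junk line, a comment header, and three data lines.
raw = [
    "exported by some tool",
    "#id\tname\tscore",
    "1\talpha\t10",
    "2\tbeta\t20",
    "x\tgamma\t30",
]
lines = skip_leading_lines(raw, 1)
lines = omit_comments(lines, "#")
lines = regex_lines(lines, r"^\d+", include=True)
lines = select_columns(lines, [2, 1])
lines = append_line_number(lines)
rows = list(lines)
print(rows)  # ['alpha\t1\t1', 'beta\t2\t2']
```

Because each filter is lazy, a large file streams through the whole chain one line at a time, and the order of the filters matters: a line number appended after an include filter counts only the lines that survived it.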
Outputs
The results of a SQL query are output to the history as a tabular file.
The SQLite database can also be saved and output as a dataset in the history.
(The SQLite to tabular tool can run additional queries on this database.)
For help in using SQLite see: http://www.sqlite.org/docs.html
NOTE: dates used as input to SQLite date functions must be in the format YYYY-MM-DD, for example: 2015-09-30
See: http://www.sqlite.org/lang_datefunc.html
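One way to see why the format matters is to call SQLite's date functions directly. This sketch uses Python's built-in `sqlite3` module purely as a convenient SQLite client: an ISO `YYYY-MM-DD` string works with date arithmetic and `julianday()`, while a day/month/year string is not a recognized time value, so SQLite returns NULL (`None` in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# ISO-8601 dates (YYYY-MM-DD) work with date arithmetic...
next_day = conn.execute("SELECT date('2015-09-30', '+1 day')").fetchone()[0]
# ...and with julianday(), e.g. to count the days between two dates.
days_between = conn.execute(
    "SELECT julianday('2015-09-30') - julianday('2015-09-01')"
).fetchone()[0]
# A DD/MM/YYYY string is not a recognized time value, so the result is NULL.
not_a_date = conn.execute("SELECT date('30/09/2015', '+1 day')").fetchone()[0]

print(next_day, days_between, not_a_date)  # 2015-10-01 29.0 None
conn.close()
```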
Example
Given 2 tabular datasets: customers and sales
Dataset customers
Table name: "customers"
Column names: "CustomerID,FirstName,LastName,Email,DOB,Phone"
#CustomerID  FirstName  LastName  Email                  DOB         Phone
1            John       Smith     John.Smith@yahoo.com   1968-02-04  626 222-2222
2            Steven     Goldfish  goldfish@fishhere.net  1974-04-04  323 455-4545
3            Paula      Brown     pb@herowndomain.org    1978-05-24  416 323-3232
4            James      Smith     jim@supergig.co.uk     1980-10-20  416 323-8888
Dataset sales
Table name: "sales"
Column names: "CustomerID,Date,SaleAmount"
#CustomerID  Date        SaleAmount
2            2004-05-06  100.22
1            2004-05-07  99.95
3            2004-05-07  122.95
3            2004-05-13  100.00
4            2004-05-22  555.55
The query
SELECT FirstName, LastName, sum(SaleAmount) as "TotalSales"
FROM customers join sales on customers.CustomerID = sales.CustomerID
GROUP BY customers.CustomerID
ORDER BY TotalSales DESC;
produces this tabular output:
#FirstName  LastName  TotalSales
James       Smith     555.55
Paula       Brown     222.95
Steven      Goldfish  100.22
John        Smith     99.95
If the optional Table name and Column names inputs are not used, the tables are named t1, t2, ... and the columns c1, c2, ..., so the query would be:
SELECT t1.c2 as "FirstName", t1.c3 as "LastName", sum(t2.c3) as "TotalSales"
FROM t1 join t2 on t1.c1 = t2.c1
GROUP BY t1.c1
ORDER BY TotalSales DESC;
You can selectively name columns; e.g., on the customers input you could name just columns 2, 3, and 5:
Column names: ,FirstName,LastName,,BirthDate
This results in the following database table:
#c1  FirstName  LastName  c4                     BirthDate   c6
1    John       Smith     John.Smith@yahoo.com   1968-02-04  626 222-2222
2    Steven     Goldfish  goldfish@fishhere.net  1974-04-04  323 455-4545
3    Paula      Brown     pb@herowndomain.org    1978-05-24  416 323-3232
4    James      Smith     jim@supergig.co.uk     1980-10-20  416 323-8888
Regular expression functions are included for:
matching: re_match('pattern',column)
SELECT t1.FirstName, t1.LastName FROM t1 WHERE re_match('^.*\.(net|org)$',c4)
Results:
#FirstName  LastName
Steven      Goldfish
Paula       Brown
searching: re_search('pattern',column)
substituting: re_sub('pattern','replacement',column)
SELECT t1.FirstName, t1.LastName, re_sub('^\d{2}(\d{2})-(\d\d)-(\d\d)','\3/\2/\1',BirthDate) as "DOB" FROM t1 WHERE re_search('[hp]er',c4)
Results:
#FirstName  LastName  DOB
Steven      Goldfish  04/04/74
Paula       Brown     24/05/78
James       Smith     20/10/80
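`re_match`, `re_search`, and `re_sub` are SQL functions supplied by this tool, not part of core SQLite. As a hedged sketch of how such functions can be provided, Python's `sqlite3` module registers user-defined SQL functions with `Connection.create_function`. The function bodies below assume the tool's functions behave like Python's `re.match`, `re.search`, and `re.sub`; that is an assumption, not the tool's documented implementation. The customers data is loaded as table `t1` with the column names `,FirstName,LastName,,BirthDate` from the example above.

```python
import re
import sqlite3


# Assumed semantics: mirror Python's re.match / re.search / re.sub.
# SQLite passes NULL as None, which is treated here as "no match".
def re_match(pattern, value):
    return 1 if value is not None and re.match(pattern, str(value)) else 0


def re_search(pattern, value):
    return 1 if value is not None and re.search(pattern, str(value)) else 0


def re_sub(pattern, replacement, value):
    return None if value is None else re.sub(pattern, replacement, str(value))


conn = sqlite3.connect(":memory:")
conn.create_function("re_match", 2, re_match)
conn.create_function("re_search", 2, re_search)
conn.create_function("re_sub", 3, re_sub)

# customers loaded as t1 with Column names ",FirstName,LastName,,BirthDate"
conn.execute(
    "CREATE TABLE t1 (c1 INTEGER, FirstName TEXT, LastName TEXT,"
    " c4 TEXT, BirthDate TEXT, c6 TEXT)"
)
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "John", "Smith", "John.Smith@yahoo.com", "1968-02-04", "626 222-2222"),
        (2, "Steven", "Goldfish", "goldfish@fishhere.net", "1974-04-04", "323 455-4545"),
        (3, "Paula", "Brown", "pb@herowndomain.org", "1978-05-24", "416 323-3232"),
        (4, "James", "Smith", "jim@supergig.co.uk", "1980-10-20", "416 323-8888"),
    ],
)

# Raw strings keep the backslashes intact for SQLite and then for re.
matched = conn.execute(
    r"SELECT t1.FirstName, t1.LastName FROM t1 WHERE re_match('^.*\.(net|org)$',c4)"
).fetchall()
reformatted = conn.execute(
    r"SELECT t1.FirstName, t1.LastName,"
    r" re_sub('^\d{2}(\d{2})-(\d\d)-(\d\d)','\3/\2/\1',BirthDate) AS DOB"
    r" FROM t1 WHERE re_search('[hp]er',c4)"
).fetchall()

print(matched)
print(reformatted)
```

Note that SQL string literals do not treat backslashes as escapes, so `'\3/\2/\1'` reaches the regex engine unchanged and is interpreted there as group references.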
(Six filters are applied as the following file is read)
Input Tabular File:
#People with pets
Pets  FirstName  LastName  DOB       PetNames   PetType
2     Paula      Brown     24/05/78  Rex,Fluff  dog,cat
1     Steven     Jones     04/04/74  Allie      cat
0     Jane       Doe       24/05/78
1     James      Smith     20/10/80  Spot
Filter 1 - append a line number column:
#People with pets                                        1
Pets  FirstName  LastName  DOB       PetNames   PetType  2
2     Paula      Brown     24/05/78  Rex,Fluff  dog,cat  3
1     Steven     Jones     04/04/74  Allie      cat      4
0     Jane       Doe       24/05/78                      5
1     James      Smith     20/10/80  Spot                6
Filter 2 - by regex expression matching [include]: '^\d+' (include lines that start with a number)
2     Paula      Brown     24/05/78  Rex,Fluff  dog,cat  3
1     Steven     Jones     04/04/74  Allie      cat      4
0     Jane       Doe       24/05/78                      5
1     James      Smith     20/10/80  Spot                6
Filter 3 - append a line number column:
2     Paula      Brown     24/05/78  Rex,Fluff  dog,cat  3  1
1     Steven     Jones     04/04/74  Allie      cat      4  2
0     Jane       Doe       24/05/78                      5  3
1     James      Smith     20/10/80  Spot                6  4
Filter 4 - regex replace value in column[4]: '(\d+)/(\d+)/(\d+)' '19\3-\2-\1' (convert dates to sqlite format)
2     Paula      Brown     1978-05-24  Rex,Fluff  dog,cat  3  1
1     Steven     Jones     1974-04-04  Allie      cat      4  2
0     Jane       Doe       1978-05-24                      5  3
1     James      Smith     1980-10-20  Spot                6  4
Filter 5 - normalize list columns[5,6]:
2     Paula      Brown     1978-05-24  Rex        dog      3  1
2     Paula      Brown     1978-05-24  Fluff      cat      3  1
1     Steven     Jones     1974-04-04  Allie      cat      4  2
0     Jane       Doe       1978-05-24                      5  3
1     James      Smith     1980-10-20  Spot                6  4
Filter 6 - append a line number column:
2     Paula      Brown     1978-05-24  Rex        dog      3  1  1
2     Paula      Brown     1978-05-24  Fluff      cat      3  1  2
1     Steven     Jones     1974-04-04  Allie      cat      4  2  3
0     Jane       Doe       1978-05-24                      5  3  4
1     James      Smith     1980-10-20  Spot                6  4  5
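The six-filter walkthrough can be reproduced with a short Python sketch, offered as an illustration under stated assumptions rather than as the tool's code. It assumes the file is tab-separated and that the PetNames/PetType fields for Jane Doe and James Smith are present but empty (that is what keeps the appended line number columns aligned), that the include filter uses `re.search`, and, for list columns of unequal length, that shorter lists are padded with empty strings (a choice made here for illustration only). The helper names are hypothetical. The final rows are loaded into a `pets` table using the column names listed for the example.

```python
import re
import sqlite3

lines = [
    "#People with pets",
    "Pets\tFirstName\tLastName\tDOB\tPetNames\tPetType",
    "2\tPaula\tBrown\t24/05/78\tRex,Fluff\tdog,cat",
    "1\tSteven\tJones\t04/04/74\tAllie\tcat",
    "0\tJane\tDoe\t24/05/78\t\t",
    "1\tJames\tSmith\t20/10/80\tSpot\t",
]


def append_line_number(rows):
    # Ordinal of each row as read by *this* filter, starting at 1.
    for number, row in enumerate(rows, start=1):
        yield row + [str(number)]


def regex_include(rows, pattern):
    # Keep rows whose tab-joined text matches the pattern.
    regex = re.compile(pattern)
    for row in rows:
        if regex.search("\t".join(row)):
            yield row


def regex_replace(rows, column, pattern, replacement):
    # Apply a regex substitution to one 1-based column.
    regex = re.compile(pattern)
    for row in rows:
        row = list(row)
        row[column - 1] = regex.sub(replacement, row[column - 1])
        yield row


def normalize_list_columns(rows, columns, sep=","):
    # One output row per list item; shorter lists are padded with "" (assumption).
    for row in rows:
        lists = [row[c - 1].split(sep) for c in columns]
        for i in range(max(len(items) for items in lists)):
            out = list(row)
            for c, items in zip(columns, lists):
                out[c - 1] = items[i] if i < len(items) else ""
            yield out


rows = (line.split("\t") for line in lines)
rows = append_line_number(rows)  # Filter 1
rows = regex_include(rows, r"^\d+")  # Filter 2
rows = append_line_number(rows)  # Filter 3
rows = regex_replace(rows, 4, r"(\d+)/(\d+)/(\d+)", r"19\3-\2-\1")  # Filter 4
rows = normalize_list_columns(rows, [5, 6])  # Filter 5
rows = append_line_number(rows)  # Filter 6
pets_rows = list(rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pets (Pets, FirstName, LastName, BirthDate, PetNames,"
    " PetType, line_num, entry_num, row_num)"
)
conn.executemany("INSERT INTO pets VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", pets_rows)
for record in conn.execute("SELECT * FROM pets"):
    print(record)
```

The three appended columns show why filter order matters: `line_num` counts every line of the raw file, `entry_num` counts only the lines kept by the include filter, and `row_num` counts the rows after list normalization.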
Table name: pets
Table columns: Pets,FirstName,LastName,BirthDate,PetNames,PetType,line_num,entry_num,row_num
Query: SELECT * FROM pets
Result:
#Pets  FirstName  LastName  BirthDate   PetNames   PetType  line_num  entry_num  row_num
2      Paula      Brown     1978-05-24  Rex        dog      3         1          1
2      Paula      Brown     1978-05-24  Fluff      cat      3         1          2
1      Steven     Jones     1974-04-04  Allie      cat      4         2          3
0      Jane       Doe       1978-05-24                      5         3          4
1      James      Smith     1980-10-20  Spot                6         4          5
Normalizing by Line Filtering into 2 Tables
Relational database operations work with single-valued column entries. To apply relational operations to tabular files that contain fields with lists of values, we need to "normalize" those fields, duplicating lines for each item in the list. In this example we create 2 tables, one for the single-valued fields and a second with the list-valued fields normalized. Because each table gets its line number column before any lines are excluded or duplicated, every person keeps the same number in both tables, so we can join the 2 tables on that line number column. https://en.wikipedia.org/wiki/First_normal_form
People Table
Filter 1 - by regex expression matching [include]: '^\d+' (include lines that start with a number)
Filter 2 - append a line number column
Filter 3 - regex replace value in column[4]: '(\d+)/(\d+)/(\d+)' '19\3-\2-\1' (convert dates to sqlite format)
Filter 4 - select columns 7,2,3,4,1
Table: People
Columns: id,FirstName,LastName,DOB,Pets
id  FirstName  LastName  DOB         Pets
1   Paula      Brown     1978-05-24  2
2   Steven     Jones     1974-04-04  1
3   Jane       Doe       1978-05-24  0
4   James      Smith     1980-10-20  1
Pet Table
Filter 1 - by regex expression matching [include]: '^\d+' (include lines that start with a number)
Filter 2 - append a line number column
Filter 3 - by regex expression matching [exclude]: '^0\t' (exclude lines with no pets)
Filter 4 - normalize list columns[5,6]
Filter 5 - select columns 7,5,6
Table: Pet
Columns: id,PetName,PetType
id  PetName  PetType
1   Rex      dog
1   Fluff    cat
2   Allie    cat
4   Spot
Query: SELECT FirstName,LastName,PetName FROM People JOIN Pet ON People.id = Pet.id WHERE PetType = 'cat';
Result:
FirstName  LastName  PetName
Paula      Brown     Fluff
Steven     Jones     Allie
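To check the two-table design end to end, this sketch loads the People and Pet tables from the example into an in-memory SQLite database with Python's `sqlite3` module and runs the same join. It also adds one extra query that is not part of the original example: counting Pet rows per person with a LEFT JOIN, which should reproduce the Pets column and so confirms that normalizing the list columns lost nothing. The column types, and storing James Smith's missing PetType as an empty string, are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# People: one row per person, single-valued columns only.
conn.execute(
    "CREATE TABLE People (id INTEGER, FirstName TEXT, LastName TEXT, DOB TEXT, Pets INTEGER)"
)
conn.executemany(
    "INSERT INTO People VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Paula", "Brown", "1978-05-24", 2),
        (2, "Steven", "Jones", "1974-04-04", 1),
        (3, "Jane", "Doe", "1978-05-24", 0),
        (4, "James", "Smith", "1980-10-20", 1),
    ],
)

# Pet: one row per pet, keyed by the same id (the line number column).
conn.execute("CREATE TABLE Pet (id INTEGER, PetName TEXT, PetType TEXT)")
conn.executemany(
    "INSERT INTO Pet VALUES (?, ?, ?)",
    [(1, "Rex", "dog"), (1, "Fluff", "cat"), (2, "Allie", "cat"), (4, "Spot", "")],
)

# The query from the example; sorted() because it has no ORDER BY.
cats = sorted(
    conn.execute(
        "SELECT FirstName,LastName,PetName FROM People JOIN Pet "
        "ON People.id = Pet.id WHERE PetType = 'cat'"
    ).fetchall()
)

# Extra check: counting Pet rows per person should reproduce the Pets column.
pet_counts = conn.execute(
    "SELECT People.id, count(Pet.id) FROM People LEFT JOIN Pet "
    "ON People.id = Pet.id GROUP BY People.id ORDER BY People.id"
).fetchall()
pets_column = conn.execute("SELECT id, Pets FROM People ORDER BY id").fetchall()

print(cats)  # [('Paula', 'Brown', 'Fluff'), ('Steven', 'Jones', 'Allie')]
print(pet_counts == pets_column)  # True
```

The LEFT JOIN keeps Jane Doe, who has no Pet rows, and `count(Pet.id)` counts only non-NULL matches, so she correctly gets 0.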