Filter a tabular dataset by applying line filters as it is being read. Multiple filters may be chained; each filter operates on the output of the previous filter.
Inputs
A tabular dataset.
Outputs
A filtered tabular dataset.
Input Line Filters
As a tabular file is being read, line filters may be applied.
- skip leading lines: skip the first *number* of lines
- comment char: omit any lines that start with the specified comment character
- by regex expression matching: *include/exclude* lines that match the regular expression
- select columns: choose to include only selected columns, in the order specified
- regex replace value in column: replace a field in a column using a regex substitution (good for date reformatting)
- prepend a line number column: each line has the ordinal value of the line read by this filter as the first column
- append a line number column: each line has the ordinal value of the line read by this filter as the last column
- prepend a text column: each line has the text string as the first column
- append a text column: each line has the text string as the last column
- normalize list columns: replicates the line for each item in the specified list *columns*
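The chaining behavior described above can be sketched in Python as a sequence of generator functions, each consuming the rows produced by the previous one. This is a minimal illustration, not the tool's actual implementation: the names `skip_leading`, `comment_char`, `select_columns`, `append_line_number`, and `chain` are hypothetical, columns are assumed to be 1-based, and a selected column that is missing from a short line is assumed to yield an empty field.

```python
from typing import Callable, Iterable, Iterator, List

Row = List[str]
Filter = Callable[[Iterable[Row]], Iterator[Row]]


def skip_leading(count: int) -> Filter:
    """Skip the first `count` lines (hypothetical helper)."""
    def apply(rows: Iterable[Row]) -> Iterator[Row]:
        for index, row in enumerate(rows):
            if index >= count:
                yield row
    return apply


def comment_char(char: str) -> Filter:
    """Omit lines whose first field starts with `char`."""
    def apply(rows: Iterable[Row]) -> Iterator[Row]:
        for row in rows:
            if not (row and row[0].startswith(char)):
                yield row
    return apply


def select_columns(columns: List[int]) -> Filter:
    """Keep only the given 1-based columns, in the order specified."""
    def apply(rows: Iterable[Row]) -> Iterator[Row]:
        for row in rows:
            # Assumption: a column beyond the end of the line becomes "".
            yield [row[c - 1] if c <= len(row) else "" for c in columns]
    return apply


def append_line_number() -> Filter:
    """Append the ordinal of each line as seen by this filter."""
    def apply(rows: Iterable[Row]) -> Iterator[Row]:
        for number, row in enumerate(rows, start=1):
            yield row + [str(number)]
    return apply


def chain(rows: Iterable[Row], filters: List[Filter]) -> Iterable[Row]:
    """Feed the output of each filter into the next one."""
    for line_filter in filters:
        rows = line_filter(rows)
    return rows


lines = ["#comment", "a\tb\tc", "d\te\tf"]
rows = (line.split("\t") for line in lines)
result = list(chain(rows, [comment_char("#"),
                           select_columns([3, 1]),
                           append_line_number()]))
print(result)  # [['c', 'a', '1'], ['f', 'd', '2']]
```

Because the line number filter counts only the lines that reach it, its position in the chain determines the numbering: here the comment line is dropped first, so numbering starts at the first data line.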
(Six filters are applied as the following file is read)
Input Tabular File:

    #People with pets
    Pets FirstName LastName DOB      PetNames  PetType
    2    Paula     Brown    24/05/78 Rex,Fluff dog,cat
    1    Steven    Jones    04/04/74 Allie     cat
    0    Jane      Doe      24/05/78
    1    James     Smith    20/10/80 Spot

Filter 1 - append a line number column:

    #People with pets                                  1
    Pets FirstName LastName DOB      PetNames  PetType 2
    2    Paula     Brown    24/05/78 Rex,Fluff dog,cat 3
    1    Steven    Jones    04/04/74 Allie     cat     4
    0    Jane      Doe      24/05/78                   5
    1    James     Smith    20/10/80 Spot              6

Filter 2 - by regex expression matching [include]: '^\d+' (include lines that start with a number)

    2    Paula     Brown    24/05/78 Rex,Fluff dog,cat 3
    1    Steven    Jones    04/04/74 Allie     cat     4
    0    Jane      Doe      24/05/78                   5
    1    James     Smith    20/10/80 Spot              6

Filter 3 - append a line number column:

    2    Paula     Brown    24/05/78 Rex,Fluff dog,cat 3 1
    1    Steven    Jones    04/04/74 Allie     cat     4 2
    0    Jane      Doe      24/05/78                   5 3
    1    James     Smith    20/10/80 Spot              6 4

Filter 4 - regex replace value in column[4]: '(\d+)/(\d+)/(\d+)' '19\3-\2-\1' (convert dates to sqlite format)

    2    Paula     Brown    1978-05-24 Rex,Fluff dog,cat 3 1
    1    Steven    Jones    1974-04-04 Allie     cat     4 2
    0    Jane      Doe      1978-05-24                   5 3
    1    James     Smith    1980-10-20 Spot              6 4

Filter 5 - normalize list columns[5,6]:

    2    Paula     Brown    1978-05-24 Rex   dog 3 1
    2    Paula     Brown    1978-05-24 Fluff cat 3 1
    1    Steven    Jones    1974-04-04 Allie cat 4 2
    0    Jane      Doe      1978-05-24           5 3
    1    James     Smith    1980-10-20 Spot      6 4

Filter 6 - append a line number column:

    2    Paula     Brown    1978-05-24 Rex   dog 3 1 1
    2    Paula     Brown    1978-05-24 Fluff cat 3 1 2
    1    Steven    Jones    1974-04-04 Allie cat 4 2 3
    0    Jane      Doe      1978-05-24           5 3 4
    1    James     Smith    1980-10-20 Spot      6 4 5
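The regex replacement and list normalization steps of this example can be approximated in Python as follows. This is a sketch under stated assumptions rather than the tool's own code: `regex_replace` and `normalize_list_columns` are hypothetical names, columns are 1-based, list items are assumed to be comma-separated, and a list shorter than its neighbor is assumed to be padded with empty fields. The line number columns are left out to keep the sample small.

```python
import re
from typing import Iterable, Iterator, List

Row = List[str]


def regex_replace(rows: Iterable[Row], column: int,
                  pattern: str, replacement: str) -> Iterator[Row]:
    """Apply a regex substitution to one 1-based column of every row."""
    regex = re.compile(pattern)
    for row in rows:
        row = list(row)  # copy so the caller's data is not modified
        if column <= len(row):
            row[column - 1] = regex.sub(replacement, row[column - 1])
        yield row


def normalize_list_columns(rows: Iterable[Row], columns: List[int],
                           separator: str = ",") -> Iterator[Row]:
    """Replicate each row once per item in the given list columns."""
    for row in rows:
        parts = {c: row[c - 1].split(separator)
                 for c in columns if c <= len(row)}
        count = max((len(p) for p in parts.values()), default=1)
        for n in range(count):
            # Assumption: a list with fewer items is padded with "".
            yield [
                (parts[c][n] if n < len(parts[c]) else "")
                if c in parts else value
                for c, value in enumerate(row, start=1)
            ]


rows = [
    ["2", "Paula", "Brown", "24/05/78", "Rex,Fluff", "dog,cat"],
    ["1", "James", "Smith", "20/10/80", "Spot", ""],
]
# Same pattern and replacement as Filter 4: dd/mm/yy becomes 19yy-mm-dd.
dated = regex_replace(rows, 4, r"(\d+)/(\d+)/(\d+)", r"19\3-\2-\1")
result = list(normalize_list_columns(dated, [5, 6]))
for row in result:
    print("\t".join(row))
```

With this input, Paula's line is emitted twice (once for Rex/dog and once for Fluff/cat), while James's line is emitted once with an empty pet type, which matches the shape of the Filter 5 output.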