Filtering Data
Polymatheia provides a simple filtering language to remove records that are not needed for further processing. All
filtering is performed by polymatheia.filter.RecordsFilter, and every filter is specified as a tuple.
Basic filters
The basic filters provided by Polymatheia allow you to compare a value in a record to a fixed value:
('true',): Lets any record pass.
('false',): Lets no record pass.
('eq', a, b): Lets the record pass if the value of a is equal to the value of b.
('neq', a, b): Lets the record pass if the value of a is not equal to the value of b.
('gt', a, b): Lets the record pass if the value of a is greater than the value of b.
('gte', a, b): Lets the record pass if the value of a is greater than or equal to the value of b.
('lt', a, b): Lets the record pass if the value of a is less than the value of b.
('lte', a, b): Lets the record pass if the value of a is less than or equal to the value of b.
('contains', a, b): Lets the record pass if the value of a contains the value of b.
('exists', a): Lets the record pass if the value of a is not None.
Where the filter expression contains a and b, either of these can be one of:
A dotted string: in this case the value to be compared is taken from the record using the dotted string to identify the value to compare.
A list: the value to be compared is taken from the record using the list to identify the value to compare.
Anything else: the value is compared as is.
from polymatheia.data.reader import LocalReader
from polymatheia.filter import RecordsFilter
reader = LocalReader('europeana_json')
fltr = ('eq', ['type'], 'IMAGE')
images = RecordsFilter(reader, fltr)
for record in images:
    print(record)
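The three operand forms can be illustrated without Polymatheia itself. The following is a minimal sketch over plain dictionaries: resolve_operand is a hypothetical helper, not part of the library, and treating a string as a path only when it contains a dot is an assumption about how a dotted string is told apart from a fixed value such as 'IMAGE'.

```python
def resolve_operand(record, operand):
    """Return the value an operand stands for.

    Hypothetical helper for illustration only, not Polymatheia API.
    """
    if isinstance(operand, list):
        # A list names the path to the value inside the record.
        path = operand
    elif isinstance(operand, str) and '.' in operand:
        # Assumption: only strings that contain a dot are treated as paths.
        path = operand.split('.')
    else:
        # Anything else is compared as is.
        return operand
    value = record
    for key in path:
        if not isinstance(value, dict) or key not in value:
            return None  # the path does not exist in this record
        value = value[key]
    return value


record = {'type': 'IMAGE', 'title': {'en': 'A map'}}

print(resolve_operand(record, ['type']))    # looked up via a list path
print(resolve_operand(record, 'title.en'))  # looked up via a dotted string
print(resolve_operand(record, 'IMAGE'))     # fixed value, returned unchanged
```

Under these assumptions, ('eq', ['type'], 'IMAGE') compares the record's type value with the fixed string 'IMAGE'.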
Compound filters
Filters can be combined into more complex filter expressions using the following compound filters:
('not', filter_expression): Lets the record pass if the filter_expression is not True.
('or', filter_expression_1, ..., filter_expression_n): Lets the record pass if one or more of filter_expression_1 to filter_expression_n is True.
('and', filter_expression_1, ..., filter_expression_n): Lets the record pass only if all of filter_expression_1 to filter_expression_n are True.
The negation filter not is primarily needed with the contains filter, as the other basic filters already have
explicit negated counterparts (for example, neq for eq and lte for gt):
from polymatheia.data.reader import LocalReader
from polymatheia.filter import RecordsFilter
reader = LocalReader('europeana_json')
fltr = ('not', ('contains', ['dcLanguage'], 'de'))
not_german = RecordsFilter(reader, fltr)
for record in not_german:
    print(record)
The or and and filters use standard boolean logic for evaluation:
from polymatheia.data.reader import LocalReader
from polymatheia.filter import RecordsFilter
reader = LocalReader('europeana_json')
fltr = ('or', ('contains', ['dcLanguage'], 'de'), ('contains', ['dcLanguage'], 'ger'))
full_german = RecordsFilter(reader, fltr)
for record in full_german:
    print(record)
from polymatheia.data.reader import LocalReader
from polymatheia.filter import RecordsFilter
reader = LocalReader('europeana_json')
fltr = ('and', ('contains', ['dcLanguage'], 'de'), ('eq', ['type'], 'IMAGE'))
german_images = RecordsFilter(reader, fltr)
for record in german_images:
    print(record)
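Because filters are ordinary tuples, a long or expression does not have to be written out by hand. The sketch below builds one from a list of language codes using only standard Python tuple construction. The list of codes is illustrative ('deu' is added here as an example and does not come from the examples above), and the resulting tuple would be passed to RecordsFilter in the same way as a hand-written filter.

```python
# Illustrative list of language codes to accept.
language_codes = ['de', 'ger', 'deu']

# One 'contains' filter per code, all testing the same field.
language_filters = [('contains', ['dcLanguage'], code) for code in language_codes]

# Unpack the list so that each filter becomes one argument of the 'or' tuple.
any_german = ('or', *language_filters)

# Combine the result with a basic filter, as in the hand-written 'and' example.
german_images_filter = ('and', any_german, ('eq', ['type'], 'IMAGE'))

print(german_images_filter)
```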
The compound filters can themselves be nested:
from polymatheia.data.reader import LocalReader
from polymatheia.filter import RecordsFilter
reader = LocalReader('europeana_json')
fltr = ('and',
        ('or',
         ('contains', ['dcLanguage'], 'de'),
         ('contains', ['dcLanguage'], 'ger')),
        ('eq', ['type'], 'IMAGE'))
full_german_images = RecordsFilter(reader, fltr)
for record in full_german_images:
    print(record)
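To make the evaluation of nested expressions concrete, here is a minimal sketch of a recursive evaluator over plain dictionaries. It is not Polymatheia's implementation: matches and value_of are hypothetical helpers, records are assumed to be nested dicts, and for brevity only list paths are resolved (every other operand is compared as is).

```python
import operator

# Basic two-operand filters mapped to standard comparisons.
COMPARISONS = {
    'eq': operator.eq,
    'neq': operator.ne,
    'gt': operator.gt,
    'gte': operator.ge,
    'lt': operator.lt,
    'lte': operator.le,
    'contains': lambda a, b: a is not None and b in a,
}


def value_of(record, operand):
    """Look up a list path in a nested dict; return any other operand as is."""
    if not isinstance(operand, list):
        return operand
    value = record
    for key in operand:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def matches(record, expr):
    """Evaluate a filter tuple against one record (hypothetical helper)."""
    name, *args = expr
    if name == 'true':
        return True
    if name == 'false':
        return False
    if name == 'not':
        return not matches(record, args[0])
    if name == 'or':
        # Compound filters recurse into their sub-expressions.
        return any(matches(record, sub) for sub in args)
    if name == 'and':
        return all(matches(record, sub) for sub in args)
    if name == 'exists':
        return value_of(record, args[0]) is not None
    return COMPARISONS[name](value_of(record, args[0]), value_of(record, args[1]))


# Made-up sample records for illustration.
records = [
    {'type': 'IMAGE', 'dcLanguage': ['de']},
    {'type': 'TEXT', 'dcLanguage': ['ger']},
    {'type': 'IMAGE', 'dcLanguage': ['en']},
]
fltr = ('and',
        ('or',
         ('contains', ['dcLanguage'], 'de'),
         ('contains', ['dcLanguage'], 'ger')),
        ('eq', ['type'], 'IMAGE'))
selected = [record for record in records if matches(record, fltr)]
print(selected)
```

Only the first sample record satisfies both branches of the and expression: the second is German but not an image, and the third is an image but not German.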