Filtering Data
Polymatheia provides a simple filtering language for removing records that are not needed for further processing. All filtering is performed by polymatheia.filter.RecordsFilter, which wraps a reader and yields only the records that pass the filter. All filter expressions are specified as Python tuples.
Basic filters
The basic filters provided by Polymatheia allow you to compare a value in a record to a fixed value or to another value in the same record:
('true',): Lets any record pass.
('false',): Lets no record pass.
('eq', a, b): Lets the record pass if the value of a is equal to the value of b.
('neq', a, b): Lets the record pass if the value of a is not equal to the value of b.
('gt', a, b): Lets the record pass if the value of a is greater than the value of b.
('gte', a, b): Lets the record pass if the value of a is greater than or equal to the value of b.
('lt', a, b): Lets the record pass if the value of a is less than the value of b.
('lte', a, b): Lets the record pass if the value of a is less than or equal to the value of b.
('contains', a, b): Lets the record pass if the value of a contains the value of b.
('exists', a): Lets the record pass if the value of a is not None.
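The semantics of the basic filters map closely onto Python's comparison operators. The following is a minimal, hypothetical sketch and not Polymatheia's implementation. It assumes that a and b have already been resolved to plain values. The names BASIC_FILTERS and passes_basic are illustrative only.

```python
import operator
from typing import Any, Callable, Dict

# Hypothetical mapping from basic filter names to Python predicates.
# Assumes the operands have already been resolved to plain values.
BASIC_FILTERS: Dict[str, Callable[..., bool]] = {
    'eq': operator.eq,
    'neq': operator.ne,
    'gt': operator.gt,
    'gte': operator.ge,
    'lt': operator.lt,
    'lte': operator.le,
    'contains': operator.contains,  # operator.contains(a, b) means `b in a`
    'exists': lambda a: a is not None,
}


def passes_basic(name: str, *values: Any) -> bool:
    """Evaluate one basic filter on already-resolved operand values."""
    if name == 'true':
        return True
    if name == 'false':
        return False
    return bool(BASIC_FILTERS[name](*values))


print(passes_basic('eq', 'IMAGE', 'IMAGE'))          # True
print(passes_basic('contains', ['de', 'en'], 'de'))  # True
print(passes_basic('exists', None))                  # False
```

In this sketch, 'contains' follows Python's `in` operator. It therefore works for lists of values such as language codes and for substrings of strings alike.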
Wherever a filter expression takes a or b, each of these can be one of:
A dotted string: the value to compare is taken from the record, using the dotted string as the path to that value.
A list: the value to compare is taken from the record, using the elements of the list as the path to that value.
Anything else: the value is used as is.
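The operand rules can be illustrated with a small resolver over nested dictionaries. This is a hypothetical sketch, not Polymatheia's code. The examples below compare against the plain string 'IMAGE' as a literal, so the sketch assumes that only strings containing a dot are treated as paths. How Polymatheia itself tells a dotted path from a literal string is an assumption here. The sketch also assumes that missing keys resolve to None.

```python
from typing import Any, Mapping


def resolve(record: Mapping[str, Any], value: Any) -> Any:
    """Hypothetical operand resolver for filter expressions.

    Assumption: strings containing '.' and lists are paths into the record;
    everything else (including plain strings such as 'IMAGE') is a literal.
    Missing keys resolve to None.
    """
    if isinstance(value, str) and '.' in value:
        path = value.split('.')
    elif isinstance(value, list):
        path = value
    else:
        return value  # literal value, compared as is
    current: Any = record
    for key in path:
        if isinstance(current, Mapping) and key in current:
            current = current[key]
        else:
            return None
    return current


record = {'type': 'IMAGE', 'meta': {'lang': ['de']}}
print(resolve(record, ['type']))     # IMAGE
print(resolve(record, 'meta.lang'))  # ['de']
print(resolve(record, 'IMAGE'))      # IMAGE
```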
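For example, the following keeps only records whose type is IMAGE:
from polymatheia.data.reader import LocalReader
from polymatheia.filter import RecordsFilter

reader = LocalReader('europeana_json')
# Keep only records whose 'type' field equals 'IMAGE'
fltr = ('eq', ['type'], 'IMAGE')
images = RecordsFilter(reader, fltr)
for record in images:
    print(record)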
Compound filters
Filters can be combined into more complex filter expressions using the following compound filters:
('not', filter_expression): Lets the record pass if filter_expression does not evaluate to True.
('or', filter_expression_1, ..., filter_expression_n): Lets the record pass if one or more of filter_expression_1 to filter_expression_n are True.
('and', filter_expression_1, ..., filter_expression_n): Lets the record pass only if all of filter_expression_1 to filter_expression_n are True.
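Because compound filters contain other filter expressions, they can be evaluated recursively. The sketch below is hypothetical and is not Polymatheia's implementation. 'not', 'or' and 'and' recurse into their sub-expressions, and every other expression is delegated to a basic-filter callback. The toy callback simple_basic is illustrative only. It supports just 'eq' and 'contains' with a single-key list path, over plain dictionaries.

```python
from typing import Any, Callable, Mapping, Tuple

Record = Mapping[str, Any]
Expression = Tuple[Any, ...]


def evaluate(expression: Expression, record: Record,
             basic: Callable[[Expression, Record], bool]) -> bool:
    """Recursively evaluate compound filters; delegate leaves to `basic`."""
    name = expression[0]
    if name == 'not':
        return not evaluate(expression[1], record, basic)
    if name == 'or':
        # any() stops at the first sub-expression that is True
        return any(evaluate(sub, record, basic) for sub in expression[1:])
    if name == 'and':
        # all() stops at the first sub-expression that is False
        return all(evaluate(sub, record, basic) for sub in expression[1:])
    return basic(expression, record)


def simple_basic(expression: Expression, record: Record) -> bool:
    """Toy leaf evaluator: only 'eq' and 'contains' with a single-key path."""
    name, path, value = expression
    field = record.get(path[0])
    if name == 'eq':
        return field == value
    if name == 'contains':
        return field is not None and value in field
    raise ValueError(f'unsupported filter: {name}')


records = [
    {'id': 1, 'type': 'IMAGE', 'dcLanguage': ['de']},
    {'id': 2, 'type': 'TEXT', 'dcLanguage': ['en']},
]
not_german = ('not', ('contains', ['dcLanguage'], 'de'))
german_images = ('and', ('contains', ['dcLanguage'], 'de'), ('eq', ['type'], 'IMAGE'))
print([r['id'] for r in records if evaluate(not_german, r, simple_basic)])     # [2]
print([r['id'] for r in records if evaluate(german_images, r, simple_basic)])  # [1]
```

Keeping the compound logic separate from the leaf comparisons is what makes arbitrary nesting work. Each level only needs to know how to combine the results of its children.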
The negation filter 'not' is primarily needed with the 'contains' and 'exists' filters. The other basic filters already have explicit counterparts (for example 'neq' for 'eq'):
from polymatheia.data.reader import LocalReader
from polymatheia.filter import RecordsFilter

reader = LocalReader('europeana_json')
# Keep records whose dcLanguage does not contain 'de'
fltr = ('not', ('contains', ['dcLanguage'], 'de'))
not_german = RecordsFilter(reader, fltr)
for record in not_german:
    print(record)
The 'or' and 'and' filters use standard boolean logic for evaluation. For example, 'or' can match records tagged with either German language code ('de' or 'ger'):
from polymatheia.data.reader import LocalReader
from polymatheia.filter import RecordsFilter

reader = LocalReader('europeana_json')
# Keep records whose dcLanguage contains 'de' or 'ger'
fltr = ('or', ('contains', ['dcLanguage'], 'de'), ('contains', ['dcLanguage'], 'ger'))
full_german = RecordsFilter(reader, fltr)
for record in full_german:
    print(record)
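The 'and' filter, in turn, keeps only records that pass every sub-filter, here German-language images:
from polymatheia.data.reader import LocalReader
from polymatheia.filter import RecordsFilter

reader = LocalReader('europeana_json')
# Keep records that are German-language AND of type IMAGE
fltr = ('and', ('contains', ['dcLanguage'], 'de'), ('eq', ['type'], 'IMAGE'))
german_images = RecordsFilter(reader, fltr)
for record in german_images:
    print(record)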
The compound filters can themselves be nested:
from polymatheia.data.reader import LocalReader
from polymatheia.filter import RecordsFilter

reader = LocalReader('europeana_json')
# (dcLanguage contains 'de' OR 'ger') AND type equals 'IMAGE'
fltr = ('and',
        ('or',
         ('contains', ['dcLanguage'], 'de'),
         ('contains', ['dcLanguage'], 'ger')),
        ('eq', ['type'], 'IMAGE'))
full_german_images = RecordsFilter(reader, fltr)
for record in full_german_images:
    print(record)
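To make the logic of the nested filter explicit, here is a plain-Python predicate with the same meaning, applied to in-memory dictionaries. This is an illustration only and does not use Polymatheia. The sample records are made up and only loosely shaped like the Europeana JSON records used above. The function name full_german_image is hypothetical.

```python
from typing import Any, Dict, List

# Made-up sample records; only the 'type' and 'dcLanguage' fields matter here.
records: List[Dict[str, Any]] = [
    {'id': 1, 'type': 'IMAGE', 'dcLanguage': ['de']},
    {'id': 2, 'type': 'TEXT', 'dcLanguage': ['ger']},
    {'id': 3, 'type': 'IMAGE', 'dcLanguage': ['ger', 'en']},
    {'id': 4, 'type': 'IMAGE', 'dcLanguage': ['fr']},
]


def full_german_image(record: Dict[str, Any]) -> bool:
    """Plain-Python equivalent of the nested ('and', ('or', ...), ...) filter."""
    languages = record.get('dcLanguage') or []  # treat a missing field as empty
    is_german = 'de' in languages or 'ger' in languages
    return is_german and record.get('type') == 'IMAGE'


selected = [record['id'] for record in records if full_german_image(record)]
print(selected)  # [1, 3]
```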