Reading and Writing XML Data

For performance and replicability reasons it is generally better to keep a local copy of the data-set that is being worked on. Polymatheia supports this in two ways, either using JSON or using XML.

Writing XML data

To store data locally, use the XMLWriter, providing the name of the directory to store the data in and the dotted path to the field that contains each record’s unique identifier:

from polymatheia.data.reader import EuropeanaSearchReader
from polymatheia.data.writer import XMLWriter

EUROPEANA_API_KEY = 'Put your key here'
reader = EuropeanaSearchReader(EUROPEANA_API_KEY, 'Gutzkow OR Zäunemann OR Heyse')
writer = XMLWriter('europeana_xml', 'guid')
writer.write(reader)

The filename is calculated as a sha-256 digest of the value of the unique identifier. Thus if the identifier is not actually unique, data will be overwritten and lost.

Warning

The XMLWriter will ensure that all keys are transformed into valid XML tag names. This means that if a key is not a valid XML tag name, then it will automatically be transformed into a valid XML tag name. As a result it is not possible to guarantee that a record saved with the XMLWriter will have exactly the same keys after loading with the XMLReader.

Thus in practice it is recommended that you use JSON as the format for locally storing data, as this guarantees that the records remain unchanged.

Reading XML data

To load data stored locally, use the XMLReader, providing the name of the directory that contains the data to load:

from polymatheia.data.reader import XMLReader

reader = XMLReader('europeana_xml')
for record in reader:
    print(record)

This will work with any data stored as XML, not just with data stored using the XMLWriter. The only requirements are that the files to load must have “.xml” as their extension and each file must contain exactly one record.

Note

The order of records is defined by order in which filenames are listed by the underlying operating system. As such no order can be guaranteed.