Loading Data via SRU

Polymatheia supports accessing metadata records via the SRU protocol. SRU (Search/Retrieve via URL) is a standard XML-based protocol for search queries. Queries are written in CQL (Contextual Query Language), a standard syntax for representing queries. Every web service that implements SRU should provide an Explain record at its base URL, which describes the facilities available at that SRU server.
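
To make the protocol concrete, the following sketch assembles a plain SRU searchRetrieve URL with the Python standard library. The helper build_search_url is hypothetical and not part of Polymatheia; the parameter names (operation, version, query, maximumRecords, recordSchema) come from the SRU specification, and version "1.1" is an assumption that should be checked against the server's Explain record.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, max_records=None, record_schema=None,
                     version="1.1"):
    """Assemble an SRU searchRetrieve URL (hypothetical helper, not Polymatheia API)."""
    params = {
        "operation": "searchRetrieve",  # SRU operation name
        "version": version,             # assumed protocol version
        "query": query,                 # the CQL query string
    }
    # Optional SRU parameters are only sent when explicitly requested.
    if max_records is not None:
        params["maximumRecords"] = max_records
    if record_schema is not None:
        params["recordSchema"] = record_schema
    # urlencode percent-encodes values and writes spaces as "+".
    return f"{base_url}?{urlencode(params)}"


url = build_search_url("http://sru.k10plus.de/gvk", "dog cat mouse",
                       max_records=10, record_schema="mods")
print(url)
```

The readers described on this page take care of building such requests and parsing the XML responses, so this is only meant to show what travels over the wire.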

Getting the Explain record

Use the SRUExplainRecordReader to fetch the Explain record. The reader also gives direct access to the record schemas that can be used with the SRU web service (schemas) and to the echoed request (echo), i.e., the request parameters that the server echoes back to the client.

from polymatheia.data.reader import SRUExplainRecordReader

reader = SRUExplainRecordReader("http://sru.k10plus.de/gvk")
for record in reader:
    print(record)

print(reader.schemas)
print(reader.echo)
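
For orientation, the sketch below shows where schema information sits in a raw Explain response. SAMPLE_EXPLAIN is a hand-written, heavily trimmed example rather than output from a real server, and schema_names is a hypothetical helper. The shape of the value that reader.schemas returns is not shown on this page and may differ from the plain list produced here.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed Explain response (illustrative only).
SAMPLE_EXPLAIN = """<zs:explainResponse xmlns:zs="http://www.loc.gov/zing/srw/">
  <zs:version>1.1</zs:version>
  <zs:record>
    <zs:recordSchema>http://explain.z3950.org/dtd/2.0/</zs:recordSchema>
    <zs:recordPacking>xml</zs:recordPacking>
    <zs:recordData>
      <explain xmlns="http://explain.z3950.org/dtd/2.0/">
        <schemaInfo>
          <schema identifier="info:srw/schema/1/dc-v1.1" name="dc">
            <title>Dublin Core</title>
          </schema>
          <schema identifier="info:srw/schema/1/mods-v3.0" name="mods">
            <title>MODS</title>
          </schema>
        </schemaInfo>
      </explain>
    </zs:recordData>
  </zs:record>
</zs:explainResponse>"""

# Namespace of the ZeeRex explain record embedded in the SRU response.
ZEEREX = "{http://explain.z3950.org/dtd/2.0/}"


def schema_names(explain_xml):
    """Return the short names of all record schemas listed in an Explain response."""
    root = ET.fromstring(explain_xml)
    return [schema.get("name") for schema in root.iter(f"{ZEEREX}schema")]


print(schema_names(SAMPLE_EXPLAIN))
```

The short names found this way are the kind of value a server accepts as a record schema in a search request.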

Fetching records

Use the class SRURecordReader to query an SRU server:

from polymatheia.data.reader import SRURecordReader

reader = SRURecordReader("http://sru.k10plus.de/gvk",
                         query="dog cat mouse")
for record in reader:
    print(record)

Note

This will fetch ALL records that match the query. Consider limiting the size of the result set using the max_records parameter as described below.
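
To get a feel for what fetching everything can cost, here is a generic sketch of how an SRU client might page through a large result set using the protocol's startRecord (1-based) and maximumRecords parameters. The helper page_windows and the page size are assumptions for illustration; how Polymatheia pages internally is not described here.

```python
def page_windows(total, page_size):
    """Yield (startRecord, maximumRecords) pairs that cover `total` records."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    start = 1  # SRU record positions are 1-based
    while start <= total:
        # The last page may be shorter than page_size.
        yield start, min(page_size, total - start + 1)
        start += page_size


# 25 matching records fetched 10 at a time need three requests.
print(list(page_windows(25, 10)))
```

A query with tens of thousands of matches would need correspondingly many requests, which is why capping the result set is usually worthwhile.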

Limiting the number of records

Pass the max_records parameter to cap the number of records returned:

from polymatheia.data.reader import SRURecordReader

reader = SRURecordReader("http://sru.k10plus.de/gvk",
                         query="dog cat mouse",
                         max_records=10)
for record in reader:
    print(record)

Note

This will retrieve at most max_records records: exactly max_records if the query matches at least that many, otherwise fewer. It is a good idea to check the total number of records a query matches beforehand (see below).

Getting the total number of records for a query

The result_count method of SRURecordReader, called on the class itself, returns the number of records that match the given query. Checking this value in advance lets you set max_records appropriately.

from polymatheia.data.reader import SRURecordReader

result_count = SRURecordReader.result_count("http://sru.k10plus.de/gvk",
                                            query="dog cat mouse")
print(result_count)
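
One way to act on the count is to cap it before handing it to the reader. The helper pick_max_records and its default ceiling of 100 are hypothetical and only illustrate the idea; the commented lines reuse the two Polymatheia calls shown on this page.

```python
def pick_max_records(total_matches, ceiling=100):
    """Request no more than `ceiling` records, and no more than actually matched."""
    if total_matches < 0 or ceiling < 0:
        raise ValueError("counts must be non-negative")
    return min(total_matches, ceiling)


print(pick_max_records(2500))  # large result set: capped at the ceiling
print(pick_max_records(42))    # small result set: take everything

# Combined with the calls from this page (not executed here):
#   total = SRURecordReader.result_count("http://sru.k10plus.de/gvk",
#                                        query="dog cat mouse")
#   reader = SRURecordReader("http://sru.k10plus.de/gvk",
#                            query="dog cat mouse",
#                            max_records=pick_max_records(total))
```

If the helper returns 0, the query matched nothing, so creating a reader can be skipped altogether.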

Selecting a record schema

Pass the record_schema parameter to the SRURecordReader to choose the metadata format in which all records are returned:

from polymatheia.data.reader import SRURecordReader

reader = SRURecordReader("http://sru.k10plus.de/gvk",
                         query="dog cat mouse",
                         max_records=10,
                         record_schema="mods")
for record in reader:
    print(record)

Note

See the Explain record of the web service you are querying for the record schemas it supports. Consult the SRU specification for details on the other available SRU parameters.

Getting the echoed request

The echo attribute of SRURecordReader holds the request parameters that the server echoed back to the client. It becomes available once iteration has started:

from polymatheia.data.reader import SRURecordReader

reader = SRURecordReader("http://sru.k10plus.de/gvk",
                         query="dog cat mouse",
                         max_records=10)
for record in reader:
    print(reader.echo)
    break