Loading Data via SRU
Polymatheia supports accessing metadata records via the SRU protocol. SRU (Search/Retrieve via URL) is a standard XML-based protocol for search queries, utilizing CQL (Contextual Query Language), a standard syntax for representing queries. Each web service that implements the SRU protocol should provide an Explain record at its base URL that allows a client to retrieve a description of the facilities available at this SRU server.
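To make the protocol concrete, the following sketch builds the two kinds of request URL by hand using only the standard library. The parameter names (operation, version, query, maximumRecords, recordSchema) come from SRU 1.1/1.2. The helper name sru_url and the version value "1.1" are assumptions chosen for illustration, and this is not necessarily how Polymatheia constructs its requests internally.

```python
from urllib.parse import urlencode

BASE_URL = "http://sru.k10plus.de/gvk"


def sru_url(base_url, **params):
    """Append SRU parameters to a base URL (hypothetical helper)."""
    return f"{base_url}?{urlencode(params)}"


# An explain request asks the server to describe itself.
explain_url = sru_url(BASE_URL, operation="explain", version="1.1")

# A searchRetrieve request carries a CQL query plus optional limits.
search_url = sru_url(
    BASE_URL,
    operation="searchRetrieve",
    version="1.1",  # assumed version; check what the server supports
    query="dog cat mouse",
    maximumRecords=10,
    recordSchema="mods",
)

print(explain_url)
print(search_url)
```

The readers described below take care of these details, so a URL like this is mainly useful for inspecting a server's raw XML response in a browser.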
Getting the Explain record
Use the SRUExplainRecordReader to fetch the Explain record. Polymatheia provides direct access to the record schemas that can be used with the SRU web service as well as to the echoed request (i.e., the request parameters echoed back to the client).
from polymatheia.data.reader import SRUExplainRecordReader

reader = SRUExplainRecordReader("http://sru.k10plus.de/gvk")
for record in reader:
    print(record)
# Record schemas that the SRU web service supports
print(reader.schemas)
# Request parameters echoed back by the server
print(reader.echo)
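For orientation, the sketch below shows where schema names typically live in a raw Explain response, which uses the ZeeRex format. The XML is a hand-written, heavily shortened sample rather than real output from the server above, and the parsing is only an illustration of the record's structure, not Polymatheia's implementation of reader.schemas.

```python
import xml.etree.ElementTree as ET

# Hand-written, shortened sample; real Explain records are much larger.
SAMPLE_EXPLAIN = """\
<explainResponse xmlns="http://www.loc.gov/zing/srw/">
  <version>1.1</version>
  <record>
    <recordSchema>http://explain.z3950.org/dtd/2.0/</recordSchema>
    <recordPacking>xml</recordPacking>
    <recordData>
      <zr:explain xmlns:zr="http://explain.z3950.org/dtd/2.0/">
        <zr:schemaInfo>
          <zr:schema identifier="info:srw/schema/1/dc-v1.1" name="dc"/>
          <zr:schema identifier="info:srw/schema/1/mods-v3.0" name="mods"/>
        </zr:schemaInfo>
      </zr:explain>
    </recordData>
  </record>
</explainResponse>
"""

# Prefix-to-URI mapping used when searching for ZeeRex elements.
NAMESPACES = {"zr": "http://explain.z3950.org/dtd/2.0/"}

root = ET.fromstring(SAMPLE_EXPLAIN)
# Collect the short name of every schema listed under schemaInfo.
schema_names = [
    schema.get("name")
    for schema in root.findall(".//zr:schemaInfo/zr:schema", NAMESPACES)
]
print(schema_names)
```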
Fetching records
Use the SRURecordReader class to query an SRU server:
from polymatheia.data.reader import SRURecordReader

reader = SRURecordReader("http://sru.k10plus.de/gvk",
                         query="dog cat mouse")
for record in reader:
    print(record)
Note
This will fetch ALL records that match the query. Consider limiting the size of the result set using the max_records parameter as described below.
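Since the query string is CQL, it can help to assemble it programmatically when the search terms come from user input. The sketch below is a minimal illustration: cql_term and cql_all are hypothetical helpers, not part of Polymatheia, and an index name such as dc.title is only an example, because the available indexes depend on the server (its Explain record lists them).

```python
def cql_term(term):
    """Quote a search term so that spaces and double quotes are safe in CQL."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def cql_all(terms, index=None):
    """Require every term by joining clauses with the boolean operator "and"."""
    clauses = []
    for term in terms:
        quoted = cql_term(term)
        # With an index, build "index = term"; otherwise use the bare term.
        clauses.append(f"{index} = {quoted}" if index else quoted)
    return " and ".join(clauses)


query = cql_all(["dog", "cat", "mouse"])
title_query = cql_all(["guinea pig", "cat"], index="dc.title")

print(query)        # "dog" and "cat" and "mouse"
print(title_query)  # dc.title = "guinea pig" and dc.title = "cat"
```

The resulting string can be passed as the query argument of SRURecordReader. How a server treats an unquoted multi-word query such as the one in the example above is server-specific, so spelling out the boolean operators removes that ambiguity.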
Limiting the number of records
Pass the max_records parameter to specify the maximum number of records to return:
from polymatheia.data.reader import SRURecordReader

reader = SRURecordReader("http://sru.k10plus.de/gvk",
                         query="dog cat mouse",
                         max_records=10)
for record in reader:
    print(record)
Note
This retrieves at most max_records records; fewer are returned when fewer records match the query. It is a good idea to check the total number of records a query matches beforehand (see below).
Getting the total number of records for a query
The result_count function of SRURecordReader returns the number of records that match the given query. Checking this value in advance helps you choose a suitable value for max_records.
from polymatheia.data.reader import SRURecordReader

result_count = SRURecordReader.result_count("http://sru.k10plus.de/gvk",
                                            query="dog cat mouse")
print(result_count)
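Once the total is known, it is easy to estimate how many requests a full download would take. SRU pages through a result set with the startRecord (1-based) and maximumRecords parameters. The sketch below computes the pairs a client would need; page_starts is a hypothetical helper and the page size of 10 is an arbitrary assumption, so this does not describe how Polymatheia pages internally.

```python
def page_starts(total, page_size, max_records=None):
    """Yield (startRecord, maximumRecords) pairs covering the wanted records."""
    # Never ask for more records than the query actually matches.
    wanted = total if max_records is None else min(total, max_records)
    start = 1  # SRU record positions are 1-based
    while start <= wanted:
        yield start, min(page_size, wanted - start + 1)
        start += page_size


all_pages = list(page_starts(total=25, page_size=10))
capped_pages = list(page_starts(total=25, page_size=10, max_records=12))

print(all_pages)     # [(1, 10), (11, 10), (21, 5)]
print(capped_pages)  # [(1, 10), (11, 2)]
```

In practice, total would be the value returned by result_count, and a large number of pages is a good hint that max_records should be set.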
Selecting a record schema
Pass the record_schema parameter, i.e. a metadata format, to SRURecordReader to retrieve all records in that format:
from polymatheia.data.reader import SRURecordReader

reader = SRURecordReader("http://sru.k10plus.de/gvk",
                         query="dog cat mouse",
                         max_records=10,
                         record_schema="mods")
for record in reader:
    print(record)
Note
See the SRU Explain record of the respective web service for all supported record schemas. Also consult the SRU specification for details about the other available SRU parameters.
Getting the echoed request
The echo attribute of SRURecordReader holds the request parameters that the server echoed back to the client. It is available once iteration has started:
from polymatheia.data.reader import SRURecordReader

reader = SRURecordReader("http://sru.k10plus.de/gvk",
                         query="dog cat mouse",
                         max_records=10)
for record in reader:
    # The echo is only populated after the first response has arrived
    print(reader.echo)
    break
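In the raw protocol, the echoed request is an echoedSearchRetrieveRequest element inside the server's response. The sketch below extracts it from a hand-written, shortened sample response to show what kind of information the echo carries. The exact structure that Polymatheia exposes through reader.echo may differ, so treat the dictionary built here as an illustration only.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; a real response also contains the records themselves.
SAMPLE_RESPONSE = """\
<searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/">
  <version>1.1</version>
  <numberOfRecords>1234</numberOfRecords>
  <echoedSearchRetrieveRequest>
    <version>1.1</version>
    <query>dog cat mouse</query>
    <maximumRecords>10</maximumRecords>
    <recordPacking>xml</recordPacking>
  </echoedSearchRetrieveRequest>
</searchRetrieveResponse>
"""

# ElementTree writes namespaced tags as "{namespace-uri}localname".
SRW = "{http://www.loc.gov/zing/srw/}"

root = ET.fromstring(SAMPLE_RESPONSE)
echoed = root.find(f"{SRW}echoedSearchRetrieveRequest")

# Strip the namespace part of each child tag to get plain parameter names.
echo = {child.tag.split("}", 1)[-1]: child.text for child in echoed}
print(echo)
```

Comparing the echoed values with the parameters that were sent is a quick way to see whether the server applied defaults or ignored a parameter.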