OR2012 EThOS presentation
The following notes form the basis of the OR2012 presentation about our experiences with the EThOS webservice: developing a connector for it, and using it.
About the webservice
The webservice uses a SOAP/WSDL architecture. The WSDL is a machine-readable file that describes how to interrogate the EThOS service. The XSD describes which elements you need to send to the service, and what data you'll get back.
For EPrints, you can use the SOAP::Lite module to format data and send it to the webservice.
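EPrints is written in Perl, and SOAP::Lite builds the requests for you. To show what "formatting data" means on the wire, here is a minimal Python sketch that wraps one call in a SOAP 1.1 envelope using only the standard library. The service namespace, the operation name `searchTheses` and the parameter names are hypothetical placeholders. The real ones are defined in the EThOS WSDL and XSD.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"

# Hypothetical service namespace: the real value comes from the EThOS WSDL/XSD.
SERVICE_NS = "urn:example:ethos"

# Give the namespaces readable prefixes in the serialised XML.
ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("svc", SERVICE_NS)


def build_soap_request(operation: str, params: dict) -> bytes:
    """Wrap one operation call and its parameters in a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        # Each parameter becomes a child element of the operation element,
        # in the shape an XSD would describe.
        child = ET.SubElement(call, f"{{{SERVICE_NS}}}{name}")
        child.text = str(value)
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # 'searchTheses' and 'institution' are illustrative names only.
    request = build_soap_request("searchTheses", {"institution": "Example University"})
    print(request.decode("utf-8"))
```

Under SOAP 1.1 over HTTP, the envelope is then sent as a POST with a `text/xml` content type and a `SOAPAction` header. SOAP::Lite handles that transport step.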
About the code
The code available at [http://files.eprints.org/778/] allows you to query the webservice and download theses for your institution.
The code consists of two parts:
- a bin script that runs from the command line.
- a config file (the format may seem a little convoluted; it was written to cope with the three-institution consortium model of the White Rose Etheses Online archive).
The configuration allows you to specify defaults for EThOS imports on top of the normal record defaults. It also allows you to map and process some data elements so that they fit the EPrints data model, e.g. thesis_type (a controlled list in EPrints, a text element in EThOS).
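The configuration does two jobs: it layers EThOS-specific defaults over the normal record defaults, and it maps EThOS text values onto EPrints controlled lists. The minimal Python sketch below illustrates both. It is not the connector's actual config format. The mapping keys, the `other` fallback and the dictionary-based record are all assumptions for illustration.

```python
import re
from typing import Dict, Optional

# Hypothetical mapping from normalised EThOS qualification text to values in a
# local EPrints thesis_type controlled list; adjust to your own list.
THESIS_TYPE_MAP: Dict[str, str] = {
    "phd": "phd",
    "doctorofphilosophy": "phd",
    "mphil": "mphil",
    "masterofphilosophy": "mphil",
    "engd": "engd",
    "md": "md",
}
DEFAULT_THESIS_TYPE = "other"  # hypothetical fallback value


def normalise(text: str) -> str:
    """Lower-case and keep only letters and digits, so 'Ph.D.' and 'PhD' match."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def map_thesis_type(ethos_value: Optional[str]) -> str:
    """Map EThOS text onto the controlled list, falling back to a default."""
    if not ethos_value:
        return DEFAULT_THESIS_TYPE
    return THESIS_TYPE_MAP.get(normalise(ethos_value), DEFAULT_THESIS_TYPE)


def build_record(ethos_fields: Dict[str, str],
                 record_defaults: Dict[str, str],
                 ethos_defaults: Dict[str, str]) -> Dict[str, str]:
    """Layer normal defaults, then EThOS-specific defaults, then harvested data."""
    record = {**record_defaults, **ethos_defaults, **ethos_fields}
    # Replace the raw EThOS text with a value from the controlled list.
    record["thesis_type"] = map_thesis_type(ethos_fields.get("thesis_type"))
    return record
```

Normalising before the lookup means one map entry covers variants such as "Ph.D." and "PhD". Unmatched values fall back to a default that an editor can correct later.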
The code is designed to be run periodically (e.g. monthly) and will attempt to download theses where possible and necessary.
A word on identifiers
Data harvested from your repository via its OAI-PMH interface will include an institutionReference element.
EThOS records are issued with an EThOSid. This isn't stated directly in the webservice, but it is the eprintId prefixed with 'uk.bl.ethos.'. By storing this identifier in the EPrints record, future runs of the download tool can see that the record has already been ingested. It should also help the EThOS service to match records.
Downloading in anger
For the White Rose Etheses Online repository, we created a user, a user type and a custom workflow for importing the EThOS records.
This allows us to deal with the imported records as a separate task from incoming student-submitted etheses. The workflow for our editors is minimal: there is one screen for metadata, and one screen for the files.
The metadata screen was specified by the team of people who would be working with it. Required metadata elements are at the top; those that would probably not be completed are collapsed at the bottom of the screen.