OR2012 EThOS presentation

From EPrints Documentation
Revision as of 08:02, 10 July 2012

The following notes form the basis of the presentation at OR2012 about experiences of the EThOS webservice, developing a connector for it and how to use it.

About the webservice

The webservice uses a SOAP/WSDL architecture. The WSDL is a machine-readable file that describes how to interrogate the EThOS service. The XSD describes which elements you need to send to the service, and what data you'll get back.

WSDL: http://ethosdownload.bl.uk/EthosDownload/EthosDownloadService?wsdl

XSD: http://ethosdownload.bl.uk:80/EthosDownload/EthosDownloadService?xsd=1

For EPrints, you can use the Perl SOAP::Lite module to format data and send it to the webservice.
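To give a feel for what "formatting data" for a SOAP service involves, here is a minimal sketch that builds a SOAP 1.1 envelope by hand. It is written in Python purely for illustration (the real connector uses Perl and SOAP::Lite). The operation name `getRecords`, the parameter `institution` and the namespace `urn:example:ethos-download` are placeholders, not taken from the EThOS WSDL; the real names must be read from the WSDL and XSD listed above. Whether the service expects SOAP 1.1 is also an assumption.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Placeholder only: the real service namespace is defined in the WSDL.
SERVICE_NS = "urn:example:ethos-download"


def build_envelope(operation, params):
    """Return a SOAP envelope (as a string) calling `operation` with `params`."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The operation element sits inside the Body, in the service namespace.
    op = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        child = ET.SubElement(op, name)
        child.text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical operation and parameter names, for illustration only.
request_xml = build_envelope("getRecords", {"institution": "Example University"})
```

A SOAP library such as SOAP::Lite does this wrapping for you, and also handles sending the request and unpacking the response.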

About the code

The code available at http://files.eprints.org/778/ allows you to query the webservice and download theses for your institution.

The code consists of two parts:

  1. a bin script that runs from the command line.
  2. a config file. The format may seem a little convoluted: it was written to cope with the 3-institution consortium model of the White Rose Etheses Online archive.

The configuration allows you to specify defaults for EThOS imports on top of the normal record defaults. It also allows you to map and process some data elements so that they fit the EPrints data model, e.g. thesis_type (a controlled list in EPrints, a text element in EThOS).
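The idea of layering import defaults and mapping a free-text value onto a controlled list can be sketched as follows. This is an illustration in Python, not the actual Perl config file. The mapping table, the default values and the function names are all hypothetical, and the controlled-list values (`phd`, `mphil`, `other`) are assumptions that would need to match your own repository's thesis_type options.

```python
# Hypothetical mapping from free-text EThOS values to a controlled list.
THESIS_TYPE_MAP = {
    "phd": "phd",
    "ph.d.": "phd",
    "doctoral": "phd",
    "mphil": "mphil",
    "m.phil.": "mphil",
}

# Hypothetical defaults applied to every imported record.
IMPORT_DEFAULTS = {"type": "thesis", "thesis_type": "other"}


def normalise_thesis_type(raw):
    """Map a free-text thesis type to a controlled value, or the default."""
    key = (raw or "").strip().lower()
    return THESIS_TYPE_MAP.get(key, IMPORT_DEFAULTS["thesis_type"])


def apply_defaults(record):
    """Start from the import defaults, then overlay non-empty record values."""
    merged = dict(IMPORT_DEFAULTS)
    merged.update({k: v for k, v in record.items() if v not in (None, "")})
    merged["thesis_type"] = normalise_thesis_type(record.get("thesis_type"))
    return merged


imported = apply_defaults({"title": "An example thesis", "thesis_type": "Ph.D."})
```

Unrecognised values fall back to the default rather than failing, so an editor can correct them later in the workflow.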

The code is designed to be run periodically (e.g. monthly) and will attempt to download theses where possible/necessary.

A word on Identifiers

Data harvested from your repository will include an institutionReference element, taken from the OAI-PMH interface. EThOS records are issued with an EThOSid. This isn't directly stated in the webservice, but it is the eprintId prefixed with 'uk.bl.ethos.'. By storing this identifier in the EPrints record, future runs of the download tool can see that the record has already been ingested. It should also help the EThOS service match records.
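The identifier check described above can be sketched in a few lines. This is an illustrative Python version, not the code from the bin script. The function names and the idea of holding stored identifiers in a set are assumptions; only the 'uk.bl.ethos.' prefix comes from the notes above.

```python
ETHOS_PREFIX = "uk.bl.ethos."


def ethos_id(eprint_id):
    """Build the EThOSid from the eprintId returned by the webservice."""
    return f"{ETHOS_PREFIX}{eprint_id}"


def needs_download(eprint_id, stored_ids):
    """True if no local record already carries this EThOSid."""
    return ethos_id(eprint_id) not in stored_ids


# Identifiers already stored in local records (example values only).
already_ingested = {"uk.bl.ethos.100001", "uk.bl.ethos.100002"}
to_fetch = [i for i in (100001, 100002, 100003) if needs_download(i, already_ingested)]
```

Here only the third record would be fetched, because the first two identifiers are already stored locally.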

Downloading in anger

For the White Rose Etheses Online repository, we created a user, a user type and a custom workflow for importing the EThOS records.

This allows us to deal with the imported records as a separate task from incoming student-submitted etheses. The workflow for our editors is minimal. There is one screen for metadata, and one screen for the files.

The metadata screen was specified by the team of people who would be working with it. Metadata elements that are required are at the top, those that would probably not get completed are collapsed at the bottom of the screen.

Issues with the plugin

The plugin code was originally written to download the first batch of theses being digitised by the BL. It was designed to get the record back to the local repository quickly, so it doesn't use the full UKETD_DC record available through the webservice. An updated script will be published that makes full use of the UKETD_DC record.

Issues with the Webservice

  • If you try to download a lot of records, the connection to the webservice may time out.
  • Author names are messy and don't match those available via the EThOS website. This is being investigated by the EThOS crew.
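One possible way to work around the time-out issue is to request records in small batches and retry a batch when the connection times out. The sketch below is in Python for illustration and is not part of the published script. `fetch_batch` stands in for whatever function actually calls the webservice, and the batch size, retry count and delay are arbitrary assumptions rather than known limits of the service.

```python
import time


def batches(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def download_all(ids, fetch_batch, batch_size=25, retries=3, delay=1.0,
                 sleep=time.sleep):
    """Fetch records batch by batch, retrying a batch if it times out.

    `fetch_batch` is a hypothetical callable that takes a list of ids,
    returns a list of records, and raises TimeoutError on a time-out.
    """
    results = []
    for batch in batches(list(ids), batch_size):
        for attempt in range(1, retries + 1):
            try:
                results.extend(fetch_batch(batch))
                break
            except TimeoutError:
                if attempt == retries:
                    raise  # give up after the final attempt
                sleep(delay * attempt)  # wait a little longer each time
    return results
```

Because records that have already been ingested can be skipped using the stored EThOSid, a run that fails part-way through can simply be restarted.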