Files/EThOS webservice download tool

From EPrints Documentation
Revision as of 14:24, 28 June 2012 by Libjlrs@leeds.ac.uk (Added EThOS identifiers note)

The British Library's EThOS service provides a SOAP-based web service that allows institutions to pull digitised theses back from the EThOS collection. The tool uses the following Perl modules:

  • SOAP::Lite (to make the calls to the webservice)
  • LWP (to download the Zip files)
  • Archive::Zip (to extract the files)
  • Time::Local (to generate timestamps for the webservice)
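The download-and-extract step (LWP plus Archive::Zip in the Perl script) can be illustrated with a minimal Python sketch that uses only the standard library. This is not the script's own code: it assumes the ZIP file has already been fetched into memory, and the function name `extract_zip_bytes` is hypothetical.

```python
import io
import tempfile
import zipfile
from pathlib import Path
from typing import List


def extract_zip_bytes(data: bytes, dest: str) -> List[str]:
    """Extract an in-memory ZIP (e.g. one downloaded from EThOS) into dest.

    Hypothetical helper for illustration; returns the paths of the
    extracted files, skipping directory entries.
    """
    out_dir = Path(dest)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted: List[str] = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # ZipFile.extract strips absolute paths and ".." components,
            # and creates any intermediate directories it needs.
            extracted.append(zf.extract(info, str(out_dir)))
    return extracted


if __name__ == "__main__":
    # Build a tiny ZIP in memory and extract it to a temporary directory.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("thesis.pdf", b"%PDF-1.4")
    with tempfile.TemporaryDirectory() as tmp:
        print(extract_zip_bytes(buf.getvalue(), tmp))
```

Working on bytes in memory keeps the extraction logic separate from the HTTP download, which makes it easy to test without network access.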

The code consists of two parts:

  • a bin script (goes in ~/bin/)
  • a config file (goes in ~/archives/ARCHIVEID/cfg/cfg.d/)

Run ~/bin/ethos to see usage information.

The following rules are applied to each incoming thesis:

1) If the incoming thesis contains an institutionalReference (and has therefore been harvested):

  • A search is done for an eprint matching that reference.
    • If no such eprint exists, no import is done.
    • If the eprint exists and has no id_number, the ethosId is added as its id_number.
    • If the eprint exists and its id_number matches the ethosId, no import is done.
    • If the eprint exists and its id_number does not match the ethosId, an error is reported.
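The decision table for rule 1 can be sketched in Python as follows. This is an illustration rather than the script's implementation: the eprint is modelled as a plain dict (or None when the search on the reference finds nothing), and the function name and return labels are hypothetical.

```python
from typing import Optional


def harvested_thesis_action(eprint: Optional[dict], ethos_id: str) -> str:
    """Rule 1: the incoming thesis carries an institutionalReference.

    `eprint` is the record found by searching on that reference, or None.
    Return labels ("skip", "add_id_number", "error") are hypothetical.
    """
    if eprint is None:
        # No eprint matches the reference: nothing is imported.
        return "skip"
    id_number = eprint.get("id_number")
    if not id_number:
        # Matching eprint without an id_number: record the EThOS ID on it.
        eprint["id_number"] = ethos_id
        return "add_id_number"
    if id_number == ethos_id:
        # Already linked to this EThOS record: nothing to import.
        return "skip"
    # Linked to a different EThOS ID: report the conflict.
    return "error"
```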

2) If the incoming thesis does not have an institutionalReference, the id_number field is searched for the ethosId.

  • If an eprint exists with that ethosId, no import is done.
  • If the ethosId is not found, a new eprint is created. The files are downloaded and added to the eprint. Any problems with the download are recorded in the 'suggestions' field.
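Rule 2 can be modelled the same way. Again this is a hypothetical Python sketch, not the script's code: existing id_number values are passed in as a list, and each download result (keyed by a hypothetical file URL) is either the file's bytes or None on failure. Failures are collected into the 'suggestions' text rather than aborting the import.

```python
from typing import Dict, Iterable, List, Optional


def import_unharvested(ethos_id: str,
                       existing_id_numbers: Iterable[str],
                       downloads: Dict[str, Optional[bytes]]) -> Optional[dict]:
    """Rule 2: the incoming thesis has no institutionalReference.

    Returns a new (hypothetical) eprint dict, or None when no import is done.
    `downloads` maps a file URL to its bytes, or to None if the download failed.
    """
    if ethos_id in set(existing_id_numbers):
        # An eprint already holds this ethosId in id_number: skip it.
        return None
    eprint = {"id_number": ethos_id, "documents": [], "suggestions": ""}
    problems: List[str] = []
    for url, content in downloads.items():
        if content is None:
            problems.append(f"Download failed: {url}")
        else:
            eprint["documents"].append(url)
    # Download issues are recorded in 'suggestions' instead of raising.
    eprint["suggestions"] = "\n".join(problems)
    return eprint
```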

The following fields are set in the EPrint:

  • userid
  • title
  • date
  • abstract
  • id_number
  • keywords
  • creators_name
  • thesis_type
  • suggestions
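A hypothetical sketch of how an incoming record might be mapped onto these fields. Only ethosId comes from this page; the other input keys (title, date, abstract, keywords, author_family, author_given, thesis_type) are assumptions about the webservice response, userid is simply passed in, and creators_name uses the family/given structure of EPrints name fields.

```python
from typing import Any, Dict


def build_eprint_fields(record: Dict[str, Any], userid: int,
                        suggestions: str = "") -> Dict[str, Any]:
    """Map a (hypothetical) incoming thesis record to the EPrint fields set.

    All keys of `record` except "ethosId" are assumed names for illustration.
    """
    return {
        "userid": userid,
        "title": record["title"],
        "date": record["date"],
        "abstract": record.get("abstract", ""),
        "id_number": record["ethosId"],
        # Keywords joined into a single string (assumed storage format).
        "keywords": ", ".join(record.get("keywords", [])),
        "creators_name": [{"family": record["author_family"],
                           "given": record["author_given"]}],
        "thesis_type": record["thesis_type"],
        "suggestions": suggestions,
    }
```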

The scripts were written for the White Rose Etheses Online setup, so the config file may seem a bit convoluted in its structure: it was designed to cope with three institutions.


EThOS Identifiers

EThOS identifiers have the form uk.bl.ethos.xxxxx. If your system stores EThOS identifiers in a different format, you will need to adapt the code to handle it. If you have any questions or problems with the script, email the tech-list!
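As a starting point for such a change, here is a minimal Python sketch that normalises a stored value to the uk.bl.ethos.xxxxx form. It assumes the xxxxx part is numeric and that the only alternative local format is a bare number; both are assumptions to adjust for your own data.

```python
import re
from typing import Optional

ETHOS_PREFIX = "uk.bl.ethos."


def to_ethos_id(stored: str) -> Optional[str]:
    """Normalise a locally stored identifier to uk.bl.ethos.xxxxx.

    Assumes xxxxx is numeric and that local values are either already in
    canonical form (any case) or a bare number; returns None otherwise.
    """
    value = stored.strip()
    if value.lower().startswith(ETHOS_PREFIX):
        value = value[len(ETHOS_PREFIX):]
    if re.fullmatch(r"\d+", value):
        return ETHOS_PREFIX + value
    return None
```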