Files/EThOS webservice download tool

From EPrints Documentation
Revision as of 13:59, 28 June 2012 by Libjlrs@leeds.ac.uk

The British Library EThOS service has a SOAP-based webservice that allows institutions to pull digitized theses back from the EThOS collection. The code uses:

  • SOAP::Lite (to make the calls to the webservice)
  • LWP (to download the Zip files)
  • Archive::Zip (to extract the files)
  • Time::Local (to generate timestamps for the webservice)

The code consists of two parts:

  • a bin script (goes in ~/bin/)
  • a config file (goes in ~/archives/ARCHIVEID/cfg/cfg.d/)

Run ~/bin/ethos to see the usage information.

The following rules are used:

1) If an incoming thesis contains an institutionalReference (and has therefore been harvested):

  • A search is done on the reference
    • If the eprint doesn't exist, no import is done.
    • If the eprint exists and doesn't have an id_number, the id_number is added.
    • If the eprint exists and the id_number matches the ethosId, no import is done.
    • If the eprint exists and the id_number doesn't match the ethosId, an error is reported.

2) If the incoming thesis does not have an institutionalReference, a search for the ethosId in the id_number field is conducted.

  • If an eprint exists with that ethosId, no import is done.
  • If the ethosId is not found, a new eprint is created. Files are downloaded, and added to the eprint. Any issues with the download are recorded in the 'suggestions' field.
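
The two rules above can be read as a single decision function. The sketch below is only an illustration of that logic and is not the script's own code (the tool itself is written in Perl). The function name, the action labels and the two lookup callbacks are hypothetical, and the thesis is assumed to arrive as a dictionary keyed by ethosId and institutionalReference.

```python
from typing import Callable, Optional


def decide_action(
    thesis: dict,
    find_by_reference: Callable[[str], Optional[dict]],
    find_by_ethos_id: Callable[[str], Optional[dict]],
) -> str:
    """Return the action the import rules prescribe for one incoming thesis.

    The returned labels ("skip", "add_id_number", "error", "create") are
    hypothetical names chosen for this sketch.
    """
    ethos_id = thesis["ethosId"]
    reference = thesis.get("institutionalReference")

    if reference:
        # Rule 1: the thesis was harvested, so search on the reference.
        eprint = find_by_reference(reference)
        if eprint is None:
            return "skip"            # no such eprint: no import
        if not eprint.get("id_number"):
            return "add_id_number"   # eprint exists without an id_number
        if eprint["id_number"] == ethos_id:
            return "skip"            # already linked to this ethosId
        return "error"               # linked to a different ethosId

    # Rule 2: no reference, so search for the ethosId in id_number.
    if find_by_ethos_id(ethos_id) is not None:
        return "skip"                # already imported
    return "create"                  # new eprint, then download the files


# Usage with in-memory lookups standing in for the repository searches.
# The identifiers are placeholders, not real records.
by_reference = {"ref-1": {"id_number": None}}
by_ethos_id = {}
action = decide_action(
    {"ethosId": "uk.bl.ethos.000000", "institutionalReference": "ref-1"},
    by_reference.get,
    by_ethos_id.get,
)
```

Keeping the lookups as callbacks separates the rules from the repository search, so each branch can be exercised without a running archive.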

The following fields are set in the EPrint:

  • userid
  • title
  • date
  • abstract
  • id_number
  • keywords
  • creators_name
  • thesis_type
  • suggestions
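
As a rough picture of the record that results, the sketch below maps one thesis onto the nine fields listed above. It is an illustration under stated assumptions rather than the script's code: the function name and the input keys other than ethosId (title, date, abstract, keywords, authors, thesis_type) are hypothetical, as are the family/given shape used for creators_name and the choice to record one download issue per line in suggestions.

```python
from typing import Any, Dict, List


def build_eprint(
    thesis: Dict[str, Any], userid: int, download_issues: List[str]
) -> Dict[str, Any]:
    """Map one thesis record onto the eprint fields the import sets."""
    return {
        "userid": userid,  # the user that will own the new eprint
        "title": thesis["title"],
        "date": thesis["date"],
        "abstract": thesis.get("abstract"),
        "id_number": thesis["ethosId"],  # the field later searches match on
        "keywords": thesis.get("keywords"),
        # Assumed shape: one family/given pair per author.
        "creators_name": [
            {"family": family, "given": given}
            for family, given in thesis["authors"]
        ],
        "thesis_type": thesis.get("thesis_type"),
        # Download problems are recorded here; None when there were none.
        "suggestions": "\n".join(download_issues) if download_issues else None,
    }


# Usage with placeholder values only.
record = build_eprint(
    {
        "ethosId": "uk.bl.ethos.000000",
        "title": "Example thesis title",
        "date": "2012",
        "authors": [("Family", "Given")],
    },
    userid=1,
    download_issues=["Could not download file 2 of 3"],
)
```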