Files/EThOS webservice download tool

From EPrints Documentation

Revision as of 14:18, 28 June 2012

The British Library EThOS service has a SOAP-based webservice that allows institutions to pull digitized theses back from its collection. The code uses:

  • SOAP::Lite (to make the calls to the webservice)
  • LWP (to download the Zip files)
  • Archive::Zip (to extract the files)
  • Time::Local (to generate timestamps for the webservice)
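
As an illustration of the timestamp step, here is a minimal sketch using Time::Local. The helper name utc_timestamp is hypothetical, and the ISO 8601 output is an assumption: this page does not say which timestamp format the EThOS webservice expects, so check the bin script before relying on it.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);
use POSIX qw(strftime);

# Hypothetical helper (not taken from the ethos script): convert a calendar
# date into a UTC epoch value and an ISO 8601 string.
sub utc_timestamp {
    my ($year, $month, $day) = @_;

    # timegm takes (sec, min, hour, mday, mon, year); mon is 0-based.
    my $epoch = timegm(0, 0, 0, $day, $month - 1, $year);

    # ASSUMPTION: ISO 8601 in UTC; the format the webservice wants may differ.
    my $iso = strftime('%Y-%m-%dT%H:%M:%SZ', gmtime($epoch));

    return ($epoch, $iso);
}

my ($epoch, $iso) = utc_timestamp(2012, 6, 28);
print "$epoch $iso\n";
```

Working in UTC with timegm (rather than timelocal) keeps the result independent of the server's time zone.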

The code consists of two parts:

  • a bin script (goes in ~/bin/)
  • a config file (goes in ~/archives/ARCHIVEID/cfg/cfg.d/)

Run ~/bin/ethos with no arguments to see the usage message.

The following rules are used:

1) If an incoming thesis contains an institutionalReference (and has therefore been harvested):

  • A search is done on the reference
    • If the eprint doesn't exist, no import is done.
    • If the eprint exists and doesn't have an id_number, the id_number is added.
    • If the eprint exists and the id_number matches the ethosId no import is done.
    • If the eprint exists and the id_number doesn't match the ethosId, an error is reported.

2) If the incoming thesis does not have an institutionalReference, a search for the ethosId in the id_number field is conducted.

  • If an eprint exists with that ethosId, no import is done.
  • If the ethosId is not found, a new eprint is created. Files are downloaded, and added to the eprint. Any issues with the download are recorded in the 'suggestions' field.
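
The two rules above can be sketched as a single decision function. This is only an illustration of the logic, not the code from the ethos script: the function name decide_action, the action labels, and the two lookup hashes (standing in for repository searches) are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch of the import rules.
#   $thesis   - hashref with 'ethosId' and, optionally, 'institutionalReference'
#   $by_ref   - hashref: institutional reference => eprint hashref
#   $by_ethos - hashref: id_number (ethosId) => eprint hashref
# The two lookup hashes stand in for searches against the repository.
sub decide_action {
    my ($thesis, $by_ref, $by_ethos) = @_;

    my $ethos_id = $thesis->{ethosId};
    my $ref      = $thesis->{institutionalReference};

    # Rule 1: the thesis carries an institutionalReference (it was harvested).
    if (defined $ref && length $ref) {
        my $eprint = $by_ref->{$ref};
        return 'skip_eprint_not_found' unless defined $eprint;

        my $id_number = $eprint->{id_number};
        return 'add_id_number'
            unless defined $id_number && length $id_number;

        return 'skip_already_linked' if $id_number eq $ethos_id;
        return 'error_id_mismatch';
    }

    # Rule 2: no institutionalReference, so search id_number for the ethosId.
    return 'skip_already_imported' if exists $by_ethos->{$ethos_id};
    return 'create_eprint';
}

# Example: a harvested thesis whose eprint has no id_number yet.
my %by_ref = ('ref-1' => { id_number => undef });
print decide_action(
    { ethosId => 'ethos-1', institutionalReference => 'ref-1' },
    \%by_ref, {},
), "\n";
```

Returning a label instead of acting directly keeps the rules easy to check in isolation; in the real script, the 'create_eprint' case is where files are downloaded and any download issues are recorded in the 'suggestions' field.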

The following fields are set in the EPrint:

  • userid
  • title
  • date
  • abstract
  • id_number
  • keywords
  • creators_name
  • thesis_type
  • suggestions

The scripts were written for the White Rose Etheses Online setup. As a result, the config file may seem a bit warped in its structure, because it was designed to cope with 3 institutions.

If you have any questions or problems with the script, email the tech-list!