Files/EThOS webservice download tool
Latest revision as of 09:15, 10 July 2012
The British Library EThOS service has a SOAP-based web service that allows institutions to pull digitized theses back from the EThOS collection.
Some code is available at http://files.eprints.org/778/ to pull the theses back from EThOS and to populate records that have been harvested by EThOS (via OAI-PMH) with their EThOS IDs.
A presentation from OR2012 is also available on the wiki: OR2012_EThOS_presentation.
The eprints script uses:
- SOAP::Lite (to make the calls to the webservice)
- LWP (to download the Zip files)
- Archive::Zip (to extract the files)
- Time::Local (to generate timestamps for the webservice)
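The download-and-extract step that LWP and Archive::Zip cover in the Perl script can be illustrated with a short Python sketch. This is not the script's own code: the function name `extract_thesis_zip` and the wording of the problem messages are made up for illustration, and the network download is left out, so the function starts from Zip bytes that have already been fetched.

```python
import io
import zipfile
from pathlib import Path


def extract_thesis_zip(zip_bytes, dest_dir):
    """Extract an already-downloaded Zip into dest_dir.

    Returns (extracted_names, problems). The problems list is the kind of
    text that could be copied into a notes field on the record.
    Hypothetical helper, not part of the EThOS script.
    """
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    extracted, problems = [], []

    try:
        archive = zipfile.ZipFile(io.BytesIO(zip_bytes))
    except zipfile.BadZipFile as exc:
        return extracted, [f"download is not a valid Zip file: {exc}"]

    with archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Refuse member names that would land outside dest_dir.
            target = (dest / info.filename).resolve()
            if dest not in target.parents:
                problems.append(f"skipped unsafe path: {info.filename}")
                continue
            archive.extract(info, dest)
            extracted.append(info.filename)

    return extracted, problems
```

Returning the problems alongside the file list, rather than raising, lets the caller still create the record and note what went wrong with the files.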
The code consists of two parts:
- a bin script (goes in ~/bin/)
- a config file (goes in ~/archives/ARCHIVEID/cfg/cfg.d/)
Run ~/bin/ethos to see the usage information.
The following rules are used:
1) If an incoming thesis contains an institutionalReference (and has therefore been harvested):
- A search is done on the reference
- If the eprint doesn't exist, no import is done.
- If the eprint exists and doesn't have an id_number, the id_number is added.
- If the eprint exists and the id_number matches the ethosId no import is done.
- If the eprint exists and the id_number doesn't match the ethosId, an error is reported.
2) If the incoming thesis does not have an institutionalReference, a search for the ethosId in the id_number field is conducted.
- If an eprint exists with that ethosId, no import is done.
- If the ethosId is not found, a new eprint is created. Files are downloaded, and added to the eprint. Any issues with the download are recorded in the 'suggestions' field.
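The two rules above amount to a small decision table. The following Python sketch restates them for illustration only; it is not taken from the Perl script. The `Action` names, the `decide_action` function, and the use of plain dictionaries in place of repository searches are all assumptions.

```python
from enum import Enum


class Action(Enum):
    SKIP = "no import"
    SET_ID_NUMBER = "add id_number to existing eprint"
    ERROR = "report id_number mismatch"
    CREATE = "create new eprint and download files"


def decide_action(ethos_id, institutional_reference, by_reference, by_id_number):
    """Pick the action for one incoming thesis.

    by_reference and by_id_number stand in for repository searches:
    dicts mapping an institutional reference, or an id_number, to an
    eprint represented as a dict. Hypothetical interface.
    """
    if institutional_reference:
        # Rule 1: the thesis was harvested, so look up the reference.
        eprint = by_reference.get(institutional_reference)
        if eprint is None:
            return Action.SKIP
        if not eprint.get("id_number"):
            return Action.SET_ID_NUMBER
        if eprint["id_number"] == ethos_id:
            return Action.SKIP
        return Action.ERROR

    # Rule 2: no reference, so search for the ethosId in id_number.
    if ethos_id in by_id_number:
        return Action.SKIP
    return Action.CREATE
```

For example, a thesis with no institutionalReference whose ethosId is not yet in the repository falls through to `Action.CREATE`, which is the only branch that downloads files.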
The following fields are set in the EPrint:
- userid
- title
- date
- abstract
- id_number
- keywords
- creators_name
- thesis_type
- suggestions
The scripts were written based on the White Rose Etheses Online setup. This means the config file may seem a bit warped in its structure - as it was designed to be able to cope with 3 institutions.
EThOS Identifiers
EThOS identifiers are of the form: uk.bl.ethos.xxxxx. If you have EThOS identifiers stored in your system in a different format, you will need to tweak the code to cope with your way of storing them. For any questions or problems with the script, email the tech-list!
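As an illustration of the kind of tweak this might involve, here is a minimal Python sketch that converts a stored value into the uk.bl.ethos.xxxxx form. It assumes that the xxxxx part is numeric and that a system stores either the bare number or the full identifier; neither assumption comes from the script, and the function name `normalise_ethos_id` is hypothetical.

```python
import re

ETHOS_PREFIX = "uk.bl.ethos."

# Optional prefix, then digits (assumption: the suffix is numeric).
_ETHOS_RE = re.compile(r"\s*(?:uk\.bl\.ethos\.)?(\d+)\s*", re.IGNORECASE)


def normalise_ethos_id(stored):
    """Return the identifier in uk.bl.ethos.<number> form, or None
    if the stored value does not look like an EThOS identifier."""
    match = _ETHOS_RE.fullmatch(stored)
    if match is None:
        return None
    return ETHOS_PREFIX + match.group(1)
```

Returning None for unrecognised values, rather than guessing, leaves it to the caller to report records that need a manual check.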
Downloading EThOS records in practice
The process of dealing with etheses at the three White Rose institutions is different at each site. In general though, theses coming from EThOS follow a different route, and are handled by a different set of people, from those deposited by students as part of their studies. We have created an 'EThOS import' user for each site, and provided a custom workflow for these users. There are fewer 'required' metadata elements for EThOS theses - we don't have a record of any 'supervisor email address' for a 1956 thesis!