Files/OAI Harvester for EPrints 3.2+

From EPrints Documentation
Revision as of 00:17, 3 October 2018 by Kgoetz (talk | contribs) (bring in contents from Synchronize your repository via OAI-PMH)


The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a low-barrier mechanism for repository interoperability. Data Providers are repositories that expose structured metadata via OAI-PMH. Service Providers then make OAI-PMH service requests to harvest that metadata. OAI-PMH defines six verbs (request types), each of which is invoked over HTTP.
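To illustrate how a verb is invoked over HTTP, here is a minimal Python sketch that builds the request URL for a given verb. The helper name build_request_url is hypothetical, and the base URL and set name are the placeholder values from the configuration example in the Usage section. The harvester plugin itself is written in Perl and relies on HTTP::OAI, so this shows only the shape of a request, not how the plugin issues it.

```python
from urllib.parse import urlencode

# The six request types (verbs) defined by OAI-PMH 2.0.
OAI_VERBS = (
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
)


def build_request_url(base_url, verb, **arguments):
    """Return the GET URL for one OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb}
    query.update(arguments)  # e.g. metadataPrefix, set, identifier
    return f"{base_url}?{urlencode(query)}"


if __name__ == "__main__":
    # Placeholder endpoint and set name, as in the Usage section.
    print(build_request_url(
        "http://oaiserver.com/oai",
        "ListRecords",
        metadataPrefix="oai_dc",
        set="that_set",
    ))
```

A Service Provider would typically start with Identify and ListMetadataFormats to find out what a Data Provider offers, then use ListRecords to harvest the metadata.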

About OAI Harvester plugin

The plugin keeps your repository in sync with another repository via OAI-PMH.

Note that these modules contain only the abstract classes. You will need to write your own module that translates whatever XML format you are harvesting into the EPrints data structure. An example is provided under cfg/plugins/EPrints/Plugin/Import/OAIPMH/Stub.pm.stub.
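To give a rough idea of what such a translation involves, the following Python sketch maps a small oai_dc record onto a flat dictionary. It is only a conceptual illustration: a real importer must be a Perl module modelled on the stub above, and the function name dc_to_epdata, the sample record, and the target field names (title, creators, date) are assumptions made for this example rather than the actual EPrints schema.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core elements used inside oai_dc records.
DC_NS = "{http://purl.org/dc/elements/1.1/}"

# Made-up sample record, for illustration only.
SAMPLE_RECORD = """
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An Example Title</dc:title>
  <dc:creator>Smith, Jane</dc:creator>
  <dc:creator>Doe, John</dc:creator>
  <dc:date>2018-10-03</dc:date>
</oai_dc:dc>
"""


def dc_to_epdata(xml_text):
    """Map an oai_dc record to a dictionary (field names are assumed)."""
    root = ET.fromstring(xml_text.strip())
    epdata = {}

    title = root.find(f"{DC_NS}title")
    if title is not None and title.text:
        epdata["title"] = title.text.strip()

    # dc:creator may repeat, so collect every occurrence in order.
    creators = [el.text.strip() for el in root.findall(f"{DC_NS}creator") if el.text]
    if creators:
        epdata["creators"] = creators

    date = root.find(f"{DC_NS}date")
    if date is not None and date.text:
        epdata["date"] = date.text.strip()

    return epdata


if __name__ == "__main__":
    print(dc_to_epdata(SAMPLE_RECORD))
```

Elements that are absent from the record are simply left out of the result, which leaves room for default values to be filled in later, as the default_values callback in the configuration example does.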

To harvest another repository using OAI-PMH, you must install this plugin and configure it as described below.

Install

  • As root user:

cpan HTTP::OAI

  • As eprints user:

cp -rf bin/ cfg/ ./archives/ARCHIVEID/

Check the Perl declaration in the file ./archives/ARCHIVEID/bin/oai/harvest and adjust it to match your installation if necessary.

./bin/epadmin update_database_structure ARCHIVEID

  • As root user:

Restart the web server.

Usage

  • Create a module that transforms the harvested XML, as explained in the introduction. Suppose you created a Dublin Core importer called OAIPMH::OAI_DC.
  • Edit the configuration file cfg/cfg.d/oai_harvester.pl and create a new configuration for the service you want to harvest (an example is provided in that file), e.g.:
$c->{oai_harvester}->{service_name} = {
     url => 'http://oaiserver.com/oai',
     set => 'that_set',
     default_values => sub {
          my( $session, $epdata, $header ) = @_;
          $epdata->{userid} = 1234;	# user '1234' will own all imported publications
          $epdata->{eprint_status} = 'archive';	# imported publications will go straight to the live archive
          # etc...
     },
};
  • Run bin/oai/harvest periodically, or set it up as a cron job:

bin/oai/harvest ARCHIVEID --plugin=OAIPMH::OAI_DC --conf=service_name

That's it!

Export your repository via OAI-PMH

Make sure you follow the instructions on the OAI page to configure your repository with the correct OAI elements.