Preservation Support in GNU EPrints 3
GNU EPrints version 3 introduces a number of features that will help support the preservation of digital objects stored in repositories. Here, 'preservation' means providing long-term access to digital objects.
The features described here have been developed jointly with the Preserv project, which coded the METS and Creative Commons (CC) licensing components. They are designed to allow an EPrints repository to support preservation through a specialist service provider. The key actions covered are:
- Recording changes to a repository object by updating its 'preservation metadata' (History Module)
- Enabling the service provider to download all the files and metadata comprising an object (METS and DIDL export plugins)
- Notifying the service provider of any rights it has to copy and act on the content of an object (CC licensing)
Complex-Object Export: METS and DIDL plugins
There are many ways to disseminate digital objects stored in an EPrints repository, depending on whether the request for an object comes from a human user or a machine (e.g. a search engine robot), and on what service is requested. The request may be for a full-text document, or for the data to be presented in some other format.
The range of export formats available in EPrints v3 can be extended by writing plugins: modular pieces of code that are dynamically loaded into the system. Plugins are a new mechanism for customising and extending EPrints with code that interacts with the core software but is not part of it. In this way plugins can be written and installed independently, and made available for others to use. These modules are written in Perl and can perform a number of roles: importing from and exporting to arbitrary formats, converting documents from one format to another (for full-text indexing), and providing user interface widgets. Plugins sit in their own directory and are registered by GNU EPrints when the system starts.
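EPrints plugins are Perl modules, and their actual registration API is not described here. As a language-neutral illustration of the pattern the paragraph describes (plugins registering themselves with the core, which then looks them up by role), here is a minimal Python sketch. The `Plugin` and `PluginRegistry` classes and the plugin ids are hypothetical and are not the EPrints API.

```python
from typing import Dict, List, Type


class Plugin:
    """Base class: every plugin has a unique id and a role (e.g. 'export', 'import')."""
    plugin_id: str = ""
    role: str = ""


class PluginRegistry:
    """Core-side table; the core only knows about plugins through this registry."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Type[Plugin]] = {}

    def register(self, cls: Type[Plugin]) -> Type[Plugin]:
        # Used as a class decorator, so a plugin registers itself when its module is loaded.
        if cls.plugin_id in self._plugins:
            raise ValueError(f"duplicate plugin id: {cls.plugin_id}")
        self._plugins[cls.plugin_id] = cls
        return cls

    def by_role(self, role: str) -> List[str]:
        # Sorted so that, e.g., a list of export formats is shown in a stable order.
        return sorted(pid for pid, cls in self._plugins.items() if cls.role == role)

    def get(self, plugin_id: str) -> Plugin:
        # Instantiate the plugin on demand.
        return self._plugins[plugin_id]()


registry = PluginRegistry()


@registry.register
class TextExport(Plugin):  # hypothetical export plugin
    plugin_id = "Export::Text"
    role = "export"


@registry.register
class BibTeXImport(Plugin):  # hypothetical import plugin
    plugin_id = "Import::BibTeX"
    role = "import"
```

In a real deployment each plugin would live in its own module inside a plugin directory, and importing those modules at start-up (for example with `importlib`) would trigger the `@registry.register` decorators; the core never needs to be edited to add a format.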
EPrints export plugins convert repository objects (things that contain metadata and files) into streamed data, in most cases XML. Objects can be exported through the EPrints Web interface (in search results, abstract pages or from the user's workspace), the OAI-PMH interface, or from the command-line `export` script.
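To make 'streamed data' concrete, the sketch below writes one object to a text stream with Python's `xml.sax.saxutils.XMLGenerator`, emitting elements as it goes rather than building a whole tree in memory first. The record layout (`id`, `title`, `files`) and the element names are hypothetical and do not follow the EPrints XML schema.

```python
import io
from typing import Any, Dict, TextIO
from xml.sax.saxutils import XMLGenerator


def export_xml(record: Dict[str, Any], out: TextIO) -> None:
    """Stream one repository object (metadata plus its file list) to `out` as XML."""
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("eprint", {"id": str(record["id"])})
    gen.startElement("title", {})
    gen.characters(record["title"])  # XMLGenerator escapes &, < and >
    gen.endElement("title")
    gen.startElement("files", {})
    for f in record["files"]:
        # One element per file; the URL is what a harvester would fetch.
        gen.startElement("file", {"url": f["url"], "mime_type": f["mime_type"]})
        gen.endElement("file")
    gen.endElement("files")
    gen.endElement("eprint")
    gen.endDocument()  # flushes the underlying stream


if __name__ == "__main__":
    buf = io.StringIO()
    sample = {
        "id": 42,
        "title": "Cats & Dogs",
        "files": [{"url": "http://example.org/42/1/paper.pdf", "mime_type": "application/pdf"}],
    }
    export_xml(sample, buf)
    print(buf.getvalue())
```

Because the generator writes to any text stream, the same function could serve a Web response, an OAI-PMH reply or a command-line script writing to standard output.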
A number of object types in EPrints are already supported by export plugins, as shown in Figure 1. The OpenURL ContextObject export plugin can also be used to export AccessLog objects, which enables access logs to be harvested from GNU EPrints. Document objects are exported as part of an EPrint object. (EPrints has several other object types, such as users, saved searches and history records, but these cannot be exported until export plugins are written to support them.)
Figure 1. Export formats supported by EPrints plugins
If a preservation service provider requests an object from a repository, it will need to obtain all the files associated with that object. Objects that have more than one file are often referred to as 'complex objects'.
To support more efficient transfer of complex objects, two formats may be used to disseminate objects for digital preservation: the Metadata Encoding and Transmission Standard (METS) and the MPEG-21 Digital Item Declaration Language (DIDL). Examples of the XML exports for these two formats are shown in Figures 2 and 3.
The METS export plugin is derived from work by the Repository Bridge project, which implemented a METS export for EPrints 2. This has been updated for the new plugin architecture and data model in EPrints 3.
Figure 2. METS export format example
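The sketch below builds a minimal METS wrapper for an object's files with Python's standard `xml.etree.ElementTree`, then shows the step a preservation service provider would take on receiving one: collecting every `FLocat` link so that all files of the complex object can be fetched. It covers only the `fileSec` and the `structMap` that METS requires; a fuller export would normally also carry descriptive metadata in a `dmdSec`. The function names, the `file_N` IDs and the example URLs are hypothetical; this is not the EPrints METS plugin.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace (ElementTree's {uri}tag form)."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(obj_id: str, files: List[Tuple[str, str]]) -> str:
    """Build a minimal METS document: one fileSec entry per (url, mime_type)
    pair, plus a structMap whose fptr elements point at those files."""
    root = ET.Element(m("mets"), {"OBJID": obj_id})
    file_grp = ET.SubElement(ET.SubElement(root, m("fileSec")), m("fileGrp"), {"USE": "content"})
    div = ET.SubElement(ET.SubElement(root, m("structMap")), m("div"), {"LABEL": obj_id})
    for i, (url, mime) in enumerate(files, start=1):
        file_id = f"file_{i}"
        f = ET.SubElement(file_grp, m("file"), {"ID": file_id, "MIMETYPE": mime})
        ET.SubElement(f, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": url})
        ET.SubElement(div, m("fptr"), {"FILEID": file_id})
    return ET.tostring(root, encoding="unicode")


def file_urls(mets_xml: str) -> List[str]:
    """Consumer side: list every file location, in document order,
    so the whole complex object can be downloaded."""
    root = ET.fromstring(mets_xml)
    return [loc.get(f"{{{XLINK_NS}}}href") for loc in root.iter(m("FLocat"))]
```

For example, `file_urls(build_mets("eprint_42", [("http://example.org/42/1/paper.pdf", "application/pdf")]))` returns the single PDF URL. Keeping file locations in one predictable place is what lets a generic harvester retrieve every file without knowing anything about EPrints.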
DIDL support has been built by Chris Gutteridge (the lead developer of EPrints) as a result of collaboration with researchers at the Los Alamos National Laboratory, who have used MPEG-21 DIDL to build digital library systems. Although DIDL is less well known in the digital preservation field, it serves a similar purpose and may well end up being more widely used, given the strong backing that media companies give MPEG standards.
Figure 3. MPEG-21/DIDL export format example
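For comparison, here is a minimal MPEG-21 DIDL sketch in the same style: the object becomes a top-level `Item` whose `Descriptor`/`Statement` carries its identifier, each document becomes a nested `Item`, and each file a `Component` whose `Resource` points at the file by URL. The function names and example values are hypothetical, and this is not the EPrints DIDL plugin whose output is shown in Figure 3.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

DIDL_NS = "urn:mpeg:mpeg21:2002:02-DIDL-NS"
ET.register_namespace("didl", DIDL_NS)


def d(tag: str) -> str:
    """Qualify a tag name with the DIDL namespace."""
    return f"{{{DIDL_NS}}}{tag}"


def build_didl(identifier: str, documents: List[List[Tuple[str, str]]]) -> str:
    """documents: one list of (url, mime_type) pairs per document."""
    root = ET.Element(d("DIDL"))
    item = ET.SubElement(root, d("Item"))
    # Descriptor/Statement holds metadata about the item; here only its identifier.
    descriptor = ET.SubElement(item, d("Descriptor"))
    statement = ET.SubElement(descriptor, d("Statement"), {"mimeType": "text/plain"})
    statement.text = identifier
    for files in documents:
        doc_item = ET.SubElement(item, d("Item"))  # one nested Item per document
        for url, mime in files:
            component = ET.SubElement(doc_item, d("Component"))
            # Resource refers to the file by reference rather than embedding it.
            ET.SubElement(component, d("Resource"), {"mimeType": mime, "ref": url})
    return ET.tostring(root, encoding="unicode")


def resource_refs(didl_xml: str) -> List[str]:
    """Collect every Resource reference, in document order."""
    root = ET.fromstring(didl_xml)
    return [r.get("ref") for r in root.iter(d("Resource"))]
```

The nesting mirrors the EPrints data model described earlier (an EPrint containing Documents containing files), which is one reason DIDL serves much the same purpose as METS here.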
History Module
EPrints v3 introduces a new 'history' function that documents all changes made to a record from the point of deposit (when the record is first created) onwards. Currently this feature provides an audit trail for editorial purposes. As digital preservation services are developed, however, they will need to modify the content of repositories, e.g. to migrate file formats, and such actions must be recorded to inform later preservation decisions. To keep track of what has happened to a record, the repository will need to store both the object itself and all actions that have been performed on it over time.
Figure 4 visualises changes that have been made to an object. In this instance a new 'Document' has been added to the record, with the consequent addition of new files.
Figure 4. History module shows the addition of a new document in the record for an object
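A minimal sketch of the kind of record a history function keeps: an append-only list of entries per object, each noting who did what and when, so that a later preservation action (such as a format migration) sits in the same trail as the editorial changes. The `History` and `HistoryEntry` classes and the action names are hypothetical; they do not reproduce how EPrints stores its History objects.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class HistoryEntry:
    revision: int        # 1 for the deposit, then increasing by one per action
    timestamp: datetime
    actor: str           # e.g. the depositor, an editor or a preservation service
    action: str          # e.g. "create", "add_document", "migrate_format"
    detail: str


@dataclass
class History:
    """Audit trail for one repository object; entries are only ever added by record()."""
    object_id: str
    entries: List[HistoryEntry] = field(default_factory=list)

    def record(self, actor: str, action: str, detail: str = "",
               when: Optional[datetime] = None) -> HistoryEntry:
        entry = HistoryEntry(
            revision=len(self.entries) + 1,
            timestamp=when if when is not None else datetime.now(timezone.utc),
            actor=actor,
            action=action,
            detail=detail,
        )
        self.entries.append(entry)
        return entry

    def actions(self, action: str) -> List[HistoryEntry]:
        """All entries of one kind, e.g. every migration, to inform later decisions."""
        return [e for e in self.entries if e.action == action]
```

A preservation service that migrates a file would call `record(...)` with its own actor name, so a later reviewer can tell editorial changes from preservation actions and see the order in which they happened.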
Preservation Rights Declaration
An issue raised by the British Library, a partner in the Preserv project acting as a prospective preservation service provider, is that it needs appropriate permissions to handle materials for preservation purposes. With this in mind, a 'license' option has been added to the EPrints deposit process, allowing the depositing user to provide an explicit license for access to their deposited materials (Figures 5 and 6). These licenses, based on Creative Commons, have been included in the EPrints v3 beta version. If an explicit license is required for preservation purposes (for example, one granting a third party the right to store the material and act on it, such as by migrating its file formats), this can easily be included.
Figure 5. Part of EPrints deposit form showing License field
Figure 6. Drop-down options from License field
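On the service provider's side, the License field might feed a simple, deliberately conservative policy check like the one sketched here. The stored license codes (`cc_by`, `cc_by_nd` and so on), the `preservation` code standing for the explicit preservation license mentioned above, and the policy of waiting for that explicit grant before migrating formats are all hypothetical choices for illustration.

```python
from typing import Dict, Optional

# Hypothetical values stored by the deposit form's License field.
CC_LICENSES = {"cc_by", "cc_by_sa", "cc_by_nd", "cc_by_nc", "cc_by_nc_sa", "cc_by_nc_nd"}
# Hypothetical code for an explicit preservation license, if one were added.
PRESERVATION_LICENSE = "preservation"


def preservation_rights(license_code: Optional[str]) -> Dict[str, bool]:
    """Decide what a preservation service may do with a deposit under this policy."""
    if license_code == PRESERVATION_LICENSE:
        # Explicit grant: store copies and act on the content (e.g. migrate formats).
        return {"copy": True, "migrate": True}
    if license_code in CC_LICENSES:
        # Every standard Creative Commons license permits copying the work
        # (the NC variants only for non-commercial purposes); this policy
        # still waits for an explicit grant before changing the content.
        return {"copy": True, "migrate": False}
    # No explicit license recorded: permission must be sought first.
    return {"copy": False, "migrate": False}
```

Keeping the decision in one function makes the policy easy to tighten or relax once the form of an explicit preservation license is agreed.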