New Features in EPrints 3.2
Plug-in Based Storage Layer *CURRENTLY IN DEVELOPMENT*
The idea is to remove the storage layer from the direct control of the repository. Instead, plug-ins written to a common API store and retrieve the relevant data on request.
The main influences on this work are the current Open Storage and Interoperability movements, whose importance became very clear at the [Open Repositories Conference 2008|http://or08.ecs.soton.ac.uk]. Both movements are well backed by [Sun|http://www.sun.com] and [Microsoft|http://www.microsoft.com], who are looking at hardware and software solutions respectively. Many projects are also promoting this area, not just for their own gain but also for that of the community. These include:
- The Sun Honeycomb Project - Promotes the use of an Open Storage layer for repository software (EPrints, Fedora and DSpace).
- [The Common Repository Interoperability Group|http://www.ukoln.ac.uk/repositories/digirep/index/CRIG] (CRIG) - Promotes interoperability through common APIs/schemas and Open Storage.
- [Preserv | http://preserv.eprints.org] - Promotes and develops preservation tools and techniques for digital data in institutional repositories. Both Interoperability and Open Storage lend themselves well to digital preservation.
How it Works
It is important to understand that the plug-in based storage layer is not just a layer that lets you plug in a single storage plug-in. A Storage Controller, which sits between EPrints and the plug-ins, will also be customisable by the repository manager, allowing them to define which storage plug-in is used depending on things such as file type, file usage and file size. Each plug-in only needs to conform to the basic set of API calls in order to work.
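The selection step described above could be sketched as an ordered list of rules, where the first matching rule decides which plug-in handles a file. This is illustrative Python rather than EPrints code (EPrints itself is written in Perl). The names `FileInfo` and `choose_plugin`, the plug-in names `local` and `open_storage`, and the thresholds are all hypothetical and are not part of the EPrints API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class FileInfo:
    """Hypothetical summary of the properties a controller might inspect."""
    mime_type: str  # file type, e.g. "application/pdf"
    usage: str      # file usage, e.g. "thumbnail" or "document"
    size: int       # file size in bytes


# A rule pairs a predicate over FileInfo with the name of a storage plug-in.
Rule = Tuple[Callable[[FileInfo], bool], str]


def choose_plugin(info: FileInfo, rules: List[Rule], default: str) -> str:
    """Return the plug-in name of the first rule that matches, else the default."""
    for matches, plugin_name in rules:
        if matches(info):
            return plugin_name
    return default


# Example policy a repository manager might configure (assumed values).
RULES: List[Rule] = [
    (lambda f: f.usage == "thumbnail", "local"),
    (lambda f: f.size > 100 * 1024 * 1024, "open_storage"),
    (lambda f: f.mime_type == "application/pdf", "open_storage"),
]
```

Because the rules are evaluated in order, a thumbnail stays on local storage even if a later rule would also match it. A real controller may well use a different configuration format.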
The Storage API (Alpha release)
NOTE - Although the function names will remain the same through to the finished version of the API, the parameters passed to each one are very likely to change as they are optimised.
- store(dataObj, bucket, uri, fh)
- retrieve(dataObj, bucket, uri)
- delete(dataObj, bucket, uri)
- get_size(dataObj, bucket, uri)
- dataObj = The EPrints data object containing the metadata relating to this object and EPrint.
- bucket = In the classic case this represents the type of the object, e.g. a "Thumbnail", which is a repository layer file. There are many types of bucket. A good example of bucket usage is to store repository layer files (which can be reconstructed from the user data) locally and to store user level files on an Open Storage server.
- uri = The URI of the file. This is generated by EPrints, and each plug-in must know how to store a file against a URI and retrieve that file given the same URI.
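To make the shape of the four calls concrete, here is a minimal sketch of a plug-in that keeps files in memory. It is written in Python for brevity and is not EPrints code. The class names `StoragePlugin` and `InMemoryStorage` are hypothetical, the return values are assumptions, and `fh` is assumed to be a readable binary file handle, since the alpha API does not yet document it.

```python
import io
from abc import ABC, abstractmethod
from typing import Any, BinaryIO, Dict, Tuple


class StoragePlugin(ABC):
    """Hypothetical base class mirroring the four alpha API calls."""

    @abstractmethod
    def store(self, data_obj: Any, bucket: str, uri: str, fh: BinaryIO) -> bool: ...

    @abstractmethod
    def retrieve(self, data_obj: Any, bucket: str, uri: str) -> BinaryIO: ...

    @abstractmethod
    def delete(self, data_obj: Any, bucket: str, uri: str) -> bool: ...

    @abstractmethod
    def get_size(self, data_obj: Any, bucket: str, uri: str) -> int: ...


class InMemoryStorage(StoragePlugin):
    """Toy plug-in: files are kept in a dict keyed by (bucket, uri)."""

    def __init__(self) -> None:
        self._files: Dict[Tuple[str, str], bytes] = {}

    def store(self, data_obj, bucket, uri, fh):
        # Assumption: fh is a readable binary handle; read it fully.
        self._files[(bucket, uri)] = fh.read()
        return True

    def retrieve(self, data_obj, bucket, uri):
        # Hand back a fresh handle over the stored bytes.
        return io.BytesIO(self._files[(bucket, uri)])

    def delete(self, data_obj, bucket, uri):
        # True if something was removed, False if nothing was stored.
        return self._files.pop((bucket, uri), None) is not None

    def get_size(self, data_obj, bucket, uri):
        return len(self._files[(bucket, uri)])
```

The `data_obj` argument is accepted but unused here; a real plug-in would presumably consult its metadata. Keying on both bucket and uri is one possible choice, not something the API prescribes.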
We are still thinking of a way to optimise the retrieve process by storing a map of the uri -> plug-in(s) relationship. This would speed up file retrieval and may involve some OAI-ORE at some point.
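One possible shape for such a map is a simple index from each URI to the names of the plug-ins holding a copy, so that retrieval can go straight to the right plug-in instead of asking each one in turn. The Python sketch below is only an illustration of that idea under that assumption. `UriPluginMap` and its methods are hypothetical, and nothing here reflects a decided EPrints design or any OAI-ORE serialisation.

```python
from typing import Dict, List


class UriPluginMap:
    """Hypothetical index from a file's URI to the plug-ins that hold it."""

    def __init__(self) -> None:
        self._plugins_by_uri: Dict[str, List[str]] = {}

    def record(self, uri: str, plugin_name: str) -> None:
        """Note that plugin_name now stores the file at uri (no duplicates)."""
        names = self._plugins_by_uri.setdefault(uri, [])
        if plugin_name not in names:
            names.append(plugin_name)

    def forget(self, uri: str, plugin_name: str) -> None:
        """Note that plugin_name no longer stores the file at uri."""
        names = self._plugins_by_uri.get(uri, [])
        if plugin_name in names:
            names.remove(plugin_name)
        if not names:
            # Drop empty entries so the map does not grow without bound.
            self._plugins_by_uri.pop(uri, None)

    def lookup(self, uri: str) -> List[str]:
        """Return the plug-ins known to hold uri, in the order recorded."""
        return list(self._plugins_by_uri.get(uri, []))
```

A controller using this sketch would call `record` after a successful store, `forget` after a delete, and `lookup` before a retrieve.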
The Storage Controller (0.1 Alpha)
Technical documentation coming soon.