Access Log Dataset
EPrints 3 introduces some new features for capturing and analysing usage of the repository.
Every time an abstract or full-text document is accessed, a record describing the access is written to the access dataset.
The access dataset can be used like any other dataset in the EPrints system, e.g. through the EPrints API or by import/export.
Access dataset fields
OpenURL terminology is used where an equivalent term in OpenURL is defined.
|Field||Description||OpenURL equivalent|
|accessid||A unique identifier for the access (sequential numbering).||context-object identifier|
|datestamp||The time the access occurred in UTC.||context-object "timestamp" attribute|
|requester_id||An identifier for the user that made the request.||requester/identifier|
|requester_user_agent||The user agent string as given by the user's browser.||requester/private-data|
|requester_country||The ISO country code for the user's location.||N/A|
|requester_institution||The user's institution or organisation.||N/A|
|referring_entity_id||The identifier of the object that the user followed a link from.||referring_entity/identifier|
|service_type_id||The type of service requested, either fulltext=yes or abstract=yes.||service-type/* (see info:ofi/fmt:xml:xsd:sch_svc)|
|referent_id||The identifier of the eprint requested.||referent/identifier|
|referent_docid||The document number requested (for fulltext requests).||N/A|
The requester_id field contains an identifier for the user that made the request. In the release version this consists of uri:ip: followed by the IP address of the user's connection. Depending on the network topology, this IP address may belong to an intermediary, e.g. a caching proxy server, rather than to the user's own machine.
In future, the requester_id field may instead contain a unique cookie or a similar mechanism for identifying users.
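As an illustration of the current requester_id format, the following Python sketch extracts the IP address from a value of the form described above. The function name requester_ip is hypothetical and not part of EPrints; the sketch assumes the prefix is exactly uri:ip: as given in the text, and returns None for any other form (such as a future cookie-based identifier).

```python
import ipaddress

# Prefix as described in the text above.
REQUESTER_PREFIX = "uri:ip:"


def requester_ip(requester_id):
    """Return the IP address embedded in a requester_id.

    Returns None if the value does not use the IP-based form or if the
    part after the prefix is not a valid IPv4/IPv6 address.
    (Hypothetical helper for illustration; not part of EPrints.)
    """
    if not requester_id.startswith(REQUESTER_PREFIX):
        return None
    candidate = requester_id[len(REQUESTER_PREFIX):]
    try:
        return ipaddress.ip_address(candidate)
    except ValueError:
        return None
```

Note that the address recovered this way is only as reliable as the connection it came from: behind a caching proxy it identifies the proxy, not the end user.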
Requester country and institution
If available, the MaxMind GeoIP databases can be used to determine the country code and organisation of the user from their IP address.
The referent (the object being requested) is stored using the full OAI identifier of the eprint. In future this may be replaced with just the eprint number.
Filtering at the LogHandler Stage
Requests to the EPrints web server are captured using a mod_perl handler (EPrints::Apache::LogHandler), which is called on every request to the web server. The handler discards every request whose HTTP status is not 200, so redirects and partial-content responses, for example, are ignored. Only requests to abstract and full-text URLs are recognised, i.e. requests to /xx/ and /xx/yy/..., where xx is the eprint id and yy is the document id.
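The filtering rules above can be sketched in Python as follows. This is not the LogHandler code itself (which is a Perl mod_perl handler); it is a minimal illustration that assumes eprint and document ids are decimal numbers. The function name classify_request and the result keys eprintid and docid are hypothetical.

```python
import re

# /xx/ is an abstract page; /xx/yy/... is a full-text document.
# Assumption: xx and yy are decimal numbers.
ABSTRACT_RE = re.compile(r"/(\d+)/")
FULLTEXT_RE = re.compile(r"/(\d+)/(\d+)/.*")


def classify_request(status, path):
    """Return a dict describing the access, or None if it is ignored."""
    # Anything other than HTTP 200 (redirects, partial content, errors)
    # is filtered out.
    if status != 200:
        return None

    match = FULLTEXT_RE.fullmatch(path)
    if match:
        return {
            "service_type_id": "fulltext=yes",
            "eprintid": int(match.group(1)),
            "docid": int(match.group(2)),
        }

    match = ABSTRACT_RE.fullmatch(path)
    if match:
        return {
            "service_type_id": "abstract=yes",
            "eprintid": int(match.group(1)),
            "docid": None,
        }

    # Any other URL is not an abstract or full-text request.
    return None
```

For example, classify_request(200, "/42/1/paper.pdf") is treated as a full-text access to document 1 of eprint 42, while the same path with status 206 is ignored.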
Harvesting accesses from an EPrints 3 repository
A new OAI interface has been added to EPrints 3 that reads records from the access dataset. It supports all the normal OAI verbs but exposes metadata only in OpenURL ContextObject format.
Because accesses are explicitly time-based, a harvester can easily harvest the accesses for a specific time period by using the OAI from and until arguments.
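A harvester request for a given period might be built as in the following Python sketch. The verb, metadataPrefix, from and until arguments are standard OAI-PMH; the base URL and the metadata prefix value used in the usage note are placeholders, since the actual endpoint path and prefix name depend on the repository and are not given here.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, from_date, until_date):
    """Build an OAI-PMH ListRecords URL restricted to a date range.

    base_url and metadata_prefix are supplied by the caller; the values
    used for a real repository must be taken from that repository's
    Identify and ListMetadataFormats responses.
    """
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "from": from_date,    # inclusive lower bound, e.g. "2007-01-01"
        "until": until_date,  # inclusive upper bound, e.g. "2007-01-31"
    }
    return base_url + "?" + urlencode(params)
```

A call such as list_records_url("http://repository.example.org/cgi/oai_accesses", "context_object", "2007-01-01", "2007-01-31") would request all accesses recorded in January 2007; both the URL and the prefix in this call are hypothetical.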
Disk Usage Considerations
The disk usage requirements for storing accesses in the EPrints database are yet to be fully tested.