Access Log Dataset
Latest revision as of 12:02, 8 February 2010
EPrints 3 introduces some new features for capturing and analysing usage of the repository.
Every time an abstract or full text is accessed, a record describing the access is written to the access dataset.
The access dataset can be used like any other dataset in the EPrints system, e.g. through the EPrints API or by importing and exporting it.
access data set fields
OpenURL terminology is used where an equivalent term in OpenURL is defined.
field | description | OpenURL equivalent |
---|---|---|
accessid | A unique identifier for the access (sequential numbering). | context-object identifier |
datestamp | The time the access occurred in UTC. | context-object "timestamp" attribute |
requester_id | An identifier for the user that made the request. | requester/identifier |
requester_user_agent | The user agent string as given by the user's browser. | requester/private-data |
requester_country | The ISO country code for the user's location. | N/A |
requester_institution | The user's institution or organisation. | N/A |
referring_entity_id | The identifier of the object that the user followed a link from. | referring_entity/identifier |
service_type_id | The type of service requested, either fulltext=yes or abstract=yes. | service-type/* (see info:ofi/fmt:xml:xsd:sch_svc) |
referent_id | The identifier of the eprint requested. | referent/identifier |
referent_docid | The document number requested (for fulltext requests). | N/A |
requester_id
The requester_id field contains an identifier for the user that made the request. In the release version this consists of uri:ip: followed by the IP address of the user's connection. Depending on the network topology, the recorded IP address may belong to an intermediary, e.g. a caching proxy server, rather than to the user.
In future, the requester_id field may contain a unique cookie or a similar mechanism for identifying users.
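As an illustration of the uri:ip: form described above, the following Python sketch extracts the IP address from a requester_id value. It is not part of EPrints; the function name requester_ip is hypothetical, and any value that does not use the uri:ip: prefix (for example a future cookie-based identifier) is simply reported as unrecognised.

```python
import ipaddress

# Prefix used by the release version, as described above.
PREFIX = "uri:ip:"


def requester_ip(requester_id):
    """Return the IP address held in a requester_id, or None.

    Hypothetical helper: None is returned when the value does not
    start with "uri:ip:" or the remainder is not a valid IP address.
    """
    if not requester_id.startswith(PREFIX):
        return None
    try:
        return ipaddress.ip_address(requester_id[len(PREFIX):])
    except ValueError:
        return None


if __name__ == "__main__":
    print(requester_ip("uri:ip:192.0.2.10"))
```

Remember that the address returned may be that of a proxy rather than of the end user.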
Requester country and institution
If available, the MaxMind GeoIP databases can be used to derive the country code and organisation of the user from their IP address.
referent_id
The referent (the object being requested) is stored using the full OAI identifier. This may be replaced with just the eprint number.
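Because referent_id may hold either a full OAI identifier or a bare eprint number, code that reads the dataset may want to accept both. The Python sketch below assumes the OAI identifier has the common form oai:<repository host>:<eprint number>; that layout, the example host and the function name eprint_number are assumptions, not something this page specifies.

```python
import re

# Assumed layout: an optional "oai:<host>:" prefix followed by the number.
_REFERENT_RE = re.compile(r"(?:oai:[^:]+:)?([0-9]+)")


def eprint_number(referent_id):
    """Return the eprint number from a referent_id, or None if it
    matches neither assumed form."""
    match = _REFERENT_RE.fullmatch(referent_id)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print(eprint_number("oai:repo.example.org:123"))
```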
Filtering at the LogHandler Stage
Requests to the EPrints web server are captured using a mod_perl handler (EPrints::Apache::LogHandler), which is called on every request to the web server. The handler discards every request whose response status is not HTTP 200, so redirects and partial-content responses, for example, are ignored. Only requests to abstract and full-text URLs are recognised, i.e. requests to /xx/ and /xx/yy/... where xx is the eprint id and yy the document id.
As a special case, any request for a full text whose referring entity is also a full-text request is ignored. The net result is that inline content, e.g. images and JavaScript in HTML documents, is not logged.
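The filtering rules above can be sketched in Python as follows. This is only an illustration of the logic as described on this page, not the code of EPrints::Apache::LogHandler (which is written in Perl); the regular expressions, the function names classify and should_log, and the example host are all assumptions.

```python
import re
from urllib.parse import urlparse

# Hypothetical patterns for the two URL shapes described above:
# /xx/ is an abstract page, /xx/yy/... is a full-text file.
ABSTRACT_RE = re.compile(r"^/(\d+)/?$")
FULLTEXT_RE = re.compile(r"^/(\d+)/(\d+)/")


def classify(path):
    """Return (service_type, eprint_id, doc_id), or None if the path
    is neither an abstract nor a full-text URL."""
    match = FULLTEXT_RE.match(path)
    if match:
        return ("fulltext", int(match.group(1)), int(match.group(2)))
    match = ABSTRACT_RE.match(path)
    if match:
        return ("abstract", int(match.group(1)), None)
    return None


def should_log(status, path, referrer=None):
    """Apply the three rules: status 200 only, recognised URLs only,
    and no full-text request referred from another full-text URL."""
    if status != 200:
        return False
    hit = classify(path)
    if hit is None:
        return False
    if hit[0] == "fulltext" and referrer:
        ref = classify(urlparse(referrer).path)
        if ref is not None and ref[0] == "fulltext":
            return False  # inline content such as images or scripts
    return True


if __name__ == "__main__":
    print(should_log(200, "/12/1/paper.pdf", "http://repo.example.org/12/"))
```

In this sketch an image embedded in an HTML full text is dropped because its referrer is itself a full-text URL, while a PDF reached from the abstract page is kept.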
Harvesting accesses from an EPrints 3 repository
Currently Disabled!
A new OAI interface has been added to EPrints 3 that reads records from the access dataset. It supports all the normal OAI verbs but exposes metadata only in OpenURL ContextObject format.
Because accesses are explicitly time-based, a harvester can easily harvest the accesses for a specific time period by using the OAI from and until arguments.
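A harvester would select a time period with a standard OAI-PMH ListRecords request. The Python sketch below only builds such a request URL and does not contact any server. The verb and the from, until and metadataPrefix arguments are standard OAI-PMH; the base URL and the metadata prefix value are placeholders, since this page does not give them, and the interface is currently disabled.

```python
from datetime import date
from urllib.parse import urlencode


def list_records_url(base_url, start, end, metadata_prefix):
    """Build an OAI-PMH ListRecords URL restricted to [start, end].

    base_url and metadata_prefix must be supplied by the caller;
    the values used in the example below are hypothetical.
    """
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "from": start.isoformat(),   # day granularity, YYYY-MM-DD
        "until": end.isoformat(),
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder endpoint and prefix, for illustration only.
    print(list_records_url(
        "http://repo.example.org/cgi/oai2",
        date(2010, 1, 1),
        date(2010, 1, 31),
        "context_object",
    ))
```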
Disk Usage Considerations
The disk usage requirements for storing accesses in the EPrints database are yet to be fully tested.