Download Metrics in EPrints
EPrints uses the Apache web server and mod_perl (which embeds a Perl interpreter in Apache and so also accelerates Perl CGI scripts). mod_perl provides hooks into the logging and control features of the Apache server. One of these hooks is into the access logging function. A log handler is registered with Apache using the PerlLogHandler configuration directive.
This handler is called after a response has been generated by the web server (i.e. in response to requests coming in from web clients).
In the initial implementation we are only interested in requests for eprint objects. This excludes requests for static content (e.g. the home page), searches and user-specific content (e.g. the deposit process). This is because our current requirement is to capture usage in order to determine download metrics for the objects contained in a repository, rather than to analyse how the repository itself is used.
For the remainder of this section we will use the following example request (as you would otherwise see in an Apache access log file):
127.0.0.1 - - [25/May/2007:13:07:24 +0100] "GET /12614/01/Semantic_Web_Revisted.pdf HTTP/1.1" 200 130951 "http://www.w3.org/2001/sw/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
The pertinent parts of this entry are:
|date/time of request (by server time)||25/May/2007:13:07:24 +0100|
|HTTP response code (200 is 'ok')||200|
|referring web page||http://www.w3.org/2001/sw/|
|user's web browser (or web crawler id)||Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)|
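To make the field breakdown concrete, the following is a minimal sketch in Python of pulling the same fields out of the example entry. It is an illustration only and not the EPrints handler, which runs inside Apache under mod_perl and receives the request directly rather than parsing log text. The regular expression assumes Apache's "combined" log format, and the names `COMBINED_LOG` and `parse_log_line` are hypothetical.

```python
import re
from datetime import datetime

# Assumption: the entry is in Apache's "combined" log format:
# host, identity, user, [time], "request", status, bytes, "referrer", "agent".
COMBINED_LOG = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = COMBINED_LOG.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Convert the status code to an integer and the timestamp to an aware datetime.
    entry["status"] = int(entry["status"])
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry


# The example request used throughout this section.
EXAMPLE = (
    '127.0.0.1 - - [25/May/2007:13:07:24 +0100] '
    '"GET /12614/01/Semantic_Web_Revisted.pdf HTTP/1.1" 200 130951 '
    '"http://www.w3.org/2001/sw/" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"'
)

entry = parse_log_line(EXAMPLE)
print(entry["time"].isoformat())
print(entry["status"], entry["path"])
print(entry["referrer"])
print(entry["agent"])
```

The four printed values correspond to the rows of the table above, plus the requested page, which is what the handler uses in the next step.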
In the EPrints log handler, requests with any HTTP response code other than 200 (i.e. errors and redirects) are ignored. The remaining requests are then split into three sets: non-eprint requests, eprint abstract requests and full-text requests. The split is based on the page requested:
|eprint identifier||document identifier||file requested|
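The filtering and three-way split described above can be sketched as follows, again in Python purely for illustration (the real handler is Perl code running under mod_perl). The URL shapes are assumptions: a full-text request is taken to be `/<eprint identifier>/<document identifier>/<file requested>`, as in the example request, and an abstract request is assumed to be `/<eprint identifier>/`, which the text above does not confirm. The function name `classify_request` and the set labels are hypothetical.

```python
import re

# Assumed URL shapes (only the full-text shape appears in the example request):
#   /<eprint id>/                      -> eprint abstract page
#   /<eprint id>/<document id>/<file>  -> full-text file
ABSTRACT_PATH = re.compile(r"^/(?P<eprintid>\d+)/?$")
FULLTEXT_PATH = re.compile(r"^/(?P<eprintid>\d+)/(?P<docid>\d+)/(?P<file>.+)$")


def classify_request(status, path):
    """Return (set_name, identifiers), or None if the request is ignored."""
    # Any non-200 response (errors, redirects) is ignored.
    if status != 200:
        return None

    match = FULLTEXT_PATH.match(path)
    if match:
        return ("fulltext", {
            "eprintid": int(match.group("eprintid")),
            "docid": int(match.group("docid")),  # "01" becomes 1
            "file": match.group("file"),
        })

    match = ABSTRACT_PATH.match(path)
    if match:
        return ("abstract", {"eprintid": int(match.group("eprintid"))})

    # Everything else: static pages, searches, user-specific pages.
    return ("other", {})


print(classify_request(200, "/12614/01/Semantic_Web_Revisted.pdf"))
print(classify_request(200, "/12614/"))
print(classify_request(200, "/"))
print(classify_request(404, "/12614/01/Semantic_Web_Revisted.pdf"))
```

For the example request this yields the full-text set with eprint identifier 12614, document identifier 1 and file `Semantic_Web_Revisted.pdf`. The 404 case returns `None` because it is dropped before classification.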