Download Metrics

From EPrints Documentation
Revision as of 12:24, 25 May 2007 by Timbrody (Talk | contribs)

Download Metrics in EPrints

Capturing Usage

EPrints uses the Apache web server and mod_perl (which embeds a Perl interpreter in Apache, accelerating Perl CGI scripts). mod_perl provides hooks into the logging and control features of the Apache server. One of these hooks is into the access logging function. A log handler is registered with Apache using the PerlLogHandler configuration directive, e.g.

PerlLogHandler EPrints::Apache::LogHandler

This handler is called once for each request coming in from a web client, after the web server has generated the response.

In the initial implementation we are only interested in requests for eprint objects. Requests for static content (e.g. the home page), searches and user-specific content (e.g. the deposit process) are ignored. This is because our current requirement is to capture usage in order to determine download metrics for the objects contained in a repository, rather than to analyse how the repository itself is used.

For the remainder of this section we will use the following example request (as you would otherwise see in an Apache access log file):

127.0.0.1 - - [25/May/2007:13:07:24 +0100] "GET /12614/01/Semantic_Web_Revisted.pdf HTTP/1.1" 200 130951 "http://www.w3.org/2001/sw/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"

The pertinent parts of this entry are:

field                                     example
requesting host                           127.0.0.1
date/time of request (by server time)     25/May/2007:13:07:24 +0100
page requested                            /12614/01/Semantic_Web_Revisted.pdf
HTTP response code (200 is 'ok')          200
referring web page                        http://www.w3.org/2001/sw/
user's web browser (or web crawler id)    Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
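
As an illustration only, the fields above can be pulled out of the example log line with a regular expression. This is a minimal Python sketch, not the EPrints implementation: the EPrints log handler is written in Perl and receives the request from Apache directly rather than parsing a log file. The names COMBINED and parse_line are hypothetical.

```python
import re
from datetime import datetime

# Apache "combined" log format:
# host identity user [time] "request" status bytes "referrer" "user agent"
# COMBINED and parse_line are hypothetical names used for illustration.
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of the pertinent fields, or None if the line does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # %b expects English month abbreviations under the default (C) locale.
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    return fields

EXAMPLE = (
    '127.0.0.1 - - [25/May/2007:13:07:24 +0100] '
    '"GET /12614/01/Semantic_Web_Revisted.pdf HTTP/1.1" 200 130951 '
    '"http://www.w3.org/2001/sw/" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"'
)

entry = parse_line(EXAMPLE)
print(entry["host"], entry["path"], entry["status"], entry["referrer"])
```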

In the EPrints log handler, any request with a non-200 HTTP response code (i.e. any error or redirect) is ignored. The remaining requests are then split into three sets: non-eprint requests, eprint abstract requests and full-text requests. The split is based on the page requested:

/12614/              /01/                   Semantic_Web_Revisted.pdf
eprint identifier    document identifier    file requested
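
The filtering and three-way split described above can be sketched as follows. This is a hedged Python illustration, not the EPrints Perl code. It assumes that a full-text URL has the form /<eprint id>/<document id>/<file name>, as in the example, and that an abstract page is requested as /<eprint id>/. The second pattern is an assumption, and the names FULLTEXT, ABSTRACT and classify are hypothetical.

```python
import re

# Assumed URL layouts (hypothetical patterns, inferred from the example request):
#   full text: /<eprint id>/<document id>/<file name>
#   abstract:  /<eprint id>/
FULLTEXT = re.compile(r'^/(?P<eprintid>\d+)/(?P<docid>\d+)/(?P<filename>.+)$')
ABSTRACT = re.compile(r'^/(?P<eprintid>\d+)/?$')

def classify(path, status):
    """Return (kind, eprint id, document id, file name), or None if ignored."""
    if status != 200:
        # Errors and redirects are not counted as usage.
        return None
    match = FULLTEXT.match(path)
    if match:
        return ("fulltext", int(match["eprintid"]),
                int(match["docid"]), match["filename"])
    match = ABSTRACT.match(path)
    if match:
        return ("abstract", int(match["eprintid"]), None, None)
    # Anything else (home page, searches, user pages) is a non-eprint request.
    return ("other", None, None, None)

print(classify("/12614/01/Semantic_Web_Revisted.pdf", 200))
print(classify("/12614/", 200))
print(classify("/", 200))
print(classify("/12614/01/Semantic_Web_Revisted.pdf", 404))
```

Checking the full-text pattern first keeps the two cases unambiguous, because the abstract pattern only matches when nothing follows the eprint identifier.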

Storing Usage

Exposing Usage

Analysing Usage

Analytical Considerations for Usage