IRStats
IRStats is a flexible statistics package that makes it easy to process accesses to the full-text and abstract pages of eprints.
Technical Overview
The following is a quick tour of IRStats.
Parameters
IRStats output depends on four parameters, which are passed as CGI parameters when IRStats is called through a web browser, or in a hash when it is called through the Perl API. These are:
Start Date and End Date
Date parameters are implemented as separate day, month and year parameters, so these two parameters are actually six (start_day, start_month, start_year, end_day, end_month, end_year). Any statistics outside this date range are ignored.
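As a rough illustration of how two dates expand into the six parameters, here is a minimal sketch in Python (IRStats itself is written in Perl). The helper name date_range_params is hypothetical, and passing plain integers rather than zero-padded strings is an assumption; the source only gives the parameter names.

```python
from datetime import date
from urllib.parse import urlencode

def date_range_params(start: date, end: date) -> dict:
    """Expand two dates into the six date parameters (hypothetical helper)."""
    if start > end:
        raise ValueError("start date must not be after end date")
    return {
        "start_day": start.day,
        "start_month": start.month,
        "start_year": start.year,
        "end_day": end.day,
        "end_month": end.month,
        "end_year": end.year,
    }

params = date_range_params(date(2007, 1, 1), date(2007, 3, 31))

# The same values as they might appear in a CGI query string.
query = urlencode(params)
```

The eprint set and view parameters would be added alongside these; their parameter names are not given here, so they are left out of the sketch.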
An Eprint Set
As well as defining a date range, we also have to tell IRStats which publications we are interested in. Any publication not in the set will be ignored. A set of eprints can either be a single eprint or any set of eprints the system administrator wishes to define in the config files.
View
The final parameter tells IRStats how we want to process and display the statistics. This is done by selecting a View.
Views
Views are Perl modules which plug in to IRStats. They have been designed to be user-configurable, though some knowledge of Perl is probably required. When a query is made to IRStats, a View is created. It generates some parameters for the DatabaseInterface object, which queries the database and passes back the results of the query. The View then iterates over the database rows and processes the stats in any way programmatically possible. These processed results are then passed to a Visualisation.
Visualisations
A Visualisation takes a set of processed statistics and outputs them. For example, Visualisation::Graph::Pie creates a pie chart.
The Database Interface
The Database Interface object handles all queries to the database. Most requests for statistics can be completed with a single call to the get_stats($params) method.
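The flow described above (a View builds parameters, the Database Interface returns rows, the View processes them, a Visualisation renders them) can be sketched with toy stand-ins. This is a conceptual sketch in Python only: every class and function name below is hypothetical, the row format is invented for illustration, and none of it mirrors the real IRStats Perl modules or the actual behaviour of get_stats.

```python
# Toy stand-ins only: these do not mirror the real IRStats Perl modules.

class ToyDatabaseInterface:
    """Pretends to be the database; holds rows of (eprint_id, hit_count)."""

    def __init__(self, rows):
        self._rows = rows

    def get_stats(self, params):
        # Keep only rows for eprints in the requested set.
        wanted = set(params["eprints"])
        return [row for row in self._rows if row[0] in wanted]


class ToyTopEprintsView:
    """Builds query parameters, then totals hits per eprint."""

    def __init__(self, eprints):
        self.eprints = eprints

    def run(self, db):
        rows = db.get_stats({"eprints": self.eprints})
        totals = {}
        for eprint_id, hits in rows:
            totals[eprint_id] = totals.get(eprint_id, 0) + hits
        # Most-accessed eprint first.
        return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def toy_text_visualisation(processed):
    """Renders processed statistics as plain text lines."""
    return [f"eprint {eprint_id}: {hits} hits" for eprint_id, hits in processed]


db = ToyDatabaseInterface([(1, 4), (2, 7), (1, 1), (3, 9)])
view = ToyTopEprintsView(eprints=[1, 2])
lines = toy_text_visualisation(view.run(db))
```

The point of the split is that the View owns the processing logic while the Visualisation only formats, so the same processed results could be handed to a different output style.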
Data Flow Diagram
Required Data
In order for IRStats to run, it requires two things:
- a database table containing all hits to the repository
- text files describing the contents of the repository
The Hits Table
Awaiting a redevelopment.
The Text Files
In order for IRStats to build up a picture of a repository, a number of text files need to be created and stored in the cfg/ directory:
- epstats_set_membership.txt
- epstats_set_member_codes.txt
- epstats_set_member_full_citations.txt
- epstats_set_member_short_citations.txt
- epstats_set_member_urls.txt
Explanation by Example
Imagine a very small repository. Here are its contents:
- eprints
- (1) The Smells of Cheese
- (2) The Tastes of Wines
- (3) The Sounds of Oboes
- Authors
- (1) John Smith
- (2) Harriet Jones
If we then imagine that the following are also true:
- John Smith is credited with being an author of eprints (1) and (2)
- Harriet Jones is credited with being an author of eprints (2) and (3)
- All three eprints are the output of a research group named "Senses"
Creating sets
Sets are groups of eprints, and every eprint is a member of at least one set (the set containing only that eprint). From the information above, we have three classes of set: eprints, authors and research groups. We need to add the following to epstats_set_membership.txt (the format is <id><tab><csv list of eprint ids>):
author_1	1,2
author_2	2,3
group_1	1,2,3
eprint_1	1
eprint_2	2
eprint_3	3
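To make the format concrete, here is a minimal Python sketch that reads membership data in the <id><tab><csv list of eprint ids> form, assuming one entry per line. The function name parse_set_membership is hypothetical, and treating eprint ids as integers and skipping blank lines are assumptions rather than documented IRStats behaviour.

```python
def parse_set_membership(text: str) -> dict:
    """Map each set key (e.g. 'author_1') to a list of integer eprint ids."""
    membership = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines (assumption)
        key, csv_ids = line.split("\t", 1)
        membership[key] = [int(i) for i in csv_ids.split(",")]
    return membership


sample = (
    "author_1\t1,2\n"
    "author_2\t2,3\n"
    "group_1\t1,2,3\n"
    "eprint_1\t1\n"
    "eprint_2\t2\n"
    "eprint_3\t3\n"
)
membership = parse_set_membership(sample)
```

With the example repository, membership["group_1"] holds all three eprints, while each eprint_N set holds only itself.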
Giving Sets IDs
So, we now have some sets, but we need to give them unique IDs so that we can retrieve stats for these sets. To do this, we add the following to epstats_set_member_codes.txt:
author_1	js
author_2	hj
group_1	senses
eprint_1	1
eprint_2	2
eprint_3	3
IRStats now assigns the following unique IDs to each set: author_js, author_hj, group_senses, eprint_1, eprint_2, eprint_3. Note that the IDs should probably be kept alphanumeric, and must be unique within a class of sets (but you can have author_hj, group_hj and eprint_hj).
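The mapping from codes to set IDs can be sketched as follows. This Python sketch assumes one tab-separated entry per line and assumes the ID is formed by joining the class prefix (the part of the key before the last underscore) with the code; that rule matches the example IDs above but is not confirmed as the actual implementation. The function name build_set_ids is hypothetical.

```python
def build_set_ids(codes_text: str) -> dict:
    """Map each set key (e.g. 'author_1') to its set ID (e.g. 'author_js')."""
    ids = {}
    seen = set()
    for line in codes_text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, code = line.split("\t", 1)
        set_class = key.rsplit("_", 1)[0]  # 'author_1' -> 'author'
        set_id = f"{set_class}_{code}"
        # Codes must be unique within a class of sets.
        if set_id in seen:
            raise ValueError(f"duplicate set ID: {set_id}")
        seen.add(set_id)
        ids[key] = set_id
    return ids


codes = (
    "author_1\tjs\n"
    "author_2\thj\n"
    "group_1\tsenses\n"
    "eprint_1\t1\n"
    "eprint_2\t2\n"
    "eprint_3\t3\n"
)
set_ids = build_set_ids(codes)
```

Because the class prefix is part of the ID, the same code can be reused across classes (author_hj and group_hj do not collide), while a repeated code within one class raises an error.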
Citations
IRStats uses two citations for each set member, one short and one long. Which one is used depends on how you would like your visualisation to look, so both need to be added to the citation files:
epstats_set_member_short_citations.txt
author_1 Smith
epstats_set_member_full_citations.txt
author_1 Dr John Smith, PhD
Note that the above examples are only for author_1. It would be exactly the same for any set member.
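As an illustration of choosing between the two citations, here is a small Python sketch. It assumes the citation files use the same <id><tab><value> layout as the membership file, which is not stated explicitly, and the names parse_citations and label_for are hypothetical. Splitting on the first tab only keeps commas and spaces inside a citation intact.

```python
def parse_citations(text: str) -> dict:
    """Map each set key to its citation string (tab-separated, assumed)."""
    citations = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, citation = line.split("\t", 1)
        citations[key.strip()] = citation.strip()
    return citations


short_citations = parse_citations("author_1\tSmith\n")
full_citations = parse_citations("author_1\tDr John Smith, PhD\n")


def label_for(key: str, use_full: bool = False) -> str:
    """Pick the short or full citation, falling back to the key itself."""
    source = full_citations if use_full else short_citations
    return source.get(key, key)
```

A compact chart legend might call label_for("author_1"), while a table heading could use label_for("author_1", use_full=True).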
URLs
Although URLs are not currently implemented, it is probably a good idea to include this information (in epstats_set_member_urls.txt) for future functionality.
author_1 http://homepage.john.smith.com/