IRStats2 API

From EPrints Documentation

A few examples on how to get data from IRStats2.

There are two ways to get data: from a script (this is the real API, i.e. using Perl) or from an Ajax request (to embed data on a page, i.e. using JavaScript/HTML).

Prerequisites

You must have IRStats2 installed on your repository and, preferably, some data already processed by IRStats2. See http://github.com/eprints/irstats2 for how to get started with IRStats2.

Core concepts

Datatype

Which data to provide: IRStats2 can process any data on your repository. Its typical use is usage statistics, so that is the main dataset, but data on deposits, open access, full texts (etc.) are also processed. Some repositories even include data from Scopus (citation counts).

Main datatypes:

  • downloads
  • deposits
  • doc_access
  • doc_format
  • history
  • referrer
  • search_terms
  • views (similar to downloads: how many times the summary page of an eprint was hit)
  • browser

Sets

By default, IRStats2 returns data over the entire repository, i.e. the entire set of eprints is assumed.

You can however restrict which "set" to use: the publications of an author, of a university division, of a subject, etc.

Dates and Ranges

You can also restrict by dates or by a range. By default, all stats are returned without any date restrictions.

Dates can be given as YYYYMMDD, YYYY-MM-DD or YYYY/MM/DD (e.g. 20140101, 2013-11-04). Dates are passed as a hash with two keys, from and to; either can be omitted to mean "from that date onwards" or "up to that date".
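The three accepted formats differ only in their separator. As a minimal sketch, this is how such strings can be reduced to one compact form before going into a dates hash; the helper `normalise_date` is hypothetical and not part of IRStats2, which accepts all three forms directly.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of IRStats2): accept any of the three
# documented formats (YYYYMMDD, YYYY-MM-DD, YYYY/MM/DD) and return
# the compact YYYYMMDD form, or undef if the string does not match.
sub normalise_date
{
    my ( $date ) = @_;

    # Either no separator at all, or the same separator ('-' or '/') twice
    return undef unless $date =~ /^(\d{4})([-\/]?)(\d{2})\2(\d{2})\z/;
    return "$1$3$4";
}

# A dates hash: either key may be omitted for an open-ended restriction
my %dates = ( from => '2013-11-04', to => '2014/01/01' );

foreach my $key ( sort keys %dates )
{
    printf "%s => %s\n", $key, normalise_date( $dates{$key} );
}
```

This prints `from => 20131104` and `to => 20140101`. Mixed separators such as `2013-11/04` are rejected.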

Ranges follow a %d%c format (a number followed by a unit letter) and are counted back from "now" (today), for instance:

  • 6m: over the past 6 months
  • 12d: over the past 12 days
  • 3y: over the past 3 years

Only "m" (months), "d" (days) or "y" (years) may be used. Note that 12m is equivalent to 1y.
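To make the range semantics concrete, the sketch below converts a range into an explicit from/to pair. `range_to_dates` is a hypothetical helper written for illustration, not IRStats2's own code; it assumes "today" is passed in as YYYYMMDD and counts back whole calendar units, letting POSIX `mktime` normalise the resulting date.

```perl
use strict;
use warnings;
use POSIX qw( mktime strftime );

# Hypothetical helper (not the IRStats2 implementation): turn a range such
# as '6m', '12d' or '3y' into explicit from/to dates (YYYYMMDD), counting
# back from $today, which is also given as YYYYMMDD.
sub range_to_dates
{
    my ( $range, $today ) = @_;

    my ( $n, $unit ) = $range =~ /^(\d+)([dmy])\z/
        or die "Invalid range '$range'\n";
    my ( $y, $m, $d ) = $today =~ /^(\d{4})(\d{2})(\d{2})\z/
        or die "Invalid date '$today'\n";

    # Step back by whole units; out-of-range fields (e.g. a negative
    # month or day) are left for mktime() to normalise.
    $d -= $n if $unit eq 'd';
    $m -= $n if $unit eq 'm';
    $y -= $n if $unit eq 'y';

    # Use noon so that daylight-saving shifts cannot change the date
    my $epoch = mktime( 0, 0, 12, $d, $m - 1, $y - 1900 );

    return {
        from => strftime( '%Y%m%d', localtime $epoch ),
        to   => $today,
    };
}

my $dates = range_to_dates( '6m', '20140115' );
printf "from %s to %s\n", $dates->{from}, $dates->{to};
```

This prints `from 20130715 to 20140115`. Calendar arithmetic has edge cases: with this approach, going back 1m from 31 March lands on 3 March (2 March in a leap year), because `mktime` rolls "31 February" forward. IRStats2 may treat such cases differently.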

Groupings

This tells IRStats2 how to group data and is generally only used for things like "give me the TOP eprints", "give me the TOP authors".

So setting "grouping" to "eprint" returns the top eprints; setting it to "authors" returns the top authors, and so on. Apart from "eprint", the grouping must be a valid set name.

Further restrictions

You can limit the number of records returned, when this is relevant. If you ask for the total downloads since the beginning of time, you only get one data row back (that count), so a limit makes no difference. But for a query asking for, say, the top authors, it is useful to get only the first 10 authors; 10 is then the limit.

You can also ask IRStats2 to return specific data fields in queries. For top eprints you generally want the "eprintid" field; to draw timeline graphs (e.g. the evolution of downloads over time) you want the "datestamp" field. More examples are given below.
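The sketch below illustrates what a grouping combined with a limit returns, using hypothetical in-memory rows. It is a pure-Perl illustration only: `top_n` and the row layout are assumptions, and IRStats2 itself presumably does this work in the SQL queries it builds rather than in Perl.

```perl
use strict;
use warnings;

# Hypothetical download rows, loosely shaped like the hashes returned
# by IRStats2 (one row per eprint per day, with a "count" column).
my @rows = (
    { eprintid => 12, datestamp => '20140101', count => 4 },
    { eprintid => 7,  datestamp => '20140101', count => 9 },
    { eprintid => 12, datestamp => '20140102', count => 6 },
    { eprintid => 3,  datestamp => '20140102', count => 2 },
    { eprintid => 7,  datestamp => '20140103', count => 3 },
);

# Group the rows by $field, sum their counts, sort by descending
# total and keep at most $limit groups (ties broken by key).
sub top_n
{
    my ( $rows, $field, $limit ) = @_;

    my %total;
    $total{ $_->{$field} } += $_->{count} for @$rows;

    my @keys = sort { $total{$b} <=> $total{$a} || $a cmp $b } keys %total;
    splice( @keys, $limit ) if @keys > $limit;

    return [ map { +{ $field => $_, count => $total{$_} } } @keys ];
}

# The top 2 eprints, mirroring grouping => "eprint" with limit => 2
foreach ( @{ top_n( \@rows, 'eprintid', 2 ) } )
{
    printf "EPrint %d got %d downloads\n", $_->{eprintid}, $_->{count};
}
```

This prints `EPrint 7 got 12 downloads` then `EPrint 12 got 10 downloads`. Grouping the same rows by `datestamp` instead gives one total per day, which is the shape a timeline graph needs.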

Data from scripts

Main API

# get the IRStats2 handler, required to query IRStats2
# ($repo is assumed to be an EPrints::Repository object)
my $handler = $repo->plugin( "Stats::Handler" );

# ask IRStats2 to show debug statements (SQL queries)
$handler->debug(1);

# Create a Context object for the "downloads" datatype
my $ctx = $handler->context( { datatype => "downloads" } );

# Retrieve data rows
my $data = $handler->data( $ctx )->select();

# How many rows returned:
printf "I got %d data rows back\n", $data->count;

# Get stats for divisions "uos-ecs":
$ctx->set( { set_name => 'divisions', set_value => 'uos-ecs' } );

# Get stats over the last 6 months:
$ctx->dates( { range => '6m' } );

# Get stats between 1st January 2012 and 31st March 2012:
$ctx->dates( { from => '20120101', to => '20120331' } );

Full Examples

These are not complete scripts: they assume you have written the beginning of a Perl script and have already instantiated the Stats Handler (see above) as $handler.

# How many downloads in total over the entire repository

my $ctx = $handler->context( { datatype => "downloads" } );
printf "I got %d downloads\n", $handler->data( $ctx )->select->sum_all;

# How many downloads in 2013 over the entire repository

my $ctx = $handler->context( { datatype => "downloads", range => "2013" } );
printf "I got %d downloads\n", $handler->data( $ctx )->select->sum_all;

# The top 5 EPrints over the entire repository

my $ctx = $handler->context( { grouping => "eprint", datatype => "downloads" } );

my $stats = $handler->data( $ctx )->select( fields => ["eprintid"], limit => 5 );

foreach( @{ $stats->data } )
{
        printf "EPrint %d got %d downloads\n", $_->{eprintid}, $_->{count};
}

# The top 10 Subjects (let's assume LoC) for deposits (not downloads!!)

my $ctx = $handler->context( { set_name => "subjects", datatype => "deposits" } );

my $stats = $handler->data( $ctx )->select( fields => ["set_value"], limit => 10 );

my $i = 1;
foreach( @{ $stats->data } )
{
        printf "%d) %s with %d items deposited\n", $i++, $_->{set_value}, $_->{count};
}

# The top 5 downloaded EPrints for LoC Subject "D1"

my $ctx = $handler->context( { set_name => "subjects", set_value => 'D1', datatype => "downloads" } );

my $stats = $handler->data( $ctx )->select( fields => ["eprintid"], limit => 5 );

my $i = 1;
foreach( @{ $stats->data } )
{
        printf "%d) EPrint %d with %d downloads\n", $i++, $_->{eprintid}, $_->{count};
}

Embedding data

This is similar to retrieving data from scripts (cf. section above) but with a few extra options:

  • "view": the name of the Stats::View plug-in which will draw the requested output (a table? a graph? etc.)
  • "container_id": the "id" of the DOM element into which the drawn output will be inserted (if the Ajax callback is successful)

Each View plug-in also accepts its own options. See the examples below.

Graphs

The typical example is to embed the global downloads graph. This is usually the first displayed item on the IRStats2 main report page (/cgi/stats/report).