IRStats 2

From EPrints Documentation
Revision as of 14:25, 5 December 2014 by Af05v@ecs.soton.ac.uk (talk | contribs) (Dependencies)


IRStats2 is a statistical framework for EPrints. It comes with a set of default tools and reports, and it can be customised, for instance to add new metrics or data sets. It also has a JavaScript API for including stats on any page you want.

IRStats2 is developed against EPrints 3.3 but was written to also work on EPrints 3.2. Older versions of EPrints are not supported.

Installation

Dependencies

The following Perl libraries are required:

* Geo::IP or Geo::IP::PurePerl
* Date::Calc

Both can usually be installed via your Linux package manager (apt-get, yum, ...), or via CPAN if you cannot find a package.

For example, on Debian/Ubuntu:

apt-get install libgeo-ip-perl libdate-calc-perl
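
To confirm that the dependencies are visible to Perl after installation, a quick check along the following lines may help. This is only a sketch: the helper name have_perl_module is made up for illustration, and it assumes the perl on your PATH is the same one EPrints uses.

```shell
# Succeeds if the named Perl module can be loaded by the perl on PATH.
# (have_perl_module is a hypothetical helper, not part of IRStats2.)
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Either Geo::IP or Geo::IP::PurePerl satisfies the GeoIP requirement.
if have_perl_module Geo::IP || have_perl_module Geo::IP::PurePerl; then
    geoip_status="found"
else
    geoip_status="missing"
fi

if have_perl_module Date::Calc; then
    datecalc_status="found"
else
    datecalc_status="missing"
fi

echo "Geo::IP or Geo::IP::PurePerl: $geoip_status"
echo "Date::Calc: $datecalc_status"
```

If either line reports "missing", install the corresponding package or CPAN module before continuing.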

EPrints 3.3

IRStats2 can be installed directly via the Bazaar on EPrints 3.3, which makes the installation much simpler than on EPrints 3.2.

EPrints 3.3.11 onwards

Install IRStats2 from the Bazaar after installing the dependencies. Restarting Apache afterwards is recommended.

EPrints 3.3.1 to 3.3.10

After installing from the Bazaar, two patches need to be applied if you want to use the Google "map of the world" in your reports. Everything else should work normally without the patches.

The patches relate to an incompatibility between the Prototype JS library (used by EPrints) and Google Charts (used by IRStats2). The two patches you need to apply are:

EPrints 3.2

You will have to manually copy the required files to your EPrints installation path. It is a low-risk operation since IRStats2 is a true add-on to EPrints and it does not interact with the core software.

Get the files from GitHub, or download the tar.gz archive. Copy the modules and the various configuration files to your local archive (create the bin and cgi directories if they don't exist):

cp -r bin/* /opt/eprints3/archives/ARCHIVEID/bin/
cp -r cfg/* /opt/eprints3/archives/ARCHIVEID/cfg/
cp -r cgi/* /opt/eprints3/archives/ARCHIVEID/cgi/
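
The copy step, including creating any missing target directories, could be wrapped up roughly as follows. This is a sketch under assumptions: the function name install_irstats2_files is hypothetical, and it assumes the unpacked IRStats2 files contain subdirectories (such as bin/stats and cfg/cfg.d, as the paths elsewhere on this page suggest), which is why it copies recursively.

```shell
# Copy the IRStats2 bin, cfg and cgi trees into an archive directory.
# Arguments: unpacked IRStats2 source directory, archive directory.
# (install_irstats2_files is a hypothetical helper name.)
install_irstats2_files() {
    src="$1"
    archive="$2"
    for dir in bin cfg cgi; do
        # Skip any tree that is not present in the source.
        [ -d "$src/$dir" ] || continue
        # Create the target directory if it does not exist yet.
        mkdir -p "$archive/$dir" || return 1
        # "dir/." copies the contents of dir, including subdirectories.
        cp -R "$src/$dir/." "$archive/$dir/" || return 1
    done
}

# Example call (paths are placeholders):
# install_irstats2_files /path/to/unpacked/irstats2 /opt/eprints3/archives/ARCHIVEID
```

Run it as a user with write access to the archive directory, typically the "eprints" user.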

It's a good idea to run a test at this point to see if anything has broken:

/opt/eprints3/bin/epadmin test

In the <head> section of your template files (usually located in /opt/eprints3/archives/ARCHIVEID/cfg/lang/en/templates/), add the following:

<script type="text/javascript" src="http://www.google.com/jsapi">// <!-- No script --></script>
<script type="text/javascript">
        google.load("visualization", "1", {packages:["corechart", "geochart"]});
</script>

Finally, restart the web server.

Processing Stats

IRStats2 uses its own tables to manage statistics, which it populates from the EPrints access table (a table containing a row for every access to EPrints objects). Once installed, IRStats2 needs to process the full contents of this table. To do this, run the following command:

/opt/eprints3/archives/REPO_ID/bin/stats/process_stats REPO_ID --setup --verbose

Note that this may take a long time, perhaps up to several days for a large repository.

Following this initial run, add the script to the crontab of the eprints user so that it runs every night:

 perl /EPRINTS_ROOT/archives/REPOSITORY_ID/bin/stats/process_stats REPOSITORY_ID 1>/dev/null 2>/dev/null


Processing works in two steps: an initial run followed by daily incremental runs. Because the initial run handles all of your legacy "download" data, it can take a long time: most likely a few hours, but possibly a few days if your repository is very large.

Run the initial processing as the "eprints" user with the command below. If you are running it from an SSH session, you may want to use the "screen" Linux utility so that the job keeps running if your SSH session drops.

/opt/eprints3/archives/REPO_ID/bin/stats/process_stats REPO_ID --setup --verbose

For the daily incremental processing, add the line below to cron. When the job runs is up to you, but overnight is a good choice because there is less traffic to your repository.

perl /opt/eprints3/archives/REPO_ID/bin/stats/process_stats REPO_ID 1>/dev/null 2>/dev/null

The two redirections to /dev/null suppress all output from the process.
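
The cron line itself needs a schedule in front of the command. As an illustration, the sketch below prints a complete crontab entry for a chosen time. The helper name irstats2_cron_line and the 02:30 run time are assumptions made for this example; pick whatever quiet period suits your repository.

```shell
# Print a crontab entry that runs process_stats once a day.
# Arguments: EPrints root, repository ID, hour (0-23), minute (0-59).
# (irstats2_cron_line is a hypothetical helper name.)
irstats2_cron_line() {
    root="$1"
    repo="$2"
    hour="$3"
    minute="$4"
    # Crontab field order: minute hour day-of-month month day-of-week command
    printf '%s %s * * * perl %s/archives/%s/bin/stats/process_stats %s 1>/dev/null 2>/dev/null\n' \
        "$minute" "$hour" "$root" "$repo" "$repo"
}

# Example: run every night at 02:30 (REPO_ID is a placeholder).
irstats2_cron_line /opt/eprints3 REPO_ID 2 30
```

The printed line can then be pasted into the editor opened by crontab -e while logged in as the "eprints" user.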

When the initial processing has completed, you may point your browser to http://yourrepo.url/cgi/stats/report to look at some stats!


Troubleshooting

Can't call method "add_trigger"

Error when running process_stats:

Use of uninitialized value in concatenation (.) or string at (eval 70) line 11.
Use of uninitialized value in concatenation (.) or string at (eval 70) line 16.
Use of uninitialized value in concatenation (.) or string at (eval 70) line 19.

------------------------------------------------------------------
---------------- EPrints System Error ----------------------------
------------------------------------------------------------------
Error in configuration:
Can't call method "add_trigger" on unblessed reference at /usr/share/eprints3/archives/sandbox/bin/../cfg/cfg.d/z_irstats2.pl line 162.


------------------------------------------------------------------
EPrints System Error inducing stack dump
 at /usr/share/eprints3/archives/sandbox/bin/stats/../../../../perl_lib/EPrints.pm line 146.
	EPrints::abort("EPrints") called at /usr/share/eprints3/archives/sandbox/bin/stats/../../../../perl_lib/EPrints/Config.pm line 151
	EPrints::Config::load_system_config() called at /usr/share/eprints3/archives/sandbox/bin/stats/../../../../perl_lib/EPrints/Config.pm line 96
	EPrints::Config::init() called at /usr/share/eprints3/archives/sandbox/bin/stats/../../../../perl_lib/EPrints.pm line 706
	require EPrints.pm called at /usr/share/eprints3/archives/sandbox/bin/stats/process_stats line 12
	main::BEGIN() called at /usr/share/eprints3/archives/sandbox/bin/stats/../../../../perl_lib/EPrints.pm line 0
	eval {...} called at /usr/share/eprints3/archives/sandbox/bin/stats/../../../../perl_lib/EPrints.pm line 0

This is due to the FindBin library not working correctly. Open the process_stats script and remove the line:

use lib "$FindBin::Bin/../../../../perl_lib";

Then, when running the script, use Perl's -I option to set the EPrints perl_lib directory explicitly:

perl -I/EPRINTS_ROOT/perl_lib /EPRINTS_ROOT/archives/ARCHIVEID/bin/stats/process_stats
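
If you would rather not edit the script by hand, the removal of the FindBin line could be scripted roughly as below. This is a sketch, not part of IRStats2: the function name strip_findbin_lib is hypothetical, and it assumes the line appears in your copy of process_stats exactly as quoted above. The original file is kept with a .bak suffix.

```shell
# Remove the FindBin-based "use lib" line from process_stats in place,
# keeping the original file as <script>.bak.
# (strip_findbin_lib is a hypothetical helper name.)
strip_findbin_lib() {
    script="$1"
    cp "$script" "$script.bak" || return 1
    # -F matches the line as a fixed string; -v keeps every other line.
    # grep exits 1 when it prints nothing, which is not an error here.
    grep -v -F 'use lib "$FindBin::Bin/../../../../perl_lib";' "$script.bak" > "$script" \
        || [ $? -eq 1 ]
}

# Example call (path is a placeholder):
# strip_findbin_lib /EPRINTS_ROOT/archives/ARCHIVEID/bin/stats/process_stats
```

After running it, invoke the script with the -I option as described above.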