Difference between revisions of "IRStats 2"

Revision as of 14:01, 5 December 2014

IRStats2 is a statistics framework for EPrints. It ships with a set of default tools and reports, and it can be customised, for instance to add new metrics or data sets. It also has a JavaScript API for embedding stats on any page you want.

IRStats2 is developed against EPrints 3.3 but was written to work on EPrints 3.2 as well. Older versions of EPrints are not supported.

Installation

Dependencies

The following Perl libraries are required:

* Geo::IP or Geo::IP::PurePerl
* Date::Calc

Both can usually be installed via your Linux package manager (apt-get, yum, ...) or, failing that, via CPAN.
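
To confirm that the dependencies are visible to Perl before installing, a minimal check could look like the following. It is only a sketch: the function names has_perl_module and check_irstats2_deps are made up for this example, and it assumes the perl on your PATH is the same one EPrints uses.

```shell
#!/bin/sh
# Sketch: check that the Perl modules IRStats2 needs can be loaded.
# has_perl_module and check_irstats2_deps are hypothetical helper names.

# Succeeds if "perl -MModule -e 1" can load the module.
has_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

check_irstats2_deps() {
    status=0
    # Either Geo::IP or its pure-Perl variant is enough.
    if has_perl_module Geo::IP || has_perl_module Geo::IP::PurePerl; then
        echo "Geo::IP or Geo::IP::PurePerl: found"
    else
        echo "Geo::IP or Geo::IP::PurePerl: MISSING"
        status=1
    fi
    if has_perl_module Date::Calc; then
        echo "Date::Calc: found"
    else
        echo "Date::Calc: MISSING"
        status=1
    fi
    return "$status"
}
```

Calling check_irstats2_deps as the user that runs EPrints prints one line per dependency; a non-zero exit status means at least one module still needs installing.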

EPrints 3.3

On EPrints 3.3, IRStats2 can be installed directly via the Bazaar, which makes the installation much simpler than on EPrints 3.2.
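
The version ranges used in the subsections of this page can be summarised as a small decision helper. This is only an illustrative sketch: the function name irstats2_install_route and the route labels it prints are invented here, and versions this page does not mention are reported as "unknown" rather than guessed. It assumes a purely numeric version string such as 3.3.12.

```shell
#!/bin/sh
# Sketch: map an EPrints version (e.g. "3.3.12") to the install route
# described on this page. Function name and labels are hypothetical.

irstats2_install_route() {
    # Split "major.minor.patch" on dots into $1 $2 $3.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    major=${1:-0}
    minor=${2:-0}
    patch=${3:-0}

    if [ "$major" -eq 3 ] && [ "$minor" -eq 3 ] && [ "$patch" -ge 11 ]; then
        echo "bazaar"                 # 3.3.11 onwards
    elif [ "$major" -eq 3 ] && [ "$minor" -eq 3 ] && [ "$patch" -ge 1 ]; then
        echo "bazaar-with-patches"    # 3.3.1 to 3.3.10 (patches only for the map)
    elif [ "$major" -eq 3 ] && [ "$minor" -eq 2 ]; then
        echo "manual-copy"            # 3.2
    elif [ "$major" -lt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -lt 2 ]; }; then
        echo "unsupported"            # older than 3.2
    else
        echo "unknown"                # not covered by this page
    fi
}

irstats2_install_route 3.3.12   # prints "bazaar"
```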

EPrints 3.3.11 onwards

Install the dependencies first, then install IRStats2 from the Bazaar. Restarting Apache afterwards is recommended.

EPrints 3.3.1 to 3.3.10

After installing from the Bazaar, two patches need to be applied if you want to use the Google "map of the world" in your reports. Everything else should work without the patches.

The patches relate to an incompatibility between the Prototype JS library (used by EPrints) and Google Charts (used by IRStats2). The two patches you need to apply are:

EPrints 3.2

You will have to manually copy the required files to your EPrints installation path. It is a low-risk operation since IRStats2 is a true add-on to EPrints and it does not interact with the core software.

Get the files from GitHub (a tar.gz download is also available). Copy the modules and configuration files to your local archive (create the bin and cgi directories if they don't exist):

# -r copies subdirectories such as bin/stats/ as well
cp -r bin/* /opt/eprints3/archives/ARCHIVEID/bin/
cp -r cfg/* /opt/eprints3/archives/ARCHIVEID/cfg/
cp -r cgi/* /opt/eprints3/archives/ARCHIVEID/cgi/
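
The directory creation mentioned above can be sketched as a small helper to run before the copy commands. The function name prepare_archive_dirs is hypothetical, and the archive path is passed in as an argument; substitute your own /opt/eprints3/archives/ARCHIVEID.

```shell
#!/bin/sh
# Sketch: make sure the target directories exist before copying files.
# prepare_archive_dirs is a hypothetical helper name.

prepare_archive_dirs() {
    archive_dir="$1"
    if [ ! -d "$archive_dir" ]; then
        echo "archive directory not found: $archive_dir" >&2
        return 1
    fi
    # mkdir -p is a no-op for directories that already exist.
    mkdir -p "$archive_dir/bin" "$archive_dir/cgi"
}
```

For example, prepare_archive_dirs /opt/eprints3/archives/ARCHIVEID creates bin and cgi if they are missing and leaves existing directories untouched. It refuses to run when the archive directory itself does not exist, which guards against a mistyped archive ID.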

It's a good idea to run a test at this point to see if anything has broken:

/opt/eprints3/bin/epadmin test

Add the following to the <head> section of your template files (usually located in /opt/eprints3/archives/ARCHIVEID/cfg/lang/en/templates/):

<script type="text/javascript" src="http://www.google.com/jsapi">// <!-- No script --></script>
<script type="text/javascript">
        google.load("visualization", "1", {packages:["corechart", "geochart"]});
</script>

Finally, restart the web server.

Processing Stats

IRStats2 uses its own tables to manage statistics, which it populates from the EPrints access table (a table containing a row for every access to EPrints objects). Once installed, IRStats2 needs to process the full contents of this table by running the following command:

/opt/eprints3/archives/REPO_ID/bin/stats/process_stats REPO_ID --setup --verbose

Note that this may take a long time, perhaps up to several days for a large repository.
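
Because the initial run can take this long, it helps to start it in a way that survives a dropped SSH connection. A minimal sketch using the "screen" utility follows; the session name irstats2-setup and the helper function names are assumptions made for this example, and REPO_ID remains a placeholder.

```shell
#!/bin/sh
# Sketch: start the initial IRStats2 processing in a detached screen session.
# build_setup_command and start_in_screen are hypothetical helper names;
# "irstats2-setup" is an arbitrary session name.

# Print the setup command for a given repository ID.
build_setup_command() {
    repo_id="$1"
    echo "/opt/eprints3/archives/$repo_id/bin/stats/process_stats $repo_id --setup --verbose"
}

# Run the setup command inside a named, already-detached screen session.
start_in_screen() {
    screen -dmS irstats2-setup sh -c "$(build_setup_command "$1")"
}
```

Run start_in_screen REPO_ID as the "eprints" user, then use screen -r irstats2-setup to reattach and watch the output. The session closes by itself once the command exits.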

Following this initial run, the script needs to be added to the eprints crontab to run every night:

 perl -I/EPRINTS_ROOT/perl_lib /opt/eprints3/archives/REPOSITORY_ID/bin/stats/process_stats REPOSITORY_ID 1>/dev/null 2>/dev/null

Processing works in two steps: an initial run followed by daily incremental runs. Because the initial run processes all your legacy "download" data, it can take a long time: usually a few hours, but up to a few days if your repository is very large.

For the initial processing, run the following command as the "eprints" user. Because it may take a long time to complete, if you are running it from an SSH session you may want to use the "screen" Linux utility so that the process keeps running even if your SSH connection drops.

/opt/eprints3/archives/REPO_ID/bin/stats/process_stats REPO_ID --setup --verbose

For the daily incremental processing, add the following line to cron. When the job runs is up to you, but overnight is a good choice since there is less traffic to your repository.

perl /opt/eprints3/archives/REPO_ID/bin/stats/process_stats REPO_ID 1>/dev/null 2>/dev/null

The two redirections to /dev/null suppress all output from the process.
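
A complete crontab entry also needs a schedule in front of the command. The sketch below prints such an entry; the 02:30 run time is an arbitrary overnight choice, and the function name irstats2_cron_line is hypothetical.

```shell
#!/bin/sh
# Sketch: print a crontab entry for the daily incremental run.
# irstats2_cron_line is a hypothetical helper; the default time is an assumption.

irstats2_cron_line() {
    repo_id="$1"
    hour="${2:-2}"      # default 02:30, an arbitrary overnight time
    minute="${3:-30}"
    # Crontab fields: minute hour day-of-month month day-of-week command
    printf '%s %s * * * perl /opt/eprints3/archives/%s/bin/stats/process_stats %s 1>/dev/null 2>/dev/null\n' \
        "$minute" "$hour" "$repo_id" "$repo_id"
}

irstats2_cron_line REPO_ID
```

The printed line can be pasted into the crontab of the "eprints" user (crontab -e). Pass a different hour and minute as the second and third arguments if 02:30 clashes with other nightly jobs.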

When the initial processing has completed, you may point your browser to http://yourrepo.url/cgi/stats/report to look at some stats!
