Files/Simple usage statistics (using analog)

From EPrints Documentation
Revision as of 12:40, 6 July 2007 by Ckeene (Talk | contribs)

Summary

This package provides basic usage statistics for individual eprints and for the whole system. Other tools and systems are available that provide more sophisticated solutions. It has been tested with EPrints 2; some minor modifications (especially to the SQL) will be required for EPrints 3.

You can download it from http://files.eprints.org/259/

Before you start

  • Find out where your logs are, and if they are being 'rotated'
  • Find or create a directory where static HTML files can be placed and served by the web server (we will use /home/eprints/htdocs/stats/)
  • Consider setting up Apache to use different log files for different EPrints archives, and for non-EPrints requests. This can easily be done by editing /opt/eprints2/archives/ARCHIVE/cfg/apachevhost.conf and adding a line similar to this:
CustomLog /usr/local/apache/logs/eprints_log combined

Installation

  1. Install Analog (http://www.analog.cx). Most Linux systems will have a package available for this.
  2. Download the gzip file containing the config files and perl scripts.
  3. Unpack it into a directory, e.g. /home/eprints/analog/
  4. Ensure that the two perl scripts and the shell script are executable by you (e.g. chmod 744 *.pl *.sh)
  5. Edit 'general.cfg'. You will need to:
    1. Replace the fictional EPrints users (foobar and mit.edu) with your own organisation. The configuration entries affected include BASEURL, HOSTURL, HOSTNAME, REFREPEXCLUDE and REFSITEEXCLUDE; further down you may also want to edit SUBDOMAIN.
    2. Specify the location of the log file(s) - near the top of the file
    3. You may need to uncomment and set up the DNS cache near the bottom (see http://www.analog.cx for info)
  6. Edit the other .cfg files to specify the output location for the report and associated chart images
  7. At this point you can run ./runanalog.sh (or use the commands it contains) for some basic reports.
  8. Edit generate_aliases.pl and enter your database, username and password. This will change the 'Request report' to show the title of the eprint rather than the (not very useful) filename.
  9. Edit recordreports.pl - again edit the db, username and password. You may also need to change the file locations in the analog command at the bottom.
  10. Run ./recordreports.pl - this may take a while!
  11. Add cron jobs to update this on a regular basis
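
Steps 7 to 11 could be tied together with a small wrapper that cron calls. The following is only a sketch: the function name update_stats, the wrapper file name update_stats.sh, the log file and the nightly schedule are all assumptions, and it assumes that generate_aliases.pl, runanalog.sh and recordreports.pl can each be run with no arguments from the unpack directory.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of the download): run the three scripts in order.
# $1 = directory the package was unpacked into, e.g. /home/eprints/analog

update_stats() {
    # Run in a subshell so the caller's working directory is left alone.
    (
        cd "$1" || exit 1
        # Stop at the first failing step so later steps do not run on stale input.
        ./generate_aliases.pl || exit 1
        ./runanalog.sh || exit 1
        ./recordreports.pl || exit 1
    )
}

# Usage, if this file were saved as update_stats.sh with a final line
#   update_stats /home/eprints/analog
# an example crontab entry (assumed schedule: 02:30 every night) would be:
#   30 2 * * * /home/eprints/analog/update_stats.sh >> /home/eprints/analog/update.log 2>&1
```

Running the steps in one job rather than as three separate cron entries avoids having to guess how long each step takes before starting the next.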

Reports

You should now have five general reports: detailed, simple, full text articles only, records and a final report for the last 30 days. You should also have a directory containing a report for each eprint in your archive.

All the reports use 'general.cfg' as a base. As a rule, you can override any of the defaults in general.cfg in the individual config files. You can of course edit any config file to customise it to your needs (see the Analog website for docs).

Things you might want to change

  • The names produced by generate_aliases.pl - this perl script makes use of an Analog feature that replaces a file name with some text. The default text is, for files and metadata records respectively:
[filetype] eprint title
[record] eprint title

This is quite basic and may not work for all types of archives. However it is quite easy to modify the script. The core of the perl script is an SQL statement and then a while loop which creates a config line for each full text file. This is then repeated for the metadata records. It should be simple to add code and modify the strings as you require.
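
To make the shape of those generated config lines concrete, here is a minimal shell sketch of building one line per file or record. It is not the shipped perl code: the function names are hypothetical, the example paths are invented, and it assumes the lines use Analog's REQOUTPUTALIAS command (the command the real script emits may differ, so check the script before relying on this).

```shell
# Hypothetical helpers: print one Analog alias line in the default text format.

# $1 = request path as it appears in the log, $2 = file type, $3 = eprint title
file_alias_line() {
    # A double quote inside the title would end the quoted argument early,
    # so swap double quotes for single quotes.
    title=$(printf '%s' "$3" | tr '"' "'")
    printf 'REQOUTPUTALIAS %s "[%s] %s"\n' "$1" "$2" "$title"
}

# $1 = request path of the metadata record, $2 = eprint title
record_alias_line() {
    title=$(printf '%s' "$2" | tr '"' "'")
    printf 'REQOUTPUTALIAS %s "[record] %s"\n' "$1" "$title"
}

# Usage (invented paths):
#   file_alias_line /12/01/paper.pdf pdf "An example title"
#   record_alias_line /12/ "An example title"
```

Changing the strings for your archive then comes down to editing the format passed to printf, or the equivalent string in the perl script.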

  • You may well want to add generate_aliases.pl to runanalog.sh or to your crontab.
  • Three txt files containing additional Analog settings are included (e.g. RobotsInclude.txt). You can use these by uncommenting the lines at the top of general.cfg; however, they do slow the program down a little.

Issues

  • This method for providing statistics will get slower over time. For every eprint, it processes all the log files specified, so a lot of eprints and a lot of logs will make this a very slow process. While this is clearly due to the bad design by the author, you may want to investigate Analog's 'cache' facility, or you may want to limit the date range or the logs used. Another option would be to configure recordreports.pl to only run Analog for a number of eprints each time it is run. For example, for a decent-sized archive with the script run daily: on the 1st of the month process eprints 0-999, on the 2nd process eprints 1000-1999, etc.
  • If it takes a long time to run you may need to check the DNSCACHE settings.
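
The day-of-month batching suggested above is simple arithmetic, sketched here in shell. The function name batch_range and the default batch size of 1000 are assumptions for illustration; recordreports.pl would still need to be modified to restrict its SQL to the resulting id range.

```shell
# Print "START END": the eprint ids to process on a given day of the month.
# $1 = day of month (1-31), $2 = batch size (defaults to 1000)
batch_range() {
    day=$1
    size=${2:-1000}
    start=$(( (day - 1) * size ))
    end=$(( start + size - 1 ))
    echo "$start $end"
}

# Usage for today's range. date +%d zero-pads ("08"), which shell arithmetic
# would try to read as octal, so strip one leading zero first:
#   today=$(date +%d)
#   batch_range "${today#0}"
```

With a batch size of 1000 this scheme covers at most 31,000 eprints in a 31-day month, so a larger archive would need a larger batch size.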

Adding a link from the abstract/metadata pages to the stats

This will add a link from an eprint's abstract page to the stats report.

This should work, though there must be better ways (please update this page with a better example if you have one). The code goes in 'ArchiveRenderConfig.pm'. To start with, I suggest you try around line 797 (after the 'edit record' button):

 my $analogurl = "http://researchonline.lib.sussex.ac.uk/stats/records/" . $eprint->get_value( "eprintid" ) . ".html";
 # build the link and the linked text
 my $analoglink = $session->make_element( "a", href => $analogurl );
 $analoglink->appendChild( $session->make_text( " Usage report " ) );
 # build analogstat which is a p which includes the link
 my $analogstat = $session->make_element( "p", align=>"right" );
 $analogstat->appendChild( $analoglink );
 $page->appendChild( $analogstat );

Note: in this example, the usage reports are on a different (virtual) host to eprints, so the URL is hard coded (first line) rather than using the eprints framework.