Chris Keene. Technical Development Manager, University of Sussex Library, UK.
These scripts provide basic usage statistics for individual eprints and for the whole system. Other tools and systems are available that provide more sophisticated solutions. The method has been tested with Eprints 2; some minor modification (especially to the SQL) will be required for Eprints 3.
Before you start
- Find out where your logs are, and if they are being 'rotated'
- Find or create a directory where static html files can be placed and made accessible to the web server (we will use /home/eprints/htdocs/stats/)
- Consider setting up Apache to use different log files for different eprint archives, and for non-eprints requests
- Install Analog (http://www.analog.cx). Most Linux systems will have a package available for this.
- Download the gzip file containing the config files and perl scripts.
- Unpack it into a directory, e.g. /home/eprints/analog/
- Ensure that the two perl scripts and the shell script are executable by you (e.g. chmod 744 *.pl runanalog.sh)
- Edit 'general.cfg'. You will need to:
  - Replace the fictional eprints users (foobar and mit.edu) with your own organisation's details. These appear in the configuration entries BASEURL, HOSTURL, HOSTNAME, REFREPEXCLUDE and REFSITEEXCLUDE; further down you may also want to edit SUBDOMAIN.
  - Specify the location of the log file(s), near the top of the file.
  - Uncomment and set up the DNS cache near the bottom, if you need it (see http://www.analog.cx for info).
- Edit the other .cfg files to specify the output location for the report and associated chart images
- At this point you can run ./runanalog.sh (or use the commands it contains) for some basic reports.
- Edit generate_aliases.pl and enter your database name, username and password. Running this script changes the 'Request report' so that it shows the title of each eprint rather than the (not very useful) filename.
- Edit recordreports.pl - again edit the db, username and password. You may also need to change the file locations in the analog command at the bottom.
- Run ./recordreports.pl - this may take a while!
You should now have five general reports: detailed, simple, full text articles only, records and a final report for the last 30 days. You should also have a directory containing a report for each eprint in your archive.
All the reports use 'general.cfg' as a base. As a rule, you can override any of the defaults in general.cfg in the individual config files. You can of course edit any config file to customise it to your needs (see the Analog website for docs).
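To make the generate_aliases.pl step more concrete, here is a minimal sketch of the kind of logic it performs: take a request path and a title, build a readable label, trim it if it is too long, and emit an Analog alias line. This is not the distributed script (which is Perl and reads from the EPrints database). The function name make_alias, the 60-character limit, the example paths and the use of the REQOUTPUTALIAS directive are all assumptions made for illustration; check your copy of the script for what it actually writes.

```python
# Illustrative only: the distributed generate_aliases.pl is Perl and reads
# titles from the EPrints database; here the rows are hard-coded.

MAX_LEN = 60  # assumed limit; pick whatever fits your report layout


def make_alias(path, label, title, max_len=MAX_LEN):
    """Build one Analog alias line that maps a request path to readable text."""
    # Swap double quotes for single quotes so the quoted alias text stays valid
    text = "[{}] {}".format(label, title).replace('"', "'")
    if len(text) > max_len:
        # Truncate and mark the cut so long titles do not swamp the report
        text = text[: max_len - 3] + "..."
    # REQOUTPUTALIAS is an assumption; the real script may use another directive
    return 'REQOUTPUTALIAS {} "{}"'.format(path, text)


# Hypothetical rows, as an SQL query might return them: (path, label, title)
rows = [
    ("/archive/00000123/01/paper.pdf", "pdf", "A study of repository usage"),
    ("/archive/00000123/", "record", "A study of repository usage"),
]

alias_lines = [make_alias(path, label, title) for path, label, title in rows]

for line in alias_lines:
    print(line)
```

In a real run the lines would be written to a config file that Analog reads, rather than printed.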
Things you might want to change
- The names produced by generate_aliases.pl - this perl script makes use of a feature of Analog to replace a file name with some text. The default text is, for files and metadata records respectively:
  [filetype] eprint title
  [record] eprint title
  This is quite basic and may not work for all types of archives. However, it is quite easy to modify. The core of the file is an SQL statement followed by a while loop, which creates a string, makes sure it isn't too long and then adds it to the config file. This is then repeated for the metadata records. It should be simple to add code and modify the strings as you require.
- You may well want to add generate_aliases.pl to runanalog.sh or to your crontab.
- Included are three txt files which contain additional Analog settings (e.g. RobotsInclude.txt). You can use these by uncommenting the lines at the top of general.cfg; however, they do slow the program down a little.
- This method for providing statistics will get slower over time. For every eprint, Analog processes all the log files specified, so a lot of eprints and a lot of logs will make this a very slow process. While this is clearly due to the bad design of the author, you may want to investigate Analog's 'cache' facility, or limit the date range or the logs used. Another option would be to configure recordreports.pl to run Analog for only a subset of eprints each time it is run. For example, for a decent-sized archive with a daily run, process eprints 0-999 on the 1st of the month, 1000-1999 on the 2nd, and so on.
- If it takes a long time to run you may need to check the DNSCACHE settings.
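The idea of processing a different block of eprints each day can be sketched as follows. This is only an illustration in Python of the arithmetic involved, not part of the distributed scripts; the function name batch_for_day and the batch size of 1000 are assumptions taken from the example above, and recordreports.pl itself would need an equivalent change to its SQL or loop.

```python
import datetime

BATCH_SIZE = 1000  # eprints per daily run, matching the example in the text


def batch_for_day(day_of_month, batch_size=BATCH_SIZE):
    """Return (first_id, last_id) of the eprints to process on this day."""
    if not 1 <= day_of_month <= 31:
        raise ValueError("day_of_month must be between 1 and 31")
    # Day 1 covers 0-999, day 2 covers 1000-1999, and so on
    first_id = (day_of_month - 1) * batch_size
    last_id = first_id + batch_size - 1
    return first_id, last_id


first_id, last_id = batch_for_day(datetime.date.today().day)
print("Process eprints {} to {} today".format(first_id, last_id))
```

With this scheme, blocks assigned to days 29 to 31 are refreshed less often than the rest, because not every month has those days. An archive with more eprints than the scheme covers would need a larger batch size.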