Kieker

From EPrints Documentation
Revision as of 00:52, 12 September 2018 by Kgoetz (adding categories)


This page describes what Kieker is and how to integrate it with EPrints (tested on v3.3). The Kieker framework is developed and maintained at the University of Kiel in Germany. It is a useful tool for analysing a system, whether in development or in production.


What is Kieker

Concretely, Kieker allows you to profile and monitor the internal module calls within EPrints. Every monitored function call is trapped and a record of it is sent to a queue for further processing.

Kieker comes with many post-processing graphs and diagrams that show, for instance, all the calls made and their execution times. This gives you a global picture of EPrints' internals, which helps you to e.g. optimise certain parts of the system, find coupled modules or detect unwanted loops between modules.

Read more about it here: http://kieker-monitoring.net/


How does it work?

For EPrints, Kieker uses a Perl module called Sub::WrapPackages, which allows the user to define global wrappers around internal calls. It is possible to restrict which modules are wrapped: for instance, if you only want to profile the database layer, you would wrap EPrints::Database and EPrints::Database::MySQL. The EPrints/Kieker extensions (https://github.com/eprints/epkieker) make it easy to set this up.

When Kieker is enabled, it traps the selected internal calls and sends some information about each one to a queueing system. The data sent contains a high-precision timestamp, the name of the module, the name of the function, etc. Traditionally Kieker uses JMS as its queueing system, but we integrated it with memcached, a popular, easy-to-install server-wide caching system.

Once you're happy with the profiling, you can disable Kieker and retrieve the data from the queue. You may then use Kieker's built-in tools to generate all sorts of graphs.

Installation

Kieker

We used Kieker v1.9 during our tests. You may get it from: http://kieker-monitoring.net/download/

Note that Kieker requires Java (but this can easily be installed under Ubuntu).


Required modules

If you're using Ubuntu, you can install the modules as follows:

sudo apt-get install libclass-accessor-perl libnet-stomp-perl libdevel-caller-ignorenamespaces-perl libsub-install-perl libparams-util-perl libdata-optlist-perl libsub-exporter-perl libsub-prototype-perl libsub-wrappackages-perl

If you'd rather install them via CPAN, the corresponding module names are:

  • Class::Accessor
  • Net::Stomp
  • Devel::Caller::IgnoreNamespaces
  • Sub::Install
  • Params::Util
  • Data::OptList
  • Sub::Exporter
  • Sub::Prototype
  • Sub::WrapPackages

You must also install memcached, if not already present on your server:

sudo apt-get install memcached libcache-memcached-fast-perl
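
Before going further, it can be worth confirming that every module actually loads. The following is a minimal sketch, not part of the EPrints/Kieker extensions: missing_perl_modules is a hypothetical helper name, and the example call simply repeats the modules listed above plus Cache::Memcached::Fast (the module provided by libcache-memcached-fast-perl).

```shell
# missing_perl_modules MODULE...
# Prints the name of each module that the perl on your PATH cannot load.
# (Hypothetical helper for illustration; not shipped with epkieker.)
missing_perl_modules() {
    for m in "$@"; do
        # "perl -MModule -e 1" exits non-zero when the module cannot be loaded
        perl -M"$m" -e 1 >/dev/null 2>&1 || printf '%s\n' "$m"
    done
}

# Example call (prints nothing when everything is installed):
# missing_perl_modules Class::Accessor Net::Stomp \
#     Devel::Caller::IgnoreNamespaces Sub::Install Params::Util \
#     Data::OptList Sub::Exporter Sub::Prototype Sub::WrapPackages \
#     Cache::Memcached::Fast
```

If your server has more than one Perl installed, run the check with the same perl that Apache uses, otherwise the result may not reflect what EPrints sees.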


EPrints/Kieker extensions

To finalise the installation, copy a few modules from https://github.com/eprints/epkieker into your EPrints installation.

If EPrints is installed under its default path /opt/eprints3 (otherwise adjust the paths):

cp -rf perl_lib/Kieker* /opt/eprints3/perl_lib
cp perl_lib/EPrints/Apache/KiekerHandler.pm /opt/eprints3/perl_lib/EPrints/Apache
cp -rf bin/kieker /opt/eprints3/archives/<id>/bin/

And you're done.

Configuration

There are only a few options you need to edit to configure Kieker, which are in /opt/eprints3/perl_lib/EPrints/Apache/KiekerHandler.pm.

Any time you make a configuration change, you must restart Apache.

Monitored URI

This option restricts profiling to requests for a given URI. For instance, if you have a problem with searching, you may want to set this to "/cgi/search". To do the same with the browse views, use "/view/".

Some examples:

my $MONIT_URI = "/";                 # monitors "/" i.e. /index.html
my $MONIT_URI = "/cgi/search";       # monitors the search
my $MONIT_URI = "/view/year/";       # monitors the browse view "by year"
my $MONIT_URI = "/cgi/users/home";   # monitors the user (logged-in) area


Monitored IP

If you use Kieker on a production server, it can be dangerous to enable monitoring for every client of your repository. Doing so slows down the system for everyone and generates lots of useless data.

This option restricts monitoring to requests from one IP address. At the moment it can only be a single IP address: you cannot specify network blocks. If you want to monitor EPrints yourself, this should probably be set to your own IP (v4) address.

Example:

my $MONIT_IP = "124.3.45.7";   # variable name assumed from the option name; check KiekerHandler.pm


Monitored Packages

As mentioned earlier, with Perl, Kieker works by wrapping modules. This last option specifies which modules are wrapped, and hence which modules are monitored. Anything outside the selected scope is not monitored and does not generate any data.

Some common examples:

my $MONIT_PACKAGES = "EPrints EPrints::*";                            # any EPrints call
my $MONIT_PACKAGES = "EPrints::Database EPrints::Database::*";        # anything relating to EPrints' Database layer
my $MONIT_PACKAGES = "EPrints::Plugin::Screen::Items";                # the "Manage deposits" page
my $MONIT_PACKAGES = "EPrints::MetaField::* EPrints::XML";            # EPrints' metafield layer, in relation to the XML module


Enabling/Disabling Kieker

Typically EPrints tells Apache that it handles *any* web request for a given virtual host (the URL of your repository).

The line responsible for this delegation is inside /opt/eprints3/cfg/apache/<id>.conf:

PerlTransHandler +EPrints::Apache::Rewrite


We just need to set this to the Kieker handler to enable Kieker:

PerlTransHandler +EPrints::Apache::KiekerHandler


Restart Apache and you are done. To disable Kieker, restore the original PerlTransHandler line and restart Apache again.
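
Since you will switch handlers back and forth, the edit can be scripted. This is only a sketch under a few assumptions: set_trans_handler is a hypothetical helper name, the configuration file contains a single PerlTransHandler line as shown above, and you have write access to the file. It keeps a backup copy next to the original and does not restart Apache for you.

```shell
# set_trans_handler CONF HANDLER
# Rewrites the PerlTransHandler line of CONF so that it loads HANDLER,
# keeping the previous version of the file in CONF.bak.
# (Hypothetical helper for illustration; not shipped with epkieker.)
set_trans_handler() {
    conf=$1
    handler=$2
    cp -p "$conf" "$conf.bak" || return 1
    # Keep the indentation and the directive name, replace the rest of the line
    sed "s|^\([[:space:]]*PerlTransHandler[[:space:]]\{1,\}\).*\$|\1+$handler|" \
        "$conf.bak" > "$conf"
}

# To disable Kieker again:
# set_trans_handler /opt/eprints3/cfg/apache/<id>.conf EPrints::Apache::Rewrite
# To enable it, pass the Kieker handler package name given above instead.
```

Remember that the change only takes effect after Apache has been restarted.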


Monitoring cycles

Say that you notice a slow-down when visiting a certain page on EPrints and you would like to investigate what the cause(s) could be and what could be optimised. Kieker is good at showing undesired coupling between modules and is also very good at showing "slowest paths": when executing a web request from A to Z, you can easily spot which method call(s) is/are the slowest. This could be a long DB query, lots of file reads, etc.

The standard cycle when using Kieker is to:

  • Enable Kieker on the URI you think is problematic, for your IP address only
  • Restart Apache
  • Go to the URI with your browser: this may be slower than usual, because Kieker is generating lots of data
  • Disable Kieker
  • Restart Apache
  • Dequeue the Kieker monitoring data from memcached
  • Use Kieker's bundled tools to generate relevant graphs for your analysis.

Once you start using Kieker, you will see that you can get results and graphs very, very quickly.

Processing Data

As said earlier, once enabled, Kieker sends its data to memcached. To process the data, you first need to dequeue it from memcached using the supplied script:

mkdir -p ~/kieker/logs/
mkdir -p ~/kieker/logs/out/
cd ~/kieker/logs/
perl /opt/eprints3/archives/<id>/bin/kieker/dequeue_memcached.pl

And that's it! The Perl script will create all the necessary files in the current directory. You can then feed the directory to Kieker's built-in tools.
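
If the analysis tools later complain about their input, a quick sanity check of the directory can save time. The sketch below assumes that the dequeue script writes Kieker's usual filesystem log layout, i.e. a kieker.map file plus one or more .dat files; this page does not confirm that layout, so check what the script actually produced. has_kieker_log is a hypothetical helper name.

```shell
# has_kieker_log DIR
# Succeeds if DIR contains a kieker.map file and at least one .dat file.
# (Assumed log layout; hypothetical helper, not shipped with epkieker.)
has_kieker_log() {
    dir=$1
    [ -f "$dir/kieker.map" ] || return 1
    for f in "$dir"/*.dat; do
        # If no .dat file exists the glob stays unexpanded and this test fails
        [ -f "$f" ] && return 0
    done
    return 1
}

# Example:
# has_kieker_log ~/kieker/logs || echo "no monitoring data found" >&2
```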

For example:

~/kieker-1.9/bin/trace-analysis.sh -i ~/kieker/logs -o ~/kieker/logs/out/ --plot-Assembly-Component-Dependency-Graph --plot-Aggregated-Deployment-Call-Tree --plot-Container-Dependency-Graph --plot-Assembly-Operation-Dependency-Graph responseTimes responseTimeColoring 20

This will generate four different graphs in the ".dot" format. Kieker can generate other graphs: to see the available options, run ~/kieker-1.9/bin/trace-analysis.sh without any parameters.

".dot" files can be converted to SVG (and from there to PDF), to JPEG, etc. using the dot command from Graphviz (the graphviz package, readily available via apt-get on Ubuntu), for instance:

dot -T jpeg -o assembly-graph.jpg assemblyComponentDependencyGraph.dot

You can turn SVG files into PDF by using the free tool "inkscape", which may be invoked from the command line.
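
When an analysis run produces several ".dot" files, converting them one at a time gets tedious. Here is a small sketch that converts every ".dot" file in a directory, assuming the dot command is on your PATH. convert_dot_dir is a hypothetical helper name, and the output format defaults to SVG.

```shell
# convert_dot_dir DIR [FORMAT]
# Runs dot on every .dot file in DIR and prints the path of each file written.
# (Hypothetical helper for illustration; FORMAT is passed to dot's -T option.)
convert_dot_dir() {
    dir=$1
    fmt=${2:-svg}
    for f in "$dir"/*.dot; do
        [ -e "$f" ] || continue          # no .dot files: the glob stays unexpanded
        out="${f%.dot}.$fmt"             # graph.dot -> graph.svg
        dot -T"$fmt" -o "$out" "$f" || return 1
        printf '%s\n' "$out"
    done
}

# Example:
# convert_dot_dir ~/kieker/logs/out svg
```

Any format accepted by your dot installation can be passed as the second argument; which formats are available depends on how Graphviz was built.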

Notes

  • To avoid LOTS of warnings in Apache's error logs, we use a bundled module called Sub::WrapEPrints instead of Sub::WrapPackages.
  • memcached can be set up on a remote server: this is a standard feature of memcached. You could then monitor one server and store the monitoring data on another machine.