Dataset Manipulation, Triggers and Events

From EPrints Documentation
Revision as of 16:01, 2 December 2010 by DaveTarrant (talk | contribs)

This exercise combines several capabilities of EPrints.

The overall aim is to add a Trigger which queues an Event on the indexer. The Event scans a file for its mime type (according to the unix file command) and stores the result in the file dataset.

We shall also add a new dataset although we won't specifically be using it.

Adding To Datasets

This is done with a config file under cfg/cfg.d/.

The following shows an example of adding the file_cmd_mime field to the file dataset.

 $c->add_dataset_field( "file", {
       name => "file_cmd_mime",
       type => "text",
 }, reuse => 1 );

Note that the reuse flag says whether this field may be reused if it already exists. If it is set to 0 (or not defined) and the field already exists, the package install will fail.

Note also that if none of the config files define the field, it is assumed to be no longer required and it is removed (data included). Whether upgrades handle this correctly still needs investigation.

Adding New Datasets

Adding a new dataset is much the same as adding to an existing one, except that we also need to define the dataset itself and a basic class wrapper for it.

 # Define the dataset
 $c->{datasets}->{package_dataset} = {
       class => "EPrints::DataObj::PackageDataset",
       sqlname => "package_dataset",
       datestamp => "datestamp",
       sql_counter => "datasetid",
 };
 # Add fields to the dataset
 $c->add_dataset_field( "package_dataset", { name=>"datasetid", type=>"counter", required=>1, can_clone=>0, sql_counter=>"datasetid" }, );
 $c->add_dataset_field( "package_dataset", { name=>"name", type=>"text", required=>0, }, );
 $c->add_dataset_field( "package_dataset", { name=>"count", type=>"int", required=>0, }, );
 # Define the class. This can be done either with a new file in the right place or with this override trick: open a '{' and then continue as if this were a new file.
 {
   package EPrints::DataObj::PackageDataset;
   our @ISA = qw( EPrints::DataObj );
   # The new method can simply call the constructor of the super class (EPrints::DataObj)
   sub new
   {
       return shift->SUPER::new( @_ );
   }
   # This method is required and just returns the dataset id.
   sub get_dataset_id
   {
       my( $self ) = @_;
       return "package_dataset";
   }
 }

Events

Events are tasks which the indexer runs at various times. Because we don't want to wait for our mime type scan to complete, and don't mind when it completes, we may as well make it an event which can run at a convenient time.

Other examples of events are:

  • Thumbnail Generation
  • Full text indexing
  • RDF generation

These are all things which can be queued up so as not to slow the deposit process.

The last thing to note about events is that the indexer obeys the eprint edit-lock, so if someone has the resource locked, the events won't happen yet.

The indexer tries to execute queued events every 30 seconds. You can view the status of events and of the indexer via the "status" button under the "System Tools" tab of the admin interface.
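The queueing behaviour described above can be pictured with a small toy model. This is only a sketch in plain Perl and not how the indexer is implemented: the names toy_indexer_pass, the locked flag and the handler table are all made up for illustration.

```perl
use strict;
use warnings;

# Toy model only: none of these names are EPrints API.
my @queue = (
	{ action => "scanfile", params => [ 1 ], locked => 0 },
	{ action => "scanfile", params => [ 2 ], locked => 1 },
);

# One pass of a pretend indexer: run what it can, keep the rest for later.
sub toy_indexer_pass
{
	my( $queue, $handlers ) = @_;
	my @waiting;
	for my $event ( @$queue )
	{
		if( $event->{locked} )
		{
			push @waiting, $event;   # resource is locked, retry on the next pass
			next;
		}
		$handlers->{ $event->{action} }->( @{ $event->{params} } );
	}
	@$queue = @waiting;
	return scalar @waiting;
}

my @scanned;
my %handlers = ( scanfile => sub { push @scanned, $_[0] } );

toy_indexer_pass( \@queue, \%handlers );   # file 1 runs, file 2 waits
$queue[0]{locked} = 0;                     # the lock is released
toy_indexer_pass( \@queue, \%handlers );   # file 2 runs on this pass
print "scanned: @scanned\n";               # prints "scanned: 1 2"
```

The point of the sketch is that a locked item is not dropped, it simply stays in the queue until a later pass.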

ScanFile event

An Event is just another type of plug-in, so you create it in the archive's cfg/plugins/EPrints/Plugin/Event/ folder.

Below is an event with a single sub which performs the needed operation. All that needs to be passed to it is a file_id.

 package EPrints::Plugin::Event::ScanFile;
 our @ISA = qw( EPrints::Plugin::Event );
 use strict;
 sub scanfile
 {
       my( $self, $file_id ) = @_;
       my $repository = $self->{repository};
       my $file = EPrints::DataObj::File->new( $repository, $file_id );
       # Nothing to do if the file object no longer exists
       return if !defined $file;
       my $src_path = $file->get_local_copy;
       # Ask the unix file command for the mime type and keep the second field of its output
       my $cmd = "file -i $src_path | awk '{split (\$0, a, \" \"); print a[2]}'";
       my $ret = `$cmd`;
       # Strip the trailing line ending
       $ret =~ s/\r\n//;
       $ret =~ s/\n//;
       if( defined $ret && $ret ne "" )
       {
               $file->set_value( "file_cmd_mime", $ret );
       }
 }
 1;
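The awk step above keeps the second whitespace-separated field, which can include a trailing ";" and breaks if the path contains a space. As a hedged alternative, the output line could be parsed in Perl instead. The helper below, mime_from_file_output, is a hypothetical name and not part of EPrints. It assumes output of the form "path: type/subtype; charset=...". Some versions of file also accept --brief and --mime-type, which would make most of this parsing unnecessary.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of EPrints): extract the mime type from one
# line of `file -i` output, e.g. "/tmp/a.txt: text/plain; charset=us-ascii".
sub mime_from_file_output
{
	my( $line ) = @_;
	return undef if !defined $line;
	# Capture a "type/subtype" token that follows a ": " separator and is
	# terminated by ";" or the end of the line.
	if( $line =~ m{:\s+([\w.+-]+/[\w.+-]+)\s*(?:;|$)} )
	{
		return lc $1;
	}
	return undef;   # e.g. an error message from file rather than a type
}

# prints "text/plain"
print mime_from_file_output( "/tmp/a.txt: text/plain; charset=us-ascii\n" ), "\n";
```

Returning undef for unrecognised output means the caller can skip set_value instead of storing an error message in file_cmd_mime.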

To register the event with the indexer, which is what will execute it, we need to restart the indexer. The same applies if you change the event: you have to reload the indexer.

On the command line you can watch the indexer log:

 tail -f eprints_root/var/indexer.log

Triggers

At this point we just need something to trigger our event. Logically this would be when a file is added to the system.

Triggers are basically message queues with which you can register a plug-in, so that it is alerted when a trigger is "hit".

EPrints has a number of triggers (not all of them implemented at time of writing) which are listed in the EPrints/ file.

Ideally we should use the trigger EP_TRIGGER_FILES_MODIFIED for our operation, but this is one of those not yet implemented. We are therefore going to use EP_TRIGGER_AFTER_COMMIT, on file objects only.

There are 2 types of triggers:

  • Repository triggers: more general operation triggers
  • Dataset triggers: these have the same constants and meaning, but you define which datasets you care about for the trigger.

We are going to use a dataset trigger on the file dataset, thus each time a file is committed we get told.
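To make the idea of a dataset trigger concrete, here is a toy registry in plain Perl. It is a sketch of the concept only and not EPrints' implementation: toy_add_trigger, toy_run_trigger and TOY_TRIGGER_AFTER_COMMIT are invented names, and the constant's value is arbitrary.

```perl
use strict;
use warnings;

# Toy model only: these names are illustrative and are not EPrints internals.
use constant TOY_TRIGGER_AFTER_COMMIT => 1;   # made-up constant value

my %triggers;   # dataset id => trigger id => list of callbacks

# Register a callback for one trigger on one dataset.
sub toy_add_trigger
{
	my( $datasetid, $trigger_id, $callback ) = @_;
	push @{ $triggers{$datasetid}{$trigger_id} }, $callback;
}

# "Hit" a trigger: call every callback registered for that dataset and trigger.
sub toy_run_trigger
{
	my( $datasetid, $trigger_id, %params ) = @_;
	my $called = 0;
	for my $callback ( @{ $triggers{$datasetid}{$trigger_id} || [] } )
	{
		$callback->( %params );
		$called++;
	}
	return $called;
}

my @queued;
toy_add_trigger( "file", TOY_TRIGGER_AFTER_COMMIT, sub {
	my( %params ) = @_;
	push @queued, $params{dataobj};
} );

# Only the commit on the "file" dataset reaches our callback.
toy_run_trigger( "file", TOY_TRIGGER_AFTER_COMMIT, dataobj => "file 42" );
toy_run_trigger( "eprint", TOY_TRIGGER_AFTER_COMMIT, dataobj => "eprint 7" );
print scalar( @queued ), " object(s) queued: @queued\n";   # prints "1 object(s) queued: file 42"
```

Keying the registry by dataset first is what lets a callback ignore commits on every dataset other than the one it registered for.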


We register our trigger to call our event by simply adding it to the repository configuration. If you change this while testing, you will need to reload the repository configuration.

 $c->add_dataset_trigger( "file", EP_TRIGGER_AFTER_COMMIT, sub {
       my( %params ) = @_;
       my $repository = $params{repository};
       return undef if !defined $repository;
       if( defined $params{dataobj} )
       {
               my $file = $params{dataobj};
               my $file_id = $file->value( "fileid" );
               # Queue the scanfile action of our event plug-in for the indexer
               $repository->dataset( "event_queue" )->create_dataobj({
                       pluginid => "Event::ScanFile",
                       action => "scanfile",
                       params => [$file_id],
               });
       }
 } );