Dataset Manipulation, Triggers and Events

From EPrints Documentation
Revision as of 15:44, 2 December 2010 by DaveTarrant (talk | contribs) (ScanFile event)

In this exercise, we are going to look at several capabilities of EPrints working together.

The overall aim is to add a Trigger which queues an Event on the indexer. The Event scans a file for its MIME type (according to the unix file command) and stores the result in the file dataset.

We shall also add a new dataset although we won't specifically be using it.
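Before wiring anything into EPrints, the MIME lookup itself can be tried on its own. The following is a minimal sketch, not EPrints code: mime_type_of is a hypothetical helper name, and it assumes a file command that accepts the -b and --mime-type options.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of EPrints): ask the unix file command
# for the MIME type of a path. Returns undef if nothing usable comes back.
sub mime_type_of
{
    my( $path ) = @_;

    # Only plain files that exist are worth scanning.
    return undef unless defined $path && -f $path;

    # List-form pipe open runs the command directly, without a shell,
    # so spaces or quotes in $path need no escaping.
    open( my $fh, "-|", "file", "-b", "--mime-type", "--", $path )
        or return undef;
    my $out = <$fh>;
    close( $fh );

    return undef unless defined $out;
    $out =~ s/\s+\z//;    # strip the trailing line ending
    return $out eq "" ? undef : $out;
}

# Usage: print the type of each path given on the command line.
for my $path ( @ARGV )
{
    my $type = mime_type_of( $path );
    print "$path: ", ( defined $type ? $type : "unknown" ), "\n";
}
```

Asking file for the bare MIME type avoids post-processing its output with awk, and skipping the shell sidesteps quoting problems with unusual file names.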

Adding To Datasets

This is done with a config file under cfg/cfg.d/.

The following shows an example of adding the file_cmd_mime field to the file dataset.

 $c->add_dataset_field( "file", {
       name => "file_cmd_mime",
       type => "text",
 }, reuse => 1 );

Note the reuse flag, which says whether the field may be reused if it already exists. If this is set to 0 (or not defined) and the field already exists, the package install will fail.

Note also that if none of the config files define the field, it is assumed to be no longer required and it is removed (data included). Some investigation is needed to see if upgrades work... (:S)

Adding New Datasets

Adding a new dataset is much the same as adding to an existing one, except that we also need to define the dataset itself and a basic class wrapper for it.

 # Define the dataset
 $c->{datasets}->{package_dataset} = {
      class => "EPrints::DataObj::PackageDataset",
      sqlname => "package_dataset",
      datestamp => "datestamp",
      sql_counter => "datasetid",
 };
 # Add fields to the dataset
 $c->add_dataset_field( "package_dataset", { name=>"datasetid", type=>"counter", required=>1, can_clone=>0, sql_counter=>"datasetid" }, );
 $c->add_dataset_field( "package_dataset", { name=>"name", type=>"text", required=>0, }, );
 $c->add_dataset_field( "package_dataset", { name=>"count", type=>"int", required=>0, }, );
 # Define the class. This can be done either with a new file in the right place,
 # or with this override trick: open a '{' and then continue as if this is a new file
 {
   package EPrints::DataObj::PackageDataset;
   our @ISA = qw( EPrints::DataObj );
   # The new method can simply call the constructor of the super class (EPrints::DataObj)
   sub new
   {
       return shift->SUPER::new( @_ );
   }
   # This method is required and just returns the dataset id.
   sub get_dataset_id
   {
       my ($self) = @_;
       return "package_dataset";
   }
 }
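The inline package trick can be tried outside EPrints. The following is a minimal sketch in which My::Base is a hypothetical stand-in for EPrints::DataObj and My::PackageDataset stands in for the new class. It shows that a package declared inside a bare block behaves like one defined in its own file, and that the enclosing file carries on in its original package afterwards.

```perl
use strict;
use warnings;

# Hypothetical stand-in for EPrints::DataObj, for illustration only.
{
    package My::Base;

    sub new
    {
        my( $class, %args ) = @_;
        return bless { %args }, $class;
    }
}

# The same file carries on and defines a subclass inside its own block.
{
    package My::PackageDataset;
    our @ISA = qw( My::Base );

    # Hand construction straight to the super class.
    sub new
    {
        return shift->SUPER::new( @_ );
    }

    sub get_dataset_id
    {
        return "package_dataset";
    }
}

# After the closing brace we are back in the file's own package (main here).
my $obj = My::PackageDataset->new( name => "example" );
print $obj->get_dataset_id, "\n";
```

The package statement only lasts until the end of the enclosing block, which is why the braces matter: without them, everything after the class definition in the config file would be compiled into the new package.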


Events

Events are things which can be triggered by the indexer at various times. Because we don't want to have to wait for our MIME type scan to complete, and are not bothered about when it completes, we may as well make it an event which can run at a convenient time.

Other examples of events are:

  • Thumbnail Generation
  • Full text indexing
  • RDF generation

These are all things which can be queued up so as not to slow the deposit process.

The last thing to note about events is that the indexer obeys the eprint edit-lock, so if someone has the resource locked, the events won't happen yet.

The indexer tries to execute queued events every 30 seconds. You can view the status of events and the indexer via the "status" button under the "System Tools" tab of the admin interface.

ScanFile event

An Event is just another type of plug-in, so you create it in the archive's cfg/plugins/EPrints/Plugin/Event/ folder.

Below is an event with a single sub which performs the needed operation. All that needs to be passed to it is a file_id.

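 package EPrints::Plugin::Event::ScanFile;
 our @ISA = qw( EPrints::Plugin::Event );
 use strict;
 sub scanfile
 {
       my( $self, $file_id ) = @_;
       my $repository = $self->{repository};
       # Load the file object from its id
       my $file = EPrints::DataObj::File->new( $repository, $file_id );
       # Get a local copy of the file's contents to scan
       my $src_path = $file->get_local_copy;
       # Run the unix file command and keep the second field of its output
       my $cmd = "file -i $src_path | awk '{split (\$0, a, \" \"); print a[2]}'";
       my $ret = `$cmd`;
       # Strip the trailing line ending
       $ret =~ s/\r\n//;
       $ret =~ s/\n//;
       # Only store a result if the command returned something
       if (defined $ret and $ret ne "") {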
               $file->set_value("file_cmd_mime", $ret);