Difference between revisions of "Create Export Plugins"

From EPrints Documentation
 
(40 intermediate revisions by 5 users not shown)
Line 1: Line 1:
Export plugins for anything but eprints are beyond the scope of this howto.
+
[[Category:Plugins]]
 +
 
 +
Export plugins increase the value of your repository by allowing users to get data out in the format they want.
 +
 
 +
Export plugins also help integrate your repository with other systems by allowing those systems to exchange data via an interchange format.
 +
 
 +
EPrints 3 is packaged with a number of export plugins, and export plugins are also developed and shared by other EPrints users.
 +
 
 +
The purpose of this guide is to describe how to create new export plugins for your repository.
 +
 
 +
'''Before getting started''', check that the output/interchange format that you want to add to your repository has not already been made available in the EPrints Files repository: http://files.eprints.org/view/type/plugin.html
 +
 
 +
Plugins are written in Perl, so some coding experience is required, as is a familiarity with the EPrints API.
 +
 
 +
==Export plugin overview==
 +
 
 +
An EPrints export plugin is typically a standalone Perl module. There are 2 key functions that an export plugin must carry out:
 +
 
 +
# Register with EPrints
 +
# Define how to convert EPrint records to the output/interchange format
 +
 
 +
===Registration===
 +
 
 +
Export plugins register the following properties:
 +
 
 +
* ''name'' - the name of the plugin
 +
* ''visible'' - who can use it
 +
* ''accept'' - what the plugin can convert
 +
** lists of records or single records (or both)
 +
** type of record (eprints, users, subjects.. see [[Data_Object|EPrints data objects]])
 +
* ''suffix''  and ''mimetype'' - file extension and MIME type of format it converts to
 +
 
 +
'''Example: BibTeX export plugin (extract from registration section)'''
 +
 
 +
        $self->{name} = "BibTeX";
 +
        $self->{accept} = [ 'list/eprint', 'dataobj/eprint' ];
 +
        $self->{visible} = "all";
 +
        $self->{suffix} = ".bib";
 +
        $self->{mimetype} = "text/plain";
 +
 
 +
This BibTeX export plugin can convert lists of [[EPrint_Object|eprints]] or single eprints, is available to all users, and produces a plain text file with a .bib extension.
 +
 
 +
'''Example: XML (with embedded files) export plugin'''
 +
 
 +
        $self->{name} = "EP3 XML with Files Embeded";
 +
        $self->{accept} = [ 'list/eprint', 'dataobj/eprint' ];
 +
        $self->{visible} = "staff";
 +
        $self->{suffix} = ".xml";
 +
        $self->{mimetype} = "text/xml";
 +
 
 +
This XML export plugin is available to repository staff only.
 +
 
 +
'''Example: DIDL export plugin'''
 +
 
 +
        $self->{name} = "DIDL";
 +
        $self->{accept} = [ 'dataobj/eprint' ];
 +
        $self->{visible} = "all";
 +
        $self->{suffix} = ".xml";
 +
        $self->{mimetype} = "text/xml";
 +
 
 +
This DIDL export plugin can handle only a single eprint record at a time.
 +
 
 +
'''Example: FOAF export plugin'''
 +
 
 +
        $self->{name} = "FOAF Export";
 +
        $self->{accept} = [ 'dataobj/user' ];
 +
        $self->{visible} = "all";
 +
        $self->{suffix} = ".rdf";
 +
        $self->{mimetype} = "text/xml";
 +
 
 +
This FOAF export plugin converts a single [[User_Object|user]] record.
 +
 
 +
'''Example: XML export plugin'''
 +
 
 +
        $self->{name} = "EP3 XML";
 +
        $self->{accept} = [ 'list/*', 'dataobj/*' ];
 +
        $self->{visible} = "all";
 +
        $self->{suffix} = ".xml";
 +
        $self->{mimetype} = "text/xml";
 +
 
 +
This XML export plugin can handle lists or individual records of [[Data_Object|any type]].
 +
 
 +
===Conversion===
 +
 
 +
This might include '''mapping''' EPrints fields to output/interchange format fields and '''serialising''' the output/interchange format.
 +
 
 +
'''Example: EndNote export plugin (extract from conversion section)'''
 +
 
 +
        # K Keywords
 +
        $data->{K} = $dataobj->get_value( "keywords" ) if $dataobj->exists_and_set( "keywords" );
 +
        # T Title
 +
        $data->{T} = $dataobj->get_value( "title" ) if $dataobj->exists_and_set( "title" );
 +
        # U URL
 +
        $data->{U} = $dataobj->get_url;
 +
        # X Abstract
 +
        $data->{X} = $dataobj->get_value( "abstract" ) if $dataobj->exists_and_set( "abstract" );
 +
        # Z Notes
 +
        $data->{Z} = $dataobj->get_value( "note" ) if $dataobj->exists_and_set( "note" );
 +
 
 +
This extract shows how the values of the EndNote fields %K, %T, %U, %X and %Z are mapped from the [[EPrint_Object]].
 +
 
 +
'''Example: Text export plugin (extract from conversion section)'''
 +
 
 +
        my $cite = $dataobj->render_citation;
 +
        return EPrints::Utils::tree_to_utf8( $cite )."\n\n";
 +
 
 +
To serialise an [[EPrint_Object]], the Text export plugin simply outputs the citation.
 +
 
 +
==Hello World export plugin==
 +
 
 +
http://en.wikipedia.org/wiki/Hello_world_program
 +
 
 +
Export plugins are stored in:
 +
 
 +
/opt/eprints3/perl_lib/EPrints/Plugin/Export/
 +
 
 +
Create a new file in this directory called ''HelloWorld.pm'', and paste the following code into it (this is a useful template for writing export plugins!):
 +
 
 +
<pre>
 +
package EPrints::Plugin::Export::HelloWorld;
 +
 
 +
use EPrints::Plugin::Export;
 +
@ISA = ( "EPrints::Plugin::Export" );
 +
 
 +
use strict;
 +
 
 +
sub new
 +
{
 +
        my( $class, %opts ) = @_;
 +
}
 +
 
 +
sub output_dataobj
 +
{
 +
        my( $plugin, $dataobj ) = @_;
 +
}
 +
 
 +
1;
 +
</pre>
 +
 
 +
 
 +
The ''new'' subroutine creates the plugin - this is where you '''register''' the plugin with EPrints. The ''output_dataobj'' subroutine is where you '''convert''' EPrints data to the output format.
 +
 
 +
'''Perl notes:'''
 +
 
 +
# <tt>package ...</tt> - the plugin (Perl module) namespace, which should always be EPrints::Plugin::Export::''PluginID'' (note that the file should be called ''PluginID''.pm)
 +
# <tt>use EPrints::Plugin::Export, @ISA=...</tt> - inherit all the internal wiring needed for EPrints to use your plugin
 +
 
 +
===Register Hello World plugin===
 +
 
 +
Add the following to the ''new'' subroutine to register the plugin with EPrints:
 +
 
 +
<pre>
 +
sub new
 +
{
 +
        my( $class, %opts ) = @_;
 +
 
 +
        my $self = $class->SUPER::new( %opts );
 +
 
 +
        $self->{name} = "Hello, World!";
 +
        $self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
 +
        $self->{visible} = "all";
 +
        $self->{suffix} = ".txt";
 +
        $self->{mimetype} = "text/plain; charset=utf-8";
 +
 
 +
        return $self;
 +
}
 +
</pre>
 +
 
 +
===Convert EPrints data to Hello World data===
 +
 
 +
Add the following to the ''output_dataobj'' subroutine:
 +
 
 +
<pre>
 +
sub output_dataobj
 +
{
 +
        my( $plugin, $dataobj ) = @_;
 +
 
 +
        my $title = $dataobj->get_value( "title" );
 +
 
 +
        return "Hello, World! $title\n\n";
 +
}
 +
</pre>
 +
 
 +
This subroutine "converts" an eprint object by getting its title and using it in a Hello, World! message.
 +
 
 +
===Testing the Hello World plugin===
 +
 
 +
Save the ''HelloWorld.pm'' file and then restart the Web server, e.g.:
 +
 
 +
service httpd restart
 +
 
 +
'''Why do I need to restart the Web server?''' EPrints runs under mod_perl, which loads all Perl modules when the Web server starts, so a changed or newly added module is not picked up until the server is restarted.
 +
 
 +
The Hello World export plugin handles lists of eprints and single eprints. Therefore, EPrints displays it in the list of export plugins on the search results page:
 +
 
 +
[[Image:Hello-world-export.png|frame|none|Selecting the Hello World export plugin from the search results page]]
 +
 
 +
When the Hello World export plugin is activated, the ''output_dataobj'' subroutine is applied to every item in the list to produce the result:
 +
 
 +
[[Image:Hello-world-export-result.png|frame|none|The output of the Hello World export plugin]]
 +
 
 +
==Walkthrough: Using existing plugins to build new plugins==
 +
 
 +
==Walkthrough: Deposit activity plugin==
  
 
Imagine we want to create an export plugin that will take a group of eprints (or a single eprint) and output a csv file containing a list of who deposited the eprints, and the dates on which they were deposited.
 
Imagine we want to create an export plugin that will take a group of eprints (or a single eprint) and output a csv file containing a list of who deposited the eprints, and the dates on which they were deposited.
  
== Essentials ==
+
===Registration===
  
 
The top of the plugin should look like this:
 
The top of the plugin should look like this:
 
+
<pre>
 
  package EPrints::Plugin::Export::DepositorActivity;
 
  package EPrints::Plugin::Export::DepositorActivity;
 
   
 
   
 
  use Unicode::String qw( utf8 );
 
  use Unicode::String qw( utf8 );
 
  use EPrints::Plugin::Export;
 
  use EPrints::Plugin::Export;
 +
use EPrints::DataObj::User;
 
  @ISA = ( "EPrints::Plugin::Export" );
 
  @ISA = ( "EPrints::Plugin::Export" );
 
  use strict;
 
  use strict;
Line 22: Line 226:
 
         $self->{name} = "Depositor Activity";
 
         $self->{name} = "Depositor Activity";
 
         $self->{accept} = [ 'list/eprint', 'dataobj/eprint' ];
 
         $self->{accept} = [ 'list/eprint', 'dataobj/eprint' ];
         $self->{visible} = "staff";
+
         $self->{visible} = "all";
 
         $self->{suffix} = ".csv";
 
         $self->{suffix} = ".csv";
 
         $self->{mimetype} = "text/csv";
 
         $self->{mimetype} = "text/csv";
Line 28: Line 232:
 
         return $self;
 
         return $self;
 
  }
 
  }
 
+
</pre>
 
This will create a filter object, and set a number of configuration constants:
 
This will create a filter object, and set a number of configuration constants:
  
 
* name - The name of the filter
 
* name - The name of the filter
 
* accept - A list detailing what the filter will take as inputs.  In this case, a list of eprints or a single eprint.  It is possible to write filters for dataobj types 'eprint', 'user', 'subject', 'history', 'access' and '*' (all).
 
* accept - A list detailing what the filter will take as inputs.  In this case, a list of eprints or a single eprint.  It is possible to write filters for dataobj types 'eprint', 'user', 'subject', 'history', 'access' and '*' (all).
* visible - Who can see this filter.  It's set to staff above so that only repository staff can use it.  It could be set to 'All' to allow everyone to use it.  If set to 'API' then the filter is not available through the web interface.
+
* visible - Who can see this filter.  It's set to 'all' above so that anyone can use it.  It could be set to 'staff' to only allow repository staff to use it.  If set to 'API' then the filter is not available through the web interface.
 
* suffix - Appended to the url to create a filename extension.
 
* suffix - Appended to the url to create a filename extension.
 
* mimetype - Should be set to the correct mime type for the output of the filter.
 
* mimetype - Should be set to the correct mime type for the output of the filter.
Line 39: Line 243:
 
Note that 'name' and 'accept' are essential.  These allow the filter to register itself with EPrints.
 
Note that 'name' and 'accept' are essential.  These allow the filter to register itself with EPrints.
  
== Converting the dataobj ==
+
We will be extracting the username of the depositor, so we need to use 'EPrints::DataObj::User'.
 +
 
 +
===Conversion===
  
 
The 'output_dataobj' function takes a dataobj (in our case an eprint object) and returns a perl scalar which will be the output.  We are going to extract some data from the dataobj using EPrints API calls.
 
The 'output_dataobj' function takes a dataobj (in our case an eprint object) and returns a perl scalar which will be the output.  We are going to extract some data from the dataobj using EPrints API calls.
  
 
Note that by convention, '$plugin' is used instead of '$self'.
 
Note that by convention, '$plugin' is used instead of '$self'.
 +
<pre>
 +
sub output_dataobj
 +
{
 +
      my( $plugin, $dataobj ) = @_;
 +
 +
      my $r = "";
 +
      if ($dataobj->exists_and_set("userid"))                                      #userid may not be set if the deposit was done by a script.
 +
      {
 +
              my $session = $plugin->{"session"};
 +
              my $userid = $dataobj->get_value( "userid" );
 +
              my $depositor_obj = new EPrints::DataObj::User($session, $userid);    #create a user object
 +
              my $depositor = defined $depositor_obj ? $depositor_obj->get_value( "username" ) : "Depositor Unknown";  #get the username (the user record may no longer exist)
 +
              if ($depositor =~ m/[\n" ,]/)                                        #Check for illegal CSV characters
 +
              {
 +
                      $depositor =~ s/"/""/g;                                      #escape quotes
 +
                      $depositor = '"' . $depositor . '"';                          #delimit text
 +
              }
 +
              $r .= $depositor;
 +
      }
 +
      else
 +
      {
 +
              $r .= '"Depositor Unknown"';
 +
      }
 +
      $r .= ',"' . $dataobj->get_value( "datestamp" ) . '"' ."\n";                  #datestamp is always set, and contains a space so needs delimiting
 +
 +
      return $r;
 +
}
 +
</pre>
 +
Notes:
 +
 +
* Retrieving the username takes a little fancy footwork because the eprint object stores only the depositor's userid.  We need to create a user object and get the username from that.
 +
* We use '$dataobj->get_value' to retrieve metadata from the eprint (or user) objects.
 +
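* As we're outputting CSV, we need to do a little normalisation: a value containing a newline, double quote, space or comma is wrapped in double quotes, and any double quotes inside it are doubled.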
 +
 +
===Put it in a Module===
 +
 +
Put all this into a file called 'DepositorActivity.pm' and save the file into the 'eprints3/perl_lib/EPrints/Plugin/Export/' directory.  Don't forget to add this to the bottom of the file:
 +
<pre>
 +
1;
 +
</pre>
 +
Before you can use the plugin, you must restart the webserver.  This will cause EPrints to load it.
 +
 +
===Adding Column Headings===
 +
 +
The 'output_dataobj' runs on a single EPrint.  If the plugin runs over a list of eprints (we've given it that capability), the default behaviour is to run 'output_dataobj' on every eprint in the list and concatenate the results.
 +
 +
The output_list function handles lists.  It takes the plugin itself ($plugin) and a hash (%opts) as arguments.  The %opts hash contains the list under the 'list' key and may also contain a filehandle under the 'fh' key.  When writing 'output_list', check for the filehandle and, if it is present, print to it; otherwise return the results as a scalar.
  
  sub output_dataobj
+
Here is an output_list function that will add column headings to our CSV file.
 +
<pre>
 +
  sub output_list
 +
{
 +
        my( $plugin, %opts ) = @_;
 +
        my $r = [];                                                      #array for results accumulation
 +
        my $part;
 +
 +
        $part = '"User ID","Date Stamp"' . "\n";                        #column headings
 +
        if( defined $opts{fh} )                                          #write to file or accumulate headings
 +
        {
 +
                print {$opts{fh}} $part;
 +
        }
 +
        else
 +
        {
 +
                push @{$r}, $part;
 +
        }
 +
 +
        foreach my $dataobj ( $opts{list}->get_records )                #Iterate over list
 +
        {
 +
                $part = $plugin->output_dataobj( $dataobj, %opts );      #call output_dataobj
 +
                if( defined $opts{fh} )                                  #write to file or accumulate results
 +
                {
 +
                        print {$opts{fh}} $part;
 +
                }
 +
                else
 +
                {
 +
                        push @{$r}, $part;
 +
                }
 +
        }
 +
 +
        if( defined $opts{fh} )                                          #Don't return results if writing to file.
 +
        {
 +
                return;
 +
        }
 +
        return join( '', @{$r} );
 +
}
 +
</pre>
 +
The conditionals for printing to a file make the function look overly complex.  Here it is if you ignore file handles (which you certainly shouldn't do):
 +
<pre>
 +
sub output_list
 +
{
 +
        my( $plugin, %opts ) = @_;
 +
        my $r = [];                                                      #array for results accumulation
 +
        my $part;
 +
 +
        $part = '"User ID","Date Stamp"' . "\n";                        #column headings
 +
        push @{$r}, $part;
 +
        foreach my $dataobj ( $opts{list}->get_records )                #Iterate over list
 +
        {
 +
                $part = $plugin->output_dataobj( $dataobj, %opts );      #call output_dataobj
 +
                push @{$r}, $part;
 +
        }
 +
        return join( '', @{$r} );
 +
}
 +
</pre>
 +
 
 +
===More Complex List Processing===
 +
 
 +
output_list can do more than simply concatenate the results from output_dataobj.  For example, the plugin above outputs a table containing one row for every eprint, showing the depositor and the deposit date.  It could be made more useful by changing the table so that it contains one row for each user who deposited an eprint, with three columns: username, number of deposits, and datestamp of the latest deposit.
 +
 
 +
For readability, output_list is shown without filehandle handling.  If this were a real filter, IT WOULD BE NECESSARY!
 +
 
 +
First, an auxiliary function that returns a CSV-normalised username.  It's similar to the output_dataobj function above, so should be easy to understand.
 +
<pre>
 +
sub get_username
 +
{
 +
      my( $plugin, $dataobj ) = @_;
 +
 +
      my $username;
 +
      if ($dataobj->exists_and_set("userid"))                                    #userid may not be set if the deposit was done by a script.
 +
      {
 +
              my $session = $plugin->{"session"};
 +
              my $userid = $dataobj->get_value( "userid" );
 +
 
 +
              my $depositor_obj = new EPrints::DataObj::User($session, $userid);  #create a user object
 +
              my $depositor = defined $depositor_obj ? $depositor_obj->get_value( "username" ) : "Depositor Unknown";  #get the username (the user record may no longer exist)
 +
              if ($depositor =~ m/[\n" ,]/)        #Check for illegal CSV characters
 +
              {
 +
                      $depositor =~ s/"/""/g; #escape quotes
 +
                      $depositor = '"' . $depositor . '"'; #delimit text
 +
              }
 +
              $username = $depositor;
 +
      }
 +
      else
 +
      {
 +
              $username = '"Depositor Unknown"';
 +
      }
 +
      return $username;
 +
}
 +
</pre>
 +
The new output_list function uses a hash to accumulate the results:
 +
<pre>
 +
sub output_list
 
  {
 
  {
         my( $plugin, $dataobj ) = @_;
+
         my( $plugin, %opts ) = @_;
       
+
         my %r = ();
         my $r = "";
+
         if ($dataobj->exists_and_set("userid"))
+
        #Iterate over the list
 +
         foreach my $dataobj ( $opts{list}->get_records )
 
         {
 
         {
                 my $depositor = $dataobj->get_value( "userid" );
+
                 my $username = $plugin->get_username( $dataobj );
                 if ($depositor =~ m/[\n" ,]/) #Check for illegal CSV characters
+
                my $datestamp = '"' . $dataobj->get_value( "datestamp" ) . '"';
 +
                 if (defined $r{$username})                                        #if it's defined, increment and compare timestamps
 +
                {
 +
                        $r{$username}->{count} ++;
 +
                        if ($r{$username}->{most_recent} lt $datestamp)
 +
                        {
 +
                                $r{$username}->{most_recent} = $datestamp;
 +
                        }
 +
                }
 +
                else                                                              #if it's not defined, create a hash for this user's results
 
                 {
 
                 {
                         $depositor =~ s/"/""/g; #escape quotes
+
                         $r{$username} = {count => 1, most_recent => $datestamp};
                        $depositor = '"' . $depositor . '"'; #delimit text
 
 
                 }
 
                 }
                $r .= $depositor;
 
 
         }
 
         }
         else
+
 +
         # Construct the CSV and return it.
 +
        my $csv = '"User Name","Number of Deposits","Most Recent Deposit"' . "\n";
 +
        foreach my $username (sort keys %r)
 
         {
 
         {
                 $r .= '"Depositor Unknown"';
+
                 $csv .= $username . "," . $r{$username}->{count} . "," . $r{$username}->{most_recent} . "\n";
 
         }
 
         }
        $r .= ',' . $dataobj->get_value( "datestamp" ) . "\n";
+
         return $csv;
       
 
         return $r;
 
 
  }
 
  }
 +
</pre>
 +
Note that because this plugin can take a single eprint as well as a list of eprints, you must have an output_dataobj function that will do something sensible.  However, bear in mind that search results are always a list, even if there's only one result.
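The text above stresses that a real plugin must handle the filehandle. The following is a minimal sketch of the same per-user aggregation with that check added, written as a standalone function so that it can be run outside EPrints. The name summarise_deposits and its input of [ username, datestamp ] pairs are assumptions made for this illustration, and datestamps are assumed to be in a format that sorts correctly as text (such as YYYY-MM-DD HH:MM:SS).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of EPrints): summarise [ username, datestamp ]
# pairs per user, then print the CSV to $fh if given or return it as a string.
# Usernames are assumed to be CSV-escaped already.
sub summarise_deposits
{
        my( $pairs, $fh ) = @_;

        my %r;
        foreach my $pair ( @{$pairs} )
        {
                my( $username, $datestamp ) = @{$pair};
                my $entry = $r{$username} ||= { count => 0, most_recent => "" };
                $entry->{count}++;
                # text comparison is enough for YYYY-MM-DD HH:MM:SS stamps
                $entry->{most_recent} = $datestamp if $datestamp gt $entry->{most_recent};
        }

        my $csv = '"User Name","Number of Deposits","Most Recent Deposit"' . "\n";
        foreach my $username ( sort keys %r )
        {
                $csv .= join( ",", $username, $r{$username}->{count},
                        '"' . $r{$username}->{most_recent} . '"' ) . "\n";
        }

        if( defined $fh )
        {
                print {$fh} $csv;       # write to the filehandle, return nothing
                return;
        }
        return $csv;
}

my $deposits = [
        [ "bob",   "2015-01-02 10:00:00" ],
        [ "alice", "2014-05-01 09:00:00" ],
        [ "bob",   "2015-03-04 11:00:00" ],
];
my $summary = summarise_deposits( $deposits );
print $summary;
```

Inside a plugin, the pairs would come from iterating over $opts{list}->get_records and the filehandle from $opts{fh}, as in the output_list examples above.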

Latest revision as of 12:57, 14 October 2015


Export plugins increase the value of your repository by allowing users to get data out in the format they want.

Export plugins also help integrate your repository with other systems by allowing those systems to exchange data via an interchange format.

EPrints 3 is packaged with a number of export plugins, and export plugins are also developed and shared by other EPrints users.

The purpose of this guide is to describe how to create new export plugins for your repository.

Before getting started, check that the output/interchange format that you want to add to your repository has not already been made available in the EPrints Files repository: http://files.eprints.org/view/type/plugin.html

Plugins are written in Perl, so some coding experience is required, as is a familiarity with the EPrints API.

Export plugin overview

An EPrints export plugin is typically a standalone Perl module. There are 2 key functions that an export plugin must carry out:

  1. Register with EPrints
  2. Define how to convert EPrint records to the output/interchange format

Registration

Export plugins register the following properties:

  • name - the name of the plugin
  • visible - who can use it
  • accept - what the plugin can convert
    • lists of records or single records (or both)
    • type of record (eprints, users, subjects.. see EPrints data objects)
  • suffix and mimetype - file extension and MIME type of format it converts to

Example: BibTeX export plugin (extract from registration section)

       $self->{name} = "BibTeX";
       $self->{accept} = [ 'list/eprint', 'dataobj/eprint' ];
       $self->{visible} = "all";
       $self->{suffix} = ".bib";
       $self->{mimetype} = "text/plain";

This BibTeX export plugin can convert lists of eprints or single eprints, is available to all users, and produces a plain text file with a .bib extension.

Example: XML (with embedded files) export plugin

       $self->{name} = "EP3 XML with Files Embeded";
       $self->{accept} = [ 'list/eprint', 'dataobj/eprint' ];
       $self->{visible} = "staff";
       $self->{suffix} = ".xml";
       $self->{mimetype} = "text/xml";

This XML export plugin is available to repository staff only.

Example: DIDL export plugin

       $self->{name} = "DIDL";
       $self->{accept} = [ 'dataobj/eprint' ];
       $self->{visible} = "all";
       $self->{suffix} = ".xml";
       $self->{mimetype} = "text/xml";

This DIDL export plugin can handle only a single eprint record at a time.

Example: FOAF export plugin

       $self->{name} = "FOAF Export";
       $self->{accept} = [ 'dataobj/user' ];
       $self->{visible} = "all";
       $self->{suffix} = ".rdf";
       $self->{mimetype} = "text/xml";

This FOAF export plugin converts a single user record.

Example: XML export plugin

       $self->{name} = "EP3 XML";
       $self->{accept} = [ 'list/*', 'dataobj/*' ];
       $self->{visible} = "all";
       $self->{suffix} = ".xml";
       $self->{mimetype} = "text/xml";

This XML export plugin can handle lists or individual records of any type.
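The accept entries in these examples follow a kind/type pattern, with * standing for any record type. The standalone sketch below illustrates how such patterns could be compared with what a caller wants to export. The can_accept helper is hypothetical and written only for this illustration; it is not the EPrints implementation.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the EPrints API): returns 1 if any
# pattern in the accept list matches the offered "kind/type" string.
sub can_accept
{
        my( $accept, $offered ) = @_;

        my( $kind, $type ) = split m{/}, $offered, 2;
        foreach my $pattern ( @{$accept} )
        {
                my( $p_kind, $p_type ) = split m{/}, $pattern, 2;
                next unless $p_kind eq $kind;           # list vs dataobj must agree
                return 1 if $p_type eq '*' || $p_type eq $type;
        }
        return 0;
}

my $didl = [ 'dataobj/eprint' ];                # accept list from the DIDL example
my $xml  = [ 'list/*', 'dataobj/*' ];           # accept list from the EP3 XML example

foreach my $case ( [ 'DIDL', $didl, 'list/eprint' ], [ 'EP3 XML', $xml, 'list/user' ] )
{
        my( $label, $accept, $offered ) = @{$case};
        my $answer = can_accept( $accept, $offered ) ? "yes" : "no";
        print "$label accepts $offered: $answer\n";
}
```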

Conversion

This might include mapping EPrints fields to output/interchange format fields and serialising the output/interchange format.

Example: EndNote export plugin (extract from conversion section)

       # K Keywords
       $data->{K} = $dataobj->get_value( "keywords" ) if $dataobj->exists_and_set( "keywords" );
       # T Title
       $data->{T} = $dataobj->get_value( "title" ) if $dataobj->exists_and_set( "title" );
       # U URL
       $data->{U} = $dataobj->get_url;
       # X Abstract
       $data->{X} = $dataobj->get_value( "abstract" ) if $dataobj->exists_and_set( "abstract" );
       # Z Notes
       $data->{Z} = $dataobj->get_value( "note" ) if $dataobj->exists_and_set( "note" );

This extract shows how the values of the EndNote fields %K, %T, %U, %X and %Z are mapped from the EPrint_Object.

Example: Text export plugin (extract from conversion section)

       my $cite = $dataobj->render_citation;
       return EPrints::Utils::tree_to_utf8( $cite )."\n\n";

To serialise an EPrint_Object, the Text export plugin simply outputs the citation.
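Once fields have been mapped into a hash like $data in the EndNote extract, serialising is usually a short loop. The sketch below shows one possible way to write tagged lines of the form "%T Some title"; serialise_tagged is a hypothetical helper, and the exact layout the real EndNote plugin produces is not shown in this guide, so treat the output format as an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a hash of tag => value (or tag => [ values ])
# into tagged lines, one line per value, followed by a blank line.
sub serialise_tagged
{
        my( $data ) = @_;

        my $out = "";
        foreach my $tag ( sort keys %{$data} )
        {
                my $value = $data->{$tag};
                my @values = ref( $value ) eq 'ARRAY' ? @{$value} : ( $value );
                foreach my $v ( @values )
                {
                        next unless defined $v && length $v;    # skip unset fields
                        $out .= "%$tag $v\n";
                }
        }
        return $out . "\n";     # blank line separates records
}

my $data = { T => "An example title", K => [ "perl", "export" ] };
print serialise_tagged( $data );
```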

Hello World export plugin

http://en.wikipedia.org/wiki/Hello_world_program

Export plugins are stored in:

/opt/eprints3/perl_lib/EPrints/Plugin/Export/

Create a new file in this directory called HelloWorld.pm, and paste the following code into it (this is a useful template for writing export plugins!):

package EPrints::Plugin::Export::HelloWorld;

use EPrints::Plugin::Export;
@ISA = ( "EPrints::Plugin::Export" );

use strict;

sub new
{
        my( $class, %opts ) = @_;
}

sub output_dataobj
{
        my( $plugin, $dataobj ) = @_;
}

1;


The new subroutine creates the plugin - this is where you register the plugin with EPrints. The output_dataobj subroutine is where you convert EPrints data to the output format.

Perl notes:

  1. package ... - the plugin (Perl module) namespace, which should always be EPrints::Plugin::Export::PluginID (note that the file should be called PluginID.pm)
  2. use EPrints::Plugin::Export, @ISA=... - inherit all the internal wiring needed for EPrints to use your plugin

Register Hello World plugin

Add the following to the new subroutine to register the plugin with EPrints:

sub new
{
        my( $class, %opts ) = @_;

        my $self = $class->SUPER::new( %opts );

        $self->{name} = "Hello, World!";
        $self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
        $self->{visible} = "all";
        $self->{suffix} = ".txt";
        $self->{mimetype} = "text/plain; charset=utf-8";

        return $self;
}
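To see what SUPER::new and @ISA are doing here without a running repository, the following standalone sketch replaces EPrints::Plugin::Export with a tiny stand-in class. MockExport and MockExport::HelloWorld are made-up names for this illustration; only the inheritance pattern is the same as in the plugin.

```perl
use strict;
use warnings;

# Stand-in for EPrints::Plugin::Export so the sketch runs on its own.
package MockExport;
sub new
{
        my( $class, %opts ) = @_;
        return bless { %opts }, $class;
}

package MockExport::HelloWorld;
our @ISA = ( "MockExport" );

sub new
{
        my( $class, %opts ) = @_;

        my $self = $class->SUPER::new( %opts );         # base class builds the object
        $self->{name} = "Hello, World!";                # subclass adds its properties
        $self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
        return $self;
}

package main;

my $plugin = MockExport::HelloWorld->new( session => "dummy" );
print ref( $plugin ), " registered as '$plugin->{name}'\n";
```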

Convert EPrints data to Hello World data

Add the following to the output_dataobj subroutine:

sub output_dataobj
{
        my( $plugin, $dataobj ) = @_;

        my $title = $dataobj->get_value( "title" );

        return "Hello, World! $title\n\n";
} 

This subroutine "converts" an eprint object by getting its title and using it in a Hello, World! message.
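If an eprint has no title, interpolating the result of get_value into the string produces an "uninitialized value" warning. One possible guard, using the exists_and_set method seen in the EndNote extract, is sketched below. MockEPrint is a made-up stand-in for an eprint object so that the code runs outside EPrints, and the "(untitled)" placeholder is an arbitrary choice.

```perl
use strict;
use warnings;

# Stand-in for an eprint record; only the two methods used below are mimicked.
package MockEPrint;
sub new
{
        my( $class, %values ) = @_;
        return bless { %values }, $class;
}
sub get_value
{
        my( $self, $field ) = @_;
        return $self->{$field};
}
sub exists_and_set
{
        my( $self, $field ) = @_;
        return defined $self->{$field} && $self->{$field} ne "";
}

package main;

sub output_dataobj
{
        my( $plugin, $dataobj ) = @_;

        # Fall back to a placeholder rather than interpolating undef.
        my $title = $dataobj->exists_and_set( "title" )
                ? $dataobj->get_value( "title" )
                : "(untitled)";

        return "Hello, World! $title\n\n";
}

my $with_title = output_dataobj( undef, MockEPrint->new( title => "My first eprint" ) );
my $no_title   = output_dataobj( undef, MockEPrint->new() );
print $with_title, $no_title;
```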

Testing the Hello World plugin

Save the HelloWorld.pm file and then restart the Web server, e.g.:

service httpd restart

Why do I need to restart the Web server? EPrints runs under mod_perl, which loads all Perl modules when the Web server starts, so a changed or newly added module is not picked up until the server is restarted.

The Hello World export plugin handles lists of eprints and single eprints. Therefore, EPrints displays it in the list of export plugins on the search results page:

Selecting the Hello World export plugin from the search results page

When the Hello World export plugin is activated, the output_dataobj subroutine is applied to every item in the list to produce the result:

The output of the Hello World export plugin

Walkthrough: Using existing plugins to build new plugins

Walkthrough: Deposit activity plugin

Imagine we want to create an export plugin that will take a group of eprints (or a single eprint) and output a csv file containing a list of who deposited the eprints, and the dates on which they were deposited.

Registration

The top of the plugin should look like this:

 package EPrints::Plugin::Export::DepositorActivity;
 
 use Unicode::String qw( utf8 );
 use EPrints::Plugin::Export;
 use EPrints::DataObj::User;
 @ISA = ( "EPrints::Plugin::Export" );
 use strict;
 
 sub new
 {
        my( $class, %params ) = @_;
 
        my $self = $class->SUPER::new( %params );
 
        $self->{name} = "Depositor Activity";
        $self->{accept} = [ 'list/eprint', 'dataobj/eprint' ];
        $self->{visible} = "all";
        $self->{suffix} = ".csv";
        $self->{mimetype} = "text/csv";
 
        return $self;
 }

This will create a plugin object, and set a number of configuration constants:

  • name - The name of the plugin
  • accept - A list detailing what the plugin will take as inputs. In this case, a list of eprints or a single eprint. It is possible to write plugins for dataobj types 'eprint', 'user', 'subject', 'history', 'access' and '*' (all).
  • visible - Who can see this plugin. It's set to 'all' above so that anyone can use it. It could be set to 'staff' to allow only repository staff to use it. If set to 'API' then the plugin is not available through the web interface.
  • suffix - Appended to the url to create a filename extension.
  • mimetype - Should be set to the correct MIME type for the output of the plugin.

Note that 'name' and 'accept' are essential. These allow the plugin to register itself with EPrints.

We will be extracting the username of the depositor, so we need to use 'EPrints::DataObj::User'.

Conversion

The 'output_dataobj' function takes a dataobj (in our case an eprint object) and returns a perl scalar which will be the output. We are going to extract some data from the dataobj using EPrints API calls.

Note that by convention, '$plugin' is used instead of '$self'.

 sub output_dataobj
 {
       my( $plugin, $dataobj ) = @_;
 
       my $r = "";
       if ($dataobj->exists_and_set("userid"))                                       #userid may not be set if the deposit was done by a script.
       {
               my $session = $plugin->{"session"}; 
               my $userid = $dataobj->get_value( "userid" );
               my $depositor_obj = new EPrints::DataObj::User($session, $userid);    #create a user object
               my $depositor = defined $depositor_obj ? $depositor_obj->get_value( "username" ) : "Depositor Unknown";  #get the username (the user record may no longer exist)
               if ($depositor =~ m/[\n" ,]/)                                         #Check for illegal CSV characters
               {
                       $depositor =~ s/"/""/g;                                       #escape quotes
                       $depositor = '"' . $depositor . '"';                          #delimit text
               }
               $r .= $depositor;
       }
       else
       {
               $r .= '"Depositor Unknown"';
       }
       $r .= ',"' . $dataobj->get_value( "datestamp" ) . '"' ."\n";                   #datestamp is always set, and contains a space so needs delimiting
 
       return $r;
 }

Notes:

  • Retrieving the username takes a little fancy footwork because the eprint object stores only the depositor's userid. We need to create a user object and get the username from that.
  • We use '$dataobj->get_value' to retrieve metadata from the eprint (or user) objects.
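  • As we're outputting CSV, we need to do a little normalisation: a value containing a newline, double quote, space or comma is wrapped in double quotes, and any double quotes inside it are doubled.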
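The CSV normalisation used in output_dataobj can be pulled out into a small function, which also makes it easy to try on its own. This is a sketch only: csv_escape is a hypothetical name, and the rule is simply the one the plugin code above applies.

```perl
use strict;
use warnings;

# Hypothetical helper: quote a value for CSV only when it needs it,
# using the same character test as the plugin code.
sub csv_escape
{
        my( $value ) = @_;

        $value = "" unless defined $value;
        if( $value =~ m/[\n" ,]/ )
        {
                $value =~ s/"/""/g;             # double any embedded quotes
                $value = '"' . $value . '"';    # wrap the whole value
        }
        return $value;
}

foreach my $raw ( 'jsmith', 'Smith, J', 'the "boss"' )
{
        print csv_escape( $raw ), "\n";
}
```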

Put it in a Module

Put all this into a file called 'DepositorActivity.pm' and save the file into the 'eprints3/perl_lib/EPrints/Plugin/Export/' directory. Don't forget to add this to the bottom of the file:

 1;

Before you can use the plugin, you must restart the webserver. This will cause EPrints to load it.

Adding Column Headings

The 'output_dataobj' runs on a single EPrint. If the plugin runs over a list of eprints (we've given it that capability), the default behaviour is to run 'output_dataobj' on every eprint in the list and concatenate the results.

The 'output_list' function is what handles lists. It takes the plugin itself ($plugin) and a hash (%opts) as arguments. The %opts hash contains the list (in $opts{list}) and may also contain a filehandle (in $opts{fh}). When writing 'output_list', you need to check for the filehandle and, if it is present, print to it. If it is not present, return the results as a scalar.

Here is an output_list function that will add column headings to our CSV file.

 sub output_list
 {
        my( $plugin, %opts ) = @_;
        my $r = [];                                                      #array for results accumulation
        my $part;
 
        $part = '"User ID","Date Stamp"' . "\n";                         #column headings
        if( defined $opts{fh} )                                          #write to file or accumulate headings
        {
                print {$opts{fh}} $part;
        }
        else
        {
                push @{$r}, $part;
        }
 
        foreach my $dataobj ( $opts{list}->get_records )                 #Iterate over list
        {
                $part = $plugin->output_dataobj( $dataobj, %opts );      #call output_dataobj
                if( defined $opts{fh} )                                  #write to file or accumulate results
                {
                        print {$opts{fh}} $part;
                }
                else
                {
                        push @{$r}, $part;
                }
        }
 
        if( defined $opts{fh} )                                          #Don't return results if writing to file.
        {
                return;
        }
        return join( '', @{$r} );
 }

The conditionals for printing to a file make the function look overly complex. Here is the same function with the filehandle handling removed (which you certainly shouldn't do in a real plugin):

 sub output_list
 {
        my( $plugin, %opts ) = @_;
        my $r = [];                                                      #array for results accumulation
        my $part;
 
        $part = '"User ID","Date Stamp"' . "\n";                         #column headings
        push @{$r}, $part;
        foreach my $dataobj ( $opts{list}->get_records )                 #Iterate over list
        {
                $part = $plugin->output_dataobj( $dataobj, %opts );      #call output_dataobj
                push @{$r}, $part;
        }
        return join( '', @{$r} );
 }
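
One way to keep the filehandle handling without repeating the conditional is to decide once where output goes and hide that decision in a closure. The sketch below is an illustration, not the way EPrints itself does it: 'MockList' and 'MockPlugin' are hypothetical stand-ins that let it run without EPrints, and the sample records are made up.

 use strict;
 use warnings;
 
 # Stand-ins (hypothetical) so the sketch runs outside EPrints.
 package MockList;
 sub new { my( $class, @records ) = @_; return bless { records => [ @records ] }, $class; }
 sub get_records { my( $self ) = @_; return @{ $self->{records} }; }
 
 package MockPlugin;
 sub new { my( $class ) = @_; return bless {}, $class; }
 sub output_dataobj { my( $plugin, $dataobj, %opts ) = @_; return $dataobj . "\n"; }
 
 sub output_list
 {
        my( $plugin, %opts ) = @_;
        my @r;
 
        #Decide once where each piece of output goes
        my $emit = defined $opts{fh}
                ? sub { print {$opts{fh}} $_[0] }
                : sub { push @r, $_[0] };
 
        $emit->( '"User ID","Date Stamp"' . "\n" );                      #column headings
        foreach my $dataobj ( $opts{list}->get_records )                 #Iterate over list
        {
                $emit->( $plugin->output_dataobj( $dataobj, %opts ) );
        }
 
        return if defined $opts{fh};                                     #Don't return results if writing to file.
        return join( '', @r );
 }
 
 package main;
 
 my $plugin = MockPlugin->new;
 my $list = MockList->new( 'alice,"2008-01-01 10:00:00"', 'bob,"2008-01-02 11:00:00"' );
 
 my $as_string = $plugin->output_list( list => $list );           #no filehandle: results are returned
 
 my $buffer = '';
 open( my $fh, '>', \$buffer ) or die $!;                         #in-memory filehandle
 $plugin->output_list( list => $list, fh => $fh );                #filehandle: results are printed
 close( $fh );
 
 print $as_string eq $buffer ? "same output\n" : "different output\n";

Both calling styles produce the same CSV, so the script prints "same output". In a real plugin only the output_list function would be needed; the rest exists to exercise it.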

More Complex List Processing

output_list can do more than simply concatenate the results from output_dataobj. For example, the plugin above outputs a table containing one row per eprint, showing the depositor and the deposit date. It could be made more useful by outputting one row for each user who has deposited an eprint, with three columns: username, number of deposits, and datestamp of the latest deposit.

For readability, output_list is shown without filehandle handling. If this were a real plugin, IT WOULD BE NECESSARY!

Firstly, an auxiliary function that returns a CSV-normalised username. It's similar to the output_dataobj function above, so should be easy to understand.

 sub get_username
 {
       my( $plugin, $dataobj ) = @_;
 
       my $username;
       if ($dataobj->exists_and_set("userid"))                                     #userid may not be set if the deposit was done by a script.
       {
               my $session = $plugin->{"session"};
               my $userid = $dataobj->get_value( "userid" );
               my $depositor_obj = new EPrints::DataObj::User($session, $userid);  #create a user object
               my $depositor = $depositor_obj->get_value( "username" );            #get the username
               if ($depositor =~ m/[\n" ,]/)                                       #Check for illegal CSV characters
               {
                       $depositor =~ s/"/""/g;                                     #escape quotes
                       $depositor = '"' . $depositor . '"';                        #delimit text
               }
               $username = $depositor;
       }
       else
       {
               $username = '"Depositor Unknown"';
       }
       return $username;
 }

The new output_list function uses a hash to accumulate the results:

 sub output_list
 {
        my( $plugin, %opts ) = @_;
        my %r = ();
 
        #Iterate over the list
        foreach my $dataobj ( $opts{list}->get_records )
        {
                my $username = $plugin->get_username( $dataobj );
                my $datestamp = '"' . $dataobj->get_value( "datestamp" ) . '"';
                if (defined $r{$username})                                        #if it's defined, increment and compare timestamps
                {
                        $r{$username}->{count} ++;
                        if ($r{$username}->{most_recent} lt $datestamp)
                        {
                                $r{$username}->{most_recent} = $datestamp;
                        }
                }
                else                                                              #if it's not defined, create a hash for this user's results
                {
                        $r{$username} = {count => 1, most_recent => $datestamp};
                }
        }
 
        # Construct the CSV and return it.
        my $csv = '"User Name","Number of Deposits","Most Recent Deposit"' . "\n";
        foreach my $username (sort keys %r)
        {
                $csv .= $username . "," . $r{$username}->{count} . "," . $r{$username}->{most_recent} . "\n";
        }
        return $csv;
 }

Note that because this plugin can take a single eprint as well as a list of eprints, you must have an output_dataobj function that does something sensible. However, bear in mind that search results are always a list, even if there's only one result.
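
The filehandle handling left out of the aggregating output_list might be added along the following lines. This is a standalone sketch, not plugin code: 'summarise_deposits' is a hypothetical function that takes the filehandle and a list of (username, datestamp) pairs directly, where the plugin would take them from $opts{fh}, get_username and get_value. It also assumes datestamps have the form "YYYY-MM-DD hh:mm:ss", which is what makes the string comparison with 'lt' chronological.

 use strict;
 use warnings;
 
 # Hypothetical helper: summarise [ username, datestamp ] pairs as CSV.
 sub summarise_deposits
 {
        my( $fh, @deposits ) = @_;
        my %r;
 
        foreach my $deposit ( @deposits )
        {
                my( $username, $datestamp ) = @{$deposit};
                $r{$username}->{count}++;
                #Assumed "YYYY-MM-DD hh:mm:ss" strings sort chronologically
                if( !defined $r{$username}->{most_recent} || $r{$username}->{most_recent} lt $datestamp )
                {
                        $r{$username}->{most_recent} = $datestamp;
                }
        }
 
        my $csv = '"User Name","Number of Deposits","Most Recent Deposit"' . "\n";
        foreach my $username ( sort keys %r )
        {
                $csv .= join( ',', $username, $r{$username}->{count}, '"' . $r{$username}->{most_recent} . '"' ) . "\n";
        }
 
        if( defined $fh )                         #same contract as before: print and return nothing
        {
                print {$fh} $csv;
                return;
        }
        return $csv;
 }
 
 print summarise_deposits( undef,
        [ 'alice', '2008-03-01 09:00:00' ],
        [ 'bob',   '2008-01-15 12:30:00' ],
        [ 'alice', '2008-02-01 17:45:00' ],
 );

With the made-up data shown, this prints the heading row followed by alice,2,"2008-03-01 09:00:00" and bob,1,"2008-01-15 12:30:00". Because the rows are only known once the whole list has been read, the CSV is built in full first and then either printed or returned.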