Difference between revisions of "Create Export Plugins"
Revision as of 13:48, 12 February 2007
Export plugins for anything but eprints are beyond the scope of this howto.
Imagine we want to create an export plugin that will take a group of eprints (or a single eprint) and output a CSV file listing who deposited the eprints and the dates on which they were deposited.
Essentials
The top of the plugin should look like this:
package EPrints::Plugin::Export::DepositorActivity;

use Unicode::String qw( utf8 );

use EPrints::Plugin::Export;
use EPrints::DataObj::User;

@ISA = ( "EPrints::Plugin::Export" );

use strict;

sub new
{
    my( $class, %params ) = @_;

    my $self = $class->SUPER::new( %params );

    $self->{name} = "Depositor Activity";
    $self->{accept} = [ 'list/eprint', 'dataobj/eprint' ];
    $self->{visible} = "all";
    $self->{suffix} = ".csv";
    $self->{mimetype} = "text/csv";

    return $self;
}
This will create a filter object, and set a number of configuration constants:
- name - The name of the filter
- accept - A list detailing what the filter will take as inputs. In this case, a list of eprints or a single eprint. It is possible to write filters for dataobj types 'eprint', 'user', 'subject', 'history', 'access' and '*' (all).
- visible - Who can see this filter. It's set to 'all' above so that anyone can use it. It could be set to 'staff' to only allow repository staff to use it. If set to 'API' then the filter is not available through the web interface.
- suffix - Appended to the URL to create a filename extension.
- mimetype - Should be set to the correct mime type for the output of the filter.
Note that 'name' and 'accept' are essential. These allow the filter to register itself with EPrints.
We will be extracting the username of the depositor, so we need to use 'EPrints::DataObj::User'.
Converting the dataobj
The 'output_dataobj' function takes a dataobj (in our case an eprint object) and returns a perl scalar which will be the output. We are going to extract some data from the dataobj using EPrints API calls.
Note that by convention, '$plugin' is used instead of '$self'.
sub output_dataobj
{
    my( $plugin, $dataobj ) = @_;

    my $r = "";
    if ($dataobj->exists_and_set("userid")) #userid may not be set if the deposit was done by a script.
    {
        my $session = $plugin->{"session"};
        my $userid = $dataobj->get_value( "userid" );

        my $depositor_obj = new EPrints::DataObj::User($session, $userid); #create a user object
        my $depositor = $depositor_obj->get_value( "username" ); #get the username
        if ($depositor =~ m/[\n" ,]/) #Check for illegal CSV characters
        {
            $depositor =~ s/"/""/g; #escape quotes
            $depositor = '"' . $depositor . '"'; #delimit text
        }
        $r .= $depositor;
    }
    else
    {
        $r .= '"Depositor Unknown"';
    }
    $r .= ',"' . $dataobj->get_value( "datestamp" ) . '"' . "\n"; #datestamp is always set, and contains a space so needs delimiting
    return $r;
}
Notes:
- Retrieving the username takes a little fancy footwork because the eprint object only contains the depositor's userid. We need to create a user object and get the username from that.
- We use '$dataobj->get_value' to retrieve metadata from the eprint (or user) objects.
- As we're outputting in CSV, we need to do a little normalisation.
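The normalisation rule used above can be tried outside EPrints. The following is a minimal sketch only: 'csv_field' is a hypothetical helper name, not part of the EPrints API, and it applies the same test and escaping as 'output_dataobj' (quote the value if it contains a newline, double quote, space or comma, and double any embedded quotes).

```perl
use strict;
use warnings;

# Hypothetical helper (not an EPrints function): normalise one value for CSV.
sub csv_field
{
    my( $value ) = @_;

    if( $value =~ m/[\n" ,]/ )          # same character check as output_dataobj
    {
        $value =~ s/"/""/g;             # escape embedded quotes by doubling them
        $value = '"' . $value . '"';    # delimit the text
    }
    return $value;
}

# A plain username is left alone; a datestamp contains a space, so it is quoted.
print join( ',', csv_field( 'jsmith' ), csv_field( '2007-02-12 13:48:00' ) ), "\n";
```

Keeping the rule in one function would mean that the username and the datestamp are treated the same way, rather than quoting the datestamp by hand.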
Put it in a Module
Put all this into a file called 'DepositorActivity.pm' and save the file into the 'eprints3/perl_lib/EPrints/Plugin/Export/' directory. Don't forget to add this to the bottom of the file:
1;
Adding Column Headings
The 'output_dataobj' function runs on a single eprint. If the plugin runs over a list of eprints (we've given it that capability), the default behaviour is to run 'output_dataobj' on every eprint in the list and concatenate the results.
The 'output_list' function is what handles lists. It takes the plugin itself ($plugin) and a hash (%opts) as arguments. The %opts hash contains the list under the key 'list', and may also contain a filehandle under the key 'fh'. When writing 'output_list', you need to check for the filehandle and, if it is present, print to it. If it is not present, return the results as a scalar.
Here is an output_list function that will add column headings to our CSV file.
sub output_list
{
    my( $plugin, %opts ) = @_;

    my $r = []; #array for results accumulation
    my $part;

    $part = '"User ID","Date Stamp"' . "\n"; #column headings
    if( defined $opts{fh} ) #write to file or accumulate headings
    {
        print {$opts{fh}} $part;
    }
    else
    {
        push @{$r}, $part;
    }

    foreach my $dataobj ( $opts{list}->get_records ) #Iterate over list
    {
        $part = $plugin->output_dataobj( $dataobj, %opts ); #call output_dataobj
        if( defined $opts{fh} ) #write to file or accumulate results
        {
            print {$opts{fh}} $part;
        }
        else
        {
            push @{$r}, $part;
        }
    }

    if( defined $opts{fh} ) #Don't return results if writing to file.
    {
        return;
    }

    return join( '', @{$r} );
}
The conditionals for printing to a file make the function look overly complex. Here it is if you ignore file handles (which you certainly shouldn't do):
sub output_list
{
    my( $plugin, %opts ) = @_;

    my $r = []; #array for results accumulation
    my $part;

    $part = '"User ID","Date Stamp"' . "\n"; #column headings
    push @{$r}, $part;

    foreach my $dataobj ( $opts{list}->get_records ) #Iterate over list
    {
        $part = $plugin->output_dataobj( $dataobj, %opts ); #call output_dataobj
        push @{$r}, $part;
    }

    return join( '', @{$r} );
}
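One possible way to keep the filehandle support without repeating the conditional is to wrap it in a pair of closures. This is only a sketch: 'make_emitter' is a hypothetical helper, not something EPrints provides, and the only assumption taken from the text above is that the filehandle arrives under the 'fh' key of %opts.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an "emit" closure and a "finish" closure.
sub make_emitter
{
    my( %opts ) = @_;
    my @parts;    # used only when there is no filehandle

    my $emit = sub {
        my( $part ) = @_;
        if( defined $opts{fh} ) { print {$opts{fh}} $part; }
        else { push @parts, $part; }
    };
    my $finish = sub {
        return if defined $opts{fh};    # nothing to return when writing to a file
        return join( '', @parts );
    };
    return( $emit, $finish );
}

# Without a filehandle the output comes back as a scalar.
my( $emit, $finish ) = make_emitter();
$emit->( '"User ID","Date Stamp"' . "\n" );
$emit->( 'jsmith,"2007-02-12 13:48:00"' . "\n" );
print $finish->();

# With a filehandle (an in-memory one here) the output is printed to it instead.
my $buffer = '';
open( my $fh, '>', \$buffer ) or die "cannot open in-memory handle: $!";
( $emit, $finish ) = make_emitter( fh => $fh );
$emit->( "row\n" );
close( $fh );
```

Inside 'output_list' the same pattern would call the emit closure once for the headings and once per eprint, then return whatever the finish closure returns.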
More Complex List Processing
'output_list' can do more than simply concatenate the results of 'output_dataobj'. For example, the plugin above outputs a table containing one row for every eprint, showing the depositor and the deposit date. This could be made more useful by changing the table so that it contains one row for each user who deposited an eprint, with three columns: username, number of deposits, and datestamp of the latest deposit.
For readability, output_list is shown without filehandle handling. If this were a real filter, IT WOULD BE NECESSARY!
Firstly, an auxiliary function that returns a CSV-normalised username. It's similar to the 'output_dataobj' function above, so should be easy to understand.
sub get_username
{
    my( $plugin, $dataobj ) = @_;

    my $username;
    if ($dataobj->exists_and_set("userid")) #userid may not be set if the deposit was done by a script.
    {
        my $session = $plugin->{"session"};
        my $userid = $dataobj->get_value( "userid" );

        my $depositor_obj = new EPrints::DataObj::User($session, $userid); #create a user object
        my $depositor = $depositor_obj->get_value( "username" ); #get the username
        if ($depositor =~ m/[\n" ,]/) #Check for illegal CSV characters
        {
            $depositor =~ s/"/""/g; #escape quotes
            $depositor = '"' . $depositor . '"'; #delimit text
        }
        $username = $depositor;
    }
    else
    {
        $username = '"Depositor Unknown"';
    }
    return $username;
}
The new 'output_list' function uses a hash to accumulate the results:
sub output_list
{
    my( $plugin, %opts ) = @_;
    my %r = ();

    #Iterate over the list
    foreach my $dataobj ( $opts{list}->get_records )
    {
        my $username = $plugin->get_username( $dataobj );
        my $datestamp = '"' . $dataobj->get_value( "datestamp" ) . '"';
        if (defined $r{$username}) #if it's defined, increment and compare timestamps
        {
            $r{$username}->{count} ++;
            if ($r{$username}->{most_recent} lt $datestamp)
            {
                $r{$username}->{most_recent} = $datestamp;
            }
        }
        else #if it's not defined, create a hash for this user's results
        {
            $r{$username} = {count => 1, most_recent => $datestamp};
        }
    }

    # Construct the CSV and return it.
    my $csv = '"User Name","Number of Deposits","Most Recent Deposit"' . "\n";
    foreach my $username (sort keys %r)
    {
        $csv .= $username . "," . $r{$username}->{count} . "," . $r{$username}->{most_recent} . "\n";
    }
    return $csv;
}
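The closing step of the function above only returns a scalar. As a rough sketch of how that step might honour a filehandle, the hypothetical helper 'write_summary' below (not an EPrints function) takes a reference to the accumulated hash plus %opts and either prints to $opts{fh} or returns the CSV. The sample data is made up purely so the sketch runs outside a repository.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the accumulated hash into CSV, honouring $opts{fh}.
sub write_summary
{
    my( $r, %opts ) = @_;

    my @lines = ( '"User Name","Number of Deposits","Most Recent Deposit"' . "\n" );
    foreach my $username ( sort keys %{$r} )
    {
        push @lines, join( ',',
            $username,
            $r->{$username}->{count},
            $r->{$username}->{most_recent} ) . "\n";
    }

    if( defined $opts{fh} )    # write to the filehandle and return nothing
    {
        print {$opts{fh}} @lines;
        return;
    }
    return join( '', @lines );
}

# Made-up sample data in the shape built by the loop in output_list.
my %r = (
    'asmith' => { count => 2, most_recent => '"2007-02-10 09:15:00"' },
    '"Depositor Unknown"' => { count => 1, most_recent => '"2007-01-03 17:40:12"' },
);
print write_summary( \%r );
```

In the plugin itself, 'output_list' would end with a call such as 'return write_summary( \%r, %opts );' in place of the CSV construction shown above.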
Note that because this plugin accepts a single eprint as well as a list of eprints, you must still provide an 'output_dataobj' function that does something sensible.