EPrints Documentation - User contributions [en-gb]

Contribute: Plugins/ImportPluginsAWS

2007-09-28T19:16:02Z

Tom: /* In More Detail */

= Import Plugin Tutorial 2: Amazon Web Services =

In the [[Contribute:_Plugins/ImportPluginsCSV|last tutorial]] we created an import plugin that took data which needed very little modification to import into the repository. The column names in the [http://en.wikipedia.org/wiki/Comma-Separated_Values CSV] file matched the names of metadata fields present in the repository. In this tutorial we'll look at importing data that needs some modification before it can be imported, needs more error checking and is obtained in a different way.

We will use Amazon's E-Commerce Web Service to import books from their website into our repository, given a list of [http://en.wikipedia.org/wiki/ASIN ASINs].

We will be accessing the service using a [http://en.wikipedia.org/wiki/REST REST] approach, communicating with the server using URL parameters and retrieving an XML document in response to our request. It is also possible to access their services using [http://en.wikipedia.org/wiki/SOAP SOAP], but that will not be discussed here.

= Before You Start =

== Amazon Web Services ==
To use Amazon's web services you must first sign up for an account [http://aws.amazon.com here]. Their site has extensive documentation on the services they offer, as well as example programs, including some written in Perl.

== Required Modules ==
To prepare for this tutorial, make sure the [http://search.cpan.org/~gaas/libwww-perl-5.805/lib/LWP/UserAgent.pm LWP::UserAgent] module is installed. Running the following command as root, or with sudo, should install it:

<pre>
cpan LWP::UserAgent
</pre>

= AWS.pm =
The code below should be placed in a file called AWS.pm in the directory created previously, and MyPlugins should be changed to the name of that directory.

<pre>
package EPrints::Plugin::Import::MyPlugins::AWS;

use EPrints::Plugin::Import::TextFile;
use strict;
use URI::Escape;

our @ISA = ('EPrints::Plugin::Import::TextFile');

my $endpoint = 'http://ecs.amazonaws.co.uk/onca/xml';
my $accesskey = '<YOURAMAZONWSKEY>';
my $service = 'AWSECommerceService';
my $operation = 'ItemLookup';
my $version = '2007-07-16';
my $responsegroup = 'Large';

sub new
{
my( $class, %params ) = @_;
my $self = $class->SUPER::new( %params );

$self->{name} = 'AWS';
$self->{visible} = 'all';
$self->{produce} = [ 'list/eprint' , 'dataobj/eprint'];

my $rc = EPrints::Utils::require_if_exists('LWP::UserAgent');
unless ($rc)
{
$self->{visible} = '';
$self->{error} = 'Unable to load required module LWP::UserAgent';
}

return $self;
}

sub input_fh
{
my( $plugin, %opts ) = @_;
my @ids;
my $fh = $opts{fh};

my @records = <$fh>;
foreach my $input_data (@records)
{
my $epdata = $plugin->convert_input($input_data);
next unless defined $epdata;

my $dataobj = $plugin->epdata_to_dataobj($opts{dataset},$epdata);
if( defined $dataobj )
{
push @ids, $dataobj->get_id;
}
}

return EPrints::List->new(
dataset => $opts{dataset},
session => $plugin->{session},
ids=>\@ids );
}

sub convert_input
{
my ($plugin, $input) = @_;
my %output = ();

# Extract the ASIN; skip lines (such as blank ones) that contain none
return undef unless $input =~ m/([A-Za-z0-9]+)/;
$input = $1;

my $request =
"$endpoint?".
"Service=$service&".
"AWSAccessKeyId=$accesskey&".
"Operation=$operation&".
"ItemId=$input&".
"Version=$version&".
"ResponseGroup=$responsegroup";

my $ua = LWP::UserAgent->new;
$ua->timeout(30);
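my $response = $ua->get($request);

# Give up on this record if the HTTP request itself failed
unless ($response->is_success)
{
$plugin->error('AWS request failed: '.$response->status_line);
return undef;
}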

my $dom = EPrints::XML::parse_xml_string($response->content);

my $rep =
$dom->getElementsByTagName('Items')->item(0)->
getElementsByTagName('Request')->item(0);

my $reptext =
EPrints::Utils::tree_to_utf8($rep->getElementsByTagName('IsValid')->item(0));

unless ($reptext eq 'True')
{
$plugin->error('Invalid AWS Request');
return undef;
}

#Get Item Object
my $item =
$dom->getElementsByTagName('Items')->item(0)->
getElementsByTagName('Item')->item(0);

unless (defined $item)
{
$plugin->error('No Item element found');
return undef;
}

my $attr = $item->getElementsByTagName('ItemAttributes')->item(0);

my $pg = EPrints::Utils::tree_to_utf8($attr->getElementsByTagName('ProductGroup')->item(0));

unless ($pg eq 'Book')
{
$plugin->error('Product is not a book.');
return undef;
}

$output{type} = 'book';
$output{refereed} = 'FALSE';
$output{ispublished} = 'pub';

my $title = $attr->getElementsByTagName('Title')->item(0);
$output{title} = EPrints::Utils::tree_to_utf8($title);

my $url = $item->getElementsByTagName('DetailPageURL')->item(0);
$output{official_url} = uri_unescape(EPrints::Utils::tree_to_utf8($url));

my $isbn = $attr->getElementsByTagName('ISBN')->item(0);
if (defined $isbn)
{
$output{isbn} = EPrints::Utils::tree_to_utf8($isbn);
}

my $pages = $attr->getElementsByTagName('NumberOfPages')->item(0);
if (defined $pages)
{
$output{pages} = EPrints::Utils::tree_to_utf8($pages);
}

my $publisher = $attr->getElementsByTagName('Publisher')->item(0);
if (defined $publisher)
{
$output{publisher} = EPrints::Utils::tree_to_utf8($publisher);
}

my $pubdate = $attr->getElementsByTagName('PublicationDate')->item(0);
if (defined $pubdate)
{
$output{date} = EPrints::Utils::tree_to_utf8($pubdate);
}

return \%output;
}

1;
</pre>

= In More Detail =
We will use the URI::Escape module in this plugin. As it is included with EPrints, we don't need to check whether it exists first.
<pre>
use URI::Escape;
</pre>

Here we set up values for the parameters that will be part of our web service requests. The endpoint variable determines which server the request is sent to. Here we have used the UK server, but by changing the [http://en.wikipedia.org/wiki/TLD TLD] we can use the US, Canadian, German or French servers.

The accesskey variable stores the access key you received when you signed up with Amazon. Use the normal access key, not the secret one.

We use the ItemLookup operation of the AWSECommerceService, with the 2007-07-16 version of the API. Other operations allow searching for items, but here we want to look up specific products. Finally, the responsegroup variable determines the amount and nature of the information returned; we select "Large", which gives us more information about the item.

<pre>
my $endpoint = 'http://ecs.amazonaws.co.uk/onca/xml';
my $accesskey = '<YOURAMAZONWSKEY>';
my $service = 'AWSECommerceService';
my $operation = 'ItemLookup';
my $version = '2007-07-16';
my $responsegroup = 'Large';
</pre>
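Following the "change the TLD" pattern described above, a small lookup table can pick the endpoint for a given locale. This is only a sketch: the helper name endpoint_for and the locale codes are hypothetical, and it assumes the US server uses the .com TLD. Check Amazon's documentation for the endpoints that are currently valid.

```perl
use strict;
use warnings;

# Hypothetical mapping from a locale code to the TLD of the matching
# Amazon server, following the "change the TLD" pattern in the text.
my %tld_for_locale = (
    uk => 'co.uk',
    us => 'com',    # assumption: the US server uses .com
    ca => 'ca',
    de => 'de',
    fr => 'fr',
);

# Build the endpoint URL for a locale, or return undef if it is unknown.
sub endpoint_for
{
    my ($locale) = @_;
    my $tld = $tld_for_locale{lc $locale};
    return undef unless defined $tld;
    return "http://ecs.amazonaws.$tld/onca/xml";
}

print endpoint_for('uk'), "\n";    # prints http://ecs.amazonaws.co.uk/onca/xml
```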

== Constructor ==
The constructor is similar to the one used for the [[Contribute:_Plugins/ImportPluginsCSV|CSV plugin]], except that this one will also import individual eprints, given an [http://en.wikipedia.org/wiki/ASIN ASIN].
<pre>
$self->{produce} = [ 'list/eprint' , 'dataobj/eprint'];
</pre>

Just as we loaded Text::CSV in the [[Contribute:_Plugins/ImportPluginsCSV|last tutorial]], here we load LWP::UserAgent, which will be used to make requests to the web service. If the module can't be loaded, the plugin is hidden and an error message is recorded.
<pre>
my $rc = EPrints::Utils::require_if_exists('LWP::UserAgent');
unless ($rc)
{
$self->{visible} = '';
$self->{error} = 'Unable to load required module LWP::UserAgent';
}
</pre>
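To see what this kind of check amounts to outside EPrints, the sketch below uses Perl's own eval { require ... } idiom. It is an illustrative stand-in, not EPrints' implementation of require_if_exists, and the name module_available is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a "require if it exists" check: try to load
# a module at run time and report whether that worked, without dying.
sub module_available
{
    my ($module) = @_;
    # Turn Foo::Bar into Foo/Bar.pm so require treats it as a file name,
    # which avoids needing a string eval.
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(LWP::UserAgent No::Such::Module))
{
    printf "%s: %s\n", $module,
        module_available($module) ? 'available' : 'missing';
}
```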

== Input ==
=== input_fh ===
This method is similar to the one used in the [[Contribute:_Plugins/ImportPluginsCSV|CSV plugin]], but doesn't have to do quite so much work.

First we create the array to hold our imported eprint ids.
<pre>
my @ids;
</pre>

Next we read all the lines in the supplied file handle into our records array.
<pre>
my $fh = $opts{fh};

my @records = <$fh>;
</pre>

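Then we iterate over each record: convert_input turns it into a hash of eprint data (or returns undef, in which case we skip the record), epdata_to_dataobj imports it into our repository, and we add the new eprint's id to our array.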
<pre>
foreach my $input_data (@records)
{
my $epdata = $plugin->convert_input($input_data);
next unless defined $epdata;

my $dataobj = $plugin->epdata_to_dataobj($opts{dataset},$epdata);
if( defined $dataobj )
{
push @ids, $dataobj->get_id;
}
}
</pre>

Then we return a List object of the items imported.
<pre>
return EPrints::List->new(
dataset => $opts{dataset},
session => $plugin->{session},
ids=>\@ids );
</pre>
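The same read, convert and collect pattern can be tried without a repository by reading from an in-memory filehandle. In this sketch collect_records and the converter passed to it are hypothetical: the converter plays the role of convert_input and, as in the plugin, returning undef makes the loop skip that line.

```perl
use strict;
use warnings;

# Hypothetical mirror of input_fh: read every line from $fh, run the
# converter on it and keep only the results that are defined.
sub collect_records
{
    my ($fh, $convert) = @_;
    my @results;
    my @records = <$fh>;
    foreach my $input_data (@records)
    {
        my $data = $convert->($input_data);
        next unless defined $data;
        push @results, $data;
    }
    return @results;
}

# An in-memory "file" with a blank line between two ASINs.
my $text = "0946719616\n\n0297843877\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @asins = collect_records($fh, sub {
    my ($line) = @_;
    return $line =~ m/([A-Za-z0-9]+)/ ? $1 : undef;
});
close $fh;

print scalar(@asins), " records\n";    # prints "2 records"
```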

=== convert_input ===

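[http://en.wikipedia.org/wiki/ASIN ASINs] are 10-character alphanumeric strings which identify a product. Each input line should contain one, possibly surrounded by whitespace or other non-alphanumeric characters, so we extract the first run of alphanumeric characters. Lines that contain no ASIN at all, such as blank lines, are skipped by returning undef.
<pre>
# Extract the ASIN; skip lines (such as blank ones) that contain none
return undef unless $input =~ m/([A-Za-z0-9]+)/;
$input = $1;
</pre>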
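It is worth guarding this match because Perl only updates $1 when a match succeeds; after a failed match, $1 still holds whatever an earlier successful match left behind (if anything), so it cannot be trusted as a fresh ASIN. The hypothetical helper extract_asin below isolates the step so it can be checked on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first run of alphanumeric characters
# in a line (the ASIN), or undef if the line contains none.
sub extract_asin
{
    my ($line) = @_;
    return undef unless defined $line && $line =~ m/([A-Za-z0-9]+)/;
    return $1;
}

for my $line ("  0946719616\n", "\n", "0297843877,\n")
{
    my $asin = extract_asin($line);
    my $message = defined $asin ? "ASIN: $asin\n" : "no ASIN on this line\n";
    print $message;
}
```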

We form the request from the variables we created earlier and the [http://en.wikipedia.org/wiki/ASIN ASIN] we have just obtained.
<pre>
my $request =
"$endpoint?".
"Service=$service&".
"AWSAccessKeyId=$accesskey&".
"Operation=$operation&".
"ItemId=$input&".
"Version=$version&".
"ResponseGroup=$responsegroup";
</pre>
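Simple string concatenation is enough here because none of these values contain spaces or reserved URL characters. If you add parameters whose values might, it is safer to percent-encode them. The following core-Perl sketch uses the hypothetical helpers escape_value and build_request; inside the plugin you could use uri_escape from URI::Escape instead.

```perl
use strict;
use warnings;

# Percent-encode everything except the RFC 3986 unreserved characters.
# Assumes a byte string; encode text to UTF-8 first if it may be non-ASCII.
sub escape_value
{
    my ($value) = @_;
    $value =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $value;
}

# Hypothetical helper: join an endpoint and a hash of parameters into a
# request URL. Keys are sorted so the output is predictable.
sub build_request
{
    my ($endpoint, $params) = @_;
    my $query = join '&',
        map { escape_value($_) . '=' . escape_value($params->{$_}) }
        sort keys %$params;
    return "$endpoint?$query";
}

my $url = build_request('http://ecs.amazonaws.co.uk/onca/xml', {
    Service   => 'AWSECommerceService',
    Operation => 'ItemLookup',
    ItemId    => '0946719616',
});
print "$url\n";
```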

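We now send the request by creating a new LWP::UserAgent object, setting its timeout to 30 seconds and performing the request using HTTP GET. If the request fails (for example, because it timed out or the server returned an error status), we report an error and return undef so that the record is skipped.
<pre>
my $ua = LWP::UserAgent->new;
$ua->timeout(30);
my $response = $ua->get($request);

unless ($response->is_success)
{
$plugin->error('AWS request failed: '.$response->status_line);
return undef;
}
</pre>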

We then create a [http://en.wikipedia.org/wiki/Document_Object_Model DOM] object from the XML document returned.
<pre>
my $dom = EPrints::XML::parse_xml_string($response->content);
</pre>

The root element of each response contains an Items element, which in turn contains a Request element. The Request element contains an IsValid element whose value is True or False, depending on whether a valid request was made.

Here we obtain the Request element and check that its IsValid element contains the value True. If it doesn't, we call the error method and return undef.
<pre>
my $rep =
$dom->getElementsByTagName("Items")->item(0)->
getElementsByTagName('Request')->item(0);

my $reptext =
EPrints::Utils::tree_to_utf8($rep->getElementsByTagName('IsValid')->item(0));

unless ($reptext eq 'True')
{
$plugin->error('Invalid AWS Request');
return undef;
}
</pre>

The product found by the ItemLookup operation is contained in an Item element within the Items element. Here we attempt to get that element; if we can't, we report an error and return undef.
<pre>
my $item =
$dom->getElementsByTagName('Items')->item(0)->
getElementsByTagName('Item')->item(0);

unless (defined $item)
{
$plugin->error('No Item element found');
return undef;
}
</pre>

Each item contains an ItemAttributes element which contains most of the metadata about an item.
<pre>
my $attr = $item->getElementsByTagName('ItemAttributes')->item(0);
</pre>

For some specialised repositories it might make sense to import DVDs, computer games and electronic equipment, but we're just going to deal with books. The ProductGroup element within the ItemAttributes element tells us what sort of item we're dealing with; we're looking for the value 'Book'.
<pre>
my $pg = EPrints::Utils::tree_to_utf8($attr->getElementsByTagName('ProductGroup')->item(0));

unless ($pg eq 'Book')
{
$plugin->error('Product is not a book.');
return undef;
}
</pre>
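If a specialised repository did want other kinds of product, one option is to replace the single comparison with a lookup table from ProductGroup values to eprint types. In this sketch the table and the type_for_group helper are hypothetical, and only the Book to book mapping comes from this tutorial; any further entries would depend on the types your repository defines.

```perl
use strict;
use warnings;

# Hypothetical table of accepted ProductGroup values and the eprint type
# each should become. Only Book => book comes from the tutorial; add
# further entries to match the types your repository defines.
my %eprint_type_for = (
    Book => 'book',
);

# Return the eprint type for a ProductGroup, or undef if we don't import it.
sub type_for_group
{
    my ($group) = @_;
    return undef unless defined $group;
    return $eprint_type_for{$group};
}

foreach my $group ('Book', 'DVD')
{
    my $type = type_for_group($group);
    print "$group: ", (defined $type ? $type : 'skipped'), "\n";
}
```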

Here we set a few fields without consulting the imported data. We know this is a book, so we set the type. We assume that it has not been refereed. We also assume it has been published, because we can buy it.
<pre>
$output{type} = 'book';
$output{refereed} = 'FALSE';
$output{ispublished} = 'pub';
</pre>

We get and set the title.
<pre>
my $title = $attr->getElementsByTagName('Title')->item(0);
$output{title} = EPrints::Utils::tree_to_utf8($title);
</pre>

Here we set the official URL to the Amazon product page. We have to do a bit of extra work using the uri_unescape method from the URI::Escape package to convert URI escape codes into characters.
<pre>
my $url = $item->getElementsByTagName('DetailPageURL')->item(0);
$output{official_url} = uri_unescape(EPrints::Utils::tree_to_utf8($url));
</pre>
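To see what this conversion does, here is a minimal core-Perl percent-decoder that performs the same substitution as uri_unescape. The name percent_decode is hypothetical, and the escaped URL is a made-up example rather than a real DetailPageURL value.

```perl
use strict;
use warnings;

# Replace each %XX escape with the character it encodes, which is the
# job uri_unescape does for the DetailPageURL value.
sub percent_decode
{
    my ($text) = @_;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

my $escaped = 'http%3A%2F%2Fwww.amazon.co.uk%2Fdp%2F0946719616';
print percent_decode($escaped), "\n";
# prints http://www.amazon.co.uk/dp/0946719616
```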

We can set the [http://en.wikipedia.org/wiki/ISBN ISBN]. Note that the [http://en.wikipedia.org/wiki/ISBN ISBN] is often the same as the [http://en.wikipedia.org/wiki/ASIN ASIN].
<pre>
my $isbn = $attr->getElementsByTagName('ISBN')->item(0);
if (defined $isbn)
{
$output{isbn} = EPrints::Utils::tree_to_utf8($isbn);
}
</pre>

We can set the number of pages.
<pre>
my $pages = $attr->getElementsByTagName('NumberOfPages')->item(0);
if (defined $pages)
{
$output{pages} = EPrints::Utils::tree_to_utf8($pages);
}
</pre>

We can set the publisher.
<pre>
my $publisher = $attr->getElementsByTagName('Publisher')->item(0);
if (defined $publisher)
{
$output{publisher} = EPrints::Utils::tree_to_utf8($publisher);
}
</pre>

We can set the publication date and finally return our output hash.
<pre>
my $pubdate = $attr->getElementsByTagName('PublicationDate')->item(0);
if (defined $pubdate)
{
$output{date} = EPrints::Utils::tree_to_utf8($pubdate);
}

return \%output;
</pre>
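The four optional fields above all follow the same pattern: look up an element and copy its text into the output hash only if the element exists. If you add more fields, a mapping table keeps this in one place. So that it runs without EPrints, the sketch below works on a plain hash of element values (a hypothetical stand-in for the text the DOM lookups return); the element and field names are the ones used in this tutorial, and the sample publisher is made up.

```perl
use strict;
use warnings;

# ItemAttributes element name => eprint field name, as used in convert_input.
my @optional_fields = (
    [ ISBN            => 'isbn' ],
    [ NumberOfPages   => 'pages' ],
    [ Publisher       => 'publisher' ],
    [ PublicationDate => 'date' ],
);

# Copy each element value that is present into a new hash of eprint data.
# $attrs stands in for the text of the elements found in ItemAttributes.
sub map_optional_fields
{
    my ($attrs) = @_;
    my %output;
    foreach my $pair (@optional_fields)
    {
        my ($element, $field) = @$pair;
        $output{$field} = $attrs->{$element} if defined $attrs->{$element};
    }
    return \%output;
}

my $out = map_optional_fields({ ISBN => '0946719616', Publisher => 'Example Press' });
print join(', ', map { "$_=$out->{$_}" } sort keys %$out), "\n";
```

Inside convert_input, the same table could drive the DOM lookups: for each pair, fetch $attr->getElementsByTagName($element)->item(0) and, if it is defined, store its EPrints::Utils::tree_to_utf8 text in $output{$field}.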

= Testing Your Plugin =
After restarting your web server, go to the Import Items screen from the Manage Deposits screen. If you can't find it, make sure you're logged in.

We'll start by collecting a few ASINs. Go to [http://www.amazon.co.uk Amazon] and pick a few books. The URL for each product page is in the form http://www.amazon.co.uk/Combination-of-title-and-author/dp/ASIN... Collect a few different ASINs.
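If you have collected product page URLs rather than bare ASINs, you can pull the ASIN out of the /dp/ part of each URL before pasting them in. This sketch assumes the URL form shown above and that ASINs are 10 alphanumeric characters; asin_from_url is a hypothetical helper and the URLs are examples.

```perl
use strict;
use warnings;

# Return the 10-character ASIN that follows /dp/ in an Amazon product URL,
# or undef if the URL doesn't have that form.
sub asin_from_url
{
    my ($url) = @_;
    return $url =~ m{/dp/([A-Za-z0-9]{10})(?:[/?]|$)} ? $1 : undef;
}

my @urls = (
    'http://www.amazon.co.uk/Combination-of-title-and-author/dp/0946719616',
    'http://www.amazon.co.uk/gp/help/',
);

# Print one ASIN per line, ready to paste into the import box.
print "$_\n" for grep { defined } map { asin_from_url($_) } @urls;
```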

Now we'll demonstrate importing from Amazon with a few sample ASINs.
Type this into the "Cut and Paste Records" box:
<pre>
0946719616
0297843877
</pre>

Select "AWS" from the "Select import format" drop-down menu and click "Test Run + Import". You should end up at the Manage Deposits screen, which should display the message "Import completed: 2 item(s) imported.".

Contribute: Plugins/ImportPluginsAWS

2007-09-28T19:10:29Z

Tom: /* AWS.pm */

= Import Plugin Tutorial 2: Amazon Web Services =

In the [[Contribute:_Plugins/ImportPluginsCSV|last tutorial]] we created an import plugin that took data which needed very little modification to import into the repository. The column names in the [http://en.wikipedia.org/wiki/Comma-Separated_Values CSV] file matched the names of metadata fields present in the repository. In this tutorial we'll look at importing data that needs some modification before it can be imported, needs more error checking and is obtained in a different way.

We will use Amazon's E-Commerce Web Service to import books from their website into our repository, given a list of [http://en.wikipedia.org/wiki/ASIN ASINs].

We will be accessing the service using a [http://en.wikipedia.org/wiki/REST REST] approach, communicating with the server using URL parameters and retrieving an XML document in response to our request. It is also possible to access their services using [http://en.wikipedia.org/wiki/SOAP SOAP], but that will not be discussed here.

= Before You Start =

== Amazon Web Services ==
To use Amazon's web services you must first sign up for an account [http://aws.amazon.com here]. Their site has extensive documentation on the services they offer, as well as example programs, including some written in Perl.

== Required Modules ==
To prepare for this tutorial, make sure the [http://search.cpan.org/~gaas/libwww-perl-5.805/lib/LWP/UserAgent.pm LWP::UserAgent] module is installed. Running the following command as root, or with sudo, should install it:

<pre>
cpan LWP::UserAgent
</pre>

= AWS.pm =
The code below should be placed in a file called AWS.pm in the directory created previously, and MyPlugins should be changed to the name of that directory.

<pre>
package EPrints::Plugin::Import::MyPlugins::AWS;

use EPrints::Plugin::Import::TextFile;
use strict;
use URI::Escape;

our @ISA = ('EPrints::Plugin::Import::TextFile');

my $endpoint = 'http://ecs.amazonaws.co.uk/onca/xml';
my $accesskey = '<YOURAMAZONWSKEY>';
my $service = 'AWSECommerceService';
my $operation = 'ItemLookup';
my $version = '2007-07-16';
my $responsegroup = 'Large';

sub new
{
my( $class, %params ) = @_;
my $self = $class->SUPER::new( %params );

$self->{name} = 'AWS';
$self->{visible} = 'all';
$self->{produce} = [ 'list/eprint' , 'dataobj/eprint'];

my $rc = EPrints::Utils::require_if_exists('LWP::UserAgent');
unless ($rc)
{
$self->{visible} = '';
$self->{error} = 'Unable to load required module LWP::UserAgent';
}

return $self;
}

sub input_fh
{
my( $plugin, %opts ) = @_;
my @ids;
my $fh = $opts{fh};

my @records = <$fh>;
foreach my $input_data (@records)
{
my $epdata = $plugin->convert_input($input_data);
next unless defined $epdata;

my $dataobj = $plugin->epdata_to_dataobj($opts{dataset},$epdata);
if( defined $dataobj )
{
push @ids, $dataobj->get_id;
}
}

return EPrints::List->new(
dataset => $opts{dataset},
session => $plugin->{session},
ids=>\@ids );
}

sub convert_input
{
my ($plugin, $input) = @_;
my %output = ();

# Extract the ASIN; skip lines (such as blank ones) that contain none
return undef unless $input =~ m/([A-Za-z0-9]+)/;
$input = $1;

my $request =
"$endpoint?".
"Service=$service&".
"AWSAccessKeyId=$accesskey&".
"Operation=$operation&".
"ItemId=$input&".
"Version=$version&".
"ResponseGroup=$responsegroup";

my $ua = LWP::UserAgent->new;
$ua->timeout(30);
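my $response = $ua->get($request);

# Give up on this record if the HTTP request itself failed
unless ($response->is_success)
{
$plugin->error('AWS request failed: '.$response->status_line);
return undef;
}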

my $dom = EPrints::XML::parse_xml_string($response->content);

my $rep =
$dom->getElementsByTagName('Items')->item(0)->
getElementsByTagName('Request')->item(0);

my $reptext =
EPrints::Utils::tree_to_utf8($rep->getElementsByTagName('IsValid')->item(0));

unless ($reptext eq 'True')
{
$plugin->error('Invalid AWS Request');
return undef;
}

#Get Item Object
my $item =
$dom->getElementsByTagName('Items')->item(0)->
getElementsByTagName('Item')->item(0);

unless (defined $item)
{
$plugin->error('No Item element found');
return undef;
}

my $attr = $item->getElementsByTagName('ItemAttributes')->item(0);

my $pg = EPrints::Utils::tree_to_utf8($attr->getElementsByTagName('ProductGroup')->item(0));

unless ($pg eq 'Book')
{
$plugin->error('Product is not a book.');
return undef;
}

$output{type} = 'book';
$output{refereed} = 'FALSE';
$output{ispublished} = 'pub';

my $title = $attr->getElementsByTagName('Title')->item(0);
$output{title} = EPrints::Utils::tree_to_utf8($title);

my $url = $item->getElementsByTagName('DetailPageURL')->item(0);
$output{official_url} = uri_unescape(EPrints::Utils::tree_to_utf8($url));

my $isbn = $attr->getElementsByTagName('ISBN')->item(0);
if (defined $isbn)
{
$output{isbn} = EPrints::Utils::tree_to_utf8($isbn);
}

my $pages = $attr->getElementsByTagName('NumberOfPages')->item(0);
if (defined $pages)
{
$output{pages} = EPrints::Utils::tree_to_utf8($pages);
}

my $publisher = $attr->getElementsByTagName('Publisher')->item(0);
if (defined $publisher)
{
$output{publisher} = EPrints::Utils::tree_to_utf8($publisher);
}

my $pubdate = $attr->getElementsByTagName('PublicationDate')->item(0);
if (defined $pubdate)
{
$output{date} = EPrints::Utils::tree_to_utf8($pubdate);
}

return \%output;
}

1;
</pre>

= In More Detail =
We will use the URI::Escape module in this plugin. As it is included with EPrints, we don't need to check whether it exists first.
<pre>
use URI::Escape;
</pre>

Here we set up values for the parameters that will be part of our web service requests. The endpoint variable determines which server the request is sent to. Here we have used the UK server, but by changing the TLD we can use the US, Canadian, German or French servers.

The accesskey variable stores the access key you received when you signed up with Amazon. Use the normal access key, not the secret one.

Here we use the ItemLookup operation of the AWSECommerceService with the 2007-07-16 version of the service API. Other operations allow searching for items, but here we want to look up specific products. Finally, the responsegroup variable determines the amount and nature of the information returned; we select "Large", which gives a lot of information about the item.

<pre>
my $endpoint = 'http://ecs.amazonaws.co.uk/onca/xml';
my $accesskey = '<YOURAMAZONWSKEY>';
my $service = 'AWSECommerceService';
my $operation = 'ItemLookup';
my $version = '2007-07-16';
my $responsegroup = 'Large';
</pre>
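Because only the top-level domain differs between the regional servers, the endpoint could also be derived from a TLD rather than written out in full. The sketch below assumes every regional server follows the same URL pattern as the UK one shown above; endpoint_for is a hypothetical helper, not part of the plugin.

<pre>
use strict;
use warnings;

# Hypothetical helper: build the ECS endpoint URL for a regional server
# by substituting its top-level domain (assumes all servers share this pattern).
sub endpoint_for
{
	my( $tld ) = @_;
	die "No TLD given\n" unless defined $tld && length $tld;
	return "http://ecs.amazonaws.$tld/onca/xml";
}

my $endpoint = endpoint_for('co.uk');
print "$endpoint\n";    # prints http://ecs.amazonaws.co.uk/onca/xml
</pre>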

== Constructor ==
The constructor is similar to the one used for the CSV plugin, except that this plugin can also import an individual eprint given an ASIN, so 'dataobj/eprint' is added to the produce list alongside 'list/eprint'.
<pre>
$self->{produce} = [ 'list/eprint' , 'dataobj/eprint'];
</pre>

As with Text::CSV in the last tutorial, we load LWP::UserAgent, which will be used to make requests to the web service, with EPrints::Utils::require_if_exists, and hide the plugin if the module is not installed.
<pre>
my $rc = EPrints::Utils::require_if_exists('LWP::UserAgent');
unless ($rc)
{
$self->{visible} = '';
$self->{error} = 'Unable to load required module LWP::UserAgent';
}
</pre>

== Input ==
=== input_fh ===
This method is similar to the one used in the CSV plugin, but doesn't have to do quite so much work.

First we create the array to hold our imported eprint ids.
<pre>
my @ids;
</pre>

Next we read all the lines in the supplied file handle into our records array.
<pre>
my $fh = $opts{fh};

my @records = <$fh>;
</pre>

Then we iterate over each record, running convert_input on it, importing it into our repository and adding the id to our array.
<pre>
foreach my $input_data (@records)
{
my $epdata = $plugin->convert_input($input_data);
next unless defined $epdata;

my $dataobj = $plugin->epdata_to_dataobj($opts{dataset},$epdata);
if( defined $dataobj )
{
push @ids, $dataobj->get_id;
}
}
</pre>
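The loop above passes every line to convert_input, including blank ones such as a trailing empty line in the paste box, each of which will then be rejected. A minimal sketch of filtering those out first, assuming one ASIN per non-blank line; read_records is a hypothetical helper, and an in-memory filehandle stands in for $opts{fh}.

<pre>
use strict;
use warnings;

# Hypothetical helper: read every line from a filehandle, strip the
# surrounding whitespace (including the line ending) and drop blank lines.
sub read_records
{
	my( $fh ) = @_;
	my @records;
	while( my $line = <$fh> )
	{
		$line =~ s/^\s+|\s+$//g;
		push @records, $line if length $line;
	}
	return @records;
}

# An in-memory filehandle stands in for the $opts{fh} passed to input_fh.
my $pasted = "0946719616\n\n  0297843877  \n";
open( my $fh, '<', \$pasted ) or die "Cannot open in-memory file: $!";
my @records = read_records($fh);
print join( ',', @records ), "\n";    # prints 0946719616,0297843877
</pre>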

Then we return a List object of the items imported.
<pre>
return EPrints::List->new(
dataset => $opts{dataset},
session => $plugin->{session},
ids=>\@ids );
</pre>

=== convert_input ===

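An ASIN is a 10-character alphanumeric code that identifies a product; for books it is usually the same as the 10-digit ISBN, whose last character may be an X. Here we extract the first 10-character alphanumeric token from the input line, discarding any surrounding whitespace or punctuation. The result of the match must be checked: if the match fails, $1 still holds the value captured by the last successful match, so we warn and return undef instead.
<pre>
unless ($input =~ m/\b([0-9A-Za-z]{10})\b/)
{
$plugin->warning('No ASIN found in input');
return undef;
}
$input = uc $1;
</pre>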

We form the request from the variables we created earlier and the ASIN we have just obtained.
<pre>
my $request =
"$endpoint?".
"Service=$service&".
"AWSAccessKeyId=$accesskey&".
"Operation=$operation&".
"ItemId=$input&".
"Version=$version&".
"ResponseGroup=$responsegroup";
</pre>
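The concatenation above inserts each value verbatim, which works here because ASINs and the other parameters contain only URL-safe characters. A minimal sketch of a more defensive version that percent-encodes every value, assuming the same parameter names; build_request and escape_value are hypothetical helpers, and the hand-written escaping stands in for URI::Escape's uri_escape so the example runs with core Perl only.

<pre>
use strict;
use warnings;

# Percent-encode everything except the RFC 3986 unreserved characters.
# Values are encoded to UTF-8 bytes first so each %XX is a single byte.
sub escape_value
{
	my( $value ) = @_;
	utf8::encode($value);
	$value =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
	return $value;
}

# Hypothetical helper: join the endpoint and an escaped query string.
# Keys are sorted so the resulting URL is deterministic.
sub build_request
{
	my( $endpoint, %params ) = @_;
	my $query = join( '&',
		map { escape_value($_) . '=' . escape_value($params{$_}) }
		sort keys %params );
	return "$endpoint?$query";
}

my $request = build_request(
	'http://ecs.amazonaws.co.uk/onca/xml',
	Service        => 'AWSECommerceService',
	AWSAccessKeyId => '<YOURAMAZONWSKEY>',
	Operation      => 'ItemLookup',
	ItemId         => '0946719616',
	Version        => '2007-07-16',
	ResponseGroup  => 'Large',
);
print "$request\n";
</pre>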

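We now send the request by creating a new LWP::UserAgent object, setting its timeout to 30 seconds and performing an HTTP GET. If the request itself fails (for example because of a network error or an HTTP error status) there is no XML to parse, so we report the status and return undef.
<pre>
my $ua = LWP::UserAgent->new;
$ua->timeout(30);
my $response = $ua->get($request);

unless ($response->is_success)
{
$plugin->error('AWS request failed: '.$response->status_line);
return undef;
}
</pre>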

We then create a DOM object from the XML document returned.
<pre>
my $dom = EPrints::XML::parse_xml_string($response->content);
</pre>

Each response contains an Items element within the root element of the document, and the Items element contains a Request element. That in turn contains an IsValid element, whose value is True or False depending on whether a valid request was made.

Here we obtain the Request element and check that the IsValid element within it contains the value True. If it doesn't we call the error method and return undef.
<pre>
my $rep =
$dom->getElementsByTagName("Items")->item(0)->
getElementsByTagName('Request')->item(0);

my $reptext =
EPrints::Utils::tree_to_utf8($rep->getElementsByTagName('IsValid')->item(0));

unless ($reptext eq 'True')
{
$plugin->error('Invalid AWS Request');
return undef;
}
</pre>

The product found by the ItemLookup operation is contained in an Item element within the Items element. Here we attempt to get that element, raising an error and returning undef if we can't.
<pre>
my $item =
$dom->getElementsByTagName('Items')->item(0)->
getElementsByTagName('Item')->item(0);

unless (defined $item)
{
$plugin->error('No Item element found');
return undef;
}
</pre>

Each item contains an ItemAttributes element which contains most of the metadata about an item.
<pre>
my $attr = $item->getElementsByTagName('ItemAttributes')->item(0);
</pre>

For some specialised repositories it might make sense to import DVDs, computer games and electronic equipment, but we're just going to deal with books. The ProductGroup element within the ItemAttributes element tells us what sort of item we're dealing with; we're looking for the value 'Book'.
<pre>
my $pg = EPrints::Utils::tree_to_utf8($attr->getElementsByTagName('ProductGroup')->item(0));

unless ($pg eq 'Book')
{
$plugin->error('Product is not a book.');
return undef;
}
</pre>

Here we set a few fields without consulting the imported data. We know this is a book, so we set the type. We assume that it has not been refereed. We also assume it has been published, because we can buy it.
<pre>
$output{type} = 'book';
$output{refereed} = 'FALSE';
$output{ispublished} = 'pub';
</pre>

We get and set the title.
<pre>
my $title = $attr->getElementsByTagName('Title')->item(0);
$output{title} = EPrints::Utils::tree_to_utf8($title);
</pre>

Here we set the official URL to the Amazon product page. The URL in the response contains URI escape codes, so we do a bit of extra work, using the uri_unescape function from the URI::Escape module to convert them back into characters.
<pre>
my $url = $item->getElementsByTagName('DetailPageURL')->item(0);
$output{official_url} = uri_unescape(EPrints::Utils::tree_to_utf8($url));
</pre>
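To see what the unescaping does: uri_unescape replaces each %XX escape with the character it encodes. A minimal core-Perl sketch of the same transformation, applied to a made-up URL rather than a real Amazon response; unescape is a hypothetical name.

<pre>
use strict;
use warnings;

# Replace each %XX escape with the character whose hex code is XX,
# which is the transformation uri_unescape performs.
sub unescape
{
	my( $str ) = @_;
	$str =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
	return $str;
}

# Hypothetical escaped URL, for illustration only.
my $escaped = 'http://www.amazon.co.uk/gp/redirect.html%3FASIN%3D0946719616';
print unescape($escaped), "\n";
# prints http://www.amazon.co.uk/gp/redirect.html?ASIN=0946719616
</pre>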

We can set the ISBN. For books with a 10-digit ISBN, the ASIN is usually the same as the ISBN.
<pre>
my $isbn = $attr->getElementsByTagName('ISBN')->item(0);
if (defined $isbn)
{
$output{isbn} = EPrints::Utils::tree_to_utf8($isbn);
}
</pre>
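Since a book's ASIN is normally its ISBN-10, one optional sanity check, for example to tell book ASINs apart from other products, is to test whether an ASIN is a valid ISBN-10. An ISBN-10 is valid when the sum of each digit multiplied by its weight (10 down to 1) is divisible by 11, with a final X standing for 10. A minimal sketch; is_isbn10 is a hypothetical helper and not part of the plugin above.

<pre>
use strict;
use warnings;

# Hypothetical helper: true if $code is a valid ISBN-10. The first nine
# characters must be digits; the last may be a digit or X (meaning 10).
sub is_isbn10
{
	my( $code ) = @_;
	return 0 unless defined $code && $code =~ /^\d{9}[\dX]$/i;

	my @chars = split //, uc $code;
	my $sum = 0;
	for my $i ( 0 .. 9 )
	{
		my $value = $chars[$i] eq 'X' ? 10 : $chars[$i];
		$sum += $value * ( 10 - $i );    # weights run from 10 down to 1
	}
	return $sum % 11 == 0 ? 1 : 0;
}

for my $asin ( '0946719616', '0297843877', '0946719617' )
{
	printf "%s %s\n", $asin, is_isbn10($asin) ? 'valid' : 'invalid';
}
</pre>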

We can set the number of pages.
<pre>
my $pages = $attr->getElementsByTagName('NumberOfPages')->item(0);
if (defined $pages)
{
$output{pages} = EPrints::Utils::tree_to_utf8($pages);
}
</pre>

We can set the publisher.
<pre>
my $publisher = $attr->getElementsByTagName('Publisher')->item(0);
if (defined $publisher)
{
$output{publisher} = EPrints::Utils::tree_to_utf8($publisher);
}
</pre>

We can set the publication date and finally return our output hash.
<pre>
my $pubdate = $attr->getElementsByTagName('PublicationDate')->item(0);
if (defined $pubdate)
{
$output{date} = EPrints::Utils::tree_to_utf8($pubdate);
}

return \%output;
</pre>
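The last four blocks all follow the same pattern: look up an optional element and copy its text into a field if it exists. A minimal sketch of driving that pattern from a table instead, assuming a code reference that returns an element's text or undef; %field_map and map_optional_fields are hypothetical names, and a plain hash of sample values stands in for the DOM so the example runs without EPrints.

<pre>
use strict;
use warnings;

# Hypothetical mapping from AWS ItemAttributes element names to the
# EPrints fields set in convert_input.
my %field_map = (
	ISBN            => 'isbn',
	NumberOfPages   => 'pages',
	Publisher       => 'publisher',
	PublicationDate => 'date',
);

# $get_text returns the text of a named element, or undef if the element
# is absent. Only elements that are present end up in the output hash.
sub map_optional_fields
{
	my( $get_text ) = @_;
	my %output;
	foreach my $element ( sort keys %field_map )
	{
		my $text = $get_text->($element);
		$output{ $field_map{$element} } = $text if defined $text;
	}
	return \%output;
}

# Sample values standing in for a parsed response.
my %attributes = ( ISBN => '0946719616', Publisher => 'Example Press' );
my $output = map_optional_fields( sub { $attributes{ $_[0] } } );
print join( ', ', map { "$_=$output->{$_}" } sort keys %$output ), "\n";
# prints isbn=0946719616, publisher=Example Press
</pre>

In the plugin, the code reference could wrap the existing calls, e.g. sub { my $node = $attr->getElementsByTagName($_[0])->item(0); defined $node ? EPrints::Utils::tree_to_utf8($node) : undef }, although this variant has not been tried against a live response.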

= Testing Your Plugin =
After restarting your webserver, go to the Import Items screen from the Manage Deposits screen. If you can't find it, make sure you're logged in.

We'll start by collecting a few ASINs. Go to [http://www.amazon.co.uk Amazon] and pick a few books. The URL for each product page is in the form http://www.amazon.co.uk/Combination-of-title-and-author/dp/ASIN... Collect a few different ASINs.
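If you are collecting many ASINs, they can be picked out of product page URLs of that form automatically. A minimal sketch, assuming the ASIN is the 10-character segment directly after /dp/; asin_from_url is a hypothetical helper and the URL below is made up.

<pre>
use strict;
use warnings;

# Hypothetical helper: return the ASIN from an Amazon product URL of the
# form .../dp/ASIN..., or undef if the URL has no /dp/ segment.
sub asin_from_url
{
	my( $url ) = @_;
	return $url =~ m{/dp/([0-9A-Za-z]{10})(?:[/?]|$)} ? uc $1 : undef;
}

my $url = 'http://www.amazon.co.uk/Some-Title-Some-Author/dp/0946719616/ref=sr_1_1';
print asin_from_url($url), "\n";    # prints 0946719616
</pre>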

Now we'll demonstrate importing from Amazon with a few sample ASINs.
Type this into the "Cut and Paste Records" box:
<pre>
0946719616
0297843877
</pre>

Select "AWS" from the Select import format drop down menu and click "Test Run + Import". You should end up at the Manage Deposits screen with the following message being displayed "Import completed: 2 item(s) imported.".

Contribute: Plugins/ImportPluginsCSV

2007-09-28T19:09:42Z

Tom: /* convert_input */

= Import Plugin Tutorial 1: CSV =

In this tutorial we will look at creating a relatively simple plugin to import eprints into our repository by reading files containing [http://en.wikipedia.org/wiki/Comma-Separated_Values comma-separated values]. We won't be dealing with documents and files, but will be focusing on importing eprint metadata.

Import plugins are inherently more complicated than export plugins because of the error checking that must be done; however, in this example error checking has been kept to a minimum to keep things simple. In a "real" plugin you should check that the appropriate metadata fields are set for a given type of eprint; unfortunately there appears to be no quick way to do this.

= Before You Start =

Create a directory for your import plugins in the main import plugin directory (usually /opt/eprints3/perl_lib/EPrints/Plugin/Import). The directory used for these examples is called "MyPlugins".

To prepare for this tutorial you should install the [http://search.cpan.org/~erangel/Text-CSV/CSV.pm Text::CSV] module. Running the following command as root, or with sudo, should work.

<pre>
cpan Text::CSV
</pre>

= CSV.pm =
The code in the section below should be placed in a file called CSV.pm in the directory created previously, and MyPlugins should be changed to the name of that directory.
<pre>
package EPrints::Plugin::Import::MyPlugins::CSV;

use EPrints::Plugin::Import::TextFile;
use strict;

our @ISA = ('EPrints::Plugin::Import::TextFile');

sub new
{
	my( $class, %params ) = @_;

	my $self = $class->SUPER::new( %params );

	$self->{name} = 'CSV';
	$self->{visible} = 'all';
	$self->{produce} = [ 'list/eprint' ];

	# Text::CSV is not bundled with EPrints, so load it only if it exists
	my $rc = EPrints::Utils::require_if_exists('Text::CSV');
	unless( $rc )
	{
		$self->{visible} = '';
		$self->{error} = 'Failed to load required module Text::CSV';
	}

	return $self;
}

sub input_fh
{
	my( $plugin, %opts ) = @_;
	my @ids;
	my $fh = $opts{fh};
	my @records = <$fh>;
	my $csv = Text::CSV->new();
	my @fields;

	# The first row holds the metadata field names
	if ($csv->parse(shift @records))
	{
		@fields = $csv->fields();
	}
	else
	{
		$plugin->error($csv->error_input);
		return undef;
	}

	foreach my $row (@records)
	{
		# Pass the field names along with each row to convert_input
		my @input_data = (join(',',@fields),$row);

		my $epdata = $plugin->convert_input(\@input_data);
		next unless defined $epdata;

		my $dataobj = $plugin->epdata_to_dataobj($opts{dataset},$epdata);
		if( defined $dataobj )
		{
			push @ids, $dataobj->get_id;
		}
	}

	return EPrints::List->new(
		dataset => $opts{dataset},
		session => $plugin->{session},
		ids => \@ids );
}

sub convert_input
{
	my $plugin = shift;
	my @input = @{shift @_};
	my $csv = Text::CSV->new();

	my @record;
	if ($csv->parse($input[1]))
	{
		@record = $csv->fields();
	}
	else
	{
		$plugin->error($csv->error_input);
		return undef;
	}

	my @fields = split(',',$input[0]);

	if (scalar @fields != scalar @record)
	{
		$plugin->warning('Row length mismatch');
		return undef;
	}

	my %output = ();

	my $dataset = $plugin->{session}->{repository}->get_dataset('archive');

	my $i = 0;
	foreach my $field (@fields)
	{
		# Skip columns that do not correspond to a metadata field
		unless ($dataset->has_field($field))
		{
			$i++;
			next;
		}

		my $metafield = $dataset->get_field($field);

		if ($metafield->get_property('multiple'))
		{
			# Multiple values are separated by semicolons
			my @values = split(';',$record[$i]);

			if ($metafield->{type} eq 'name')
			{
				my @names = ();

				foreach my $value (@values)
				{
					# "Family, Given" or "Family, Given, Lineage", each part trimmed
					next unless ($value =~ /^\s*(.*?)\s*,\s*(.*?)\s*(?:,\s*(.*?)\s*)?$/);
					push @names, {family => $1, given => $2, lineage => $3};
				}

				$output{$field} = \@names;
			}
			else
			{
				$output{$field} = \@values;
			}
		}
		else
		{
			$output{$field} = $record[$i];
		}
		$i++;
	}
	return \%output;
}

1;

</pre>

= In More Detail =
== Modules ==
Here we import the superclass for our plugin.

<pre>
use EPrints::Plugin::Import::TextFile;
</pre>

== Inheritance ==

Our plugin will not inherit from the Import class directly, but from the TextFile subclass. This contains some extra file handling code that means we can ignore certain differences in text file formats. If you are creating an import plugin which imports non-text files you should subclass the EPrints::Plugin::Import class directly.

<pre>
our @ISA = ('EPrints::Plugin::Import::TextFile');
</pre>

== Constructor ==
For import plugins we must set the 'produce' property to tell the repository what kinds of objects the plugin can import. This plugin only supports importing lists of eprints, but if it supported importing individual eprints we could add 'dataobj/eprint' to this property. We would then have to implement the "input_dataobj" method. Most plugins implement this method, but it is rarely used in practice: most imports are done in lists (even if a list contains only one member) via the Import Items screen.

<pre>
$self->{produce} = [ 'list/eprint' ];
</pre>

Here we use a module that is not included with EPrints, Text::CSV, so we load it in a different way. "EPrints::Utils::require_if_exists" checks whether the module is installed and loads it if it is. If it isn't, we make the plugin invisible and record an error message. It is good practice to load non-standard modules this way rather than with "use", so that a missing module hides the plugin instead of causing a compile-time error when the plugin is loaded.
<pre>
my $rc = EPrints::Utils::require_if_exists('Text::CSV');
unless( $rc )
{
$self->{visible} = '';
$self->{error} = 'Failed to load required module Text::CSV';
}
</pre>
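The general Perl idiom behind optional dependencies is to attempt a run-time require inside an eval and check whether it succeeded. The sketch below only illustrates that idiom; it is not EPrints' implementation of require_if_exists, and module_available is a hypothetical name.

<pre>
use strict;
use warnings;

# Hypothetical helper: try to load a module at run time and report
# success instead of dying. "Text::CSV" becomes the file "Text/CSV.pm".
sub module_available
{
	my( $module ) = @_;
	( my $file = "$module.pm" ) =~ s{::}{/}g;
	return eval { require $file; 1 } ? 1 : 0;
}

foreach my $module ( 'List::Util', 'No::Such::Module' )
{
	printf "%s: %s\n", $module, module_available($module) ? 'loaded' : 'missing';
}
</pre>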

== Input ==
Import plugins have to implement a couple of methods to read data from a file or string, manipulate it and turn it into a form which can be imported into the repository. That process will be described below.

=== input_fh ===

This method takes a filehandle, processes its contents, tries to import DataObjs into the repository and then returns a List of the DataObjs imported.

This array will be used to create a List of DataObjs later.
<pre>
my @ids;
</pre>

Here we take the filehandle passed in and read its lines into an array.
<pre>
my $fh = $opts{fh};
my @records = <$fh>;
</pre>

We create a Text::CSV object to handle the input. Using a dedicated [http://en.wikipedia.org/wiki/Comma-Separated_Values CSV] handling package is preferable to using Perl's split function, as it handles more complicated cases such as commas inside double-quoted fields.
<pre>
my $csv = Text::CSV->new();
</pre>
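The difference is easy to see on the quoted example used later in this tutorial. The sketch below uses parse_line from the core Text::ParseWords module purely for illustration; it is not a full CSV parser (it does not understand doubled "" escapes, and an unmatched apostrophe in an unquoted field makes it return nothing), which is why the plugin itself uses Text::CSV.

<pre>
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my $line = q{An interesting article,"Damn it Jim, I'm a Doctor, not a Perl hacker."};

# A naive split breaks the quoted field at every comma.
my @naive = split( /,/, $line );
print scalar(@naive), " fields with split\n";         # prints 4 fields with split

# A quote-aware parser keeps the quoted field together.
my @parsed = parse_line( ',', 0, $line );
print scalar(@parsed), " fields with parse_line\n";   # prints 2 fields with parse_line
print "$parsed[1]\n";
</pre>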

After setting up an array for metadata field names, we attempt to parse the first line of our file. The parse method does not return an array of fields, but reports success or failure. In the event of success we use the fields method to return the last fields parsed. In the event of failure we use the error_input method to get the last error, and return undef.
<pre>
my @fields;
if ($csv->parse(shift @records))
{
@fields = $csv->fields();
}
else
{
$plugin->error($csv->error_input);
return undef;
}
</pre>

Now that the row of column titles has been dealt with, we move on to processing each record in the file.

In import plugins the convert_input method converts an individual record into a format that can be imported into the repository: a hash whose keys are metadata field names and whose values are the corresponding values. A row on its own cannot be converted, because we wouldn't know which field each value belongs to, so we first construct an array to pass to convert_input. Its first element is the row of field names (joined back together with commas) and its second element is the row we want to import.
<pre>
foreach my $row (@records)
{
my @input_data = (join(',',@fields),$row);
</pre>
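Joining the field names with commas and splitting them again in convert_input is safe because field names never contain commas, with one subtlety: Perl's split discards trailing empty fields unless it is given a negative limit. If the header row ends in a comma, the field list seen by convert_input can therefore be shorter than the one parsed here, which affects the row-length check. A quick demonstration:

<pre>
use strict;
use warnings;

my @fields = ( 'title', 'abstract', '' );    # header row that ended in a comma
my $joined = join( ',', @fields );            # "title,abstract,"

my @dropped = split( ',', $joined );          # trailing empty field removed
my @kept    = split( ',', $joined, -1 );      # negative limit keeps it

print scalar(@dropped), ' ', scalar(@kept), "\n";    # prints 2 3
</pre>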

Here we call convert_input on our constructed input_data. If the conversion fails we simply move to the next record.
<pre>
my $epdata = $plugin->convert_input(\@input_data);
next unless defined $epdata;
</pre>

The epdata_to_dataobj method takes our epdata hash reference and turns it into a new DataObj in our repository. If it is successful it returns the new DataObj, whose id we add to our array of ids.
<pre>
my $dataobj = $plugin->epdata_to_dataobj($opts{dataset},$epdata);
if( defined $dataobj )
{
push @ids, $dataobj->get_id;
}
</pre>

Finally we return a List object containing the ids of the records we have successfully imported.
<pre>
return EPrints::List->new(
dataset => $opts{dataset},
session => $plugin->{session},
ids=>\@ids );
</pre>

=== convert_input ===
This method takes data in a particular format, in this case [http://en.wikipedia.org/wiki/Comma-Separated_Values CSV] and transforms it into a hash of metadata field names and values.

We take the second argument to the method and convert the array reference into an array.
<pre>
my @input = @{shift @_};
</pre>

Here we set up another Text::CSV object.
<pre>
my $csv = Text::CSV->new();
</pre>

We take the second element of our array and parse it. This is the record we wish to import. If anything goes wrong we return undef.
<pre>
my @record;
if ($csv->parse($input[1]))
{
@record = $csv->fields();
}
else
{
$plugin->error($csv->error_input);
return undef;
}
</pre>

We take the first element and split it into field names. We then check that there are as many field names as values in the record.
<pre>
my @fields = split(',',$input[0]);

if (scalar @fields != scalar @record)
{
$plugin->warning('Row length mismatch');
return undef;
}
</pre>
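Once the lengths are known to match, each field name simply pairs with the value at the same position, which is what the index $i tracks in the loop that follows. A minimal sketch of doing that pairing in one step with a hash slice; pair_fields is a hypothetical helper, and the field-existence and multiple-value handling below would still be needed.

<pre>
use strict;
use warnings;

# Hypothetical helper: pair header names with a row's values, returning
# undef on a length mismatch (the same check convert_input performs).
sub pair_fields
{
	my( $fields, $values ) = @_;
	return undef unless @$fields == @$values;
	my %row;
	@row{ @$fields } = @$values;    # hash slice: one assignment per pair
	return \%row;
}

my $row = pair_fields( [ 'title', 'abstract' ],
	[ 'This is a test title', 'This is a test abstract' ] );
print "$row->{abstract}\n";    # prints This is a test abstract
</pre>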

This is the hash that we'll return later.
<pre>
my %output = ();
</pre>

For convenience we get the DataSet object.
<pre>
my $dataset = $plugin->{session}->{repository}->get_dataset('archive');
</pre>

We now iterate over the fields.
<pre>
my $i = 0;
foreach my $field (@fields)
{
</pre>

If the field does not exist we look at the next one, remembering to increment our index.
<pre>
unless ($dataset->has_field($field))
{
$i++;
next;
}
</pre>

We get the MetaField object corresponding to the current field.
<pre>
my $metafield = $dataset->get_field($field);
</pre>

Fields that can hold more than one value (those with the 'multiple' property) are dealt with by separating the individual values with semicolons.
<pre>
if ($metafield->get_property('multiple'))
{
my @values = split(';',$record[$i]);
</pre>

Name fields are dealt with by using a regular expression and constructing a hash from the parts matched. The plugin expects names to be of the form Surname, Forenames, Lineage (Sr, Jr, III etc.), where the lineage is optional. Whitespace around each part is trimmed, so "Bloggs, Joe" gives the forename "Joe" rather than " Joe".
<pre>
if ($metafield->{type} eq 'name')
{
my @names = ();

foreach my $value (@values)
{
next unless ($value =~ /^\s*(.*?)\s*,\s*(.*?)\s*(?:,\s*(.*?)\s*)?$/);
push @names, {family => $1,given => $2,lineage => $3};
}

$output{$field} = \@names;
}
</pre>
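The name-splitting step can be tried on its own, outside EPrints. A minimal standalone sketch using the same trimmed "Family, Given, Lineage" pattern; parse_name is a hypothetical helper.

<pre>
use strict;
use warnings;

# Hypothetical helper mirroring the name-splitting step: "Family, Given"
# or "Family, Given, Lineage", with whitespace around each part trimmed.
sub parse_name
{
	my( $value ) = @_;
	return undef unless $value =~ /^\s*(.*?)\s*,\s*(.*?)\s*(?:,\s*(.*?)\s*)?$/;
	return { family => $1, given => $2, lineage => $3 };
}

foreach my $value ( 'Bloggs, Joe', 'Doe, John, Jr', 'Madonna' )
{
	my $name = parse_name($value);
	if( defined $name )
	{
		printf "family=%s given=%s lineage=%s\n",
			$name->{family}, $name->{given}, $name->{lineage} // '';
	}
	else
	{
		print "'$value' is not in Family, Given form\n";
	}
}
</pre>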

Multiple fields which are not names are just added to the hash as an array reference.
<pre>
$output{$field} = \@values;
</pre>

Non-multiple fields are simply added to the hash using the corresponding value from the parsed record.
<pre>
$output{$field} = $record[$i];
</pre>

Finally we return a hash reference.
<pre>
return \%output;
</pre>

= Testing Your Plugin =

After restarting your webserver, go to the Import Items screen from the Manage Deposits screen. If you can't find it, make sure you're logged in.

Type this into the "Cut and Paste Records" box:

<pre>
title,abstract
This is a test title,This is a test abstract
This is another test title,This is another test abstract
</pre>

Select "CSV" from the Select import format drop down menu and click "Test Run + Import". You should end up at the Manage Deposits screen with the following message being displayed "Import completed: 2 item(s) imported.".

== Embedding commas ==
If you want to include commas in your imports, which is very likely, you must enclose the field in double quotes. For example:
<pre>
title,abstract
An interesting article,"Damn it Jim, I'm a Doctor, not a Perl hacker."
</pre>

When doing this make sure not to leave any whitespace between a quotation mark and a comma or the import will fail.
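If you are generating import files from a script, values can be quoted automatically. A minimal sketch, assuming the usual CSV convention that a value containing a comma or double quote is wrapped in double quotes with any inner quotes doubled (as far as we know Text::CSV's default settings accept this); csv_quote is a hypothetical helper. Note that the plugin reads one line per record, so values containing line breaks will still not import correctly.

<pre>
use strict;
use warnings;

# Hypothetical helper: quote a value for CSV output if it contains a comma
# or a double quote, doubling any double quotes inside it.
sub csv_quote
{
	my( $value ) = @_;
	return $value unless $value =~ /[",]/;
	( my $escaped = $value ) =~ s/"/""/g;
	return qq{"$escaped"};
}

print join( ',', map { csv_quote($_) }
	'An interesting article',
	"Damn it Jim, I'm a Doctor, not a Perl hacker." ), "\n";
# prints An interesting article,"Damn it Jim, I'm a Doctor, not a Perl hacker."
</pre>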

== Multiple fields ==
Fields that can hold multiple values are handled by separating each value with a semicolon; a simple example of this is the subjects field.

Go back to the Import Items screen and proceed as before, but type this into the "Cut and Paste Records" box:

<pre>
abstract,title,subjects
Testing,Testing,AI;C;M;F;P
Testing,Testing,AC;DC
</pre>

After the records have been imported, examine each one on the View Item screen. You will find that a list of subjects is shown, and that each subject is displayed with its descriptive name rather than the code we imported.

== Compound fields ==

Compound fields are fields that have subfields within them, each with its own name. You don't set a compound field in one go; instead you set its components individually. Subfields have names of the form mainfieldname_subfieldname.

One of the most commonly used compound fields is the "Creators" field. It has a name subfield, "creators_name", and an ID subfield, "creators_id", which is most often used for email addresses.

Here is an example of setting the creators field:
<pre>
title,creators_name,creators_id
Setting compound fields.,"Bloggs, Joe;Doe, John","joe@bloggs.com;john@doe.com"
</pre>

If you import this record and then examine the View Item screen, you will find that the "Creators" field has been set up with the values displayed in a table.

Contribute: Plugins/ImportPluginsCSV

2007-09-28T19:08:31Z

Tom: /* input_fh */

= Import Plugin Tutorial 1: CSV =

In this tutorial we will look at creating a relatively simple plugin to import eprints into our repository by reading files containing [http://en.wikipedia.org/wiki/Comma-Separated_Values comma-separated values]. We won't be dealing with documents and files, but will be focusing on importing eprint metadata.

Import plugins are inherently more complicated than export plugins because of the error checking that must be done, however in this example error checking has been kept to a minimum to simplify the example. In a "real" plugin you should check that the appropriate metadata fields are set for a given type of eprint, and unfortunately there appears to be no quick way to do this.

= Before You Start =

Create a directory for your import plugins in the main plugin directory (usually /opt/eprints3/perl_lib/EPrints/Plugin/import). The directory used for these examples is called "MyPlugins".

To prepare for this tutorial you should install the [http://search.cpan.org/~erangel/Text-CSV/CSV.pm Text::CSV] module. The following command as root, or using sudo should work.

<pre>
cpan Text::CSV
</pre>

= CSV.pm =
The code in the section below should be placed in a file called CSV.pm in the directory created previously, and MyPlugins should be changed to the name of that directory.
<pre>
package EPrints::Plugin::Import::MyPlugins::CSV;

use EPrints::Plugin::Import::TextFile;
use strict;

our @ISA = ('EPrints::Plugin::Import::TextFile');

sub new
{
my( $class, %params ) = @_;

my $self = $class->SUPER::new( %params );

$self->{name} = 'CSV';
$self->{visible} = 'all';
$self->{produce} = [ 'list/eprint' ];

my $rc = EPrints::Utils::require_if_exists('Text::CSV');
unless( $rc )
{
$self->{visible} = '';
$self->{error} = 'Failed to load required module Text::CSV';
}

return $self;
}

sub input_fh
{
my( $plugin, %opts ) = @_;
my @ids;
my $fh = $opts{fh};
my @records = <$fh>;
my $csv = Text::CSV->new();
my @fields;

if ($csv->parse(shift @records))
{
@fields = $csv->fields();
}
else
{
$plugin->error($csv->error_input);
return undef;
}

foreach my $row (@records)
{
my @input_data = (join(',',@fields),$row);

my $epdata = $plugin->convert_input(\@input_data);
next unless defined $epdata;

my $dataobj = $plugin->epdata_to_dataobj($opts{dataset},$epdata);
if( defined $dataobj )
{
push @ids, $dataobj->get_id;
}
}

return EPrints::List->new(
dataset => $opts{dataset},
session => $plugin->{session},
ids=>\@ids );
}

sub convert_input
{
my $plugin = shift;
my @input = @{shift @_};
my $csv = Text::CSV->new();

my @record;
if ($csv->parse($input[1]))
{
@record = $csv->fields();
}
else
{
$plugin->error($csv->error_input);
return undef;
}

my @fields = split(',',$input[0]);

if (scalar @fields != scalar @record)
{
$plugin->warning('Row length mismatch');
return undef;
}

my %output = ();

my $dataset = $plugin->{session}->{repository}->get_dataset('archive');

my $i = 0;
foreach my $field (@fields)
{
unless ($dataset->has_field($field))
{
$i++;
next;
}

my $metafield = $dataset->get_field($field);

if ($metafield->get_property('multiple'))
{
my @values = split(';',$record[$i]);

if ($metafield->{type} eq 'name')
{
my @names = ();

foreach my $value (@values)
{
my $name = $value;

next unless ($value =~ /^(.*?),(.*?)(,(.*?))?$/);
push @names, {family => $1,given => $2,lineage => $4};
}

$output{$field} = \@names;
}
else
{
$output{$field} = \@values;
}
}
else
{
$output{$field} = $record[$i];
}
$i++;
}
return \%output;
}

1;

</pre>

= In More Detail =
== Modules ==
Here we import the superclass for our plugin.

<pre>
use EPrints::Plugin::Import::TextFile;
</pre>

== Inheritance ==

Our plugin will not inherit from the Import class directly, but from the TextFile subclass. This contains some extra file handling code that means we can ignore certain differences in text file formats. If you are creating an import plugin which imports non-text files you should subclass the EPrints::Plugin::Import class directly.

<pre>
our @ISA = ('EPrints::Plugin::Import::TextFile');
</pre>

== Constructor ==
For import plugins we must set a 'produce' field, to tell the repository what kinds of objects the plugin can import. This plugin only supports importing lists of eprints, but if it supported importing individual eprints we could add 'dataobj/eprint' to this property. We would then have to implement the "input_dataobj" method. Most plugins implement this method, but it is rarely used in practice. Most imports are done in lists (even if that list only contains one member), via the import items screen.

<pre>
$self->{produce} = [ 'list/eprint' ];
</pre>

Here we use a module that is not included with EPrints, Text::CSV, so we import it in a different way. First we check that it is installed, and load it if it is with "EPrints::Utils::require_if_exists".If it isn't we make the plugin invisible and produce an error message. It is good practice to import non-standard modules in this way rather than with "use".
<pre>
my $rc = EPrints::Utils::require_if_exists('Text::CSV');
unless( $rc )
{
$self->{visible} = '';
$self->{error} = 'Failed to load required module Text::CSV';
}
</pre>

== Input ==
Import plugins have to implement a couple of methods to read data from a file or string, manipulate it and turn it into a form which can be imported into the repository. That process will be described below.

=== input_fh ===

This method takes a filehandle, processes it, tries to import DataObjs in to the repository and then returns a List of the DataObjs imported.

This array will be used to create a List of DataObjs later.
<pre>
my @ids;
</pre>

Here we open the filehandle passed, and read the lines into an array.
<pre>
my $fh = $opts{fh};
my @records = <$fh>;
</pre>

We create a Text::CSV object to handle the input. Using a dedicated [http://en.wikipedia.org/wiki/Comma-Separated_Values CSV] handling package is preferable to using Perl's split function as it handles a number of more complicated scenarios such as commas within records using double quotes.
<pre>
my $csv = Text::CSV->new();
</pre>

After setting up an array for metadata field names, we attempt to parse the first line of our file. The parse method does not return an array of fields, but reports success or failure. In the event of success we use the fields method to return the last fields parsed. In the event of failure we use the error_input method to get the last error, and return undef.
<pre>
my @fields;
if ($csv->parse(shift @records))
{
@fields = $csv->fields();
}
else
{
$plugin->error($csv->error_input);
return undef;
}
</pre>

Now that the row of column titles has been dealt with we move onto processing each record in the file.

In import plugins the convert_input method converts individual records into a format that can be imported into the repository. That is a hash whose keys are metadata field names and values are the corresponding values. As a row on its own cannot be imported as we don't know to which field each value belongs we have to construct an array to pass to convert_input first. We pass an array whose first element is the fields row and whose second element is the row we want to import.
<pre>
foreach my $row (@records)
{
my @input_data = (join(',',@fields),$row);
</pre>

Here we call convert_input on our constructed input_data. If the conversion fails we simply move to the next record.
<pre>
my $epdata = $plugin->convert_input(\@input_data);
next unless defined $epdata;
</pre>

The epdata_to_dataobj method takes our epdata hash reference and turns it into a new DataObj in our repository. If it is successful it returns the new DataObj, whose id we add to our array of ids.
<pre>
my $dataobj = $plugin->epdata_to_dataobj($opts{dataset},$epdata);
if( defined $dataobj )
{
push @ids, $dataobj->get_id;
}
</pre>

Finally we return a List object containing the ids of the records we have successfully imported.
<pre>
return EPrints::List->new(
dataset => $opts{dataset},
session => $plugin->{session},
ids=>\@ids );
</pre>

=== convert_input ===
This method takes data in a particular format, in this case CSV and transforms it into a hash of metadata field names and values.

We take the second argument to the method and convert the array reference into an array.
<pre>
my @input = @{shift @_};
</pre>

Here we setup another Text::CSV object.
<pre>
my $csv = Text::CSV->new();
</pre>

We take the second element of our array and parse it. This is the record we wish to import. If anything goes wrong we return undef.
<pre>
my @record;
if ($csv->parse($input[1]))
{
@record = $csv->fields();
}
else
{
$plugin->error($csv->error_input);
return undef;
}
</pre>

We take the first element and get the field names. We then check that we have the same number of fields names as records.
<pre>
my @fields = split(',',$input[0]);

if (scalar @fields != scalar @record)
{
$plugin->warning('Row length mismatch');
return undef;
}
</pre>

This is the hash that we'll return later.
<pre>
my %output = ();
</pre>

For convenience we get the DataSet object.
<pre>
my $dataset = $plugin->{session}->{repository}->get_dataset('archive');
</pre>

We now iterate over the fields.
<pre>
my $i = 0;
foreach my $field (@fields)
{
</pre>

If the field does not exist we look at the next one, remembering to increment our index.
<pre>
unless ($dataset->has_field($field))
{
$i++;
next;
}
</pre>

We get the MetaField object corresponding to the current field.
<pre>
my $metafield = $dataset->get_field($field);
</pre>

We deal with multiple field types by separating individual values with a semi-colon.
<pre>
if ($metafield->get_property('multiple'))
{
my @values = split(';',$record[$i]);
</pre>

Name fields are dealt with by using regular expressions and constructing a hash from the parts matched. The plugin expects names to be of the form Surname, Forenames, Lineage (Sr, Jr, III etc).
<pre>
if ($metafield->{type} eq 'name')
{
my @names = ();

foreach my $value (@values)
{
my $name = $value;

next unless ($value =~ /^(.*?),(.*?)(,(.*?))?$/);
push @names, {family => $1,given => $2,lineage => $4};
}

$output{$field} = \@names;
}
</pre>

Multiple fields which are not names are just added to the hash as an array reference.
<pre>
$output{$field} = \@values;
</pre>

Non-multiple fields are just added to the hash from the array of fields.
<pre>
$output{$field} = $record[$i];
</pre>

Finally we return a hash reference.
<pre>
return \%output;
</pre>

= Testing Your Plugin =

After restarting your webserver go to the Import Items screen from the Manage Deposits screen. If you can't find this, make sure you're logged in.

Type this into the "Cut and Paste Records" box:

<pre>
title,abstract
This is a test title,This is a test abstract
This is another test title,This is another test abstract
</pre>

Select "CSV" from the Select import format drop down menu and click "Test Run + Import". You should end up at the Manage Deposits screen with the following message being displayed "Import completed: 2 item(s) imported.".

== Embedding commas ==
If you want to include commas in your imports, which is very likely you must enclose the field in double quotations. For example:
<pre>
title,abstract
An interesting article,"Damn it Jim, I'm a Doctor, not a Perl hacker."
</pre>

When doing this make sure not to leave any whitespace between a quotation mark and a comma or the import will fail.

== Multiple fields ==
Multiple field types are handled by separating each individual value by a semi-colon, a simple example of this would be the subjects field.

Go back to the import items and proceed as before, but typing this into the "Cut and Paste Records" box:

<pre>
abstract,title,subjects
Testing,Testing,AI;C;M;F;P
Testing,Testing,AC;DC
</pre>

After the records have been imported examine each one on the View Item screen. You will find that a list of subjects are given, and that a more descriptive name is given than the code we imported.

== Compound fields ==

Compound fields are fields that contain subfields, each with its own name. You don't set a compound field in one go; instead you set its components individually. Subfields have names of the form mainfieldname_subfieldname.

One of the most commonly used compound fields is the "Creators" field. It has a name subfield, "creators_name", and an ID subfield, "creators_id", which is most often used for email addresses.

Here is an example of setting the creators field:
<pre>
title,creators_name,creators_id
Setting compound fields.,"Bloggs, Joe;Doe, John","joe@bloggs.com;john@doe.com"
</pre>

If you import this record and then examine it on the View Item screen, you will find that the "Creators" field has been set up, with the values displayed in a table.

Contribute: Plugins/ImportPluginsCSV

2007-09-28T19:07:28Z

Tom: /* Constructor */

= Import Plugin Tutorial 1: CSV =

In this tutorial we will look at creating a relatively simple plugin to import eprints into our repository by reading files containing [http://en.wikipedia.org/wiki/Comma-Separated_Values comma-separated values]. We won't be dealing with documents and files, but will be focusing on importing eprint metadata.

Import plugins are inherently more complicated than export plugins because of the error checking that must be done; however, error checking has been kept to a minimum here to keep the example simple. In a "real" plugin you should check that the appropriate metadata fields are set for a given type of eprint; unfortunately, there appears to be no quick way to do this.

= Before You Start =

Create a directory for your import plugins in the main import plugin directory (usually /opt/eprints3/perl_lib/EPrints/Plugin/Import). The directory used for these examples is called "MyPlugins".

To prepare for this tutorial you should install the [http://search.cpan.org/~erangel/Text-CSV/CSV.pm Text::CSV] module. Running the following command as root (or using sudo) should work.

<pre>
cpan Text::CSV
</pre>

= CSV.pm =
The code in the section below should be placed in a file called CSV.pm in the directory created previously, and MyPlugins should be changed to the name of that directory.
<pre>
package EPrints::Plugin::Import::MyPlugins::CSV;

use EPrints::Plugin::Import::TextFile;
use strict;

our @ISA = ('EPrints::Plugin::Import::TextFile');

sub new
{
my( $class, %params ) = @_;

my $self = $class->SUPER::new( %params );

$self->{name} = 'CSV';
$self->{visible} = 'all';
$self->{produce} = [ 'list/eprint' ];

my $rc = EPrints::Utils::require_if_exists('Text::CSV');
unless( $rc )
{
$self->{visible} = '';
$self->{error} = 'Failed to load required module Text::CSV';
}

return $self;
}

sub input_fh
{
my( $plugin, %opts ) = @_;
my @ids;
my $fh = $opts{fh};
my @records = <$fh>;
my $csv = Text::CSV->new();
my @fields;

if ($csv->parse(shift @records))
{
@fields = $csv->fields();
}
else
{
$plugin->error($csv->error_input);
return undef;
}

foreach my $row (@records)
{
my @input_data = (join(',',@fields),$row);

my $epdata = $plugin->convert_input(\@input_data);
next unless defined $epdata;

my $dataobj = $plugin->epdata_to_dataobj($opts{dataset},$epdata);
if( defined $dataobj )
{
push @ids, $dataobj->get_id;
}
}

return EPrints::List->new(
dataset => $opts{dataset},
session => $plugin->{session},
ids=>\@ids );
}

sub convert_input
{
my $plugin = shift;
my @input = @{shift @_};
my $csv = Text::CSV->new();

my @record;
if ($csv->parse($input[1]))
{
@record = $csv->fields();
}
else
{
$plugin->error($csv->error_input);
return undef;
}

my @fields = split(',',$input[0]);

if (scalar @fields != scalar @record)
{
$plugin->warning('Row length mismatch');
return undef;
}

my %output = ();

my $dataset = $plugin->{session}->{repository}->get_dataset('archive');

my $i = 0;
foreach my $field (@fields)
{
unless ($dataset->has_field($field))
{
$i++;
next;
}

my $metafield = $dataset->get_field($field);

if ($metafield->get_property('multiple'))
{
my @values = split(';',$record[$i]);

if ($metafield->{type} eq 'name')
{
my @names = ();

foreach my $value (@values)
{
my $name = $value;

next unless ($value =~ /^(.*?),(.*?)(,(.*?))?$/);
push @names, {family => $1,given => $2,lineage => $4};
}

$output{$field} = \@names;
}
else
{
$output{$field} = \@values;
}
}
else
{
$output{$field} = $record[$i];
}
$i++;
}
return \%output;
}

1;

</pre>

= In More Detail =
== Modules ==
Here we import the superclass for our plugin.

<pre>
use EPrints::Plugin::Import::TextFile;
</pre>

== Inheritance ==

Our plugin will not inherit from the Import class directly, but from its TextFile subclass. This contains some extra file-handling code that means we can ignore certain differences between text file formats. If you are creating an import plugin that imports non-text files, you should subclass the EPrints::Plugin::Import class directly.

<pre>
our @ISA = ('EPrints::Plugin::Import::TextFile');
</pre>

== Constructor ==
For import plugins we must set the 'produce' property to tell the repository what kinds of objects the plugin can import. This plugin only supports importing lists of eprints, but if it also supported importing individual eprints we could add 'dataobj/eprint' to this property. We would then have to implement the "input_dataobj" method. Most plugins implement this method, but it is rarely used in practice: most imports are done in lists (even if a list contains only one member) via the Import Items screen.

<pre>
$self->{produce} = [ 'list/eprint' ];
</pre>

Here we use a module that is not included with EPrints, Text::CSV, so we load it differently. "EPrints::Utils::require_if_exists" checks whether the module is installed and loads it if it is. If it isn't, we make the plugin invisible and set an error message. It is good practice to load non-standard modules this way rather than with "use": a "use" statement is processed at compile time, so a missing module would stop the plugin from compiling at all instead of just hiding it.
<pre>
my $rc = EPrints::Utils::require_if_exists('Text::CSV');
unless( $rc )
{
$self->{visible} = '';
$self->{error} = 'Failed to load required module Text::CSV';
}
</pre>
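EPrints::Utils::require_if_exists is EPrints' own helper. As a rough plain-Perl illustration of the idea only (not its actual implementation), a run-time require wrapped in eval lets code detect a missing module instead of failing to compile. The helper name module_available is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the idea behind require_if_exists:
# try to load a module at run time and report whether that worked.
sub module_available
{
    my ($module) = @_;

    # Only accept names that look like Perl package names, so the
    # path built below cannot point outside @INC.
    return 0 unless $module =~ m/^\w+(?:::\w+)*$/;

    # require with a variable expects a file path such as "Text/CSV.pm".
    (my $file = "$module.pm") =~ s{::}{/}g;

    # eval traps the die that require throws when the module is missing.
    return eval { require $file; 1 } ? 1 : 0;
}

# Usage in the spirit of the constructor: hide the plugin if the
# module cannot be loaded.
my %plugin = (visible => 'all');
unless (module_available('Text::CSV'))
{
    $plugin{visible} = '';
    $plugin{error}   = 'Failed to load required module Text::CSV';
}
```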

== Input ==
Import plugins have to implement a couple of methods to read data from a file or string, manipulate it and turn it into a form which can be imported into the repository. That process will be described below.

=== input_fh ===

This method takes a filehandle, processes it, tries to import DataObjs into the repository and then returns a List of the DataObjs imported.

This array will be used to create a List of DataObjs later.
<pre>
my @ids;
</pre>

Here we take the filehandle passed in and read all of its lines into an array.
<pre>
my $fh = $opts{fh};
my @records = <$fh>;
</pre>

We create a Text::CSV object to handle the input. A dedicated CSV package is preferable to Perl's split function because it handles more complicated cases, such as commas inside double-quoted fields.
<pre>
my $csv = Text::CSV->new();
</pre>
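To see why split is not enough, the sketch below compares it with a quote-aware parser on one line containing a quoted comma. It uses Text::ParseWords, a module that ships with Perl, purely for illustration; it is not Text::CSV and does not implement full CSV rules (for example, doubled "" quotes inside a field), so the plugin itself should keep using Text::CSV.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my $line = 'Setting compound fields.,"Bloggs, Joe;Doe, John"';

# A naive split also breaks the quoted field apart at its inner commas.
my @naive = split /,/, $line;

# parse_line treats the double-quoted text as a single field and,
# with keep = 0, removes the surrounding quotes.
my @quoted = parse_line(',', 0, $line);

printf "split: %d fields, parse_line: %d fields\n", scalar @naive, scalar @quoted;
```

Here split produces four pieces, while parse_line keeps "Bloggs, Joe;Doe, John" together as the second of two fields.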

After setting up an array for the metadata field names, we attempt to parse the first line of the file. The parse method does not return the fields; it only reports success or failure. On success we call the fields method to get the fields that were just parsed. On failure we use the error_input method, which returns the input that could not be parsed, pass it to the plugin's error method, and return undef.
<pre>
my @fields;
if ($csv->parse(shift @records))
{
@fields = $csv->fields();
}
else
{
$plugin->error($csv->error_input);
return undef;
}
</pre>

Now that the row of column titles has been dealt with we move onto processing each record in the file.

In import plugins the convert_input method converts an individual record into a format that can be imported into the repository: a hash whose keys are metadata field names and whose values are the corresponding values. A row on its own cannot be converted, because we wouldn't know which field each value belongs to, so we first construct an array to pass to convert_input. Its first element is the header row (the field names) and its second element is the row we want to import.
<pre>
foreach my $row (@records)
{
my @input_data = (join(',',@fields),$row);
</pre>

Here we call convert_input on our constructed input_data. If the conversion fails we simply move to the next record.
<pre>
my $epdata = $plugin->convert_input(\@input_data);
next unless defined $epdata;
</pre>

The epdata_to_dataobj method takes our epdata hash reference and turns it into a new DataObj in our repository. If it is successful it returns the new DataObj, whose id we add to our array of ids.
<pre>
my $dataobj = $plugin->epdata_to_dataobj($opts{dataset},$epdata);
if( defined $dataobj )
{
push @ids, $dataobj->get_id;
}
</pre>

Finally we return a List object containing the ids of the records we have successfully imported.
<pre>
return EPrints::List->new(
dataset => $opts{dataset},
session => $plugin->{session},
ids=>\@ids );
</pre>

=== convert_input ===
This method takes data in a particular format (in this case CSV) and transforms it into a hash of metadata field names and values.

We take the second argument to the method and convert the array reference into an array.
<pre>
my @input = @{shift @_};
</pre>

Here we set up another Text::CSV object.
<pre>
my $csv = Text::CSV->new();
</pre>

We take the second element of our array and parse it. This is the record we wish to import. If anything goes wrong we return undef.
<pre>
my @record;
if ($csv->parse($input[1]))
{
@record = $csv->fields();
}
else
{
$plugin->error($csv->error_input);
return undef;
}
</pre>

We split the first element to get the field names, then check that we have the same number of field names as values in the record.
<pre>
my @fields = split(',',$input[0]);

if (scalar @fields != scalar @record)
{
$plugin->warning('Row length mismatch');
return undef;
}
</pre>

This is the hash that we'll return later.
<pre>
my %output = ();
</pre>

For convenience we get the DataSet object.
<pre>
my $dataset = $plugin->{session}->{repository}->get_dataset('archive');
</pre>

We now iterate over the fields.
<pre>
my $i = 0;
foreach my $field (@fields)
{
</pre>

If the field does not exist we look at the next one, remembering to increment our index.
<pre>
unless ($dataset->has_field($field))
{
$i++;
next;
}
</pre>
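Keeping $i in step by hand works, but it is easy to forget the increment when another next is added later. A minimal alternative sketch iterates over the indices directly, so the field name and its value can never drift apart. The %known_fields hash is a hypothetical stand-in for $dataset->has_field, and the sample data is made up, so the example runs without EPrints.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $dataset->has_field: field names the
# repository knows about.
my %known_fields = map { $_ => 1 } qw(title abstract subjects);

# Header row and one data row; the row-length check in convert_input
# guarantees both arrays have the same length.
my @fields = qw(title colour abstract);
my @record = ('A title', 'blue', 'An abstract');

my %output;

# Looping over indices keeps $fields[$i] and $record[$i] aligned,
# even when an unknown column is skipped with next.
for my $i (0 .. $#fields)
{
    my $field = $fields[$i];
    next unless $known_fields{$field};
    $output{$field} = $record[$i];
}
```

After the loop, %output holds title and abstract, and the unknown colour column has been skipped.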

We get the MetaField object corresponding to the current field.
<pre>
my $metafield = $dataset->get_field($field);
</pre>

Fields with the multiple property can hold several values, which the plugin expects to be separated by semicolons.
<pre>
if ($metafield->get_property('multiple'))
{
my @values = split(';',$record[$i]);
</pre>

Name fields are handled using a regular expression, constructing a hash from the parts matched. The plugin expects each name to be of the form Surname, Forenames, Lineage (Sr, Jr, III etc.), where the lineage part is optional.
<pre>
if ($metafield->{type} eq 'name')
{
my @names = ();

foreach my $value (@values)
{
my $name = $value;

next unless ($value =~ /^(.*?),(.*?)(,(.*?))?$/);
push @names, {family => $1,given => $2,lineage => $4};
}

$output{$field} = \@names;
}
</pre>
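Note that this regular expression keeps the space that usually follows each comma, so "Bloggs, Joe" gives a given name of " Joe". A minimal sketch of the same parsing with the whitespace trimmed, assuming the same Surname, Forenames[, Lineage] convention: parse_names is a hypothetical helper, and its family/given/lineage keys simply mirror the hash built above.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "Bloggs, Joe;Doe, John, Jr" into an array
# reference of name hashes, trimming whitespace around every part.
sub parse_names
{
    my ($text) = @_;
    my @names;

    foreach my $value (split /;/, $text)
    {
        # Split into Surname, Forenames[, Lineage] and trim each part.
        my @parts = map { my $p = $_; $p =~ s/^\s+|\s+$//g; $p } split /,/, $value;

        # Skip values without both a family name and a given name.
        next unless @parts >= 2 && length $parts[0] && length $parts[1];

        push @names, {
            family  => $parts[0],
            given   => $parts[1],
            lineage => $parts[2],    # undef when no lineage was supplied
        };
    }

    return \@names;
}
```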

Multiple fields which are not names are just added to the hash as an array reference.
<pre>
$output{$field} = \@values;
</pre>

Fields that are not multiple are added to the hash directly, taking the value from the record array.
<pre>
$output{$field} = $record[$i];
</pre>

Finally we return a hash reference.
<pre>
return \%output;
</pre>

= Testing Your Plugin =

After restarting your web server, go to the Import Items screen from the Manage Deposits screen. If you can't find it, make sure you're logged in.

Type this into the "Cut and Paste Records" box:

<pre>
title,abstract
This is a test title,This is a test abstract
This is another test title,This is another test abstract
</pre>

Select "CSV" from the "Select import format" drop-down menu and click "Test Run + Import". You should end up at the Manage Deposits screen with the message "Import completed: 2 item(s) imported." displayed.

== Embedding commas ==
If you want to include commas in a field, which is very likely, you must enclose that field in double quotation marks. For example:
<pre>
title,abstract
An interesting article,"Damn it Jim, I'm a Doctor, not a Perl hacker."
</pre>

When doing this, make sure not to leave any whitespace between a quotation mark and the neighbouring comma, or the import will fail.

== Multiple fields ==
Multiple fields are handled by separating the individual values with semicolons. A simple example of this is the subjects field.

Go back to the Import Items screen and proceed as before, but type this into the "Cut and Paste Records" box:

<pre>
abstract,title,subjects
Testing,Testing,AI;C;M;F;P
Testing,Testing,AC;DC
</pre>

After the records have been imported, examine each one on the View Item screen. You will find that a list of subjects is given, and that each subject is shown with a more descriptive name than the code we imported.

== Compound fields ==

Compound fields are fields that contain subfields, each with its own name. You don't set a compound field in one go; instead you set its components individually. Subfields have names of the form mainfieldname_subfieldname.

One of the most commonly used compound fields is the "Creators" field. It has a name subfield, "creators_name", and an ID subfield, "creators_id", which is most often used for email addresses.

Here is an example of setting the creators field:
<pre>
title,creators_name,creators_id
Setting compound fields.,"Bloggs, Joe;Doe, John","joe@bloggs.com;john@doe.com"
</pre>

If you import this record and then examine it on the View Item screen, you will find that the "Creators" field has been set up, with the values displayed in a table.

Contribute: Plugins/ImportPluginsAWS

2007-09-28T19:06:16Z

Tom: /* AWS.pm */

= Import Plugin Tutorial 2: Amazon Web Services =

In the [[Contribute:_Plugins/ImportPluginsCSV|last tutorial]] we created an import plugin that took data which needed very little modification to import into the repository. The column names in the [http://en.wikipedia.org/wiki/Comma-Separated_Values CSV] file matched the names of metadata fields present in the repository. In this tutorial we'll look at importing data that needs some modification before it can be imported, needs more error checking, and is obtained in a different way.

We will be using Amazon's E-Commerce Web Service to import books from their website into our repository, given a list of [http://en.wikipedia.org/wiki/ASIN ASINs].

We will be accessing the service using a [http://en.wikipedia.org/wiki/REST REST] approach, communicating with the server using URL parameters and retrieving an XML document in response to our request. It is also possible to access their services using [http://en.wikipedia.org/wiki/SOAP SOAP], but that will not be discussed here.

= Before You Start =

== Amazon Web Services ==
To use Amazon's web services you must first sign up for an account [http://aws.amazon.com here]. Their site has extensive documentation on the services they offer, as well as example programs, including some written in Perl.

== Required Modules ==
To prepare for this tutorial you should make sure the [http://search.cpan.org/~gaas/libwww-perl-5.805/lib/LWP/UserAgent.pm LWP::UserAgent] module is installed. Running the following command as root (or using sudo) should work.

<pre>
cpan LWP::UserAgent
</pre>

= AWS.pm =
The code in the section below should be placed in a file called AWS.pm in the directory created previously, and MyPlugins should be changed to the name of that directory.

<pre>
package EPrints::Plugin::Import::MyPlugins::AWS;

use EPrints::Plugin::Import::TextFile;
use strict;
use URI::Escape;

our @ISA = ('EPrints::Plugin::Import::TextFile');

my $endpoint = 'http://ecs.amazonaws.co.uk/onca/xml';
my $accesskey = '<YOURAMAZONWSKEY>';
my $service = 'AWSECommerceService';
my $operation = 'ItemLookup';
my $version = '2007-07-16';
my $responsegroup = 'Large';

sub new
{
my( $class, %params ) = @_;
my $self = $class->SUPER::new( %params );

$self->{name} = 'AWS';
$self->{visible} = 'all';
$self->{produce} = [ 'list/eprint' , 'dataobj/eprint'];

my $rc = EPrints::Utils::require_if_exists('LWP::UserAgent');
unless ($rc)
{
$self->{visible} = '';
$self->{error} = 'Unable to load required module LWP::UserAgent';
}

return $self;
}

sub input_fh
{
my( $plugin, %opts ) = @_;
my @ids;
my $fh = $opts{fh};

my @records = <$fh>;
foreach my $input_data (@records)
{
my $epdata = $plugin->convert_input($input_data);
next unless defined $epdata;

my $dataobj = $plugin->epdata_to_dataobj($opts{dataset},$epdata);
if( defined $dataobj )
{
push @ids, $dataobj->get_id;
}
}

return EPrints::List->new(
dataset => $opts{dataset},
session => $plugin->{session},
ids=>\@ids );
}

sub convert_input
{
my ($plugin, $input) = @_;
my %output = ();

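$input =~ s/^\s+|\s+$//g;
return undef if $input eq '';
unless ($input =~ m/^[0-9A-Z]{10}$/i) { $plugin->warning("Not a valid ASIN: $input"); return undef; }
$input = uc $input;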

my $request =
"$endpoint?".
"Service=$service&".
"AWSAccessKeyId=$accesskey&".
"Operation=$operation&".
"ItemId=$input&".
"Version=$version&".
"ResponseGroup=$responsegroup";

my $ua = LWP::UserAgent->new;
$ua->timeout(30);
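my $response = $ua->get($request);
unless ($response->is_success) { $plugin->error('AWS request failed: '.$response->status_line); return undef; }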

my $dom = EPrints::XML::parse_xml_string($response->content);

my $rep =
$dom->getElementsByTagName('Items')->item(0)->
getElementsByTagName('Request')->item(0);

my $reptext =
EPrints::Utils::tree_to_utf8($rep->getElementsByTagName('IsValid')->item(0));

unless ($reptext eq 'True')
{
$plugin->error('Invalid AWS Request');
return undef;
}

#Get Item Object
my $item =
$dom->getElementsByTagName('Items')->item(0)->
getElementsByTagName('Item')->item(0);

unless (defined $item)
{
$plugin->error('No Item element found');
return undef;
}

my $attr = $item->getElementsByTagName('ItemAttributes')->item(0);

my $pg = EPrints::Utils::tree_to_utf8($attr->getElementsByTagName('ProductGroup')->item(0));

unless ($pg eq 'Book')
{
$plugin->error('Product is not a book.');
return undef;
}

$output{type} = 'book';
$output{refereed} = 'FALSE';
$output{ispublished} = 'pub';

my $title = $attr->getElementsByTagName('Title')->item(0);
$output{title} = EPrints::Utils::tree_to_utf8($title);

my $url = $item->getElementsByTagName('DetailPageURL')->item(0);
$output{official_url} = uri_unescape(EPrints::Utils::tree_to_utf8($url));

my $isbn = $attr->getElementsByTagName('ISBN')->item(0);
if (defined $isbn)
{
$output{isbn} = EPrints::Utils::tree_to_utf8($isbn);
}

my $pages = $attr->getElementsByTagName('NumberOfPages')->item(0);
if (defined $pages)
{
$output{pages} = EPrints::Utils::tree_to_utf8($pages);
}

my $publisher = $attr->getElementsByTagName('Publisher')->item(0);
if (defined $publisher)
{
$output{publisher} = EPrints::Utils::tree_to_utf8($publisher);
}

my $pubdate = $attr->getElementsByTagName('PublicationDate')->item(0);
if (defined $pubdate)
{
$output{date} = EPrints::Utils::tree_to_utf8($pubdate);
}

return \%output;
}

1;
</pre>

= In More Detail =
We will use the URI::Escape module in this plugin. As it is included with EPrints, we don't need to check whether it exists first.
<pre>
use URI::Escape;
</pre>

Here we set up a number of values for parameters that will be part of our web service requests. The endpoint variable determines which server the request is sent to. Here we have used the UK server, but by changing the top-level domain we can use the US, Canadian, German or French servers.

The accesskey variable stores the access key you obtained when signing up with Amazon earlier. Use your normal Access Key ID, not the Secret Access Key.

Here we use the ItemLookup operation of the AWSECommerceService with the 2007-07-16 version of the service API. Other operations allow searching for items, but here we want to look up specific products. Finally, the responsegroup variable determines the amount and nature of the information returned; we select "Large" in this case, which gives a lot of information about the item.

<pre>
my $endpoint = 'http://ecs.amazonaws.co.uk/onca/xml';
my $accesskey = '<YOURAMAZONWSKEY>';
my $service = 'AWSECommerceService';
my $operation = 'ItemLookup';
my $version = '2007-07-16';
my $responsegroup = 'Large';
</pre>

== Constructor ==
The constructor is similar to the one used for the CSV plugin, except that this plugin also lists 'dataobj/eprint' in its produce property, as it can import an individual eprint given an ASIN.
<pre>
$self->{produce} = [ 'list/eprint' , 'dataobj/eprint'];
</pre>

Just as we loaded Text::CSV in the last tutorial, here we load LWP::UserAgent, which will be used to make requests to the web service.
<pre>
my $rc = EPrints::Utils::require_if_exists('LWP::UserAgent');
unless ($rc)
{
$self->{visible} = '';
$self->{error} = 'Unable to load required module LWP::UserAgent';
}
</pre>

== Input ==
=== input_fh ===
This method is similar to the one used in the CSV plugin, but doesn't have to do quite so much work.

First we create the array to hold our imported eprint ids.
<pre>
my @ids;
</pre>

Next we read all the lines in the supplied file handle into our records array.
<pre>
my $fh = $opts{fh};

my @records = <$fh>;
</pre>

Then we iterate over each record, running convert_input on it, importing it into our repository and adding the id to our array.
<pre>
foreach my $input_data (@records)
{
my $epdata = $plugin->convert_input($input_data);
next unless defined $epdata;

my $dataobj = $plugin->epdata_to_dataobj($opts{dataset},$epdata);
if( defined $dataobj )
{
push @ids, $dataobj->get_id;
}
}
</pre>

Then we return a List object of the items imported.
<pre>
return EPrints::List->new(
dataset => $opts{dataset},
session => $plugin->{session},
ids=>\@ids );
</pre>

=== convert_input ===

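An ASIN (Amazon Standard Identification Number) is a 10-character code made up of digits and letters that identifies a product; for books it is normally the ISBN-10, which may end in an X. Here we strip any whitespace surrounding the ASIN, skip blank lines, and reject anything that is not ten letters or digits before using it in a request.
<pre>
$input =~ s/^\s+|\s+$//g;
return undef if $input eq '';
unless ($input =~ m/^[0-9A-Z]{10}$/i) { $plugin->warning("Not a valid ASIN: $input"); return undef; }
$input = uc $input;
</pre>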
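The plugin expects one ASIN per line. As a variation, the line-cleaning step could also accept a pasted product page URL containing /dp/ASIN (the URL form described in the testing section). The sketch below is plain Perl with a hypothetical helper name, asin_from_line; it is not part of EPrints or the AWS API, and it only recognises the /dp/ URL pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: extract an ASIN from one line of pasted input.
# Accepts a bare 10-character ASIN or an Amazon URL containing /dp/ASIN.
# Returns the upper-cased ASIN, or undef if none is found.
sub asin_from_line
{
    my ($line) = @_;
    return undef unless defined $line;

    # Trim surrounding whitespace, including the trailing newline.
    $line =~ s/^\s+|\s+$//g;

    # Case 1: the whole line is an ASIN (ISBN-10s may end in x or X).
    return uc $line if $line =~ m/^[0-9A-Z]{10}$/i;

    # Case 2: a product URL of the form .../dp/ASIN, optionally followed
    # by a further path segment or query string.
    return uc $1 if $line =~ m{/dp/([0-9A-Z]{10})(?:[/?]|$)}i;

    return undef;
}
```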

We form the request from the variables we created earlier and the ASIN we have just obtained.
<pre>
my $request =
"$endpoint?".
"Service=$service&".
"AWSAccessKeyId=$accesskey&".
"Operation=$operation&".
"ItemId=$input&".
"Version=$version&".
"ResponseGroup=$responsegroup";
</pre>
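The request is built by plain string concatenation, which works here because none of the values contain characters that are special in a URL. A slightly more defensive sketch builds the query string from a hash and percent-encodes every key and value. encode_component and build_query are hypothetical helpers written in core Perl and assume byte strings; the plugin could equally use uri_escape from URI::Escape, which it already loads.

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode one query-string component,
# leaving only RFC 3986 unreserved characters unchanged.
sub encode_component
{
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Hypothetical helper: join parameters as "key=value&key=value".
# Keys are sorted so the resulting URL is predictable.
sub build_query
{
    my ($endpoint, %params) = @_;
    my $query = join '&',
        map { encode_component($_) . '=' . encode_component($params{$_}) }
        sort keys %params;
    return "$endpoint?$query";
}

my $request = build_query(
    'http://ecs.amazonaws.co.uk/onca/xml',
    Service        => 'AWSECommerceService',
    AWSAccessKeyId => '<YOURAMAZONWSKEY>',
    Operation      => 'ItemLookup',
    ItemId         => '0946719616',
    Version        => '2007-07-16',
    ResponseGroup  => 'Large',
);
```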

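We now send the request by creating a new LWP::UserAgent object, setting its timeout to 30 seconds and performing an HTTP GET. If the HTTP request itself fails (for example, because of a timeout or a server error), we report the status line as an error and return undef.
<pre>
my $ua = LWP::UserAgent->new;
$ua->timeout(30);
my $response = $ua->get($request);
unless ($response->is_success) { $plugin->error('AWS request failed: '.$response->status_line); return undef; }
</pre>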

We then create a DOM object from the XML document returned.
<pre>
my $dom = EPrints::XML::parse_xml_string($response->content);
</pre>

Each response contains an Items element within the root element of the document, and the Items element contains a Request element. This in turn contains an IsValid element, whose value is True or False depending on whether a valid request was made.

Here we obtain the Request element and check that the IsValid element within it contains the value True. If it doesn't we call the error method and return undef.
<pre>
my $rep =
$dom->getElementsByTagName("Items")->item(0)->
getElementsByTagName('Request')->item(0);

my $reptext =
EPrints::Utils::tree_to_utf8($rep->getElementsByTagName('IsValid')->item(0));

unless ($reptext eq 'True')
{
$plugin->error('Invalid AWS Request');
return undef;
}
</pre>

The product found by the ItemLookup operation is contained in an Item element within the Items element. Here we attempt to get that element, and report an error and return undef if we can't.
<pre>
my $item =
$dom->getElementsByTagName('Items')->item(0)->
getElementsByTagName('Item')->item(0);

unless (defined $item)
{
$plugin->error('No Item element found');
return undef;
}
</pre>

Each item contains an ItemAttributes element which contains most of the metadata about an item.
<pre>
my $attr = $item->getElementsByTagName('ItemAttributes')->item(0);
</pre>

For some specialised repositories it might make sense to import DVDs, computer games or electronic equipment, but we're just going to deal with books. The ProductGroup element within the ItemAttributes element tells us what sort of item we're dealing with; we're looking for the value 'Book'.
<pre>
my $pg = EPrints::Utils::tree_to_utf8($attr->getElementsByTagName('ProductGroup')->item(0));

unless ($pg eq 'Book')
{
$plugin->error('Product is not a book.');
return undef;
}
</pre>

Here we set a few fields without consulting the imported data. We know this is a book, so we set the type. We assume that it has not been refereed. We also assume it has been published, because we can buy it.
<pre>
$output{type} = 'book';
$output{refereed} = 'FALSE';
$output{ispublished} = 'pub';
</pre>

We get and set the title.
<pre>
my $title = $attr->getElementsByTagName('Title')->item(0);
$output{title} = EPrints::Utils::tree_to_utf8($title);
</pre>

Here we set the official URL to the Amazon product page. We have to do a bit of extra work, using the uri_unescape function from URI::Escape to decode the percent-encoded characters (such as %3F) in the URL that AWS returns.
<pre>
my $url = $item->getElementsByTagName('DetailPageURL')->item(0);
$output{official_url} = uri_unescape(EPrints::Utils::tree_to_utf8($url));
</pre>
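Roughly speaking, uri_unescape replaces each %XX escape with the character whose code is the hexadecimal value XX. The tiny core-Perl sketch below shows that behaviour for illustration only; percent_decode is a hypothetical name and the sample URL is made up. The plugin itself should keep using URI::Escape.

```perl
use strict;
use warnings;

# Illustrative percent-decoding: each "%XX" becomes the character with
# hexadecimal code XX; all other characters are left unchanged.
sub percent_decode
{
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Made-up example of an escaped product URL.
my $escaped = 'http://www.amazon.co.uk/gp/redirect.html%3FASIN%3D0946719616%26tag%3Dexample';
my $url = percent_decode($escaped);
```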

We can set the ISBN. Note that the ISBN is often the same as the ASIN.
<pre>
my $isbn = $attr->getElementsByTagName('ISBN')->item(0);
if (defined $isbn)
{
$output{isbn} = EPrints::Utils::tree_to_utf8($isbn);
}
</pre>

We can set the number of pages.
<pre>
my $pages = $attr->getElementsByTagName('NumberOfPages')->item(0);
if (defined $pages)
{
$output{pages} = EPrints::Utils::tree_to_utf8($pages);
}
</pre>

We can set the publisher.
<pre>
my $publisher = $attr->getElementsByTagName('Publisher')->item(0);
if (defined $publisher)
{
$output{publisher} = EPrints::Utils::tree_to_utf8($publisher);
}
</pre>

We can set the publication date and finally return our output hash.
<pre>
my $pubdate = $attr->getElementsByTagName('PublicationDate')->item(0);
if (defined $pubdate)
{
$output{date} = EPrints::Utils::tree_to_utf8($pubdate);
}

return \%output;
</pre>
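The four optional fields all follow the same pattern: look up an element and copy its text into %output only if it exists. A table-driven sketch of that pattern is shown below. To keep it runnable without EPrints, a plain hash %item stands in for the ItemAttributes DOM element, and get_text is a hypothetical stand-in for getElementsByTagName(...)->item(0) followed by EPrints::Utils::tree_to_utf8; the sample values are made up.

```perl
use strict;
use warnings;

# AWS ItemAttributes element names mapped to EPrints field names,
# matching the assignments made in convert_input.
my %aws_to_eprints = (
    ISBN            => 'isbn',
    NumberOfPages   => 'pages',
    Publisher       => 'publisher',
    PublicationDate => 'date',
);

# Stand-in for the ItemAttributes element: only some elements present.
my %item = (
    ISBN      => '0946719616',
    Publisher => 'Example Press',
);

# Hypothetical stand-in for the DOM lookup plus tree_to_utf8:
# returns the element's text, or undef if the element is absent.
sub get_text
{
    my ($element, $tag) = @_;
    return $element->{$tag};
}

my %output;
foreach my $tag (sort keys %aws_to_eprints)
{
    my $text = get_text(\%item, $tag);
    # Only set the EPrints field when AWS actually supplied a value.
    $output{ $aws_to_eprints{$tag} } = $text if defined $text;
}
```

Adding another optional field then only needs a new entry in %aws_to_eprints rather than another copy of the if block.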

= Testing Your Plugin =
After restarting your web server, go to the Import Items screen from the Manage Deposits screen. If you can't find it, make sure you're logged in.

We'll start by collecting a few ASINs. Go to [http://www.amazon.co.uk Amazon] and pick a few books. The URL of each product page is in the form http://www.amazon.co.uk/Combination-of-title-and-author/dp/ASIN... Collect the ASINs of a few different books.

Now we'll demonstrate importing from Amazon with a few sample ASINs.
Type this into the "Cut and Paste Records" box:
<pre>
0946719616
0297843877
</pre>

Select "AWS" from the "Select import format" drop-down menu and click "Test Run + Import". You should end up at the Manage Deposits screen with the message "Import completed: 2 item(s) imported." displayed.

Contribute: Plugins/ImportPluginsAWS

2007-09-28T19:05:30Z

Tom: /* Import Plugin Tutorial 2: Amazon Web Services */

= Import Plugin Tutorial 2: Amazon Web Services =

In the [[Contribute:_Plugins/ImportPluginsCSV|last tutorial]] we created an import plugin that took data which needed very little modification to import into the repository. The column names in the [http://en.wikipedia.org/wiki/Comma-Separated_Values CSV] file matched the names of metadata fields present in the repository. In this tutorial we'll look at importing data that needs some modification before it can be imported, needs more error checking, and is obtained in a different way.

We will be using Amazon's E-Commerce Web Service to import books from their website into our repository, given a list of [http://en.wikipedia.org/wiki/ASIN ASINs].

We will be accessing the service using a [http://en.wikipedia.org/wiki/REST REST] approach, communicating with the server using URL parameters and retrieving an XML document in response to our request. It is also possible to access their services using [http://en.wikipedia.org/wiki/SOAP SOAP], but that will not be discussed here.

= Before You Start =

== Amazon Web Services ==
To use Amazon's web services you must first sign up for an account [http://aws.amazon.com here]. Their site has extensive documentation on the services they offer, as well as example programs, including some written in Perl.

== Required Modules ==
To prepare for this tutorial you should make sure the [http://search.cpan.org/~gaas/libwww-perl-5.805/lib/LWP/UserAgent.pm LWP::UserAgent] module is installed. Running the following command as root (or using sudo) should work.

<pre>
cpan LWP::UserAgent
</pre>

= AWS.pm =
<pre>
package EPrints::Plugin::Import::MyPlugins::AWS;

use EPrints::Plugin::Import::TextFile;
use strict;
use URI::Escape;

our @ISA = ('EPrints::Plugin::Import::TextFile');

my $endpoint = 'http://ecs.amazonaws.co.uk/onca/xml';
my $accesskey = '<YOURAMAZONWSKEY>';
my $service = 'AWSECommerceService';
my $operation = 'ItemLookup';
my $version = '2007-07-16';
my $responsegroup = 'Large';

sub new
{
my( $class, %params ) = @_;
my $self = $class->SUPER::new( %params );

$self->{name} = 'AWS';
$self->{visible} = 'all';
$self->{produce} = [ 'list/eprint' , 'dataobj/eprint'];

my $rc = EPrints::Utils::require_if_exists('LWP::UserAgent');
unless ($rc)
{
$self->{visible} = '';
$self->{error} = 'Unable to load required module LWP::UserAgent';
}

return $self;
}

sub input_fh
{
my( $plugin, %opts ) = @_;
my @ids;
my $fh = $opts{fh};

my @records = <$fh>;
foreach my $input_data (@records)
{
my $epdata = $plugin->convert_input($input_data);
next unless defined $epdata;

my $dataobj = $plugin->epdata_to_dataobj($opts{dataset},$epdata);
if( defined $dataobj )
{
push @ids, $dataobj->get_id;
}
}

return EPrints::List->new(
dataset => $opts{dataset},
session => $plugin->{session},
ids=>\@ids );
}

sub convert_input
{
my ($plugin, $input) = @_;
my %output = ();

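$input =~ s/^\s+|\s+$//g;
return undef if $input eq '';
unless ($input =~ m/^[0-9A-Z]{10}$/i) { $plugin->warning("Not a valid ASIN: $input"); return undef; }
$input = uc $input;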

my $request =
"$endpoint?".
"Service=$service&".
"AWSAccessKeyId=$accesskey&".
"Operation=$operation&".
"ItemId=$input&".
"Version=$version&".
"ResponseGroup=$responsegroup";

my $ua = LWP::UserAgent->new;
$ua->timeout(30);
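my $response = $ua->get($request);
unless ($response->is_success) { $plugin->error('AWS request failed: '.$response->status_line); return undef; }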

my $dom = EPrints::XML::parse_xml_string($response->content);

my $rep =
$dom->getElementsByTagName('Items')->item(0)->
getElementsByTagName('Request')->item(0);

my $reptext =
EPrints::Utils::tree_to_utf8($rep->getElementsByTagName('IsValid')->item(0));

unless ($reptext eq 'True')
{
$plugin->error('Invalid AWS Request');
return undef;
}

#Get Item Object
my $item =
$dom->getElementsByTagName('Items')->item(0)->
getElementsByTagName('Item')->item(0);

unless (defined $item)
{
$plugin->error('No Item element found');
return undef;
}

my $attr = $item->getElementsByTagName('ItemAttributes')->item(0);

my $pg = EPrints::Utils::tree_to_utf8($attr->getElementsByTagName('ProductGroup')->item(0));

unless ($pg eq 'Book')
{
$plugin->error('Product is not a book.');
return undef;
}

$output{type} = 'book';
$output{refereed} = 'FALSE';
$output{ispublished} = 'pub';

my $title = $attr->getElementsByTagName('Title')->item(0);
$output{title} = EPrints::Utils::tree_to_utf8($title);

my $url = $item->getElementsByTagName('DetailPageURL')->item(0);
$output{official_url} = uri_unescape(EPrints::Utils::tree_to_utf8($url));

my $isbn = $attr->getElementsByTagName('ISBN')->item(0);
if (defined $isbn)
{
$output{isbn} = EPrints::Utils::tree_to_utf8($isbn);
}

my $pages = $attr->getElementsByTagName('NumberOfPages')->item(0);
if (defined $pages)
{
$output{pages} = EPrints::Utils::tree_to_utf8($pages);
}

my $publisher = $attr->getElementsByTagName('Publisher')->item(0);
if (defined $publisher)
{
$output{publisher} = EPrints::Utils::tree_to_utf8($publisher);
}

my $pubdate = $attr->getElementsByTagName('PublicationDate')->item(0);
if (defined $pubdate)
{
$output{date} = EPrints::Utils::tree_to_utf8($pubdate);
}

return \%output;
}

1;
</pre>

= In More Detail =
We will use the URI::Escape module in this plugin. As it is included with EPrints, we don't need to check whether it exists first.
<pre>
use URI::Escape;
</pre>

Here we set up a number of values for parameters that will be part of our web service requests. The endpoint variable determines which server the request is sent to. Here we have used the UK server, but by changing the top-level domain we can use the US, Canadian, German or French servers.

The accesskey variable stores the access key you obtained when signing up with Amazon earlier. Use your normal Access Key ID, not the Secret Access Key.

Here we use the ItemLookup operation of the AWSECommerceService with the 2007-07-16 version of the service API. Other operations allow searching for items, but here we want to look up specific products. Finally the variable responsegroup determines the amount and nature of the information returned, we select "Large" in this case, giving a lot of information about the item.

<pre>
my $endpoint = 'http://ecs.amazonaws.co.uk/onca/xml';
my $accesskey = '<YOURAMAZONWSKEY>';
my $service = 'AWSECommerceService';
my $operation = 'ItemLookup';
my $version = '2007-07-16';
my $responsegroup = 'Large';
</pre>

== Constructor ==
The constructor is similar to the one used for the CSV plugin, except that 'dataobj/eprint' is added to the produce property, because this plugin can also import an individual eprint given a single ASIN.
<pre>
$self->{produce} = [ 'list/eprint' , 'dataobj/eprint'];
</pre>

Just as we loaded Text::CSV in the last tutorial, here we load LWP::UserAgent, which is used to send requests to the web service.
<pre>
my $rc = EPrints::Utils::require_if_exists('LWP::UserAgent');
unless ($rc)
{
	$self->{visible} = '';
	$self->{error} = 'Unable to load required module LWP::UserAgent';
}
</pre>

== Input ==
=== input_fh ===
This method is similar to the one used in the CSV plugin, but doesn't have to do quite so much work.

First we create the array to hold our imported eprint ids.
<pre>
my @ids;
</pre>

Next we read all the lines from the supplied filehandle into the @records array.
<pre>
my $fh = $opts{fh};

my @records = <$fh>;
</pre>

Then we iterate over each record, skipping blank lines, running convert_input on it, importing the result into our repository and adding its id to our array.
<pre>
foreach my $input_data (@records)
{
	# Skip lines that contain only whitespace
	next unless $input_data =~ /\S/;

	my $epdata = $plugin->convert_input($input_data);
	next unless defined $epdata;

	my $dataobj = $plugin->epdata_to_dataobj($opts{dataset},$epdata);
	if( defined $dataobj )
	{
		push @ids, $dataobj->get_id;
	}
}
</pre>

Then we return a List object of the items imported.
<pre>
return EPrints::List->new(
	dataset => $opts{dataset},
	session => $plugin->{session},
	ids => \@ids );
</pre>
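The input_fh logic above can be tried outside EPrints. The sketch below is a minimal, standalone version under stated assumptions: convert_record is a hypothetical stand-in for convert_input, and an in-memory filehandle replaces the pasted records. It skips blank lines and any record the converter rejects.

<pre>
use strict;
use warnings;

# Hypothetical stand-in for convert_input: returns a hash reference for a
# usable line, or undef to signal that the line should be skipped.
sub convert_record
{
	my ($line) = @_;
	return undef unless $line =~ /^\s*(\S+)\s*$/;
	return { id => $1 };
}

# Mirrors the structure of input_fh: read every line, convert each one and
# collect the ids of the records that converted successfully.
sub read_records
{
	my ($fh) = @_;
	my @results;
	my @records = <$fh>;
	foreach my $line (@records)
	{
		next unless $line =~ /\S/;    # ignore blank lines
		my $epdata = convert_record($line);
		next unless defined $epdata;
		push @results, $epdata->{id};
	}
	return @results;
}

my $text = "0946719616\n\n0297843877\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
my @ids = read_records($fh);
close $fh;
print join(',', @ids), "\n";
</pre>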

=== convert_input ===

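ASINs are ten-character identifiers made up of letters and digits. For books the ASIN is usually the ISBN-10, which may end in the letter X, so we can't assume that an ASIN contains only digits. Here we strip the whitespace surrounding the ASIN (including the trailing newline), check that what remains looks like an ASIN and convert it to upper case. If it doesn't look like an ASIN, we issue a warning and skip the record.
<pre>
# Strip surrounding whitespace, including the trailing newline
$input =~ s/^\s+|\s+$//g;

# An ASIN is ten letters or digits (an ISBN-10 may end in X)
unless ($input =~ m/^([0-9A-Za-z]{10})$/)
{
	$plugin->warning("Skipping invalid ASIN '$input'");
	return undef;
}
$input = uc $1;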
</pre>
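The ASIN check can be tested in isolation. The sketch below is a hypothetical clean_asin helper (not part of EPrints) that trims a line of input and accepts it only if it is ten letters or digits, the format of an ASIN. The sample values are illustrative only.

<pre>
use strict;
use warnings;

# Hypothetical helper: trim a line of input and check that it looks like
# an ASIN (ten letters or digits). Returns the upper-cased ASIN, or undef.
sub clean_asin
{
	my ($input) = @_;
	return undef unless defined $input;
	$input =~ s/^\s+|\s+$//g;    # strip surrounding whitespace and newline
	return undef unless $input =~ m/^([0-9A-Za-z]{10})$/;
	return uc $1;
}

my @inputs  = ("0946719616\n", "  080442957x ", "B000002ADT", "12345");
my @cleaned = map { defined($_) ? $_ : 'invalid' } map { clean_asin($_) } @inputs;
print join(' ', @cleaned), "\n";
</pre>

clean_asin returns undef explicitly rather than using a bare return, so that map keeps exactly one result per input.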

We form the request from the variables we created earlier and the ASIN we have just obtained.
<pre>
my $request =
	"$endpoint?".
	"Service=$service&".
	"AWSAccessKeyId=$accesskey&".
	"Operation=$operation&".
	"ItemId=$input&".
	"Version=$version&".
	"ResponseGroup=$responsegroup";
</pre>
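The request string above inserts each value without escaping it. That works for the values used here, but a helper that percent-encodes every value is more robust if a parameter ever contains a reserved character. The sketch below is hypothetical (build_request is not part of EPrints or of Amazon's sample code), uses uri_escape from URI::Escape, and only assembles the URL; it does not contact Amazon. 'YOURKEY' is a placeholder.

<pre>
use strict;
use warnings;
use URI::Escape qw(uri_escape);

# Hypothetical helper: build a REST request URL from an endpoint and a list
# of parameter name/value pairs, percent-encoding every value.
sub build_request
{
	my ($endpoint, %params) = @_;
	my $query = join '&',
		map { $_ . '=' . uri_escape($params{$_}) } sort keys %params;
	return "$endpoint?$query";
}

my $url = build_request(
	'http://ecs.amazonaws.co.uk/onca/xml',
	Service        => 'AWSECommerceService',
	AWSAccessKeyId => 'YOURKEY',    # placeholder, not a real key
	Operation      => 'ItemLookup',
	ItemId         => '0946719616',
	Version        => '2007-07-16',
	ResponseGroup  => 'Large',
);
print "$url\n";
</pre>

Sorting the parameter names only makes the generated URL predictable, which is convenient when testing.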

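We now send the request by creating a new LWP::UserAgent object, setting its timeout to 30 seconds and performing an HTTP GET. If the request fails (for example, because the server cannot be reached or returns an HTTP error), there is no XML document to parse, so we report the error and return undef.
<pre>
my $ua = LWP::UserAgent->new;
$ua->timeout(30);
my $response = $ua->get($request);

unless ($response->is_success)
{
	$plugin->error('AWS request failed: '.$response->status_line);
	return undef;
}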
</pre>

We then create a DOM object from the XML document returned.
<pre>
my $dom = EPrints::XML::parse_xml_string($response->content);
</pre>

Each response contains an Items element inside the root element of the document, and the Items element contains a Request element. The Request element in turn contains an IsValid element, whose value is True or False depending on whether a valid request was made.

Here we obtain the Request element and check that the IsValid element within it contains the value True. If any of these elements is missing (for example, if Amazon returned an error document instead), or IsValid does not contain True, we call the error method and return undef.
<pre>
my $items = $dom->getElementsByTagName('Items')->item(0);
my $rep = defined $items ?
	$items->getElementsByTagName('Request')->item(0) : undef;
my $isvalid = defined $rep ?
	$rep->getElementsByTagName('IsValid')->item(0) : undef;

unless (defined $isvalid && EPrints::Utils::tree_to_utf8($isvalid) eq 'True')
{
	$plugin->error('Invalid AWS Request');
	return undef;
}
</pre>
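To experiment with these checks without an access key, you can run them against a hand-written document. The sketch below is a hypothetical, standalone version: it uses XML::LibXML directly instead of EPrints::XML, reports problems with die rather than $plugin->error, and parses an invented, heavily trimmed response containing only the elements discussed in this tutorial (real responses contain many more). It also includes the Item and ProductGroup checks described below.

<pre>
use strict;
use warnings;
use XML::LibXML;

# Invented, heavily trimmed response containing only the elements
# discussed in this tutorial.
my $xml = <<'XML';
<ItemLookupResponse>
  <Items>
    <Request><IsValid>True</IsValid></Request>
    <Item>
      <ItemAttributes>
        <ProductGroup>Book</ProductGroup>
        <Title>An Example Title</Title>
      </ItemAttributes>
    </Item>
  </Items>
</ItemLookupResponse>
XML

# Text of the first descendant element called $name, or undef if absent.
sub first_text
{
	my ($node, $name) = @_;
	my ($element) = $node->getElementsByTagName($name);
	return defined $element ? $element->textContent : undef;
}

# Hypothetical helper mirroring the checks in convert_input: returns the
# Item element of a valid book response, and dies otherwise.
sub valid_book_item
{
	my ($doc) = @_;

	my $valid = first_text($doc, 'IsValid');
	die "Invalid AWS Request\n" unless defined $valid && $valid eq 'True';

	my ($item) = $doc->getElementsByTagName('Item');
	die "No Item element found\n" unless defined $item;

	my $group = first_text($item, 'ProductGroup');
	die "Product is not a book.\n" unless defined $group && $group eq 'Book';

	return $item;
}

my $doc  = XML::LibXML->load_xml(string => $xml);
my $item = valid_book_item($doc);
print first_text($item, 'Title'), "\n";
</pre>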

The product found by the ItemLookup operation is contained in an Item element inside the Items element. Here we try to get that element; if we can't, we report an error and return undef.
<pre>
my $item =
	$dom->getElementsByTagName('Items')->item(0)->
	getElementsByTagName('Item')->item(0);

unless (defined $item)
{
	$plugin->error('No Item element found');
	return undef;
}
</pre>

Each item contains an ItemAttributes element which contains most of the metadata about an item.
<pre>
my $attr = $item->getElementsByTagName('ItemAttributes')->item(0);
</pre>

For some specialised repositories it might make sense to import DVDs, computer games and electronic equipment, but we're only going to deal with books. The ProductGroup element within the ItemAttributes element tells us what sort of item we're dealing with; we're looking for the value 'Book'.
<pre>
my $pg = EPrints::Utils::tree_to_utf8($attr->getElementsByTagName('ProductGroup')->item(0));

unless ($pg eq 'Book')
{
	$plugin->error('Product is not a book.');
	return undef;
}
</pre>

Here we set a few fields without consulting the imported data. We know this is a book, so we set the type. We assume that it has not been refereed. We also assume it has been published, because we can buy it.
<pre>
$output{type} = 'book';
$output{refereed} = 'FALSE';
$output{ispublished} = 'pub';
</pre>

We get and set the title.
<pre>
my $title = $attr->getElementsByTagName('Title')->item(0);
$output{title} = EPrints::Utils::tree_to_utf8($title);
</pre>

Here we set the official URL to the Amazon product page. This takes a little extra work: we use the uri_unescape function from the URI::Escape module to convert percent-encoded escape codes (such as %3D) back into characters.
<pre>
my $url = $item->getElementsByTagName('DetailPageURL')->item(0);
$output{official_url} = uri_unescape(EPrints::Utils::tree_to_utf8($url));
</pre>
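Here is a tiny, standalone illustration of what uri_unescape does; the escaped string is invented for the example.

<pre>
use strict;
use warnings;
use URI::Escape qw(uri_unescape);

# uri_unescape turns each %XX sequence back into the character it encodes:
# %3A is ':' and %2F is '/'.
my $escaped = 'http%3A%2F%2Fwww.amazon.co.uk%2Fdp%2F0946719616';
my $url = uri_unescape($escaped);
print "$url\n";    # http://www.amazon.co.uk/dp/0946719616
</pre>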

We can set the ISBN. Note that the ISBN is often the same as the ASIN.
<pre>
my $isbn = $attr->getElementsByTagName('ISBN')->item(0);
if (defined $isbn)
{
	$output{isbn} = EPrints::Utils::tree_to_utf8($isbn);
}
</pre>

We can set the number of pages.
<pre>
my $pages = $attr->getElementsByTagName('NumberOfPages')->item(0);
if (defined $pages)
{
	$output{pages} = EPrints::Utils::tree_to_utf8($pages);
}
</pre>

We can set the publisher.
<pre>
my $publisher = $attr->getElementsByTagName('Publisher')->item(0);
if (defined $publisher)
{
	$output{publisher} = EPrints::Utils::tree_to_utf8($publisher);
}
</pre>

We can set the publication date and finally return our output hash.
<pre>
my $pubdate = $attr->getElementsByTagName('PublicationDate')->item(0);
if (defined $pubdate)
{
	$output{date} = EPrints::Utils::tree_to_utf8($pubdate);
}

return \%output;
</pre>
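The ISBN, page count, publisher and date all follow the same "look up the element, copy it if it exists" pattern, so they could also be driven by a table. The sketch below is a hypothetical refactoring, not the tutorial's code: it uses XML::LibXML in place of EPrints::XML and EPrints::Utils::tree_to_utf8, and runs against an invented ItemAttributes element in which NumberOfPages is deliberately missing.

<pre>
use strict;
use warnings;
use XML::LibXML;

# Amazon element name => EPrints field name, as used in convert_input.
my %field_for = (
	ISBN            => 'isbn',
	NumberOfPages   => 'pages',
	Publisher       => 'publisher',
	PublicationDate => 'date',
);

# Invented sample; NumberOfPages is deliberately missing.
my $xml = <<'XML';
<ItemAttributes>
  <ISBN>0946719616</ISBN>
  <Publisher>Example Publisher</Publisher>
  <PublicationDate>1997-05-01</PublicationDate>
</ItemAttributes>
XML

my $attr = XML::LibXML->load_xml(string => $xml)->documentElement;

my %output;
foreach my $element_name (sort keys %field_for)
{
	# Copy the value only when the element is present.
	my ($node) = $attr->getElementsByTagName($element_name);
	next unless defined $node;
	$output{ $field_for{$element_name} } = $node->textContent;
}

print join(', ', map { "$_=$output{$_}" } sort keys %output), "\n";
</pre>

With this layout, supporting another optional field only needs a new entry in %field_for.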

= Testing Your Plugin =
After restarting your webserver, go to the Import Items screen from the Manage Deposits screen. If you can't find it, make sure you're logged in.

We'll start by collecting a few ASINs. Go to [http://www.amazon.co.uk Amazon] and pick a few books. The URL of each product page is in the form http://www.amazon.co.uk/Combination-of-title-and-author/dp/ASIN..., so the ASIN is the ten-character code that follows /dp/. Collect a few different ASINs.

Now we'll demonstrate importing from Amazon with a few sample ASINs.
Type this into the "Cut and Paste Records" box:
<pre>
0946719616
0297843877
</pre>

Select "AWS" from the Select import format drop down menu and click "Test Run + Import". You should end up at the Manage Deposits screen with the following message being displayed "Import completed: 2 item(s) imported.".

Contribute: Plugins/ImportPluginsCSV

2007-09-28T19:03:45Z

Tom: /* CSV.pm */

= Import Plugin Tutorial 1: CSV =

In this tutorial we will look at creating a relatively simple plugin to import eprints into our repository by reading files containing [http://en.wikipedia.org/wiki/Comma-Separated_Values comma-separated values]. We won't be dealing with documents and files, but will be focusing on importing eprint metadata.

Import plugins are inherently more complicated than export plugins because of the error checking that must be done; however, in this example error checking has been kept to a minimum for simplicity. In a "real" plugin you should check that the appropriate metadata fields are set for a given type of eprint, and unfortunately there appears to be no quick way to do this.

= Before You Start =

Create a directory for your import plugins in the main import plugin directory (usually /opt/eprints3/perl_lib/EPrints/Plugin/Import). The directory used for these examples is called "MyPlugins".

To prepare for this tutorial you should install the [http://search.cpan.org/~erangel/Text-CSV/CSV.pm Text::CSV] module. Running the following command as root, or with sudo, should work.

<pre>
cpan Text::CSV
</pre>

= CSV.pm =
Place the code below in a file called CSV.pm in the directory you created earlier, and change MyPlugins to the name of that directory.
<pre>
package EPrints::Plugin::Import::MyPlugins::CSV;

use EPrints::Plugin::Import::TextFile;
use strict;

our @ISA = ('EPrints::Plugin::Import::TextFile');

sub new
{
	my( $class, %params ) = @_;

	my $self = $class->SUPER::new( %params );

	$self->{name} = 'CSV';
	$self->{visible} = 'all';
	$self->{produce} = [ 'list/eprint' ];

	my $rc = EPrints::Utils::require_if_exists('Text::CSV');
	unless( $rc )
	{
		$self->{visible} = '';
		$self->{error} = 'Failed to load required module Text::CSV';
	}

	return $self;
}

sub input_fh
{
	my( $plugin, %opts ) = @_;
	my @ids;
	my $fh = $opts{fh};
	my @records = <$fh>;
	s/\r?\n\z// foreach @records;    # strip line endings
	my $csv = Text::CSV->new({ binary => 1 });    # accept non-ASCII text
	my @fields;

	if ($csv->parse(shift @records))
	{
		@fields = $csv->fields();
	}
	else
	{
		$plugin->error($csv->error_input);
		return undef;
	}

	foreach my $row (@records)
	{
		next unless $row =~ /\S/;    # skip blank lines

		my @input_data = (join(',',@fields),$row);

		my $epdata = $plugin->convert_input(\@input_data);
		next unless defined $epdata;

		my $dataobj = $plugin->epdata_to_dataobj($opts{dataset},$epdata);
		if( defined $dataobj )
		{
			push @ids, $dataobj->get_id;
		}
	}

	return EPrints::List->new(
		dataset => $opts{dataset},
		session => $plugin->{session},
		ids => \@ids );
}

sub convert_input
{
	my $plugin = shift;
	my @input = @{shift @_};
	my $csv = Text::CSV->new({ binary => 1 });

	my @record;
	if ($csv->parse($input[1]))
	{
		@record = $csv->fields();
	}
	else
	{
		$plugin->error($csv->error_input);
		return undef;
	}

	my @fields = split(',',$input[0]);

	if (scalar @fields != scalar @record)
	{
		$plugin->warning('Row length mismatch');
		return undef;
	}

	my %output = ();

	my $dataset = $plugin->{session}->{repository}->get_dataset('archive');

	my $i = 0;
	foreach my $field (@fields)
	{
		unless ($dataset->has_field($field))
		{
			$i++;
			next;
		}

		my $metafield = $dataset->get_field($field);

		if ($metafield->get_property('multiple'))
		{
			my @values = split(';',$record[$i]);

			if ($metafield->{type} eq 'name')
			{
				my @names = ();

				foreach my $value (@values)
				{
					# Surname, Forenames[, Lineage]; spaces around commas are ignored
					next unless ($value =~ /^\s*(.*?)\s*,\s*(.*?)\s*(?:,\s*(.*?)\s*)?$/);
					push @names, {family => $1, given => $2, lineage => $3};
				}

				$output{$field} = \@names;
			}
			else
			{
				$output{$field} = \@values;
			}
		}
		else
		{
			$output{$field} = $record[$i];
		}
		$i++;
	}
	return \%output;
}

1;

</pre>

= In More Detail =
== Modules ==
Here we load the superclass of our plugin.

<pre>
use EPrints::Plugin::Import::TextFile;
</pre>

== Inheritance ==

Our plugin will not inherit from the Import class directly, but from the TextFile subclass. This contains some extra file handling code that means we can ignore certain differences in text file formats. If you are creating an import plugin which imports non-text files you should subclass the EPrints::Plugin::Import class directly.

<pre>
our @ISA = ('EPrints::Plugin::Import::TextFile');
</pre>

== Constructor ==
For import plugins we must set a 'produce' property, to tell the repository what kinds of objects the plugin can import. This plugin only supports importing lists of eprints, but if it supported importing individual eprints we could add 'dataobj/eprint' to this property. We would then have to implement the "input_dataobj" method. Most plugins implement this method, but it is rarely used in practice. Most imports are done in lists (even if that list only contains one member), via the import items screen.

<pre>
$self->{produce} = [ 'list/eprint' ];
</pre>

Text::CSV is not included with EPrints, so we load it differently. "EPrints::Utils::require_if_exists" checks whether the module is installed and loads it if it is. If it isn't, we make the plugin invisible and set an error message. It is good practice to load non-standard modules this way rather than with "use".
<pre>
my $rc = EPrints::Utils::require_if_exists('Text::CSV');
unless( $rc )
{
	$self->{visible} = '';
	$self->{error} = 'Failed to load required module Text::CSV';
}
</pre>
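EPrints::Utils::require_if_exists is part of EPrints, and we are not claiming to show how it is implemented. The sketch below is a hypothetical module_available helper that illustrates the general technique behind such a check: convert the module name to a file path, require it inside eval, and report success instead of dying when the module is missing.

<pre>
use strict;
use warnings;

# Hypothetical helper: try to load a module at run time and return 1 if it
# loaded, 0 if it could not be found or failed to compile.
sub module_available
{
	my ($module) = @_;
	(my $file = "$module.pm") =~ s{::}{/}g;    # Text::CSV -> Text/CSV.pm
	return eval { require $file; 1 } ? 1 : 0;
}

foreach my $module ('File::Spec', 'No::Such::Module')
{
	printf "%s: %s\n", $module,
		module_available($module) ? 'available' : 'missing';
}
</pre>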

== Input ==
Import plugins have to implement a couple of methods to read data from a file or string, manipulate it and turn it into a form which can be imported into the repository. That process will be described below.

=== input_fh ===

This method takes a filehandle, processes it, tries to import DataObjs into the repository and then returns a List of the DataObjs imported.

This array will be used to create a List of DataObjs later.
<pre>
my @ids;
</pre>

Here we read every line from the supplied filehandle into an array, then strip the line ending (\n or \r\n) from each one.
<pre>
my $fh = $opts{fh};
my @records = <$fh>;
s/\r?\n\z// foreach @records;
</pre>

We create a Text::CSV object to handle the input. A dedicated CSV package is preferable to Perl's split function because it handles more complicated cases, such as commas inside double-quoted fields. We pass binary => 1 so that fields containing non-ASCII characters, such as accented letters, are accepted.
<pre>
my $csv = Text::CSV->new({ binary => 1 });
</pre>
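To see why a dedicated parser matters, the sketch below parses the line from the "Embedding commas" example later in this tutorial both ways. It assumes Text::CSV is installed, as this tutorial already requires.

<pre>
use strict;
use warnings;
use Text::CSV;

my $line = 'An interesting article,"Damn it Jim, I\'m a Doctor, not a Perl hacker."';

# Naive approach: split breaks the quoted field apart at every comma.
my @naive = split /,/, $line;

# Text::CSV understands the double quotes and keeps the field whole.
my $csv = Text::CSV->new({ binary => 1 });
$csv->parse($line) or die 'Parse failed: ' . $csv->error_input;
my @fields = $csv->fields;

printf "split: %d fields, Text::CSV: %d fields\n", scalar @naive, scalar @fields;
print "$fields[1]\n";
</pre>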

After setting up an array for metadata field names, we attempt to parse the first line of our file. The parse method does not return an array of fields, but reports success or failure. In the event of success we use the fields method to return the last fields parsed. In the event of failure we use the error_input method to get the last error, and return undef.
<pre>
my @fields;
if ($csv->parse(shift @records))
{
	@fields = $csv->fields();
}
else
{
	$plugin->error($csv->error_input);
	return undef;
}
</pre>

Now that the row of column titles has been dealt with, we move on to processing each record in the file, skipping any blank lines.

In import plugins, the convert_input method converts an individual record into a form that can be imported into the repository: a hash whose keys are metadata field names and whose values are the corresponding values. A row on its own cannot be converted, because we wouldn't know which field each value belongs to, so we first construct an array to pass to convert_input. Its first element is the row of field names (joined back together with commas) and its second element is the row we want to import.
<pre>
foreach my $row (@records)
{
	next unless $row =~ /\S/;
	my @input_data = (join(',',@fields),$row);
</pre>

Here we call convert_input on our constructed input_data. If the conversion fails we simply move to the next record.
<pre>
my $epdata = $plugin->convert_input(\@input_data);
next unless defined $epdata;
</pre>

The epdata_to_dataobj method takes our epdata hash reference and turns it into a new DataObj in our repository. If it is successful it returns the new DataObj, whose id we add to our array of ids.
<pre>
my $dataobj = $plugin->epdata_to_dataobj($opts{dataset},$epdata);
if( defined $dataobj )
{
	push @ids, $dataobj->get_id;
}
</pre>

Finally we return a List object containing the ids of the records we have successfully imported.
<pre>
return EPrints::List->new(
	dataset => $opts{dataset},
	session => $plugin->{session},
	ids => \@ids );
</pre>

=== convert_input ===
This method takes data in a particular format, in this case CSV, and transforms it into a hash of metadata field names and values.

We take the second argument to the method (the first is the plugin itself) and dereference it to get the array it refers to.
<pre>
my @input = @{shift @_};
</pre>

Here we set up another Text::CSV object, with binary mode enabled so that non-ASCII characters are accepted.
<pre>
my $csv = Text::CSV->new({ binary => 1 });
</pre>

We take the second element of our array and parse it. This is the record we wish to import. If anything goes wrong we return undef.
<pre>
my @record;
if ($csv->parse($input[1]))
{
	@record = $csv->fields();
}
else
{
	$plugin->error($csv->error_input);
	return undef;
}
</pre>

We split the first element to get the field names, then check that there are as many field names as values in the record.
<pre>
my @fields = split(',',$input[0]);

if (scalar @fields != scalar @record)
{
	$plugin->warning('Row length mismatch');
	return undef;
}
</pre>
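Once the field names and the values are known to line up, pairing them is a common Perl idiom. The sketch below is a simplified, hypothetical version of the loop that follows: a plain hash %known stands in for $dataset->has_field, and the handling of multiple and name fields is left out.

<pre>
use strict;
use warnings;

# Illustrative header and data row, as they would look after parsing.
my @fields = ('title', 'abstract', 'no_such_field');
my @record = ('A title', 'An abstract', 'ignored');

# Hypothetical stand-in for $dataset->has_field.
my %known = map { $_ => 1 } qw(title abstract subjects);

# Pair each column name with the value in the same position ...
my %row;
@row{@fields} = @record;

# ... then keep only the columns that correspond to known fields.
my %output = map { $_ => $row{$_} } grep { $known{$_} } keys %row;

print join(', ', map { "$_=$output{$_}" } sort keys %output), "\n";
</pre>

One caveat of the hash slice: if two columns share a heading, the later value silently replaces the earlier one.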

This is the hash that we'll return later.
<pre>
my %output = ();
</pre>

For convenience we get the DataSet object.
<pre>
my $dataset = $plugin->{session}->{repository}->get_dataset('archive');
</pre>

We now iterate over the fields.
<pre>
my $i = 0;
foreach my $field (@fields)
{
</pre>

If the field does not exist we look at the next one, remembering to increment our index.
<pre>
unless ($dataset->has_field($field))
{
	$i++;
	next;
}
</pre>

We get the MetaField object corresponding to the current field.
<pre>
my $metafield = $dataset->get_field($field);
</pre>

Fields that can hold multiple values are expected to contain a semicolon-separated list, so we split the value on semicolons.
<pre>
if ($metafield->get_property('multiple'))
{
	my @values = split(';',$record[$i]);
</pre>

Name fields are parsed with a regular expression, and a hash is built from the parts matched. The plugin expects each name to be of the form Surname, Forenames, optionally followed by a comma and a lineage such as Sr, Jr or III. Spaces around the commas are ignored, so "Bloggs, Joe" gives the given name "Joe" rather than " Joe".
<pre>
if ($metafield->{type} eq 'name')
{
	my @names = ();

	foreach my $value (@values)
	{
		next unless ($value =~ /^\s*(.*?)\s*,\s*(.*?)\s*(?:,\s*(.*?)\s*)?$/);
		push @names, {family => $1, given => $2, lineage => $3};
	}

	$output{$field} = \@names;
}
</pre>
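The name-splitting rule can be checked on its own. The sketch below is a hypothetical parse_names helper that applies the Surname, Forenames[, Lineage] convention, trimming spaces around the commas, to a semicolon-separated list. Entries without a comma (such as "Cher" in the sample) are skipped, just as the plugin skips values that don't match.

<pre>
use strict;
use warnings;

# Hypothetical helper: split a semicolon-separated list of names written as
# "Surname, Forenames[, Lineage]" into hashes like those used for name fields.
sub parse_names
{
	my ($text) = @_;
	my @names;
	foreach my $value (split /;/, $text)
	{
		next unless $value =~ /^\s*(.*?)\s*,\s*(.*?)\s*(?:,\s*(.*?)\s*)?$/;
		push @names, { family => $1, given => $2, lineage => $3 };
	}
	return @names;
}

my @names = parse_names('Bloggs, Joe;Doe, John, Jr;Cher');
foreach my $name (@names)
{
	printf "%s | %s | %s\n", $name->{family}, $name->{given},
		defined $name->{lineage} ? $name->{lineage} : '';
}
</pre>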

Multiple fields which are not names are just added to the hash as an array reference.
<pre>
$output{$field} = \@values;
</pre>

Fields that are not multiple are added to the hash directly from the parsed record.
<pre>
$output{$field} = $record[$i];
</pre>

Finally we return a hash reference.
<pre>
return \%output;
</pre>

= Testing Your Plugin =

After restarting your webserver, go to the Import Items screen from the Manage Deposits screen. If you can't find it, make sure you're logged in.

Type this into the "Cut and Paste Records" box:

<pre>
title,abstract
This is a test title,This is a test abstract
This is another test title,This is another test abstract
</pre>

Select "CSV" from the Select import format drop down menu and click "Test Run + Import". You should end up at the Manage Deposits screen with the following message being displayed "Import completed: 2 item(s) imported.".

== Embedding commas ==
If you want to include commas in your imports, which is very likely, you must enclose the field in double quotation marks. For example:
<pre>
title,abstract
An interesting article,"Damn it Jim, I'm a Doctor, not a Perl hacker."
</pre>

When doing this, make sure not to leave any whitespace between a quotation mark and the adjacent comma, or the import will fail.

== Multiple fields ==
Fields that can hold multiple values take a list of values separated by semicolons. A simple example of this is the subjects field.

Go back to the Import Items screen and proceed as before, but type this into the "Cut and Paste Records" box:

<pre>
abstract,title,subjects
Testing,Testing,AI;C;M;F;P
Testing,Testing,AC;DC
</pre>

After the records have been imported, examine each one on the View Item screen. You will find that a list of subjects is shown, and that each subject is displayed with a more descriptive name than the code we imported.

== Compound fields ==

Compound fields are fields that have subfields within them, each with its own name. You don't set a compound field in one go; instead you set its components individually. Subfields have names of the form mainfieldname_subfieldname.

One of the most commonly used compound fields is the "Creators" field. It has a name subfield, "creators_name", and an ID subfield, "creators_id", which is most often used for email addresses.

Here is an example of setting the creators field. The column headings must match the subfield names exactly, and the header row must not end with a comma, otherwise it has more columns than the data row and the record is rejected with a "Row length mismatch" warning.
<pre>
title,creators_name,creators_id
Setting compound fields.,"Bloggs, Joe;Doe, John","joe@bloggs.com;john@doe.com"
</pre>

If you import this record and then examine it on the View Item screen, you will find that the "Creators" field has been set up, with the values displayed in a table.

Contribute: Plugins/ImportPluginsCSV

2007-09-28T19:03:14Z

Tom: /* Before You Start */

= Import Plugin Tutorial 1: CSV =

In this tutorial we will look at creating a relatively simple plugin to import eprints into our repository by reading files containing [http://en.wikipedia.org/wiki/Comma-Separated_Values comma-separated values]. We won't be dealing with documents and files, but will be focusing on importing eprint metadata.

Import plugins are inherently more complicated than export plugins because of the error checking that must be done, however in this example error checking has been kept to a minimum to simplify the example. In a "real" plugin you should check that the appropriate metadata fields are set for a given type of eprint, and unfortunately there appears to be no quick way to do this.

= Before You Start =

Create a directory for your import plugins in the main plugin directory (usually /opt/eprints3/perl_lib/EPrints/Plugin/import). The directory used for these examples is called "MyPlugins".

To prepare for this tutorial you should install the [http://search.cpan.org/~erangel/Text-CSV/CSV.pm Text::CSV] module. The following command as root, or using sudo should work.

<pre>
cpan Text::CSV
</pre>

= CSV.pm =
<pre>
package EPrints::Plugin::Import::MyPlugins::CSV;

use EPrints::Plugin::Import::TextFile;
use strict;

our @ISA = ('EPrints::Plugin::Import::TextFile');

sub new
{
my( $class, %params ) = @_;

my $self = $class->SUPER::new( %params );

$self->{name} = 'CSV';
$self->{visible} = 'all';
$self->{produce} = [ 'list/eprint' ];

my $rc = EPrints::Utils::require_if_exists('Text::CSV');
unless( $rc )
{
$self->{visible} = '';
$self->{error} = 'Failed to load required module Text::CSV';
}

return $self;
}

sub input_fh
{
my( $plugin, %opts ) = @_;
my @ids;
my $fh = $opts{fh};
my @records = <$fh>;
my $csv = Text::CSV->new();
my @fields;

if ($csv->parse(shift @records))
{
@fields = $csv->fields();
}
else
{
$plugin->error($csv->error_input);
return undef;
}

foreach my $row (@records)
{
my @input_data = (join(',',@fields),$row);

my $epdata = $plugin->convert_input(\@input_data);
next unless defined $epdata;

my $dataobj = $plugin->epdata_to_dataobj($opts{dataset},$epdata);
if( defined $dataobj )
{
push @ids, $dataobj->get_id;
}
}

return EPrints::List->new(
dataset => $opts{dataset},
session => $plugin->{session},
ids=>\@ids );
}

sub convert_input
{
my $plugin = shift;
my @input = @{shift @_};
my $csv = Text::CSV->new();

my @record;
if ($csv->parse($input[1]))
{
@record = $csv->fields();
}
else
{
$plugin->error($csv->error_input);
return undef;
}

my @fields = split(',',$input[0]);

if (scalar @fields != scalar @record)
{
$plugin->warning('Row length mismatch');
return undef;
}

my %output = ();

my $dataset = $plugin->{session}->{repository}->get_dataset('archive');

my $i = 0;
foreach my $field (@fields)
{
unless ($dataset->has_field($field))
{
$i++;
next;
}

my $metafield = $dataset->get_field($field);

if ($metafield->get_property('multiple'))
{
my @values = split(';',$record[$i]);

if ($metafield->{type} eq 'name')
{
my @names = ();

foreach my $value (@values)
{
my $name = $value;

next unless ($value =~ /^(.*?),(.*?)(,(.*?))?$/);
push @names, {family => $1,given => $2,lineage => $4};
}

$output{$field} = \@names;
}
else
{
$output{$field} = \@values;
}
}
else
{
$output{$field} = $record[$i];
}
$i++;
}
return \%output;
}

1;

</pre>

= In More Detail =
== Modules ==
Here we import the superclass for our plugin.

<pre>
use EPrints::Plugin::Import::TextFile;
</pre>

== Inheritance ==

Our plugin will not inherit from the Import class directly, but from the TextFile subclass. This contains some extra file handling code that means we can ignore certain differences in text file formats. If you are creating an import plugin which imports non-text files you should subclass the EPrints::Plugin::Import class directly.

<pre>
our @ISA = ('EPrints::Plugin::Import::TextFile');
</pre>

== Constructor ==
For import plugins we must set a 'produce' property, to tell the repository what kinds of objects the plugin can import. This plugin only supports importing lists of eprints, but if it supported importing individual eprints we could add 'dataobj/eprint' to this property. We would then have to implement the "input_dataobj" method. Most plugins implement this method, but it is rarely used in practice. Most imports are done in lists (even if that list only contains one member), via the import items screen.

<pre>
$self->{produce} = [ 'list/eprint' ];
</pre>

Here we use a module that is not included with EPrints, Text::CSV, so we import it in a different way. First we check that it is installed, and load it if it is with "EPrints::Utils::require_if_exists".If it isn't we make the plugin invisible and produce an error message. It is good practice to import non-standard modules in this way rather than with "use".
<pre>
my $rc = EPrints::Utils::require_if_exists('Text::CSV');
unless( $rc )
{
$self->{visible} = '';
$self->{error} = 'Failed to load required module Text::CSV';
}
</pre>

== Input ==
Import plugins have to implement a couple of methods to read data from a file or string, manipulate it and turn it into a form which can be imported into the repository. That process will be described below.

=== input_fh ===

This method takes a filehandle, processes it, tries to import DataObjs in to the repository and then returns a List of the DataObjs imported.

This array will be used to create a List of DataObjs later.
<pre>
my @ids;
</pre>

Here we open the filehandle passed, and read the lines into an array.
<pre>
my $fh = $opts{fh};
my @records = <$fh>;
</pre>

We create a Text::CSV object to handle the input. Using a dedicated CSV handling package is preferable to using Perl's split function as it handles a number of more complicated scenarios such as commas within records using double quotes.
<pre>
my $csv = Text::CSV->new();
</pre>

After setting up an array for metadata field names, we attempt to parse the first line of our file. The parse method does not return an array of fields, but reports success or failure. In the event of success we use the fields method to return the last fields parsed. In the event of failure we use the error_input method to get the last error, and return undef.
<pre>
my @fields;
if ($csv->parse(shift @records))
{
@fields = $csv->fields();
}
else
{
$plugin->error($csv->error_input);
return undef;
}
</pre>

Now that the row of column titles has been dealt with we move onto processing each record in the file.

In import plugins the convert_input method converts individual records into a format that can be imported into the repository. That is a hash whose keys are metadata field names and values are the corresponding values. As a row on its own cannot be imported as we don't know to which field each value belongs we have to construct an array to pass to convert_input first. We pass an array whose first element is the fields row and whose second element is the row we want to import.
<pre>
foreach my $row (@records)
{
my @input_data = (join(',',@fields),$row);
</pre>

Here we call convert_input on our constructed input_data. If the conversion fails we simply move to the next record.
<pre>
my $epdata = $plugin->convert_input(\@input_data);
next unless defined $epdata;
</pre>

The epdata_to_dataobj method takes our epdata hash reference and turns it into a new DataObj in our repository. If it is successful it returns the new DataObj, whose id we add to our array of ids.
<pre>
my $dataobj = $plugin->epdata_to_dataobj($opts{dataset},$epdata);
if( defined $dataobj )
{
push @ids, $dataobj->get_id;
}
</pre>

Finally we return a List object containing the ids of the records we have successfully imported.
<pre>
return EPrints::List->new(
dataset => $opts{dataset},
session => $plugin->{session},
ids=>\@ids );
</pre>

=== convert_input ===
This method takes data in a particular format, in this case CSV and transforms it into a hash of metadata field names and values.

We take the second argument to the method and convert the array reference into an array.
<pre>
my @input = @{shift @_};
</pre>

Here we setup another Text::CSV object.
<pre>
my $csv = Text::CSV->new();
</pre>

We take the second element of our array and parse it. This is the record we wish to import. If anything goes wrong we return undef.
<pre>
my @record;
if ($csv->parse($input[1]))
{
@record = $csv->fields();
}
else
{
$plugin->error($csv->error_input);
return undef;
}
</pre>

We take the first element and get the field names. We then check that we have the same number of fields names as records.
<pre>
my @fields = split(',',$input[0]);

if (scalar @fields != scalar @record)
{
$plugin->warning('Row length mismatch');
return undef;
}
</pre>

This is the hash that we'll return later.
<pre>
my %output = ();
</pre>

For convenience we get the DataSet object.
<pre>
my $dataset = $plugin->{session}->{repository}->get_dataset('archive');
</pre>

We now iterate over the fields.
<pre>
my $i = 0;
foreach my $field (@fields)
{
</pre>

If the field does not exist we look at the next one, remembering to increment our index.
<pre>
unless ($dataset->has_field($field))
{
$i++;
next;
}
</pre>

We get the MetaField object corresponding to the current field.
<pre>
my $metafield = $dataset->get_field($field);
</pre>

We deal with multiple field types by separating individual values with a semi-colon.
<pre>
if ($metafield->get_property('multiple'))
{
my @values = split(';',$record[$i]);
</pre>

Name fields are dealt with by using regular expressions and constructing a hash from the parts matched. The plugin expects names to be of the form Surname, Forenames, Lineage (Sr, Jr, III etc).
<pre>
if ($metafield->{type} eq 'name')
{
my @names = ();

foreach my $value (@values)
{
my $name = $value;

next unless ($value =~ /^(.*?),(.*?)(,(.*?))?$/);
push @names, {family => $1,given => $2,lineage => $4};
}

$output{$field} = \@names;
}
</pre>

Multiple fields which are not names are just added to the hash as an array reference.
<pre>
$output{$field} = \@values;
</pre>

Non-multiple fields are just added to the hash from the array of fields.
<pre>
$output{$field} = $record[$i];
</pre>

Finally we return a hash reference.
<pre>
return \%output;
</pre>

= Testing Your Plugin =

After restarting your webserver go to the Import Items screen from the Manage Deposits screen. If you can't find this, make sure you're logged in.

Type this into the "Cut and Paste Records" box:

<pre>
title,abstract
This is a test title,This is a test abstract
This is another test title,This is another test abstract
</pre>

Select "CSV" from the Select import format drop down menu and click "Test Run + Import". You should end up at the Manage Deposits screen with the following message being displayed "Import completed: 2 item(s) imported.".

== Embedding commas ==
If you want to include commas in your imports, which is very likely you must enclose the field in double quotations. For example:
<pre>
title,abstract
An interesting article,"Damn it Jim, I'm a Doctor, not a Perl hacker."
</pre>

When doing this make sure not to leave any whitespace between a quotation mark and a comma or the import will fail.

== Multiple fields ==
Multiple field types are handled by separating each individual value by a semi-colon, a simple example of this would be the subjects field.

Go back to the import items and proceed as before, but typing this into the "Cut and Paste Records" box:

<pre>
abstract,title,subjects
Testing,Testing,AI;C;M;F;P
Testing,Testing,AC;DC
</pre>

After the records have been imported examine each one on the View Item screen. You will find that a list of subjects are given, and that a more descriptive name is given than the code we imported.

== Compound fields ==

Compound fields are fields that have subfields within them, each with their own name. You don't compound fields in one go, but set the components individually. Subfields have names of the form mainfieldname_subfieldname.

One of the most commonly used compound fields is the "Creators" field. It has a names subfield "creators_name" and an ID subfield "creators_id" which is most often used for email addresses.

Here is an example of setting the creators field:
<pre>
title,creators_name,creators_id
Setting compound fields.,"Bloggs, Joe;Doe, John","joe@bloggs.com;john@doe.com"
</pre>

If you import this record and then examine the View Item screen, you will find that the "Creators" field has been set up, with the values displayed in a table.

Contribute: Plugins/ExportPluginsHelloOld

2007-09-28T19:02:56Z

Tom: /* Before You Start */

= Export Plugin Tutorial 1: "Hello, World!" =

In this tutorial you will learn how to create a simple export plugin for EPrints, which will generate a list of titles from the results of a search. A basic knowledge of Perl is needed, but the code will be explained fully.

= Before You Start =

Create a directory for your export plugins in the main export plugin directory (usually /opt/eprints3/perl_lib/EPrints/Plugin/Export). The directory used for these examples is called "MyPlugins".

= HelloExport.pm =

Place the code below in a file called HelloExport.pm in that directory, replacing MyPlugins in the package name with the name of the directory you chose for your export plugins.

<pre>
package EPrints::Plugin::Export::MyPlugins::HelloExport;

use strict;

our @ISA = ('EPrints::Plugin::Export');

sub new
{
    my ($class, %opts) = @_;

    my $self = $class->SUPER::new(%opts);

    # Register the plugin with EPrints
    $self->{name} = 'Hello, World!';
    $self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
    $self->{visible} = 'all';
    $self->{suffix} = '.txt';
    $self->{mimetype} = 'text/plain; charset=utf-8';

    return $self;
}

sub output_dataobj
{
    my ($plugin, $dataobj) = @_;

    # Return the title of a single eprint, followed by a newline
    return $dataobj->get_value('title')."\n";
}

1;

</pre>

= In More Detail =
== Housekeeping ==
<pre>
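package EPrints::Plugin::Export::MyPlugins::HelloExport;
</pre>
The package name reflects where the file lives, so it includes the name of your plugin directory (MyPlugins in these examples).

Export plugins need to inherit from the EPrints::Plugin::Export class.
<pre>
our @ISA = ('EPrints::Plugin::Export');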
</pre>

== Constructor ==
The constructor is passed the class name, followed by a hash of options.
<pre>
my ($class, %opts) = @_;
</pre>

We create a new export plugin by calling the constructor of the parent class, EPrints::Plugin::Export.
<pre>
my $self = $class->SUPER::new(%opts);
</pre>
Now we set a number of fields to register our new plugin.

This is the name that will appear in the export dropdown menu. The name should therefore be short and descriptive.
<pre>
$self->{name} = 'Hello, World!';
</pre>

The accept field is a list containing the types of objects this
plugin can deal with: in this case, individual eprints and lists
of eprints.
<pre>
$self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
</pre>
The visible field denotes the class of user which will be able to see the plugin.
For most export plugins the value 'all' will be required, allowing
all users to see and use the plugin. A value of 'staff' would
make the plugin visible only to repository staff.
<pre>
$self->{visible} = 'all';
</pre>
The suffix field contains the file extension of files exported by the plugin.

The mimetype field defines the [http://en.wikipedia.org/wiki/MIME MIME] type of the files exported by the plugin. You can also specify the character encoding, for example 'text/plain; charset=utf-8' for plain text encoded using UTF-8. [http://en.wikipedia.org/wiki/UTF-8 UTF-8] is a Unicode character encoding capable of expressing characters from a large number of character sets, so it is usually preferable to [http://en.wikipedia.org/wiki/ASCII ASCII].
<pre>
$self->{suffix} = '.txt';
$self->{mimetype} = 'text/plain; charset=utf-8';
</pre>
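To see why declaring the encoding matters, the short standalone Perl sketch below (not part of the plugin) uses the core Encode module to turn a made-up title containing two accented characters into UTF-8 bytes. The number of bytes differs from the number of characters, which is why a client needs to be told which encoding the exported text uses.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A made-up title containing two non-ASCII characters (e-acute and i-diaeresis)
my $title = "Caf\x{e9} na\x{ef}ve";

# Encode the character string as UTF-8 bytes, the encoding declared by charset=utf-8
my $bytes = encode('UTF-8', $title);

printf "%d characters, %d bytes\n", length $title, length $bytes;
```

Each accented character takes two bytes in UTF-8, and neither could be represented in ASCII at all.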

We then return our plugin reference.
<pre>
return $self;
</pre>

== Processing DataObjs ==

This method handles the export of each DataObj. DataObjs make up most of the content of an EPrints repository. The three main types are EPrint, which defines individual eprints; Document, which defines collections of one or more files belonging to an eprint; and User, which defines users of the repository.

Besides an implicit reference to the Plugin object, this method is also provided with a reference to an individual DataObj. It is called by several [http://en.wikipedia.org/wiki/Common_Gateway_Interface CGI] and command line scripts to export single DataObjs, for instance the item control screen for repository staff. It is also called by the list handling method on each DataObj in a list, for example the results of a search. That will be explained in [[Contribute:_Plugins/ExportPluginsList|the next tutorial]].

In the example below we get the title of each DataObj, but there are a large number of fields which you can extract from each DataObj. For example, try changing "title" to "abstract" to print the abstract of each eprint.

<pre>
sub output_dataobj
{
    my ($plugin, $dataobj) = @_;

    # Return a scalar containing the title.
    return $dataobj->get_value('title')."\n";
}
</pre>
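To experiment with output_dataobj outside EPrints, the sketch below replaces the DataObj with a tiny hypothetical MockDataObj class that only implements get_value, and calls the method once per record, roughly as a list handler might. MockDataObj and the loop are assumptions for illustration only; they are not part of the EPrints API.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an EPrints DataObj: stores field values in a hash
package MockDataObj;

sub new
{
    my ($class, %values) = @_;
    return bless { %values }, $class;
}

sub get_value
{
    my ($self, $field) = @_;
    return $self->{$field};
}

package main;

# Same body as the plugin's output_dataobj; $plugin is unused here
sub output_dataobj
{
    my ($plugin, $dataobj) = @_;
    return $dataobj->get_value('title')."\n";
}

my @records = (
    MockDataObj->new(title => 'First test title'),
    MockDataObj->new(title => 'Second test title'),
);

# Concatenate the output for each record, one title per line
my $output = join '', map { output_dataobj(undef, $_) } @records;
print $output;
```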

== Finishing Off ==

A Perl module must return a true value when it is loaded, so the file ends with the standard:
<pre>
1;

</pre>

= Testing Your Plugin =
Restart your web server and perform a search.

If all is well your plugin should appear in the dropdown menu. Select it and click export. As long as the search provided some results, you should get a list of EPrint titles returned.

== Sample Output ==
[[Image:Exphello.png]]

Contribute: Plugins/ImportPluginsCSV

2007-09-28T19:02:00Z

Tom: /* Import Plugin Tutorial 1: CSV */

= Import Plugin Tutorial 1: CSV =

In this tutorial we will look at creating a relatively simple plugin to import eprints into our repository by reading files containing [http://en.wikipedia.org/wiki/Comma-Separated_Values comma-separated values]. We won't be dealing with documents and files, but will be focusing on importing eprint metadata.

Import plugins are inherently more complicated than export plugins because of the error checking that must be done. In this example, however, error checking has been kept to a minimum to keep the code simple. In a "real" plugin you should check that the appropriate metadata fields are set for a given type of eprint; unfortunately there appears to be no quick way to do this.

= Before You Start =

It is sensible to separate the plugins you create for EPrints from those included with it. Create a directory for your import plugins in the main import plugin directory (usually /opt/eprints3/perl_lib/EPrints/Plugin/Import), for example /opt/eprints3/perl_lib/EPrints/Plugin/Import/MyPlugins.

To prepare for this tutorial you should install the [http://search.cpan.org/~erangel/Text-CSV/CSV.pm Text::CSV] module. Running the following command as root, or with sudo, should work.

<pre>
cpan Text::CSV
</pre>

= CSV.pm =
<pre>
package EPrints::Plugin::Import::MyPlugins::CSV;

use EPrints::Plugin::Import::TextFile;
use strict;

our @ISA = ('EPrints::Plugin::Import::TextFile');

sub new
{
    my( $class, %params ) = @_;

    my $self = $class->SUPER::new( %params );

    $self->{name} = 'CSV';
    $self->{visible} = 'all';
    $self->{produce} = [ 'list/eprint' ];

    # Load Text::CSV only if it is installed; otherwise hide the plugin
    my $rc = EPrints::Utils::require_if_exists('Text::CSV');
    unless( $rc )
    {
        $self->{visible} = '';
        $self->{error} = 'Failed to load required module Text::CSV';
    }

    return $self;
}

sub input_fh
{
    my( $plugin, %opts ) = @_;
    my @ids;
    my $fh = $opts{fh};
    my @records = <$fh>;
    my $csv = Text::CSV->new();
    my @fields;

    # The first line holds the metadata field names
    if ($csv->parse(shift @records))
    {
        @fields = $csv->fields();
    }
    else
    {
        $plugin->error($csv->error_input);
        return undef;
    }

    foreach my $row (@records)
    {
        # Pair the field names with the current row for convert_input
        my @input_data = (join(',',@fields),$row);

        my $epdata = $plugin->convert_input(\@input_data);
        next unless defined $epdata;

        my $dataobj = $plugin->epdata_to_dataobj($opts{dataset},$epdata);
        if( defined $dataobj )
        {
            push @ids, $dataobj->get_id;
        }
    }

    return EPrints::List->new(
        dataset => $opts{dataset},
        session => $plugin->{session},
        ids => \@ids );
}

sub convert_input
{
    my $plugin = shift;
    my @input = @{shift @_};
    my $csv = Text::CSV->new();

    my @record;
    if ($csv->parse($input[1]))
    {
        @record = $csv->fields();
    }
    else
    {
        $plugin->error($csv->error_input);
        return undef;
    }

    my @fields = split(',',$input[0]);

    if (scalar @fields != scalar @record)
    {
        $plugin->warning('Row length mismatch');
        return undef;
    }

    my %output = ();

    my $dataset = $plugin->{session}->{repository}->get_dataset('archive');

    my $i = 0;
    foreach my $field (@fields)
    {
        # Skip columns that do not correspond to a metadata field
        unless ($dataset->has_field($field))
        {
            $i++;
            next;
        }

        my $metafield = $dataset->get_field($field);

        if ($metafield->get_property('multiple'))
        {
            # Multiple values are separated by semicolons
            my @values = split(';',$record[$i]);

            if ($metafield->{type} eq 'name')
            {
                my @names = ();

                foreach my $value (@values)
                {
                    # Surname, Forenames[, Lineage]
                    next unless ($value =~ /^(.*?),(.*?)(,(.*?))?$/);
                    push @names, {family => $1,given => $2,lineage => $4};
                }

                $output{$field} = \@names;
            }
            else
            {
                $output{$field} = \@values;
            }
        }
        else
        {
            $output{$field} = $record[$i];
        }
        $i++;
    }
    return \%output;
}

1;

</pre>

= In More Detail =
== Modules ==
Here we import the superclass for our plugin.

<pre>
use EPrints::Plugin::Import::TextFile;
</pre>

== Inheritance ==

Our plugin will not inherit from the Import class directly, but from the TextFile subclass. This contains some extra file handling code that means we can ignore certain differences in text file formats. If you are creating an import plugin which imports non-text files you should subclass the EPrints::Plugin::Import class directly.

<pre>
our @ISA = ('EPrints::Plugin::Import::TextFile');
</pre>

== Constructor ==
For import plugins we must set a 'produce' property, to tell the repository what kinds of objects the plugin can import. This plugin only supports importing lists of eprints, but if it supported importing individual eprints we could add 'dataobj/eprint' to this property. We would then have to implement the "input_dataobj" method. Most plugins implement this method, but it is rarely used in practice. Most imports are done in lists (even if that list only contains one member), via the import items screen.

<pre>
$self->{produce} = [ 'list/eprint' ];
</pre>

Here we use a module that is not included with EPrints, Text::CSV, so we load it in a different way. "EPrints::Utils::require_if_exists" checks whether the module is installed and loads it if it is. If it isn't, we make the plugin invisible and record an error message. It is good practice to load non-standard modules in this way rather than with "use", because a missing module then hides the plugin instead of stopping it from compiling.
<pre>
my $rc = EPrints::Utils::require_if_exists('Text::CSV');
unless( $rc )
{
    $self->{visible} = '';
    $self->{error} = 'Failed to load required module Text::CSV';
}
</pre>
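The idea of loading a module only if it exists can be sketched in plain Perl with eval and require. The helper below, require_if_exists_sketch, is a hypothetical illustration of the technique; it is not the actual implementation of EPrints::Utils::require_if_exists.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the module could be loaded, 0 otherwise
sub require_if_exists_sketch
{
    my ($module) = @_;

    # Convert a name such as Text::CSV into the file name require expects: Text/CSV.pm
    (my $file = "$module.pm") =~ s{::}{/}g;

    # require dies if the module is missing; eval turns that into a false value
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module ('List::Util', 'No::Such::Module')
{
    printf "%s: %s\n", $module, require_if_exists_sketch($module) ? 'loaded' : 'missing';
}
```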

== Input ==
Import plugins have to implement a couple of methods to read data from a file or string, manipulate it and turn it into a form which can be imported into the repository. That process will be described below.

=== input_fh ===

This method takes a filehandle, processes its contents, tries to import DataObjs into the repository and then returns a List of the DataObjs imported.

This array will collect the ids of the imported DataObjs, and will be used to create a List later.
<pre>
my @ids;
</pre>

Here we take the filehandle passed in and read all of its lines into an array.
<pre>
my $fh = $opts{fh};
my @records = <$fh>;
</pre>

We create a Text::CSV object to handle the input. Using a dedicated CSV handling package is preferable to using Perl's split function, as it handles a number of more complicated cases such as commas inside double-quoted fields.
<pre>
my $csv = Text::CSV->new();
</pre>
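The difference matters as soon as a field contains a comma. The standalone sketch below compares Perl's split with a simplified, hypothetical quote-aware parser called quote_aware_split. It only illustrates the idea: it is not how Text::CSV is implemented, and it ignores cases such as newlines inside quoted fields.

```perl
use strict;
use warnings;

# Hypothetical, simplified CSV field splitter that honours double quotes
sub quote_aware_split
{
    my ($line) = @_;
    my @fields;

    # Each field is either "quoted" (with "" as an escaped quote) or unquoted text
    while ($line =~ /\G(?:"((?:[^"]|"")*)"|([^,]*))(,|\z)/g)
    {
        my ($quoted, $plain, $separator) = ($1, $2, $3);
        my $field = defined $quoted ? $quoted : $plain;
        $field =~ s/""/"/g if defined $quoted;
        push @fields, $field;
        last if $separator eq '';    # reached the end of the line
    }
    return @fields;
}

my $line = 'An interesting article,"Damn it Jim, I\'m a Doctor, not a Perl hacker."';

my @naive  = split /,/, $line;
my @fields = quote_aware_split($line);

printf "split: %d fields, quote-aware: %d fields\n", scalar @naive, scalar @fields;
print "$fields[1]\n";
```

Plain split breaks the quoted abstract into three pieces, giving four fields instead of two.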

After setting up an array for the metadata field names, we attempt to parse the first line of the file. The parse method does not return an array of fields, but reports success or failure. On success we use the fields method to retrieve the fields just parsed. On failure we use the error_input method, which returns the input that could not be parsed, report it as an error and return undef.
<pre>
my @fields;
if ($csv->parse(shift @records))
{
    @fields = $csv->fields();
}
else
{
    $plugin->error($csv->error_input);
    return undef;
}
</pre>

Now that the row of column titles has been dealt with we move onto processing each record in the file.

In import plugins the convert_input method converts an individual record into a format that can be imported into the repository: a hash whose keys are metadata field names and whose values are the corresponding values. A row on its own cannot be converted, because we wouldn't know which field each value belongs to, so we first construct an array to pass to convert_input. Its first element is the row of field names and its second element is the row we want to import.
<pre>
foreach my $row (@records)
{
    my @input_data = (join(',',@fields),$row);
</pre>

Here we call convert_input on our constructed input_data. If the conversion fails we simply move to the next record.
<pre>
my $epdata = $plugin->convert_input(\@input_data);
next unless defined $epdata;
</pre>

The epdata_to_dataobj method takes our epdata hash reference and turns it into a new DataObj in our repository. If it is successful it returns the new DataObj, whose id we add to our array of ids.
<pre>
my $dataobj = $plugin->epdata_to_dataobj($opts{dataset},$epdata);
if( defined $dataobj )
{
    push @ids, $dataobj->get_id;
}
</pre>

Finally we return a List object containing the ids of the records we have successfully imported.
<pre>
return EPrints::List->new(
    dataset => $opts{dataset},
    session => $plugin->{session},
    ids => \@ids );
</pre>
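You can try the same "header first, then rows" control flow without a repository by reading from an in-memory filehandle. In this standalone sketch, rows_from_fh is a hypothetical function: it uses plain split instead of Text::CSV (so quoted commas are not supported) and returns plain hashes rather than creating DataObjs.

```perl
use strict;
use warnings;

# Hypothetical sketch of input_fh's control flow, without EPrints or Text::CSV
sub rows_from_fh
{
    my ($fh) = @_;
    my @records = <$fh>;
    chomp @records;

    # First line: metadata field names
    my $header = shift @records;
    return [] unless defined $header;
    my @fields = split /,/, $header;

    my @rows;
    foreach my $row (@records)
    {
        my @values = split /,/, $row, -1;

        # Skip rows whose length does not match the header, as the plugin does
        next unless @values == @fields;

        my %epdata;
        @epdata{@fields} = @values;
        push @rows, \%epdata;
    }
    return \@rows;
}

my $text = "title,abstract\n"
    . "This is a test title,This is a test abstract\n"
    . "This is another test title,This is another test abstract\n";

# Perl can open a filehandle on a string, which is handy for testing
open(my $fh, '<', \$text) or die "Could not open in-memory file: $!";
my $rows = rows_from_fh($fh);
close $fh;

printf "%d rows; first title: %s\n", scalar @$rows, $rows->[0]{title};
```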

=== convert_input ===
This method takes data in a particular format, in this case CSV, and transforms it into a hash of metadata field names and values.

We take the next argument to the method (after the plugin object), which is an array reference, and dereference it into an array.
<pre>
my @input = @{shift @_};
</pre>

Here we set up another Text::CSV object.
<pre>
my $csv = Text::CSV->new();
</pre>

We take the second element of our array and parse it. This is the record we wish to import. If anything goes wrong we return undef.
<pre>
my @record;
if ($csv->parse($input[1]))
{
    @record = $csv->fields();
}
else
{
    $plugin->error($csv->error_input);
    return undef;
}
</pre>

We split the first element to get the field names, and then check that there are the same number of field names as values in the record.
<pre>
my @fields = split(',',$input[0]);

if (scalar @fields != scalar @record)
{
    $plugin->warning('Row length mismatch');
    return undef;
}
</pre>

This is the hash that we'll return later.
<pre>
my %output = ();
</pre>

For convenience we get the DataSet object.
<pre>
my $dataset = $plugin->{session}->{repository}->get_dataset('archive');
</pre>

We now iterate over the fields.
<pre>
my $i = 0;
foreach my $field (@fields)
{
</pre>

If the field does not exist we look at the next one, remembering to increment our index.
<pre>
unless ($dataset->has_field($field))
{
    $i++;
    next;
}
</pre>

We get the MetaField object corresponding to the current field.
<pre>
my $metafield = $dataset->get_field($field);
</pre>

Fields that can hold multiple values are handled by splitting the value on semicolons.
<pre>
if ($metafield->get_property('multiple'))
{
    my @values = split(';',$record[$i]);
</pre>

Name fields are dealt with by using a regular expression and constructing a hash from the parts matched. The plugin expects names to be of the form "Surname, Forenames" or "Surname, Forenames, Lineage", where the lineage is Sr, Jr, III etc.
<pre>
if ($metafield->{type} eq 'name')
{
    my @names = ();

    foreach my $value (@values)
    {
        next unless ($value =~ /^(.*?),(.*?)(,(.*?))?$/);
        push @names, {family => $1,given => $2,lineage => $4};
    }

    $output{$field} = \@names;
}
</pre>
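With input such as "Bloggs, Joe", the regular expression keeps the space after the comma as part of the given name. A hypothetical helper, parse_name, that applies the same regular expression and then trims surrounding whitespace might look like this; the trimming is a suggested refinement, not something the plugin above does.

```perl
use strict;
use warnings;

# Hypothetical helper: same regex as the plugin, plus whitespace trimming
sub parse_name
{
    my ($value) = @_;
    return unless $value =~ /^(.*?),(.*?)(,(.*?))?$/;

    my %name = (family => $1, given => $2, lineage => $4);
    foreach my $part (values %name)
    {
        next unless defined $part;
        $part =~ s/^\s+|\s+$//g;    # values() returns aliases, so this edits %name
    }
    return \%name;
}

my $plain   = parse_name('Bloggs, Joe');
my $lineage = parse_name('Smith, John, Jr');
my $none    = parse_name('Madonna');    # no comma, so no match

print "$plain->{family} / $plain->{given}\n";
print "$lineage->{given} / $lineage->{lineage}\n";
```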

Multiple fields which are not names are just added to the hash as an array reference.
<pre>
$output{$field} = \@values;
</pre>

Non-multiple fields are added to the hash directly, using the value from the record.
<pre>
$output{$field} = $record[$i];
</pre>

Finally we return a hash reference.
<pre>
return \%output;
</pre>
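Because convert_input depends on the repository's DataSet, it can help to check the field-mapping logic on its own first. In the standalone sketch below, a hypothetical %schema hash stands in for the DataSet (recording only whether each field is multiple), and convert_row is a simplified, hypothetical version of convert_input that takes rows which have already been split; name handling is left out for brevity.

```perl
use strict;
use warnings;

# Hypothetical field definitions standing in for the archive DataSet
my %schema = (
    title    => { multiple => 0 },
    abstract => { multiple => 0 },
    subjects => { multiple => 1 },
);

# Simplified convert_input: field names and values are already split into arrays
sub convert_row
{
    my ($fields, $record) = @_;
    return if @$fields != @$record;    # row length mismatch

    my %output;
    for my $i (0 .. $#$fields)
    {
        my $field = $fields->[$i];
        my $def = $schema{$field};
        next unless $def;              # unknown columns are ignored

        # Multiple fields hold semicolon-separated values
        $output{$field} = $def->{multiple}
            ? [ split /;/, $record->[$i] ]
            : $record->[$i];
    }
    return \%output;
}

my $epdata = convert_row(
    [ 'abstract', 'title', 'subjects', 'unknown' ],
    [ 'Testing', 'Testing', 'AI;C;M;F;P', 'ignored' ],
);

print join(', ', sort keys %$epdata), "\n";
print scalar @{ $epdata->{subjects} }, " subjects\n";
```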

= Testing Your Plugin =

After restarting your web server, go to the Import Items screen from the Manage Deposits screen. If you can't find it, make sure you're logged in.

Type this into the "Cut and Paste Records" box:

<pre>
title,abstract
This is a test title,This is a test abstract
This is another test title,This is another test abstract
</pre>

Select "CSV" from the "Select import format" drop-down menu and click "Test Run + Import". You should end up at the Manage Deposits screen with the message "Import completed: 2 item(s) imported." displayed.

== Embedding commas ==
If a field contains commas, which is very likely, you must enclose the whole field in double quotation marks. For example:
<pre>
title,abstract
An interesting article,"Damn it Jim, I'm a Doctor, not a Perl hacker."
</pre>

When doing this, make sure not to leave any whitespace between a quotation mark and the adjacent comma, or the import will fail.

== Multiple fields ==
Fields that can hold multiple values are handled by separating the individual values with semicolons. A simple example of this is the subjects field.

Go back to the Import Items screen and proceed as before, but type this into the "Cut and Paste Records" box:

<pre>
abstract,title,subjects
Testing,Testing,AI;C;M;F;P
Testing,Testing,AC;DC
</pre>

After the records have been imported, examine each one on the View Item screen. You will find that a list of subjects is given, and that each subject is shown with a more descriptive name than the code we imported.

== Compound fields ==

Compound fields are fields that have subfields within them, each with its own name. You don't set a compound field in one go; instead you set each of its components individually. Subfields have names of the form mainfieldname_subfieldname.

One of the most commonly used compound fields is the "Creators" field. It has a name subfield, "creators_name", and an ID subfield, "creators_id", which is most often used for email addresses.

Here is an example of setting the creators field:
<pre>
title,creators_name,creators_id
Setting compound fields.,"Bloggs, Joe;Doe, John","joe@bloggs.com;john@doe.com"
</pre>

If you import this record and then examine the View Item screen, you will find that the "Creators" field has been set up, with the values displayed in a table.
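The two semicolon-separated columns line up by position: the first name goes with the first ID, the second name with the second ID, and so on. The sketch below shows that pairing in plain Perl; pair_creators is a hypothetical helper for illustration and is not part of EPrints.

```perl
use strict;
use warnings;

# Hypothetical helper: combine parallel name and ID cells into one record per creator
sub pair_creators
{
    my ($names_cell, $ids_cell) = @_;
    my @names = split /;/, $names_cell;
    my @ids   = split /;/, $ids_cell;

    my @creators;
    for my $i (0 .. $#names)
    {
        # If an ID is missing for this position it is left undefined
        push @creators, { name => $names[$i], id => $ids[$i] };
    }
    return \@creators;
}

my $creators = pair_creators('Bloggs, Joe;Doe, John', 'joe@bloggs.com;john@doe.com');
print "$_->{name} <$_->{id}>\n" for @$creators;
```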

Contribute: Plugins/ExportPluginsZip

2007-09-28T19:01:13Z

Tom: /* Handling DataObjs */

= Export Plugin Tutorial 5: Zip =
In this tutorial we'll look at packaging the results of a search into a Zip file. We'll create a directory for each eprint, and a sub-directory for each document belonging to that eprint. We'll also add an HTML index file to the archive to make it easier to navigate.

To prepare for this tutorial you should install the [http://search.cpan.org/~miyagawa/Archive-Any-Create-0.02/lib/Archive/Any/Create.pm Archive::Any::Create] module. Running the following command as root, or with sudo, should work.

<pre>
cpan Archive::Any::Create
</pre>

= Zip.pm =
The code in the section below should be placed in a file called Zip.pm in the directory created previously, and MyPlugins should be changed to the name of that directory.

<pre>
package EPrints::Plugin::Export::MyPlugins::Zip;

use strict;

our @ISA = ('EPrints::Plugin::Export');

# Archive::Any::Create is loaded in the constructor via require_if_exists,
# so the plugin still compiles when the module is missing.

sub new
{
    my ($class, %opts) = @_;

    my $self = $class->SUPER::new(%opts);

    $self->{name} = 'Zip';
    $self->{accept} = [ 'list/eprint' ];
    $self->{visible} = 'all';
    $self->{suffix} = '.zip';
    $self->{mimetype} = 'application/zip';

    my $rc = EPrints::Utils::require_if_exists('Archive::Any::Create');
    unless ($rc)
    {
        $self->{visible} = '';
        $self->{error} = 'Unable to load required module Archive::Any::Create';
    }

    return $self;
}

sub output_list
{
    my ($plugin, %opts) = @_;

    # In-memory file that receives the zip if no filehandle is supplied
    my $archive = '';
    open (my $FH, '>', \$archive) or
        die("Could not create filehandle: $!");
    my $zip = Archive::Any::Create->new;

    my $index = <<END;
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>EPrints Search Results</title>
</head>
<body>
END

    my $session = $plugin->{session};

    foreach my $dataobj ($opts{list}->get_records)
    {
        # Index entry: the eprint title followed by a list of its documents
        my $div = $session->make_element('div');
        my $heading = $session->make_element('h2');
        $heading->appendChild($session->make_text($dataobj->get_value('title')));
        $div->appendChild($heading);

        my $uldoc = $session->make_element('ul');
        $div->appendChild($uldoc);

        my $dirpath = 'eprints-search/'.$dataobj->get_id().'/';

        my $i = 1;
        foreach my $doc ($dataobj->get_all_documents)
        {
            my $subdirpath = $dirpath."doc$i/";
            my %files = $doc->files;

            my $lidoc = $session->make_element('li');
            $uldoc->appendChild($lidoc);

            my $adoc = $session->make_element('a', href=>$dataobj->get_id."/doc$i/".$doc->get_main);
            $lidoc->appendChild($adoc);

            if ($doc->exists_and_set('formatdesc'))
            {
                $adoc->appendChild($session->make_text($doc->get_value('formatdesc')));
            }
            else
            {
                $adoc->appendChild($session->make_text($doc->get_main));
            }

            foreach my $filename (sort keys %files)
            {
                my $filepath = $subdirpath.$filename;
                my $file = $doc->local_path.'/'.$filename;

                # Skip sub-directories; only regular files are copied
                if (-d $file)
                {
                    next;
                }

                my $data = '';
                open (my $datafh, '>', \$data) or
                    die("Could not create filehandle: $!");

                # Read in binary mode so PDFs and images are copied unchanged
                open (my $infh, '<', $file) or die ("Could not open file $file: $!");
                binmode $infh;
                while (<$infh>)
                {
                    print {$datafh} $_;
                }
                close $infh;
                close $datafh;

                $zip->add_file($filepath, $data);
            }
            $i++;
        }
        $index .= EPrints::XML::to_string($div);
    }

    $index .= '</body></html>';
    $zip->add_file('eprints-search/index.htm',$index);

    # Write to the supplied filehandle, or return the zip as a string
    if (defined $opts{fh})
    {
        $zip->write_filehandle($opts{fh},'zip');
        return undef;
    }
    $zip->write_filehandle($FH,'zip');
    close $FH;
    return $archive;
}

1;

</pre>

= In More Detail =
== Constructor ==
For the sake of simplicity this plugin will only deal with lists of eprints. This avoids some code duplication, and it would be fairly easy to modify the plugin to deal with both individual eprints and lists of eprints sensibly.
<pre>
$self->{accept} = [ 'list/eprint' ];
</pre>

The file extension and [http://en.wikipedia.org/wiki/MIME MIME] type are set to values appropriate for Zip files.
<pre>
$self->{suffix} = '.zip';
$self->{mimetype} = 'application/zip';
</pre>

We need a module that is not included with EPrints to create zip files. We use the EPrints::Utils::require_if_exists function to check whether the module exists, and load it if it does. We then check the value returned from that function, and make the plugin invisible if loading failed.

<pre>
my $rc = EPrints::Utils::require_if_exists('Archive::Any::Create');
unless ($rc)
{
    $self->{visible} = '';
    $self->{error} = 'Unable to load required module Archive::Any::Create';
}
</pre>

== List Handling ==
=== Setting Up ===
Here we set up an in-memory file for the Zip, and create an Archive object.
<pre>
my $archive = '';
open (my $FH, '>', \$archive) or
die("Could not create filehandle: $!");
my $zip = Archive::Any::Create->new;
</pre>
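If in-memory files are new to you, the standalone sketch below shows the mechanism on its own: opening a filehandle on a reference to a scalar means everything printed to the handle ends up in that scalar. The four bytes written here are just the signature that zip files begin with, used as sample data.

```perl
use strict;
use warnings;

my $archive = '';
open(my $FH, '>', \$archive) or die("Could not create filehandle: $!");
binmode $FH;

# Anything printed to $FH is stored in $archive
print {$FH} "PK\x03\x04";
close $FH;

printf "%d bytes in memory\n", length $archive;
```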

=== Navigation ===
Here we begin to set up the HTML file that we'll add to our archive for navigation. First we set up a header.
<pre>
my $index = <<END;
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>EPrints Search Results</title>
</head>
<body>
END
</pre>

Now we get the Session object; we'll be using it to manipulate [http://en.wikipedia.org/wiki/Document_Object_Model DOM] objects later.
<pre>
my $session = $plugin->{session};
</pre>

=== Handling DataObjs ===
We loop over the DataObjs as we have done before.

This time we set up some [http://en.wikipedia.org/wiki/Document_Object_Model DOM] objects to be added to our index. Each eprint will have its title printed out, followed by a list of its documents.
<pre>
my $div = $session->make_element('div');
my $heading = $session->make_element('h2');
$heading->appendChild($session->make_text($dataobj->get_value('title')));
$div->appendChild($heading);

my $uldoc = $session->make_element('ul');
$div->appendChild($uldoc);
</pre>

We create a directory for each eprint. Note that it is not necessary to create a directory explicitly; we simply have to set the appropriate file path. However, this means that a directory with no files added to it will not be created at all, so an eprint with no documents will not get an empty directory.

<pre>
my $dirpath = 'eprints-search/'.$dataobj->get_id().'/';
</pre>
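To see that paths alone are enough, here is a standalone sketch using IO::Compress::Zip, which ships with modern versions of Perl. Note that this is a different library from the Archive::Any::Create module the plugin uses, and the eprint id 42 and the file names are made up. No directory entries are added, yet the member names carry the directory structure, which unzip tools typically recreate on extraction.

```perl
use strict;
use warnings;
use IO::Compress::Zip qw($ZipError);

# Build a zip in memory; member names carry the directory structure
my $zipdata = '';
my $zip = IO::Compress::Zip->new(\$zipdata, Name => 'eprints-search/42/doc1/paper.txt')
    or die "Could not create zip: $ZipError";
$zip->print("Full text of a made-up paper\n");

# Start a second member, like the index file the plugin adds
$zip->newStream(Name => 'eprints-search/index.htm');
$zip->print("<html><body>Index</body></html>\n");
$zip->close;

printf "%d bytes, starts with %s\n", length $zipdata, substr($zipdata, 0, 2);
```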

==== Dealing With Documents ====
We then loop over all the documents belonging to each DataObj. The get_all_documents method returns an array of Document objects.
<pre>
my $i = 1;
foreach my $doc ($dataobj->get_all_documents)
{
    my $subdirpath = $dirpath."doc$i/";
</pre>

Here we create a list item for the document containing a link to the main file.
<pre>
my $lidoc = $session->make_element('li');
$uldoc->appendChild($lidoc);

my $adoc = $session->make_element('a', href=>$dataobj->get_id."/doc$i/".$doc->get_main);
$lidoc->appendChild($adoc);
</pre>
If a description of the main file has been set we use that as the link text, otherwise we use the filename.
<pre>
if ($doc->exists_and_set('formatdesc'))
{
    $adoc->appendChild($session->make_text($doc->get_value('formatdesc')));
}
else
{
    $adoc->appendChild($session->make_text($doc->get_main));
}
</pre>

==== Dealing With Files ====
The files method of the Document object returns a hash whose keys are file names and values are file sizes.
<pre>
my %files = $doc->files;
</pre>

We loop over each file belonging to the document; in most cases there will only be one file.
<pre>
foreach my $filename (sort keys %files)
{
    my $filepath = $subdirpath.$filename;
    my $file = $doc->local_path.'/'.$filename;
</pre>

We need to read the contents of the file and add it to a file in the zip. First we'll create another in-memory file to hold the contents.
<pre>
my $data = '';
open (my $datafh, '>', \$data) or die("Could not create filehandle: $!");
</pre>

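We open our file in binary mode, so that PDFs and other binary files are copied unchanged, and print its contents straight out to our in-memory file.
<pre>
open (my $infh, '<', $file) or die ("Could not open file $file: $!");
binmode $infh;
while (<$infh>)
{
    print {$datafh} $_;
}
close $infh;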
</pre>
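Copying line by line works, but a whole file can also be read in one go. The standalone sketch below shows a hypothetical read_file_bytes helper that opens the file in raw (binary) mode and uses slurp mode. The temporary file and its contents are made up for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a whole file as raw bytes
sub read_file_bytes
{
    my ($path) = @_;
    open(my $in, '<:raw', $path) or die "Could not open file $path: $!";
    local $/;                  # undefined record separator: read the whole file
    my $data = <$in>;
    close $in;
    return defined $data ? $data : '';
}

# Create a small temporary file containing some binary bytes
my ($tmpfh, $tmpname) = tempfile(UNLINK => 1);
binmode $tmpfh;
print {$tmpfh} "%PDF-1.4\r\n\x00\xff";
close $tmpfh;

my $data = read_file_bytes($tmpname);
printf "%d bytes read\n", length $data;
```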

Then we add the file data to the zip archive, under the path we built earlier.
<pre>
$zip->add_file($filepath, $data);
</pre>

Finally we add the [http://en.wikipedia.org/wiki/Document_Object_Model DOM] object for our eprint to the index.
<pre>
$index .= EPrints::XML::to_string($div);
</pre>

=== Finishing Off ===
After finishing off our index file we add it to the zip file.
<pre>
$index .= '</body></html>';
$zip->add_file('eprints-search/index.htm',$index);
</pre>

If a file handle has been provided we write to it, otherwise we write to the scalar file handle created earlier. We then return in the usual fashion.
<pre>
if (defined $opts{fh})
{
    $zip->write_filehandle($opts{fh},'zip');
    return undef;
}
$zip->write_filehandle($FH,'zip');
return $archive;
</pre>
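The convention of writing to a supplied filehandle and returning undef, or otherwise returning the data as a string, can be tried outside EPrints. In this sketch, deliver is a hypothetical helper that follows the same pattern with plain text instead of a zip archive.

```perl
use strict;
use warnings;

# Hypothetical helper following output_list's return convention
sub deliver
{
    my ($content, $fh) = @_;
    if (defined $fh)
    {
        print {$fh} $content;
        return undef;          # the caller reads the output from its filehandle
    }
    return $content;           # no filehandle: hand the data back directly
}

my $as_string = deliver('zip bytes');

my $buffer = '';
open(my $out, '>', \$buffer) or die "Could not create filehandle: $!";
my $returned = deliver('zip bytes', $out);
close $out;

printf "string: %s, buffer: %s, returned: %s\n",
    $as_string, $buffer, defined $returned ? $returned : 'undef';
```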

= Testing Your Plugin =
Restart your webserver and test the plugin as [[Contribute:_Plugins/ExportPluginsHello| before]].

== Sample Output ==
[[Image:Expzipv2.png]]

The accompanying HTML index.

[[Image:Expzip2.png]]

Contribute: Plugins/ExportPluginsZip

2007-09-28T19:00:43Z

Tom: /* Navigation */

= Export Plugin Tutorial 5: Zip =
In this tutorial we'll look at packaging the results of a search into a Zip file. We'll create a directory for each eprint, and a sub-directory for each document belonging to that eprint. We'll also add an HTML index file to the archive to make it easier to navigate.

To prepare for this tutorial you should install the [http://search.cpan.org/~miyagawa/Archive-Any-Create-0.02/lib/Archive/Any/Create.pm Archive::Any::Create] module. Running the following command as root, or with sudo, should work.

<pre>
cpan Archive::Any::Create
</pre>

= Zip.pm =
The code in the section below should be placed in a file called Zip.pm in the directory created previously, and MyPlugins should be changed to the name of that directory.

<pre>
package EPrints::Plugin::Export::MyPlugins::Zip;

use strict;

our @ISA = ('EPrints::Plugin::Export');

# Archive::Any::Create is loaded in the constructor via require_if_exists,
# so the plugin still compiles when the module is missing.

sub new
{
    my ($class, %opts) = @_;

    my $self = $class->SUPER::new(%opts);

    $self->{name} = 'Zip';
    $self->{accept} = [ 'list/eprint' ];
    $self->{visible} = 'all';
    $self->{suffix} = '.zip';
    $self->{mimetype} = 'application/zip';

    my $rc = EPrints::Utils::require_if_exists('Archive::Any::Create');
    unless ($rc)
    {
        $self->{visible} = '';
        $self->{error} = 'Unable to load required module Archive::Any::Create';
    }

    return $self;
}

sub output_list
{
    my ($plugin, %opts) = @_;

    # In-memory file that receives the zip if no filehandle is supplied
    my $archive = '';
    open (my $FH, '>', \$archive) or
        die("Could not create filehandle: $!");
    my $zip = Archive::Any::Create->new;

    my $index = <<END;
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>EPrints Search Results</title>
</head>
<body>
END

    my $session = $plugin->{session};

    foreach my $dataobj ($opts{list}->get_records)
    {
        # Index entry: the eprint title followed by a list of its documents
        my $div = $session->make_element('div');
        my $heading = $session->make_element('h2');
        $heading->appendChild($session->make_text($dataobj->get_value('title')));
        $div->appendChild($heading);

        my $uldoc = $session->make_element('ul');
        $div->appendChild($uldoc);

        my $dirpath = 'eprints-search/'.$dataobj->get_id().'/';

        my $i = 1;
        foreach my $doc ($dataobj->get_all_documents)
        {
            my $subdirpath = $dirpath."doc$i/";
            my %files = $doc->files;

            my $lidoc = $session->make_element('li');
            $uldoc->appendChild($lidoc);

            my $adoc = $session->make_element('a', href=>$dataobj->get_id."/doc$i/".$doc->get_main);
            $lidoc->appendChild($adoc);

            if ($doc->exists_and_set('formatdesc'))
            {
                $adoc->appendChild($session->make_text($doc->get_value('formatdesc')));
            }
            else
            {
                $adoc->appendChild($session->make_text($doc->get_main));
            }

            foreach my $filename (sort keys %files)
            {
                my $filepath = $subdirpath.$filename;
                my $file = $doc->local_path.'/'.$filename;

                # Skip sub-directories; only regular files are copied
                if (-d $file)
                {
                    next;
                }

                my $data = '';
                open (my $datafh, '>', \$data) or
                    die("Could not create filehandle: $!");

                # Read in binary mode so PDFs and images are copied unchanged
                open (my $infh, '<', $file) or die ("Could not open file $file: $!");
                binmode $infh;
                while (<$infh>)
                {
                    print {$datafh} $_;
                }
                close $infh;
                close $datafh;

                $zip->add_file($filepath, $data);
            }
            $i++;
        }
        $index .= EPrints::XML::to_string($div);
    }

    $index .= '</body></html>';
    $zip->add_file('eprints-search/index.htm',$index);

    # Write to the supplied filehandle, or return the zip as a string
    if (defined $opts{fh})
    {
        $zip->write_filehandle($opts{fh},'zip');
        return undef;
    }
    $zip->write_filehandle($FH,'zip');
    close $FH;
    return $archive;
}

1;

</pre>

= In More Detail =
== Constructor ==
For the sake of simplicity this plugin will only deal with lists of eprints. This avoids some code duplication, and it would be fairly easy to modify the plugin to deal with both individual eprints and lists of eprints sensibly.
<pre>
$self->{accept} = [ 'list/eprint' ];
</pre>

The file extension and [http://en.wikipedia.org/wiki/MIME MIME] type are set to values appropriate for Zip files.
<pre>
$self->{suffix} = '.zip';
$self->{mimetype} = 'application/zip';
</pre>

We need a module that is not included with EPrints to create zip files. We use the EPrints::Utils::require_if_exists function to check whether the module exists, and load it if it does. We then check the value returned from that function, and make the plugin invisible if loading failed.

<pre>
my $rc = EPrints::Utils::require_if_exists('Archive::Any::Create');
unless ($rc)
{
    $self->{visible} = '';
    $self->{error} = 'Unable to load required module Archive::Any::Create';
}
</pre>

== List Handling ==
=== Setting Up ===
Here we set up an in-memory file for the Zip, and create an Archive object.
<pre>
my $archive = '';
open (my $FH, '>', \$archive) or
die("Could not create filehandle: $!");
my $zip = Archive::Any::Create->new;
</pre>

=== Navigation ===
Here we begin to set up the HTML file that we'll add to our archive for navigation. First we set up a header.
<pre>
my $index = <<END;
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>EPrints Search Results</title>
</head>
<body>
END
</pre>

Now we get the Session object; we'll be using it to manipulate [http://en.wikipedia.org/wiki/Document_Object_Model DOM] objects later.
<pre>
my $session = $plugin->{session};
</pre>

=== Handling DataObjs ===
We loop over the DataObjs as we have done before.

This time we also set up some DOM objects to add to our index. Each eprint's title is shown as a heading, followed by a list of its documents.
<pre>
my $div = $session->make_element('div');
my $heading = $session->make_element('h2');
$heading->appendChild($session->make_text($dataobj->get_value('title')));
$div->appendChild($heading);

my $uldoc = $session->make_element('ul');
$div->appendChild($uldoc);
</pre>

We create a directory for each eprint. Directories do not need to be created explicitly; it is enough to include them in the path of each file added to the archive. A consequence is that an eprint without any files gets no directory at all, rather than an empty one.

<pre>
my $dirpath = 'eprints-search/'.$dataobj->get_id().'/';
</pre>

==== Dealing With Documents ====
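We then loop over all the documents belonging to the DataObj. The get_all_documents method returns a list of Document objects. The counter $i numbers the documents, so that each one gets its own sub-directory (doc1/, doc2/ and so on).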
<pre>
my $i = 1;
foreach my $doc ($dataobj->get_all_documents)
{
my $subdirpath = $dirpath."doc$i/";
</pre>

Here we create a list item for the document containing a link to the main file.
<pre>
my $lidoc = $session->make_element('li');
$uldoc->appendChild($lidoc);

my $adoc = $session->make_element('a', href=>$dataobj->get_id."/doc$i/".$doc->get_main);
$lidoc->appendChild($adoc);
</pre>
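
The same pieces (eprint ID, document number and filename) are used twice: once for each file's path inside the archive and once for the link in the index. Because the index is stored as eprints-search/index.htm, the link is just the archive path relative to the eprints-search/ directory. The sketch below spells this out in plain Perl; archive_path and index_href are hypothetical helpers, and in the plugin the values would come from the DataObj and Document objects.

<pre>
use strict;
use warnings;

# Hypothetical helper: where a file is stored inside the Zip archive.
sub archive_path
{
    my ($eprintid, $docnum, $filename) = @_;
    return "eprints-search/$eprintid/doc$docnum/$filename";
}

# Hypothetical helper: the link to that file from eprints-search/index.htm,
# i.e. the same path without the leading 'eprints-search/'.
sub index_href
{
    my ($eprintid, $docnum, $filename) = @_;
    return "$eprintid/doc$docnum/$filename";
}

print archive_path(42, 1, 'paper.pdf'), "\n";   # eprints-search/42/doc1/paper.pdf
print index_href(42, 1, 'paper.pdf'), "\n";     # 42/doc1/paper.pdf
</pre>

Filenames containing spaces or other special characters would also need URI-escaping in the href, which neither the tutorial nor this sketch does.
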
If a description of the main file has been set we use that as the link text, otherwise we use the filename.
<pre>
if ($doc->exists_and_set('formatdesc'))
{
$adoc->appendChild($session->make_text($doc->get_value('formatdesc')));
}
else
{
$adoc->appendChild($session->make_text($doc->get_main));
}
</pre>

==== Dealing With Files ====
The files method of the Document object returns a hash whose keys are file names and values are file sizes.
<pre>
my %files = $doc->files;
</pre>

We then loop over each file belonging to the document; in most cases there will only be one.
<pre>
foreach my $filename (sort keys %files)
{
my $filepath = $subdirpath.$filename;
my $file = $doc->local_path.'/'.$filename;
</pre>

We need to read the contents of each file and add them to the Zip archive. First we create another in-memory file to hold the contents.
<pre>
my $data = '';
open(my $datafh, '>', \$data) or die("Could not create filehandle: $!");
</pre>

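We open the file and copy its contents straight into the in-memory file. The file is read in binary mode, since documents are often binary files such as PDFs.
<pre>
open(my $infh, '<', $file) or die("Could not open file $file: $!");
binmode($infh);
while (<$infh>)
{
print {$datafh} $_;
}
close($infh);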
</pre>
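
The loop above copies the file line by line. A more compact alternative, sketched here in plain Perl, reads the whole file at once in slurp mode (with $/ undefined) through a lexical handle in binary mode. The helper name read_file_bytes is hypothetical, and the temporary file is only there so the example runs on its own.

<pre>
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a whole file into a scalar as raw bytes.
sub read_file_bytes
{
    my ($path) = @_;
    open(my $in, '<', $path) or die("Could not open file $path: $!");
    binmode($in);        # no line-ending or encoding translation
    local $/;            # undefined record separator: read everything at once
    my $data = <$in>;
    close($in);
    return defined $data ? $data : '';
}

# Demonstration with a temporary file containing some binary bytes.
my ($tmpfh, $tmpname) = tempfile(UNLINK => 1);
binmode($tmpfh);
print {$tmpfh} "%PDF-1.4\n\x00\xff\r\n";
close($tmpfh);

my $bytes = read_file_bytes($tmpname);
printf("read %d bytes\n", length($bytes));   # read 13 bytes
</pre>

Its return value could be passed straight to $zip->add_file($filepath, ...), which would make the second in-memory file unnecessary.
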

Then we add the data to the Zip archive at the path built earlier.
<pre>
$zip->add_file($filepath, $data);
</pre>

Finally, after all of its documents have been processed, we convert the eprint's DOM object to a string and append it to the index.
<pre>
$index .= EPrints::XML::to_string($div);
</pre>

=== Finishing Off ===
Once every eprint has been processed, we close the body and html elements of the index and add it to the archive as eprints-search/index.htm.
<pre>
$index .= '</body></html>';
$zip->add_file('eprints-search/index.htm',$index);
</pre>

If a file handle has been provided, we write the archive to it and return undef. Otherwise we write to the in-memory file handle created earlier and return the archive data as a string.
<pre>
if (defined $opts{fh})
{
$zip->write_filehandle($opts{fh},'zip');
return undef;
}
$zip->write_filehandle($FH,'zip');
return $archive;
</pre>
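
This convention (write to the supplied handle and return undef, otherwise return the data) recurs in every export plugin in this series. Reduced to plain Perl, with a hypothetical deliver subroutine standing in for the end of an output_list method, it looks roughly like this:

<pre>
use strict;
use warnings;

# Hypothetical stand-in for the end of an output_list method.
# If the caller supplied a filehandle, stream the data to it and
# return undef; otherwise hand the data back as a string.
sub deliver
{
    my ($data, %opts) = @_;
    if (defined $opts{fh})
    {
        print {$opts{fh}} $data;
        return undef;
    }
    return $data;
}

# A caller that wants a string back:
my $string = deliver('zip bytes');

# A caller that supplies its own handle (here an in-memory one):
my $captured = '';
open(my $fh, '>', \$captured) or die("Could not create filehandle: $!");
my $result = deliver('zip bytes', fh => $fh);
close($fh);
</pre>
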

= Testing Your Plugin =
Restart your webserver and test the plugin as [[Contribute:_Plugins/ExportPluginsHello| before]].

== Sample Output ==
[[Image:Expzipv2.png]]

The accompanying HTML index.

[[Image:Expzip2.png]]

Contribute: Plugins/ExportPluginsZip

2007-09-28T19:00:25Z

Tom: /* Constructor */

= Export Plugin Tutorial 5: Zip =
In this tutorial we'll look at packaging the results of a search into a Zip file. We'll create a directory for each eprint, and a sub-directory for each document belonging to that eprint. We'll also add an HTML index file to the archive to make it easier to navigate.

To prepare for this tutorial you should install the [http://search.cpan.org/~miyagawa/Archive-Any-Create-0.02/lib/Archive/Any/Create.pm Archive::Any::Create] module. Running the following command as root (or via sudo) should work.

<pre>
cpan Archive::Any::Create
</pre>

= Zip.pm =
The code in the section below should be placed in a file called Zip.pm in the directory created previously, and MyPlugins should be changed to the name of that directory.

<pre>
package EPrints::Plugin::Export::MyPlugins::Zip;

@ISA = ('EPrints::Plugin::Export');

use strict;
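# Archive::Any::Create is loaded at runtime by require_if_exists in new(),
# so the plugin can be hidden rather than failing to compile if it is missing.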

sub new
{
my ($class, %opts) = @_;

my $self = $class->SUPER::new(%opts);

$self->{name} = 'Zip';
$self->{accept} = [ 'list/eprint' ];
$self->{visible} = 'all';
$self->{suffix} = '.zip';
$self->{mimetype} = 'application/zip';

my $rc = EPrints::Utils::require_if_exists('Archive::Any::Create');
unless ($rc)
{
$self->{visible} = '';
$self->{error} = 'Unable to load required module Archive::Any::Create';
}

return $self;
}

sub output_list
{
my ($plugin, %opts) = @_;

my $archive = '';
open (my $FH, '>', \$archive) or
die("Could not create filehandle: $!");
my $zip = Archive::Any::Create->new;

my $index = <<END;
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>EPrints Search Results</title>
</head>
<body>
END

my $session = $plugin->{session};

foreach my $dataobj ($opts{list}->get_records)
{
my $div = $session->make_element('div');
my $heading = $session->make_element('h2');
$heading->appendChild($session->make_text($dataobj->get_value('title')));
$div->appendChild($heading);

my $uldoc = $session->make_element('ul');
$div->appendChild($uldoc);

my $dirpath = 'eprints-search/'.$dataobj->get_id().'/';

my $i = 1;
foreach my $doc ($dataobj->get_all_documents)
{
my $subdirpath = $dirpath."doc$i/";
my %files = $doc->files;

my $lidoc = $session->make_element('li');
$uldoc->appendChild($lidoc);

my $adoc = $session->make_element('a', href=>$dataobj->get_id."/doc$i/".$doc->get_main);
$lidoc->appendChild($adoc);

if ($doc->exists_and_set('formatdesc'))
{
$adoc->appendChild($session->make_text($doc->get_value('formatdesc')));
}
else
{
$adoc->appendChild($session->make_text($doc->get_main));
}

foreach my $filename (sort keys %files)
{
my $filepath = $subdirpath.$filename;
my $file = $doc->local_path.'/'.$filename;

if (-d $file)
{
next;
}

my $data = '';
open(my $datafh, '>', \$data) or die("Could not create filehandle: $!");

open(my $infh, '<', $file) or die("Could not open file $file: $!");
binmode($infh);
while (<$infh>)
{
print {$datafh} $_;
}
close($infh);

$zip->add_file($filepath, $data);
}
$i++;
}
$index .= EPrints::XML::to_string($div);
}

$index .= '</body></html>';
$zip->add_file('eprints-search/index.htm',$index);

if (defined $opts{fh})
{
$zip->write_filehandle($opts{fh},'zip');
return undef;
}
$zip->write_filehandle($FH,'zip');
return $archive;
}

1;

</pre>

= In More Detail =
== Constructor ==
For simplicity, this plugin only deals with lists of eprints. This avoids some code duplication, and it would be fairly easy to modify the plugin to handle both individual eprints and lists of eprints sensibly.
<pre>
$self->{accept} = [ 'list/eprint' ];
</pre>

The file extension and [http://en.wikipedia.org/wiki/MIME MIME] type are set to values appropriate for Zip files.
<pre>
$self->{suffix} = '.zip';
$self->{mimetype} = 'application/zip';
</pre>

Creating Zip files requires a module that is not included with EPrints. We use the EPrints::Utils::require_if_exists function to check whether the module is installed and, if it is, to load it. If the function returns false, we make the plugin invisible and record an error message.

<pre>
my $rc = EPrints::Utils::require_if_exists('Archive::Any::Create');
unless ($rc)
{
$self->{visible} = '';
$self->{error} = 'Unable to load required module Archive::Any::Create';
}
</pre>

== List Handling ==
=== Setting Up ===
Here we set up an in-memory file to hold the Zip data, and create an Archive::Any::Create object.
<pre>
my $archive = '';
open (my $FH, '>', \$archive) or
die("Could not create filehandle: $!");
my $zip = Archive::Any::Create->new;
</pre>

=== Navigation ===
Next we begin to build the HTML file that we'll add to the archive to aid navigation, starting with its header.
<pre>
my $index = <<END;
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>EPrints Search Results</title>
</head>
<body>
END
</pre>

Now we get the Session object, which we'll use later to create and manipulate DOM objects.
<pre>
my $session = $plugin->{session};
</pre>

=== Handling DataObjs ===
We loop over the DataObjs as we have done before.

This time we also set up some DOM objects to add to our index. Each eprint's title is shown as a heading, followed by a list of its documents.
<pre>
my $div = $session->make_element('div');
my $heading = $session->make_element('h2');
$heading->appendChild($session->make_text($dataobj->get_value('title')));
$div->appendChild($heading);

my $uldoc = $session->make_element('ul');
$div->appendChild($uldoc);
</pre>

We create a directory for each eprint. Directories do not need to be created explicitly; it is enough to include them in the path of each file added to the archive. A consequence is that an eprint without any files gets no directory at all, rather than an empty one.

<pre>
my $dirpath = 'eprints-search/'.$dataobj->get_id().'/';
</pre>

==== Dealing With Documents ====
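We then loop over all the documents belonging to the DataObj. The get_all_documents method returns a list of Document objects. The counter $i numbers the documents, so that each one gets its own sub-directory (doc1/, doc2/ and so on); it is incremented at the end of each pass through the loop.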
<pre>
my $i = 1;
foreach my $doc ($dataobj->get_all_documents)
{
my $subdirpath = $dirpath."doc$i/";
</pre>

Here we create a list item for the document containing a link to the main file.
<pre>
my $lidoc = $session->make_element('li');
$uldoc->appendChild($lidoc);

my $adoc = $session->make_element('a', href=>$dataobj->get_id."/doc$i/".$doc->get_main);
$lidoc->appendChild($adoc);
</pre>
If a description of the main file has been set we use that as the link text, otherwise we use the filename.
<pre>
if ($doc->exists_and_set('formatdesc'))
{
$adoc->appendChild($session->make_text($doc->get_value('formatdesc')));
}
else
{
$adoc->appendChild($session->make_text($doc->get_main));
}
</pre>

==== Dealing With Files ====
The files method of the Document object returns a hash whose keys are file names and values are file sizes.
<pre>
my %files = $doc->files;
</pre>

We then loop over each file belonging to the document; in most cases there will only be one.
<pre>
foreach my $filename (sort keys %files)
{
my $filepath = $subdirpath.$filename;
my $file = $doc->local_path.'/'.$filename;
</pre>
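
Unlike the snippet above, the complete listing for this plugin also skips any entry for which -d is true, since a directory has no contents that could be copied into the archive. The sketch below shows the same filtering in plain Perl against a temporary directory. It reads the directory with opendir because Document objects are not available outside EPrints, so plain_files is a hypothetical stand-in for looping over the keys returned by files, and the thumbnails sub-directory is just example data.

<pre>
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: sorted names of the plain (non-directory) entries
# in $dir, mirroring the -d test in the full plugin listing.
sub plain_files
{
    my ($dir) = @_;
    opendir(my $dh, $dir) or die("Could not open directory $dir: $!");
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);

    my @plain = grep { my $path = File::Spec->catfile($dir, $_); !-d $path } @names;
    return sort { $a cmp $b } @plain;
}

# Example data: two files and one sub-directory.
my $dir = tempdir(CLEANUP => 1);
for my $name ('b.pdf', 'a.html')
{
    my $path = File::Spec->catfile($dir, $name);
    open(my $out, '>', $path) or die("Could not create $path: $!");
    close($out);
}
mkdir(File::Spec->catdir($dir, 'thumbnails')) or die("Could not create directory: $!");

print join(', ', plain_files($dir)), "\n";   # a.html, b.pdf
</pre>
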

We need to read the contents of each file and add them to the Zip archive. First we create another in-memory file to hold the contents.
<pre>
my $data = '';
open(my $datafh, '>', \$data) or die("Could not create filehandle: $!");
</pre>

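We open the file and copy its contents straight into the in-memory file. The file is read in binary mode, since documents are often binary files such as PDFs.
<pre>
open(my $infh, '<', $file) or die("Could not open file $file: $!");
binmode($infh);
while (<$infh>)
{
print {$datafh} $_;
}
close($infh);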
</pre>

Then we add the data to the Zip archive at the path built earlier.
<pre>
$zip->add_file($filepath, $data);
</pre>

Finally, after all of its documents have been processed, we convert the eprint's DOM object to a string and append it to the index.
<pre>
$index .= EPrints::XML::to_string($div);
</pre>

=== Finishing Off ===
Once every eprint has been processed, we close the body and html elements of the index and add it to the archive as eprints-search/index.htm.
<pre>
$index .= '</body></html>';
$zip->add_file('eprints-search/index.htm',$index);
</pre>

If a file handle has been provided, we write the archive to it and return undef. Otherwise we write to the in-memory file handle created earlier and return the archive data as a string.
<pre>
if (defined $opts{fh})
{
$zip->write_filehandle($opts{fh},'zip');
return undef;
}
$zip->write_filehandle($FH,'zip');
return $archive;
</pre>

= Testing Your Plugin =
Restart your webserver and test the plugin as [[Contribute:_Plugins/ExportPluginsHello| before]].

== Sample Output ==
[[Image:Expzipv2.png]]

The accompanying HTML index.

[[Image:Expzip2.png]]

Contribute: Plugins/ExportPluginsExcel

2007-09-28T19:00:02Z

Tom: /* Constructor */

= Export Plugin Tutorial 4: Excel =

In this tutorial and the next one we'll look at exporting files in non-text formats. Here we will explore exporting metadata in the binary Excel format used before Microsoft Office 2007 (Office 2007 introduced an XML-based format).

To prepare for this tutorial you should install the [http://search.cpan.org/dist/Spreadsheet-WriteExcel/lib/Spreadsheet/WriteExcel.pm Spreadsheet::WriteExcel] module. Running the following command as root (or via sudo) should work.

<pre>
cpan Spreadsheet::WriteExcel
</pre>

= Excel.pm =
The code in the section below should be placed in a file called Excel.pm in the directory created previously, and MyPlugins should be changed to the name of that directory.

<pre>
package EPrints::Plugin::Export::MyPlugins::Excel;

@ISA = ('EPrints::Plugin::Export');

use strict;

sub new
{
my ($class, %opts) = @_;
my $self = $class->SUPER::new(%opts);

$self->{name} = 'Excel';
$self->{accept} = ['list/eprint'];
$self->{visible} = 'all';
$self->{suffix} = '.xls';
$self->{mimetype} = 'application/vnd.ms-excel';

my $rc = EPrints::Utils::require_if_exists('Spreadsheet::WriteExcel');
unless ($rc)
{
$self->{visible} = '';
$self->{error} = 'Unable to load required module Spreadsheet::WriteExcel';
}

return $self;
}

sub output_list
{
my ($plugin, %opts) = @_;
my $workbook;

my $output;
open(my $FH, '>', \$output) or die("Could not create filehandle: $!");

if (defined $opts{fh})
{
$workbook = Spreadsheet::WriteExcel->new(\*{$opts{fh}});
die("Unable to create spreadsheet: $!") unless defined $workbook;
}
else
{
$workbook = Spreadsheet::WriteExcel->new($FH);
die("Unable to create spreadsheet: $!") unless defined $workbook;
}

my $worksheet = $workbook->add_worksheet();

my $i = 0;
my @fields =
$plugin->{session}->get_repository->get_dataset('archive')->get_fields;

foreach my $field (@fields)
{
$worksheet->write(0, $i, $field->get_name);
$i++;
}

$i = 1;
foreach my $dataobj ($opts{list}->get_records)
{
my $j = 0;
foreach my $field (@fields)
{
if ($dataobj->exists_and_set($field->get_name))
{
if ($field->get_property('multiple'))
{
if ($field->{type} eq 'name')
{
my $namelist = '';
foreach my $name (@{$dataobj->get_value_raw($field->get_name)})
{
$namelist .= $name->{family} . ',' . $name->{given} . ';';
}
$worksheet->write($i, $j, $namelist);
}
elsif ($field->{type} eq 'compound')
{
$worksheet->write($i, $j, 'COMPOUND');
}
else
{
$worksheet->write($i, $j,
join(';',@{$dataobj->get_value($field->get_name)}));
}
}
else
{
$worksheet->write($i, $j, $dataobj->get_value($field->get_name));
}
}
$j++;
}
$i++;
}

$workbook->close;

if (defined $opts{fh})
{
return undef;
}

return $output;
}

1;

</pre>

= In More Detail =
== Constructor ==
For simplicity, this plugin only deals with lists of eprints. This avoids some code duplication, and it would be fairly easy to modify the plugin to handle both individual eprints and lists of eprints sensibly.
<pre>
$self->{accept} = ['list/eprint'];
</pre>

The file extension and [http://en.wikipedia.org/wiki/MIME MIME] type are set to values appropriate for Excel files.
<pre>
$self->{suffix} = '.xls';
$self->{mimetype} = 'application/vnd.ms-excel';
</pre>

Creating Excel files requires a module that is not included with EPrints. We use the EPrints::Utils::require_if_exists function to check whether the module is installed and, if it is, to load it. If the function returns false, we make the plugin invisible and record an error message.
<pre>
my $rc = EPrints::Utils::require_if_exists('Spreadsheet::WriteExcel');
unless ($rc)
{
$self->{visible} = '';
$self->{error} = 'Unable to load required module Spreadsheet::WriteExcel';
}
</pre>

== List Handling ==
=== Setting Up a Workbook ===
Here we create a new Excel workbook. First we open a file handle on a scalar rather than on a named file, which creates an in-memory file. Then, depending on whether a file handle has been supplied, we create a workbook object that writes either to the supplied handle or to the in-memory one.

<pre>
my $workbook;

my $output;
open(my $FH, '>', \$output) or die("Could not create filehandle: $!");

if (defined $opts{fh})
{
$workbook = Spreadsheet::WriteExcel->new(\*{$opts{fh}});
die("Unable to create spreadsheet: $!") unless defined $workbook;
}
else
{
$workbook = Spreadsheet::WriteExcel->new($FH);
die("Unable to create spreadsheet: $!") unless defined $workbook;
}
</pre>
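
The two branches above differ only in the handle passed to Spreadsheet::WriteExcel->new. One way to avoid the duplication is to choose the handle first and create the workbook once. The plain-Perl sketch below shows only the handle selection, printing to the chosen handle in place of a workbook so that it runs without Spreadsheet::WriteExcel; pick_handle is a hypothetical helper. It does not reproduce the \*{$opts{fh}} wrapping that the code above applies to a supplied handle, which you may still want to keep.

<pre>
use strict;
use warnings;

# Hypothetical helper: return the caller's handle if one was supplied,
# otherwise open an in-memory handle writing into the scalar that
# $bufref refers to.
sub pick_handle
{
    my ($bufref, %opts) = @_;
    return $opts{fh} if defined $opts{fh};
    open(my $fh, '>', $bufref) or die("Could not create filehandle: $!");
    return $fh;
}

# No handle supplied, so the output ends up in $output.
my $output = '';
my $target = pick_handle(\$output);

# In the plugin, $target would be passed to Spreadsheet::WriteExcel->new
# at this point; here we simply print to it.
print {$target} 'workbook bytes';
close($target);

print "$output\n";   # workbook bytes
</pre>
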

=== Handling DataObjs ===
To start adding data to the Excel file we have to create a worksheet.
<pre>
my $worksheet = $workbook->add_worksheet();
</pre>

In the first row of the worksheet we write the names of all the metadata fields that can be associated with eprints. To find them, we get the session associated with the plugin, then the repository associated with that session, then the "archive" DataSet from that repository, and finally call its get_fields method, which returns a list of MetaField objects.
<pre>
my @fields =
$plugin->{session}->get_repository->get_dataset('archive')->get_fields;
</pre>

Here we loop over each field and write its name to the first row (row 0), one column per field.
<pre>
my $i = 0;
foreach my $field (@fields)
{
$worksheet->write(0, $i, $field->get_name);
$i++;
}
</pre>
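
The worksheet is addressed by zero-based row and column numbers: row 0 holds the field names, and each DataObj then occupies one row from row 1 onwards, with one column per field. To illustrate that layout without Spreadsheet::WriteExcel, the sketch below uses a hypothetical FakeSheet class whose write method merely stores values in a two-dimensional array; the field names and records are made-up example data.

<pre>
use strict;
use warnings;

# Hypothetical stand-in for a Spreadsheet::WriteExcel worksheet: it only
# records each cell in a two-dimensional array.
package FakeSheet;

sub new
{
    my ($class) = @_;
    return bless { cells => [] }, $class;
}

sub write
{
    my ($self, $row, $col, $value) = @_;
    $self->{cells}[$row][$col] = $value;
    return;
}

package main;

# Made-up example data: three field names and two records.
my @field_names = ('eprintid', 'title', 'creators_name');
my @records = (
    { eprintid => 1, title => 'First paper' },
    { eprintid => 2, title => 'Second paper', creators_name => 'Smith,John;' },
);

my $worksheet = FakeSheet->new;

# Row 0: one column per field name.
for my $col (0 .. $#field_names)
{
    $worksheet->write(0, $col, $field_names[$col]);
}

# Rows 1 onwards: one row per record, leaving unset fields empty.
for my $row (1 .. scalar(@records))
{
    my $record = $records[$row - 1];
    for my $col (0 .. $#field_names)
    {
        my $value = $record->{ $field_names[$col] };
        $worksheet->write($row, $col, $value) if defined $value;
    }
}

# Print the grid, tab-separated, to see the layout.
for my $cells (@{ $worksheet->{cells} })
{
    print join("\t", map { defined $_ ? $_ : '' } @{$cells}), "\n";
}
</pre>
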

We now loop over each DataObj in our list, writing one DataObj per row starting at row 1, and over each MetaField we found earlier, one per column.
<pre>
$i = 1;
foreach my $dataobj ($opts{list}->get_records)
{
my $j = 0;
foreach my $field (@fields)
{
</pre>

We only write something to the worksheet if the field exists for the DataObj and has a value. Fields that hold a single value are written to the worksheet unchanged.
<pre>
if ($dataobj->exists_and_set($field->get_name))
</pre>

Fields which can take multiple values are handled in one of three ways, depending on their type.
<pre>
if ($field->get_property('multiple'))
</pre>
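Names are handled specially. The raw value of a name field is a list of names, each with family and given parts, and each name is written as the family name, a comma, the given name (or initial) and a semicolon, for example Smith,John;Jones,A.;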
<pre>
if ($field->{type} eq 'name')
{
my $namelist = '';
foreach my $name (@{$dataobj->get_value_raw($field->get_name)})
{
$namelist .= $name->{family} . ',' . $name->{given} . ';';
}
$worksheet->write($i, $j, $namelist);
}
</pre>
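
Here is the same name formatting pulled out into a stand-alone subroutine so that it can be tried outside EPrints. It assumes, as the plugin's code does, that each name is a hash reference with family and given keys. The name format_names is hypothetical, and the defined checks are an addition that avoids "uninitialized value" warnings when a given name is missing. The output keeps the tutorial's trailing semicolon.

<pre>
use strict;
use warnings;

# Hypothetical helper: format a list of name hashes in the
# tutorial's "family,given;" style.
sub format_names
{
    my ($names) = @_;
    my $namelist = '';
    foreach my $name (@{$names})
    {
        my $family = defined $name->{family} ? $name->{family} : '';
        my $given  = defined $name->{given}  ? $name->{given}  : '';
        $namelist .= $family . ',' . $given . ';';
    }
    return $namelist;
}

my $creators = [
    { family => 'Smith', given => 'John' },
    { family => 'Jones', given => 'A.' },
];
print format_names($creators), "\n";   # Smith,John;Jones,A.;
</pre>
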
Fields which have a compound type are not handled, and 'COMPOUND' is written to the worksheet.
<pre>
elsif ($field->{type} eq 'compound')
{
$worksheet->write($i, $j, 'COMPOUND');
}
</pre>
All other multiple fields have their values joined together, separated by semicolons.
<pre>
else
{
$worksheet->write($i, $j,
join(';',@{$dataobj->get_value($field->get_name)}));
}
</pre>
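
Leaving names aside (they are formatted from the raw values, as shown earlier), the remaining branches can be condensed into one hypothetical helper for illustration. It takes the field's type, whether the field is multiple, and the value as get_value would return it, which for multiple fields the plugin treats as an array reference. The type 'text' in the usage lines is just an example value.

<pre>
use strict;
use warnings;

# Hypothetical helper condensing the branches above (name fields excluded).
sub cell_value
{
    my ($type, $multiple, $value) = @_;
    return $value unless $multiple;              # single values are written as they are
    return 'COMPOUND' if $type eq 'compound';    # compound fields are not handled
    return join(';', @{$value});                 # other multiple fields: a;b;c
}

print cell_value('text', 0, 'A single title'), "\n";                  # A single title
print cell_value('text', 1, ['open access', 'repositories']), "\n";   # open access;repositories
print cell_value('compound', 1, [{}]), "\n";                          # COMPOUND
</pre>
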

=== Finishing Up ===

We must close the workbook to ensure that all of the data is written to the file handle. We then return in the usual fashion: undef if a file handle was supplied, otherwise the spreadsheet data as a string.
<pre>
$workbook->close;

if (defined $opts{fh})
{
return undef;
}

return $output;
</pre>

= Testing Your Plugin =
Restart your webserver and test the plugin as [[Contribute:_Plugins/ExportPluginsHello| before]].

== Sample Output ==
[[Image:Expexcel.png]]

Contribute: Plugins/ExportPluginsHTML

2007-09-28T18:59:03Z

Tom: /* xml_dataobj */

= Export Plugin Tutorial 3: HTML =

In this tutorial we'll look at creating an export plugin with slightly more complex output than unformatted plain text. Although the plugin below produces XHTML, the same principles apply to producing any XML document.

= HTML.pm =
The code in the section below should be placed in a file called HTML.pm in the directory created previously, and MyPlugins should be changed to the name of that directory.

<pre>
package EPrints::Plugin::Export::MyPlugins::HTML;

@ISA = ('EPrints::Plugin::Export');

use strict;

sub new
{
my ($class, %opts) = @_;

my $self = $class->SUPER::new(%opts);

$self->{name} = 'Hello, HTML';
$self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
$self->{visible} = 'all';
$self->{suffix} = '.htm';
$self->{mimetype} = 'text/html; charset=utf-8';

return $self;
}

sub output_dataobj
{
my ($plugin, $dataobj) = @_;

my $xml = $plugin->xml_dataobj($dataobj);

return EPrints::XML::to_string($xml);
}

sub output_list
{
my ($plugin, %opts) = @_;

my $r = [];

my $header = <<END;
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>XHTML Export Plugin</title>
</head>
<body>
END

if (defined $opts{fh})
{
print {$opts{fh}} $header;
}
else
{
push @{$r}, $header;
}

foreach my $dataobj ($opts{list}->get_records)
{
my $part = $plugin->output_dataobj($dataobj, %opts);
if (defined $opts{fh})
{
print {$opts{fh}} $part;
}
else
{
push @{$r}, $part;
}
}
my $footer = '</body></html>';
if (defined $opts{fh})
{
print {$opts{fh}} $footer;
}
else
{
push @{$r}, $footer;
}

if (defined $opts{fh})
{
return undef;
}
return join('', @{$r});
}

sub xml_dataobj
{
my ($plugin, $dataobj) = @_;

my $session = $plugin->{session};

my $div = $session->make_element('div');

my $title = $session->make_element('h2');
$title->appendChild($session->make_text($dataobj->get_value('title')));
$div->appendChild($title);

my $abstract = $session->make_element('p');
$abstract->appendChild($session->make_text($dataobj->get_value('abstract')));
$div->appendChild($abstract);

return $div;
}

1;

</pre>

= In More Detail =
== Constructor ==
Here we set the file extension to '.htm' and the [http://en.wikipedia.org/wiki/MIME MIME] type to 'text/html', with the character set declared as UTF-8. For general XML documents you should instead use the file extension '.xml' and the [http://en.wikipedia.org/wiki/MIME MIME] type 'text/xml'.

<pre>
$self->{suffix} = '.htm';
$self->{mimetype} = 'text/html; charset=utf-8';
</pre>

== output_dataobj ==
Here the output_dataobj method does very little. It calls the xml_dataobj method to obtain a [http://en.wikipedia.org/wiki/Document_Object_Model DOM] tree, which is serialised to a string before being returned. The xml_dataobj method is described later.

<pre>
my $xml = $plugin->xml_dataobj($dataobj);

return EPrints::XML::to_string($xml);
</pre>

== output_list ==
The only change to the output_list method is the addition of a header and a footer, which are output before and after the eprints respectively.

<pre>
my $header = <<END;
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>XHTML Export Plugin</title>
</head>
<body>
END

my $footer = '</body></html>';
</pre>
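
Each piece of output in output_list goes through the same test: print to $opts{fh} if it is defined, otherwise push onto @{$r}. One way to make that decision only once is a small closure, sketched here in plain Perl with a hypothetical build_page subroutine standing in for output_list. It behaves the same way, returning undef when a handle is supplied and the joined string otherwise; the items are assumed to be ready-made markup, as the serialised eprints are in the plugin.

<pre>
use strict;
use warnings;

# Hypothetical stand-in for output_list.
sub build_page
{
    my (%opts) = @_;
    my @parts;

    # Decide once where output goes, then reuse $emit for every piece.
    my $emit = defined $opts{fh}
        ? sub { print {$opts{fh}} @_ }
        : sub { push @parts, @_ };

    $emit->('<html><body>');
    $emit->("<div>$_</div>") for @{ $opts{items} };
    $emit->('</body></html>');

    return defined $opts{fh} ? undef : join('', @parts);
}

my $page = build_page(items => ['one', 'two']);
print $page, "\n";   # <html><body><div>one</div><div>two</div></body></html>
</pre>
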

== xml_dataobj ==
This method has a similar signature to the output_dataobj method: it takes an implicit reference to the plugin object and a reference to a DataObj. However, instead of returning a string it returns a [http://en.wikipedia.org/wiki/Document_Object_Model DOM] tree.

Before we can start creating and manipulating [http://en.wikipedia.org/wiki/Document_Object_Model DOM] objects we need to get a reference to the Session object from the plugin.

<pre>
my $session = $plugin->{session};
</pre>

We then create elements using the make_element method of our session object. The first argument is the tag name of the element we want to create, for example 'a' or 'div'. Any further arguments are attribute names and values, given as a hash-style list such as href => $url.
<pre>
my $div = $session->make_element('div');
</pre>
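
make_element and make_text build DOM nodes rather than strings, and escaping of special characters in text is handled when the tree is serialised, for example by EPrints::XML::to_string. To make the relationship between the arguments and the resulting markup concrete, the plain-Perl sketch below builds the equivalent string directly. escape_xml and element_string are hypothetical helpers, not EPrints functions; in a plugin you should keep using the DOM methods.

<pre>
use strict;
use warnings;

# Hypothetical helper: replace the characters that are special in XML
# (&, <, > and ") with numeric character references.
sub escape_xml
{
    my ($text) = @_;
    $text =~ s/([&<>"])/sprintf('&#%d;', ord($1))/ge;
    return $text;
}

# Hypothetical helper: build an element string from a tag name, a hash
# reference of attributes and optional text content.
sub element_string
{
    my ($tag, $attrs, $text) = @_;
    my $attr_str = join('', map { sprintf(' %s="%s"', $_, escape_xml($attrs->{$_})) }
                                sort keys %{$attrs});
    my $content = defined $text ? escape_xml($text) : '';
    return "<$tag$attr_str>$content</$tag>";
}

print element_string('a', { href => '42/doc1/paper.pdf' }, 'Results & Discussion'), "\n";
</pre>
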

Here we add a second-level heading. The make_text method creates text nodes, and the appendChild method adds child nodes.

<pre>
my $title = $session->make_element('h2');
$title->appendChild($session->make_text($dataobj->get_value('title')));
$div->appendChild($title);
</pre>

A similar pattern is followed to add a paragraph containing the eprint's abstract.

<pre>
my $abstract = $session->make_element('p');
$abstract->appendChild($session->make_text($dataobj->get_value('abstract')));
$div->appendChild($abstract);
</pre>

Finally we return a [http://en.wikipedia.org/wiki/Document_Object_Model DOM] object representing a 'div' element containing our heading and paragraph.

<pre>
return $div;
</pre>

= Testing Your Plugin =
Restart your webserver and test the plugin as [[Contribute:_Plugins/ExportPluginsHello| before]].

== Sample Output ==
[[Image:Exphtml.png]]

Contribute: Plugins/ExportPluginsHTML

2007-09-28T18:57:47Z

Tom: /* output_dataobj */

= Export Plugin Tutorial 3: HTML =

In this tutorial we'll look at creating an export plugin with slightly more complex output than unformatted plain text. Although the plugin below produces XHTML, the same principles apply to producing any XML document.

= HTML.pm =
The code in the section below should be placed in a file called HTML.pm in the directory created previously, and MyPlugins should be changed to the name of that directory.

<pre>
package EPrints::Plugin::Export::MyPlugins::HTML;

@ISA = ('EPrints::Plugin::Export');

use strict;

sub new
{
my ($class, %opts) = @_;

my $self = $class->SUPER::new(%opts);

$self->{name} = 'Hello, HTML';
$self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
$self->{visible} = 'all';
$self->{suffix} = '.htm';
$self->{mimetype} = 'text/html; charset=utf-8';

return $self;
}

sub output_dataobj
{
my ($plugin, $dataobj) = @_;

my $xml = $plugin->xml_dataobj($dataobj);

return EPrints::XML::to_string($xml);
}

sub output_list
{
my ($plugin, %opts) = @_;

my $r = [];

my $header = <<END;
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>XHTML Export Plugin</title>
</head>
<body>
END

if (defined $opts{fh})
{
print {$opts{fh}} $header;
}
else
{
push @{$r}, $header;
}

foreach my $dataobj ($opts{list}->get_records)
{
my $part = $plugin->output_dataobj($dataobj, %opts);
if (defined $opts{fh})
{
print {$opts{fh}} $part;
}
else
{
push @{$r}, $part;
}
}
my $footer = '</body></html>';
if (defined $opts{fh})
{
print {$opts{fh}} $footer;
}
else
{
push @{$r}, $footer;
}

if (defined $opts{fh})
{
return undef;
}
return join('', @{$r});
}

sub xml_dataobj
{
my ($plugin, $dataobj) = @_;

my $session = $plugin->{session};

my $div = $session->make_element('div');

my $title = $session->make_element('h2');
$title->appendChild($session->make_text($dataobj->get_value('title')));
$div->appendChild($title);

my $abstract = $session->make_element('p');
$abstract->appendChild($session->make_text($dataobj->get_value('abstract')));
$div->appendChild($abstract);

return $div;
}

1;

</pre>

= In More Detail =
== Constructor ==
Here we set the file extension to '.htm' and the [http://en.wikipedia.org/wiki/MIME MIME] type to 'text/html', with the character set declared as UTF-8. For general XML documents you should instead use the file extension '.xml' and the [http://en.wikipedia.org/wiki/MIME MIME] type 'text/xml'.

<pre>
$self->{suffix} = '.htm';
$self->{mimetype} = 'text/html; charset=utf-8';
</pre>

== output_dataobj ==
Here the output_dataobj method does very little. It calls the xml_dataobj method to obtain a [http://en.wikipedia.org/wiki/Document_Object_Model DOM] tree, which is serialised to a string before being returned. The xml_dataobj method is described later.

<pre>
my $xml = $plugin->xml_dataobj($dataobj);

return EPrints::XML::to_string($xml);
</pre>

== output_list ==
The only change to the output_list method is the addition of a header and a footer, which are output before and after the eprints respectively.

<pre>
my $header = <<END;
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>XHTML Export Plugin</title>
</head>
<body>
END

my $footer = '</body></html>';
</pre>
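
The header is written as a here-document. Because its terminator END is unquoted, the text is interpolated like a double-quoted string, so a literal $ or @ in the markup would be treated as the start of a variable. Quoting the terminator as <<'END' turns interpolation off, while leaving it on lets you insert values deliberately. The sketch below contrasts the two, with $title as an illustrative variable that the plugin itself does not define.

<pre>
use strict;
use warnings;

my $title = 'XHTML Export Plugin';

# Unquoted or double-quoted terminator: variables are interpolated.
my $interpolated = <<"END";
<title>$title</title>
END

# Single-quoted terminator: the text is taken literally.
my $literal = <<'END';
<title>$title</title>
END

print $interpolated;   # <title>XHTML Export Plugin</title>
print $literal;        # <title>$title</title>
</pre>
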

== xml_dataobj ==
This method has a similar signature to the output_dataobj method: it takes an implicit reference to the plugin object and a reference to a DataObj. However, instead of returning a string it returns a DOM object.

Before we can start creating and manipulating DOM objects we need to get a reference to the Session object from the plugin.

<pre>
my $session = $plugin->{session};
</pre>

We then create elements using the make_element method of our session object. The first argument is the tag name of the element we want to create, for example 'a' or 'div'. Any further arguments are attribute names and values, given as a hash-style list such as href => $url.
<pre>
my $div = $session->make_element('div');
</pre>

Here we add a second-level heading. The make_text method creates text nodes, and the appendChild method adds child nodes.

<pre>
my $title = $session->make_element('h2');
$title->appendChild($session->make_text($dataobj->get_value('title')));
$div->appendChild($title);
</pre>

A similar pattern is followed to add a paragraph containing the eprint's abstract.

<pre>
my $abstract = $session->make_element('p');
$abstract->appendChild($session->make_text($dataobj->get_value('abstract')));
$div->appendChild($abstract);
</pre>

Finally we return a DOM object representing a 'div' element containing our heading and paragraph.

<pre>
return $div;
</pre>

= Testing Your Plugin =
Restart your webserver and test the plugin as [[Contribute:_Plugins/ExportPluginsHello| before]].

== Sample Output ==
[[Image:Exphtml.png]]

Contribute: Plugins/ExportPluginsHTML

2007-09-28T18:55:44Z

Tom: /* Constructor */

= Export Plugin Tutorial 3: HTML =

In this tutorial we'll look at creating an export plugin with slightly more complex output than unformatted plain text. Although the plugin below produces XHTML, the same principles apply to producing any XML document.

= HTML.pm =
The code in the section below should be placed in a file called HTML.pm in the directory created previously, and MyPlugins should be changed to the name of that directory.

<pre>
package EPrints::Plugin::Export::MyPlugins::HTML;

@ISA = ('EPrints::Plugin::Export');

use strict;

sub new
{
my ($class, %opts) = @_;

my $self = $class->SUPER::new(%opts);

$self->{name} = 'Hello, HTML';
$self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
$self->{visible} = 'all';
$self->{suffix} = '.htm';
$self->{mimetype} = 'text/html; charset=utf-8';

return $self;
}

sub output_dataobj
{
my ($plugin, $dataobj) = @_;

my $xml = $plugin->xml_dataobj($dataobj);

return EPrints::XML::to_string($xml);
}

sub output_list
{
my ($plugin, %opts) = @_;

my $r = [];

my $header = <<END;
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>XHTML Export Plugin</title>
</head>
<body>
END

if (defined $opts{fh})
{
print {$opts{fh}} $header;
}
else
{
push @{$r}, $header;
}

foreach my $dataobj ($opts{list}->get_records)
{
my $part = $plugin->output_dataobj($dataobj, %opts);
if (defined $opts{fh})
{
print {$opts{fh}} $part;
}
else
{
push @{$r}, $part;
}
}
my $footer = '</body></html>';
if (defined $opts{fh})
{
print {$opts{fh}} $footer;
}
else
{
push @{$r}, $footer;
}

if (defined $opts{fh})
{
return undef;
}
return join('', @{$r});
}

sub xml_dataobj
{
my ($plugin, $dataobj) = @_;

my $session = $plugin->{session};

my $div = $session->make_element('div');

my $title = $session->make_element('h2');
$title->appendChild($session->make_text($dataobj->get_value('title')));
$div->appendChild($title);

my $abstract = $session->make_element('p');
$abstract->appendChild($session->make_text($dataobj->get_value('abstract')));
$div->appendChild($abstract);

return $div;
}

1;

</pre>

= In More Detail =
== Constructor ==
Here we set the file extension to '.htm' and the [http://en.wikipedia.org/wiki/MIME MIME] type to 'text/html', with the character set declared as UTF-8. For general XML documents you should instead use the file extension '.xml' and the [http://en.wikipedia.org/wiki/MIME MIME] type 'text/xml'.

<pre>
$self->{suffix} = '.htm';
$self->{mimetype} = 'text/html; charset=utf-8';
</pre>

== output_dataobj ==
Here the output_dataobj method does very little. It calls the xml_dataobj method to obtain a DOM object, which EPrints::XML::to_string serialises to a string before it is returned. The xml_dataobj method is described below.

<pre>
my $xml = $plugin->xml_dataobj($dataobj);

return EPrints::XML::to_string($xml);
</pre>

== output_list ==
The only changes to the output_list method are the additions of a header and a footer.

<pre>
my $header = <<END;
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>XHTML Export Plugin</title>
</head>
<body>
END

my $footer = '</body></html>';
</pre>

== xml_dataobj ==
This method has a similar signature to the output_dataobj method: it takes an implicit reference to the plugin object and a reference to a DataObj. However, instead of returning a string it returns a DOM object.

Before we can start creating and manipulating DOM objects we need to get a reference to the Session object from the plugin.

<pre>
my $session = $plugin->{session};
</pre>

We then create elements using the make_element method of our session object. The first argument is the tag name of the element we want to create, for example 'a' or 'div'. Any remaining arguments form a hash of attribute names and values, for example make_element('a', href => 'http://www.eprints.org/').
<pre>
my $div = $session->make_element('div');
</pre>

Here we add a second-level heading. The make_text method creates text nodes, and the appendChild method adds child nodes.

<pre>
my $title = $session->make_element('h2');
$title->appendChild($session->make_text($dataobj->get_value('title')));
$div->appendChild($title);
</pre>

A similar pattern is followed to add a paragraph containing the eprint's abstract.

<pre>
my $abstract = $session->make_element('p');
$abstract->appendChild($session->make_text($dataobj->get_value('abstract')));
$div->appendChild($abstract);
</pre>

Finally we return a DOM object representing a 'div' element containing our heading and paragraph.

<pre>
return $div;
</pre>

= Testing Your Plugin =
Restart your webserver and test the plugin as [[Contribute:_Plugins/ExportPluginsHello| before]].

== Sample Output ==
[[Image:Exphtml.png]]

Contribute: Plugins/ExportPluginsList

2007-09-28T18:55:15Z

Tom: /* Filehandles */

= Export Plugin Tutorial 2: List Handling =

In this tutorial you will create an export plugin, slightly more complex than the one in [[Contribute:_Plugins/ExportPluginsHello|the previous tutorial]], that changes the way lists of eprints are exported.

= HelloList.pm =
The code in the section below should be placed in a file called HelloList.pm in the directory created previously, and MyPlugins
should be changed to the name of that directory.

<pre>
package EPrints::Plugin::Export::MyPlugins::HelloList;

@ISA = ('EPrints::Plugin::Export');

use strict;

sub new
{
my ($class, %opts) = @_;

my $self = $class->SUPER::new(%opts);

$self->{name} = 'Hello, List!';
$self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
$self->{visible} = 'all';
$self->{suffix} = '.txt';
$self->{mimetype} = 'text/plain; charset=utf-8';

return $self;
}

sub output_dataobj
{
my ($plugin, $dataobj) = @_;

return $dataobj->get_id()."\t".$dataobj->get_value('title')."\n";
}

sub output_list
{
my ($plugin, %opts) = @_;

my $r = [];

my $header = "ID\tTitle\n\n";
if (defined $opts{fh})
{
print {$opts{fh}} $header;
}
else
{
push @{$r}, $header;
}

foreach my $dataobj ($opts{list}->get_records)
{
my $part = $plugin->output_dataobj($dataobj, %opts);
if (defined $opts{fh})
{
print {$opts{fh}} $part;
}
else
{
push @{$r}, $part;
}
}

if (defined $opts{fh})
{
return undef;
}
return join('', @{$r});
}

1;
</pre>

= In More Detail =
The above code is very similar to the HelloExport.pm file in [[Contribute:_Plugins/ExportPluginsHello| the previous tutorial]], so only the points where it differs significantly from that file are discussed below.

== Housekeeping ==
The package name has been changed to reflect the filename.

<pre>
package EPrints::Plugin::Export::MyPlugins::HelloList;
</pre>

== Constructor ==
Make sure you give each plugin a unique name.

<pre>
$self->{name} = 'Hello, List!';
</pre>

== Dealing With Lists ==
In this example the output_list method is overridden in the export plugin to add column headers. The inherited method simply concatenates the output of output_dataobj for every DataObj in the list.

Note that the method is not given a bare array of DataObjs; instead, a List object is provided in the opts hash. To get an array of DataObjs to loop over, call that List object's get_records method.
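To see what the List object gives you, here is a minimal sketch that runs outside EPrints. The Mock::DataObj and Mock::List packages are hypothetical stand-ins that imitate only get_id, get_value and get_records; list_with_count is an illustrative variant of output_list (without the filehandle handling) that also counts the records, which is easy because get_records returns an ordinary Perl list.

<pre>
use strict;
use warnings;

# Hypothetical stand-in for an EPrints DataObj; only get_id and get_value are imitated.
package Mock::DataObj;
sub new { my ($class, %fields) = @_; my $self = { %fields }; return bless $self, $class; }
sub get_id { return $_[0]->{eprintid}; }
sub get_value { my ($self, $field) = @_; return $self->{$field}; }

# Hypothetical stand-in for an EPrints List; get_records returns a plain Perl list.
package Mock::List;
sub new { my ($class, @records) = @_; return bless { records => [@records] }, $class; }
sub get_records { return @{ $_[0]->{records} }; }

package main;

# Same shape as HelloList's output_list (string output only),
# plus a footer line reporting how many records were exported.
sub list_with_count
{
	my (%opts) = @_;

	my @records = $opts{list}->get_records;
	my $out = "ID\tTitle\n\n";
	foreach my $dataobj (@records)
	{
		$out .= $dataobj->get_id() . "\t" . $dataobj->get_value('title') . "\n";
	}
	return $out . "\n" . scalar(@records) . " record(s)\n";
}
</pre>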

== Filehandles ==
The command line export tool passes the output_list method a filehandle for output in the opts hash, while the [http://en.wikipedia.org/wiki/Common_Gateway_Interface CGI] export uses the value returned from the method. If you ignore the filehandle you will get no output from the command line tool. Users of the web interface will not notice, but it is good practice to handle both cases.

The best way to handle output is to check if a filehandle has been provided every time something needs to be output. If a filehandle is provided we print to it, otherwise we save the output for later. At the end of the method we either return undef if a filehandle was provided or we return the saved output otherwise.

<pre>
sub output_list
{
my ($plugin, %opts) = @_;

my $r = [];

my $header = "ID\tTitle\n\n";
if (defined $opts{fh})
{
print {$opts{fh}} $header;
}
else
{
push @{$r}, $header;
}

foreach my $dataobj ($opts{list}->get_records)
{
my $part = $plugin->output_dataobj($dataobj, %opts);
if (defined $opts{fh})
{
print {$opts{fh}} $part;
}
else
{
push @{$r}, $part;
}
}

if (defined $opts{fh})
{
return undef;
}
return join('', @{$r});
}
</pre>
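The repeated if/else blocks can be folded into a small helper. The sketch below is one way to do it, not something EPrints provides: make_emitter and output_parts are hypothetical names, and the parts option stands in for the strings output_dataobj would return. It also shows how to simulate the command line tool by passing an in-memory filehandle, which is handy when testing a plugin.

<pre>
use strict;
use warnings;

# Hypothetical helper (not part of EPrints): returns a closure that prints
# to $fh when a filehandle was supplied, and otherwise buffers text in @$buffer.
sub make_emitter
{
	my ($fh, $buffer) = @_;
	return sub {
		my ($text) = @_;
		if (defined $fh)
		{
			print {$fh} $text;
		}
		else
		{
			push @{$buffer}, $text;
		}
	};
}

# output_list rewritten around the helper.
sub output_parts
{
	my (%opts) = @_;
	my $r = [];
	my $emit = make_emitter($opts{fh}, $r);

	$emit->("ID\tTitle\n\n");
	$emit->($_) foreach @{ $opts{parts} };

	# Return undef when we printed to a filehandle, otherwise the buffered text.
	return defined $opts{fh} ? undef : join('', @{$r});
}

# Simulate the command line tool by handing over an in-memory filehandle.
my $captured = '';
open(my $fh, '>', \$captured) or die "Could not open in-memory filehandle: $!";
output_parts(parts => ["1\tA\n"], fh => $fh);
close($fh);
</pre>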

= Testing Your Plugin =

Restart your webserver and test the plugin as in [[Contribute:_Plugins/ExportPluginsHello| the previous tutorial]].

== Sample Output ==
[[Image:Explist.png]]

Contribute: Plugins/ExportPluginsHelloOld

2007-09-28T14:48:24Z

Tom: /* Processing DataObjs */

= Export Plugin Tutorial 1: "Hello, World!" =

In this tutorial you will learn how to create a simple export plugin for EPrints, which will generate a list of titles from the results of a search. A basic knowledge of Perl is needed, but the code will be explained fully.

= Before You Start =

Create a directory for your export plugins in the main export plugin directory (usually /opt/eprints3/perl_lib/EPrints/Plugin/Export). The directory used in these examples is called "MyPlugins".

= HelloExport.pm =

Replace MyPlugins with the name of the directory you chose for your export plugins, and place the code below in a file called HelloExport.pm in that directory.

<pre>
package EPrints::Plugin::Export::MyPlugins::HelloExport;

@ISA = ('EPrints::Plugin::Export');

use strict;

sub new
{
my ($class, %opts) = @_;

my $self = $class->SUPER::new(%opts);

$self->{name} = 'Hello, World!';
$self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
$self->{visible} = 'all';
$self->{suffix} = '.txt';
$self->{mimetype} = 'text/plain; charset=utf-8';

return $self;
}

sub output_dataobj
{
my ($plugin, $dataobj) = @_;

return $dataobj->get_value('title')."\n";
}

1;

</pre>

= In More Detail =
== Housekeeping ==
<pre>
package EPrints::Plugin::Export::MyPlugins::HelloExport;
</pre>
Export plugins need to inherit from the EPrints::Plugin::Export class.
<pre>
@ISA = ('EPrints::Plugin::Export');
</pre>

== Constructor ==
After the implicit class reference, a hash of options is given.
<pre>
my ($class, %opts) = @_;
</pre>

We create a new export plugin by calling the EPrints::Plugin::Export constructor:
<pre>
my $self = $class->SUPER::new(%opts);
</pre>
Now we set a number of fields to register our new plugin.

This is the name that will appear in the export dropdown menu. The name should therefore be short and descriptive.
<pre>
$self->{name} = 'Hello, World!';
</pre>

The accept field is a list containing the types of objects this
plugin can deal with. In this case lists of eprints and individual
eprints.
<pre>
$self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
</pre>
The visible field denotes the class of user which will be able to see the plugin.
For most export plugins the value 'all' will be required, allowing
all users to see and use the plugin. A value of 'staff' would
make the plugin visible only to repository staff.
<pre>
$self->{visible} = 'all';
</pre>
The suffix field contains the extension of files exported by the plugin.

The mimetype field defines the [http://en.wikipedia.org/wiki/MIME MIME] type of the files exported by the plugin.
You can also specify the character encoding: for example, 'text/plain; charset=utf-8' denotes plain text encoded as UTF-8. [http://en.wikipedia.org/wiki/UTF-8 UTF-8] is a Unicode character encoding that can represent characters from a large number of character sets, so it is usually preferable to [http://en.wikipedia.org/wiki/ASCII ASCII].
<pre>
$self->{suffix} = '.txt';
$self->{mimetype} = 'text/plain; charset=utf-8';
</pre>

We then return our plugin reference.
<pre>
return $self;
</pre>

== Processing DataObjs ==

This method handles the export of each DataObj. DataObjs make up most of the content of an EPrints repository. The three main types are EPrint, which defines an individual eprint; Document, which defines a collection of one or more files belonging to an eprint; and User, which defines a user of the repository.

Besides an implicit reference to the Plugin object, this method is also provided with a reference to an individual DataObj. It is called by several [http://en.wikipedia.org/wiki/Common_Gateway_Interface CGI] and command line scripts to export single DataObjs, for instance the item control screen for repository staff. It is also called by the list handling method on each DataObj in a list, for example the results of a search. That will be explained in [[Contribute:_Plugins/ExportPluginsList|the next tutorial]].

In the example below we get the title of each DataObj, but there are a large number of other fields you can extract. For example, try changing "title" to "abstract" to print the abstract of each eprint.

<pre>
sub output_dataobj
{
my ($plugin, $dataobj) = @_;

# Return a scalar containing the title.
return $dataobj->get_value('title')."\n";
}
</pre>
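If you print the abstract as well as the title, bear in mind that not every eprint has one. This sketch assumes an unset field comes back from get_value as undef; Mock::EPrint is a hypothetical stand-in so the code can run outside a repository.

<pre>
use strict;
use warnings;

# Hypothetical stand-in for an EPrint DataObj; only get_value is imitated.
package Mock::EPrint;
sub new { my ($class, %fields) = @_; my $self = { %fields }; return bless $self, $class; }
sub get_value { my ($self, $name) = @_; return $self->{$name}; }

package main;

# Variant of output_dataobj that prints the title and, when present,
# an indented abstract on the following line.
sub output_dataobj
{
	my ($plugin, $dataobj) = @_;

	my $title = $dataobj->get_value('title') // '';
	my $abstract = $dataobj->get_value('abstract');

	my $out = "$title\n";
	$out .= "    $abstract\n" if defined $abstract && length $abstract;
	return $out;
}
</pre>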

== Finishing Off ==

A Perl module must return a true value when it is loaded, so the file ends with:
<pre>
1;

</pre>

= Testing Your Plugin =
Restart your web server and perform a search.

If all is well your plugin should appear in the dropdown menu. Select it and click export. As long as the search provided some results, you should get a list of EPrint titles returned.

== Sample Output ==
[[Image:Exphello.png]]

Contribute: Plugins/ExportPluginsHelloOld

2007-09-28T14:47:47Z

Tom: /* Processing DataObjs */

= Export Plugin Tutorial 1: "Hello, World!" =

In this tutorial you will learn how to create a simple export plugin for EPrints, which will generate a list of titles from the results of a search. A basic knowledge of Perl is needed, but the code will be explained fully.

= Before You Start =

Create a directory for your export plugins in the main export plugin directory (usually /opt/eprints3/perl_lib/EPrints/Plugin/Export). The directory used in these examples is called "MyPlugins".

= HelloExport.pm =

Replace MyPlugins with the name of the directory you chose for your export plugins, and place the code below in a file called HelloExport.pm in that directory.

<pre>
package EPrints::Plugin::Export::MyPlugins::HelloExport;

@ISA = ('EPrints::Plugin::Export');

use strict;

sub new
{
my ($class, %opts) = @_;

my $self = $class->SUPER::new(%opts);

$self->{name} = 'Hello, World!';
$self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
$self->{visible} = 'all';
$self->{suffix} = '.txt';
$self->{mimetype} = 'text/plain; charset=utf-8';

return $self;
}

sub output_dataobj
{
my ($plugin, $dataobj) = @_;

return $dataobj->get_value('title')."\n";
}

1;

</pre>

= In More Detail =
== Housekeeping ==
<pre>
package EPrints::Plugin::Export::MyPlugins::HelloExport;
</pre>
Export plugins need to inherit from the EPrints::Plugin::Export class.
<pre>
@ISA = ('EPrints::Plugin::Export');
</pre>

== Constructor ==
After the implicit class reference, a hash of options is given.
<pre>
my ($class, %opts) = @_;
</pre>

We create a new export plugin by calling the EPrints::Plugin::Export constructor:
<pre>
my $self = $class->SUPER::new(%opts);
</pre>
Now we set a number of fields to register our new plugin.

This is the name that will appear in the export dropdown menu. The name should therefore be short and descriptive.
<pre>
$self->{name} = 'Hello, World!';
</pre>

The accept field is a list containing the types of objects this
plugin can deal with. In this case lists of eprints and individual
eprints.
<pre>
$self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
</pre>
The visible field denotes the class of user which will be able to see the plugin.
For most export plugins the value 'all' will be required, allowing
all users to see and use the plugin. A value of 'staff' would
make the plugin visible only to repository staff.
<pre>
$self->{visible} = 'all';
</pre>
The suffix field contains the extension of files exported by the plugin.

The mimetype field defines the [http://en.wikipedia.org/wiki/MIME MIME] type of the files exported by the plugin.
You can also specify the character encoding: for example, 'text/plain; charset=utf-8' denotes plain text encoded as UTF-8. [http://en.wikipedia.org/wiki/UTF-8 UTF-8] is a Unicode character encoding that can represent characters from a large number of character sets, so it is usually preferable to [http://en.wikipedia.org/wiki/ASCII ASCII].
<pre>
$self->{suffix} = '.txt';
$self->{mimetype} = 'text/plain; charset=utf-8';
</pre>

We then return our plugin reference.
<pre>
return $self;
</pre>

== Processing DataObjs ==

This method handles the export of each DataObj. DataObjs make up most of the content of an EPrints repository. The three main types are EPrint, which defines an individual eprint; Document, which defines a collection of one or more files belonging to an eprint; and User, which defines a user of the repository.

Besides an implicit reference to the Plugin object, this method is also provided with a reference to an individual DataObj. It is called by several [http://en.wikipedia.org/wiki/Common_Gateway_Interface CGI] and command line scripts to export single DataObjs, for instance the item control screen for repository staff. It is also called by the list handling method on each DataObj in a list, for example the results of a search. That will be explained in [[Contribute:_Plugins/ExportPluginsList|the next tutorial]].

In the example below we get the title of each DataObj, but there are a large number of other fields you can extract. For example, try changing "title" to "abstract" to print the abstract of each eprint.

<pre>
sub output_dataobj
{
my ($plugin, $dataobj) = @_;

# Return a scalar containing the title.
return $dataobj->get_value('title')."\n";
}
</pre>

== Finishing Off ==

A Perl module must return a true value when it is loaded, so the file ends with:
<pre>
1;

</pre>

= Testing Your Plugin =
Restart your web server and perform a search.

If all is well your plugin should appear in the dropdown menu. Select it and click export. As long as the search provided some results, you should get a list of EPrint titles returned.

== Sample Output ==
[[Image:Exphello.png]]

Contribute: Plugins/ExportPluginsHelloOld

2007-09-28T14:41:12Z

Tom: /* Processing DataObjs */

= Export Plugin Tutorial 1: "Hello, World!" =

In this tutorial you will learn how to create a simple export plugin for EPrints, which will generate a list of titles from the results of a search. A basic knowledge of Perl is needed, but the code will be explained fully.

= Before You Start =

Create a directory for your export plugins in the main export plugin directory (usually /opt/eprints3/perl_lib/EPrints/Plugin/Export). The directory used in these examples is called "MyPlugins".

= HelloExport.pm =

Replace MyPlugins with the name of the directory you chose for your export plugins, and place the code below in a file called HelloExport.pm in that directory.

<pre>
package EPrints::Plugin::Export::MyPlugins::HelloExport;

@ISA = ('EPrints::Plugin::Export');

use strict;

sub new
{
my ($class, %opts) = @_;

my $self = $class->SUPER::new(%opts);

$self->{name} = 'Hello, World!';
$self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
$self->{visible} = 'all';
$self->{suffix} = '.txt';
$self->{mimetype} = 'text/plain; charset=utf-8';

return $self;
}

sub output_dataobj
{
my ($plugin, $dataobj) = @_;

return $dataobj->get_value('title')."\n";
}

1;

</pre>

= In More Detail =
== Housekeeping ==
<pre>
package EPrints::Plugin::Export::MyPlugins::HelloExport;
</pre>
Export plugins need to inherit from the EPrints::Plugin::Export class.
<pre>
@ISA = ('EPrints::Plugin::Export');
</pre>

== Constructor ==
After the implicit class reference, a hash of options is given.
<pre>
my ($class, %opts) = @_;
</pre>

We create a new export plugin by calling the EPrints::Plugin::Export constructor:
<pre>
my $self = $class->SUPER::new(%opts);
</pre>
Now we set a number of fields to register our new plugin.

This is the name that will appear in the export dropdown menu. The name should therefore be short and descriptive.
<pre>
$self->{name} = 'Hello, World!';
</pre>

The accept field is a list containing the types of objects this
plugin can deal with. In this case lists of eprints and individual
eprints.
<pre>
$self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
</pre>
The visible field denotes the class of user which will be able to see the plugin.
For most export plugins the value 'all' will be required, allowing
all users to see and use the plugin. A value of 'staff' would
make the plugin visible only to repository staff.
<pre>
$self->{visible} = 'all';
</pre>
The suffix field contains the extension of files exported by the plugin.

The mimetype field defines the [http://en.wikipedia.org/wiki/MIME MIME] type of the files exported by the plugin.
You can also specify the character encoding: for example, 'text/plain; charset=utf-8' denotes plain text encoded as UTF-8. [http://en.wikipedia.org/wiki/UTF-8 UTF-8] is a Unicode character encoding that can represent characters from a large number of character sets, so it is usually preferable to [http://en.wikipedia.org/wiki/ASCII ASCII].
<pre>
$self->{suffix} = '.txt';
$self->{mimetype} = 'text/plain; charset=utf-8';
</pre>

We then return our plugin reference.
<pre>
return $self;
</pre>

== Processing DataObjs ==

This method handles the export of each DataObj. DataObjs make up most of the content of an EPrints repository. The three main types are EPrint, which defines an individual eprint; Document, which defines a collection of one or more files belonging to an eprint; and User, which defines a user of the repository.

Besides an implicit reference to the Plugin object, this method is also provided with a reference to an individual DataObj. It is called by several [http://en.wikipedia.org/wiki/Common_Gateway_Interface CGI] and command line scripts to export single DataObjs, for instance the item control screen for repository staff. It is also called by the list handling method on each DataObj in a list, for example the results of a search. That will be explained in [[User:Tom/Export Plugins/Hello Lists|the next tutorial]].

In the example below we get the title of each DataObj, but there are a large number of other fields you can extract. For example, try changing "title" to "abstract" to print the abstract of each eprint.

<pre>
sub output_dataobj
{
my ($plugin, $dataobj) = @_;

# Return a scalar containing the title.
return $dataobj->get_value('title')."\n";
}
</pre>

== Finishing Off ==

A Perl module must return a true value when it is loaded, so the file ends with:
<pre>
1;

</pre>

= Testing Your Plugin =
Restart your web server and perform a search.

If all is well your plugin should appear in the dropdown menu. Select it and click export. As long as the search provided some results, you should get a list of EPrint titles returned.

== Sample Output ==
[[Image:Exphello.png]]

Contribute: Plugins/ExportPluginsHelloOld

2007-09-28T14:38:26Z

Tom: /* Constructor */

= Export Plugin Tutorial 1: "Hello, World!" =

In this tutorial you will learn how to create a simple export plugin for EPrints, which will generate a list of titles from the results of a search. A basic knowledge of Perl is needed, but the code will be explained fully.

= Before You Start =

Create a directory for your export plugins in the main export plugin directory (usually /opt/eprints3/perl_lib/EPrints/Plugin/Export). The directory used in these examples is called "MyPlugins".

= HelloExport.pm =

Replace MyPlugins with the name of the directory you chose for your export plugins, and place the code below in a file called HelloExport.pm in that directory.

<pre>
package EPrints::Plugin::Export::MyPlugins::HelloExport;

@ISA = ('EPrints::Plugin::Export');

use strict;

sub new
{
my ($class, %opts) = @_;

my $self = $class->SUPER::new(%opts);

$self->{name} = 'Hello, World!';
$self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
$self->{visible} = 'all';
$self->{suffix} = '.txt';
$self->{mimetype} = 'text/plain; charset=utf-8';

return $self;
}

sub output_dataobj
{
my ($plugin, $dataobj) = @_;

return $dataobj->get_value('title')."\n";
}

1;

</pre>

= In More Detail =
== Housekeeping ==
<pre>
package EPrints::Plugin::Export::MyPlugins::HelloExport;
</pre>
Export plugins need to inherit from the EPrints::Plugin::Export class.
<pre>
@ISA = ('EPrints::Plugin::Export');
</pre>

== Constructor ==
After the implicit class reference, a hash of options is given.
<pre>
my ($class, %opts) = @_;
</pre>

We create a new export plugin by calling the EPrints::Plugin::Export constructor:
<pre>
my $self = $class->SUPER::new(%opts);
</pre>
Now we set a number of fields to register our new plugin.

This is the name that will appear in the export dropdown menu. The name should therefore be short and descriptive.
<pre>
$self->{name} = 'Hello, World!';
</pre>

The accept field is a list containing the types of objects this
plugin can deal with. In this case lists of eprints and individual
eprints.
<pre>
$self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
</pre>
The visible field denotes the class of user which will be able to see the plugin.
For most export plugins the value 'all' will be required, allowing
all users to see and use the plugin. A value of 'staff' would
make the plugin visible only to repository staff.
<pre>
$self->{visible} = 'all';
</pre>
The suffix field contains the extension of files exported by the plugin.

The mimetype field defines the [http://en.wikipedia.org/wiki/MIME MIME] type of the files exported by the plugin.
You can also specify the character encoding: for example, 'text/plain; charset=utf-8' denotes plain text encoded as UTF-8. [http://en.wikipedia.org/wiki/UTF-8 UTF-8] is a Unicode character encoding that can represent characters from a large number of character sets, so it is usually preferable to [http://en.wikipedia.org/wiki/ASCII ASCII].
<pre>
$self->{suffix} = '.txt';
$self->{mimetype} = 'text/plain; charset=utf-8';
</pre>

We then return our plugin reference.
<pre>
return $self;
</pre>

== Processing DataObjs ==

This method handles the export of each DataObj. DataObjs make up most of the content of an EPrints repository. The three main types are EPrint, which defines an individual eprint; Document, which defines a collection of one or more files belonging to an eprint; and User, which defines a user of the repository.

Besides an implicit reference to the plugin object, this method is also provided with a reference to an individual DataObj. It is called by several CGI and command line scripts to export single DataObjs, for instance the item control screen for repository staff. It is also called by the list handling method on each DataObj in a list, for example the results of a search. That will be explained in [[User:Tom/Export Plugins/Hello Lists|the next tutorial]].

In the example below we get the title of each DataObj, but there are a large number of other fields you can extract. For example, try changing "title" to "abstract" to print the abstract of each eprint.

<pre>
sub output_dataobj
{
my ($plugin, $dataobj) = @_;

# Return a scalar containing the title.
return $dataobj->get_value('title')."\n";
}
</pre>

== Finishing Off ==

A Perl module must return a true value when it is loaded, so the file ends with:
<pre>
1;

</pre>

= Testing Your Plugin =
Restart your web server and perform a search.

If all is well your plugin should appear in the dropdown menu. Select it and click export. As long as the search provided some results, you should get a list of EPrint titles returned.

== Sample Output ==
[[Image:Exphello.png]]

Contribute: Plugins/ExportPluginsHelloOld

2007-09-28T14:35:55Z

Tom: /* Constructor */

= Export Plugin Tutorial 1: "Hello, World!" =

In this tutorial you will learn how to create a simple export plugin for EPrints, which will generate a list of titles from the results of a search. A basic knowledge of Perl is needed, but the code will be explained fully.

= Before You Start =

Create a directory for your export plugins in the main export plugin directory (usually /opt/eprints3/perl_lib/EPrints/Plugin/Export). The directory used in these examples is called "MyPlugins".

= HelloExport.pm =

Replace MyPlugins with the name of the directory you chose for your export plugins, and place the code below in a file called HelloExport.pm in that directory.

<pre>
package EPrints::Plugin::Export::MyPlugins::HelloExport;

@ISA = ('EPrints::Plugin::Export');

use strict;

sub new
{
my ($class, %opts) = @_;

my $self = $class->SUPER::new(%opts);

$self->{name} = 'Hello, World!';
$self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
$self->{visible} = 'all';
$self->{suffix} = '.txt';
$self->{mimetype} = 'text/plain; charset=utf-8';

return $self;
}

sub output_dataobj
{
my ($plugin, $dataobj) = @_;

return $dataobj->get_value('title')."\n";
}

1;

</pre>

= In More Detail =
== Housekeeping ==
<pre>
package EPrints::Plugin::Export::MyPlugins::HelloExport;
</pre>
Export plugins need to inherit from the EPrints::Plugin::Export class.
<pre>
@ISA = ('EPrints::Plugin::Export');
</pre>

== Constructor ==
After the implicit class reference, a hash of options is given.
<pre>
my ($class, %opts) = @_;
</pre>

We create a new export plugin by calling the EPrints::Plugin::Export constructor:
<pre>
my $self = $class->SUPER::new(%opts);
</pre>
Now we set a number of fields to register our new plugin.

This is the name that will appear in the export dropdown menu. The name should therefore be short and descriptive.
<pre>
$self->{name} = 'Hello, World!';
</pre>

The accept field is a list containing the types of objects this
plugin can deal with. In this case lists of eprints and individual
eprints.
<pre>
$self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
</pre>
The visible field denotes the class of user which will be able to see the plugin.
For most export plugins the value 'all' will be required, allowing
all users to see and use the plugin. A value of 'staff' would
make the plugin visible only to repository staff.
<pre>
$self->{visible} = 'all';
</pre>
The suffix field contains the extension of files exported by the plugin.

The mimetype field defines the [http://en.wikipedia.org/wiki/MIME MIME] type of the files exported by the plugin.
You can also specify the character encoding: for example, 'text/plain; charset=utf-8' denotes plain text encoded as UTF-8. [http://en.wikipedia.org/wiki/UTF-8 UTF-8] is a Unicode character encoding capable of expressing a large number of different characters, so it is usually preferable to [http://en.wikipedia.org/wiki/ASCII ASCII].
<pre>
$self->{suffix} = '.txt';
$self->{mimetype} = 'text/plain; charset=utf-8';
</pre>
We then return our plugin reference.
<pre>
return $self;
}
</pre>

== Processing DataObjs ==

This method handles the export of each DataObj. DataObjs make up most of the content of an EPrints repository. The three main types are EPrint, which defines an individual eprint; Document, which defines a collection of one or more files belonging to an eprint; and User, which defines a user of the repository.

Besides an implicit reference to the plugin object, this method is also provided with a reference to an individual DataObj. It is called by several CGI and command line scripts to export single DataObjs, for instance the item control screen for repository staff. It is also called by the list handling method on each DataObj in a list, for example the results of a search. That will be explained in [[User:Tom/Export Plugins/Hello Lists|the next tutorial]].

In the example below we get the title of each DataObj, but there are a large number of other fields you can extract. For example, try changing "title" to "abstract" to print the abstract of each eprint.

<pre>
sub output_dataobj
{
my ($plugin, $dataobj) = @_;

# Return a scalar containing the title.
return $dataobj->get_value('title')."\n";
}
</pre>

== Finishing Off ==

A Perl module must return a true value when it is loaded, so the file ends with:
<pre>
1;

</pre>

= Testing Your Plugin =
Restart your web server and perform a search.

If all is well your plugin should appear in the dropdown menu. Select it and click export. As long as the search provided some results, you should get a list of EPrint titles returned.

== Sample Output ==
[[Image:Exphello.png]]

Contribute: Plugins/ExportPluginsZip

2007-09-28T14:02:10Z

Tom: /* Zip.pm */

= Export Plugin Tutorial 5: Zip =
In this tutorial we'll look at packaging the results of a search into a Zip file. We'll create a directory for each eprint, and a sub-directory for each document belonging to that eprint. We'll also add an HTML index file to the archive to make it easier to navigate.
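As a sketch of the layout, the hypothetical archive_paths subroutine below reproduces the path logic used in output_list: one directory per eprint ID under eprints-search/, and one numbered sub-directory per document. The index file sits alongside them at eprints-search/index.htm. Here each document is represented simply as a list of file names, an assumption made so the code runs without EPrints.

<pre>
use strict;
use warnings;

# Hypothetical helper mirroring the path logic in output_list.
# $docs is an array ref holding one array ref of file names per document;
# documents are numbered doc1, doc2, ... in the order given.
sub archive_paths
{
	my ($eprintid, $docs) = @_;

	my $dirpath = "eprints-search/$eprintid/";
	my @paths;
	my $i = 1;
	foreach my $files (@{$docs})
	{
		push @paths, map { $dirpath . "doc$i/" . $_ } sort @{$files};
		$i++;
	}
	return @paths;
}
</pre>

For example, archive_paths(42, [['b.pdf', 'a.pdf'], ['x.txt']]) returns eprints-search/42/doc1/a.pdf, eprints-search/42/doc1/b.pdf and eprints-search/42/doc2/x.txt.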

To prepare for this tutorial you should install the [http://search.cpan.org/~miyagawa/Archive-Any-Create-0.02/lib/Archive/Any/Create.pm Archive::Any::Create] module. The following command, run as root or using sudo, should work.

<pre>
cpan Archive::Any::Create
</pre>
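To check that the installation worked, you can ask Perl to load the module. This is a minimal sketch of an optional-dependency check using a string eval; module_available is a hypothetical name, and the sketch is independent of EPrints::Utils::require_if_exists, which the plugin itself uses.

<pre>
use strict;
use warnings;

# Returns 1 if $module can be loaded, 0 otherwise, without dying.
sub module_available
{
	my ($module) = @_;

	# Only accept package names such as Foo::Bar, because the name
	# is interpolated into a string eval below.
	return 0 unless $module =~ /\A[A-Za-z_]\w*(?:::\w+)*\z/;

	my $ok = eval "require $module; 1";
	return $ok ? 1 : 0;
}

print module_available('Archive::Any::Create') ? "installed\n" : "missing\n";
</pre>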

= Zip.pm =
The code in the section below should be placed in a file called Zip.pm in the directory created previously, and MyPlugins should be changed to the name of that directory.

<pre>
package EPrints::Plugin::Export::MyPlugins::Zip;

@ISA = ('EPrints::Plugin::Export');

use strict;

sub new
{
my ($class, %opts) = @_;

my $self = $class->SUPER::new(%opts);

$self->{name} = 'Zip';
$self->{accept} = [ 'list/eprint' ];
$self->{visible} = 'all';
$self->{suffix} = '.zip';
$self->{mimetype} = 'application/zip';

my $rc = EPrints::Utils::require_if_exists('Archive::Any::Create');
unless ($rc)
{
$self->{visible} = '';
$self->{error} = 'Unable to load required module Archive::Any::Create';
}

return $self;
}

sub output_list
{
my ($plugin, %opts) = @_;

my $archive = '';
open (my $FH, '>', \$archive) or
die("Could not create filehandle: $!");
my $zip = Archive::Any::Create->new;

my $index = <<END;
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>EPrints Search Results</title>
</head>
<body>
END

my $session = $plugin->{session};

foreach my $dataobj ($opts{list}->get_records)
{
my $div = $session->make_element('div');
my $heading = $session->make_element('h2');
$heading->appendChild($session->make_text($dataobj->get_value('title')));
$div->appendChild($heading);

my $uldoc = $session->make_element('ul');
$div->appendChild($uldoc);

my $dirpath = 'eprints-search/'.$dataobj->get_id().'/';

my $i = 1;
foreach my $doc ($dataobj->get_all_documents)
{
my $subdirpath = $dirpath."doc$i/";
my %files = $doc->files;

my $lidoc = $session->make_element('li');
$uldoc->appendChild($lidoc);

my $adoc = $session->make_element('a', href=>$dataobj->get_id."/doc$i/".$doc->get_main);
$lidoc->appendChild($adoc);

if ($doc->exists_and_set('formatdesc'))
{
$adoc->appendChild($session->make_text($doc->get_value('formatdesc')));
}
else
{
$adoc->appendChild($session->make_text($doc->get_main));
}

foreach my $filename (sort keys %files)
{
my $filepath = $subdirpath.$filename;
my $file = $doc->local_path.'/'.$filename;

if (-d $file)
{
next;
}

my $data = '';
open (my $datafh, '>', \$data) or
die("Could not create filehandle: $!");

open (my $infh, '<', $file) or die("Could not open file $file: $!");
binmode($infh);
while (<$infh>)
{
print {$datafh} $_;
}
close $infh;
close $datafh;

$zip->add_file($filepath, $data);
}
$i++;
}
$index .= EPrints::XML::to_string($div);
}

$index .= '</body></html>';
$zip->add_file('eprints-search/index.htm',$index);

if (defined $opts{fh})
{
$zip->write_filehandle($opts{fh},'zip');
return undef;
}
$zip->write_filehandle($FH,'zip');
return $archive;
}

1;

</pre>

= In More Detail =
== Constructor ==
For the sake of simplicity this plugin will only deal with lists of eprints. This avoids some code duplication, and it would be fairly easy to modify the plugin to deal with both individual eprints and lists of eprints sensibly.
<pre>
$self->{accept} = [ 'list/eprint' ];
</pre>

The file extension and MIME type are set to values appropriate for Zip files.
<pre>
$self->{suffix} = '.zip';
$self->{mimetype} = 'application/zip';
</pre>

We need a module that is not included with EPrints to create Zip files. We use the EPrints::Utils::require_if_exists function to check whether the module is installed and, if it is, to load it. We then check the value returned from that function and make the plugin invisible if loading failed.

<pre>
my $rc = EPrints::Utils::require_if_exists('Archive::Any::Create');
unless ($rc)
{
$self->{visible} = '';
$self->{error} = 'Unable to load required module Archive::Any::Create';
}
</pre>
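The idea behind this check can be tried in plain Perl. The sketch below is not the EPrints implementation of require_if_exists; `module_available` is a hypothetical helper that tries to load a module at run time inside an eval and reports whether that worked, which is why a missing module does not stop the plugin file itself from compiling.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of EPrints): try to load a module at run
# time and report whether it loaded, instead of dying at compile time as
# "use Module;" would.
sub module_available
{
    my ($module) = @_;

    # Only accept names like Foo::Bar so the string eval cannot run arbitrary code.
    return 0 unless $module =~ /\A[A-Za-z_]\w*(?:::\w+)*\z/;

    return eval "require $module; 1" ? 1 : 0;
}

foreach my $module ('File::Spec', 'No::Such::Module')
{
    printf "%s: %s\n", $module,
        module_available($module) ? 'available' : 'missing';
}
```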

== List Handling ==
=== Setting Up ===
Here we set up an in-memory file for the Zip archive and create an Archive::Any::Create object.
<pre>
my $archive = '';
open (my $FH, '>', \$archive) or
die("Could not create filehandle: $!");
my $zip = Archive::Any::Create->new;
</pre>
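If in-memory files are unfamiliar, this short standalone example shows the mechanism: opening a filehandle on a reference to a scalar makes everything printed to that handle accumulate in the scalar, so no temporary file is needed on disk.

```perl
use strict;
use warnings;

my $buffer = '';

# Open a filehandle on a reference to a scalar: output goes into $buffer
# instead of a file on disk.
open(my $fh, '>', \$buffer) or die("Could not create filehandle: $!");
binmode($fh);    # treat the data as raw bytes, as a Zip archive requires

print {$fh} "first line\n";
print {$fh} "second line\n";
close($fh);      # release the handle before using $buffer

print length($buffer), " bytes captured\n";
```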

=== Navigation ===
Here we begin to set up the HTML file that we'll add to our archive for navigation. First we set up a header.
<pre>
my $index = <<END;
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>EPrints Search Results</title>
</head>
<body>
END
</pre>

Now we get the Session object; we'll use it later to create DOM objects.
<pre>
my $session = $plugin->{session};
</pre>

=== Handling DataObjs ===
We loop over the DataObjs as we have done before.

This time we set up some DOM objects to be added to our index. Each eprint will have its title printed, followed by a list of its documents.
<pre>
my $div = $session->make_element('div');
my $heading = $session->make_element('h2');
$heading->appendChild($session->make_text($dataobj->get_value('title')));
$div->appendChild($heading);

my $uldoc = $session->make_element('ul');
$div->appendChild($uldoc);
</pre>

We create a directory for each eprint. Note that there is no need to create the directory explicitly: it is enough to include it in the path of each file we add to the archive. As a consequence, an eprint with no files gets no directory at all, rather than an empty one.

<pre>
my $dirpath = 'eprints-search/'.$dataobj->get_id().'/';
</pre>

==== Dealing With Documents ====
We then loop over all the documents belonging to each DataObj. The get_all_documents method returns an array of Document objects.
<pre>
my $i = 1;
foreach my $doc ($dataobj->get_all_documents)
{
my $subdirpath = $dirpath."doc$i/";
</pre>

Here we create a list item for the document containing a link to the main file.
<pre>
my $lidoc = $session->make_element('li');
$uldoc->appendChild($lidoc);

my $adoc = $session->make_element('a', href=>$dataobj->get_id."/doc$i/".$doc->get_main);
$lidoc->appendChild($adoc);
</pre>
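Two kinds of path are now in play, and they are easy to confuse, so here they are side by side in a small standalone sketch. `archive_path` and `index_link` are hypothetical helpers that reproduce the plugin's string-building: files are stored under eprints-search/ inside the archive, while links in index.htm, which itself sits in eprints-search/, are relative to that directory and therefore leave the prefix out.

```perl
use strict;
use warnings;

# Hypothetical helper: path of a file inside the Zip archive.
sub archive_path
{
    my ($eprintid, $docnum, $filename) = @_;
    return "eprints-search/$eprintid/doc$docnum/$filename";
}

# Hypothetical helper: link used in eprints-search/index.htm. It is relative
# to the directory containing index.htm, so "eprints-search/" is omitted.
sub index_link
{
    my ($eprintid, $docnum, $filename) = @_;
    return "$eprintid/doc$docnum/$filename";
}

print archive_path(42, 1, 'paper.pdf'), "\n";
print index_link(42, 1, 'paper.pdf'), "\n";
```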
If a description of the main file has been set we use that as the link text, otherwise we use the filename.
<pre>
if ($doc->exists_and_set('formatdesc'))
{
$adoc->appendChild($session->make_text($doc->get_value('formatdesc')));
}
else
{
$adoc->appendChild($session->make_text($doc->get_main));
}
</pre>

==== Dealing With Files ====
The files method of the Document object returns a hash whose keys are file names and values are file sizes.
<pre>
my %files = $doc->files;
</pre>

We loop over each file belonging to the document; in most cases there will only be one.
<pre>
foreach my $filename (sort keys %files)
{
my $filepath = $subdirpath.$filename;
my $file = $doc->local_path.'/'.$filename;
</pre>

We need to read the contents of the file and add it to a file in the zip. First we'll create another in-memory file to hold the contents.
<pre>
my $data = '';
open (my $datafh, '>', \$data) or
die("Could not create filehandle: $!");
</pre>

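We open our file in binary mode, so that files such as PDFs are copied byte for byte, and print it straight out to our in-memory file. We then close both filehandles so that all the data has been written to $data.
<pre>
open (my $infh, '<', $file) or die("Could not open file $file: $!");
binmode($infh);
while (<$infh>)
{
print {$datafh} $_;
}
close $infh;
close $datafh;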
</pre>
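Copying line by line works, but "lines" mean nothing in a binary file. A common Perl alternative, sketched below under the assumption that each file fits comfortably in memory (the plugin already assumes this, since it builds the whole archive in memory), is to read the file in one go in raw mode. `read_file_raw` is a hypothetical helper, not part of EPrints.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the entire contents of $path as raw bytes.
sub read_file_raw
{
    my ($path) = @_;
    open(my $infh, '<:raw', $path) or die("Could not open file $path: $!");
    local $/;                 # undefined record separator: read everything at once
    my $data = <$infh>;
    close($infh);
    return defined $data ? $data : '';   # guard against undef at end of file
}

# Demonstration with a temporary file containing some non-text bytes.
my ($tmpfh, $tmpname) = tempfile(UNLINK => 1);
binmode($tmpfh);
print {$tmpfh} "%PDF\x00\xff\n\r\n";
close($tmpfh);

my $bytes = read_file_raw($tmpname);
printf "read %d bytes\n", length($bytes);
```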

Then we add the data to the Zip archive, using the path we built earlier.
<pre>
$zip->add_file($filepath, $data);
</pre>

Finally we add the DOM object for our eprint to the index.
<pre>
$index .= EPrints::XML::to_string($div);
</pre>

=== Finishing Off ===
After finishing off our index file we add it to the zip file.
<pre>
$index .= '</body></html>';
$zip->add_file('eprints-search/index.htm',$index);
</pre>

If a file handle has been provided we write to it, otherwise we write to the scalar file handle created earlier. We then return in the usual fashion.
<pre>
if (defined $opts{fh})
{
$zip->write_filehandle($opts{fh},'zip');
return undef;
}
$zip->write_filehandle($FH,'zip');
return $archive;
</pre>

= Testing Your Plugin =
Restart your webserver and test the plugin as [[Contribute:_Plugins/ExportPluginsHello| before]].

== Sample Output ==
[[Image:Expzipv2.png]]

The accompanying HTML index.

[[Image:Expzip2.png]]

Contribute: Plugins/ExportPluginsExcel

2007-09-28T13:48:12Z

Tom: /* Excel.pm */

= Export Plugin Tutorial 4: Excel =

In this tutorial and the next one we'll look at exporting files in non-text formats. Here we will export metadata in the binary Excel format used before Microsoft Office 2007, which introduced an XML-based format.

To prepare for this tutorial you should install the [http://search.cpan.org/dist/Spreadsheet-WriteExcel/lib/Spreadsheet/WriteExcel.pm Spreadsheet::WriteExcel] module. Running the following command as root, or using sudo, should work.

<pre>
cpan Spreadsheet::WriteExcel
</pre>

= Excel.pm =
The code in the section below should be placed in a file called Excel.pm in the directory created previously, and MyPlugins should be changed to the name of that directory.

<pre>
package EPrints::Plugin::Export::MyPlugins::Excel;

@ISA = ('EPrints::Plugin::Export');

use strict;

sub new
{
my ($class, %opts) = @_;
my $self = $class->SUPER::new(%opts);

$self->{name} = 'Excel';
$self->{accept} = ['list/eprint'];
$self->{visible} = 'all';
$self->{suffix} = '.xls';
$self->{mimetype} = 'application/vnd.ms-excel';

my $rc = EPrints::Utils::require_if_exists('Spreadsheet::WriteExcel');
unless ($rc)
{
$self->{visible} = '';
$self->{error} = 'Unable to load required module Spreadsheet::WriteExcel';
}

return $self;
}

sub output_list
{
my ($plugin, %opts) = @_;
my $workbook;

my $output;
open(my $FH,'>',\$output);

if (defined $opts{fh})
{
$workbook = Spreadsheet::WriteExcel->new(\*{$opts{fh}});
die("Unable to create spreadsheet: $!")unless defined $workbook;
}
else
{
$workbook = Spreadsheet::WriteExcel->new($FH);
die("Unable to create spreadsheet: $!")unless defined $workbook;
}

my $worksheet = $workbook->add_worksheet();

my $i = 0;
my @fields =
$plugin->{session}->get_repository->get_dataset('archive')->get_fields;

foreach my $field (@fields)
{
$worksheet->write(0, $i, $field->get_name);
$i++;
}

$i = 1;
foreach my $dataobj ($opts{list}->get_records)
{
my $j = 0;
foreach my $field (@fields)
{
if ($dataobj->exists_and_set($field->get_name))
{
if ($field->get_property('multiple'))
{
if ($field->{type} eq 'name')
{
my $namelist = '';
foreach my $name (@{$dataobj->get_value_raw($field->get_name)})
{
$namelist .= $name->{family} . ',' . $name->{given} . ';';
}
$worksheet->write($i, $j, $namelist);
}
elsif ($field->{type} eq 'compound')
{
$worksheet->write($i, $j, 'COMPOUND');
}
else
{
$worksheet->write($i, $j,
join(';',@{$dataobj->get_value($field->get_name)}));
}
}
else {
$worksheet->write($i, $j, $dataobj->get_value($field->get_name));
}
}
$j++;
}
$i++;
}

$workbook->close;

if (defined $opts{fh})
{
return undef;
}

return $output;
}

1;

</pre>

= In More Detail =
== Constructor ==
For the sake of simplicity this plugin will only deal with lists of eprints. This avoids some code duplication, and it would be fairly easy to modify the plugin to deal with both individual eprints and lists of eprints sensibly.
<pre>
$self->{accept} = ['list/eprint'];
</pre>

The file extension and MIME type are set to values appropriate for Excel files.
<pre>
$self->{suffix} = '.xls';
$self->{mimetype} = 'application/vnd.ms-excel';
</pre>

We need a module that is not included with EPrints to create Excel files. We use the EPrints::Utils::require_if_exists function to check whether the module is installed and, if it is, to load it. We then check the value returned from that function and make the plugin invisible if loading failed.
<pre>
my $rc = EPrints::Utils::require_if_exists('Spreadsheet::WriteExcel');
unless ($rc)
{
$self->{visible} = '';
$self->{error} = 'Unable to load required module Spreadsheet::WriteExcel';
}
</pre>
== List Handling ==
=== Setting Up a Workbook ===
Here we create a new Excel workbook. We start by opening a filehandle on a scalar rather than on a filename, which creates an in-memory file. Then, depending on whether a filehandle has been supplied, we create a workbook object that will be written either to that filehandle or to the in-memory one we just created.

<pre>
my $workbook;

my $output;
open(my $FH,'>',\$output);

if (defined $opts{fh})
{
$workbook = Spreadsheet::WriteExcel->new(\*{$opts{fh}});
die("Unable to create spreadsheet: $!")unless defined $workbook;
}
else
{
$workbook = Spreadsheet::WriteExcel->new($FH);
die("Unable to create spreadsheet: $!")unless defined $workbook;
}
</pre>

=== Handling DataObjs ===
To start adding data to the Excel file we have to create a worksheet.
<pre>
my $worksheet = $workbook->add_worksheet();
</pre>

To the first row of our worksheet we add the names of all the metadata fields that can be associated with eprints. We get the session associated with the plugin, and then the repository associated with that session. We then get the DataSet "archive" from that repository and call the get_fields method. That method returns an array of MetaField objects.
<pre>
my @fields =
$plugin->{session}->get_repository->get_dataset('archive')->get_fields;
</pre>

Here we loop over each field and write its name to the first row of our worksheet, one column per field. The column counter $i starts at 0.
<pre>
foreach my $field (@fields)
{
$worksheet->write(0, $i, $field->get_name);
$i++;
}
</pre>
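The row and column bookkeeping used here and in the loop that follows can be tried out without Excel. In this sketch an array of arrays stands in for the worksheet, plain field names stand in for MetaField objects and hashes stand in for DataObjs; `build_grid` is hypothetical. As in the plugin, row 0 holds the field names, each record fills the next row, and unset fields leave their cell empty.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the worksheet logic: $grid->[$row][$col].
sub build_grid
{
    my ($fieldnames, $records) = @_;
    my @grid;

    # Row 0: one column per field name.
    my $col = 0;
    foreach my $name (@$fieldnames)
    {
        $grid[0][$col] = $name;
        $col++;
    }

    # Rows 1..n: one row per record, columns in the same order as row 0.
    my $row = 1;
    foreach my $record (@$records)
    {
        my $j = 0;
        foreach my $name (@$fieldnames)
        {
            # Like exists_and_set: only fill the cell if a value is present.
            $grid[$row][$j] = $record->{$name}
                if defined $record->{$name} && $record->{$name} ne '';
            $j++;
        }
        $row++;
    }
    return \@grid;
}

my $grid = build_grid([ 'eprintid', 'title' ],
    [ { eprintid => 1, title => 'First' }, { eprintid => 2 } ]);

foreach my $r (0 .. $#$grid)
{
    my @cells = map { defined $_ ? $_ : '' } @{ $grid->[$r] };
    print join(', ', @cells), "\n";
}
```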

We now loop over each DataObj in our list, and over each MetaField we found earlier.
<pre>
foreach my $dataobj ($opts{list}->get_records)
{
my $j = 0;
foreach my $field (@fields)
{
</pre>

We only write something to the worksheet if the field can apply to the DataObj and is set. Scalar values are simply written to the worksheet.
<pre>
if ($dataobj->exists_and_set($field->get_name))
</pre>

The plugin handles fields which can take multiple values in a number of ways.
<pre>
if ($field->get_property('multiple'))
</pre>
Names are handled specially and are formatted with the family name followed by a comma, the given name or initial and then a semi-colon.
<pre>
if ($field->{type} eq 'name')
{
my $namelist = '';
foreach my $name (@{$dataobj->get_value_raw($field->get_name)})
{
$namelist .= $name->{family} . ',' . $name->{given} . ';';
}
$worksheet->write($i, $j, $namelist);
}
</pre>
Fields which have a compound type are not handled, and 'COMPOUND' is written to the worksheet.
<pre>
elsif ($field->{type} eq 'compound')
{
$worksheet->write($i, $j, 'COMPOUND');
}
</pre>
For most multiple fields each value is taken and concatenated, separated by semi-colons.
<pre>
else
{
$worksheet->write($i, $j,
join(';',@{$dataobj->get_value($field->get_name)}));
}
</pre>
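The three rules for multiple fields can be gathered into one small function and tested on their own. `cell_text` is a hypothetical helper taking a field type, a "multiple" flag and a value shaped the way the plugin sees it (an array reference for multiple fields, with names as hashes holding family and given keys); it is a sketch of the logic above, not EPrints code.

```perl
use strict;
use warnings;

# Hypothetical helper: the text written to one cell, following the rules above.
sub cell_text
{
    my ($type, $multiple, $value) = @_;

    # Single values are written as they are.
    return $value unless $multiple;

    if ($type eq 'name')
    {
        # "Family,Given;" for each name; a missing part becomes empty.
        my $namelist = '';
        foreach my $name (@$value)
        {
            my $family = defined $name->{family} ? $name->{family} : '';
            my $given  = defined $name->{given}  ? $name->{given}  : '';
            $namelist .= $family . ',' . $given . ';';
        }
        return $namelist;
    }

    # Compound fields are not handled.
    return 'COMPOUND' if $type eq 'compound';

    # Any other multiple field: values separated by semicolons.
    return join(';', @$value);
}

print cell_text('name', 1,
    [ { family => 'Smith', given => 'J.' }, { family => 'Jones', given => 'A.' } ]), "\n";
print cell_text('text', 1, [ 'alpha', 'beta' ]), "\n";
print cell_text('compound', 1, [ {} ]), "\n";
print cell_text('text', 0, 'A single title'), "\n";
```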

=== Finishing Up ===

We first need to close the workbook to ensure that the data is written to the file handles, and then we can return in the usual fashion.
<pre>
$workbook->close;

if (defined $opts{fh})
{
return undef;
}

return $output;
</pre>

= Testing Your Plugin =
Restart your webserver and test the plugin as [[Contribute:_Plugins/ExportPluginsHello| before]].

== Sample Output ==
[[Image:Expexcel.png]]

Contribute: Plugins/ExportPluginsHTML

2007-09-28T13:46:47Z

Tom: /* HTML.pm */

= Export Plugin Tutorial 3: HTML =

In this tutorial we'll look at creating an export plugin with slightly more complex output than unformatted plain text. Although the plugin below produces XHTML, the same principles apply to producing any XML document.

= HTML.pm =
The code in the section below should be placed in a file called HTML.pm in the directory created previously, and MyPlugins should be changed to the name of that directory.

<pre>
package EPrints::Plugin::Export::MyPlugins::HTML;

@ISA = ('EPrints::Plugin::Export');

use strict;

sub new
{
my ($class, %opts) = @_;

my $self = $class->SUPER::new(%opts);

$self->{name} = 'Hello, HTML';
$self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
$self->{visible} = 'all';
$self->{suffix} = '.htm';
$self->{mimetype} = 'text/html; charset=utf-8';

return $self;
}

sub output_dataobj
{
my ($plugin, $dataobj) = @_;

my $xml = $plugin->xml_dataobj($dataobj);

return EPrints::XML::to_string($xml);
}

sub output_list
{
my ($plugin, %opts) = @_;

my $r = [];

my $header = <<END;
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>XHTML Export Plugin</title>
</head>
<body>
END

if (defined $opts{fh})
{
print {$opts{fh}} $header;
}
else
{
push @{$r}, $header;
}

foreach my $dataobj ($opts{list}->get_records)
{
my $part = $plugin->output_dataobj($dataobj, %opts);
if (defined $opts{fh})
{
print {$opts{fh}} $part;
}
else
{
push @{$r}, $part;
}
}
my $footer = '</body></html>';
if (defined $opts{fh})
{
print {$opts{fh}} $footer;
}
else
{
push @{$r}, $footer;
}

if (defined $opts{fh})
{
return undef;
}
return join('', @{$r});
}

sub xml_dataobj
{
my ($plugin, $dataobj) = @_;

my $session = $plugin->{session};

my $div = $session->make_element('div');

my $title = $session->make_element('h2');
$title->appendChild($session->make_text($dataobj->get_value('title')));
$div->appendChild($title);

my $abstract = $session->make_element('p');
$abstract->appendChild($session->make_text($dataobj->get_value('abstract')));
$div->appendChild($abstract);

return $div;
}

1;

</pre>

= In More Detail =
== Constructor ==
Here we change the file extension to '.htm' and the MIME type to 'text/html', declaring the UTF-8 character encoding. For general XML documents you should change the file extension to '.xml' and the MIME type to 'text/xml'.

<pre>
$self->{suffix} = '.htm';
$self->{mimetype} = 'text/html; charset=utf-8';
</pre>

== output_dataobj ==
Here the output_dataobj method does very little. It calls the xml_dataobj method to obtain a DOM object, which is serialised to a string before being returned. The xml_dataobj method is described later.

<pre>
my $xml = $plugin->xml_dataobj($dataobj);

return EPrints::XML::to_string($xml);
</pre>

== output_list ==
The only changes to the output_list method are the additions of a header and a footer.

<pre>
my $header = <<END;
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>XHTML Export Plugin</title>
</head>
<body>
END

my $footer = '</body></html>';
</pre>
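Apart from the header and footer, output_list follows the same pattern as before: each piece is printed straight to the filehandle if one was supplied, and otherwise collected and joined into a string that is returned. Stripped of EPrints, the pattern looks like this sketch; `emit_parts` is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: write @parts to $opts->{fh} if given (returning undef),
# otherwise return them concatenated into one string.
sub emit_parts
{
    my ($opts, @parts) = @_;

    if (defined $opts->{fh})
    {
        print {$opts->{fh}} $_ foreach @parts;
        return undef;
    }
    return join('', @parts);
}

my @parts = ('<body>', '<div>one</div>', '</body>');

# Without a filehandle the caller receives the whole document.
my $string = emit_parts({}, @parts);

# With a filehandle (here an in-memory one) the output is streamed instead.
my $streamed = '';
open(my $fh, '>', \$streamed) or die("Could not create filehandle: $!");
my $result = emit_parts({ fh => $fh }, @parts);
close($fh);

print "returned: $string\n";
print "streamed: $streamed\n";
```

Streaming means a large result set never has to be held in memory as one string.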

== xml_dataobj ==
This method has a similar signature to the output_dataobj method: it takes an implicit reference to the plugin object and a reference to a DataObj. However, instead of returning a string, it returns a DOM object.

Before we can start creating and manipulating DOM objects we need to get a reference to the Session object from the plugin.

<pre>
my $session = $plugin->{session};
</pre>

We then create elements using the make_element method of our session object. The first argument to make_element is the tag name of the element we want to create, for example 'a' or 'div'. Any further arguments are attribute names and values for the element, passed as a hash.
<pre>
my $div = $session->make_element('div');
</pre>

Here we add a second-level heading. The make_text method creates a text node, and the appendChild method adds a node as a child of another.

<pre>
my $title = $session->make_element('h2');
$title->appendChild($session->make_text($dataobj->get_value('title')));
$div->appendChild($title);
</pre>
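One benefit of building output with make_text rather than pasting values into strings is that text nodes are escaped when the DOM is serialised, so a title such as "Cats & Dogs" cannot break the markup. The sketch below shows the kind of escaping a serialiser applies to text content; `escape_text` is a hypothetical helper for illustration only, and in the plugin EPrints::XML::to_string takes care of this.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in XML text.
# '&' is replaced first so the entities added afterwards are not re-escaped.
sub escape_text
{
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

my $title = 'Cats & Dogs <revised>';
print '<h2>', escape_text($title), "</h2>\n";
```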

A similar pattern is followed to add a paragraph containing the eprint's abstract.

<pre>
my $abstract = $session->make_element('p');
$abstract->appendChild($session->make_text($dataobj->get_value('abstract')));
$div->appendChild($abstract);
</pre>

Finally we return a DOM object representing a 'div' element containing our heading and paragraph.

<pre>
return $div;
</pre>

= Testing Your Plugin =
Restart your webserver and test the plugin as [[Contribute:_Plugins/ExportPluginsHello| before]].

== Sample Output ==
[[Image:Exphtml.png]]

Contribute: Plugins/ExportPluginsHTML

2007-09-28T13:46:06Z

Tom: /* HelloHTML.pm */

= Export Plugin Tutorial 3: HTML =

In this tutorial we'll look at creating an export plugin with slightly more complex output than unformatted plain text. Although the plugin below produces XHTML, the same principles apply to producing any XML document.

= HTML.pm =
The code in the section below should be placed in a file called HTML.pm in the directory created previously, and MyPlugins should be changed to the name of that directory.

<pre>
package EPrints::Plugin::Export::MyPlugins::HTML;

@ISA = ('EPrints::Plugin::Export');

use strict;

sub new
{
my ($class, %opts) = @_;

my $self = $class->SUPER::new(%opts);

$self->{name} = 'Hello, HTML!';
$self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
$self->{visible} = 'all';
$self->{suffix} = '.htm';
$self->{mimetype} = 'text/html; charset=utf-8';

return $self;
}

sub output_dataobj
{
my ($plugin, $dataobj) = @_;

my $xml = $plugin->xml_dataobj($dataobj);

return EPrints::XML::to_string($xml);
}

sub output_list
{
my ($plugin, %opts) = @_;

my $r = [];

my $header = <<END;
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>XHTML Export Plugin</title>
</head>
<body>
END

if (defined $opts{fh})
{
print {$opts{fh}} $header;
}
else
{
push @{$r}, $header;
}

foreach my $dataobj ($opts{list}->get_records)
{
my $part = $plugin->output_dataobj($dataobj, %opts);
if (defined $opts{fh})
{
print {$opts{fh}} $part;
}
else
{
push @{$r}, $part;
}
}
my $footer = '</body></html>';
if (defined $opts{fh})
{
print {$opts{fh}} $footer;
}
else
{
push @{$r}, $footer;
}

if (defined $opts{fh})
{
return undef;
}
return join('', @{$r});
}

sub xml_dataobj
{
my ($plugin, $dataobj) = @_;

my $session = $plugin->{session};

my $div = $session->make_element('div');

my $title = $session->make_element('h2');
$title->appendChild($session->make_text($dataobj->get_value('title')));
$div->appendChild($title);

my $abstract = $session->make_element('p');
$abstract->appendChild($session->make_text($dataobj->get_value('abstract')));
$div->appendChild($abstract);

return $div;
}

1;

</pre>

= In More Detail =
== Constructor ==
Here we change the file extension to '.htm' and the MIME type to 'text/html', declaring the UTF-8 character encoding. For general XML documents you should change the file extension to '.xml' and the MIME type to 'text/xml'.

<pre>
$self->{suffix} = '.htm';
$self->{mimetype} = 'text/html; charset=utf-8';
</pre>

== output_dataobj ==
Here the output_dataobj method does very little. It calls the xml_dataobj method to obtain a DOM object, which is serialised to a string before being returned. The xml_dataobj method is described later.

<pre>
my $xml = $plugin->xml_dataobj($dataobj);

return EPrints::XML::to_string($xml);
</pre>

== output_list ==
The only changes to the output_list method are the additions of a header and a footer.

<pre>
my $header = <<END;
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>XHTML Export Plugin</title>
</head>
<body>
END

my $footer = '</body></html>';
</pre>

== xml_dataobj ==
This method has a similar signature to the output_dataobj method: it takes an implicit reference to the plugin object and a reference to a DataObj. However, instead of returning a string, it returns a DOM object.

Before we can start creating and manipulating DOM objects we need to get a reference to the Session object from the plugin.

<pre>
my $session = $plugin->{session};
</pre>

We then create elements using the make_element method of our session object. The first argument to make_element is the tag name of the element we want to create, for example 'a' or 'div'. Any further arguments are attribute names and values for the element, passed as a hash.
<pre>
my $div = $session->make_element('div');
</pre>

Here we add a second-level heading. The make_text method creates a text node, and the appendChild method adds a node as a child of another.

<pre>
my $title = $session->make_element('h2');
$title->appendChild($session->make_text($dataobj->get_value('title')));
$div->appendChild($title);
</pre>

A similar pattern is followed to add a paragraph containing the eprint's abstract.

<pre>
my $abstract = $session->make_element('p');
$abstract->appendChild($session->make_text($dataobj->get_value('abstract')));
$div->appendChild($abstract);
</pre>

Finally we return a DOM object representing a 'div' element containing our heading and paragraph.

<pre>
return $div;
</pre>

= Testing Your Plugin =
Restart your webserver and test the plugin as [[Contribute:_Plugins/ExportPluginsHello| before]].

== Sample Output ==
[[Image:Exphtml.png]]

Contribute: Plugins/ExportPluginsHelloOld

2007-09-28T12:51:32Z

Tom: /* Before You Start */

= Export Plugin Tutorial 1: "Hello, World!" =

In this tutorial you will learn how to create a simple export plugin for EPrints, which will generate a list of titles from the results of a search. A basic knowledge of Perl is needed, but the code will be explained fully.

= Before You Start =

Create a directory for your export plugins in the main export plugin directory (usually /opt/eprints3/perl_lib/EPrints/Plugin/Export). The directory used for these examples is called "MyPlugins".

= HelloExport.pm =

Replace MyPlugins with the name of the directory you have decided to put your export plugins in and place the code below in a file called HelloExport.pm in that directory.

<pre>
package EPrints::Plugin::Export::MyPlugins::HelloExport;

@ISA = ('EPrints::Plugin::Export');

use strict;

sub new
{
my ($class, %opts) = @_;

my $self = $class->SUPER::new(%opts);

$self->{name} = 'Hello, World!';
$self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
$self->{visible} = 'all';
$self->{suffix} = '.txt';
$self->{mimetype} = 'text/plain; charset=utf-8';

return $self;
}

sub output_dataobj
{
my ($plugin, $dataobj) = @_;

return $dataobj->get_value('title')."\n";
}

1;

</pre>

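= In More Detail =
== Housekeeping ==
The package name must match the location of the file: here, MyPlugins/HelloExport.pm inside the export plugin directory.
<pre>
package EPrints::Plugin::Export::MyPlugins::HelloExport;
</pre>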
Export plugins need to inherit from the EPrints::Plugin::Export class.
<pre>
@ISA = ('EPrints::Plugin::Export');
</pre>

== Constructor ==
The constructor for our plugin. After the implicit class name, it is passed a hash of options.
<pre>
sub new
{
my ($class, %opts) = @_;
</pre>
We create a new export plugin by calling the EPrints::Plugin::Export constructor.
<pre>
my $self = $class->SUPER::new(%opts);
</pre>
Now we set a number of fields to register our new plugin.

This is the name that will appear in the export dropdown menu. The name should therefore be short and descriptive.
<pre>
$self->{name} = 'Hello, World!';
</pre>
The accept field is set to an array of the types of object this plugin can handle; in this case, lists of eprints and individual eprints.
<pre>
$self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
</pre>
The visible field denotes the class of user that will be able to see the plugin. For most export plugins the value 'all' is appropriate, allowing all users to see and use the plugin. A value of 'staff' would make the plugin visible only to repository staff.
<pre>
$self->{visible} = 'all';
</pre>
The suffix field contains the extension of files exported by the plugin.

The mimetype field defines the MIME type of the files exported by the plugin. You can also specify the character encoding, for example 'text/plain; charset=utf-8' for plain text encoded using UTF-8. UTF-8[http://en.wikipedia.org/wiki/UTF-8] is a Unicode character encoding capable of representing a very large number of different characters, so it is usually preferable to ASCII[http://en.wikipedia.org/wiki/ASCII].
<pre>
$self->{suffix} = '.txt';
$self->{mimetype} = 'text/plain; charset=utf-8';
</pre>
We then return our plugin reference.
<pre>
return $self;
}
</pre>
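If Perl's object system is new to you, the constructor pattern above can be tried outside EPrints. In this sketch `My::Export::Base` is a hypothetical stand-in for EPrints::Plugin::Export (the real base class does considerably more); the subclass calls SUPER::new and then fills in its own fields, just as the plugin does.

```perl
use strict;
use warnings;

# Hypothetical stand-in for EPrints::Plugin::Export.
package My::Export::Base;

sub new
{
    my ($class, %opts) = @_;
    my $self = { %opts };        # keep any options passed in
    return bless $self, $class;  # bless into the class we were called on
}

package My::Export::Hello;

our @ISA = ('My::Export::Base');

sub new
{
    my ($class, %opts) = @_;
    my $self = $class->SUPER::new(%opts);

    $self->{name} = 'Hello, World!';
    $self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
    $self->{visible} = 'all';
    $self->{suffix} = '.txt';
    $self->{mimetype} = 'text/plain; charset=utf-8';

    return $self;
}

package main;

my $plugin = My::Export::Hello->new(session => 'dummy session');
print ref($plugin), ": $plugin->{name} ($plugin->{suffix})\n";
```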

== Processing DataObjs ==

This method handles the export of each DataObj. DataObjs make up most of the content of an EPrints repository. The three main types are EPrint, which represents an individual eprint; Document, which represents a collection of one or more files belonging to an eprint; and User, which represents a user of the repository.

Besides an implicit reference to the plugin object, this method is passed a reference to an individual DataObj. It is called by several CGI and command-line scripts to export single DataObjs, for instance from the item control screen for repository staff. It is also called by the list handling method for each DataObj in a list, for example the results of a search. That will be explained in [[User:Tom/Export Plugins/Hello Lists|the next tutorial]].

In the example below we get the title of each DataObj, but there are a large number of fields you can extract from each DataObj. For example, try changing "title" to "abstract" to print the abstract of each eprint.

<pre>
sub output_dataobj
{
my ($plugin, $dataobj) = @_;

# Return a scalar containing the title.
return $dataobj->get_value('title')."\n";
}
</pre>
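To see what output_dataobj produces without a running repository, you can call it with a stand-in object. `Fake::EPrint` below is hypothetical and implements only get_value; real DataObjs do far more. Calling the method once per record and concatenating the results is, roughly, what happens when a list is exported.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an EPrints DataObj: just a hash of field values.
package Fake::EPrint;

sub new
{
    my ($class, %fields) = @_;
    return bless { %fields }, $class;
}

sub get_value
{
    my ($self, $fieldname) = @_;
    return $self->{$fieldname};
}

package main;

# The same code as the plugin's method, defined here as a plain subroutine.
sub output_dataobj
{
    my ($plugin, $dataobj) = @_;
    return $dataobj->get_value('title') . "\n";
}

my @records = (
    Fake::EPrint->new(title => 'On Widgets', abstract => 'About widgets.'),
    Fake::EPrint->new(title => 'More Widgets', abstract => 'Still widgets.'),
);

# One call per record, joined together.
my $output = '';
foreach my $record (@records)
{
    $output .= output_dataobj(undef, $record);
}
print $output;
```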

== Finishing Off ==

A Perl module must return a true value when it is loaded, so the file ends with 1.
<pre>
1;

</pre>

= Testing Your Plugin =
Restart your web server and perform a search.

If all is well your plugin should appear in the dropdown menu. Select it and click export. As long as the search provided some results, you should get a list of EPrint titles returned.

== Sample Output ==
[[Image:Exphello.png]]

Contribute: Plugins/ExportPluginsHelloOld

2007-09-28T12:49:59Z

Tom: /* Export Plugin Tutorial 1: "Hello, World!" */

= Export Plugin Tutorial 1: "Hello, World!" =

In this tutorial you will learn how to create a simple export plugin for EPrints, which will generate a list of titles from the results of a search. A basic knowledge of Perl is needed, but the code will be explained fully.

= Before You Start =

It is sensible to separate the plugins you create for EPrints from those included with it. Create a directory for your export plugins in the main export plugin directory (usually /opt/eprints3/perl_lib/EPrints/Plugin/Export), for example /opt/eprints3/perl_lib/EPrints/Plugin/Export/MyPlugins.

= HelloExport.pm =

Replace MyPlugins with the name of the directory you have decided to put your export plugins in and place the code below in a file called HelloExport.pm in that directory.

<pre>
package EPrints::Plugin::Export::MyPlugins::HelloExport;

@ISA = ('EPrints::Plugin::Export');

use strict;

sub new
{
my ($class, %opts) = @_;

my $self = $class->SUPER::new(%opts);

$self->{name} = 'Hello, World!';
$self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
$self->{visible} = 'all';
$self->{suffix} = '.txt';
$self->{mimetype} = 'text/plain; charset=utf-8';

return $self;
}

sub output_dataobj
{
my ($plugin, $dataobj) = @_;

return $dataobj->get_value('title')."\n";
}

1;

</pre>

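= In More Detail =
== Housekeeping ==
The package name must match the location of the file: here, MyPlugins/HelloExport.pm inside the export plugin directory.
<pre>
package EPrints::Plugin::Export::MyPlugins::HelloExport;
</pre>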
Export plugins need to inherit from the EPrints::Plugin::Export class.
<pre>
@ISA = ('EPrints::Plugin::Export');
</pre>

== Constructor ==
The constructor for our plugin. After the implicit class name, it is passed a hash of options.
<pre>
sub new
{
my ($class, %opts) = @_;
</pre>
We create a new export plugin by calling the EPrints::Plugin::Export constructor.
<pre>
my $self = $class->SUPER::new(%opts);
</pre>
Now we set a number of fields to register our new plugin.

This is the name that will appear in the export dropdown menu. The name should therefore be short and descriptive.
<pre>
$self->{name} = 'Hello, World!';
</pre>
The accept field is set to an array of the types of object this plugin can handle; in this case, lists of eprints and individual eprints.
<pre>
$self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
</pre>
The visible field denotes the class of user that will be able to see the plugin. For most export plugins the value 'all' is appropriate, allowing all users to see and use the plugin. A value of 'staff' would make the plugin visible only to repository staff.
<pre>
$self->{visible} = 'all';
</pre>
The suffix field contains the extension of files exported by the plugin.

The mimetype field defines the MIME type of the files exported by the plugin. You can also specify the character encoding, for example 'text/plain; charset=utf-8' for plain text encoded using UTF-8. UTF-8[http://en.wikipedia.org/wiki/UTF-8] is a Unicode character encoding capable of representing a very large number of different characters, so it is usually preferable to ASCII[http://en.wikipedia.org/wiki/ASCII].
<pre>
$self->{suffix} = '.txt';
$self->{mimetype} = 'text/plain; charset=utf-8';
</pre>
We then return our plugin reference.
<pre>
return $self;
}
</pre>

== Processing DataObjs ==

This method handles the export of each DataObj. DataObjs make up most of the content of an EPrints repository. The three main types are EPrint, which represents an individual eprint; Document, which represents a collection of one or more files belonging to an eprint; and User, which represents a user of the repository.

Besides an implicit reference to the plugin object, this method is passed a reference to an individual DataObj. It is called by several CGI and command-line scripts to export single DataObjs, for instance from the item control screen for repository staff. It is also called by the list handling method for each DataObj in a list, for example the results of a search. That will be explained in [[User:Tom/Export Plugins/Hello Lists|the next tutorial]].

In the example below we get the title of each DataObj, but there are a large number of fields you can extract from each DataObj. For example, try changing "title" to "abstract" to print the abstract of each eprint.

<pre>
sub output_dataobj
{
my ($plugin, $dataobj) = @_;

# Return a scalar containing the title.
return $dataobj->get_value('title')."\n";
}
</pre>
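Because output_dataobj only calls get_value on the object it is given, you can try the method outside EPrints with a stand-in object. In the sketch below MockDataObj is a hypothetical class invented for illustration (it is not part of EPrints), and this version of output_dataobj returns an empty line instead of warning when the field is unset, which is an addition to the tutorial's code.

<pre>
use strict;
use warnings;

# Hypothetical stand-in for an EPrints DataObj: only get_value is provided.
package MockDataObj;
sub new
{
my ($class, %values) = @_;
my $self = { %values };
return bless $self, $class;
}
sub get_value { my ($self, $field) = @_; return $self->{$field}; }

package main;

# Same shape as the plugin method: the plugin object, then one DataObj.
sub output_dataobj
{
my ($plugin, $dataobj) = @_;
my $title = $dataobj->get_value('title');
# Avoid an "uninitialized value" warning when the field is not set.
return (defined $title ? $title : '')."\n";
}

print output_dataobj(undef, MockDataObj->new(title => 'An Example Title'));
print output_dataobj(undef, MockDataObj->new());
</pre>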

== Finishing Off ==

A Perl module must return a true value when it is loaded, so the file ends with the standard 1; statement.
<pre>
1;

</pre>

= Testing Your Plugin =
Restart your web server and perform a search.

If all is well your plugin should appear in the dropdown menu. Select it and click export. As long as the search provided some results, you should get a list of EPrint titles returned.

== Sample Output ==
[[Image:Exphello.png]]

Contribute: Plugins/ImportPluginsAWS

2007-09-28T10:27:43Z

Tom: /* Import Plugin Tutorial 2: Amazon Web Services */

= Import Plugin Tutorial 2: Amazon Web Services =

In the [[Contribute:_Plugins/ImportPluginsCSV|last tutorial]] we created an import plugin that took data which needed very little modification to import into the repository. The column names in the CSV file matched the names of metadata fields present in the repository. In this tutorial we'll look at importing data that needs some modification to be imported, needs more error checking and is obtained in a different way.

We will be using Amazon's E-Commerce Webservice to import books from their website into our repository given a list of ASINs (Amazon Standard Identification Numbers).

We will be accessing the service using a REST approach, communicating with the server using URL parameters and retrieving an XML document in response to our request. It is also possible to access their services using SOAP, but that will not be discussed here.

= Before You Start =

== Amazon Web Services ==
To use Amazon's web services you must first sign up for an account [http://aws.amazon.com here]. Their site has extensive documentation on the services that they offer as well as example programs, including some written in Perl.

== Required Modules ==
To prepare for this tutorial you should make sure the [http://search.cpan.org/~gaas/libwww-perl-5.805/lib/LWP/UserAgent.pm LWP::UserAgent] module is installed. Running the following command as root, or using sudo, should work.

<pre>
cpan LWP::UserAgent
</pre>

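= AWS.pm =
The code in the section below should be placed in a file called AWS.pm in the directory created previously, and MyPlugins should be changed to the name of that directory.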
<pre>
package EPrints::Plugin::Import::MyPlugins::AWS;

use EPrints::Plugin::Import::TextFile;
use strict;
use URI::Escape;

our @ISA = ('EPrints::Plugin::Import::TextFile');

my $endpoint = 'http://ecs.amazonaws.co.uk/onca/xml';
my $accesskey = '<YOURAMAZONWSKEY>';
my $service = 'AWSECommerceService';
my $operation = 'ItemLookup';
my $version = '2007-07-16';
my $responsegroup = 'Large';

sub new
{
my( $class, %params ) = @_;
my $self = $class->SUPER::new( %params );

$self->{name} = 'AWS';
$self->{visible} = 'all';
$self->{produce} = [ 'list/eprint' , 'dataobj/eprint'];

my $rc = EPrints::Utils::require_if_exists('LWP::UserAgent');
unless ($rc)
{
$self->{visible} = '';
$self->{error} = 'Unable to load required module LWP::UserAgent';
}

return $self;
}

sub input_fh
{
my( $plugin, %opts ) = @_;
my @ids;
my $fh = $opts{fh};

my @records = <$fh>;
foreach my $input_data (@records)
{
my $epdata = $plugin->convert_input($input_data);
next unless defined $epdata;

my $dataobj = $plugin->epdata_to_dataobj($opts{dataset},$epdata);
if( defined $dataobj )
{
push @ids, $dataobj->get_id;
}
}

return EPrints::List->new(
dataset => $opts{dataset},
session => $plugin->{session},
ids=>\@ids );
}

sub convert_input
{
my ($plugin, $input) = @_;
my %output = ();

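my ($asin) = $input =~ m/\b([0-9A-Za-z]{10})\b/;
unless (defined $asin) { $plugin->error('No ASIN found in input'); return undef; }
$input = uc $asin;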

my $request =
"$endpoint?".
"Service=$service&".
"AWSAccessKeyId=$accesskey&".
"Operation=$operation&".
"ItemId=$input&".
"Version=$version&".
"ResponseGroup=$responsegroup";

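my $ua = LWP::UserAgent->new;
$ua->timeout(30);
my $response = $ua->get($request);
unless ($response->is_success) { $plugin->error('AWS request failed: '.$response->status_line); return undef; }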

my $dom = EPrints::XML::parse_xml_string($response->content);

my $rep =
$dom->getElementsByTagName('Items')->item(0)->
getElementsByTagName('Request')->item(0);

my $reptext =
EPrints::Utils::tree_to_utf8($rep->getElementsByTagName('IsValid')->item(0));

unless ($reptext eq 'True')
{
$plugin->error('Invalid AWS Request');
return undef;
}

#Get Item Object
my $item =
$dom->getElementsByTagName('Items')->item(0)->
getElementsByTagName('Item')->item(0);

unless (defined $item)
{
$plugin->error('No Item element found');
return undef;
}

my $attr = $item->getElementsByTagName('ItemAttributes')->item(0);

my $pg = EPrints::Utils::tree_to_utf8($attr->getElementsByTagName('ProductGroup')->item(0));

unless ($pg eq 'Book')
{
$plugin->error('Product is not a book.');
return undef;
}

$output{type} = 'book';
$output{refereed} = 'FALSE';
$output{ispublished} = 'pub';

my $title = $attr->getElementsByTagName('Title')->item(0);
$output{title} = EPrints::Utils::tree_to_utf8($title);

my $url = $item->getElementsByTagName('DetailPageURL')->item(0);
$output{official_url} = uri_unescape(EPrints::Utils::tree_to_utf8($url));

my $isbn = $attr->getElementsByTagName('ISBN')->item(0);
if (defined $isbn)
{
$output{isbn} = EPrints::Utils::tree_to_utf8($isbn);
}

my $pages = $attr->getElementsByTagName('NumberOfPages')->item(0);
if (defined $pages)
{
$output{pages} = EPrints::Utils::tree_to_utf8($pages);
}

my $publisher = $attr->getElementsByTagName('Publisher')->item(0);
if (defined $publisher)
{
$output{publisher} = EPrints::Utils::tree_to_utf8($publisher);
}

my $pubdate = $attr->getElementsByTagName('PublicationDate')->item(0);
if (defined $pubdate)
{
$output{date} = EPrints::Utils::tree_to_utf8($pubdate);
}

return \%output;
}

1;
</pre>

= In More Detail =
We will use the URI::Escape module in this plugin. As it is included with EPrints we don't need to check if it exists first.
<pre>
use URI::Escape;
</pre>

Here we set up a number of values for parameters that will be part of our web service requests. The endpoint variable determines which server the request is sent to. Here we have used the UK server, but by changing the TLD we can use the US, Canadian, German or French servers.

The accesskey variable stores the access key you obtained when signing up with Amazon earlier. Use your normal access key (the Access Key ID), not the secret key.

Here we use the ItemLookup operation of the AWSECommerceService with the 2007-07-16 version of the service API. Other operations allow searching for items, but here we want to look up specific products. Finally, the responsegroup variable determines the amount and nature of the information returned; we select "Large", which gives a lot of information about the item.

<pre>
my $endpoint = 'http://ecs.amazonaws.co.uk/onca/xml';
my $accesskey = '<YOURAMAZONWSKEY>';
my $service = 'AWSECommerceService';
my $operation = 'ItemLookup';
my $version = '2007-07-16';
my $responsegroup = 'Large';
</pre>
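Since only the top-level domain changes between Amazon's regional servers, a small lookup table can make the choice explicit. The hash and endpoint_for below are hypothetical helpers, and the host names are an assumption that simply follows the pattern of the UK endpoint used in this tutorial.

<pre>
use strict;
use warnings;

# Assumed mapping from a locale code to the TLD of the regional server.
my %tld_for = (
us => 'com',
ca => 'ca',
de => 'de',
fr => 'fr',
uk => 'co.uk',
);

# Return the REST endpoint for a locale, or die for an unknown one.
sub endpoint_for
{
my ($locale) = @_;
my $tld = $tld_for{$locale}
or die "Unknown Amazon locale '$locale'\n";
return "http://ecs.amazonaws.$tld/onca/xml";
}

print endpoint_for('uk'), "\n";
</pre>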

== Constructor ==
The constructor is similar to the one used for the CSV plugin, except that it declares that it can produce individual eprints as well as lists of eprints, each created from an ASIN.
<pre>
$self->{produce} = [ 'list/eprint' , 'dataobj/eprint'];
</pre>

Just as we loaded Text::CSV in the last tutorial, here we load LWP::UserAgent, which will be used to make requests to the web service. If it cannot be loaded, the plugin is hidden and an error is recorded.
<pre>
my $rc = EPrints::Utils::require_if_exists('LWP::UserAgent');
unless ($rc)
{
$self->{visible} = '';
$self->{error} = 'Unable to load required module LWP::UserAgent';
}
</pre>

== Input ==
=== input_fh ===
This method is similar to the one used in the CSV plugin, but doesn't have to do quite so much work.

First we create the array to hold our imported eprint ids.
<pre>
my @ids;
</pre>

Next we read all the lines in the supplied file handle into our records array.
<pre>
my $fh = $opts{fh};

my @records = <$fh>;
</pre>

Then we iterate over each record, running convert_input on it, importing it into our repository and adding the id to our array.
<pre>
foreach my $input_data (@records)
{
my $epdata = $plugin->convert_input($input_data);
next unless defined $epdata;

my $dataobj = $plugin->epdata_to_dataobj($opts{dataset},$epdata);
if( defined $dataobj )
{
push @ids, $dataobj->get_id;
}
}
</pre>

Then we return a List object of the items imported.
<pre>
return EPrints::List->new(
dataset => $opts{dataset},
session => $plugin->{session},
ids=>\@ids );
</pre>

=== convert_input ===

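ASINs are 10-character alphanumeric identifiers; for books the ASIN is normally the ISBN-10, which may end in an X. Here we take the first 10-character alphanumeric token on the line, discarding any surrounding whitespace or punctuation, and report an error if there is none.
<pre>
my ($asin) = $input =~ m/\b([0-9A-Za-z]{10})\b/;
unless (defined $asin) { $plugin->error('No ASIN found in input'); return undef; }
$input = uc $asin;
</pre>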
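To try ASIN extraction outside EPrints, here is a standalone sketch. It assumes an ASIN is a 10-character alphanumeric token, so ISBN-10s ending in X are kept; extract_asin is a hypothetical helper and the second input is a made-up value.

<pre>
use strict;
use warnings;

# Return the first 10-character alphanumeric token in a line, upper-cased,
# or undef when the line contains no such token.
sub extract_asin
{
my ($line) = @_;
my ($asin) = $line =~ m/\b([0-9A-Za-z]{10})\b/;
return defined $asin ? uc $asin : undef;
}

# The second input is a made-up ISBN-10 ending in a lower-case x.
foreach my $line ("0946719616\n", "  123456789x ", "\n")
{
my $asin = extract_asin($line);
print defined $asin ? "$asin\n" : "no ASIN found\n";
}
</pre>

Capturing into a lexical with a list assignment means a failed match leaves $asin undefined, rather than silently reusing $1 from an earlier successful match.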

We form the request from the variables we created earlier and the ASIN we have just obtained.
<pre>
my $request =
"$endpoint?".
"Service=$service&".
"AWSAccessKeyId=$accesskey&".
"Operation=$operation&".
"ItemId=$input&".
"Version=$version&".
"ResponseGroup=$responsegroup";
</pre>
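The same request can be assembled from a list of name/value pairs, percent-encoding each one so that characters such as & or = in a value cannot break the query string. This is a core-Perl sketch with two hypothetical helpers, uri_encode and build_request; in the plugin itself, uri_escape from URI::Escape could do the encoding. The access key is the same placeholder used above.

<pre>
use strict;
use warnings;

# Percent-encode everything except RFC 3986 unreserved characters.
# Assumes ASCII input, which is all the AWS parameters here contain.
sub uri_encode
{
my ($s) = @_;
$s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
return $s;
}

# Build "endpoint?k1=v1&k2=v2..." keeping the parameters in the given order.
sub build_request
{
my ($endpoint, @pairs) = @_;
my @parts;
while (my ($key, $value) = splice(@pairs, 0, 2))
{
push @parts, uri_encode($key).'='.uri_encode($value);
}
return $endpoint.'?'.join('&', @parts);
}

my $request = build_request(
'http://ecs.amazonaws.co.uk/onca/xml',
Service        => 'AWSECommerceService',
AWSAccessKeyId => '<YOURAMAZONWSKEY>',
Operation      => 'ItemLookup',
ItemId         => '0946719616',
Version        => '2007-07-16',
ResponseGroup  => 'Large',
);
print "$request\n";
</pre>

Passing an ordered list rather than a hash keeps the generated URL stable from run to run, which makes requests easier to compare when debugging.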

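We now send the request by creating a new LWP::UserAgent object, setting its timeout to 30 seconds and then performing the request using HTTP GET. If the request fails, for example because it timed out, we report the HTTP status and return undef rather than trying to parse an error page.
<pre>
my $ua = LWP::UserAgent->new;
$ua->timeout(30);
my $response = $ua->get($request);
unless ($response->is_success) { $plugin->error('AWS request failed: '.$response->status_line); return undef; }
</pre>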

We then create a DOM object from the XML document returned.
<pre>
my $dom = EPrints::XML::parse_xml_string($response->content);
</pre>

Each response contains an Items element inside the document's root element, and the Items element in turn contains a Request element. The Request element contains an IsValid element whose value is True or False depending on whether a valid request was made.

Here we obtain the Request element and check that the IsValid element within it contains the value True. If it doesn't we call the error method and return undef.
<pre>
my $rep =
$dom->getElementsByTagName("Items")->item(0)->
getElementsByTagName('Request')->item(0);

my $reptext =
EPrints::Utils::tree_to_utf8($rep->getElementsByTagName('IsValid')->item(0));

unless ($reptext eq 'True')
{
$plugin->error('Invalid AWS Request');
return undef;
}
</pre>

The product found by the ItemLookup operation is contained in an Item element within the Items element. Here we attempt to get that element, and report an error and return undef if we can't.
<pre>
my $item =
$dom->getElementsByTagName('Items')->item(0)->
getElementsByTagName('Item')->item(0);

unless (defined $item)
{
$plugin->error('No Item element found');
return undef;
}
</pre>

Each item contains an ItemAttributes element which contains most of the metadata about an item.
<pre>
my $attr = $item->getElementsByTagName('ItemAttributes')->item(0);
</pre>

For some specialised repositories it might make sense to import DVDs, computer games and electronic equipment, but we're just going to deal with books. The ProductGroup element within the ItemAttributes element tells you what sort of item we're dealing with. We're looking for the value 'Book'.
<pre>
my $pg = EPrints::Utils::tree_to_utf8($attr->getElementsByTagName('ProductGroup')->item(0));

unless ($pg eq 'Book')
{
$plugin->error('Product is not a book.');
return undef;
}
</pre>

Here we set a few fields without consulting the imported data. We know this is a book, so we set the type. We assume that it has not been refereed. We also assume it has been published, because we can buy it.
<pre>
$output{type} = 'book';
$output{refereed} = 'FALSE';
$output{ispublished} = 'pub';
</pre>

We get and set the title.
<pre>
my $title = $attr->getElementsByTagName('Title')->item(0);
$output{title} = EPrints::Utils::tree_to_utf8($title);
</pre>

Here we set the official URL to the Amazon product page. We have to do a bit of extra work, using the uri_unescape function from the URI::Escape module to convert URI escape codes back into characters.
<pre>
my $url = $item->getElementsByTagName('DetailPageURL')->item(0);
$output{official_url} = uri_unescape(EPrints::Utils::tree_to_utf8($url));
</pre>
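uri_unescape replaces each %XX escape with the character it encodes. The core-Perl sketch below performs the same substitution so you can see the effect without installing anything; unescape is a hypothetical helper, the URL is made up, and in the plugin itself you should keep using URI::Escape.

<pre>
use strict;
use warnings;

# Replace every %XX escape (two hex digits) with the character it encodes.
sub unescape
{
my ($s) = @_;
$s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
return $s;
}

# A made-up, partly escaped product URL.
my $url = 'http://www.amazon.co.uk/Example-Title/dp/0946719616%3FSubscriptionId%3DXYZ';
print unescape($url), "\n";
</pre>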

We can set the ISBN. Note that the ISBN is often the same as the ASIN.
<pre>
my $isbn = $attr->getElementsByTagName('ISBN')->item(0);
if (defined $isbn)
{
$output{isbn} = EPrints::Utils::tree_to_utf8($isbn);
}
</pre>

We can set the number of pages.
<pre>
my $pages = $attr->getElementsByTagName('NumberOfPages')->item(0);
if (defined $pages)
{
$output{pages} = EPrints::Utils::tree_to_utf8($pages);
}
</pre>

We can set the publisher.
<pre>
my $publisher = $attr->getElementsByTagName('Publisher')->item(0);
if (defined $publisher)
{
$output{publisher} = EPrints::Utils::tree_to_utf8($publisher);
}
</pre>

We can set the publication date and finally return our output hash.
<pre>
my $pubdate = $attr->getElementsByTagName('PublicationDate')->item(0);
if (defined $pubdate)
{
$output{date} = EPrints::Utils::tree_to_utf8($pubdate);
}

return \%output;
</pre>
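The four optional fields above all follow the same pattern, so they could also be driven by a table. This is a sketch of that refactoring rather than the tutorial's code: copy_optional_fields is a hypothetical helper, and the code reference it takes stands in for the getElementsByTagName and tree_to_utf8 calls so that the example runs without EPrints.

<pre>
use strict;
use warnings;

# Amazon ItemAttributes element name => EPrints field name.
my %field_map = (
ISBN            => 'isbn',
NumberOfPages   => 'pages',
Publisher       => 'publisher',
PublicationDate => 'date',
);

# $get_text returns the text of the named element, or undef if it is absent.
# In the plugin it might be:
#   sub { my $el = $attr->getElementsByTagName($_[0])->item(0);
#         defined $el ? EPrints::Utils::tree_to_utf8($el) : undef }
sub copy_optional_fields
{
my ($get_text, $output) = @_;
foreach my $tag (sort keys %field_map)
{
my $value = $get_text->($tag);
$output->{ $field_map{$tag} } = $value if defined $value;
}
return $output;
}

# Stand-in data: NumberOfPages and PublicationDate are deliberately missing.
my %attributes = (ISBN => '0946719616', Publisher => 'Example Publisher');
my %output = (type => 'book');
copy_optional_fields(sub { $attributes{ $_[0] } }, \%output);
print join(', ', map { "$_=$output{$_}" } sort keys %output), "\n";
</pre>

Adding another optional field then only needs one more line in %field_map.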

= Testing Your Plugin =
After restarting your webserver go to the Import Items screen from the Manage Deposits screen. If you can't find this, make sure you're logged in.

We'll start by collecting a few ASINs. Go to [http://www.amazon.co.uk Amazon] and pick a few books. The URL for each product page is in the form http://www.amazon.co.uk/Combination-of-title-and-author/dp/ASIN... Collect a few different ASINs.
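If you are collecting many ASINs, you can pull them out of product URLs with a regular expression on the /dp/ part of the path. The sketch assumes URLs of the form described above; asin_from_url is a hypothetical helper and the example URL is made up.

<pre>
use strict;
use warnings;

# Return the 10-character ASIN that follows "/dp/" in a product URL, or undef.
sub asin_from_url
{
my ($url) = @_;
my ($asin) = $url =~ m{/dp/([0-9A-Za-z]{10})(?:[/?]|$)};
return defined $asin ? uc $asin : undef;
}

print asin_from_url('http://www.amazon.co.uk/Some-Title-Some-Author/dp/0297843877/ref=example'), "\n";
</pre>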

Now we'll demonstrate importing from Amazon with a few sample ASINs.
Type this into the "Cut and Paste Records" box:
<pre>
0946719616
0297843877
</pre>

Select "AWS" from the "Select import format" drop-down menu and click "Test Run + Import". You should end up at the Manage Deposits screen with the message "Import completed: 2 item(s) imported." displayed.

Contribute: Plugins/ExportPluginsZip

2007-09-28T10:26:37Z

Tom: /* Testing Your Plugin */

= Export Plugin Tutorial 5: Zip =
In this tutorial we'll look at packaging the results of a search into a Zip file. We'll create a directory for each eprint, and a sub-directory for each document belonging to that eprint. We'll also add an HTML index file to the archive to make it easier to navigate.

To prepare for this tutorial you should install the [http://search.cpan.org/~miyagawa/Archive-Any-Create-0.02/lib/Archive/Any/Create.pm Archive::Any::Create] module. Running the following command as root, or using sudo, should work.

<pre>
cpan Archive::Any::Create
</pre>

= Zip.pm =

<pre>
package EPrints::Plugin::Export::MyPlugins::Zip;

@ISA = ('EPrints::Plugin::Export');

use strict;

sub new
{
my ($class, %opts) = @_;

my $self = $class->SUPER::new(%opts);

$self->{name} = 'Zip';
$self->{accept} = [ 'list/eprint' ];
$self->{visible} = 'all';
$self->{suffix} = '.zip';
$self->{mimetype} = 'application/zip';

my $rc = EPrints::Utils::require_if_exists('Archive::Any::Create');
unless ($rc)
{
$self->{visible} = '';
$self->{error} = 'Unable to load required module Archive::Any::Create';
}

return $self;
}

sub output_list
{
my ($plugin, %opts) = @_;

my $archive = '';
open (my $FH, '>', \$archive) or
die("Could not create filehandle: $!");
my $zip = Archive::Any::Create->new;

my $index = <<END;
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>EPrints Search Results</title>
</head>
<body>
END

my $session = $plugin->{session};

foreach my $dataobj ($opts{list}->get_records)
{
my $div = $session->make_element('div');
my $heading = $session->make_element('h2');
$heading->appendChild($session->make_text($dataobj->get_value('title')));
$div->appendChild($heading);

my $uldoc = $session->make_element('ul');
$div->appendChild($uldoc);

my $dirpath = 'eprints-search/'.$dataobj->get_id().'/';

my $i = 1;
foreach my $doc ($dataobj->get_all_documents)
{
my $subdirpath = $dirpath."doc$i/";
my %files = $doc->files;

my $lidoc = $session->make_element('li');
$uldoc->appendChild($lidoc);

my $adoc = $session->make_element('a', href=>$dataobj->get_id."/doc$i/".$doc->get_main);
$lidoc->appendChild($adoc);

if ($doc->exists_and_set('formatdesc'))
{
$adoc->appendChild($session->make_text($doc->get_value('formatdesc')));
}
else
{
$adoc->appendChild($session->make_text($doc->get_main));
}

foreach my $filename (sort keys %files)
{
my $filepath = $subdirpath.$filename;
my $file = $doc->local_path.'/'.$filename;

if (-d $file)
{
next;
}

open (my $infh, '<:raw', $file) or die ("Could not open file $file: $!");
my $data = do { local $/; <$infh> };
close $infh;

$zip->add_file($filepath, $data);
}
$i++;
}
$index .= EPrints::XML::to_string($div);
}

$index .= '</body></html>';
$zip->add_file('eprints-search/index.htm',$index);

if (defined $opts{fh})
{
$zip->write_filehandle($opts{fh},'zip');
return undef;
}
$zip->write_filehandle($FH,'zip');
return $archive;
}

1;

</pre>

= In More Detail =
== Constructor ==
For the sake of simplicity this plugin will only deal with lists of eprints. This avoids some code duplication, and it would be fairly easy to modify the plugin to deal with both individual eprints and lists of eprints sensibly.
<pre>
$self->{accept} = [ 'list/eprint' ];
</pre>

The file extension and MIME type are set to values appropriate for Zip files.
<pre>
$self->{suffix} = '.zip';
$self->{mimetype} = 'application/zip';
</pre>

We need a module for creating Zip files that is not included with EPrints. We use the EPrints::Utils::require_if_exists function to check whether the module exists, and load it if it does. We then check the value returned from that function, and make the plugin invisible if it failed.

<pre>
my $rc = EPrints::Utils::require_if_exists('Archive::Any::Create');
unless ($rc)
{
$self->{visible} = '';
$self->{error} = 'Unable to load required module Archive::Any::Create';
}
</pre>

== List Handling ==
=== Setting Up ===
Here we set up an in-memory file for the Zip, and create an Archive object.
<pre>
my $archive = '';
open (my $FH, '>', \$archive) or
die("Could not create filehandle: $!");
my $zip = Archive::Any::Create->new;
</pre>
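Opening a file handle on a reference to a scalar is standard Perl (an "in-memory file"): everything printed to the handle ends up in the scalar. A minimal illustration outside EPrints, with arbitrary text:

<pre>
use strict;
use warnings;

my $buffer = '';
open(my $fh, '>', \$buffer) or die "Could not create in-memory file: $!";
print {$fh} "first line\n";
print {$fh} "second line\n";
# Closing the handle guarantees that everything written is in $buffer.
close($fh);

print length($buffer), " bytes captured\n";
</pre>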

=== Navigation ===
Here we begin to set up the HTML file that we'll add to our archive for navigation. First we set up a header.
<pre>
my $index = <<END;
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>EPrints Search Results</title>
</head>
<body>
END
</pre>

Now we get the Session object, which we'll use to create DOM objects later.
<pre>
my $session = $plugin->{session};
</pre>

=== Handling DataObjs ===
We loop over the DataObjs as we have done before.

This time we set up some DOM objects to be added to our index. Each eprint will have its title printed, followed by a list of its documents.
<pre>
my $div = $session->make_element('div');
my $heading = $session->make_element('h2');
$heading->appendChild($session->make_text($dataobj->get_value('title')));
$div->appendChild($heading);

my $uldoc = $session->make_element('ul');
$div->appendChild($uldoc);
</pre>

We create a directory for each eprint. Note that it is not necessary to create a directory explicitly; we simply have to use the appropriate file path. This also means that if no files are added under an eprint's path, no directory is created for it at all, rather than an empty one.

<pre>
my $dirpath = 'eprints-search/'.$dataobj->get_id().'/';
</pre>

==== Dealing With Documents ====
We then loop over all the documents belonging to each DataObj. The get_all_documents method returns an array of Document objects.
<pre>
my $i = 1;
foreach my $doc ($dataobj->get_all_documents)
{
my $subdirpath = $dirpath."doc$i/";
</pre>

Here we create a list item for the document containing a link to the main file.
<pre>
my $lidoc = $session->make_element('li');
$uldoc->appendChild($lidoc);

my $adoc = $session->make_element('a', href=>$dataobj->get_id."/doc$i/".$doc->get_main);
$lidoc->appendChild($adoc);
</pre>
If a description of the main file has been set we use that as the link text, otherwise we use the filename.
<pre>
if ($doc->exists_and_set('formatdesc'))
{
$adoc->appendChild($session->make_text($doc->get_value('formatdesc')));
}
else
{
$adoc->appendChild($session->make_text($doc->get_main));
}
</pre>

==== Dealing With Files ====
The files method of the Document object returns a hash whose keys are file names and values are file sizes.
<pre>
my %files = $doc->files;
</pre>

We loop over each file belonging to the document; in most cases there will only be one file.
<pre>
foreach my $filename (sort keys %files)
{
my $filepath = $subdirpath.$filename;
my $file = $doc->local_path.'/'.$filename;
</pre>

We need to read the contents of the file so that we can add them to the Zip. We open the file in raw (binary) mode, so that PDFs, images and other binary files are copied unchanged, and temporarily undefine the input record separator $/ so that a single read returns the whole file.
<pre>
open (my $infh, '<:raw', $file) or die ("Could not open file $file: $!");
my $data = do { local $/; <$infh> };
close $infh;
</pre>
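Reading a file in raw (binary) mode with $/ undefined returns its exact bytes in a single read, which is a safer way to copy PDFs and images than line-by-line text reads. This sketch writes a few arbitrary bytes (including a NUL and a CR/LF pair) to a temporary file made with the core File::Temp module and reads them back; read_file is a hypothetical helper.

<pre>
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read a whole file, byte for byte, into a scalar.
sub read_file
{
my ($path) = @_;
open(my $infh, '<:raw', $path) or die "Could not open file $path: $!";
my $data = do { local $/; <$infh> };
close($infh);
return $data;
}

# Write some binary test bytes to a temporary file, then read them back.
my $bytes = "PDF\x00\xFF\r\nend";
my ($tmpfh, $tmpname) = tempfile(UNLINK => 1);
binmode($tmpfh);
print {$tmpfh} $bytes;
close($tmpfh);

print read_file($tmpname) eq $bytes ? "bytes preserved\n" : "bytes changed\n";
</pre>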

Then we add the data to the Zip archive at the file path we built earlier.
<pre>
$zip->add_file($filepath, $data);
</pre>

Finally we add the DOM object for our eprint to the index.
<pre>
$index .= EPrints::XML::to_string($div);
</pre>

=== Finishing Off ===
After finishing off our index file we add it to the zip file.
<pre>
$index .= '</body></html>';
$zip->add_file('eprints-search/index.htm',$index);
</pre>

If a file handle has been provided we write to it, otherwise we write to the scalar file handle created earlier. We then return in the usual fashion.
<pre>
if (defined $opts{fh})
{
$zip->write_filehandle($opts{fh},'zip');
return undef;
}
$zip->write_filehandle($FH,'zip');
return $archive;
</pre>

= Testing Your Plugin =
Restart your webserver and test the plugin as [[Contribute:_Plugins/ExportPluginsHello| before]].

== Sample Output ==
[[Image:Expzipv2.png]]

The accompanying HTML index.

[[Image:Expzip2.png]]

Contribute: Plugins/ExportPluginsExcel

2007-09-28T10:25:51Z

Tom: /* Testing Your Plugin */

= Export Plugin Tutorial 4: Excel =

In this tutorial and the next one we'll look at exporting files in non-text formats. Here we will explore exporting metadata in Excel's binary .xls format, as used before Microsoft Office 2007 introduced its XML-based format.

To prepare for this tutorial you should install the [http://search.cpan.org/dist/Spreadsheet-WriteExcel/lib/Spreadsheet/WriteExcel.pm Spreadsheet::WriteExcel] module. Running the following command as root, or using sudo, should work.

<pre>
cpan Spreadsheet::WriteExcel
</pre>

= Excel.pm =

<pre>
package EPrints::Plugin::Export::MyPlugins::Excel;

@ISA = ('EPrints::Plugin::Export');

use strict;

sub new
{
my ($class, %opts) = @_;
my $self = $class->SUPER::new(%opts);

$self->{name} = 'Excel';
$self->{accept} = ['list/eprint'];
$self->{visible} = 'all';
$self->{suffix} = '.xls';
$self->{mimetype} = 'application/vnd.ms-excel';

my $rc = EPrints::Utils::require_if_exists('Spreadsheet::WriteExcel');
unless ($rc)
{
$self->{visible} = '';
$self->{error} = 'Unable to load required module Spreadsheet::WriteExcel';
}

return $self;
}

sub output_list
{
my ($plugin, %opts) = @_;
my $workbook;

my $output;
open(my $FH,'>',\$output);

if (defined $opts{fh})
{
$workbook = Spreadsheet::WriteExcel->new(\*{$opts{fh}});
die("Unable to create spreadsheet: $!")unless defined $workbook;
}
else
{
$workbook = Spreadsheet::WriteExcel->new($FH);
die("Unable to create spreadsheet: $!")unless defined $workbook;
}

my $worksheet = $workbook->add_worksheet();

my $i = 0;
my @fields =
$plugin->{session}->get_repository->get_dataset('archive')->get_fields;

foreach my $field (@fields)
{
$worksheet->write(0, $i, $field->get_name);
$i++;
}

$i = 1;
foreach my $dataobj ($opts{list}->get_records)
{
my $j = 0;
foreach my $field (@fields)
{
if ($dataobj->exists_and_set($field->get_name))
{
if ($field->get_property('multiple'))
{
if ($field->{type} eq 'name')
{
my $namelist = '';
foreach my $name (@{$dataobj->get_value_raw($field->get_name)})
{
$namelist .= $name->{family} . ',' . $name->{given} . ';';
}
$worksheet->write($i, $j, $namelist);
}
elsif ($field->{type} eq 'compound')
{
$worksheet->write($i, $j, 'COMPOUND');
}
else
{
$worksheet->write($i, $j,
join(';',@{$dataobj->get_value($field->get_name)}));
}
}
else {
$worksheet->write($i, $j, $dataobj->get_value($field->get_name));
}
}
$j++;
}
$i++;
}

$workbook->close;

if (defined $opts{fh})
{
return undef;
}

return $output;
}

1;

</pre>

= In More Detail =
== Constructor ==
For the sake of simplicity this plugin will only deal with lists of eprints. This avoids some code duplication, and it would be fairly easy to modify the plugin to deal with both individual eprints and lists of eprints sensibly.
<pre>
$self->{accept} = ['list/eprint'];
</pre>

The file extension and MIME type are set to values appropriate for Excel files.
<pre>
$self->{suffix} = '.xls';
$self->{mimetype} = 'application/vnd.ms-excel';
</pre>

We need a module for creating Excel files that is not included with EPrints. We use the EPrints::Utils::require_if_exists function to check whether the module exists, and load it if it does. We then check the value returned from that function, and make the plugin invisible if it failed.
<pre>
my $rc = EPrints::Utils::require_if_exists('Spreadsheet::WriteExcel');
unless ($rc)
{
$self->{visible} = '';
$self->{error} = 'Unable to load required module Spreadsheet::WriteExcel';
}
</pre>
== List Handling ==
=== Setting Up a Workbook ===
Here we create a new Excel workbook. We start by creating a file handle on a scalar rather than a filename; this creates an in-memory file. Then, depending on whether a file handle has been supplied, we create a workbook object that writes either to that file handle or to the in-memory file handle we just created.

<pre>
my $workbook;

my $output;
open(my $FH,'>',\$output);

if (defined $opts{fh})
{
$workbook = Spreadsheet::WriteExcel->new(\*{$opts{fh}});
die("Unable to create spreadsheet: $!")unless defined $workbook;
}
else
{
$workbook = Spreadsheet::WriteExcel->new($FH);
die("Unable to create spreadsheet: $!")unless defined $workbook;
}
</pre>

=== Handling DataObjs ===
To start adding data to the Excel file we have to create a worksheet.
<pre>
my $worksheet = $workbook->add_worksheet();
</pre>

To the first row of our worksheet we add the names of all the metadata fields that can be associated with eprints. We get the session associated with the plugin, and then the repository associated with that session. We then get the DataSet "archive" from that repository and call the get_fields method. That method returns an array of MetaField objects.
<pre>
my @fields =
$plugin->{session}->get_repository->get_dataset('archive')->get_fields;
</pre>

Here we loop over each field and write its name to our worksheet, using $i as the column index.
<pre>
my $i = 0;
foreach my $field (@fields)
{
$worksheet->write(0, $i, $field->get_name);
$i++;
}
</pre>

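We now loop over each DataObj in our list, and over each MetaField we found earlier. Row 0 holds the field names, so the data starts at row 1; $i counts rows and $j counts columns.
<pre>
$i = 1;
foreach my $dataobj ($opts{list}->get_records)
{
my $j = 0;
foreach my $field (@fields)
{
</pre>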

We only write something to the worksheet if the field can apply to the DataObj and is set. Scalar values are simply written to the worksheet.
<pre>
if ($dataobj->exists_and_set($field->get_name))
</pre>

The plugin handles fields which can take multiple values in a number of ways.
<pre>
if ($field->get_property('multiple'))
</pre>
Names are handled specially and are formatted with the family name followed by a comma, the given name or initial and then a semi-colon.
<pre>
if ($field->{type} eq 'name')
{
my $namelist = '';
foreach my $name (@{$dataobj->get_value_raw($field->get_name)})
{
$namelist .= $name->{family} . ',' . $name->{given} . ';';
}
$worksheet->write($i, $j, $namelist);
}
</pre>
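The loop above leaves a trailing semicolon after the last name. If you would rather avoid it, join and map produce the same family,given pairs without one. This is a core-Perl sketch: format_names is a hypothetical helper, the array stands in for the value returned by get_value_raw (a list of hashes with family and given keys, as used above), and the names are invented.

<pre>
use strict;
use warnings;

# Format a list of {family, given} hashes as "Family,Given;Family,Given".
# A missing given name is simply left out.
sub format_names
{
my ($names) = @_;
return join(';', map {
my $name = $_;
join(',', grep { defined } $name->{family}, $name->{given})
} @{$names});
}

my $names = [
{ family => 'Smith', given => 'J.' },
{ family => 'Jones', given => 'Anne' },
];
print format_names($names), "\n";
</pre>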
Fields which have a compound type are not handled, and 'COMPOUND' is written to the worksheet.
<pre>
elsif ($field->{type} eq 'compound')
{
$worksheet->write($i, $j, 'COMPOUND');
}
</pre>
For most multiple fields each value is taken and concatenated, separated by semi-colons.
<pre>
else
{
$worksheet->write($i, $j,
join(';',@{$dataobj->get_value($field->get_name)}));
}
</pre>

=== Finishing Up ===

We first need to close the workbook to ensure that the data is written to the file handles, and then we can return in the usual fashion.
<pre>
$workbook->close;

if (defined $opts{fh})
{
return undef;
}

return $output;
</pre>

= Testing Your Plugin =
Restart your webserver and test the plugin as [[Contribute:_Plugins/ExportPluginsHello| before]].

== Sample Output ==
[[Image:Expexcel.png]]

Contribute: Plugins/ExportPluginsHTML

2007-09-28T10:25:21Z

Tom: /* Testing Your Plugin */

= Export Plugin Tutorial 3: HTML =

In this tutorial we'll look at creating an export plugin with slightly more complex output than unformatted plain text. Although the plugin below produces XHTML the same principles apply to producing any XML document.

= HelloHTML.pm =

<pre>
package EPrints::Plugin::Export::MyPlugins::HelloHTML;

@ISA = ('EPrints::Plugin::Export');

use strict;

sub new
{
my ($class, %opts) = @_;

my $self = $class->SUPER::new(%opts);

$self->{name} = 'Hello, HTML!';
$self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
$self->{visible} = 'all';
$self->{suffix} = '.htm';
$self->{mimetype} = 'text/html; charset=utf-8';

return $self;
}

sub output_dataobj
{
my ($plugin, $dataobj) = @_;

my $xml = $plugin->xml_dataobj($dataobj);

return EPrints::XML::to_string($xml);
}

sub output_list
{
my ($plugin, %opts) = @_;

my $r = [];

my $header = <<END;
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>XHTML Export Plugin</title>
</head>
<body>
END

if (defined $opts{fh})
{
print {$opts{fh}} $header;
}
else
{
push @{$r}, $header;
}

foreach my $dataobj ($opts{list}->get_records)
{
my $part = $plugin->output_dataobj($dataobj, %opts);
if (defined $opts{fh})
{
print {$opts{fh}} $part;
}
else
{
push @{$r}, $part;
}
}
my $footer = '</body></html>';
if (defined $opts{fh})
{
print {$opts{fh}} $footer;
}
else
{
push @{$r}, $footer;
}

if (defined $opts{fh})
{
return undef;
}
return join('', @{$r});
}

sub xml_dataobj
{
my ($plugin, $dataobj) = @_;

my $session = $plugin->{session};

my $div = $session->make_element('div');

my $title = $session->make_element('h2');
$title->appendChild($session->make_text($dataobj->get_value('title')));
$div->appendChild($title);

my $abstract = $session->make_element('p');
$abstract->appendChild($session->make_text($dataobj->get_value('abstract')));
$div->appendChild($abstract);

return $div;
}

1;

</pre>

= In More Detail =
== Constructor ==
Here we change the file extension to '.htm' and the MIME type to 'text/html'. For general XML documents you should use the file extension '.xml' and the MIME type 'text/xml'.

<pre>
$self->{suffix} = '.htm';
$self->{mimetype} = 'text/html; charset=utf-8';
</pre>

== output_dataobj ==
Here the output_dataobj method does very little. It calls the xml_dataobj method to obtain a DOM object, which is serialised to a string before being returned. The xml_dataobj method is described later.

<pre>
my $xml = $plugin->xml_dataobj($dataobj);

return EPrints::XML::to_string($xml);
</pre>

== output_list ==
The only changes to the output_list method are the additions of a header and a footer.

<pre>
my $header = <<END;
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>XHTML Export Plugin</title>
</head>
<body>
END

my $footer = '</body></html>';
</pre>
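output_list repeats the same test on $opts{fh} for the header, each record and the footer. One way to avoid that, sketched here as a hypothetical make_emitter helper rather than anything provided by EPrints, is to build a closure once that either prints to the supplied handle or collects the parts in an array.

<pre>
use strict;
use warnings;

# Return a closure that prints to $fh if one is given, otherwise
# pushes each part onto @$parts for joining later.
sub make_emitter
{
my ($fh, $parts) = @_;
return defined $fh
? sub { print {$fh} @_ }
: sub { push @{$parts}, @_ };
}

# Without a file handle the parts are collected and joined.
my @parts;
my $emit = make_emitter(undef, \@parts);
$emit->('<body>');
$emit->('<div>record</div>');
$emit->('</body>');
print join('', @parts), "\n";
</pre>

In the plugin, output_list could create the closure once with make_emitter($opts{fh}, $r), call it for the header, each record and the footer, and finish with return defined $opts{fh} ? undef : join('', @{$r}).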

== xml_dataobj ==
This method has a similar signature to the output_dataobj method: it takes an implicit reference to the plugin object and a reference to a DataObj, however instead of returning a string it returns a DOM object.

Before we can start creating and manipulating DOM objects we need to get a reference to the Session object from the plugin.

<pre>
my $session = $plugin->{session};
</pre>

We then create elements using the make_element method of our session object. The first argument to make_element is the tag name of the element we want to create, for example 'a' or 'div'. Any further arguments are attribute names and values for the element, for example $session->make_element('a', href => $url).
<pre>
my $div = $session->make_element('div');
</pre>

Here we add a second-level heading. The make_text method creates a text node and the appendChild method adds a child node to an element.

<pre>
my $title = $session->make_element('h2');
$title->appendChild($session->make_text($dataobj->get_value('title')));
$div->appendChild($title);
</pre>

A similar pattern is followed to add a paragraph containing the eprint's abstract.

<pre>
my $abstract = $session->make_element('p');
$abstract->appendChild($session->make_text($dataobj->get_value('abstract')));
$div->appendChild($abstract);
</pre>

Finally we return a DOM object representing a 'div' element containing our header and paragraph.

<pre>
return $div;
</pre>
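To see what kind of markup xml_dataobj builds, here is a plain-Perl sketch that produces an equivalent div/h2/p fragment as a string. It is only an illustration: the helper names escape_text and render_div are hypothetical, a plain hash stands in for the DataObj, and with the DOM approach the escaping is done for you when the tree is serialised (the exact escapes EPrints writes may differ from the ones used here).

<pre>
use strict;
use warnings;

# Escape characters that are special in XHTML text as numeric
# character references. With the DOM approach, text nodes created
# by make_text are escaped automatically when serialised.
sub escape_text
{
	my ($text) = @_;
	$text = '' unless defined $text;
	$text =~ s/([&<>"])/sprintf('&#%d;', ord $1)/ge;
	return $text;
}

# Hypothetical helper: the same div/h2/p structure that xml_dataobj
# builds with DOM calls, written out as a string.
sub render_div
{
	my (%values) = @_;
	return '<div><h2>' . escape_text($values{title}) . '</h2>'
		. '<p>' . escape_text($values{abstract}) . '</p></div>';
}

print render_div(title => 'Cats & Dogs', abstract => 'Uses <b> tags'), "\n";
</pre>

Building the tree with make_element and make_text, as the plugin does, avoids having to remember this escaping for every field.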

= Testing Your Plugin =
Restart your webserver and test the plugin as [[Contribute:_Plugins/ExportPluginsHello| before]].

== Sample Output ==
[[Image:Exphtml.png]]

Contribute: Plugins/ExportPluginsList

2007-09-28T10:24:28Z

Tom: /* Testing Your Plugin */

= Export Plugin Tutorial 2: List Handling =

In this tutorial you will create an export plugin that changes the way lists of eprints are exported. It is slightly more complex than the one created in [[Contribute:_Plugins/ExportPluginsHello| the previous tutorial]].

= HelloList.pm =
The code in the section below should be placed in a file called HelloList.pm in the directory created previously, and MyPlugins should be changed to the name of that directory.

<pre>
package EPrints::Plugin::Export::MyPlugins::HelloList;

@ISA = ('EPrints::Plugin::Export');

use strict;

sub new
{
my ($class, %opts) = @_;

my $self = $class->SUPER::new(%opts);

$self->{name} = 'Hello, List!';
$self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
$self->{visible} = 'all';
$self->{suffix} = '.txt';
$self->{mimetype} = 'text/plain; charset=utf-8';

return $self;
}

sub output_dataobj
{
my ($plugin, $dataobj) = @_;

return $dataobj->get_id()."\t".$dataobj->get_value('title')."\n";
}

sub output_list
{
my ($plugin, %opts) = @_;

my $r = [];

my $header = "ID\tTitle\n\n";
if (defined $opts{fh})
{
print {$opts{fh}} $header;
}
else
{
push @{$r}, $header;
}

foreach my $dataobj ($opts{list}->get_records)
{
my $part = $plugin->output_dataobj($dataobj, %opts);
if (defined $opts{fh})
{
print {$opts{fh}} $part;
}
else
{
push @{$r}, $part;
}
}

if (defined $opts{fh})
{
return undef;
}
return join('', @{$r});
}

1;
</pre>

= In More Detail =
The above code is very similar to the HelloExport.pm file in [[Contribute:_Plugins/ExportPluginsHello| the previous tutorial]] so only the points where it deviates significantly from that file will be discussed below.

== Housekeeping ==
The package name has been changed to reflect the filename.

<pre>
package EPrints::Plugin::Export::MyPlugins::HelloList;
</pre>

== Constructor ==
Make sure you give each plugin a unique name.

<pre>
$self->{name} = 'Hello, List!';
</pre>

== Dealing With Lists ==
In this example the output_list method is overridden to add column headers. The default implementation simply concatenates the output of output_dataobj for every DataObj in the list.

Note that the method is not given a bare array of DataObjs; instead, a List object is provided in the opts hash (as $opts{list}). To get an array of DataObjs to loop over, you must call that List object's get_records method.

== Filehandles ==
The command line export tool passes the output_list method a filehandle for output in the opts hash (as $opts{fh}), while the CGI export uses the value returned by the method. You must handle the filehandle or you will get no output from the command line tool. This won't matter if the plugin is only used through the web interface, but it is good practice to handle it.

The best way to handle output is to check if a filehandle has been provided every time something needs to be output. If a filehandle is provided we print to it, otherwise we save the output for later. At the end of the method we either return undef if a filehandle was provided or we return the saved output otherwise.

<pre>
sub output_list
{
my ($plugin, %opts) = @_;

my $r = [];

my $header = "ID\tTitle\n\n";
if (defined $opts{fh})
{
print {$opts{fh}} $header;
}
else
{
push @{$r}, $header;
}

foreach my $dataobj ($opts{list}->get_records)
{
my $part = $plugin->output_dataobj($dataobj, %opts);
if (defined $opts{fh})
{
print {$opts{fh}} $part;
}
else
{
push @{$r}, $part;
}
}

if (defined $opts{fh})
{
return undef;
}
return join('', @{$r});
}
</pre>
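The same filehandle-or-string pattern can be tried outside EPrints. The sketch below is a minimal, self-contained version under stated assumptions: StubList is a hypothetical stand-in for an EPrints List that only provides get_records, records are plain hashes rather than DataObjs, and output_list is an ordinary function rather than a plugin method.

<pre>
use strict;
use warnings;

# Hypothetical stand-in for an EPrints List: it only provides get_records.
package StubList;

sub new
{
	my ($class, @records) = @_;
	return bless { records => \@records }, $class;
}

sub get_records
{
	my ($self) = @_;
	return @{ $self->{records} };
}

package main;

# Print each part to $opts{fh} if a filehandle was given,
# otherwise collect the parts and return them as one string.
sub output_list
{
	my (%opts) = @_;
	my @parts;
	my $emit = sub {
		my ($text) = @_;
		if (defined $opts{fh}) { print {$opts{fh}} $text; }
		else { push @parts, $text; }
	};

	$emit->("ID\tTitle\n\n");
	foreach my $record ($opts{list}->get_records)
	{
		$emit->("$record->{id}\t$record->{title}\n");
	}

	# undef signals that the output has already gone to the filehandle
	return defined $opts{fh} ? undef : join('', @parts);
}

my $list = StubList->new(
	{ id => 1, title => 'First' },
	{ id => 2, title => 'Second' },
);

# CGI-style call: no filehandle, so the output is returned.
my $text = output_list(list => $list);

# Command-line-style call: write into an in-memory filehandle.
open(my $fh, '>', \my $buffer) or die "Cannot open in-memory filehandle: $!";
my $returned = output_list(list => $list, fh => $fh);
close($fh);

print $text;
</pre>

Routing every piece of output through the $emit closure keeps the filehandle check in one place; the tutorial's version repeats the check inline, which works equally well.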

= Testing Your Plugin =

Restart your webserver and test the plugin as in [[Contribute:_Plugins/ExportPluginsHello| the previous tutorial]].

== Sample Output ==
[[Image:Explist.png]]

Contribute: Plugins/ExportPluginsList

2007-09-28T10:24:10Z

Tom: /* In More Detail */

= Export Plugin Tutorial 2: List Handling =

In this tutorial you will create an export plugin that changes the way lists of eprints are exported. It is slightly more complex than the one created in [[Contribute:_Plugins/ExportPluginsHello| the previous tutorial]].

= HelloList.pm =
The code in the section below should be placed in a file called HelloList.pm in the directory created previously, and MyPlugins should be changed to the name of that directory.

<pre>
package EPrints::Plugin::Export::MyPlugins::HelloList;

@ISA = ('EPrints::Plugin::Export');

use strict;

sub new
{
my ($class, %opts) = @_;

my $self = $class->SUPER::new(%opts);

$self->{name} = 'Hello, List!';
$self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
$self->{visible} = 'all';
$self->{suffix} = '.txt';
$self->{mimetype} = 'text/plain; charset=utf-8';

return $self;
}

sub output_dataobj
{
my ($plugin, $dataobj) = @_;

return $dataobj->get_id()."\t".$dataobj->get_value('title')."\n";
}

sub output_list
{
my ($plugin, %opts) = @_;

my $r = [];

my $header = "ID\tTitle\n\n";
if (defined $opts{fh})
{
print {$opts{fh}} $header;
}
else
{
push @{$r}, $header;
}

foreach my $dataobj ($opts{list}->get_records)
{
my $part = $plugin->output_dataobj($dataobj, %opts);
if (defined $opts{fh})
{
print {$opts{fh}} $part;
}
else
{
push @{$r}, $part;
}
}

if (defined $opts{fh})
{
return undef;
}
return join('', @{$r});
}

1;
</pre>

= In More Detail =
The above code is very similar to the HelloExport.pm file in [[Contribute:_Plugins/ExportPluginsHello| the previous tutorial]] so only the points where it deviates significantly from that file will be discussed below.

== Housekeeping ==
The package name has been changed to reflect the filename.

<pre>
package EPrints::Plugin::Export::MyPlugins::HelloList;
</pre>

== Constructor ==
Make sure you give each plugin a unique name.

<pre>
$self->{name} = 'Hello, List!';
</pre>

== Dealing With Lists ==
In this example the output_list method is overridden to add column headers. The default implementation simply concatenates the output of output_dataobj for every DataObj in the list.

Note that the method is not given a bare array of DataObjs; instead, a List object is provided in the opts hash (as $opts{list}). To get an array of DataObjs to loop over, you must call that List object's get_records method.

== Filehandles ==
The command line export tool passes the output_list method a filehandle for output in the opts hash (as $opts{fh}), while the CGI export uses the value returned by the method. You must handle the filehandle or you will get no output from the command line tool. This won't matter if the plugin is only used through the web interface, but it is good practice to handle it.

The best way to handle output is to check if a filehandle has been provided every time something needs to be output. If a filehandle is provided we print to it, otherwise we save the output for later. At the end of the method we either return undef if a filehandle was provided or we return the saved output otherwise.

<pre>
sub output_list
{
my ($plugin, %opts) = @_;

my $r = [];

my $header = "ID\tTitle\n\n";
if (defined $opts{fh})
{
print {$opts{fh}} $header;
}
else
{
push @{$r}, $header;
}

foreach my $dataobj ($opts{list}->get_records)
{
my $part = $plugin->output_dataobj($dataobj, %opts);
if (defined $opts{fh})
{
print {$opts{fh}} $part;
}
else
{
push @{$r}, $part;
}
}

if (defined $opts{fh})
{
return undef;
}
return join('', @{$r});
}
</pre>

= Testing Your Plugin =

Restart your webserver and test the plugin as in [[User:Tom/Export_Plugins/Hello_World| the previous tutorial]].

== Sample Output ==
[[Image:Explist.png]]

Contribute: Plugins/ExportPluginsList

2007-09-28T10:23:29Z

Tom: /* Export Plugin Tutorial 2: List Handling */

= Export Plugin Tutorial 2: List Handling =

In this tutorial you will create an export plugin that changes the way lists of eprints are exported. It is slightly more complex than the one created in [[Contribute:_Plugins/ExportPluginsHello| the previous tutorial]].

= HelloList.pm =
The code in the section below should be placed in a file called HelloList.pm in the directory created previously, and MyPlugins should be changed to the name of that directory.

<pre>
package EPrints::Plugin::Export::MyPlugins::HelloList;

@ISA = ('EPrints::Plugin::Export');

use strict;

sub new
{
my ($class, %opts) = @_;

my $self = $class->SUPER::new(%opts);

$self->{name} = 'Hello, List!';
$self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
$self->{visible} = 'all';
$self->{suffix} = '.txt';
$self->{mimetype} = 'text/plain; charset=utf-8';

return $self;
}

sub output_dataobj
{
my ($plugin, $dataobj) = @_;

return $dataobj->get_id()."\t".$dataobj->get_value('title')."\n";
}

sub output_list
{
my ($plugin, %opts) = @_;

my $r = [];

my $header = "ID\tTitle\n\n";
if (defined $opts{fh})
{
print {$opts{fh}} $header;
}
else
{
push @{$r}, $header;
}

foreach my $dataobj ($opts{list}->get_records)
{
my $part = $plugin->output_dataobj($dataobj, %opts);
if (defined $opts{fh})
{
print {$opts{fh}} $part;
}
else
{
push @{$r}, $part;
}
}

if (defined $opts{fh})
{
return undef;
}
return join('', @{$r});
}

1;
</pre>

= In More Detail =
The above code is very similar to the HelloExport.pm file in [[User:Tom/Export_Plugins/Hello_World| the previous tutorial]] so only the points where it deviates significantly from that file will be discussed below.

== Housekeeping ==
The package name has been changed to reflect the filename.

<pre>
package EPrints::Plugin::Export::MyPlugins::HelloList;
</pre>

== Constructor ==
Make sure you give each plugin a unique name.

<pre>
$self->{name} = 'Hello, List!';
</pre>

== Dealing With Lists ==
In this example the output_list method is overridden to add column headers. The default implementation simply concatenates the output of output_dataobj for every DataObj in the list.

Note that the method is not given a bare array of DataObjs; instead, a List object is provided in the opts hash (as $opts{list}). To get an array of DataObjs to loop over, you must call that List object's get_records method.

== Filehandles ==
The command line export tool passes the output_list method a filehandle for output in the opts hash (as $opts{fh}), while the CGI export uses the value returned by the method. You must handle the filehandle or you will get no output from the command line tool. This won't matter if the plugin is only used through the web interface, but it is good practice to handle it.

The best way to handle output is to check if a filehandle has been provided every time something needs to be output. If a filehandle is provided we print to it, otherwise we save the output for later. At the end of the method we either return undef if a filehandle was provided or we return the saved output otherwise.

<pre>
sub output_list
{
my ($plugin, %opts) = @_;

my $r = [];

my $header = "ID\tTitle\n\n";
if (defined $opts{fh})
{
print {$opts{fh}} $header;
}
else
{
push @{$r}, $header;
}

foreach my $dataobj ($opts{list}->get_records)
{
my $part = $plugin->output_dataobj($dataobj, %opts);
if (defined $opts{fh})
{
print {$opts{fh}} $part;
}
else
{
push @{$r}, $part;
}
}

if (defined $opts{fh})
{
return undef;
}
return join('', @{$r});
}
</pre>

= Testing Your Plugin =

Restart your webserver and test the plugin as in [[User:Tom/Export_Plugins/Hello_World| the previous tutorial]].

== Sample Output ==
[[Image:Explist.png]]

User:Tom

2007-09-27T21:09:07Z

Tom: /* Works In Progress */

= Tom =

== Works In Progress ==
A place to store the things I'm working on, before they're ready for the wiki proper.

* [[User:Tom/Export_Plugins | Export Plugins]]
* [[User:Tom/Import_Plugins | Import Plugins]]

How to write plugins

2007-09-27T21:06:33Z

Tom: /* Import Plugin */

== Write a Plugin! ==

The plugin system for EPrints 3 has been developed to make it easy to create and share the most common extensions to the code without having to hack the core system (and cause yourself problems with upgrades etc.).

When the system loads, it automatically loads all modules in the perl_lib/EPrints/Plugin/ directory, so for simple plugins you just drop them in that directory and you're done!

Each plugin has a registration method that is called when the plugin is loaded and tells the core EPrints system what the plugin does. EPrints then makes it available where appropriate.

Clever plugins can detect features they need and adapt to use the tools available, or disable themselves if they are missing required tools (rather than crash the system). Some specialised plugins are disabled in their default state and must be enabled in the repository configuration.

Another cool thing is that plugins are Perl objects, which means you can subclass them. Here's a real-world example: we have a research group which uses BibTeX but has, over the years, standardised within the group on an extra field. This is not a valid BibTeX field, but it is essential to their work because they have ancient and essential scripts which depend on it. To handle this we can subclass the default BibTeX export plugin and override a single method (the data mapping one). Our version calls the parent plugin's mapping method to do all the heavy lifting, then adds the non-standard extra field. Total code required: less than one screen. Number of happy researchers: none (they are never satisfied and will just demand the moon on a stick because you've already given them this nice new feature), but number of researchers able to get their work done: lots. Don't believe me? [[BibTeX Extension Example|look here]]!
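The subclassing pattern described above can be sketched in plain Perl. The package names My::BaseExport and My::GroupExport, the convert_dataobj method and the groupcode field are all hypothetical placeholders, not the actual EPrints BibTeX plugin; the sketch only shows how a subclass calls its parent's mapping method through SUPER and then adds one extra field.

<pre>
use strict;
use warnings;

# Hypothetical parent class standing in for a stock export plugin.
package My::BaseExport;

sub new
{
	my ($class) = @_;
	return bless {}, $class;
}

# Map a record (a plain hash here) to the fields the format needs.
sub convert_dataobj
{
	my ($self, $record) = @_;
	return { title => $record->{title}, year => $record->{year} };
}

# Hypothetical subclass: only the mapping method is overridden.
package My::GroupExport;

our @ISA = ('My::BaseExport');

sub convert_dataobj
{
	my ($self, $record) = @_;

	# Let the parent do the heavy lifting...
	my $data = $self->SUPER::convert_dataobj($record);

	# ...then add the group's non-standard extra field.
	$data->{groupcode} = $record->{groupcode} if defined $record->{groupcode};
	return $data;
}

package main;

my $plugin = My::GroupExport->new;
my $out = $plugin->convert_dataobj({ title => 'Example', year => 2007, groupcode => 'G42' });
print join(', ', map { $_ . '=' . $out->{$_} } sort keys %{$out}), "\n";
</pre>

Because only convert_dataobj is overridden, any fix made to the parent's mapping is picked up by the subclass automatically.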

* See also [[Extension Packages]] for how to add configuration and resource files to plugins.

=== Types of Plugin ===

There are a number of different kinds of plugin for EPrints...

==== Export Plugin ====

These are used to export the data in a variety of formats. A number of tutorials have been created to help you create your own export plugins:

* [[Contribute:_Plugins/ExportPluginsHello| Export Plugin Tutorial 1: "Hello, World!"]]
* [[Contribute:_Plugins/ExportPluginsList| Export Plugin Tutorial 2: List handling]]
* [[Contribute:_Plugins/ExportPluginsHTML| Export Plugin Tutorial 3: HTML]]
* [[Contribute:_Plugins/ExportPluginsExcel| Export Plugin Tutorial 4: Excel]]
* [[Contribute:_Plugins/ExportPluginsZip| Export Plugin Tutorial 5: Zip]]

==== Import Plugin ====

These are used to import data into a repository. They can take datafiles directly, or they can take an ID of a record that can be retrieved in a known way, or a URL of a file, or... whatever.

These are a bit trickier to write than export plugins as parsing data is harder than just "print"ing it, but they are still reasonably straightforward. To get you started a small selection of tutorials has been written:

* [[Contribute:_Plugins/ImportPluginsCSV| Import Plugin Tutorial 1: CSV]]
* [[Contribute:_Plugins/ImportPluginsAWS| Import Plugin Tutorial 2: Amazon Web Services]]

==== Screen Plugin ====

These handle (almost) all the user interface screens. Pages like "Review" and "Profile" are just built-in plugins. You can add your own very easily.

Examples you could create...
* Birds Eye View - a view of various statistics on the database, all in one page.
* Spellchecking Tab - an additional tab in the item control page which checks the spelling on certain fields.
* Bulk Delete tool - a tool which takes a list of eprint IDs and deletes them all in one fell swoop.

Look at the existing Screen Plugins for an idea of how they work. They can be very simple.

==== Input Component Plugin ====

These handle how the workflow components are rendered. Built-in components include the default (one field) component, the multiple fields component, the upload component, the subject component (which does pretty things to a field of type "subject") and the XHTML component. You can add your own or subclass existing ones.

==== Convert Plugin ====

These are currently used for two things:

* Converting the full text of documents into UTF-8 text for search indexing
* Converting images and PDFs into thumbnails and previews

Some examples you could create:

* RTF to UTF-8 to allow rich text documents to be indexed.
* PowerPoint to Thumbnail to allow thumbnails and previews of PowerPoint slides.
* Video to Thumbnail/Preview to make a still preview of a video file.

Contribute: Plugins/ImportPluginsAWS

2007-09-27T20:59:54Z

Tom: User:Tom/Import Plugins/Web Services moved to Contribute: Plugins/ImportPluginsAWS

= Import Plugin Tutorial 2: Amazon Web Services =

In the last tutorial we created an import plugin that took data which needed very little modification to import into the repository. The column names in the CSV file matched the names of metadata fields present in the repository. In this tutorial we'll look at importing data that needs some modification before it can be imported, needs more error checking and is obtained in a different way.

We will be using Amazon's E-Commerce Web Service to import books from their website into our repository, given a list of ASINs (Amazon Standard Identification Numbers).

We will be accessing the service using a REST approach, communicating with the server using URL parameters and retrieving an XML document in response to our request. It is also possible to access their services using SOAP, but that will not be discussed here.

= Before You Start =

== Amazon Web Services ==
To use Amazon's web services you must first sign up for an account [http://aws.amazon.com here]. Their site has extensive documentation on the services that they offer, as well as example programs including some written in Perl.

== Required Modules ==
To prepare for this tutorial you should make sure the [http://search.cpan.org/~gaas/libwww-perl-5.805/lib/LWP/UserAgent.pm LWP::UserAgent] module is installed. Running the following command as root (or using sudo) should install it.

<pre>
cpan LWP::UserAgent
</pre>

= AWS.pm =
<pre>
package EPrints::Plugin::Import::MyPlugins::AWS;

use EPrints::Plugin::Import::TextFile;
use strict;
use URI::Escape;

our @ISA = ('EPrints::Plugin::Import::TextFile');

my $endpoint = 'http://ecs.amazonaws.co.uk/onca/xml';
my $accesskey = '<YOURAMAZONWSKEY>';
my $service = 'AWSECommerceService';
my $operation = 'ItemLookup';
my $version = '2007-07-16';
my $responsegroup = 'Large';

sub new
{
my( $class, %params ) = @_;
my $self = $class->SUPER::new( %params );

$self->{name} = 'AWS';
$self->{visible} = 'all';
$self->{produce} = [ 'list/eprint' , 'dataobj/eprint'];

my $rc = EPrints::Utils::require_if_exists('LWP::UserAgent');
unless ($rc)
{
$self->{visible} = '';
$self->{error} = 'Unable to load required module LWP::UserAgent';
}

return $self;
}

sub input_fh
{
my( $plugin, %opts ) = @_;
my @ids;
my $fh = $opts{fh};

my @records = <$fh>;
foreach my $input_data (@records)
{
my $epdata = $plugin->convert_input($input_data);
next unless defined $epdata;

my $dataobj = $plugin->epdata_to_dataobj($opts{dataset},$epdata);
if( defined $dataobj )
{
push @ids, $dataobj->get_id;
}
}

return EPrints::List->new(
dataset => $opts{dataset},
session => $plugin->{session},
ids=>\@ids );
}

sub convert_input
{
my ($plugin, $input) = @_;
my %output = ();

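# ASINs are 10 alphanumeric characters (for books, the ISBN-10,
# which may end in 'X'); extract the first one found on the line.
unless ($input =~ m/\b([0-9A-Za-z]{10})\b/)
{
$plugin->warning('No ASIN found in input line');
return undef;
}
$input = uc $1;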

my $request =
"$endpoint?".
"Service=$service&".
"AWSAccessKeyId=$accesskey&".
"Operation=$operation&".
"ItemId=$input&".
"Version=$version&".
"ResponseGroup=$responsegroup";

my $ua = LWP::UserAgent->new;
$ua->timeout(30);
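my $response = $ua->get($request);

unless ($response->is_success)
{
$plugin->error('AWS request failed: '.$response->status_line);
return undef;
}

my $dom = EPrints::XML::parse_xml_string($response->content);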

my $rep =
$dom->getElementsByTagName('Items')->item(0)->
getElementsByTagName('Request')->item(0);

my $reptext =
EPrints::Utils::tree_to_utf8($rep->getElementsByTagName('IsValid')->item(0));

unless ($reptext eq 'True')
{
$plugin->error('Invalid AWS Request');
return undef;
}

#Get Item Object
my $item =
$dom->getElementsByTagName('Items')->item(0)->
getElementsByTagName('Item')->item(0);

unless (defined $item)
{
$plugin->error('No Item element found');
return undef;
}

my $attr = $item->getElementsByTagName('ItemAttributes')->item(0);

my $pg = EPrints::Utils::tree_to_utf8($attr->getElementsByTagName('ProductGroup')->item(0));

unless ($pg eq 'Book')
{
$plugin->error('Product is not a book.');
return undef;
}

$output{type} = 'book';
$output{refereed} = 'FALSE';
$output{ispublished} = 'pub';

my $title = $attr->getElementsByTagName('Title')->item(0);
$output{title} = EPrints::Utils::tree_to_utf8($title);

my $url = $item->getElementsByTagName('DetailPageURL')->item(0);
$output{official_url} = uri_unescape(EPrints::Utils::tree_to_utf8($url));

my $isbn = $attr->getElementsByTagName('ISBN')->item(0);
if (defined $isbn)
{
$output{isbn} = EPrints::Utils::tree_to_utf8($isbn);
}

my $pages = $attr->getElementsByTagName('NumberOfPages')->item(0);
if (defined $pages)
{
$output{pages} = EPrints::Utils::tree_to_utf8($pages);
}

my $publisher = $attr->getElementsByTagName('Publisher')->item(0);
if (defined $publisher)
{
$output{publisher} = EPrints::Utils::tree_to_utf8($publisher);
}

my $pubdate = $attr->getElementsByTagName('PublicationDate')->item(0);
if (defined $pubdate)
{
$output{date} = EPrints::Utils::tree_to_utf8($pubdate);
}

return \%output;
}

1;
</pre>

= In More Detail =
We will use the URI::Escape module in this plugin. As it is included with EPrints we don't need to check if it exists first.
<pre>
use URI::Escape;
</pre>

Here we set up a number of values for parameters that will be part of our web service requests. The endpoint variable determines which server the request is sent to. Here we have used the UK server, but by changing the top-level domain we can use the US, Canadian, German or French servers.

The accesskey variable stores the access key you obtained when you signed up with Amazon earlier. Use your normal Access Key ID, not your Secret Access Key.

Here we use the ItemLookup operation of the AWSECommerceService with the 2007-07-16 version of the service API. Other operations allow searching for items, but here we want to look up specific products. Finally, the responsegroup variable determines the amount and nature of the information returned; we select "Large" in this case, which gives a lot of information about the item.

<pre>
my $endpoint = 'http://ecs.amazonaws.co.uk/onca/xml';
my $accesskey = '<YOURAMAZONWSKEY>';
my $service = 'AWSECommerceService';
my $operation = 'ItemLookup';
my $version = '2007-07-16';
my $responsegroup = 'Large';
</pre>

== Constructor ==
The constructor is similar to the one used for the CSV plugin, except that this plugin declares that it can produce individual eprints (dataobj/eprint) as well as lists, since each ASIN identifies a single item.
<pre>
$self->{produce} = [ 'list/eprint' , 'dataobj/eprint'];
</pre>

As we did with Text::CSV in the last tutorial, we load LWP::UserAgent (which will be used to make requests to the web service) only if it is installed. If it is missing, the plugin is hidden and an error message is recorded.
<pre>
my $rc = EPrints::Utils::require_if_exists('LWP::UserAgent');
unless ($rc)
{
$self->{visible} = '';
$self->{error} = 'Unable to load required module LWP::UserAgent';
}
</pre>

== Input ==
=== input_fh ===
This method is similar to the one used in the CSV plugin, but doesn't have to do quite so much work.

First we create the array to hold our imported eprint ids.
<pre>
my @ids;
</pre>

Next we read all the lines in the supplied file handle into our records array.
<pre>
my $fh = $opts{fh};

my @records = <$fh>;
</pre>

Then we iterate over each record, running convert_input on it, importing it into our repository and adding the id to our array.
<pre>
foreach my $input_data (@records)
{
my $epdata = $plugin->convert_input($input_data);
next unless defined $epdata;

my $dataobj = $plugin->epdata_to_dataobj($opts{dataset},$epdata);
if( defined $dataobj )
{
push @ids, $dataobj->get_id;
}
}
</pre>

Then we return a List object of the items imported.
<pre>
return EPrints::List->new(
dataset => $opts{dataset},
session => $plugin->{session},
ids=>\@ids );
</pre>

=== convert_input ===

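An ASIN (Amazon Standard Identification Number) is a 10-character alphanumeric code that identifies a product; for books it is the ISBN-10, whose final check character may be 'X'. Here we extract the first such code from the input line, ignoring any surrounding characters, and return undef with a warning if none is found.
<pre>
unless ($input =~ m/\b([0-9A-Za-z]{10})\b/)
{
$plugin->warning('No ASIN found in input line');
return undef;
}
$input = uc $1;
</pre>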
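A minimal, standalone sketch of the ASIN extraction step, assuming an ASIN is a 10-character alphanumeric code (for books, the ISBN-10, which may end in 'X'). The helper name extract_asin is hypothetical, and 123456789X below is a made-up value used only to show the 'X' case.

<pre>
use strict;
use warnings;

# Hypothetical helper: return the first 10-character alphanumeric
# token on the line, upper-cased, or undef if there is none.
sub extract_asin
{
	my ($line) = @_;
	return undef unless defined $line;
	return $line =~ m/\b([0-9A-Za-z]{10})\b/ ? uc $1 : undef;
}

foreach my $line ("0946719616\n", "  123456789x  \n", "\n")
{
	my $asin = extract_asin($line);
	my $shown = defined $asin ? $asin : 'no ASIN found';
	print "$shown\n";
}
</pre>

Because of the word boundaries, a longer run of characters such as a 13-digit ISBN does not match, so each line should contain the 10-character ASIN itself.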

We form the request from the variables we created earlier and the ASIN we have just obtained.
<pre>
my $request =
"$endpoint?".
"Service=$service&".
"AWSAccessKeyId=$accesskey&".
"Operation=$operation&".
"ItemId=$input&".
"Version=$version&".
"ResponseGroup=$responsegroup";
</pre>
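The request above is built by simple string concatenation, which works as long as none of the values contain characters that must be escaped in a URL. A slightly more defensive, standalone sketch percent-encodes each name and value first. The helpers escape_param and build_request are hypothetical, and 'YOURKEY' is a placeholder for your access key.

<pre>
use strict;
use warnings;

# Percent-encode every character except the RFC 3986 unreserved set.
# Assumes byte strings (not decoded wide characters).
sub escape_param
{
	my ($value) = @_;
	$value =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
	return $value;
}

# Hypothetical helper: join an endpoint and ordered name/value pairs
# into a GET request URL.
sub build_request
{
	my ($endpoint, @pairs) = @_;
	my @query;
	while (my ($name, $value) = splice(@pairs, 0, 2))
	{
		push @query, escape_param($name) . '=' . escape_param($value);
	}
	return $endpoint . '?' . join('&', @query);
}

my $request = build_request('http://ecs.amazonaws.co.uk/onca/xml',
	Service        => 'AWSECommerceService',
	AWSAccessKeyId => 'YOURKEY',
	Operation      => 'ItemLookup',
	ItemId         => '0946719616',
	Version        => '2007-07-16',
	ResponseGroup  => 'Large',
);
print "$request\n";
</pre>

Inside the plugin, the uri_escape function from URI::Escape, which the plugin already loads, could be used for the per-value encoding instead of a hand-written helper.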

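We now send the request by creating a new LWP::UserAgent object, setting its timeout to 30 seconds and performing the request using HTTP GET. If the request fails (for example because the server could not be reached) we report an error and return undef.
<pre>
my $ua = LWP::UserAgent->new;
$ua->timeout(30);
my $response = $ua->get($request);

unless ($response->is_success)
{
$plugin->error('AWS request failed: '.$response->status_line);
return undef;
}
</pre>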

We then create a DOM object from the XML document returned.
<pre>
my $dom = EPrints::XML::parse_xml_string($response->content);
</pre>

Each response contains an Items element within the root element of the document, which in turn contains a Request element. This element contains an IsValid element, whose value is True or False depending on whether a valid request was made.

Here we obtain the Request element and check that the IsValid element within it contains the value True. If it doesn't we call the error method and return undef.
<pre>
my $rep =
$dom->getElementsByTagName('Items')->item(0)->
getElementsByTagName('Request')->item(0);

my $reptext =
EPrints::Utils::tree_to_utf8($rep->getElementsByTagName('IsValid')->item(0));

unless ($reptext eq 'True')
{
$plugin->error('Invalid AWS Request');
return undef;
}
</pre>

The product found by the ItemLookup operation is contained in an Item element within the Items element. Here we attempt to get that element, and report an error and return undef if we can't.
<pre>
my $item =
$dom->getElementsByTagName('Items')->item(0)->
getElementsByTagName('Item')->item(0);

unless (defined $item)
{
$plugin->error('No Item element found');
return undef;
}
</pre>

Each item contains an ItemAttributes element which contains most of the metadata about an item.
<pre>
my $attr = $item->getElementsByTagName('ItemAttributes')->item(0);
</pre>

For some specialised repositories it might make sense to import DVDs, computer games and electronic equipment, but we're just going to deal with books. The ProductGroup element within the ItemAttributes element tells you what sort of item we're dealing with. We're looking for the value 'Book'.
<pre>
my $pg = EPrints::Utils::tree_to_utf8($attr->getElementsByTagName('ProductGroup')->item(0));

unless ($pg eq 'Book')
{
$plugin->error('Product is not a book.');
return undef;
}
</pre>

Here we set a few fields without consulting the imported data. We know this is a book, so we set the type. We assume that it has not been refereed. We also assume it has been published, because we can buy it.
<pre>
$output{type} = 'book';
$output{refereed} = 'FALSE';
$output{ispublished} = 'pub';
</pre>

We get and set the title.
<pre>
my $title = $attr->getElementsByTagName('Title')->item(0);
$output{title} = EPrints::Utils::tree_to_utf8($title);
</pre>

Here we set the official URL to the Amazon product page. We have to do a bit of extra work using the uri_unescape method from the URI::Escape package to convert URI escape codes into characters.
<pre>
my $url = $item->getElementsByTagName('DetailPageURL')->item(0);
$output{official_url} = uri_unescape(EPrints::Utils::tree_to_utf8($url));
</pre>

We can set the ISBN. Note that the ISBN is often the same as the ASIN.
<pre>
my $isbn = $attr->getElementsByTagName('ISBN')->item(0);
if (defined $isbn)
{
$output{isbn} = EPrints::Utils::tree_to_utf8($isbn);
}
</pre>

We can set the number of pages.
<pre>
my $pages = $attr->getElementsByTagName('NumberOfPages')->item(0);
if (defined $pages)
{
$output{pages} = EPrints::Utils::tree_to_utf8($pages);
}
</pre>

We can set the publisher.
<pre>
my $publisher = $attr->getElementsByTagName('Publisher')->item(0);
if (defined $publisher)
{
$output{publisher} = EPrints::Utils::tree_to_utf8($publisher);
}
</pre>

We can set the publication date and finally return our output hash.
<pre>
my $pubdate = $attr->getElementsByTagName('PublicationDate')->item(0);
if (defined $pubdate)
{
$output{date} = EPrints::Utils::tree_to_utf8($pubdate);
}

return \%output;
</pre>

= Testing Your Plugin =
After restarting your webserver go to the Import Items screen from the Manage Deposits screen. If you can't find this, make sure you're logged in.

We'll start by collecting a few ASINs. Go to [http://www.amazon.co.uk Amazon] and pick a few books. The URL for each product page is in the form http://www.amazon.co.uk/Combination-of-title-and-author/dp/ASIN... Collect a few different ASINs.

Now we'll demonstrate importing from Amazon with a few sample ASINs. Type these into the "Cut and Paste Records" box:
<pre>
0946719616
0297843877
</pre>

Select "AWS" from the "Select import format" drop-down menu and click "Test Run + Import". You should end up at the Manage Deposits screen with the message "Import completed: 2 item(s) imported." displayed.

User:Tom/Import Plugins/Web Services

2007-09-27T20:59:54Z

Tom: User:Tom/Import Plugins/Web Services moved to Contribute: Plugins/ImportPluginsAWS

#redirect [[Contribute: Plugins/ImportPluginsAWS]]

Contribute: Plugins/ImportPluginsCSV

2007-09-27T20:59:32Z

Tom: User:Tom/Import Plugins/CSV moved to Contribute: Plugins/ImportPluginsCSV

= Import Plugin Tutorial 1: CSV =

In this tutorial we will look at creating a relatively simple plugin to import eprints into our repository by reading files containing comma-separated values. We won't be dealing with documents and files, but will focus on importing eprint metadata.

Import plugins are inherently more complicated than export plugins because of the error checking that must be done. In this example error checking has been kept to a minimum to keep the code simple, but in a "real" plugin you should check that the appropriate metadata fields are set for a given type of eprint; unfortunately there appears to be no quick way to do this.

= Before You Start =

It is sensible to keep the plugins you create for EPrints separate from those included with it. Create a directory for your import plugins in the main import plugin directory (usually /opt/eprints3/perl_lib/EPrints/Plugin/Import), for example /opt/eprints3/perl_lib/EPrints/Plugin/Import/MyPlugins.

To prepare for this tutorial you should install the [http://search.cpan.org/~erangel/Text-CSV/CSV.pm Text::CSV] module. Running the following command as root (or using sudo) should install it.

<pre>
cpan Text::CSV
</pre>

= CSV.pm =
<pre>
package EPrints::Plugin::Import::MyPlugins::CSV;

use EPrints::Plugin::Import::TextFile;
use strict;

our @ISA = ('EPrints::Plugin::Import::TextFile');

sub new
{
    my( $class, %params ) = @_;

    my $self = $class->SUPER::new( %params );

    $self->{name} = 'CSV';
    $self->{visible} = 'all';
    $self->{produce} = [ 'list/eprint' ];

    my $rc = EPrints::Utils::require_if_exists('Text::CSV');
    unless( $rc )
    {
        $self->{visible} = '';
        $self->{error} = 'Failed to load required module Text::CSV';
    }

    return $self;
}

sub input_fh
{
    my( $plugin, %opts ) = @_;
    my @ids;
    my $fh = $opts{fh};
    my @records = <$fh>;
    my $csv = Text::CSV->new();
    my @fields;

    if ($csv->parse(shift @records))
    {
        @fields = $csv->fields();
    }
    else
    {
        $plugin->error($csv->error_input);
        return undef;
    }

    foreach my $row (@records)
    {
        my @input_data = (join(',',@fields),$row);

        my $epdata = $plugin->convert_input(\@input_data);
        next unless defined $epdata;

        my $dataobj = $plugin->epdata_to_dataobj($opts{dataset},$epdata);
        if( defined $dataobj )
        {
            push @ids, $dataobj->get_id;
        }
    }

    return EPrints::List->new(
        dataset => $opts{dataset},
        session => $plugin->{session},
        ids=>\@ids );
}

sub convert_input
{
    my $plugin = shift;
    my @input = @{shift @_};
    my $csv = Text::CSV->new();

    my @record;
    if ($csv->parse($input[1]))
    {
        @record = $csv->fields();
    }
    else
    {
        $plugin->error($csv->error_input);
        return undef;
    }

    my @fields = split(',',$input[0]);

    if (scalar @fields != scalar @record)
    {
        $plugin->warning('Row length mismatch');
        return undef;
    }

    my %output = ();

    my $dataset = $plugin->{session}->{repository}->get_dataset('archive');

    my $i = 0;
    foreach my $field (@fields)
    {
        unless ($dataset->has_field($field))
        {
            $i++;
            next;
        }

        my $metafield = $dataset->get_field($field);

        if ($metafield->get_property('multiple'))
        {
            my @values = split(';',$record[$i]);

            if ($metafield->{type} eq 'name')
            {
                my @names = ();

                foreach my $value (@values)
                {
                    my $name = $value;

                    next unless ($value =~ /^(.*?),(.*?)(,(.*?))?$/);
                    push @names, {family => $1,given => $2,lineage => $4};
                }

                $output{$field} = \@names;
            }
            else
            {
                $output{$field} = \@values;
            }
        }
        else
        {
            $output{$field} = $record[$i];
        }
        $i++;
    }
    return \%output;
}

1;

</pre>

= In More Detail =
== Modules ==
Here we import the superclass for our plugin.

<pre>
use EPrints::Plugin::Import::TextFile;
</pre>

== Inheritance ==

Our plugin will not inherit from the Import class directly, but from the TextFile subclass. This contains some extra file handling code that means we can ignore certain differences in text file formats. If you are creating an import plugin which imports non-text files you should subclass the EPrints::Plugin::Import class directly.

<pre>
our @ISA = ('EPrints::Plugin::Import::TextFile');
</pre>

== Constructor ==
For import plugins we must set a 'produce' property, to tell the repository what kinds of objects the plugin can import. This plugin only supports importing lists of eprints, but if it supported importing individual eprints we could add 'dataobj/eprint' to this property. We would then have to implement the "input_dataobj" method. Most plugins implement this method, but it is rarely used in practice. Most imports are done in lists (even if that list only contains one member), via the import items screen.

<pre>
$self->{produce} = [ 'list/eprint' ];
</pre>

Here we use a module that is not included with EPrints, Text::CSV, so we load it in a different way. "EPrints::Utils::require_if_exists" checks whether the module is installed and loads it if it is. If it isn't, we make the plugin invisible and record an error message. It is good practice to load non-standard modules in this way rather than with "use", which would fail at compile time if the module were missing.
<pre>
my $rc = EPrints::Utils::require_if_exists('Text::CSV');
unless( $rc )
{
    $self->{visible} = '';
    $self->{error} = 'Failed to load required module Text::CSV';
}
</pre>
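For readers curious about what this pattern amounts to, here is a minimal stand-alone sketch in plain Perl. It is not EPrints code: the helper try_require is hypothetical and only assumes that require_if_exists behaves roughly like "try to require the module and report whether it worked".

<pre>
use strict;
use warnings;

# Hypothetical helper (not the EPrints implementation): attempt to load a
# module at run time and report whether it succeeded, instead of dying at
# compile time as "use" would.
sub try_require
{
    my ($module) = @_;

    # Refuse anything that does not look like a Perl package name,
    # since the name is interpolated into a string eval.
    return 0 unless $module =~ /^\w+(?:::\w+)*$/;

    return eval "require $module; 1" ? 1 : 0;
}

my %plugin = (name => 'CSV', visible => 'all');
unless (try_require('Text::CSV'))
{
    # Hide the plugin rather than breaking the whole plugin system.
    $plugin{visible} = '';
    $plugin{error}   = 'Failed to load required module Text::CSV';
}
print "visible: '$plugin{visible}'\n";
</pre>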

== Input ==
Import plugins have to implement a couple of methods to read data from a file or string, manipulate it and turn it into a form which can be imported into the repository. That process will be described below.

=== input_fh ===

This method takes a filehandle, processes its contents, tries to import DataObjs into the repository and then returns a List of the DataObjs imported.

This array will hold the ids of the imported DataObjs, and is used to create a List at the end.
<pre>
my @ids;
</pre>

Here we take the filehandle passed in and read all of its lines into an array.
<pre>
my $fh = $opts{fh};
my @records = <$fh>;
</pre>

We create a Text::CSV object to handle the input. Using a dedicated CSV handling package is preferable to using Perl's split function, as it handles a number of more complicated cases, such as fields which contain commas and are enclosed in double quotes.
<pre>
my $csv = Text::CSV->new();
</pre>
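To see why split is not enough, the short stand-alone sketch below compares split with parse_line from Text::ParseWords, a quote-aware parser that ships with Perl. parse_line is swapped in here purely for illustration and is not what the plugin uses. Unlike Text::CSV, it also treats single quotes as quote characters, so an unquoted value such as Don't would confuse it.

<pre>
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my $row = q{An interesting article,"Damn it Jim, I'm a Doctor, not a Perl hacker."};

# A naive split breaks the quoted abstract at every comma.
my @naive = split(/,/, $row);

# parse_line honours double quotes; the second argument (0) strips the quotes.
my @parsed = parse_line(',', 0, $row);

printf "split: %d fields, parse_line: %d fields\n", scalar @naive, scalar @parsed;
print "abstract: $parsed[1]\n";
</pre>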

After setting up an array for metadata field names, we attempt to parse the first line of our file. The parse method does not return an array of fields, but reports success or failure. In the event of success we use the fields method to return the fields just parsed. In the event of failure we pass the input that could not be parsed (which is what the error_input method returns) to the plugin's error method, and return undef.
<pre>
my @fields;
if ($csv->parse(shift @records))
{
    @fields = $csv->fields();
}
else
{
    $plugin->error($csv->error_input);
    return undef;
}
</pre>

Now that the row of column titles has been dealt with we move onto processing each record in the file.

In import plugins the convert_input method converts an individual record into a form that can be imported into the repository: a hash whose keys are metadata field names and whose values are the corresponding values. A data row on its own is not enough, because it doesn't tell us which field each value belongs to. We therefore construct an array to pass to convert_input. Its first element is the row of field names (joined back together with commas) and its second element is the row we want to import.
<pre>
foreach my $row (@records)
{
    my @input_data = (join(',',@fields),$row);
</pre>

Here we call convert_input on our constructed input_data. If the conversion fails we simply move to the next record.
<pre>
my $epdata = $plugin->convert_input(\@input_data);
next unless defined $epdata;
</pre>

The epdata_to_dataobj method takes our epdata hash reference and turns it into a new DataObj in our repository. If it is successful it returns the new DataObj, whose id we add to our array of ids.
<pre>
my $dataobj = $plugin->epdata_to_dataobj($opts{dataset},$epdata);
if( defined $dataobj )
{
    push @ids, $dataobj->get_id;
}
</pre>

Finally we return a List object containing the ids of the records we have successfully imported.
<pre>
return EPrints::List->new(
    dataset => $opts{dataset},
    session => $plugin->{session},
    ids=>\@ids );
</pre>
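The overall flow of input_fh can be modelled outside EPrints. The sketch below is hypothetical: read_records is not an EPrints API, and it parses with Text::ParseWords instead of Text::CSV so that it runs with core Perl only. It also returns plain hashes, where the real method would call epdata_to_dataobj and collect ids.

<pre>
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical stand-in for input_fh: read a header row and data rows from
# a filehandle and return one hash reference per usable row.
sub read_records
{
    my ($fh) = @_;
    my @records = <$fh>;
    chomp @records;

    my $header = shift @records;
    return undef unless defined $header;
    my @fields = parse_line(',', 0, $header);

    my @rows;
    foreach my $row (@records)
    {
        my @values = parse_line(',', 0, $row);
        # Skip rows that do not line up with the header, as convert_input does.
        next unless @values == @fields;
        my %epdata;
        @epdata{@fields} = @values;
        push @rows, \%epdata;
    }
    return \@rows;
}

my $input = "title,abstract\n"
          . "This is a test title,This is a test abstract\n"
          . "This is another test title,This is another test abstract\n";
open(my $fh, '<', \$input) or die "Could not open in-memory input: $!";
my $rows = read_records($fh);
print scalar(@$rows), " record(s) parsed\n";
</pre>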

=== convert_input ===
This method takes data in a particular format, in this case CSV, and transforms it into a hash of metadata field names and values.

We take the second argument to the method (the array reference built in input_fh) and dereference it into an array.
<pre>
my @input = @{shift @_};
</pre>

Here we set up another Text::CSV object.
<pre>
my $csv = Text::CSV->new();
</pre>

We take the second element of our array and parse it. This is the record we wish to import. If anything goes wrong we return undef.
<pre>
my @record;
if ($csv->parse($input[1]))
{
    @record = $csv->fields();
}
else
{
    $plugin->error($csv->error_input);
    return undef;
}
</pre>

We split the first element to get the field names, then check that the number of field names matches the number of values in the record.
<pre>
my @fields = split(',',$input[0]);

if (scalar @fields != scalar @record)
{
    $plugin->warning('Row length mismatch');
    return undef;
}
</pre>

This is the hash that we'll return later.
<pre>
my %output = ();
</pre>

For convenience we get the DataSet object.
<pre>
my $dataset = $plugin->{session}->{repository}->get_dataset('archive');
</pre>

We now iterate over the fields.
<pre>
my $i = 0;
foreach my $field (@fields)
{
</pre>

If the field does not exist we look at the next one, remembering to increment our index.
<pre>
unless ($dataset->has_field($field))
{
    $i++;
    next;
}
</pre>

We get the MetaField object corresponding to the current field.
<pre>
my $metafield = $dataset->get_field($field);
</pre>

Fields that can hold multiple values are expected to have their individual values separated by semicolons, so we split the value on ';'.
<pre>
if ($metafield->get_property('multiple'))
{
    my @values = split(';',$record[$i]);
</pre>

Name fields are handled by matching each value against a regular expression and building a hash from the parts matched. The plugin expects names to be of the form Surname, Forenames, Lineage (Sr, Jr, III, etc.), where the lineage is optional.
<pre>
if ($metafield->{type} eq 'name')
{
    my @names = ();

    foreach my $value (@values)
    {
        my $name = $value;

        next unless ($value =~ /^(.*?),(.*?)(,(.*?))?$/);
        push @names, {family => $1,given => $2,lineage => $4};
    }

    $output{$field} = \@names;
}
</pre>
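Note that this pattern does not trim spaces, so a value written as "Bloggs, Joe" yields a given name with a leading space, at least as far as this code is concerned. The stand-alone sketch below demonstrates this and shows one possible whitespace-trimming variant. The variant is our suggestion, not part of the tutorial's plugin.

<pre>
use strict;
use warnings;

# The pattern used by the plugin: family, given, optional lineage.
sub parse_name_original
{
    my ($value) = @_;
    return undef unless $value =~ /^(.*?),(.*?)(,(.*?))?$/;
    return { family => $1, given => $2, lineage => $4 };
}

# Suggested variant (not the tutorial's code): also trims whitespace around
# each part, so "Bloggs, Joe" gives a given name of "Joe".
sub parse_name_trimmed
{
    my ($value) = @_;
    return undef unless $value =~ /^\s*(.*?)\s*,\s*(.*?)\s*(?:,\s*(.*?)\s*)?$/;
    return { family => $1, given => $2, lineage => $3 };
}

my $orig = parse_name_original('Doe, John, Jr');
my $trim = parse_name_trimmed('Doe, John, Jr');
print "original given: '$orig->{given}', trimmed given: '$trim->{given}'\n";
</pre>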

Multiple fields which are not names are just added to the hash as an array reference.
<pre>
$output{$field} = \@values;
</pre>

Fields that hold a single value are added to the hash directly from the parsed record.
<pre>
$output{$field} = $record[$i];
</pre>

Finally we return a hash reference.
<pre>
return \%output;
</pre>
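Putting the pieces of convert_input together, the following stand-alone sketch models the mapping step in plain Perl. The %known_fields hash is a hypothetical stand-in for the archive DataSet and its MetaFields, and name handling is left out. Iterating over column indexes also removes the need to increment $i by hand before each next.

<pre>
use strict;
use warnings;

# Hypothetical field definitions standing in for the archive DataSet:
# the value says whether the field can hold multiple values.
my %known_fields = (title => 0, abstract => 0, subjects => 1);

# Sketch of the convert_input mapping: pair each header name with the value
# in the same column, skip unknown fields, and split multiple fields on ';'.
sub to_epdata
{
    my ($fields, $record) = @_;
    return undef unless @$fields == @$record;   # row length mismatch

    my %output;
    for my $i (0 .. $#$fields)
    {
        my $field = $fields->[$i];
        next unless exists $known_fields{$field};
        $output{$field} = $known_fields{$field}
            ? [ split(/;/, $record->[$i]) ]
            : $record->[$i];
    }
    return \%output;
}

my $epdata = to_epdata([qw(abstract title subjects colour)],
                       ['Testing', 'Testing', 'AI;C;M;F;P', 'blue']);
print join(',', @{ $epdata->{subjects} }), "\n";
</pre>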

= Testing Your Plugin =

After restarting your webserver, go to the Import Items screen from the Manage Deposits screen. If you can't find it, make sure you're logged in.

Type this into the "Cut and Paste Records" box:

<pre>
title,abstract
This is a test title,This is a test abstract
This is another test title,This is another test abstract
</pre>

Select "CSV" from the "Select import format" drop-down menu and click "Test Run + Import". You should end up at the Manage Deposits screen with the message "Import completed: 2 item(s) imported." displayed.

== Embedding commas ==
If you want to include commas in a field, which is very likely, you must enclose the field in double quotation marks. For example:
<pre>
title,abstract
An interesting article,"Damn it Jim, I'm a Doctor, not a Perl hacker."
</pre>

When doing this, make sure not to leave any whitespace between a quotation mark and the adjacent comma, or the import will fail.

== Multiple fields ==
Fields that can hold multiple values are handled by separating the individual values with semicolons. A simple example of this is the subjects field.

Go back to the Import Items screen and proceed as before, but type this into the "Cut and Paste Records" box:

<pre>
abstract,title,subjects
Testing,Testing,AI;C;M;F;P
Testing,Testing,AC;DC
</pre>

After the records have been imported, examine each one on the View Item screen. You will find that a list of subjects is given, and that each subject is shown with a more descriptive name than the code we imported.

== Compound fields ==

Compound fields are fields that have subfields within them, each with its own name. You don't set a compound field in one go, but set its components individually. Subfields have names of the form mainfieldname_subfieldname.

One of the most commonly used compound fields is the "Creators" field. It has a name subfield, "creators_name", and an ID subfield, "creators_id", which is most often used for email addresses.

Here is an example of setting the creators field:
<pre>
title,creators_name,creators_id
Setting compound fields.,"Bloggs, Joe;Doe, John","joe@bloggs.com;john@doe.com"
</pre>

If you import this record and then examine the View Item screen, you will find that the "Creators" field has been set up with the values displayed in a table.

User:Tom/Import Plugins/CSV

2007-09-27T20:59:32Z

Tom: User:Tom/Import Plugins/CSV moved to Contribute: Plugins/ImportPluginsCSV

#redirect [[Contribute: Plugins/ImportPluginsCSV]]

How to write plugins

2007-09-27T20:58:13Z

Tom: /* Export Plugin */

== Write a Plugin! ==

The plugin system for EPrints 3 has been developed to make it easy to create and share the most common extensions to the code without having to hack the core system (and cause yourself problems with upgrades, etc.).

When the system loads, it automatically loads all modules in the perl_lib/EPrints/Plugin/ directory, so for simple plugins you just drop them in that directory and you're done!

When a plugin is loaded, its registration method is called, which tells the core EPrints system what the plugin does. EPrints then makes it available as appropriate.

Clever plugins can detect features they need and adapt to use the tools available, or disable themselves if they are missing required tools (rather than crash the system). Some specialised plugins are disabled in their default state and must be enabled in the repository configuration.

Another cool thing is that plugins are Perl objects, which means you can subclass them. Here's a real-world example: we have a research group which uses BibTeX but which has, over the years, standardised on an extra field. This is not a valid BibTeX field, but it is essential to their work because they have ancient and essential scripts which depend on it. To handle this we can subclass the default BibTeX Export plugin and override a single method (the data mapping one). We then just call the parent plugin's original mapping method to do all the heavy lifting, and add our non-standard extra field. Total code required: less than one screen. Number of happy researchers: none (they are never satisfied and will just demand the moon on a stick because you've already given them this nice new feature), but number of researchers able to get their work done: lots. Don't believe me? [[BibTeX Extension Example|look here]]!
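The subclassing technique described above can be sketched in plain Perl. The class names My::BibTeX and My::BibTeX::Extra, the method name convert_dataobj and the groupcode field are all hypothetical stand-ins. None of them is taken from the real EPrints BibTeX plugin.

<pre>
use strict;
use warnings;

# Hypothetical parent class standing in for the stock BibTeX export plugin.
package My::BibTeX;

sub new { my ($class) = @_; return bless {}, $class; }

# Map an eprint (here a plain hash) to BibTeX key/value pairs.
sub convert_dataobj
{
    my ($self, $eprint) = @_;
    return { title => $eprint->{title}, year => $eprint->{year} };
}

# Hypothetical subclass: reuse the parent's mapping, then add one field.
package My::BibTeX::Extra;
our @ISA = ('My::BibTeX');

sub convert_dataobj
{
    my ($self, $eprint) = @_;
    my $data = $self->SUPER::convert_dataobj($eprint);   # heavy lifting
    $data->{groupcode} = $eprint->{groupcode} if defined $eprint->{groupcode};
    return $data;
}

package main;

my $plugin = My::BibTeX::Extra->new;
my $entry  = $plugin->convert_dataobj({ title => 'A paper', year => 2007, groupcode => 'XYZ-1' });
print join(', ', map { "$_=$entry->{$_}" } sort keys %$entry), "\n";
</pre>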

* See also [[Extension Packages]] for how to add configuration and resource files to plugins.

=== Types of Plugin ===

There are a number of different kinds of plugin for EPrints...

==== Export Plugin ====

These are used to export the data in a variety of formats. A number of tutorials have been created to help you create your own export plugins:

* [[Contribute:_Plugins/ExportPluginsHello| Export Plugin Tutorial 1: "Hello, World!"]]
* [[Contribute:_Plugins/ExportPluginsList| Export Plugin Tutorial 2: List handling]]
* [[Contribute:_Plugins/ExportPluginsHTML| Export Plugin Tutorial 3: HTML]]
* [[Contribute:_Plugins/ExportPluginsExcel| Export Plugin Tutorial 4: Excel]]
* [[Contribute:_Plugins/ExportPluginsZip| Export Plugin Tutorial 5: Zip]]

==== Import Plugin ====

These are used to import data into a repository. They can take datafiles directly, or they can take an ID of a record that can be retrieved in a known way, or a URL of a file, or... whatever.

These are a bit trickier to write than export plugins, as parsing data is harder than just "print"ing it, but they are still reasonably straightforward.

==== Screen Plugin ====

These handle (almost) all the user interface screens. Pages like "Review" and "Profile" are just built-in plugins. You can add your own very easily.

Examples you could create...
* Birds Eye View - a view of various statistics on the database, all in one page.
* Spellchecking Tab - an additional tab in the item control page which checks the spelling on certain fields.
* Bulk Delete tool - a tool which takes a list of eprintids and deletes them all in one fell swoop.

Look at the existing Screen Plugins for an idea of how they work. They can be very simple.

==== Input Component Plugin ====

These handle how the workflow components are rendered. Built-in components include the default (one field) component, the multiple fields component, the upload component, the subject component (which does pretty things to a field of type "subject") and the XHTML component. You can add your own or subclass existing ones.

==== Convert Plugin ====

These are currently used for two things:

* Converting the full text of documents into UTF-8 text for search indexing
* Converting images and PDFs into thumbnails and previews

Some examples you could create:

* RTF to UTF-8 to allow rich text documents to be indexed.
* PowerPoint to Thumbnail to allow thumbnails and previews of PowerPoint slides.
* Video to Thumbnail/Preview to make a still preview of a video file.

Contribute: Plugins/ExportPluginsZip

2007-09-27T20:57:16Z

Tom: User:Tom/Export Plugins/Zip moved to Contribute: Plugins/ExportPluginsZip

= Export Plugin Tutorial 5: Zip =
In this tutorial we'll look at packaging the results of a search into a Zip file. We'll create a directory for each eprint, and a sub-directory for each document belonging to that eprint. We'll also add an HTML index file to the archive to make it easier to navigate.

To prepare for this tutorial you should install the [http://search.cpan.org/~miyagawa/Archive-Any-Create-0.02/lib/Archive/Any/Create.pm Archive::Any::Create] module. The following command, run as root or with sudo, should work.

<pre>
cpan Archive::Any::Create
</pre>

= Zip.pm =

<pre>
package EPrints::Plugin::Export::MyPlugins::Zip;

@ISA = ('EPrints::Plugin::Export');

use strict;

sub new
{
    my ($class, %opts) = @_;

    my $self = $class->SUPER::new(%opts);

    $self->{name} = 'Zip';
    $self->{accept} = [ 'list/eprint' ];
    $self->{visible} = 'all';
    $self->{suffix} = '.zip';
    $self->{mimetype} = 'application/zip';

    # Archive::Any::Create is loaded here at run time rather than with "use",
    # so that a missing module hides the plugin instead of breaking it.
    my $rc = EPrints::Utils::require_if_exists('Archive::Any::Create');
    unless ($rc)
    {
        $self->{visible} = '';
        $self->{error} = 'Unable to load required module Archive::Any::Create';
    }

    return $self;
}

sub output_list
{
    my ($plugin, %opts) = @_;

    my $archive = '';
    open (my $FH, '>', \$archive) or
        die("Could not create filehandle: $!");
    my $zip = Archive::Any::Create->new;

    my $index = <<END;
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>EPrints Search Results</title>
</head>
<body>
END

    my $session = $plugin->{session};

    foreach my $dataobj ($opts{list}->get_records)
    {
        my $div = $session->make_element('div');
        my $heading = $session->make_element('h2');
        $heading->appendChild($session->make_text($dataobj->get_value('title')));
        $div->appendChild($heading);

        my $uldoc = $session->make_element('ul');
        $div->appendChild($uldoc);

        my $dirpath = 'eprints-search/'.$dataobj->get_id().'/';

        my $i = 1;
        foreach my $doc ($dataobj->get_all_documents)
        {
            my $subdirpath = $dirpath."doc$i/";
            my %files = $doc->files;

            my $lidoc = $session->make_element('li');
            $uldoc->appendChild($lidoc);

            my $adoc = $session->make_element('a', href=>$dataobj->get_id."/doc$i/".$doc->get_main);
            $lidoc->appendChild($adoc);

            if ($doc->exists_and_set('formatdesc'))
            {
                $adoc->appendChild($session->make_text($doc->get_value('formatdesc')));
            }
            else
            {
                $adoc->appendChild($session->make_text($doc->get_main));
            }

            foreach my $filename (sort keys %files)
            {
                my $filepath = $subdirpath.$filename;
                my $file = $doc->local_path.'/'.$filename;

                if (-d $file)
                {
                    next;
                }

                my $data = '';
                open (my $datafh ,'>', \$data);

                open (INFH, "<$file") or die ("Could not open file $file");
                while (<INFH>)
                {
                    print {$datafh} $_;
                }
                close INFH;

                $zip->add_file($filepath, $data);
            }
            $i++;
        }
        $index .= EPrints::XML::to_string($div);
    }

    $index .= '</body></html>';
    $zip->add_file('eprints-search/index.htm',$index);

    if (defined $opts{fh})
    {
        $zip->write_filehandle($opts{fh},'zip');
        return undef;
    }
    $zip->write_filehandle($FH,'zip');
    return $archive;
}

1;

</pre>

= In More Detail =
== Constructor ==
For the sake of simplicity this plugin will only deal with lists of eprints. This avoids some code duplication, though it would be fairly easy to modify the plugin to deal sensibly with both individual eprints and lists of eprints.
<pre>
$self->{accept} = [ 'list/eprint' ];
</pre>

The file extension and MIME type are set to values appropriate for Zip files.
<pre>
$self->{suffix} = '.zip';
$self->{mimetype} = 'application/zip';
</pre>

We need to import a module that is not included with EPrints for creating zip files. We use the EPrints::Utils::require_if_exists function to check if the module exists, and load it if it does. We then check the value returned from that function, and make the plugin invisible if it failed.

<pre>
my $rc = EPrints::Utils::require_if_exists('Archive::Any::Create');
unless ($rc)
{
    $self->{visible} = '';
    $self->{error} = 'Unable to load required module Archive::Any::Create';
}
</pre>

== List Handling ==
=== Setting Up ===
Here we set up an in-memory file for the Zip, and create an Archive object.
<pre>
my $archive = '';
open (my $FH, '>', \$archive) or
    die("Could not create filehandle: $!");
my $zip = Archive::Any::Create->new;
</pre>
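If in-memory files are unfamiliar, this minimal stand-alone sketch shows the mechanism on its own: opening a filehandle on a reference to a scalar and printing to it appends to that scalar. The four bytes written are the signature that starts a zip local file header, used here only as sample binary data.

<pre>
use strict;
use warnings;

# Opening a filehandle on a reference to a scalar creates an in-memory file:
# everything printed to it is appended to the scalar.
my $archive = '';
open(my $FH, '>', \$archive) or die("Could not create filehandle: $!");
binmode($FH);   # keep bytes untouched, as zip data is binary

print {$FH} "PK", "\x03\x04";   # sample bytes: a zip local file header signature
close($FH);

printf "%d bytes captured in memory\n", length $archive;
</pre>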

=== Navigation ===
Here we begin to set up the HTML file that we'll add to our archive for navigation. First we set up a header.
<pre>
my $index = <<END;
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>EPrints Search Results</title>
</head>
<body>
END
</pre>

Now we get the Session object; we'll be using it to create DOM objects later.
<pre>
my $session = $plugin->{session};
</pre>

=== Handling DataObjs ===
We loop over the DataObjs as we have done before.

This time we set up some DOM objects to be added to our index. Each eprint will have its title printed, followed by a list of its documents.
<pre>
my $div = $session->make_element('div');
my $heading = $session->make_element('h2');
$heading->appendChild($session->make_text($dataobj->get_value('title')));
$div->appendChild($heading);

my $uldoc = $session->make_element('ul');
$div->appendChild($uldoc);
</pre>

We create a directory for each eprint. Note that it is not necessary to create a directory explicitly; we simply include it in the path of each file we add. However, this means that if no files are added for a given eprint, no directory is created for it, rather than an empty one.

<pre>
my $dirpath = 'eprints-search/'.$dataobj->get_id().'/';
</pre>

==== Dealing With Documents ====
We then loop over all the documents belonging to each DataObj. The get_all_documents method returns an array of Document objects.
<pre>
my $i = 1;
foreach my $doc ($dataobj->get_all_documents)
{
    my $subdirpath = $dirpath."doc$i/";
</pre>

Here we create a list item for the document containing a link to the main file.
<pre>
my $lidoc = $session->make_element('li');
$uldoc->appendChild($lidoc);

my $adoc = $session->make_element('a', href=>$dataobj->get_id."/doc$i/".$doc->get_main);
$lidoc->appendChild($adoc);
</pre>
If a description of the main file has been set we use that as the link text, otherwise we use the filename.
<pre>
if ($doc->exists_and_set('formatdesc'))
{
    $adoc->appendChild($session->make_text($doc->get_value('formatdesc')));
}
else
{
    $adoc->appendChild($session->make_text($doc->get_main));
}
</pre>

==== Dealing With Files ====
The files method of the Document object returns a hash whose keys are file names and values are file sizes.
<pre>
my %files = $doc->files;
</pre>

We loop over each file belonging to the document; in most cases there will only be one file.
<pre>
foreach my $filename (sort keys %files)
{
    my $filepath = $subdirpath.$filename;
    my $file = $doc->local_path.'/'.$filename;
</pre>

We need to read the contents of the file and add it to a file in the zip. First we'll create another in-memory file to hold the contents.
<pre>
my $data = '';
open (my $datafh ,'>', \$data);
</pre>

We open our file and print it straight out to our in-memory file.
<pre>
open (INFH, "<$file") or die ("Could not open file $file");
while (<INFH>)
{
    print {$datafh} $_;
}
close INFH;
</pre>
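Reading line by line in text mode works for most files on Unix, but documents are often binary. A simpler alternative, sketched below under the assumption that the whole file fits comfortably in memory, reads the file in one go with the :raw layer. The helper name slurp_raw is hypothetical; it is not part of the plugin above.

<pre>
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a whole file into a scalar in raw (binary) mode,
# so PDFs and images are not altered by any I/O layer.
sub slurp_raw
{
    my ($file) = @_;
    open(my $infh, '<:raw', $file) or die("Could not open file $file: $!");
    local $/;                 # undefined record separator: read everything
    my $data = <$infh>;
    close($infh);
    return defined $data ? $data : '';
}

# Usage: write some binary bytes to a temporary file and read them back.
my ($tmpfh, $tmpname) = tempfile(UNLINK => 1);
binmode($tmpfh);
print {$tmpfh} "%PDF-1.4\r\n\x00\xff";
close($tmpfh);

my $data = slurp_raw($tmpname);
printf "read %d bytes\n", length $data;
</pre>

Inside the plugin, the result could then be passed straight to $zip->add_file, without the second in-memory filehandle.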

Then we add the file data to the zip archive, at the path we built earlier.
<pre>
$zip->add_file($filepath, $data);
</pre>

Finally we add the DOM object for our eprint to the index.
<pre>
$index .= EPrints::XML::to_string($div);
</pre>

=== Finishing Off ===
After finishing off our index file we add it to the zip file.
<pre>
$index .= '</body></html>';
$zip->add_file('eprints-search/index.htm',$index);
</pre>

If a file handle has been provided we write to it, otherwise we write to the scalar file handle created earlier. We then return in the usual fashion.
<pre>
if (defined $opts{fh})
{
    $zip->write_filehandle($opts{fh},'zip');
    return undef;
}
$zip->write_filehandle($FH,'zip');
return $archive;
</pre>
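The same convention appears in each of these export plugins: stream to a supplied filehandle and return undef, otherwise return the output as a string. The hypothetical helper emit_output below captures it in stand-alone Perl. It is a sketch, not an EPrints API.

<pre>
use strict;
use warnings;

# Hypothetical helper: if the caller passed a filehandle, stream output to it
# and return undef; otherwise collect the output and return it as a string.
sub emit_output
{
    my ($fh, @parts) = @_;
    if (defined $fh)
    {
        print {$fh} $_ for @parts;
        return undef;
    }
    return join('', @parts);
}

my $as_string = emit_output(undef, '<p>', 'hello', '</p>');

my $buffer = '';
open(my $out, '>', \$buffer) or die "Could not open in-memory file: $!";
my $ret = emit_output($out, '<p>', 'hello', '</p>');
close($out);

print defined $ret ? "returned a value\n" : "streamed $buffer\n";
</pre>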

= Testing Your Plugin =
Restart your webserver and test the plugin as in [[Contribute:_Plugins/ExportPluginsExcel|the previous tutorial]].

== Sample Output ==
[[Image:Expzipv2.png]]

The accompanying HTML index.

[[Image:Expzip2.png]]

User:Tom/Export Plugins/Zip

2007-09-27T20:57:16Z

Tom: User:Tom/Export Plugins/Zip moved to Contribute: Plugins/ExportPluginsZip

#redirect [[Contribute: Plugins/ExportPluginsZip]]

Contribute: Plugins/ExportPluginsExcel

2007-09-27T20:56:05Z

Tom: User:Tom/Export Plugins/Excel moved to Contribute: Plugins/ExportPluginsExcel

= Export Plugin Tutorial 4: Excel =

In this tutorial and the next one we'll look at exporting files in non-text formats. Here we will explore exporting metadata in Excel format (the binary format used before Microsoft Office 2007, which introduced an XML-based format).

To prepare for this tutorial you should install the [http://search.cpan.org/dist/Spreadsheet-WriteExcel/lib/Spreadsheet/WriteExcel.pm Spreadsheet::WriteExcel] module. The following command, run as root or with sudo, should work.

<pre>
cpan Spreadsheet::WriteExcel
</pre>

= Excel.pm =

<pre>
package EPrints::Plugin::Export::MyPlugins::Excel;

@ISA = ('EPrints::Plugin::Export');

use strict;

sub new
{
    my ($class, %opts) = @_;
    my $self = $class->SUPER::new(%opts);

    $self->{name} = 'Excel';
    $self->{accept} = ['list/eprint'];
    $self->{visible} = 'all';
    $self->{suffix} = '.xls';
    $self->{mimetype} = 'application/vnd.ms-excel';

    my $rc = EPrints::Utils::require_if_exists('Spreadsheet::WriteExcel');
    unless ($rc)
    {
        $self->{visible} = '';
        $self->{error} = 'Unable to load required module Spreadsheet::WriteExcel';
    }

    return $self;
}

sub output_list
{
    my ($plugin, %opts) = @_;
    my $workbook;

    my $output;
    open(my $FH,'>',\$output);

    if (defined $opts{fh})
    {
        $workbook = Spreadsheet::WriteExcel->new(\*{$opts{fh}});
        die("Unable to create spreadsheet: $!") unless defined $workbook;
    }
    else
    {
        $workbook = Spreadsheet::WriteExcel->new($FH);
        die("Unable to create spreadsheet: $!") unless defined $workbook;
    }

    my $worksheet = $workbook->add_worksheet();

    my $i = 0;
    my @fields =
        $plugin->{session}->get_repository->get_dataset('archive')->get_fields;

    foreach my $field (@fields)
    {
        $worksheet->write(0, $i, $field->get_name);
        $i++;
    }

    $i = 1;
    foreach my $dataobj ($opts{list}->get_records)
    {
        my $j = 0;
        foreach my $field (@fields)
        {
            if ($dataobj->exists_and_set($field->get_name))
            {
                if ($field->get_property('multiple'))
                {
                    if ($field->{type} eq 'name')
                    {
                        my $namelist = '';
                        foreach my $name (@{$dataobj->get_value_raw($field->get_name)})
                        {
                            $namelist .= $name->{family} . ',' . $name->{given} . ';';
                        }
                        $worksheet->write($i, $j, $namelist);
                    }
                    elsif ($field->{type} eq 'compound')
                    {
                        $worksheet->write($i, $j, 'COMPOUND');
                    }
                    else
                    {
                        $worksheet->write($i, $j,
                            join(';',@{$dataobj->get_value($field->get_name)}));
                    }
                }
                else
                {
                    $worksheet->write($i, $j, $dataobj->get_value($field->get_name));
                }
            }
            $j++;
        }
        $i++;
    }

    $workbook->close;

    if (defined $opts{fh})
    {
        return undef;
    }

    return $output;
}

1;

</pre>

= In More Detail =
== Constructor ==
For the sake of simplicity this plugin will only deal with lists of eprints. This avoids some code duplication, though it would be fairly easy to modify the plugin to deal sensibly with both individual eprints and lists of eprints.
<pre>
$self->{accept} = ['list/eprint'];
</pre>

The file extension and MIME type are set to values appropriate for Excel files.
<pre>
$self->{suffix} = '.xls';
$self->{mimetype} = 'application/vnd.ms-excel';
</pre>

We need to import a module that is not included with EPrints for creating Excel files. We use the EPrints::Utils::require_if_exists function to check if the module exists, and load it if it does. We then check the value returned from that function, and make the plugin invisible if it failed.
<pre>
my $rc = EPrints::Utils::require_if_exists('Spreadsheet::WriteExcel');
unless ($rc)
{
    $self->{visible} = '';
    $self->{error} = 'Unable to load required module Spreadsheet::WriteExcel';
}
</pre>
== List Handling ==
=== Setting Up a Workbook ===
Here we create a new Excel workbook. We start by opening a file handle on a scalar rather than a filename, which creates an in-memory file. Then, depending on whether a file handle has been supplied, we create a workbook object that writes either to the supplied file handle or to the in-memory one we just created.

<pre>
my $workbook;

my $output;
open(my $FH,'>',\$output);

if (defined $opts{fh})
{
    $workbook = Spreadsheet::WriteExcel->new(\*{$opts{fh}});
    die("Unable to create spreadsheet: $!") unless defined $workbook;
}
else
{
    $workbook = Spreadsheet::WriteExcel->new($FH);
    die("Unable to create spreadsheet: $!") unless defined $workbook;
}
</pre>

=== Handling DataObjs ===
To start adding data to the Excel file we have to create a worksheet.
<pre>
my $worksheet = $workbook->add_worksheet();
</pre>

To the first row of our worksheet we add the names of all the metadata fields that can be associated with eprints. We get the session associated with the plugin, and then the repository associated with that session. We then get the DataSet "archive" from that repository and call the get_fields method. That method returns an array of MetaField objects.
<pre>
my @fields =
    $plugin->{session}->get_repository->get_dataset('archive')->get_fields;
</pre>

Here we loop over each field and write its name to the first row of our worksheet.
<pre>
foreach my $field (@fields)
{
    $worksheet->write(0, $i, $field->get_name);
    $i++;
}
</pre>

We now loop over each DataObj in our list, and over each MetaField we found earlier.
<pre>
foreach my $dataobj ($opts{list}->get_records)
{
    my $j = 0;
    foreach my $field (@fields)
    {
</pre>

We only write something to the worksheet if the field can apply to the DataObj and is set. Scalar values are simply written to the worksheet.
<pre>
if ($dataobj->exists_and_set($field->get_name))
</pre>

The plugin handles fields which can take multiple values in a number of ways.
<pre>
if ($field->get_property('multiple'))
</pre>
Names are handled specially and are formatted with the family name followed by a comma, the given name or initial and then a semi-colon.
<pre>
if ($field->{type} eq 'name')
{
    my $namelist = '';
    foreach my $name (@{$dataobj->get_value_raw($field->get_name)})
    {
        $namelist .= $name->{family} . ',' . $name->{given} . ';';
    }
    $worksheet->write($i, $j, $namelist);
}
</pre>
Fields which have a compound type are not handled, and 'COMPOUND' is written to the worksheet.
<pre>
elsif ($field->{type} eq 'compound')
{
    $worksheet->write($i, $j, 'COMPOUND');
}
</pre>
For most multiple fields each value is taken and concatenated, separated by semi-colons.
<pre>
else
{
    $worksheet->write($i, $j,
        join(';',@{$dataobj->get_value($field->get_name)}));
}
</pre>
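The branching above can be modelled as a single stand-alone function. In this sketch cell_text is hypothetical: $type and $multiple stand in for the MetaField's type and its 'multiple' property, and $value stands in for what get_value or get_value_raw would return.

<pre>
use strict;
use warnings;

# Hypothetical stand-alone version of the per-cell logic in output_list.
sub cell_text
{
    my ($type, $multiple, $value) = @_;
    return $value unless $multiple;              # single values are written as-is

    if ($type eq 'name')
    {
        # "family,given;" for each name, as in the tutorial
        return join('', map { $_->{family} . ',' . $_->{given} . ';' } @$value);
    }
    return 'COMPOUND' if $type eq 'compound';    # compound fields are not handled
    return join(';', @$value);                   # other multiple fields
}

my $names = cell_text('name', 1, [ { family => 'Bloggs', given => 'Joe' },
                                   { family => 'Doe',    given => 'John' } ]);
print "$names\n";
</pre>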

=== Finishing Up ===

We first need to close the workbook to ensure that the data is written to the file handles, and then we can return in the usual fashion.
<pre>
$workbook->close;

if (defined $opts{fh})
{
    return undef;
}

return $output;
</pre>

= Testing Your Plugin =
Restart your webserver and test the plugin as in [[Contribute:_Plugins/ExportPluginsHTML|the previous tutorial]].

== Sample Output ==
[[Image:Expexcel.png]]

User:Tom/Export Plugins/Excel

2007-09-27T20:56:05Z

Tom: User:Tom/Export Plugins/Excel moved to Contribute: Plugins/ExportPluginsExcel

#redirect [[Contribute: Plugins/ExportPluginsExcel]]

Contribute: Plugins/ExportPluginsHTML

2007-09-27T20:55:43Z

Tom: User:Tom/Export Plugins/HTML moved to Contribute: Plugins/ExportPluginsHTML

= Export Plugin Tutorial 3: HTML =

In this tutorial we'll look at creating an export plugin with slightly more complex output than unformatted plain text. Although the plugin below produces XHTML the same principles apply to producing any XML document.

= HelloHTML.pm =

<pre>
package EPrints::Plugin::Export::MyPlugins::HelloHTML;

@ISA = ('EPrints::Plugin::Export');

use strict;

sub new
{
    my ($class, %opts) = @_;

    my $self = $class->SUPER::new(%opts);

    $self->{name} = 'Hello, HTML!';
    $self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
    $self->{visible} = 'all';
    $self->{suffix} = '.htm';
    $self->{mimetype} = 'text/html; charset=utf-8';

    return $self;
}

sub output_dataobj
{
    my ($plugin, $dataobj) = @_;

    my $xml = $plugin->xml_dataobj($dataobj);

    return EPrints::XML::to_string($xml);
}

sub output_list
{
    my ($plugin, %opts) = @_;

    my $r = [];

    my $header = <<END;
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>XHTML Export Plugin</title>
</head>
<body>
END

    if (defined $opts{fh})
    {
        print {$opts{fh}} $header;
    }
    else
    {
        push @{$r}, $header;
    }

    foreach my $dataobj ($opts{list}->get_records)
    {
        my $part = $plugin->output_dataobj($dataobj, %opts);
        if (defined $opts{fh})
        {
            print {$opts{fh}} $part;
        }
        else
        {
            push @{$r}, $part;
        }
    }
    my $footer = '</body></html>';
    if (defined $opts{fh})
    {
        print {$opts{fh}} $footer;
    }
    else
    {
        push @{$r}, $footer;
    }

    if (defined $opts{fh})
    {
        return undef;
    }
    return join('', @{$r});
}

sub xml_dataobj
{
    my ($plugin, $dataobj) = @_;

    my $session = $plugin->{session};

    my $div = $session->make_element('div');

    my $title = $session->make_element('h2');
    $title->appendChild($session->make_text($dataobj->get_value('title')));
    $div->appendChild($title);

    my $abstract = $session->make_element('p');
    $abstract->appendChild($session->make_text($dataobj->get_value('abstract')));
    $div->appendChild($abstract);

    return $div;
}

1;

</pre>

= In More Detail =
== Constructor ==
Here we set the file extension to '.htm' and the MIME type to 'text/html' with a UTF-8 character set. For general XML documents you should instead use the file extension '.xml' and the MIME type 'text/xml'.

<pre>
$self->{suffix} = '.htm';
$self->{mimetype} = 'text/html; charset=utf-8';
</pre>

== output_dataobj ==
Here the output_dataobj method does very little: it calls the xml_dataobj method (described later) to obtain a DOM object, serialises that object to a string with EPrints::XML::to_string and returns the result.

<pre>
my $xml = $plugin->xml_dataobj($dataobj);

return EPrints::XML::to_string($xml);
</pre>

== output_list ==
Compared with the previous tutorial, the only change to the output_list method is the addition of a header, written before the records, and a footer, written after them.

<pre>
my $header = <<END;
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>XHTML Export Plugin</title>
</head>
<body>
END

my $footer = '</body></html>';
</pre>
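The shape of output_list can also be sketched outside EPrints. In this sketch the records are assumed to be ready-made HTML fragments (as output_dataobj would return) rather than DataObjs, the header is shortened, and the repeated print-or-push test is folded into a single closure. This is one possible refactoring for illustration, not the tutorial's code.

<pre>
use strict;
use warnings;

# Standalone sketch of the output_list pattern: write to $opts{fh} when it is
# given, otherwise collect the parts and return them joined as one string.
# $opts{list} is assumed to be an array reference of HTML fragments.
sub output_list
{
    my (%opts) = @_;
    my @parts;

    # Decide once how each part is emitted, instead of repeating an if/else.
    my $emit = defined $opts{fh}
        ? sub { print {$opts{fh}} @_ }
        : sub { push @parts, @_ };

    $emit->("<html><body>\n");         # shortened stand-in for the XHTML header
    $emit->($_) foreach @{$opts{list}};
    $emit->("</body></html>\n");       # footer

    return defined $opts{fh} ? undef : join('', @parts);
}

my $html = output_list(list => ['<div>First</div>', '<div>Second</div>']);
print $html;
</pre>

Choosing the emitter once means each part of the document is written with a single call, so adding further sections does not need another if/else block.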

== xml_dataobj ==
This method has a similar signature to the output_dataobj method: it takes an implicit reference to the plugin object and a reference to a DataObj. However, instead of returning a string, it returns a DOM object.

Before we can start creating and manipulating DOM objects we need to get a reference to the Session object from the plugin.

<pre>
my $session = $plugin->{session};
</pre>

We then create elements using the make_element method of our session object. The first argument is the tag name of the element we want to create, for example 'a' or 'div'. Any further arguments are optional attribute name/value pairs for the element, for example make_element('a', href => $url).
<pre>
my $div = $session->make_element('div');
</pre>

Here we add a second-level heading. The make_text method creates a text node, and the appendChild method adds a child node to an element.

<pre>
my $title = $session->make_element('h2');
$title->appendChild($session->make_text($dataobj->get_value('title')));
$div->appendChild($title);
</pre>

A similar pattern is followed to add a paragraph containing the eprint's abstract.

<pre>
my $abstract = $session->make_element('p');
$abstract->appendChild($session->make_text($dataobj->get_value('abstract')));
$div->appendChild($abstract);
</pre>

Finally, we return a DOM object representing a 'div' element containing our heading and paragraph.

<pre>
return $div;
</pre>
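To see what xml_dataobj produces, here is a toy model of the DOM calls in plain Perl. The make_element, make_text, append_child and to_string subroutines below are hypothetical stand-ins written for this sketch, not the EPrints or XML DOM implementations, and the record hash and the class attribute are made-up examples. The sketch shows how the div, h2 and p elements nest, and why text is added as text nodes: the serialiser escapes characters such as ampersands and angle brackets.

<pre>
use strict;
use warnings;

# Hypothetical toy DOM: an element is a hash reference, a text node a plain string.
sub make_element
{
    my ($name, %attrs) = @_;
    return { name => $name, attrs => \%attrs, children => [] };
}

sub make_text
{
    my ($text) = @_;
    return defined $text ? $text : '';    # treat an unset value as empty text
}

sub append_child
{
    my ($parent, $child) = @_;
    push @{$parent->{children}}, $child;
    return $child;
}

# Replace characters that are special in XML with entity references.
sub escape
{
    my ($s) = @_;
    my %entity = ('&' => 'amp', '<' => 'lt', '>' => 'gt', '"' => 'quot');
    $s =~ s/([&<>"])/&$entity{$1};/g;
    return $s;
}

# Serialise a node (and, recursively, its children) to a string.
sub to_string
{
    my ($node) = @_;
    return escape($node) unless ref $node;    # text node

    my $attrs = '';
    foreach my $key (sort keys %{$node->{attrs}})
    {
        $attrs .= " $key=\"" . escape($node->{attrs}{$key}) . '"';
    }
    my $inner = join('', map { to_string($_) } @{$node->{children}});
    return "<$node->{name}$attrs>$inner</$node->{name}>";
}

# A made-up record standing in for an eprint's title and abstract.
my %record = (title => 'Cats & Dogs', abstract => 'A study of <pets>.');

my $div = make_element('div', class => 'eprint');

my $title = make_element('h2');
append_child($title, make_text($record{title}));
append_child($div, $title);

my $abstract = make_element('p');
append_child($abstract, make_text($record{abstract}));
append_child($div, $abstract);

print to_string($div), "\n";
</pre>

Printing to_string($div) gives a div with the class eprint wrapping the title heading and the abstract paragraph, with the ampersand and angle brackets in the text escaped.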

= Testing Your Plugin =
Restart your webserver and test the plugin as in [[User:Tom/Export_Plugins/Hello_Lists| the previous tutorial]].

== Sample Output ==
[[Image:Exphtml.png]]