Difference between revisions of "Contribute: Plugins/ImportPluginsAWS"

From EPrints Documentation

Latest revision as of 11:13, 24 May 2021

Import Plugin Tutorial 2: Amazon Web Services

In the last tutorial we created an import plugin that took data which needed very little modification to import into the repository. The column names in the CSV file matched the names of metadata fields present in the repository. In this tutorial we'll look at importing data that needs some modification before it can be imported, needs more error checking, and is obtained in a different way.

We will be using Amazon's E-Commerce web service to import books from their website into our repository, given a list of ASINs.

We will be accessing the service using a REST approach, communicating with the server using URL parameters and retrieving an XML document in response to our request. It is also possible to access their services using SOAP, but that will not be discussed here.

Before You Start

Amazon Web Services

To use Amazon's web services you must first sign up for an account at http://aws.amazon.com. Their site has extensive documentation on the services they offer, as well as example programs, including some written in Perl.

Required Modules

To prepare for this tutorial, make sure the LWP::UserAgent module is installed. Running the following command as root, or with sudo, should work.

cpan LWP::UserAgent
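
To confirm that the installation worked before touching EPrints, a short standalone script along these lines can be run. It is only a sketch: module_available is a hypothetical helper name, not part of EPrints or LWP, and all it does is try to load each module by name.

```perl
use strict;
use warnings;

# Return 1 if the named module can be loaded, 0 otherwise.
sub module_available
{
        my( $module ) = @_;
        # Turn Foo::Bar into Foo/Bar.pm so that require() treats it as a path.
        ( my $file = "$module.pm" ) =~ s{::}{/}g;
        return eval { require $file; 1 } ? 1 : 0;
}

foreach my $module ( 'LWP::UserAgent', 'URI::Escape' )
{
        printf "%-16s %s\n", $module, module_available($module) ? 'ok' : 'MISSING';
}
```

Any module reported as MISSING needs installing before the plugin will show up in the import menu.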

AWS.pm

The code in the section below should be placed in a file called AWS.pm in the directory created previously, and MyPlugins should be changed to the name of that directory.

package EPrints::Plugin::Import::MyPlugins::AWS;

use EPrints::Plugin::Import::TextFile;
use strict;
use URI::Escape;

our @ISA = ('EPrints::Plugin::Import::TextFile');

my $endpoint = 'http://ecs.amazonaws.co.uk/onca/xml';
my $accesskey = '<YOURAMAZONWSKEY>';
my $service = 'AWSECommerceService';
my $operation = 'ItemLookup';
my $version = '2007-07-16';
my $responsegroup = 'Large';

sub new
{
        my( $class, %params ) = @_;
        my $self = $class->SUPER::new( %params );

        $self->{name} = 'AWS';
        $self->{visible} = 'all';
        $self->{produce} = [ 'list/eprint' , 'dataobj/eprint'];

        my $rc = EPrints::Utils::require_if_exists('LWP::UserAgent');
        unless ($rc)
        {
                $self->{visible} = '';
                $self->{error} = 'Unable to load required module LWP::UserAgent';
        }

        return $self;
}

sub input_fh
{
        my( $plugin, %opts ) = @_;
        my @ids;
        my $fh = $opts{fh};

        my @records = <$fh>;
        foreach my $input_data (@records)
        {
                my $epdata = $plugin->convert_input($input_data);
                next unless defined $epdata;

                my $dataobj = $plugin->epdata_to_dataobj($opts{dataset},$epdata);
                if( defined $dataobj )
                {
                        push @ids, $dataobj->get_id;
                }
        }

        return EPrints::List->new(
                        dataset => $opts{dataset},
                        session => $plugin->{session},
                        ids=>\@ids );
}

sub convert_input
{
        my ($plugin, $input) = @_;
        my %output = ();

        $input =~ m/([A-Za-z0-9]+)/;
        $input = $1;

        my $request =
                "$endpoint?".
                "Service=$service&".
                "AWSAccessKeyId=$accesskey&".
                "Operation=$operation&".
                "ItemId=$input&".
                "Version=$version&".
                "ResponseGroup=$responsegroup";

        my $ua = LWP::UserAgent->new;
        $ua->timeout(30);
        my $response = $ua->get($request);

        my $dom = EPrints::XML::parse_xml_string($response->content);

        my $rep =
                $dom->getElementsByTagName('Items')->item(0)->
                getElementsByTagName('Request')->item(0);

        my $reptext =
                EPrints::Utils::tree_to_utf8($rep->getElementsByTagName('IsValid')->item(0));

        unless ($reptext eq 'True') 
        {
                $plugin->error('Invalid AWS Request');
                return undef;
        }

        #Get Item Object
        my $item =
                $dom->getElementsByTagName('Items')->item(0)->
                getElementsByTagName('Item')->item(0);

        unless (defined $item) 
        {
                $plugin->error('No Item element found');
                return undef;
        }

        my $attr = $item->getElementsByTagName('ItemAttributes')->item(0);

        my $pg = EPrints::Utils::tree_to_utf8($attr->getElementsByTagName('ProductGroup')->item(0));

        unless ($pg eq 'Book') 
        {
                $plugin->error('Product is not a book.');
                return undef;
        }

        $output{type} = 'book';
        $output{refereed} = 'FALSE';
        $output{ispublished} = 'pub';

        my $title = $attr->getElementsByTagName('Title')->item(0);
        $output{title} = EPrints::Utils::tree_to_utf8($title);

        my $url = $item->getElementsByTagName('DetailPageURL')->item(0);
        $output{official_url} = uri_unescape(EPrints::Utils::tree_to_utf8($url));

        my $isbn = $attr->getElementsByTagName('ISBN')->item(0);
        if (defined $isbn)
        {
                $output{isbn} = EPrints::Utils::tree_to_utf8($isbn);
        }

        my $pages = $attr->getElementsByTagName('NumberOfPages')->item(0);
        if (defined $pages)
        {
                $output{pages} = EPrints::Utils::tree_to_utf8($pages);
        }

        my $publisher = $attr->getElementsByTagName('Publisher')->item(0);
        if (defined $publisher)
        {
                $output{publisher} = EPrints::Utils::tree_to_utf8($publisher);
        }

        my $pubdate = $attr->getElementsByTagName('PublicationDate')->item(0);
        if (defined $pubdate)
        {
                $output{date} = EPrints::Utils::tree_to_utf8($pubdate);
        }

        return \%output;
}

1;

In More Detail

We will use the URI::Escape module in this plugin. As it is included with EPrints we don't need to check if it exists first.

use URI::Escape;

Here we set up a number of values for parameters that will be part of our web service requests. The endpoint variable determines which server the request is sent to. Here we have used the UK server, but by changing the TLD we can use the US, Canadian, German or French servers.

The accesskey variable stores the access key you obtained when signing up with Amazon earlier. You should use the normal access key and not the secret one.

Here we use the ItemLookup operation of the AWSECommerceService with the 2007-07-16 version of the API. Other operations allow searching for items, but here we want to look up specific products. Finally, the responsegroup variable determines the amount and nature of the information returned; we select "Large" in this case, which gives us more information about the item.

my $endpoint = 'http://ecs.amazonaws.co.uk/onca/xml';
my $accesskey = '<YOURAMAZONWSKEY>';
my $service = 'AWSECommerceService';
my $operation = 'ItemLookup';
my $version = '2007-07-16';
my $responsegroup = 'Large';
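
If the plugin needs to switch between servers, the TLD swap described above could be wrapped in a small lookup. This is a sketch under assumptions: %tld_for and endpoint_for are hypothetical names, and only the UK hostname comes from this tutorial. The other hostnames are produced purely by substituting the TLD and have not been checked against Amazon's documentation.

```perl
use strict;
use warnings;

# Assumed mapping from a short locale code to the TLD used in the endpoint.
# Only the 'uk' entry is taken from the tutorial; check the rest before use.
my %tld_for = (
        uk => 'co.uk',
        us => 'com',
        ca => 'ca',
        de => 'de',
        fr => 'fr',
);

# Build the endpoint URL for a locale, or return undef for an unknown one.
sub endpoint_for
{
        my( $locale ) = @_;
        my $tld = $tld_for{ lc $locale };
        return undef unless defined $tld;
        return "http://ecs.amazonaws.$tld/onca/xml";
}

print endpoint_for('uk'), "\n";
```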

Constructor

The constructor is similar to the one used for the CSV plugin, except this one will import individual eprints, given an ASIN.

        $self->{produce} = [ 'list/eprint' , 'dataobj/eprint'];

Just as we imported Text::CSV in the last tutorial, here we import LWP::UserAgent, which will be used for making requests to the web service.

        my $rc = EPrints::Utils::require_if_exists('LWP::UserAgent');
        unless ($rc)
        {
                $self->{visible} = '';
                $self->{error} = 'Unable to load required module LWP::UserAgent';
        }

Input

input_fh

This method is similar to the one used in the CSV plugin, but doesn't have to do quite so much work.

First we create the array to hold our imported eprint ids.

        my @ids;

Next we read all the lines in the supplied file handle into our records array.

        my $fh = $opts{fh};

        my @records = <$fh>;

Then we iterate over each record, running convert_input on it, importing it into our repository and adding the id to our array.

        foreach my $input_data (@records)
        {
                my $epdata = $plugin->convert_input($input_data);
                next unless defined $epdata;

                my $dataobj = $plugin->epdata_to_dataobj($opts{dataset},$epdata);
                if( defined $dataobj )
                {
                        push @ids, $dataobj->get_id;
                }
        }

Then we return a List object of the items imported.

        return EPrints::List->new(
                        dataset => $opts{dataset},
                        session => $plugin->{session},
                        ids=>\@ids );
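
Because reading a file handle in list context returns every line, a blank line in the pasted input is passed to convert_input like any other. The sketch below shows one way such lines could be filtered out first. read_nonblank_records is a hypothetical helper, and an in-memory file handle stands in for the one EPrints supplies in $opts{fh}.

```perl
use strict;
use warnings;

# Read every line from a file handle and keep only the lines that
# contain something other than whitespace.
sub read_nonblank_records
{
        my( $fh ) = @_;
        my @lines = <$fh>;      # list context: one element per line
        return grep { /\S/ } @lines;
}

# An in-memory file handle stands in for the handle EPrints would supply.
my $pasted = "0946719616\n\n0297843877\n";
open( my $fh, '<', \$pasted ) or die "Cannot open in-memory handle: $!";
my @records = read_nonblank_records($fh);
close $fh;

print scalar(@records), " record(s) to import\n";
```

Each record keeps its trailing newline, which the regular expression in convert_input discards.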

convert_input

ASINs are strings which identify a product. Here we extract the first run of alphanumeric characters from the line, which discards any other characters surrounding the ASIN.

        $input =~ m/([A-Za-z0-9]+)/;
        $input = $1;
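
One thing to be aware of is that $1 is only set when the match succeeds, so a line with no alphanumeric characters leaves $input undefined or stale. A guarded variant might look like the sketch below. extract_asin is a hypothetical helper name, not part of the plugin above.

```perl
use strict;
use warnings;

# Return the first run of alphanumeric characters in the line,
# or undef when the line contains none.
sub extract_asin
{
        my( $line ) = @_;
        return undef unless defined $line;
        return $line =~ m/([A-Za-z0-9]+)/ ? $1 : undef;
}

foreach my $line ( "  0946719616\r\n", "\n" )
{
        my $asin = extract_asin($line);
        print defined $asin ? "ASIN: $asin\n" : "no ASIN on this line\n";
}
```

Within convert_input the same idea could be expressed by returning undef as soon as no ASIN is found, which input_fh already treats as "skip this record".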

We form the request from the variables we created earlier and the ASIN we have just obtained.

        my $request =
                "$endpoint?".
                "Service=$service&".
                "AWSAccessKeyId=$accesskey&".
                "Operation=$operation&".
                "ItemId=$input&".
                "Version=$version&".
                "ResponseGroup=$responsegroup";

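The request above is built by plain string concatenation, which works because none of the values contain characters that need escaping. If a value did (a response group list containing a comma, say), each key and value would need percent-encoding first. The sketch below shows the idea using only core Perl. escape_value and build_request are hypothetical helpers, and escape_value is a stand-in for uri_escape that assumes ASCII input.

```perl
use strict;
use warnings;

# Percent-encode everything except the RFC 3986 unreserved characters.
# A core-only stand-in for uri_escape, assuming ASCII input.
sub escape_value
{
        my( $value ) = @_;
        $value =~ s/([^A-Za-z0-9\-._~])/sprintf( '%%%02X', ord($1) )/ge;
        return $value;
}

# Join an endpoint and an ordered list of key/value pairs into a GET URL.
sub build_request
{
        my( $endpoint, @pairs ) = @_;
        my @parts;
        while( my( $key, $value ) = splice( @pairs, 0, 2 ) )
        {
                push @parts, escape_value($key) . '=' . escape_value($value);
        }
        return $endpoint . '?' . join( '&', @parts );
}

print build_request(
        'http://ecs.amazonaws.co.uk/onca/xml',
        Service       => 'AWSECommerceService',
        Operation     => 'ItemLookup',
        ItemId        => '0946719616',
        ResponseGroup => 'Large,EditorialReview',
), "\n";
```

Passing the pairs as a list rather than a hash keeps the parameters in the order they were written.
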
We now send the request, by creating a new LWP::UserAgent object, setting its timeout to 30 seconds and then performing the request using HTTP GET.

        my $ua = LWP::UserAgent->new;
        $ua->timeout(30);
        my $response = $ua->get($request);
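
The code above passes the response straight to the XML parser. If the request times out or the server returns an error, the content may not be XML at all, so it can be worth checking the response first. The sketch below is one way to do that. body_or_report is a hypothetical helper. It relies only on the is_success, status_line and content methods of the response object, and takes the error reporter as a code reference so that it can be tried outside EPrints.

```perl
use strict;
use warnings;

# Return the response body on success; otherwise pass a message to the
# supplied reporter (a code reference) and return undef.
sub body_or_report
{
        my( $response, $report ) = @_;
        unless( $response->is_success )
        {
                $report->( 'AWS request failed: ' . $response->status_line );
                return undef;
        }
        return $response->content;
}
```

Inside convert_input the reporter could be sub { $plugin->error(@_) }, with the method returning undef whenever no body comes back.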

We then create a DOM object from the XML document returned.

        my $dom = EPrints::XML::parse_xml_string($response->content);

Each response contains an Items element within the root element of the document, which in turn contains a Request element. The Request element contains an IsValid element, whose value is True or False depending on whether a valid request was made.

Here we obtain the Request element and check that the IsValid element within it contains the value True. If it doesn't, we call the error method and return undef.

        my $rep =
                $dom->getElementsByTagName('Items')->item(0)->
                getElementsByTagName('Request')->item(0);

        my $reptext =
                EPrints::Utils::tree_to_utf8($rep->getElementsByTagName('IsValid')->item(0));

        unless ($reptext eq 'True') 
        {
                $plugin->error('Invalid AWS Request');
                return undef;
        }
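
The chained calls above assume that an Items element is always present. If it is missing, item(0) yields nothing and the next method call dies. The sketch below walks the same path but stops as soon as a level is absent. first_element is a hypothetical helper, and it assumes item(0) returns undef for an empty list, as the tutorial's own check of the Item element implies.

```perl
use strict;
use warnings;

# Walk down through the named elements, taking the first match at each
# level, and return undef as soon as one level is missing.
sub first_element
{
        my( $node, @path ) = @_;
        foreach my $tag ( @path )
        {
                return undef unless defined $node;
                my $list = $node->getElementsByTagName($tag);
                $node = defined $list ? $list->item(0) : undef;
        }
        return $node;
}
```

With this in place the Request element could be fetched as first_element( $dom, 'Items', 'Request' ) and checked with defined before IsValid is read.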

The product found by the ItemLookup operation is contained in an Item element within the Items element. Here we attempt to get that element; if we can't, we raise an error and return undef.

        my $item =
                $dom->getElementsByTagName('Items')->item(0)->
                getElementsByTagName('Item')->item(0);

        unless (defined $item) 
        {
                $plugin->error('No Item element found');
                return undef;
        }

Each item contains an ItemAttributes element which contains most of the metadata about an item.

        my $attr = $item->getElementsByTagName('ItemAttributes')->item(0);

For some specialised repositories it might make sense to import DVDs, computer games and electronic equipment, but we're just going to deal with books. The ProductGroup element within the ItemAttributes element tells us what sort of item we're dealing with. We're looking for the value 'Book'.

        my $pg = EPrints::Utils::tree_to_utf8($attr->getElementsByTagName('ProductGroup')->item(0));

        unless ($pg eq 'Book') 
        {
                $plugin->error('Product is not a book.');
                return undef;
        }

Here we set a few fields without consulting the imported data. We know this is a book, so we set the type. We assume that it has not been refereed. We also assume it has been published, because we can buy it.

        $output{type} = 'book';
        $output{refereed} = 'FALSE';
        $output{ispublished} = 'pub';

We get and set the title.

        my $title = $attr->getElementsByTagName('Title')->item(0);
        $output{title} = EPrints::Utils::tree_to_utf8($title);

Here we set the official URL to the Amazon product page. We have to do a bit of extra work, using the uri_unescape function from the URI::Escape package to convert URI escape codes back into characters.

        my $url = $item->getElementsByTagName('DetailPageURL')->item(0);
        $output{official_url} = uri_unescape(EPrints::Utils::tree_to_utf8($url));
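
To see what the unescaping step does without EPrints or URI::Escape to hand, the core-only sketch below performs the same kind of substitution on a made-up URL. percent_decode is a hypothetical stand-in used purely for illustration; in the plugin itself uri_unescape remains the call to use.

```perl
use strict;
use warnings;

# Replace each %XX escape with the character it encodes.
sub percent_decode
{
        my( $text ) = @_;
        $text =~ s/%([0-9A-Fa-f]{2})/chr( hex($1) )/ge;
        return $text;
}

# Illustrative input only, not a real DetailPageURL value.
print percent_decode('http%3A%2F%2Fwww.amazon.co.uk%2Fdp%2F0946719616'), "\n";
# prints http://www.amazon.co.uk/dp/0946719616
```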

We can set the ISBN. Note that the ISBN is often the same as the ASIN.

        my $isbn = $attr->getElementsByTagName('ISBN')->item(0);
        if (defined $isbn)
        {
                $output{isbn} = EPrints::Utils::tree_to_utf8($isbn);
        }
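
Since a book's ASIN is often its ISBN, one rough way to tell whether a given ASIN is plausibly an ISBN-10 is to test the ISBN-10 check digit. The sketch below does that. is_valid_isbn10 is a hypothetical helper that is not used by the plugin, and passing the check only means the string has the right shape, not that Amazon lists it as a book.

```perl
use strict;
use warnings;

# True if the string is nine digits followed by a digit or X, and the
# weighted sum (weights 10 down to 1) is divisible by 11.
sub is_valid_isbn10
{
        my( $candidate ) = @_;
        return 0 unless defined $candidate && $candidate =~ m/^[0-9]{9}[0-9Xx]$/;

        my @chars = split //, uc $candidate;
        my $sum = 0;
        foreach my $i ( 0 .. 9 )
        {
                my $value = $chars[$i] eq 'X' ? 10 : $chars[$i];
                $sum += ( 10 - $i ) * $value;
        }
        return $sum % 11 == 0 ? 1 : 0;
}

foreach my $asin ( '0946719616', '0297843877', 'B000000000' )
{
        printf "%s %s\n", $asin,
                is_valid_isbn10($asin) ? 'looks like an ISBN-10' : 'is not an ISBN-10';
}
```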

We can set the number of pages.

        my $pages = $attr->getElementsByTagName('NumberOfPages')->item(0);
        if (defined $pages)
        {
                $output{pages} = EPrints::Utils::tree_to_utf8($pages);
        }

We can set the publisher.

        my $publisher = $attr->getElementsByTagName('Publisher')->item(0);
        if (defined $publisher)
        {
                $output{publisher} = EPrints::Utils::tree_to_utf8($publisher);
        }

We can set the publication date and finally return our output hash.

        my $pubdate = $attr->getElementsByTagName('PublicationDate')->item(0);
        if (defined $pubdate)
        {
                $output{date} = EPrints::Utils::tree_to_utf8($pubdate);
        }

        return \%output;
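
The four optional fields above all follow the same pattern: look up an element, and copy its text if it exists. That repetition could be folded into a table, as in the sketch below. %field_for and copy_optional_fields are hypothetical names. The lookup is passed in as a code reference so the example runs without EPrints; in the plugin it would wrap the getElementsByTagName and tree_to_utf8 calls.

```perl
use strict;
use warnings;

# Element name in ItemAttributes => eprint field it populates.
my %field_for = (
        ISBN            => 'isbn',
        NumberOfPages   => 'pages',
        Publisher       => 'publisher',
        PublicationDate => 'date',
);

# Copy each optional value into %$output when the lookup finds it.
# $lookup is a code reference that returns the text for an element
# name, or undef when the element is absent.
sub copy_optional_fields
{
        my( $output, $lookup ) = @_;
        foreach my $tag ( sort keys %field_for )
        {
                my $text = $lookup->($tag);
                next unless defined $text;
                $output->{ $field_for{$tag} } = $text;
        }
        return $output;
}
```

Adding another optional field then only requires a new entry in the table.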

Testing Your Plugin

After restarting your web server, go to the Import Items screen from the Manage Deposits screen. If you can't find this, make sure you're logged in.

We'll start by collecting a few ASINs. Go to Amazon (http://www.amazon.co.uk) and pick a few books. The URL for each product page is in the form http://www.amazon.co.uk/Combination-of-title-and-author/dp/ASIN... and the ASIN can be copied from there.

Now we'll demonstrate importing from Amazon with a few sample ASINs. Type this into the "Cut and Paste Records" box:

0946719616
0297843877

Select "AWS" from the "Select import format" drop-down menu and click "Test Run + Import". You should end up at the Manage Deposits screen with the message "Import completed: 2 item(s) imported." displayed.