SWORD 1.3

From EPrints Documentation

Revision as of 15:05, 21 January 2015

SWORD 1.3 is the only SWORD version available in EPrints 3.2, and is available via an EPrints Bazaar plugin in EPrints 3.3.

Terminology

SWORD 1.3 uses a few terms with specific meanings:

  • collection: The specific URL within the server that the data goes into. For EPrints this generally means inbox, review, archive or deleted; DSpace has its own Collection concept, and Fedora has a similar RDF tag for defining collective groupings.
  • package: The URI that identifies how a particular deposit has been wrapped up.
  • mediation: Where one user can deposit on behalf of another user.
  • servicedocument: The document that the SWORD server returns to tell clients which collections and which packages the service understands.

Protocol implementation

verbose no-op

Configuring SWORD

The default location for SWORD configuration is

 archives/<your repo>/cfg/cfg.d/sword.pl

This is where you enable and disable access to various collections, and add or remove packages.

servicedocument

The servicedocument is an XML listing of which "collections" are available, and what "packages" can be used with each one.

  • a "collection" in EPrints terms is inbox, review or archive, which correspond to the user's workspace, the administration review buffer, and the live repository
  • a "package" is an agreed method for wrapping up the data being sent over: as XML, as formatted text, in a zip file, etc.

The default location for the servicedocument is /sword-app/servicedocument
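
As a rough illustration, the servicedocument can be requested from Perl with LWP. This is only a sketch: the helper names servicedocument_request and fetch_servicedocument are made up for this example, and it assumes the server accepts the same HTTP Basic credentials for the servicedocument as it does for a deposit.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Build a GET request for the servicedocument at its default location.
# (Hypothetical helper, not part of EPrints.)
sub servicedocument_request {
    my ( $host, $username, $password ) = @_;
    my $req = HTTP::Request->new( GET => "$host/sword-app/servicedocument" );

    # Assumption: HTTP Basic credentials, as used for a deposit, are accepted.
    $req->authorization_basic( $username, $password );
    return $req;
}

# Send the request; returns the XML as a string, or undef on failure.
# (Hypothetical helper, not part of EPrints.)
sub fetch_servicedocument {
    my ( $host, $username, $password ) = @_;
    my $ua = LWP::UserAgent->new;
    my $r  = $ua->request( servicedocument_request( $host, $username, $password ) );
    return $r->is_success ? $r->decoded_content : undef;
}
```

A call such as fetch_servicedocument( 'http://eprints.example.com', 'sworduser', 'mySecretPassword' ) would return the XML as a string, or undef if the request failed.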

example

Below is an example framework of the servicedocument:

  <service xmlns="http://www.w3.org/2007/app" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sword="http://purl.org/net/sword/"
   xmlns:dcterms="http://purl.org/dc/terms/">
  <workspace>
    <atom:title>OpenDepot.org</atom:title>
    <collection href="http://opendepot.org/sword-app/deposit/buffer">
    ....
    </collection>
    <collection href="http://opendepot.org/sword-app/deposit/inbox">
    .... 
    </collection>
  </workspace>
  </service>

Each collection lists the package formats it understands:

  <collection href="http://opendepot.org/sword-app/deposit/buffer">
    <atom:title>Repository Review</atom:title>
    <accept>*/*</accept>
    <sword:acceptPackaging q="0.2">http://www.loc.gov/METS/</sword:acceptPackaging>
    <sword:acceptPackaging q="1.0">http://eprints.org/ep2/data/2.0</sword:acceptPackaging>
    <sword:acceptPackaging q="0.2">http://www.imsglobal.org/xsd/imscp_v1p1</sword:acceptPackaging>
    <sword:acceptPackaging q="0.2">http://purl.org/net/sword-types/METSDSpaceSIP</sword:acceptPackaging>
    <sword:collectionPolicy/>
    <sword:treatment>
      Deposited items will undergo the review process. Upon approval, items will appear in the live repository.
    </sword:treatment>
    <sword:mediation>true</sword:mediation>
    <dcterms:abstract>This is the repository review.</dcterms:abstract>
  </collection>
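
A client can read the collection listing above programmatically. The following is a minimal sketch, assuming XML::LibXML is installed; the function name list_collections is hypothetical, the sample XML is trimmed from the example above, and treating a missing q attribute as 1 is an assumption made for this sketch.

```perl
use strict;
use warnings;
use XML::LibXML;

# Sample servicedocument, trimmed from the example above.
my $sample = <<'XML';
<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom"
         xmlns:sword="http://purl.org/net/sword/">
  <workspace>
    <collection href="http://opendepot.org/sword-app/deposit/buffer">
      <atom:title>Repository Review</atom:title>
      <sword:acceptPackaging q="0.2">http://www.loc.gov/METS/</sword:acceptPackaging>
      <sword:acceptPackaging q="1.0">http://eprints.org/ep2/data/2.0</sword:acceptPackaging>
    </collection>
  </workspace>
</service>
XML

# Return { collection URL => { package URI => q-value } }.
# (Hypothetical helper, not part of EPrints.)
sub list_collections {
    my ($xml) = @_;
    my $doc = XML::LibXML->load_xml( string => $xml );
    my $xpc = XML::LibXML::XPathContext->new($doc);
    $xpc->registerNs( app   => 'http://www.w3.org/2007/app' );
    $xpc->registerNs( sword => 'http://purl.org/net/sword/' );

    my %collections;
    for my $c ( $xpc->findnodes('//app:collection') ) {
        my %packages;
        for my $p ( $xpc->findnodes( 'sword:acceptPackaging', $c ) ) {
            my $q = $p->getAttribute('q');
            # Assumption: a missing q attribute is treated as 1.
            $packages{ $p->textContent } = defined $q ? $q : 1;
        }
        $collections{ $c->getAttribute('href') } = \%packages;
    }
    return \%collections;
}

my $collections = list_collections($sample);
for my $url ( sort keys %$collections ) {
    print "$url\n";
    my $packages = $collections->{$url};
    print "  q=$packages->{$_}  $_\n" for sort keys %$packages;
}
```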

Writing your own Importer

In EPrints 3.2, you need to create two things to enable a new importer:

  1. You need to configure the repository to recognise a new package format, and associate that format with the code that handles it
    • This lives in ~eprints/archives/<ID>/cfg/cfg.d/
  2. You need to write the actual package that handles the file being deposited
    • This lives in ~eprints/archives/<ID>/cfg/plugins/EPrints/Plugin/Sword/Import/
    • For complex importers, you may end up writing multiple packages; it's Perl, so TMTOWTDI

Configuration file

This is a relatively easy file to write, for example:

  # Add in the RJ_Broker acceptance type
  $c->{sword}->{supported_packages}->{"http://opendepot.org/broker/1.0"} = 
  {
    name => "Repository Junction Broker",
    plugin => "Sword::Import::RJ_Broker",
    qvalue => "0.8"
  };
  • The name is the string that's shown in the servicedocument
  • The plugin is the package used to handle the file deposited
  • The qvalue is what's known as the "Quality Value" - how closely the importer matches all the metadata wanted by the repository.
    • The theory is that clients have a list of packages they can export as, and servers have a list of packages they understand - therefore a client can determine the best package for that transfer, based on relative QValues.
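
The negotiation described above can be sketched from the client side as follows. This is an illustration only: best_package is a hypothetical helper, the q-values are taken from the examples on this page, and the sketch ranks purely by the server's q-values (a real client might also weigh its own preferences).

```perl
use strict;
use warnings;

# Pick the package URI that both sides support and that has the highest
# server q-value; ties are broken alphabetically so the result is stable.
# (Hypothetical helper, not part of EPrints.)
sub best_package {
    my ( $server_q, @client_packages ) = @_;
    my ($best) =
        sort { $server_q->{$b} <=> $server_q->{$a} || $a cmp $b }
        grep { exists $server_q->{$_} } @client_packages;
    return $best;    # undef when there is no package in common
}

# q-values as advertised by the server in its servicedocument
my %server_q = (
    'http://www.loc.gov/METS/'        => 0.2,
    'http://eprints.org/ep2/data/2.0' => 1.0,
    'http://opendepot.org/broker/1.0' => 0.8,
);

# Packages this (imaginary) client is able to export
my @client = ( 'http://www.loc.gov/METS/', 'http://opendepot.org/broker/1.0' );

print best_package( \%server_q, @client ), "\n";    # prints http://opendepot.org/broker/1.0
```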

Importer Plugin

The importer plugin is a Perl package that handles the deposited file, and creates a new record in the repository for the deposited item.

The Importer system is, as with all EPrints code, deeply hierarchical:

  1. EPrints::Plugin::Sword::Import::MyImporter will inherit most of its code from EPrints::Plugin::Sword::Import
  2. EPrints::Plugin::Sword::Import is never run directly, and inherits most of its functions from EPrints::Plugin::Import
  3. EPrints::Plugin::Import is another base class (one that is there to provide central functions), and inherits from EPrints::Plugin
  4. EPrints::Plugin is the base class for all plugins.

In practice, the best way to learn how importers are written is to look at existing importers (~eprints/perl-lib/EPrints/Plugin/Sword/Import/...).

Basic framework

The very basic framework for an importer is just to register the importer with EPrints, and leave all functions to be handled by inheritance:

package EPrints::Plugin::Sword::Import::MyImporter;

use strict;

use EPrints::Plugin::Sword::Import;
our @ISA = qw/ EPrints::Plugin::Sword::Import /;


sub new {
  my ( $class, %params ) = @_;
  my $self = $class->SUPER::new(%params);
  $self->{name} = "My Sword Importer";
  return $self;
} ## end sub new

1;

This is actually pretty useless as all it will do is create a blank record, with the deposited file attached as a document. To be useful, it needs to somehow read in some metadata and maybe attach some files.

The first important task will be to handle the file deposited. For this, you need to create a function called input_file, and it will have a framework something like:

##        $opts{file} = $file;
##        $opts{mime_type} = $headers->{content_type};
##        $opts{dataset_id} = $target_collection;
##        $opts{owner_id} = $owner->get_id;
##        $opts{depositor_id} = $depositor->get_id if(defined $depositor);
##        $opts{no_op}   = is this a No-op?
##        $opts{verbose} = is verbose output requested?
sub input_file
{
    my ( $plugin, %opts ) = @_;

    my $session = $plugin->{session};
    my $file    = $opts{file};

    # Look up the target dataset by the name passed in by the SWORD handler
    # (assumes the usual $session->get_repository->get_dataset accessor).
    my $dataset = $session->get_repository->get_dataset( $opts{dataset_id} );

    # Read the XML from the deposited file.
    open( my $fh, '<', $file ) or return undef;
    my $xml = do { local $/; <$fh> };
    close $fh;

    my $epdata = $plugin->parse_xml($xml);
    my $eprint = $dataset->create_object( $session, $epdata );
    return $eprint;
}

sub parse_xml
{
    my ($plugin, $xml) = @_;
    my $epdata = {};

    #### do stuff

    return $epdata;
}
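
As an illustration of what the "do stuff" step might look like, here is a minimal sketch of parse_xml for a made-up package format. The <record> layout is hypothetical, XML::LibXML is assumed to be installed, and the title and abstract keys assume the repository uses the default eprint field names.

```perl
use strict;
use warnings;
use XML::LibXML;

sub parse_xml
{
    my ( $plugin, $xml ) = @_;
    my $epdata = {};

    my $doc  = XML::LibXML->load_xml( string => $xml );
    my $root = $doc->documentElement;

    # Copy the hypothetical <record> child elements onto eprint fields
    # of the same name (assumes default field names: title, abstract).
    for my $field (qw/ title abstract /) {
        my ($node) = $root->getChildrenByTagName($field);
        next unless defined $node;

        my $value = $node->textContent;
        $value =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        $epdata->{$field} = $value if length $value;
    }

    return $epdata;
}

# Standalone check; inside a plugin this would be called as $plugin->parse_xml($xml)
my $epdata = parse_xml( undef,
    '<record><title> A test deposit </title><abstract>Short summary.</abstract></record>' );
print "$_: $epdata->{$_}\n" for sort keys %$epdata;
```

Elements that are missing or empty are simply left out of $epdata, so the new record keeps those fields unset rather than storing empty strings.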

Depositing (from Perl)

A SWORD deposit is, at its most basic level, just an HTTP POST request, so it can be scripted fairly easily:

   use LWP::UserAgent;
   use MIME::Base64;

   # $ep is the eprint to transfer
   my $ua = LWP::UserAgent->new;
   my $auth = "Basic " . MIME::Base64::encode( "$username:$password", '' );  # eg: sworduser:mySecretPassword

   my %headers = (
      'X-Packaging'         => $package,              # eg: http://opendepot.org/broker/1.0
      'X-No-Op'             => 'false',
      'X-Verbose'           => 'true',
      'Content-Disposition' => "filename=$filename",  # The name of the "file" the importer thinks it is reading
      'Content-Type'        => $mime,                 # eg: application/zip
      'User-Agent'          => 'OA-RJ Broker v0.2',
      'Authorization'       => $auth,
   );
   my $url    = "${host}${collection}";               # eg: http://eprints.example.com/sword-app/deposit/review
   my $buffer = $ep->export($exporter);               # eg: BibTeX

   my $r = $ua->post( $url, %headers, Content => $buffer );
   if ( $r->is_success ) {
     # Transferred
     my $content = $r->content;
     my $return_id;
     if ( $content =~ m#<atom:id>([^<]+)</atom:id># ) {
        $return_id = $1 if $1;
     }
   } else {
     # fail
   }