From EPrints Documentation
Revision as of 14:25, 17 October 2013 by Kiz

SWORD 1.3 is the only option in EPrints 3.2, and is available via an EPrints Bazaar plugin in EPrints 3.3.

Terminology

SWORD 1.3 uses some specific terms with specific meanings:

  • collection: The specific URL within the server that the data is deposited into. For EPrints this generally means inbox, review, archive or deleted. DSpace, by contrast, has its own Collection concept, and Fedora has a similar RDF tag for defining collective groupings.
  • package: The URI that identifies how a particular deposit has been wrapped up.
  • mediation: Where one user deposits on behalf of another user.
  • servicedocument: The document that the SWORD server returns to tell clients which collections and which packages the service understands.

Protocol implementation

verbose no-op
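
Verbose and no-op are per-request options in SWORD 1.3: a client asks for them with the X-Verbose and X-No-Op HTTP headers, alongside X-Packaging (the package URI) and X-On-Behalf-Of (mediation). A minimal sketch of assembling those headers for a deposit is shown here. The helper name sword_headers and its argument names are hypothetical, and nothing is sent over the network.

```perl
use strict;
use warnings;

# Hypothetical helper: build the SWORD 1.3 request headers for one deposit.
sub sword_headers {
    my (%args) = @_;

    my %headers = (
        'Content-Type' => $args{mime_type} || 'application/zip',
        'X-Packaging'  => $args{packaging},
    );

    # Mediation: deposit on behalf of another user.
    $headers{'X-On-Behalf-Of'} = $args{on_behalf_of}
        if defined $args{on_behalf_of};

    # No-op: the server processes the request but stores nothing.
    $headers{'X-No-Op'} = 'true' if $args{no_op};

    # Verbose: ask the server for a fuller description of what it did.
    $headers{'X-Verbose'} = 'true' if $args{verbose};

    return \%headers;
}

my $headers = sword_headers(
    packaging    => 'http://eprints.org/ep2/data/2.0',
    on_behalf_of => 'another_user',
    no_op        => 1,
);

print "$_: $headers->{$_}\n" for sort keys %$headers;
```

A no-op deposit is a convenient way to test a new importer, since the server runs the import without keeping the result.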

Configuring SWORD

The default location for SWORD configuration is

 archives/<your repo>/cfg/cfg.d/sword.pl

This is where you enable and disable access to the various collections, and add or remove packages.


The servicedocument is an XML listing of which "collections" are available, and which "packages" can be used with each one.

  • a "collection" in EPrints terms is inbox, review or archive, which correspond to the user's workspace, the administration review buffer, and the live repository
  • a "package" is an agreed method for wrapping up the data being sent over: as XML, as formatted text, in a zip file, etc.

The default location for the servicedocument is /sword-app/servicedocument


Below is an example framework of the servicedocument

  <service xmlns="http://www.w3.org/2007/app" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sword="http://purl.org/net/sword/" xmlns:dcterms="http://purl.org/dc/terms/">
    <collection href="http://opendepot.org/sword-app/deposit/buffer"> ... </collection>
    <collection href="http://opendepot.org/sword-app/deposit/inbox"> ... </collection>
  </service>

Each collection lists the package formats it understands:

  <collection href="http://opendepot.org/sword-app/deposit/buffer">
    <atom:title>Repository Review</atom:title>
    <sword:acceptPackaging q="0.2">http://www.loc.gov/METS/</sword:acceptPackaging>
    <sword:acceptPackaging q="1.0">http://eprints.org/ep2/data/2.0</sword:acceptPackaging>
    <sword:acceptPackaging q="0.2">http://www.imsglobal.org/xsd/imscp_v1p1</sword:acceptPackaging>
    <sword:acceptPackaging q="0.2">http://purl.org/net/sword-types/METSDSpaceSIP</sword:acceptPackaging>
    <sword:treatment>Deposited items will undergo the review process. Upon approval, items will appear in the live repository.</sword:treatment>
    <dcterms:abstract>This is the repository review.</dcterms:abstract>
  </collection>

Writing your own Importer

In EPrints 3.2, you need to create two things to enable a new importer:

  1. You need to configure the repository to recognise a new package format, and associate that format with the code that handles it
    • This lives in ~~eprints/archives/<ID>/cfg/cfg.d/
  2. You need to write the actual package that handles the file being deposited
    • This lives in ~~eprints/archives/<ID>/cfg/plugins/EPrints/Plugin/Sword/Import/
    • For complex importers, you may end up writing multiple packages. It's Perl, so TMTOWTDI (there's more than one way to do it)

Configuration file

This is a relatively easy file to write, for example:

  # Add in the RJ_Broker acceptance type
  $c->{sword}->{supported_packages}->{"http://opendepot.org/broker/1.0"} = {
    name => "Repository Junction Broker",
    plugin => "Sword::Import::RJ_Broker",
    qvalue => "0.8"
  };

  • The name is the string that's shown in the servicedocument
  • The plugin is the package used to handle the file deposited
  • The qvalue is what's known as the "Quality Value": how closely the importer matches all the metadata wanted by the repository.
    • The theory is that clients have a list of packages they can export as, and servers have a list of packages they understand. A client can therefore determine the best package for a transfer, based on the relative qvalues.
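
The selection described above can be sketched as follows. This is an illustration only: best_package is a hypothetical helper, the client's export list is an assumption, and the server values are taken from the examples on this page.

```perl
use strict;
use warnings;

# q-values advertised by the server for one collection.
my %server_q = (
    'http://www.loc.gov/METS/'                => 0.2,
    'http://eprints.org/ep2/data/2.0'         => 1.0,
    'http://www.imsglobal.org/xsd/imscp_v1p1' => 0.2,
    'http://opendepot.org/broker/1.0'         => 0.8,
);

# Hypothetical helper: from the packages the client can export, return the
# one the server rates highest, or undef if there is no package in common.
sub best_package {
    my ( $server_q, @client_packages ) = @_;

    my ( $best, $best_q );
    for my $package (@client_packages) {
        my $q = $server_q->{$package};
        next unless defined $q;    # server does not understand this package
        if ( !defined $best_q || $q > $best_q ) {
            ( $best, $best_q ) = ( $package, $q );
        }
    }
    return $best;
}

# Assumed list of formats this particular client is able to produce.
my @client_can_export = (
    'http://www.loc.gov/METS/',
    'http://opendepot.org/broker/1.0',
);

my $choice = best_package( \%server_q, @client_can_export );
print defined $choice ? "Deposit using $choice\n" : "No common package\n";
```

Here the client would pick the broker format, because the server rates it at 0.8 against 0.2 for METS.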

Importer Plugin

The importer plugin is a Perl package that handles the deposited file, and creates a new record in the repository for the item deposited.

The Importer system is, as with all EPrints code, deeply hierarchical:

  1. EPrints::Plugin::Sword::Import::MyImporter will inherit most of its code from EPrints::Plugin::Sword::Import
  2. EPrints::Plugin::Sword::Import is never run directly, and inherits most of its functions from EPrints::Plugin::Import
  3. EPrints::Plugin::Import is another base class (one that is there to provide central functions), and inherits from EPrints::Plugin
  4. EPrints::Plugin is the base class for all plugins.

In practice, the best way to learn how importers are written is to look at existing importers (~~eprints/perl-lib/EPrints/Plugin/Sword/Import/...)

Basic framework

The very basic framework for an importer is just to register the importer with EPrints, and leave all functions to be handled by inheritance:

package EPrints::Plugin::Sword::Import::MyImporter;

use strict;

use EPrints::Plugin::Sword::Import;
our @ISA = qw/ EPrints::Plugin::Sword::Import /;

sub new {
  my ( $class, %params ) = @_;
  my $self = $class->SUPER::new(%params);
  $self->{name} = "My Sword Importer";
  return $self;
} ## end sub new

# A Perl module must end by returning a true value
1;


This is actually pretty useless as all it will do is create a blank record, with the deposited file attached as a document. To be useful, it needs to somehow read in some metadata and maybe attach some files.

The first important task will be to handle the file deposited. For this, you need to create a function called input_file, and it will have a framework something like:

## Options passed in via %opts:
##        $opts{file} = $file;
##        $opts{mime_type} = $headers->{content_type};
##        $opts{dataset_id} = $target_collection;
##        $opts{owner_id} = $owner->get_id;
##        $opts{depositor_id} = $depositor->get_id if(defined $depositor);
##        $opts{no_op}   = is this a No-op?
##        $opts{verbose} = is this verbosed?
sub input_file
{
    my ( $plugin, %opts ) = @_;

    my $session = $plugin->{session};
    my $file    = $opts{file};

    # Look up the dataset (the target collection) the new record goes into
    my $dataset = $session->get_repository->get_dataset( $opts{dataset_id} );

    # Read the XML from the deposited file in one go
    open( my $fh, '<', $file ) or return undef;
    my $xml = do { local $/; <$fh> };
    close $fh;

    # Turn the XML into a hash of metadata, then create the record from it
    my $epdata = $plugin->parse_xml( $xml );
    my $eprint = $dataset->create_object( $session, $epdata );
    return $eprint;
}

sub parse_xml
{
    my ( $plugin, $xml ) = @_;
    my $epdata = {};

    #### do stuff

    return $epdata;
}
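
As an illustration of what might replace the "do stuff" placeholder, the sketch below parses a made-up XML format into an epdata hash. The record, title, abstract and creator elements are invented for this example. XML::LibXML is assumed to be installed (it is not a core Perl module). The epdata layout (title, abstract, and creators as a list of name hashes) is an assumption based on the standard eprint fields, and should be checked against your repository's configuration. It is written as a plain function so that it can be run outside EPrints.

```perl
use strict;
use warnings;
use XML::LibXML;

# Stand-alone sketch of parse_xml: turn deposited XML into an epdata hash.
sub parse_xml {
    my ( $plugin, $xml ) = @_;
    my $epdata = {};

    # load_xml dies on malformed XML, so trap that and return nothing.
    my $doc = eval { XML::LibXML->load_xml( string => $xml ) };
    return undef unless defined $doc;

    my $root = $doc->documentElement;

    # Simple text fields: copy across only if present.
    for my $field (qw( title abstract )) {
        my $value = $root->findvalue("./$field");
        $epdata->{$field} = $value if length $value;
    }

    # Repeated field: one entry per <creator> element.
    for my $node ( $root->findnodes('./creator') ) {
        push @{ $epdata->{creators} }, {
            name => {
                family => $node->findvalue('./family'),
                given  => $node->findvalue('./given'),
            },
        };
    }

    return $epdata;
}

# Invented sample input, in the made-up format described above.
my $xml = <<'XML';
<record>
  <title>An example deposit</title>
  <creator><family>Smith</family><given>Jo</given></creator>
</record>
XML

my $epdata = parse_xml( undef, $xml );
print "Title: $epdata->{title}\n";
print "First creator: $epdata->{creators}[0]{name}{family}\n";
```

Returning undef for unparseable input lets the calling input_file decide how to report the failure, rather than creating a blank record.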