Difference between revisions of "SWORD"

From EPrints Documentation
By default, SWORD is enabled in all EPrints 3.2 & 3.3 servers, and access is available to all registered users.

EPrints 3.2 uses [[SWORD 1.3]]

EPrints 3.3 uses [[SWORD 2.0]]

For further information on SWORD 2 see [[API:EPrints/Apache/CRUD]].
 
 
 
== Terminology ==
 
 
 
SWORD 1.3 uses some specific terms with specific meanings:

* '''collection''': The specific URL within the server for the data to go into. For EPrints this generally means inbox, review, archive or deleted. DSpace, however, has its own Collection concept, and Fedora has a similar RDF tag for defining collective groupings.
* '''package''': The URI that identifies how a particular deposit has been wrapped up.
* '''mediation''': This is where one user can deposit ''on behalf of'' another user.
* '''servicedocument''': The document that the SWORD server can return to inform clients of which collections and which packages are understood by the service.
 
 
 
== Protocol implementation ==
 
 
 
verbose
 
no-op
 
 
 
== Configuring SWORD ==
 
 
 
 
 
 
 
The default location for SWORD configuration is:

  archives/<your repo>/cfg/cfg.d/sword.pl

This is where you enable and disable access to the various ''collections'', and add or remove ''packages''.
 
 
 
== servicedocument ==
 
 
 
The servicedocument is an XML listing of which "collections" are available, and which "packages" can be used with each one.

* a "collection" in EPrints terms is <tt>inbox</tt>, <tt>review</tt> or <tt>archive</tt>, which correspond to the user's workspace, the administration review buffer, and the live repository respectively
* a "package" is an agreed method for wrapping up the data being sent over: as XML, as formatted text, in a zip file, etc.

The default location for the servicedocument is <tt>/sword-app/servicedocument</tt>
 
 
 
=== example ===
 
Below is an example framework of the servicedocument:

<pre>
  <service xmlns="http://www.w3.org/2007/app" xmlns:atom="http://www.w3.org/2005/Atom"
           xmlns:sword="http://purl.org/net/sword/" xmlns:dcterms="http://purl.org/dc/terms/">
  <workspace>
    <atom:title>OpenDepot.org</atom:title>
    <collection href="http://opendepot.org/sword-app/deposit/buffer">
    ....
    </collection>
    <collection href="http://opendepot.org/sword-app/deposit/inbox">
    ....
    </collection>
  </workspace>
  </service>
</pre>
 
 
 
Each collection lists the package formats it understands:

<pre>
  <collection href="http://opendepot.org/sword-app/deposit/buffer">
    <atom:title>Repository Review</atom:title>
    <accept>*/*</accept>
    <sword:acceptPackaging q="0.2">http://www.loc.gov/METS/</sword:acceptPackaging>
    <sword:acceptPackaging q="1.0">http://eprints.org/ep2/data/2.0</sword:acceptPackaging>
    <sword:acceptPackaging q="0.2">http://www.imsglobal.org/xsd/imscp_v1p1</sword:acceptPackaging>
    <sword:acceptPackaging q="0.2">http://purl.org/net/sword-types/METSDSpaceSIP</sword:acceptPackaging>
    <sword:collectionPolicy/>
    <sword:treatment>
      Deposited items will undergo the review process. Upon approval, items will appear in the live repository.
    </sword:treatment>
    <sword:mediation>true</sword:mediation>
    <dcterms:abstract>This is the repository review.</dcterms:abstract>
  </collection>
</pre>
 
 
 
== Writing your own Importer ==
 
 
 
In EPrints 3.2, you need to create two things to enable a new importer:

# You need to configure the repository to recognise a new package format, and associate that format with the code that handles it
#* This lives in <tt>~~eprints/archives/<ID>/cfg/cfg.d/</tt>
# You need to write the actual package that handles the file being deposited
#* This lives in <tt>~~eprints/archives/<ID>/cfg/plugins/EPrints/Plugin/Sword/Import/</tt>
#* For complex importers, you may end up writing multiple packages. It's Perl: TMTOWTDI
 
 
 
=== Configuration file ===
 
This is a relatively easy file to write, for example:

<pre>
  # Add in the RJ_Broker acceptance type
  $c->{sword}->{supported_packages}->{"http://opendepot.org/broker/1.0"} =
  {
    name => "Repository Junction Broker",
    plugin => "Sword::Import::RJ_Broker",
    qvalue => "0.8"
  };
</pre>
 
 
 
* The <tt>name</tt> is the string that's shown in the servicedocument
* The <tt>plugin</tt> is the package used to handle the file deposited
* The <tt>qvalue</tt> is what's known as the "Quality Value": how closely the importer matches all the metadata wanted by the repository.
** The theory is that clients have a list of packages they can export as, and servers have a list of packages they understand. A client can therefore determine the best package for that transfer, based on relative QValues.
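
The QValue negotiation described above can be sketched from the client's side. This is only an illustration, not code from EPrints or from any SWORD client library: the <tt>best_package</tt> function and the client's export list are hypothetical, and the QValues are copied from the servicedocument example and the configuration file above.

```perl
use strict;
use warnings;

# Packages the server says it understands, keyed by package URI, with the
# QValue it advertises for each (values taken from the examples on this page).
my %server_accepts = (
    'http://www.loc.gov/METS/'                => 0.2,
    'http://eprints.org/ep2/data/2.0'         => 1.0,
    'http://www.imsglobal.org/xsd/imscp_v1p1' => 0.2,
    'http://opendepot.org/broker/1.0'         => 0.8,
);

# Hypothetical helper: given the server's list and the package URIs the
# client can export, return the shared package with the highest QValue,
# or undef when the two lists have nothing in common.
sub best_package
{
    my ( $server, @client ) = @_;

    my @common = grep { exists $server->{$_} } @client;
    return unless @common;

    # highest QValue first; fall back to URI order so ties are stable
    my ($best) = sort { $server->{$b} <=> $server->{$a} || $a cmp $b } @common;
    return $best;
}

# Hypothetical client that can only export METS or the broker format
my @client_exports = ( 'http://www.loc.gov/METS/', 'http://opendepot.org/broker/1.0' );

my $choice = best_package( \%server_accepts, @client_exports );
print defined $choice ? "Deposit using $choice\n" : "No common package format\n";
```

With these values the client would pick the broker format (0.8) over METS (0.2), because the native EPrints XML package (1.0) is not in its export list.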
 
 
 
=== Importer Plugin ===
 
The importer plugin is a Perl package that handles the deposited file, and creates a new record in the repository for the item deposited.

The Importer system is, as with all EPrints code, deeply hierarchical:

# <code>EPrints::Plugin::Sword::Import::MyImporter</code> will inherit most of its code from <code>EPrints::Plugin::Sword::Import</code>
# <code>EPrints::Plugin::Sword::Import</code> is never run directly, and inherits most of its functions from <code>EPrints::Plugin::Import</code>
# <code>EPrints::Plugin::Import</code> is another ''base class'' (one that is there to provide central functions), and inherits from <code>EPrints::Plugin</code>
# <code>EPrints::Plugin</code> is the ''base class'' for all plugins.

In practice, the best way to learn how importers are written is to look at existing importers (<tt>~~eprints/perl-lib/EPrints/Plugin/Sword/Import/...</tt>).
 
 
 
==== Basic framework ====
 
The very basic framework for an importer is just to register the importer with EPrints, and leave all functions to be handled by inheritance:

<pre>
package EPrints::Plugin::Sword::Import::MyImporter;

use strict;

use EPrints::Plugin::Sword::Import;
our @ISA = qw/ EPrints::Plugin::Sword::Import /;

sub new {
  my ( $class, %params ) = @_;
  my $self = $class->SUPER::new(%params);
  $self->{name} = "My Sword Importer";
  return $self;
} ## end sub new

1;
</pre>
 
 
 
This is actually pretty useless as all it will do is create a blank record, with the deposited file attached as a document. To be useful, it needs to somehow read in some metadata and maybe attach some files.
 
 
 
The first important task will be to handle the file deposited. For this, you need to create a function called <code>input_file</code>, and it will have a framework something like:
 
 
 
<pre>
## %opts as passed in by the SWORD handler:
##        $opts{file} = $file;
##        $opts{mime_type} = $headers->{content_type};
##        $opts{dataset_id} = $target_collection;
##        $opts{owner_id} = $owner->get_id;
##        $opts{depositor_id} = $depositor->get_id if(defined $depositor);
##        $opts{no_op}  = is this a No-op?
##        $opts{verbose} = is verbose output requested?
sub input_file
{
    my ( $plugin, %opts ) = @_;

    my $session = $plugin->{session};
    my $file    = $opts{file};

    # assumption: look up the target dataset from the id passed in %opts
    my $dataset = $session->get_repository->get_dataset( $opts{dataset_id} );

    # read the whole XML file into one string
    open( my $fh, '<', $file ) or return undef;
    my $xml = do { local $/; <$fh> };
    close $fh;

    # turn the XML into EPrints metadata and create the record
    my $epdata = $plugin->parse_xml( $xml );
    my $eprint = $dataset->create_object( $session, $epdata );
    return $eprint;
}

sub parse_xml
{
    my ( $plugin, $xml ) = @_;
    my $epdata = {};

    #### do stuff

    return $epdata;
}
</pre>
 

Latest revision as of 14:22, 17 October 2013

This page is about SWORD which is a lightweight protocol for remotely depositing content into repositories.

The SWORD project was funded by JISC and more information can be found on the official website.

== SWORD made easy ==

SWORD is basically an HTTP PUT (or POST) to a defined web URL, where the content of the posted request is the thing being deposited.

SWORD 1.3 uses an HTTP header field to define how the thing has been wrapped up (packaged).

SWORD 2.0 uses the content-type to deduce how to interpret the thing being deposited.
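
As a rough illustration of the SWORD 1.3 case, the Perl sketch below assembles the headers for a deposit request. It is a minimal sketch under stated assumptions, not the EPrints implementation: the <tt>sword_headers</tt> function, the credentials and the MIME type are hypothetical, and the <tt>X-Packaging</tt>, <tt>X-On-Behalf-Of</tt>, <tt>X-No-Op</tt> and <tt>X-Verbose</tt> header names are taken from the SWORD 1.3 profile. Only core Perl modules are used, and nothing is sent over the network.

```perl
use strict;
use warnings;
use MIME::Base64 qw( encode_base64 );

# Hypothetical helper: build the HTTP headers for a SWORD 1.3 deposit.
# Header names follow the SWORD 1.3 profile; everything else is illustrative.
sub sword_headers
{
    my (%args) = @_;

    my %headers = (
        'Content-Type'  => $args{mime_type},
        # the packaging URI tells the server how the deposit is wrapped up
        'X-Packaging'   => $args{packaging},
        # HTTP Basic authentication for the depositing user
        'Authorization' => 'Basic '
            . encode_base64( "$args{username}:$args{password}", '' ),
    );

    # mediation: deposit on behalf of another user
    $headers{'X-On-Behalf-Of'} = $args{on_behalf_of} if defined $args{on_behalf_of};

    # optional flags, only sent when requested
    $headers{'X-No-Op'}   = 'true' if $args{no_op};
    $headers{'X-Verbose'} = 'true' if $args{verbose};

    return \%headers;
}

my $headers = sword_headers(
    mime_type => 'text/xml',                          # assumed MIME type
    packaging => 'http://eprints.org/ep2/data/2.0',   # package URI from the servicedocument
    username  => 'depositor',                         # hypothetical account
    password  => 'secret',
    no_op     => 1,
);

print "$_: $headers->{$_}\n" for sort keys %$headers;
```

To actually deposit, the headers and the file content would be POSTed to a collection URL from the servicedocument, for example with the core module HTTP::Tiny: <code>HTTP::Tiny->new->request( 'POST', $url, { headers => $headers, content => $data } )</code>.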
