Accessing Metadata Fields

From EPrints Documentation
Revision as of 20:02, 12 October 2011 by Af05v@ecs.soton.ac.uk (talk | contribs) (Undo revision 10064 by Af05v@ecs.soton.ac.uk (Talk))


This page provides an overview of the API calls you can use to access the data in a DataObj. An export plugin is used as the running example.

The Plugin

Below is a very simple export plugin, which outputs a single eprint or list of eprints as Text citations. This can be found in the perl_lib/EPrints/Plugin/Export directory. A good starting place to understand the structure of these plugins is to have a browse through the existing code base.

package EPrints::Plugin::Export::Text;

use EPrints::Plugin::Export::TextFile;

@ISA = ( "EPrints::Plugin::Export::TextFile" );

use strict;

sub new
{
        my( $class, %opts ) = @_;

        my $self = $class->SUPER::new( %opts );

        $self->{name} = "ASCII Citation";
        $self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
        $self->{visible} = "all";

        return $self;
}


sub output_dataobj
{
        my( $plugin, $dataobj ) = @_;

        my $cite = $dataobj->render_citation;

        return EPrints::Utils::tree_to_utf8( $cite )."\n\n";
}

1;

Note the output_dataobj function. In an export plugin, it is called on every item in the list being exported, and the results for all items are aggregated and output.

There are two function calls of particular interest that aid in retrieving and managing data:

my $cite = $dataobj->render_citation;

This returns an HTML DOM object containing the citation of the dataobj as specified in the configuration files (see cfg/citations/eprint/default.xml). Given an HTML DOM object, the following call will convert it into a string:

my $text = EPrints::Utils::tree_to_utf8( $html_dom );

Accessing Metadata

A number of functions exist to aid in accessing and rendering values in a dataobj.

my $title = $dataobj->value('title');

$title will now be a scalar containing the value stored in the title field of the dataobj. To check that a field has a value before reading it, use is_set:

if ($dataobj->is_set('title'))
{
     $title = $dataobj->value('title');
}
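
The test-then-read pattern can be wrapped in a small helper that falls back to a default. The following is only a sketch: value_or_default is a name invented for this illustration, and MockDataObj is a stand-in that mimics is_set and value so that the snippet runs outside EPrints. In a plugin you would pass the real $dataobj instead.

```perl
use strict;
use warnings;

# Stand-in for a real DataObj so the sketch runs outside EPrints.
# It only mimics the two methods described above.
package MockDataObj;

sub new    { my( $class, %data ) = @_; return bless { %data }, $class; }
sub is_set { my( $self, $f ) = @_; return defined $self->{$f} && $self->{$f} ne ''; }
sub value  { my( $self, $f ) = @_; return $self->{$f}; }

package main;

# Hypothetical helper (not an EPrints function): returns the field value
# if it is set, otherwise the supplied default.
sub value_or_default
{
    my( $dataobj, $fieldname, $default ) = @_;

    return $dataobj->is_set( $fieldname )
        ? $dataobj->value( $fieldname )
        : $default;
}

my $dataobj = MockDataObj->new( title => 'An Example Title' );

print value_or_default( $dataobj, 'title', '[untitled]' ), "\n";
print value_or_default( $dataobj, 'abstract', '[no abstract]' ), "\n";
```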

It is also possible to find out which fields an item can have, and which of them are set, by querying the item's dataset:

my $ds = $dataobj->dataset;
my @fields = $ds->fields;
my %fieldvalues;
foreach my $field (@fields)
{
     my $fieldname = $field->name;
     if ($dataobj->is_set($fieldname))
     {
          $fieldvalues{$fieldname} = $dataobj->value($fieldname);
     }
}

The Structure of Values

On an eprint, the title is generally a simple metadata field. When $dataobj->value is called, it returns a scalar value.

EPrints has two types of metadata field:

  • Simple
  • Compound

Both types can hold either a single value or multiple values. An example of a compound multiple field is the creators field:

my $creators = $dataobj->value('creators');
use Data::Dumper;
print Dumper $creators;

Data::Dumper is a very useful library that prints a Perl data structure in readable form. In the above case, the output may look something like this:

$VAR1 = [
          {
            'name' => {
                        'lineage' => '',
                        'given' => 'Noura',
                        'honourific' => '',
                        'family' => 'Abbas'
                      },
            'id' => '10363'
          },
          {
            'name' => {
                        'lineage' => '',
                        'given' => 'Andrew',
                        'honourific' => '',
                        'family' => 'Gravell'
                      },
            'id' => '22'
          },
          {
            'name' => {
                        'lineage' => '',
                        'given' => 'Gary',
                        'honourific' => '',
                        'family' => 'Wills'
                      },
            'id' => '395'
          }
        ];

The above output shows an array of hashes. The structure can be compared to the configuration of the creators field (see cfg/cfg.d/eprint_fields.pl):

          {
            'name' => 'creators',
            'type' => 'compound',
            'multiple' => 1,
            'fields' => [
                          {
                            'sub_name' => 'name',
                            'type' => 'name',
                            'hide_honourific' => 1,
                            'hide_lineage' => 1,
                            'family_first' => 1,
                          },
                          {
                            'sub_name' => 'id',
                            'type' => 'text',
                            'input_cols' => 20,
                            'allow_null' => 1,
                          }
                        ],
            'input_boxes' => 4,
          },

Each element in the array is a hash that has a key for each subfield of the creators field. Note the structure of the name part of the data dump: the name datatype is itself a hash.
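
As a minimal sketch of how this structure can be walked, the snippet below turns a creators value into a list of "Family, Given" strings. The data is copied from the dump above so that it runs without EPrints; in a plugin the arrayref would come from $dataobj->value('creators'). The format_creators helper is a name invented for this illustration, not part of the EPrints API.

```perl
use strict;
use warnings;

# Sample data copied from the dump above; in a plugin this would be
# my $creators = $dataobj->value('creators');
my $creators = [
    { name => { lineage => '', given => 'Noura',  honourific => '', family => 'Abbas'   }, id => '10363' },
    { name => { lineage => '', given => 'Andrew', honourific => '', family => 'Gravell' }, id => '22'    },
    { name => { lineage => '', given => 'Gary',   honourific => '', family => 'Wills'   }, id => '395'   },
];

# Hypothetical helper (not an EPrints function): returns one
# "Family, Given" string per creator.
sub format_creators
{
    my( $creators ) = @_;

    my @names;
    foreach my $creator ( @{ $creators || [] } )
    {
        my $name = $creator->{name} || {};

        # Keep only the name parts that are actually set
        my @parts = grep { defined $_ && $_ ne '' } ( $name->{family}, $name->{given} );
        push @names, join( ', ', @parts ) if @parts;
    }

    return @names;
}

print join( '; ', format_creators( $creators ) ), "\n";
```

Guarding with "|| []" and "|| {}" means that an unset creators field, or a creator with no name, is skipped rather than causing an error under strict.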

To summarise:

  • Simple: usually a scalar value (though some datatypes, such as name, return a hashref)
  • Compound: a hashref containing a simple value for each subfield
  • Multiple: an arrayref containing simple or compound values

The Perl 'ref' function can help you to identify the shape of your data.
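
For instance, ref can be used to report the shape of a value before deciding how to process it. This is a sketch only: describe_shape is a hypothetical helper, not an EPrints function, and the sample values are made up to match the shapes described above.

```perl
use strict;
use warnings;

# Hypothetical helper (not an EPrints function): names the shape of a
# value such as one returned by $dataobj->value(...).
sub describe_shape
{
    my( $value ) = @_;

    return 'unset' if !defined $value;

    my $type = ref( $value );
    return 'scalar'  if $type eq '';         # plain simple value
    return 'hashref' if $type eq 'HASH';     # compound value, or a name
    if( $type eq 'ARRAY' )                   # multiple field
    {
        return 'empty arrayref' if !@{$value};
        return 'arrayref of ' . describe_shape( $value->[0] );
    }

    return $type;                            # anything else, e.g. an object
}

print describe_shape( 'A title' ), "\n";
print describe_shape( { family => 'Wills', given => 'Gary' } ), "\n";
print describe_shape( [ { name => { family => 'Wills' }, id => '395' } ] ), "\n";
```

Only the first element of an array is inspected, on the assumption that all values of a multiple field share the same shape.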

Rendering a Field

If you just want a sensible text value from a metadata field, the following will usually be enough:

my $text = EPrints::Utils::tree_to_utf8( $dataobj->render_value('creators') );

This will call the default render method for the field, and convert the HTML DOM object to text.

Documents

To get an array of all of the documents attached to an eprint:

my @docs = $dataobj->get_all_documents;

Note that documents are also first class objects, so document metadata fields can be accessed in the same way as eprint metadata fields. In addition to the above, the following are useful for documents:

  my $size = 'medium'; #could also be 'small' or 'thumbnail'
  my $icon_url = $doc->icon_url(size => $size);
  my $download_url = $doc->get_url;

And if you want to access the document on the local storage (though it's important not to modify it):

my $path = $doc->local_path;

Note, however, that there may be more than one file in that directory: documents contain files, which are also first class objects.