Accessing Metadata Fields

From EPrints Documentation
Revision as of 08:47, 24 September 2015 by Th.lauke@arcor.de (Talk | contribs)


This page provides an overview of the API calls you can use to access the data in a DataObj. The examples are framed around an export plugin.

The Plugin

Below is a very simple export plugin that outputs a single eprint, or a list of eprints, as text citations. It can be found in the perl_lib/EPrints/Plugin/Export directory. Browsing the existing plugins in that directory is a good way to understand how they are structured.

package EPrints::Plugin::Export::Text;

use EPrints::Plugin::Export::TextFile;

@ISA = ( "EPrints::Plugin::Export::TextFile" );

use strict;

sub new
{
        my( $class, %opts ) = @_;

        my $self = $class->SUPER::new( %opts );

        $self->{name} = "ASCII Citation";
        $self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
        $self->{visible} = "all";

        return $self;
}


sub output_dataobj
{
        my( $plugin, $dataobj ) = @_;

        my $cite = $dataobj->render_citation;

        return EPrints::Utils::tree_to_utf8( $cite )."\n\n";
}

1;

Note the output_dataobj function. In an export plugin, it is called on every item in the list being exported, and the results for all items are concatenated and output.

There are two function calls of particular interest that aid in retrieving and managing data:

my $cite = $dataobj->render_citation;

This returns an HTML DOM object containing the citation of the dataobj as specified in the configuration files (see cfg/citations/eprint/default.xml). Given an HTML DOM object, the following call will convert it into a string:

my $text = EPrints::Utils::tree_to_utf8( $html_dom );

Accessing the DataObjs

Every export plugin needs one of two functions. In the case above, where data is extracted from every item and concatenated, output_dataobj needs to be defined. This function will be run on every dataobj in the list being exported. See the example above.

However, it may be the case that you need to do one of the following:

  • output a header
  • count the number of records you're exporting
  • aggregate data across records

In this case, an output_list function needs to be created. The example below is from the MultilineCSV export plugin:

sub output_list
{
        my( $plugin, %opts ) = @_;

        my $part = csv( $plugin->header_row( %opts ) );

        my $r = [];

        # only set the encoding layer when a filehandle was actually supplied
        binmode( $opts{fh}, ":utf8" ) if defined $opts{fh};

        if( defined $opts{fh} )
        {
                print {$opts{fh}} $part;
        }
        else
        {
                push @{$r}, $part;
        }

        # list of things

        $opts{list}->map( sub {
                my( $session, $dataset, $item ) = @_;

                my $part = $plugin->output_dataobj( $item, %opts );
                if( defined $opts{fh} )
                {
                        print {$opts{fh}} $part;
                }
                else
                {
                        push @{$r}, $part;
                }
        } );

        return if( defined $opts{fh} );

        return join( '', @{$r} );
}

This function outputs a header row for the CSV file and then maps a function onto the list; that function simply calls output_dataobj on each item.

Mapping a function onto a list of objects is the correct way to process the dataobjs.
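
The same pattern covers the cases listed earlier (counting records, aggregating data): keep the state in variables outside the callback and update them for each item. The following is only a sketch under stated assumptions. Local::FakeList is a stand-in written for this example so that it runs without EPrints, the items are plain hashes rather than dataobjs, and count_and_collect is a hypothetical name. The only detail taken from the plugin above is the callback signature ( $session, $dataset, $item ).

```perl
use strict;
use warnings;

# Local::FakeList is a stand-in for the list passed in $opts{list}, so that
# this sketch runs without EPrints. It only mimics the callback signature
# seen in the example above: ( $session, $dataset, $item ).
package Local::FakeList;

sub new
{
        my( $class, @items ) = @_;
        return bless { items => [ @items ] }, $class;
}

sub map
{
        my( $self, $fn ) = @_;
        foreach my $item ( @{ $self->{items} } )
        {
                $fn->( undef, undef, $item );
        }
        return;
}

package main;

# count_and_collect is a hypothetical function showing the pattern:
# state lives outside the callback and is updated once per item.
sub count_and_collect
{
        my( $list ) = @_;

        my $count = 0;
        my @titles;

        $list->map( sub {
                my( $session, $dataset, $item ) = @_;

                $count++;
                # With a real dataobj this would be $item->value('title').
                push @titles, $item->{title};
        } );

        return( $count, \@titles );
}

my $list = Local::FakeList->new( { title => 'First' }, { title => 'Second' } );
my( $count, $titles ) = count_and_collect( $list );
print "$count records: ", join( ', ', @{$titles} ), "\n";
```

In a real plugin the counters would typically live inside output_list, next to the $r array used in the example above.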

Accessing DataObj Metadata

A number of functions exist to aid in accessing and rendering values in a dataobj.

my $title = $dataobj->value('title');

$title will now be a scalar containing the value stored in the title field of the dataobj. To check whether a field has a value before reading it, use is_set:

if ($dataobj->is_set('title'))
{
     $title = $dataobj->value('title');
}
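
If this test is needed in many places, it could be wrapped in a small helper. The sketch below is illustrative only: Local::FakeDataObj is a stand-in written for this example (with a simplified notion of "set") so that the code runs without EPrints, and value_or is a hypothetical helper name, not part of the EPrints API.

```perl
use strict;
use warnings;

# Local::FakeDataObj is a stand-in for a real dataobj so the sketch runs
# without EPrints; it offers only simplified is_set and value methods.
package Local::FakeDataObj;

sub new
{
        my( $class, %data ) = @_;
        return bless { data => { %data } }, $class;
}

sub is_set
{
        my( $self, $fieldname ) = @_;
        my $value = $self->{data}->{$fieldname};
        # Simplification: a value counts as set if it is defined and non-empty.
        return defined $value && $value ne '';
}

sub value
{
        my( $self, $fieldname ) = @_;
        return $self->{data}->{$fieldname};
}

package main;

# value_or is a hypothetical helper: return the field value if it is set,
# otherwise the supplied fallback.
sub value_or
{
        my( $dataobj, $fieldname, $fallback ) = @_;

        return $dataobj->is_set( $fieldname )
                ? $dataobj->value( $fieldname )
                : $fallback;
}

my $dataobj = Local::FakeDataObj->new( title => 'A Study of Things', abstract => '' );
print value_or( $dataobj, 'title', '(untitled)' ), "\n";
print value_or( $dataobj, 'abstract', '(no abstract)' ), "\n";
```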

It is also possible to find out which fields an item has set by querying the item's dataset:

my $ds = $dataobj->dataset;
my @fields = $ds->fields;
my %fieldvalues; # field name => value, for every field that is set
foreach my $field (@fields)
{
     my $fieldname = $field->name;
     if ($dataobj->is_set($fieldname))
     {
          $fieldvalues{$fieldname} = $dataobj->value($fieldname);
     }
}

The Structure of Values

On an eprint, the title is generally a simple metadata field. When $dataobj->value is called, it returns a scalar value.

EPrints has two types of metadata field:

  • Simple
  • Compound

Both types can hold either a single value or multiple values. An example of a compound multiple field is the creators field:

my $creators = $dataobj->value('creators');
use Data::Dumper;
print Dumper $creators;

Data::Dumper is a core Perl module that prints a Perl data structure in readable form. In the above case, the output may look something like this:

$VAR1 = [
          {
            'name' => {
                        'lineage' => '',
                        'given' => 'Noura',
                        'honourific' => '',
                        'family' => 'Abbas'
                      },
            'id' => '10363'
          },
          {
            'name' => {
                        'lineage' => '',
                        'given' => 'Andrew',
                        'honourific' => '',
                        'family' => 'Gravell'
                      },
            'id' => '22'
          },
          {
            'name' => {
                        'lineage' => '',
                        'given' => 'Gary',
                        'honourific' => '',
                        'family' => 'Wills'
                      },
            'id' => '395'
          }
        ];

The above output shows an array of hashes. The structure can be compared to the configuration of the creators field (see cfg/cfg.d/eprint_fields.pl):

          {
            'name' => 'creators',
            'type' => 'compound',
            'multiple' => 1,
            'fields' => [
                          {
                            'sub_name' => 'name',
                            'type' => 'name',
                            'hide_honourific' => 1,
                            'hide_lineage' => 1,
                            'family_first' => 1,
                          },
                          {
                            'sub_name' => 'id',
                            'type' => 'text',
                            'input_cols' => 20,
                            'allow_null' => 1,
                          }
                        ],
            'input_boxes' => 4,
          },

Each element in the array is a hash with a key for each subfield of the creators field. Note the structure of the name part of the data dump: the name datatype is itself a hash.
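
As a rough sketch of how this structure might be walked in plain Perl, the snippet below turns a creators value into a list of "Family, Given" strings. The sample data is copied from the dump above; creator_names is a name made up for this illustration and is not an EPrints function.

```perl
use strict;
use warnings;

# Sample data shaped like the value of the creators field shown above.
my $creators = [
        { name => { family => 'Abbas',   given => 'Noura',  honourific => '', lineage => '' }, id => '10363' },
        { name => { family => 'Gravell', given => 'Andrew', honourific => '', lineage => '' }, id => '22' },
        { name => { family => 'Wills',   given => 'Gary',   honourific => '', lineage => '' }, id => '395' },
];

# creator_names is a hypothetical helper, not an EPrints function.
# It returns one "Family, Given" string per creator.
sub creator_names
{
        my( $creators ) = @_;

        my @names;
        foreach my $creator ( @{ $creators || [] } )
        {
                my $name = $creator->{name} || {};
                # Skip empty name parts so a missing given name leaves no stray comma.
                my @parts = grep { defined $_ && $_ ne '' } ( $name->{family}, $name->{given} );
                push @names, join( ', ', @parts ) if @parts;
        }

        return @names;
}

print join( '; ', creator_names( $creators ) ), "\n";
```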

To summarise:

  • Simple: usually a scalar value (though some datatypes, such as name, return a hashref)
  • Compound: a hashref containing a simple value for each subfield
  • Multiple: an arrayref containing simple or compound values

Perl's built-in ref function can help you to identify the shape of your data.
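
For instance, a small classifier along the following lines could report what kind of value was returned. This is a minimal sketch: value_shape and the labels it returns are invented for this example, and it inspects plain Perl data only, without calling EPrints.

```perl
use strict;
use warnings;

# value_shape is a hypothetical helper: it reports the shape of a value
# such as one returned by $dataobj->value, using only Perl's built-in ref.
sub value_shape
{
        my( $value ) = @_;

        return 'unset' if !defined $value;

        my $type = ref( $value );
        return 'scalar' if $type eq '';        # plain string or number
        return 'hashref' if $type eq 'HASH';   # compound value, or a hash-valued datatype such as name
        if( $type eq 'ARRAY' )
        {
                # Multiple field: describe the first element, if there is one.
                return 'arrayref (empty)' if !@{$value};
                return 'arrayref of ' . value_shape( $value->[0] );
        }

        return "other ($type)";
}

print value_shape( 'A title' ), "\n";
print value_shape( { family => 'Wills', given => 'Gary' } ), "\n";
print value_shape( [ { name => { family => 'Wills' }, id => '395' } ] ), "\n";
```

Note that ref alone cannot tell a compound value from a hash-valued simple datatype such as name; the field configuration is needed for that.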

Rendering a Field

If you just want a sensible text value from a metadata field, the following will usually be enough:

my $text = EPrints::Utils::tree_to_utf8( $dataobj->render_value('creators') );

This will call the default render method for the field, and convert the HTML DOM object to text.

You can also render a subfield:

my $text = EPrints::Utils::tree_to_utf8( $dataobj->render_value('creators_name') );

Documents

To get an array of all of the documents attached to an eprint:

my @docs = $dataobj->get_all_documents;

Note that documents are also first-class objects, so document metadata fields can be accessed in the same way as eprint metadata fields. The following calls are also useful for documents:

my $size = 'medium'; # could also be 'small' or 'preview'
my $icon_url = $doc->icon_url(size => $size);
my $download_url = $doc->get_url;

To access the document in local storage (taking care not to modify it):

my $path = $doc->local_path;

Note that there may be more than one file in that directory: documents contain files, which are also first-class objects.