Difference between revisions of "Accessing Metdata Fields"

From EPrints Documentation

Revision as of 19:04, 12 October 2011

This page provides an overview of the API calls you can use to access the data in a DataObj. The examples are framed around an export plugin.

The Plugin

Below is a very simple export plugin, which outputs a single eprint or list of eprints as Text citations.

package EPrints::Plugin::Export::Text;

use EPrints::Plugin::Export::TextFile;

@ISA = ( "EPrints::Plugin::Export::TextFile" );

use strict;

sub new
{
        my( $class, %opts ) = @_;

        my $self = $class->SUPER::new( %opts );

        $self->{name} = "ASCII Citation";
        $self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
        $self->{visible} = "all";

        return $self;
}


sub output_dataobj
{
        my( $plugin, $dataobj ) = @_;

        my $cite = $dataobj->render_citation;

        return EPrints::Utils::tree_to_utf8( $cite )."\n\n";
}

1;

Note the output_dataobj function. In an export plugin, it is called for every item in the list being exported, and the results for all items are aggregated and output.

There are two function calls of particular interest that aid in retrieving and managing data:

my $cite = $dataobj->render_citation;

This returns an HTML DOM object containing the citation of the dataobj as specified in the configuration files (see cfg/citations/eprint/default.xml). Given an HTML DOM object, the following call will convert it into a string:

my $text = EPrints::Utils::tree_to_utf8( $html_dom );

Accessing Metadata

A number of functions exist to aid in accessing and rendering values in a dataobj.

my $title = $dataobj->value('title');

$title will now be a scalar containing the value stored in the title field of the dataobj. The is_set function lets you test whether a field has a value before reading it:

if ($dataobj->is_set('title'))
{
     $title = $dataobj->value('title');
}
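
As a rough sketch of how these two calls might be combined inside output_dataobj, the script below falls back to a placeholder when no title is set. So that it runs outside EPrints, it uses a small stand-in class, Local::MockDataObj, which is hypothetical and only mimics is_set and value for scalar fields; the '[untitled]' placeholder is likewise an assumption, not EPrints behaviour.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an EPrints DataObj so the sketch runs
# standalone. It only mimics is_set and value for scalar fields.
package Local::MockDataObj;

sub new
{
        my( $class, %data ) = @_;
        return bless { data => {%data} }, $class;
}

sub is_set
{
        my( $self, $fieldname ) = @_;
        my $v = $self->{data}->{$fieldname};
        return defined $v && $v ne '';
}

sub value
{
        my( $self, $fieldname ) = @_;
        return $self->{data}->{$fieldname};
}

package main;

# Illustrative variant of output_dataobj: one line of text per item.
sub output_dataobj
{
        my( $plugin, $dataobj ) = @_;

        # Only read the field if it has a value; otherwise use a placeholder.
        my $title = $dataobj->is_set( 'title' )
                ? $dataobj->value( 'title' )
                : '[untitled]';

        return $title."\n";
}

print output_dataobj( undef, Local::MockDataObj->new( title => 'On Metadata' ) );
print output_dataobj( undef, Local::MockDataObj->new() );
```

In a real plugin the second argument would be the eprint passed in by EPrints, and the mock class would not be needed.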

It is also possible to find out the fields that an item does have by querying the item's dataset:

my $ds = $dataobj->dataset;
my @fields = $ds->fields;
my %fieldvalues;
foreach my $field (@fields)
{
     my $fieldname = $field->name;
     if ($dataobj->is_set($fieldname))
     {
          $fieldvalues{$fieldname} = $dataobj->value($fieldname);
     }
}

The Structure of Values

On an eprint, the title is generally a simple metadata field. When $dataobj->value is called, it returns a scalar value.

EPrints has two types of metadata field:

  • Simple
  • Compound

Both types can hold either a single value or multiple values. An example of a compound multiple field is the creators field:

my $creators = $dataobj->value('creators');
use Data::Dumper;
print Dumper $creators;

Data::Dumper is a very useful library that prints out a Perl data structure. In the above case, the output may look something like this:

$VAR1 = [
          {
            'name' => {
                        'lineage' => '',
                        'given' => 'Noura',
                        'honourific' => '',
                        'family' => 'Abbas'
                      },
            'id' => '10363'
          },
          {
            'name' => {
                        'lineage' => '',
                        'given' => 'Andrew',
                        'honourific' => '',
                        'family' => 'Gravell'
                      },
            'id' => '22'
          },
          {
            'name' => {
                        'lineage' => '',
                        'given' => 'Gary',
                        'honourific' => '',
                        'family' => 'Wills'
                      },
            'id' => '395'
          }
        ];

The above output shows an array of hashes. The structure can be compared to the configuration of the creators field (see cfg/cfg.d/eprint_fields.pl):

          {
            'name' => 'creators',
            'type' => 'compound',
            'multiple' => 1,
            'fields' => [
                          {
                            'sub_name' => 'name',
                            'type' => 'name',
                            'hide_honourific' => 1,
                            'hide_lineage' => 1,
                            'family_first' => 1,
                          },
                          {
                            'sub_name' => 'id',
                            'type' => 'text',
                            'input_cols' => 20,
                            'allow_null' => 1,
                          }
                        ],
            'input_boxes' => 4,
          },

Each element in the array is a hash with a key for each subfield of the creators field. Note the structure of the name part of the data dump: the name datatype is itself a hash.
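
A minimal sketch of walking such a value follows. It hard-codes a structure shaped like the dump above (in a plugin it would come from $dataobj->value('creators')) and formats each creator as "Family, Given". The helper name format_creators is hypothetical and not part of the EPrints API.

```perl
use strict;
use warnings;

# Stand-in for $dataobj->value('creators'): an arrayref of hashrefs,
# copied from the Data::Dumper output above.
my $creators = [
        { name => { lineage => '', given => 'Noura',  honourific => '', family => 'Abbas' },   id => '10363' },
        { name => { lineage => '', given => 'Andrew', honourific => '', family => 'Gravell' }, id => '22' },
        { name => { lineage => '', given => 'Gary',   honourific => '', family => 'Wills' },   id => '395' },
];

# Hypothetical helper (not an EPrints function): turn the compound
# multiple value into a list of "Family, Given" strings.
sub format_creators
{
        my( $creators ) = @_;

        my @names;
        foreach my $creator ( @{ $creators || [] } )
        {
                my $name = $creator->{name};    # the name subfield is itself a hashref
                next unless defined $name;

                # Skip name parts that are undefined or empty.
                my @parts = grep { defined $_ && $_ ne '' } ( $name->{family}, $name->{given} );
                push @names, join( ", ", @parts );
        }

        return @names;
}

print join( "; ", format_creators( $creators ) ), "\n";
```

The outer loop dereferences the arrayref (the multiple part), and each hash lookup reaches into a subfield (the compound part).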

To summarise:

Simple
Usually a scalar value (though some datatypes, such as name, return a hashref)
Compound
A hashref containing a simple value for each subfield
Multiple
An arrayref containing simple or compound values
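
Because the shape of a value depends on the field, code that handles arbitrary fields may need to inspect what it gets back. The following is a sketch only, using Perl's built-in ref function; describe_value is a hypothetical helper, not an EPrints call. A hashref on its own does not distinguish a compound field from a simple datatype such as name, so the helper reports it only as a hashref.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of EPrints): report the shape of a value,
# such as one returned by $dataobj->value, from its reference type.
sub describe_value
{
        my( $value ) = @_;

        return "unset" unless defined $value;

        if( ref( $value ) eq 'ARRAY' )
        {
                # Multiple field: an arrayref of simple or compound values.
                return "multiple (" . scalar( @{$value} ) . " items)";
        }
        if( ref( $value ) eq 'HASH' )
        {
                # Compound field, or a simple datatype that is a hash (e.g. name).
                return "hashref with keys: " . join( ", ", sort keys %{$value} );
        }

        return "scalar";
}

print describe_value( 'A title' ), "\n";
print describe_value( { family => 'Abbas', given => 'Noura' } ), "\n";
print describe_value( [ { id => '22' }, { id => '395' } ] ), "\n";
```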