Difference between revisions of "Accessing Metdata Fields"
Revision as of 08:40, 13 October 2011
This page provides an overview of the API calls you can use to access the data in a DataObj. The example framing this is that of an export plugin.
The Plugin
Below is a very simple export plugin, which outputs a single eprint or list of eprints as Text citations. This can be found in the perl_lib/EPrints/Plugin/Export directory. A good starting place to understand the structure of these plugins is to have a browse through the existing code base.
package EPrints::Plugin::Export::Text;

use EPrints::Plugin::Export::TextFile;

@ISA = ( "EPrints::Plugin::Export::TextFile" );

use strict;

sub new
{
    my( $class, %opts ) = @_;

    my $self = $class->SUPER::new( %opts );

    $self->{name} = "ASCII Citation";
    $self->{accept} = [ 'dataobj/eprint', 'list/eprint' ];
    $self->{visible} = "all";

    return $self;
}

sub output_dataobj
{
    my( $plugin, $dataobj ) = @_;

    my $cite = $dataobj->render_citation;

    return EPrints::Utils::tree_to_utf8( $cite )."\n\n";
}

1;

Note the output_dataobj function. In an export plugin, this is called on every item in the list being exported, and the results for all items are concatenated and output.
There are two function calls of particular interest that aid in retrieving and managing data:
my $cite = $dataobj->render_citation;
This returns an HTML DOM object containing the citation of the dataobj as specified in the configuration files (see cfg/citations/eprint/default.xml). Given an HTML DOM object, the following call will convert it into a string:
my $text = EPrints::Utils::tree_to_utf8( $html_dom );
Accessing the DataObjs
Every export plugin needs one of two functions. In the case above, where data is extracted from every item and concatenated, output_dataobj needs to be defined. This function will be run on every dataobj in the list being exported. See the example above.
However, it may be the case that you need to do one of the following:
- output a header
- count the number of records you're exporting
- aggregate data across records
In this case, an output_list function needs to be created:

sub output_list
{
    my( $plugin, %opts ) = @_;

    my $part = csv( $plugin->header_row( %opts ) );

    my $r = [];

    # only set the encoding layer when a filehandle was actually supplied
    binmode( $opts{fh}, ":utf8" ) if( defined $opts{fh} );

    if( defined $opts{fh} )
    {
        print {$opts{fh}} $part;
    }
    else
    {
        push @{$r}, $part;
    }

    # list of things

    $opts{list}->map( sub {
        my( $session, $dataset, $item ) = @_;

        my $part = $plugin->output_dataobj( $item, %opts );
        if( defined $opts{fh} )
        {
            print {$opts{fh}} $part;
        }
        else
        {
            push @{$r}, $part;
        }
    } );

    return if( defined $opts{fh} );

    return join( '', @{$r} );
}
This function outputs a header row for the CSV file and then maps a function on to the list which simply calls the output_dataobj function on each item in the list.
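The same pattern covers the other cases listed above, such as counting records. The following is a minimal sketch that runs outside EPrints: MockList and MockPlugin are hypothetical stand-ins (not EPrints classes) that only imitate the map callback signature and the output_dataobj call, so the control flow can be tried in isolation.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an EPrints list: it only provides map(), which
# calls the callback with ( $session, $dataset, $item ) for each item.
package MockList;

sub new
{
    my( $class, @items ) = @_;
    return bless { items => [ @items ] }, $class;
}

sub map
{
    my( $self, $fn ) = @_;
    $fn->( undef, undef, $_ ) for @{ $self->{items} };
}

# Hypothetical stand-in for an export plugin.
package MockPlugin;

sub new
{
    my( $class ) = @_;
    return bless {}, $class;
}

# A real plugin would read values from the dataobj here.
sub output_dataobj
{
    my( $plugin, $item ) = @_;
    return "$item\n";
}

sub output_list
{
    my( $plugin, %opts ) = @_;

    my @parts;
    my $count = 0;

    # Write to the filehandle if one was given, otherwise collect the text.
    my $emit = sub {
        my( $part ) = @_;
        if( defined $opts{fh} ) { print {$opts{fh}} $part; }
        else { push @parts, $part; }
    };

    $emit->( "Exported records\n" );    # header

    $opts{list}->map( sub {
        my( $session, $dataset, $item ) = @_;
        $count++;
        $emit->( $plugin->output_dataobj( $item, %opts ) );
    } );

    $emit->( "Total: $count\n" );       # footer built from the count

    return if( defined $opts{fh} );
    return join( '', @parts );
}

package main;

my $out = MockPlugin->new->output_list( list => MockList->new( 'first', 'second' ) );
print $out;
```

Routing every write through one closure keeps the filehandle-versus-string decision in a single place, which is the part of the original function that is repeated for the header and for each item.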
Accessing DataObj Metadata
A number of functions exist to aid in accessing and rendering values in a dataobj.
my $title = $dataobj->value('title');
$title will now be a scalar containing the value stored in the title field of the dataobj. A function is provided to enable testing first:
if ($dataobj->is_set('title'))
{
    $title = $dataobj->value('title');
}

It is also possible to find out which fields an item has set by querying the item's dataset:

my $ds = $dataobj->dataset;
my @fields = $ds->fields;

my %fieldvalues;
foreach my $field (@fields)
{
    my $fieldname = $field->name;
    if ($dataobj->is_set($fieldname))
    {
        $fieldvalues{$fieldname} = $dataobj->value($fieldname);
    }
}
The Structure of Values
On an eprint, the title is generally a simple metadata field. When $dataobj->value is called, it returns a scalar value.
EPrints has two types of metadata field:
- Simple
- Compound
Both types can hold either a single value or multiple values. An example of a compound multiple field is the creators field:

my $creators = $dataobj->value('creators');
use Data::Dumper;
print Dumper $creators;

Data::Dumper is a very useful library that prints a Perl data structure in readable form. In the above case, the output may look something like this:

$VAR1 = [
          {
            'name' => {
                        'lineage' => '',
                        'given' => 'Noura',
                        'honourific' => '',
                        'family' => 'Abbas'
                      },
            'id' => '10363'
          },
          {
            'name' => {
                        'lineage' => '',
                        'given' => 'Andrew',
                        'honourific' => '',
                        'family' => 'Gravell'
                      },
            'id' => '22'
          },
          {
            'name' => {
                        'lineage' => '',
                        'given' => 'Gary',
                        'honourific' => '',
                        'family' => 'Wills'
                      },
            'id' => '395'
          }
        ];
The above output shows an array of hashes. The structure can be compared to the configuration of the creators field (see cfg/cfg.d/eprint_fields.pl):
{
    'name' => 'creators',
    'type' => 'compound',
    'multiple' => 1,
    'fields' => [
        {
            'sub_name' => 'name',
            'type' => 'name',
            'hide_honourific' => 1,
            'hide_lineage' => 1,
            'family_first' => 1,
        },
        {
            'sub_name' => 'id',
            'type' => 'text',
            'input_cols' => 20,
            'allow_null' => 1,
        }
    ],
    'input_boxes' => 4,
},

Each element in the array is a hash that has a key for each subfield of the creators field. Note the structure of the name part of the data dump: the name datatype is itself a hash.
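As an illustration of walking this structure, the sketch below builds "Family, Given" strings from the creators value. It uses the sample data from the dump above in place of a real $dataobj->value('creators') call, and creator_names is a hypothetical helper, not an EPrints function.

```perl
use strict;
use warnings;

# Sample data copied from the Data::Dumper output above; in a plugin this
# would come from $dataobj->value('creators').
my $creators = [
    { name => { lineage => '', given => 'Noura',  honourific => '', family => 'Abbas' },   id => '10363' },
    { name => { lineage => '', given => 'Andrew', honourific => '', family => 'Gravell' }, id => '22' },
    { name => { lineage => '', given => 'Gary',   honourific => '', family => 'Wills' },   id => '395' },
];

# Hypothetical helper: one "Family, Given" string per creator.
# An unset field (undef) is treated as an empty list.
sub creator_names
{
    my( $creators ) = @_;

    return map { $_->{name}{family} . ', ' . $_->{name}{given} } @{ $creators || [] };
}

print join( '; ', creator_names( $creators ) ), "\n";
# prints: Abbas, Noura; Gravell, Andrew; Wills, Gary
```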
To summarise:
- Simple: usually a scalar value (though some datatypes, such as name, return a hashref)
- Compound: a hashref containing a simple value for each subfield
- Multiple: an arrayref containing simple or compound values
Perl's ref function can help you to identify the shape of your data.
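A minimal sketch of that idea, assuming only the shapes summarised above: describe_shape is a hypothetical helper (not part of EPrints) that reports whether a value is a scalar, a hashref, or an arrayref, looking at the first element of an arrayref to describe what it holds.

```perl
use strict;
use warnings;

# Hypothetical helper: name the shape of a value returned by $dataobj->value.
sub describe_shape
{
    my( $value ) = @_;

    return 'unset' unless defined $value;

    my $type = ref( $value );    # '' for a plain scalar
    return 'scalar' if $type eq '';
    return 'hashref' if $type eq 'HASH';

    if( $type eq 'ARRAY' )
    {
        return 'empty arrayref' unless @{$value};
        return 'arrayref of ' . describe_shape( $value->[0] );
    }

    return "other ($type)";
}

# Plain Perl values standing in for field values.
print describe_shape( 'A title' ), "\n";
print describe_shape( { family => 'Abbas', given => 'Noura' } ), "\n";
print describe_shape( [ { name => { family => 'Wills' }, id => '395' } ] ), "\n";
```

With these inputs the three lines printed are "scalar", "hashref" and "arrayref of hashref", matching a simple field, a name value and a compound multiple field respectively.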
Rendering a Field
If you just want a sensible text value from a metadata field, the following will usually be enough:
my $text = EPrints::Utils::tree_to_utf8( $dataobj->render_value('creators') );
This will call the default render method for the field, and convert the HTML DOM object to text.
Documents
To get an array of all of the documents attached to an eprint:
my @docs = $dataobj->get_all_documents;
Note that documents are also first class objects, so document metadata fields can be accessed in the same way as eprint metadata fields. In addition to the above, the following are useful for documents:
my $size = 'medium'; # could also be 'small' or 'preview'
my $icon_url = $doc->icon_url( size => $size );
my $download_url = $doc->get_url;
And if you want to access the document on the local storage (though it's important not to modify it):
my $path = $doc->local_path;
...though note that there may be more than one file in that directory (and documents contain files, which are also first class objects).