API:EPrints/DataSet

From EPrints Documentation

Revision as of 11:40, 14 September 2011



NAME

EPrints::DataSet - a set of records with the same metadata scheme



SYNOPSIS

 my $dataset = $repository->dataset( "inbox" );
 
 print sprintf("There are %d records in the inbox\n",
   $dataset->count);
 
 $string = $dataset->base_id; # eprint
 $string = $dataset->id; # inbox
 
 $dataobj = $dataset->create_dataobj( $data );
 $user = $dataset->dataobj( 23 );
 
 $search = $dataset->prepare_search( %options );
 $list = $dataset->search( %options ); # prepare_search( %options )->execute
 $list = $dataset->search; # match ALL
 
 $metafield = $dataset->field( $fieldname );
 $metafield = $dataset->key_field;
 @metafields = $dataset->fields; 
 
 $dataset->search->map( sub {}, $ctx );
 $n = $dataset->search->count; 
 $ids = $dataset->search->ids;
 $list = $dataset->list( \@ids );
 



DESCRIPTION

This module describes a dataset.

A repository has several datasets that make up the repository's metadata schema. The list of dataset ids can be obtained from the repository object (see EPrints::Repository).

A normal dataset (e.g. "user") has an associated package (e.g. EPrints::DataObj::User), which must be a subclass of EPrints::DataObj, and a number of SQL tables whose names are prefixed with the dataset name. Most datasets also have a set of associated EPrints::MetaFields, which may be optional or required depending on the type: for example, books have editors but posters don't, yet both are EPrints.

The fields contained in a dataset are defined by the data object and by any additional fields defined in cfg.d. Some datasets don't have any fields.

Some datasets are "virtual" datasets made from others. Examples include "inbox", "archive", "buffer" and "deletion", which are all virtual datasets of the "eprint" dataset. That is to say, "inbox" is a subset of "eprint" and by inference contains EPrints::DataObj::EPrint objects. You can define your own virtual datasets which operate on existing datasets.



CREATING CUSTOM DATASETS

New datasets can be defined in a configuration file, e.g.

 $c->{datasets}->{bread} = {
   class => "EPrints::DataObj::Bread",
   sqlname => "bread",
 };
 

This defines a dataset with the id bread (which must be unique). The dataobj package (class) used to instantiate objects is EPrints::DataObj::Bread, which must be a subclass of EPrints::DataObj. Lastly, the database tables used by the dataset will be called 'bread' or prefixed with 'bread_'.

Other optional properties:

 columns - an array ref of field ids shown by default in the user view
 datestamp - field id to use to sort this dataset
 import - is the dataset importable?
 index - is the dataset text-indexed?
 order - is the dataset orderable?
 virtual - completely virtual dataset (no database tables)
 

To make one dataset a virtual dataset of another (as 'inbox' is to 'eprint') use the following properties:

 confid - the super-dataset this is a virtual sub-dataset of
 dataset_id_field - the field containing the sub-dataset id
 filters - an array ref of filters to apply when retrieving records
 
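A minimal sketch of a virtual sub-dataset, assuming the bread dataset defined above. The "stale" dataset id and the "status" field are hypothetical; the pattern is modelled on the way the eprint_status field selects the "inbox", "buffer", "archive" and "deletion" sub-datasets of "eprint", and the filter hash is assumed to take the same meta_fields/value form as the search_fields option of search:

```perl
# Hypothetical "status" field on the bread dataset; its value names the
# sub-dataset a record belongs to.
$c->add_dataset_field(
  "bread",
  { name => "status", type => "set", options => [qw( fresh stale )] }
);

# Hypothetical "stale" sub-dataset of "bread" (as "inbox" is to "eprint").
$c->{datasets}->{stale} = {
  class => "EPrints::DataObj::Bread",
  sqlname => "bread",            # stored in the super-dataset's tables
  confid => "bread",             # the super-dataset this is a sub-dataset of
  dataset_id_field => "status",  # field whose value is the sub-dataset id
  filters => [
    # only records whose status is "stale" belong to this sub-dataset
    { meta_fields => [ "status" ], value => "stale" },
  ],
};
```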

As with system datasets, the EPrints::MetaFields can be defined via EPrints::DataObj/get_system_field_info or via configuration:

 $c->add_dataset_field(
   "bread",
   { name => "breadid", type => "counter", sql_counter => "bread" }
 );
 $c->add_dataset_field(
   "bread",
   { name => "toasted", type => "bool", }
 );
 $c->add_dataset_field(
   "bread",
   { name => "description", type => "text", }
 );
 

See EPrints::RepositoryConfig/add_dataset_field for details on add_dataset_field.

Creating a fully-operational dataset will require more configuration files. You will probably want at least a workflow, citations for the summary page, search results, etc., and permissions and search settings:

 push @{$c->{user_roles}->{admin}}, qw(
   +bread/create
   +bread/edit
   +bread/view
   +bread/destroy
   +bread/details
 );
 push @{$c->{plugins}->{"Export::SummaryPage"}->{params}->{accept}}, qw(
   dataobj/bread
 );
 $c->{datasets}->{bread}->{search}->{simple} = {
   search_fields => {
     id => "q",
     meta_fields => [qw(
       breadid
       description
     )],
   },
 };
 



Object Methods



$id = $ds->base_id

 $ds = $repo->dataset( "inbox" );
 $id = $ds->base_id; # returns "eprint"
 

Returns the identifier of the base dataset for this dataset (same as id unless this dataset is virtual).
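Since base_id equals id unless the dataset is virtual, comparing the two shows whether a dataset is a virtual sub-dataset of another. A sketch, assuming $repo is an EPrints::Repository already in scope:

```perl
my $ds = $repo->dataset( "inbox" );

# id and base_id differ only for a virtual sub-dataset
if( $ds->id ne $ds->base_id )
{
  printf "%s is a virtual sub-dataset of %s\n", $ds->id, $ds->base_id;
}
```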



$metafield = $ds->field( $fieldname )

Returns the EPrints::MetaField from this dataset with the given name, or undef.
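Because field returns undef for an unknown name, check the result before calling methods on it. A sketch, assuming $repo is in scope:

```perl
my $ds = $repo->dataset( "eprint" );

# field() gives undef rather than a MetaField when the name is unknown
my $field = $ds->field( "title" );

if( defined $field )
{
  print "the eprint dataset has a title field\n";
}
else
{
  warn "no title field in the eprint dataset\n";
}
```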



$id = $ds->id

Return the id of this dataset.



$n = $ds->count( $session )

Return the number of records in this dataset.
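For example, a tally of the eprint sub-datasets named in the DESCRIPTION. This assumes $repo is an EPrints::Repository in scope and that it is acceptable as the $session argument (the SYNOPSIS calls count with no argument at all):

```perl
# Print the number of records in each eprint sub-dataset.
for my $id (qw( inbox buffer archive deletion ))
{
  my $ds = $repo->dataset( $id );
  printf "%-8s %d\n", $id, $ds->count( $repo );
}
```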



@fields = $ds->fields

Returns a list of the EPrints::MetaFields belonging to this dataset.
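A short loop over the fields of the "user" dataset, assuming $repo is in scope and that EPrints::MetaField provides a name accessor:

```perl
my $ds = $repo->dataset( "user" );

# fields() returns a list (not an array ref) of MetaField objects
foreach my $field ( $ds->fields )
{
  print $field->name, "\n";
}
```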



$field = $ds->key_field

Return the EPrints::MetaField representing the primary key field.

Always the first field.
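A sketch of reading a record's primary key without hard-coding the field name. It assumes $repo is in scope, that MetaField provides a name accessor and that data objects provide a value( $fieldname ) accessor; the id 23 is arbitrary:

```perl
my $ds = $repo->dataset( "eprint" );

# name of the primary key field ("eprintid" for the eprint dataset)
my $keyname = $ds->key_field->name;

my $eprint = $ds->dataobj( 23 );
print $eprint->value( $keyname ), "\n" if defined $eprint;
```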



$dataobj = $ds->make_dataobj( $epdata )

Return an object of the class associated with this dataset, always a subclass of EPrints::DataObj.

$epdata is a hash reference of values for fields in this dataset.

Returns $epdata if no class is associated with this dataset.
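The sketch below builds an object from plain field values using the hypothetical bread dataset from CREATING CUSTOM DATASETS. It assumes $repo is in scope, and our assumption is that make_dataobj only wraps the data in memory without storing it (storing is what create_dataobj is for):

```perl
my $ds = $repo->dataset( "bread" );

# bool fields take the strings "TRUE" / "FALSE"
my $loaf = $ds->make_dataobj( {
  toasted => "TRUE",
  description => "Rye",
} );

# the object is an instance of the dataset's class
print ref( $loaf ), "\n";
```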



$obj = $ds->create_dataobj( $data )

Returns a new object in this dataset based on $data or undef on failure.

If $data describes sub-objects then those will also be created.
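Creating and storing a record in the hypothetical bread dataset might look like this. It assumes $repo is in scope, that the breadid counter field is filled in automatically when omitted, and that data objects provide an id accessor:

```perl
my $ds = $repo->dataset( "bread" );

my $bread = $ds->create_dataobj( {
  toasted => "FALSE",
  description => "Sourdough",
} );

# create_dataobj returns undef on failure
die "could not create bread record\n" unless defined $bread;
print "created bread ", $bread->id, "\n";
```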



$dataobj = $ds->dataobj( $id )

Returns the object from this dataset with the given id, or undef.
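Since dataobj returns undef for an unknown id, callers should check the result. A minimal sketch, assuming $repo is in scope (the id 23 is arbitrary):

```perl
my $eprint = $repo->dataset( "eprint" )->dataobj( 23 );

if( defined $eprint )
{
  # safe to call methods on the object here
  print "found eprint 23\n";
}
else
{
  warn "no eprint with id 23\n";
}
```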



$repository = $ds->repository

Returns the EPrints::Repository to which this dataset belongs.



$searchexp = $ds->prepare_search( %options )

Returns an EPrints::Search for this dataset, configured with %options.
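Preparing the search separately from running it is what search does in one step. A sketch, assuming $repo is in scope and that the archive's eprints have the standard title field; the search_fields format follows the option described under search:

```perl
my $ds = $repo->dataset( "archive" );

# build the search expression first...
my $searchexp = $ds->prepare_search(
  search_fields => [
    { meta_fields => [ "title" ], value => "bees" },
  ],
);

# ...then run it to get an EPrints::List
my $list = $searchexp->execute;
printf "%d matching records\n", $list->count;
```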



$list = $ds->search( %options )

Short-cut to prepare_search( %options )->execute.



"satisfy_all"=>1

Satisfy all conditions specified. 0 means satisfy any of the conditions specified. Default is 1.



"staff"=>1

Perform the search as an administrator, which means you get everything back.



"custom_order" => "field1/-field2/field3"

Order the search results by the listed fields, in the order given. Prefixing a field name with "-" reverses the ordering for that field.



"search_fields" => [ { meta_fields => [ "field1", "field2", "document.field3" ], merge => "ANY", match => "EX", value => "bees" }, { meta_fields => [ "field4" ], value => "honey" } ]

Return records where field1, field2 or document.field3 is "bees" and field4 is "honey" (assuming satisfy_all is set).



"limit" => 10

Return at most 10 results.
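The options above can be combined in one call. A sketch against the "archive" dataset, assuming $repo is in scope, that the standard eprint fields title, abstract, keywords and datestamp are configured, that EPrints::List's map passes ( $session, $dataset, $dataobj, $info ) to its callback, and that data objects provide an id accessor:

```perl
my $list = $repo->dataset( "archive" )->search(
  satisfy_all => 1,                     # every search_fields entry must match
  custom_order => "-datestamp/title",   # newest first, then by title
  limit => 10,                          # at most 10 results
  search_fields => [
    { meta_fields => [ "title", "abstract" ], merge => "ANY", value => "bees" },
    { meta_fields => [ "keywords" ], value => "honey" },
  ],
);

# print the id of each matching eprint
$list->map( sub {
  my( $session, $dataset, $eprint ) = @_;
  print $eprint->id, "\n";
} );
```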



$list = $ds->list( $ids )

Returns an EPrints::List of the records in this dataset with the given ids ($ids is an array reference).
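A sketch that turns the first ten ids of a search back into a list, assuming $repo is in scope; ids is used as in the SYNOPSIS and returns an array reference:

```perl
my $ds = $repo->dataset( "eprint" );

# ids() returns an array reference of primary key values
my $ids = $ds->search->ids;

# keep at most the first ten ids
my @first = @$ids > 10 ? @{$ids}[0 .. 9] : @$ids;

my $list = $ds->list( \@first );
printf "%d records in the list\n", $list->count;
```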



COPYRIGHT
