Difference between revisions of "Talk:API:EPrints"

From EPrints Documentation
(XHTML: Changed page() API)
 
(158 intermediate revisions by 3 users not shown)
Line 1: Line 1:
 
I'm going to use this page to get my thoughts in order. [[User:Cjg|Cjg]] 16:58, 2 September 2009 (BST)
 
I'm going to use this page to get my thoughts in order. [[User:Cjg|Cjg]] 16:58, 2 September 2009 (BST)
  
===Current 3.1 System===
+
==Current 3.1 System==
Session, Repository, DataSet, MetaField
 
  
===API===
+
===Unsessioned Classes===
Plan:
 
* RepositoryHandle (was Session)
 
* MetaFieldHandle
 
* DataSetHandle
 
* Repository, MetaField and DataSet still exist but are not part of the API.
 
  
OK. The problems:
+
These classes don't store a session internally resulting in methods like $foo->render( $session, ARGS ).
* what to name the new modules
 
* if we should rename the old modules? s/DataSet/DataSetConfig/
 
* What methods to use
 
* What $foo variable name to conventionally use to refer to this item.
 
  
== repo ==
+
* Repository
  $repo = EPrints->repository( "devel", noise=>1 );
+
* DataSet
  $repo = EPrints->repository_from_request( noise=>3 );
+
* MetaField
 +
* Language
 +
 
 +
===Sessioned Classes===
 +
 
 +
These classes store a session internally resulting in methods like $foo->render( ARGS ).
 +
 
 +
* Session
 +
* DataObj
 +
* Plugin
 +
* List
 +
* Search
 +
* Database
 +
* Workflow
 +
* ScreenProcessor
 +
 
 +
==API==
 +
New Plan(!)
 +
* Merge Session and Repository into a single class.
 +
* Move XML functions into their own class.
 +
* Move Page functions into their own class.
 +
* Add a link to repository for dataset and metafield.
 +
* Ensure cleanup when repository object goes out of scope.
 +
* Make EPrints->new() return an eprints object which can pass out repository objects.
 +
** repository objects don't have a link to the EPrints object EVER.
 +
** When the eprints object is DESTROY'd it takes out the repositories, datasets and metafields etc.
 +
 
 +
== Stage 0.9 API ==
 +
 
 +
We aim to finish these modules in the next few days. The second list below requires some more thought and is less central.
 +
 
 +
=== Rules of the Road ===
 +
 
 +
API-specific stuff - it is assumed general EPrints/Perl style will be followed.
 +
 
 +
==== Parameters ====
 +
 
 +
* Parameters should be a named list, except where the list is very short (fewer than 3) and very unlikely to be expanded:
 +
$person->slap( fish => 'haddock', weight => '3lb', side => 'right' );
 +
* Is it acceptable to mix required values as positional params and optional values as named args?
 +
$person->set_name( "Smith", "John", honourific => "Dr." );
 +
* Context is at end:
 +
$list->map( CALLBACK, CONTEXT );
 +
 
 +
==== Results ====
 +
 
 +
* Return a single value type (either as a single or list of values)
 +
* Bounded or static lists may be returned as a list:
 +
@bits = $lump->chisel( 10, 5 );
 +
@limbs = $person->parts;
 +
* All other lists should be returned as a reference to an array
 +
$matches = $google->results;
 +
 
 +
==== Nomenclature ====
 +
 
 +
* Accessors are just the name of the attribute:
 +
$head = $person->head;
 +
* Setters are the name of the attribute prefixed with "set_":
 +
$person->set_head( $head );
 +
 
 +
==== Exceptions ====
 +
 
 +
Exceptions should be exceptional and only used when continued processing is impossible or presents a risk of data corruption. The normal response to a request for a non-existent entity should be to return undef (and this should be clearly documented).
 +
 
 +
=== Remaining Issues ===
 +
 
 +
==== values/value ====
 +
 
 +
$dataobj->value( "x" );
 +
returns the value of field X.
 +
$field->values()
 +
returns all possible values of the field, this is a confusing similarity.
 +
 
 +
Options:
 +
# Keep as is
 +
# rename $field->values()
 +
# remove $field->values() from the 1.0 API.
 +
 
 +
I vote for the removal. [[User:Cjg|Cjg]] 11:57, 1 October 2009 (BST)
 +
 
 +
==== conf/config ====
 +
 
 +
Should the config getting method be
 +
$repository->config(...)
 +
or
 +
$repository->conf(...)
 +
 
 +
I don't care: [[User:Cjg|Cjg]] 11:57, 1 October 2009 (BST)
 +
 
 +
==== text_phrase ====
 +
 
 +
* Should $xhtml->text_phrase move out of XHTML
 +
** cjg: I vote to move it to $repository. So we have $repository->phrase() and $repository->xhtml->phrase()
 +
*** TDB:  Lukewarm. I'm not sure about $xhtml->phrase - the XHTML class doesn't own phrases in the same way the repository does via Lang objects (->phrase is an accessor?).
 +
*** TDB: Ideally I would like $repo->phrase to return XML/XHTML because ->xhtml_phrase is currently used 3x more often than ->phrase and this better represents what's going on anyway (phrases *are* XML).
 +
*** TDB: To make stringification less ambiguous?:
 +
$repo->phrase_as_string( $phraseid ); # $repo->xml->to_string
 +
$repo->phrase_as_text_dump( $phraseid );  # $repo->xhtml->to_text_dump
 +
**** CJG: I kinda hate those method names. Clearly this issue remains open!
 +
 
 +
==== Return value from $dataset->get_field ====
 +
(kinda a test case)
 +
General principle:
 +
 
 +
  Empty [] = it worked, but result set is empty.
 +
  undef = expected error (eg field doesn't exist)
 +
  die = unexpected error
 +
 
 +
Ben Wheeler: The question is, is attempting to access a non-existent field really an
 +
expected error condition? It shouldn't happen in normal circumstances
 +
if the configuration is right, so I'd argue it's a death condition,
 +
although worth trapping in an eval {} in certain circumstances (eg importing).
 +
Rule of thumb: Should the program sensibly carry on regardless if this
 +
function call fails and some idiot fails to check the return value?
 +
If 90% of the time the caller will likely want to "do_foo() or die"
 +
then do_foo() should die for them if it fails, and the other 10% they
 +
can indicate acceptance of risk by having to use "eval {}; if ($@)..."
 +
(Where 'die' obviously means 'abort nicely with helpful browser
 +
error message etc'...)
 +
 
 +
* CJG: Mostly we want to be able to make import/export plugins which don't explode when a field is missing. If someone does foreach my $badger ( @{ $eprint->get_values( "badgers" ) } ) {} that'll still explode if they don't check it's not undef, but it's probably cleaner than using has_field() etc.
 +
** CJG: So I'm happy with the above convention. undef=expected, abort for "unexpected". Agreed?
 +
** TDB: What about warning on get_value( "invalid" )? i.e. the user should do defined(get_field) for anything non-core but we're not going to break plugins if the admin removes fields? The alternative is to write plugins properly ofc :-)
 +
 
 +
[[User:Cjg|Cjg]] 11:22, 30 September 2009 (BST): How about:
 +
 
 +
$f = $dataset->field( "foo" );
 +
$v = $dataobj->value( "bar" );
 +
 
 +
$v is the value of the field OR undef if the field is not defined. undef may also mean the field exists but is a non-multiple field with no value. If you care about the existence of the field you test "defined $f".
 +
 
 +
This will still cause plugins to crash in the situation where they expected a list reference but got undef, but is pretty good. The other option would be to say that $v returns undef rather than [] to force people to test for it.
 +
 
 +
* [[User:Cjg|Cjg]] 12:04, 7 October 2009 (BST): Better still, make it a config option what happens when you call $v = $dataobj->value( "bar" ); and bar does not exist as a field. The option defaults to "abort" and can also be set to "warn" or "ignore". The abort message should say how to fix the code AND name the config option that makes it ignore the problem.
 +
 
 +
=== EPrints ===
 +
$ep = EPrints->new();
 +
@ids = $ep->repository_ids; # list active repository ids
 +
  $repo = $ep->repository( "devel", noise=>1 );
 +
  $repo = $ep->current_repository(); # from Apache::Request URI
 
  EPrints->abort( $message );
 
  EPrints->abort( $message );
  $xml = $repo->xml();
+
 
 +
=== Repository ===
 +
  $xml = $repo->xml;
 
  $dataset = $repo->dataset( "user" );
 
  $dataset = $repo->dataset( "user" );
$repository->log( $message );
 
$config_element = $repository->config( $key, [@subkeys] );
 
 
  $user = $repo->current_user;
 
  $user = $repo->current_user;
 
  $query = $repo->query;
 
  $query = $repo->query;
 +
$current_page_url = $repo->current_url( host => 1, path => 1, query => 1, etc. );
 +
$config_element = $repo->config( $key, [@subkeys] );
 +
$repository->log( $message );
 
  $string = $repo->query->param( "X" );
 
  $string = $repo->query->param( "X" );
 
  $repo->redirect( $url );
 
  $repo->redirect( $url );
$current_page_url = $repo->current_url;
 
MAYBE:
 
 
  $eprint = $repo->eprint( 23 );
 
  $eprint = $repo->eprint( 23 );
 
  $user = $repo->user( 23 );
 
  $user = $repo->user( 23 );
 
  $user = $repo->user_by_username( "cjg" );
 
  $user = $repo->user_by_username( "cjg" );
 
  $user = $repo->user_by_email( 'cjg@ecs.soton.ac.uk' );
 
  $user = $repo->user_by_email( 'cjg@ecs.soton.ac.uk' );
== dataset ==
+
 
 +
* TDB: config is abbr. of "configuration", is there any name conflict on just 'conf' (keep same name as get_conf)?
 +
 
 +
=== Dataset ===
 +
$dataset = $repo->dataset( "eprint" );
 +
$string = $dataset->base_id; # eprint
 +
$string = $dataset->id; # inbox
 +
 +
$repo = $dataobj->repository;
 +
 +
$dataobj = $dataset->create_dataobj( $data );
 
  $user = $dataset->dataobj( 23 );
 
  $user = $dataset->dataobj( 23 );
 +
 
  $search = $dataset->prepare_search( %options );
 
  $search = $dataset->prepare_search( %options );
  $list = $dataset->( @ids );
+
  $list = $dataset->search( %options ); # prepare_search( %options )->execute
== search ==
+
  $list = $dataset->search; # match ALL
$list = $search->execute();
+
   
== XML ==
+
  $metafield = $dataset->field( $fieldname );
$utf8_string = $xml->to_string( $dom_node );
+
  $metafield = $dataset->key_field;
$dom_node = $xml->parse_string( $string );
+
  @metafields = $dataset->fields;  
  $dom_node = $xml->parse_file( $filename );
+
   
  $dom_node = $xml->parse_url( $url );
+
  $dataset->search->map( sub {}, $ctx );
  $dom_node = $xml->clone( $dom_node );
+
  $n = $dataset->search->count;  
  $xhtml_dom_node = $xml->render_ruler;
+
  $ids = $dataset->search->ids;
  $xhtml_dom_node = $xml->render_nbsp;
+
  $list = $dataset->list( \@ids );
$xhtml_dom_node = $xml->render_link( $url, %opts ); #nb will require clever hack if scalar @opts = 1;
 
$xhtml_dom_node = $xml->render_name( $namehash, $familylast ); # I'd like to get away from boolean params -- too confusing!
 
  $xhtml_dom_node = $xml->render_input_field( %opts ); # nb. noenter & hidden are now options.
 
  $xhtml_dom_node = $xml->render_form( $method, $url );
 
$dom_node = $xml->make_element( $element_name, %attributes );
 
$dom_node = $xml->make_text( $utf8string );
 
  $dom_node = $xml->make_comment( $utf8string )
 
$dom_node = $xml->make_doc_fragment;
 
  $page = $xml->build_page( %opts );
 
$xhtml_dom_node = $xml->html_phrase( $phrase_id );
 
  $utf8string = $xml->text_phrase( $phrase_id );
 
Could we shorten this to $xml->ruler $xml->text etc...?
 
=== Not in API? ===
 
is_dom?
 
write_xhtml_file
 
== Page ==
 
$page->send( %options );
 
$page->write_to_file( $filename );
 
== Search ==
 
$search = $dataset->prepare_search( order => "-date", satisfy_all=>1 );
 
# the below call needs replacing with something less sucky.
 
$search->add_field( $ds->get_field( "type" ), qw/ article book /, "EQ", "ANY" );
 
$list = $search->execute;
 
  
Still needing API working out: dataobj,eprint,user,subject,file,document,search,list,page
+
=== list ===
 +
$n = $list->count;
 +
$list->map( sub {}, $ctx );
 +
$dataobj = $list->item( offset );
 +
@dataobjs = $list->slice( offset, length );
 +
# tdb: consistent with creating a list: unbounded returns should use array ref?
 +
\@ids = $list->ids;
  
== Current ==
+
=== XML ===
 +
$doc = $xml->parse_string( $string );
 +
$doc = $xml->parse_file( $filename );
 +
$doc = $xml->parse_url( $url );
 +
 +
$utf8_string = $xml->to_string( $dom_node, %opts );
 +
 +
$dom_node = $xml->clone( $dom_node ); # deep
 +
$dom_node = $xml->clone_node( $dom_node ); # shallow
 
   
 
   
 
+
$dom_node = $xml->contents_of( $dom_node ); # clone and return child nodes
  $new_list = $list->reorder( "-creation_date" ); # makes a new list ordered by reverse order creation_date
+
  $utf8_string = $xml->text_contents_of( $dom_node ); # Return text child nodes as a string
 
   
 
   
  $new_list = $list->union( $list2, "creation_date" ) # makes a new list by adding the contents of $list to $list2. the resulting list is ordered by "creation_date"
+
  $dom_node = $xml->create_element( $name, %attr );
 +
$dom_node = $xml->create_text_node( $value );
 +
$dom_node = $xml->create_comment( $value );
 +
$dom_node = $xml->create_document_fragment;
 
   
 
   
  $new_list = $list->remainder( $list2, "title" ); # makes a new list by removing the contents of $list2 from $list orders the resulting list by title
+
  $bool = $xml->is( $dom_node, "Text" );
 
   
 
   
  $n = $list->count() # returns the number of items in the list
+
  $xml->dispose( $dom_node );
 +
 
 +
=== XHTML ===
 +
$xhtml = $repo->xhtml;
 
   
 
   
  @dataobjs = $list->get_records( 0, 20 ); #get the first 20 DataObjs from the list in an array
+
  $utf8_string = $xhtml->to_xhtml( $dom_node, %opts ); # remove NS prefixes, fix <script> etc.
 
   
 
   
  $list->map( $function, $info ) # performs a function on every item in the list. This is very useful go and look at the detailed description.
+
  $xhtml_dom_node = $xhtml->input_field( $name, $value, %opts ); # nb. type & noenter are now options.
 +
$xhtml_dom_node = $xhtml->hidden_field( $name, $value, %opts ); # tdb: this is used *a lot* and is well defined
 +
$xhtml_dom_node = $xhtml->text_area_field( $name, $value, %opts ); # tdb: value becomes a child text node
 +
$xhtml_dom_node = $xhtml->form( $method, $url );
 
   
 
   
  $plugin_output = $list->export( "BibTeX" ); #calls Plugin::Export::BibTeX on the list.
+
  $xhtml_dom_node = $xhtml->data_element( $name, $value, %opts ); # tdb: render_data_element
 
   
 
   
  $dataset = $list->get_dataset(); #returns the dataset in which the containing objects belong
+
  $page = $xhtml->page( $map, %opts );
 
   
 
   
 +
* Still under discussion
 +
# $repo->xhtml->phrase( $phrase_id )
 +
$xhtml_dom_node = $xhtml->phrase( $phrase_id, %pins );
 +
# is this an XHTML function or move to Repository?
 +
$utf8string = $xhtml->text_phrase( $phrase_id, %pins );
  
 +
=== Page ===
 +
$page->send( %options );
 +
$page->write_to_file( $filename );
  
 +
=== DataObj ===
  
 
+
  $dataobj = $dataset->dataobj( $id );
  my $field = $dataset->get_field( $fieldname );
+
$dataobj->delete;
 +
$dataobj->commit( $force );
 
   
 
   
# you must clone a field to modify any properties
+
  $dataset = $dataobj->dataset;
  $newfield = $field->clone;
+
  $repo = $dataobj->repository;
  $newfield->set_property( $property, $value );
 
 
   
 
   
  $name = $field->get_name;
+
  $id = $dataobj->id;
  $type = $field->get_type;
+
  $dataobj->set_value( $fieldname, $value );
  $value = $field->get_property( $property );
+
  $value = $dataobj->value( $fieldname );
  $boolean = $field->is_type( @typenames );
+
  \@value = $dataobj->value( $fieldname ); # multiple
  $results = $field->call_property( $property, @args );  
+
  $boolean = $dataobj->is_set( $fieldname );
# (results depend on what the property sub returns)
 
 
   
 
   
  $xhtml = $field->render_name( $handle );
+
  $xhtml = $dataobj->render_value( $fieldname );
$xhtml = $field->render_help( $handle );
+
  $xhtml = $dataobj->render_citation( $style, %opts );
$xhtml = $field->render_value( $handle, $value, $show_all_langs, $dont_include_links, $object );
 
  $xhtml = $field->render_single_value( $handle, $value );
 
$xhtml = $field->get_value_label( $handle, $value );
 
 
   
 
   
  $values = $field->get_values( $handle, $dataset, %opts );
+
  $uri = $dataobj->uri;
 +
$url = $dataobj->url;
 
   
 
   
  $sorted_list = $field->sort_values( $handle, $unsorted_list );
+
  $string = $dataobj->export( $plugin_id, %opts );
 +
$dataobj = $dataobj->create_subobject( $fieldname, $epdata );
  
 +
=== MetaField ===
 +
Now has a handle on both its repository/session AND dataset.
  
 
+
  my $field = $dataset->field( $fieldname );
  my $ds = $handle->get_dataset( "inbox" );
+
   
  my $ds = $repository->get_dataset( "inbox" );
+
$dataset = $field->dataset;
 +
$repo = $field->repository;
 
   
 
   
  $confid = $ds->confid; # eprint
+
  $field->set_property( $property, $value );
  $id = $ds->id;         # inbox
+
  $value = $field->property( $property );
 
   
 
   
  $metafield = $ds->get_field( $fieldname );
+
  $name = $field->name;
  $metafield = $ds->get_key_field;
+
  $type = $field->type;
$bool = $ds->has_field( $fieldname );
 
@metafields = $ds->get_fields;
 
 
   
 
   
  $n = $ds->count( $handle );
+
  $xhtml = $field->render_name;
  $ds->map( $handle, $fn, $info );
+
  $xhtml = $field->render_help;
  @ids = $dataset->get_item_ids( $handle );
+
  $xhtml = $field->render_value_label( $value );
 
   
 
   
  $obj = $ds->create_object( $handle, $data );
+
  $values = $field->all_values( %opts );
 +
$sorted_list = $field->sort_values( $unsorted_list );
  
 +
== Deferred from 0.9 ==
  
  
 +
==== tree_to_utf8 ====
 +
$utf8_string = $xml->to_text( $dom_node, %opts ); # nee tree_to_utf8
  
 +
* What do we rename tree_to_utf8 to?
 +
** currently $xml->to_text()
 +
** CJG: I vote to move it to $xhtml
 +
** Also to_text() needs renaming
 +
*** CJG: It might be better to actually defer adding it until we have a good name!
 +
*** CJG: Or we could have $xhtml->to_text() which applies basic markup and $xml->to_text() which doesn't!
 +
**** TDB: $xhtml->to_text_dump( $dom_node, %opts ); # dump == elinks verb
 +
**** TDB: $xhtml->dump( $dom_node, %opts );
  
 +
==== $dataobj->render ====
  
 +
* ($xhtml, $title, $head_elements) = $dataobj->render; is ugly in the way it returns the result
 +
** CJG: I vote to make a new function with a better name & return values
 +
** Can't just change what it returns as it's an established library function
 +
** CJG: Not sure what to name it to. render_page?
 +
*** This would be confusing as it's not related to the xhtml page function.
 +
**** maybe render_page_parts?
 +
*** CJG: Either way should return a hash or a ( body=>...., title=>...., head=>.... )
 +
** TDB: Should this be:
 +
$xhtml = $dataobj->render_head_elements;
 +
$xhtml = $dataobj->render_title;
 +
$xhtml = $dataobj->render_body;
 +
*** CJG: That sounds sane, but the current eprint_render.pl function returns all 3 in one go! We could split it up in 3.2 default config, but make it also work with plain old eprints_render.pl -- actually, it feels like $c->{eprint_render} should become a hashref:
 +
$c->{eprint_render}->{body} $c->{eprint_render}->{title} etc... nice and extendable!
 +
*** Seb: $parts = $dataobj->render_parts? $parts->{head} $parts->{title}  $parts->{body} I'd rather use a hash ref than a hash.... I'm not sure how perl handles returned values, but if done via the stack, it makes more sense to send back a ref (cf in C -> "object *").
 +
**** CJG: The stack isn't significant in a high level function like this, only in lower ones like make_element. So % or \% is style, rather than optimisation.
 +
*** Ben W: Thinking about the way that pages are built though, I'm not sure I want to get all those things at once, and functions that return multiple things usually end up being a pain to remember or extend. How about this?
 +
# Build the <head> section
 +
$dataobj1->render_head;
 +
$dataobj2->render_head;
 +
 +
# Build the rest of it
 +
$dataobj1->render_body;
 +
$dataobj2->render_body;
 +
*** TDB: We currently have a half-replacement of eprint-render with a citation. I would rather have the 3.2 default to be XML that calls eprint_render. We can then add XML templates for "title" and "head" which also call eprint_render. (And just take the hit on the occasional triple call). The goal should be to remove Perl from the eprint page rendering - in future we'll have plugins that will poke bits into that page.
  
 +
== Stage 1.0 API ==
  
 +
This is still required to make the basic API complete, but we'll worry about it once we've got the stuff above finalisedish.
  
 
+
=== search ===
Inherits all methods from EPrints::DataObj.
+
TERM_EQUALS = "EQ"
 +
TERM_INDEX = "IN"
 +
TERM_EXACT = "EX"
 +
 +
MATCH_ALL = "ALL"
 +
MATCH_ANY = "ANY"
 +
 +
$search = $dataset->prepare_search( order => "-date", satisfy_all=>1 );
 +
$search = $dataset->prepare_search( terms => [], filters => [], ... );
 
   
 
   
  # create a new document on $eprint
+
  $list = $search->execute;
my $doc_data = {
 
  _parent => $eprint,
 
  eprintid => $eprint->get_id,
 
};
 
my $doc_ds = $handle->get_dataset( 'document' );
 
my $document = $doc_ds->create_object( $handle, $doc_data );
 
 
   
 
   
  # Add files to the document 
+
  # constructor interface
  $success = $doc->add_file( $file, $filename, [$preserve_path] );
+
  $s = $dataset->prepare_search( order => "-date", satisfy_all => 1, terms => [
$success = $doc->upload( $filehandle, $filename [, $preserve_path [, $filesize ] ] );
+
  { fields => "type", values => [qw( article book )], op => TERM_EQUALS, match => MATCH_ANY },
  $success = $doc->upload_archive( $filehandle, $filename, $archive_format );
+
  { fields => [qw( title abstract )], values => ["dogbert dilbert"], op => TERM_INDEX, match => MATCH_ALL },
  $success = $doc->add_archive( $file, $archive_format );
+
  { fields => "userid", values => "52", op => TERM_EXACT },
  $success = $doc->add_directory( $directory );
+
] );
  $success = $doc->upload_url( $url );
+
* tdb: meta_fields/fields/field, value/values?
 +
** cjg: could have choice of fields=>["",""] or field=>""
 +
 
 +
# method interface
 +
  $s = $dataset->prepare_search( order => "-date", satisfy_all => 1 );
 +
$s->add_term( "type", [qw( article book )], op => TERM_EQUALS, match => MATCH_ANY );
 +
  $s->add_term( [qw( title abstract )], ["dogbert dilbert"], op => TERM_INDEX, match => MATCH_ALL );
 +
  $s->add_term( "userid", "52", op => TERM_EXACT );
 +
  $list = $s->execute;
 
   
 
   
  # get an existing document
+
  print $list->count, "\n";
  $document = $handle->get_document( $doc_id );
+
 
  # or
+
* tdb: I would like to make the value argument not automagically-split on ANY but explicitly use an array ref of values to match.
  foreach my $doc ( $eprint->get_all_documents ) { ... }
+
** cjg: Agreed.
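
A rough sketch of how the two search interfaces above could relate, with the constructor's terms list treated as sugar for repeated add_term calls and values always normalised to an array ref (no automagic splitting). The Sketch::Search package, the terms accessor and the default op/match values are assumptions made for illustration; none of this is the real EPrints code.

```perl
use strict;
use warnings;

package Sketch::Search;    # hypothetical stand-in, not EPrints::Search

use constant {
    TERM_EQUALS => "EQ",
    TERM_INDEX  => "IN",
    TERM_EXACT  => "EX",
    MATCH_ALL   => "ALL",
    MATCH_ANY   => "ANY",
};

sub new {
    my ( $class, %opts ) = @_;
    my $terms = delete $opts{terms} || [];
    my $self = bless { %opts, terms => [] }, $class;

    # Constructor interface: each term hash becomes one add_term call.
    for my $term ( @$terms ) {
        my %t      = %$term;
        my $fields = delete $t{fields};
        my $values = delete $t{values};
        $self->add_term( $fields, $values, %t );
    }
    return $self;
}

sub add_term {
    my ( $self, $fields, $values, %opts ) = @_;
    push @{ $self->{terms} }, {
        # A single name or value is wrapped, never split on whitespace.
        fields => ref( $fields ) ? $fields : [$fields],
        values => ref( $values ) ? $values : [$values],
        op     => $opts{op}    || TERM_EQUALS,    # assumed default
        match  => $opts{match} || MATCH_ALL,      # assumed default
    };
    return $self;
}

sub terms { return $_[0]{terms} }

package main;

# Render the stored terms as one string so the two interfaces can be compared.
sub describe {
    my ( $s ) = @_;
    return join "; ",
        map { "@{ $_->{fields} } $_->{op} @{ $_->{values} }" } @{ $s->terms };
}

my $s1 = Sketch::Search->new( order => "-date", satisfy_all => 1, terms => [
    { fields => "type", values => [qw( article book )],
      op => Sketch::Search::TERM_EQUALS(), match => Sketch::Search::MATCH_ANY() },
    { fields => "userid", values => "52", op => Sketch::Search::TERM_EXACT() },
] );

my $s2 = Sketch::Search->new( order => "-date", satisfy_all => 1 );
$s2->add_term( "type", [qw( article book )],
    op => Sketch::Search::TERM_EQUALS(), match => Sketch::Search::MATCH_ANY() );
$s2->add_term( "userid", "52", op => Sketch::Search::TERM_EXACT() );

print describe( $s1 ), "\n";
print describe( $s2 ), "\n";
```

Both interfaces end up with the same stored terms, so the choice between them is purely about how the calling code reads.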
 +
 
 +
==== Maybe ====
 +
$s->add_term( $dataset->field( "type" ), [qw/ article book/], merge => MERGE_ANY );
 +
* cjg: I'm not convinced we need this.
 +
 
 +
=== SubObject ===
 +
* tdb: superclass of Document, File, History, SavedSearch
 +
  $dataobj = $dataobj->parent;
 +
 
 +
=== EPrint ===
 +
 
 +
$eprint->set_status( "inbox" );
 +
 
 +
* tdb: consistency - ->documents ~ ->value( "documents" ) and ->value() returns array ref for multiple
 +
  my $docs = $eprint->documents( %opts ); # isVolatileVersionOf => 0 ?
 +
  foreach my $document ( @$docs ) { .. }
 
   
 
   
 +
# create a new document on $eprint
 +
$doc_data = { ... };
 +
$doc = $eprint->create_subobject( "documents", $doc_data );
 
  # eprint to which this document belongs
 
  # eprint to which this document belongs
  $eprint = $doc->get_eprint;
+
  $eprint = $doc->parent;
 +
 
 +
=== User ===
 +
$user = $repository->dataset( "user" )->dataobj( 23 );
 +
* tdb: is this accessor or action? c.f. send_page()
 +
** It's an action. [[User:Cjg|Cjg]] 20:04, 17 September 2009 (BST)
 +
$user->send_email( .... )
 +
 
 +
=== Subject ===
 +
 
 +
$subject = $repository->dataset("subject")->dataobj( "FOO" );
 +
\@subjects = $subject->children;
 +
\@subjects = $subject->parents;
 +
* CJG: issue with confusion with subobjects like document, or are these really subobjects? Maybe child_subjects parent_subjects to avoid overlapping names?
 +
* tdb: subobject relation = if you remove parent, children are removed - is this true of subjects? Could extend subobject concept to be M:N
 +
** $subject->value( "children" ) = ->children
 +
** $subject->value( "parents" ) = ->parents
 +
 
 +
=== Document ===
 +
$doc = $repo->dataset( "document" )->dataobj( 23 );
 
   
 
   
 
  # delete a document object *forever*:
 
  # delete a document object *forever*:
  $success = $doc->remove;
+
  $ok = $doc->delete;
 +
 +
$url = $doc->url( [$file] );
 +
* tdb: is $file ^^ a file object or filename?
 +
** [[User:Cjg|Cjg]] 20:05, 17 September 2009 (BST): Duh, we should just use $doc->file_by_filename( $file )->get_url;
 +
 
 +
# change the file which is used as the URL for the document.
 +
$doc->set_value( "main", "foo.html" );
 
   
 
   
  $url = $doc->get_url( [$file] );
+
# dealing with files
  $path = $doc->local_path;
+
  $file = $doc->file_by_filename( $filename ); # nee get_stored_file
  %files = $doc->files;
+
  \@files = $doc->value( "files" );
 +
$file->value( "filename" );
 +
  $file->value( "filesize" );
 
   
 
   
 
  # delete a file
 
  # delete a file
  $success = $doc->remove_file( $filename );
+
  $doc->file_by_filename( $filename )->delete;
 +
 
  # delete all files
 
  # delete all files
  $success = $doc->remove_all_files;
+
  $files = $doc->value( "files" );
   
+
  foreach my $file (@$files) { $file->delete; }
# change the file which is used as the URL for the document.
 
$doc->set_main( $main_file );
 
 
   
 
   
  # icons and previews
+
  # icons and previews???? These need work!
 
  $xhtml = $doc->render_icon_link( %opts );
 
  $xhtml = $doc->render_icon_link( %opts );
 
  $xhtml = $doc->render_preview_link( %opts );
 
  $xhtml = $doc->render_preview_link( %opts );
 +
 +
==== Maybe ====
 +
# Add files to the document
 +
# tdb: these should be deprecated from Document and moved to Import/Screen plugins
 +
$success = $doc->upload( $filehandle, $filename [, $preserve_path [, $filesize ] ] );
 +
$success = $doc->upload_archive( $filehandle, $filename, $archive_format );
 +
$success = $doc->add_archive( $file, $archive_format );
 +
$success = $doc->add_directory( $directory );
 +
$success = $doc->upload_url( $url );
 +
 +
=== File ===
 +
$file = $history->file_by_filename( "dataobj.xml" );
 +
$file->retrieve( sub { ... }, $info ); # callback-based file content retrieval
 +
 +
* I don't know enough about this method to know what we *must* have in. Less is better IMO, though. [[User:Cjg|Cjg]] 01:23, 17 September 2009 (BST)
 +
* tdb: files should probably be only created via a parent object

Latest revision as of 22:04, 16 December 2009

I'm going to use this page to get my thoughts in order. Cjg 16:58, 2 September 2009 (BST)

Current 3.1 System

Unsessioned Classes

These classes don't store a session internally resulting in methods like $foo->render( $session, ARGS ).

  • Repository
  • DataSet
  • MetaField
  • Language

Sessioned Classes

These classes store a session internally resulting in methods like $foo->render( ARGS ).

  • Session
  • DataObj
  • Plugin
  • List
  • Search
  • Database
  • Workflow
  • ScreenProcessor

API

New Plan(!)

  • Merge Session and Repository into a single class.
  • Move XML functions into their own class.
  • Move Page functions into their own class.
  • Add a link to repository for dataset and metafield.
  • Ensure cleanup when repository object goes out of scope.
  • Make EPrints->new() return an eprints object which can pass out repository objects.
    • repository objects don't have a link to the EPrints object EVER.
    • When the eprints object is DESTROY'd it takes out the repositories, datasets and metafields etc.
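
The ownership rule in the plan could look something like the sketch below: the top-level object holds the repositories, the repositories hold nothing that points back, and DESTROY on the owner tears them down. The Sketch::* packages and the cleanup/is_open methods are invented for illustration and are not existing EPrints code.

```perl
use strict;
use warnings;

package Sketch::Repository;    # hypothetical stand-in

sub new {
    my ( $class, $id ) = @_;
    # Deliberately no reference back to the owning EPrints object.
    return bless { id => $id, open => 1 }, $class;
}

sub id      { return $_[0]{id} }
sub is_open { return $_[0]{open} }
sub cleanup { $_[0]{open} = 0; return }

package Sketch::EPrints;       # hypothetical stand-in

sub new {
    my ( $class ) = @_;
    return bless { repositories => {} }, $class;
}

# Hand out (and remember) one repository object per id.
sub repository {
    my ( $self, $id ) = @_;
    return $self->{repositories}{$id} ||= Sketch::Repository->new( $id );
}

# When the owner goes away it takes its repositories with it.
sub DESTROY {
    my ( $self ) = @_;
    $_->cleanup for values %{ $self->{repositories} };
    return;
}

package main;

my $ep   = Sketch::EPrints->new;
my $repo = $ep->repository( "devel" );

undef $ep;    # last reference dropped, so DESTROY runs here

print $repo->is_open ? "open\n" : "closed\n";    # prints "closed"
```

Because the repository never points back at its owner there is no reference cycle, so the owner's DESTROY fires as soon as the caller lets go of it.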

Stage 0.9 API

We aim to finish these modules in the next few days. The second list below requires some more thought and is less central.

Rules of the Road

API-specific stuff - it is assumed general EPrints/Perl style will be followed.

Parameters

  • Parameters should be a named list, except where the list is very short (fewer than 3) and very unlikely to be expanded:
$person->slap( fish => 'haddock', weight => '3lb', side => 'right' );
  • Is it acceptable to mix required values as positional params and optional values as named args?
$person->set_name( "Smith", "John", honourific => "Dr." );
  • Context is at end:
$list->map( CALLBACK, CONTEXT );
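
For what it's worth, the "required positional, optional named" mix is easy to implement in Perl, since the trailing arguments can simply be slurped into a hash. A minimal sketch, using a made-up Sketch::Person class:

```perl
use strict;
use warnings;

package Sketch::Person;    # hypothetical class for illustration

sub new {
    my ( $class ) = @_;
    return bless {}, $class;
}

# Two required positional parameters, then optional named arguments.
sub set_name {
    my ( $self, $family, $given, %opts ) = @_;
    $self->{name} = { family => $family, given => $given, %opts };
    return;
}

sub name { return $_[0]{name} }

package main;

my $person = Sketch::Person->new;
$person->set_name( "Smith", "John", honourific => "Dr." );

my $name = $person->name;
print join( " ", grep { defined } @{$name}{qw( honourific given family )} ), "\n";
# prints "Dr. John Smith"
```

The cost of this style is that the required parameters can never become optional later without breaking callers, which may be one argument for the all-named form.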

Results

  • Return a single value type (either as a single or list of values)
  • Bounded or static lists may be returned as a list:
@bits = $lump->chisel( 10, 5 );
@limbs = $person->parts;
  • All other lists should be returned as a reference to an array
$matches = $google->results;
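
A small sketch of the two return styles, assuming made-up classes: a fixed, small set comes back as a plain list, while a result set of unknown size comes back as an array reference so it is not copied onto the stack.

```perl
use strict;
use warnings;

package Sketch::Person;          # hypothetical

sub new {
    my ( $class ) = @_;
    return bless { limbs => [qw( left_arm right_arm left_leg right_leg )] }, $class;
}

# Bounded, static list: returned as a list.
sub parts { return @{ $_[0]{limbs} } }

package Sketch::SearchEngine;    # hypothetical

sub new {
    my ( $class, @items ) = @_;
    return bless { items => [@items] }, $class;
}

# Unbounded list: returned as a reference to an array.
sub results { return [ @{ $_[0]{items} } ] }

package main;

my @limbs   = Sketch::Person->new->parts;
my $matches = Sketch::SearchEngine->new( 1 .. 1000 )->results;

printf "%d limbs, %d matches\n", scalar @limbs, scalar @$matches;
# prints "4 limbs, 1000 matches"
```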

Nomenclature

  • Accessors are just the name of the attribute:
$head = $person->head;
  • Setters are the name of the attribute prefixed with "set_":
$person->set_head( $head );

Exceptions

Exceptions should be exceptional and only used when continued processing is impossible or presents a risk of data corruption. The normal response to a request for a non-existent entity should be to return undef (and this should be clearly documented).

Remaining Issues

values/value

$dataobj->value( "x" ); 

returns the value of field X.

$field->values()

returns all possible values of the field, this is a confusing similarity.

Options:

  1. Keep as is
  2. rename $field->values()
  3. remove $field->values() from the 1.0 API.

I vote for the removal. Cjg 11:57, 1 October 2009 (BST)

conf/config

Should the config getting method be

$repository->config(...)

or

$repository->conf(...)

I don't care: Cjg 11:57, 1 October 2009 (BST)

text_phrase

  • Should $xhtml->text_phrase move out of XHTML
    • cjg: I vote to move it to $repository. So we have $repository->phrase() and $repository->xhtml->phrase()
      • TDB: Lukewarm. I'm not sure about $xhtml->phrase - the XHTML class doesn't own phrases in the same way the repository does via Lang objects (->phrase is an accessor?).
      • TDB: Ideally I would like $repo->phrase to return XML/XHTML because ->xhtml_phrase is currently used 3x more often than ->phrase and this better represents what's going on anyway (phrases *are* XML).
      • TDB: To make stringification less ambiguous?:
$repo->phrase_as_string( $phraseid ); # $repo->xml->to_string
$repo->phrase_as_text_dump( $phraseid );  # $repo->xhtml->to_text_dump
        • CJG: I kinda hate those method names. Clearly this issue remains open!

Return value from $dataset->get_field

(kinda a test case) General principle:

 Empty [] = it worked, but result set is empty.
 undef = expected error (eg field doesn't exist)
 die = unexpected error
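
To make the three outcomes concrete, here is a sketch of an accessor that follows the convention. Sketch::DataObj and its values_of method are hypothetical names chosen for the example, not part of EPrints.

```perl
use strict;
use warnings;

package Sketch::DataObj;    # hypothetical stand-in

sub new {
    my ( $class, %args ) = @_;
    return bless { fields => $args{fields}, data => $args{data} }, $class;
}

# Accessor for a multiple field, following the [] / undef / die convention.
sub values_of {
    my ( $self, $fieldname ) = @_;
    die "fieldname is required\n" if !defined $fieldname;    # unexpected error
    return undef if !$self->{fields}{$fieldname};            # expected error: no such field
    return $self->{data}{$fieldname} || [];                  # worked, possibly empty
}

package main;

my $eprint = Sketch::DataObj->new(
    fields => { creators => 1, editors => 1 },
    data   => { creators => [ "Smith", "Jones" ] },
);

for my $fieldname (qw( creators editors badgers )) {
    my $values = $eprint->values_of( $fieldname );
    if    ( !defined $values ) { print "$fieldname: no such field\n" }
    elsif ( !@$values )        { print "$fieldname: empty\n" }
    else                       { print "$fieldname: @$values\n" }
}

# The unexpected case has to be trapped explicitly.
my $ok = eval { $eprint->values_of(); 1 };
print "died: $@" if !$ok;
```

Note that a caller who dereferences the result without checking it will still crash on the "badgers" case under strict refs, which is the weakness discussed below.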

Ben Wheeler: The question is, is attempting to access a non-existent field really an expected error condition? It shouldn't happen in normal circumstances if the configuration is right, so I'd argue it's a death condition, although worth trapping in an eval {} in certain circumstances (eg importing). Rule of thumb: Should the program sensibly carry on regardless if this function call fails and some idiot fails to check the return value? If 90% of the time the caller will likely want to "do_foo() or die" then do_foo() should die for them if it fails, and the other 10% they can indicate acceptance of risk by having to use "eval {}; if ($@)..." (Where 'die' obviously means 'abort nicely with helpful browser error message etc'...)

  • CJG: Mostly we want to be able to make import/export plugins which don't explode when a field is missing. If someone does foreach my $badger ( @{ $eprint->get_values( "badgers" ) } ) {} that'll still explode if they don't check it's not undef, but it's probably cleaner than using has_field() etc.
    • CJG: So I'm happy with the above convention. undef=expected, abort for "unexpected". Agreed?
    • TDB: What about warning on get_value( "invalid" )? i.e. the user should do defined(get_field) for anything non-core but we're not going to break plugins if the admin removes fields? The alternative is to write plugins properly ofc :-)

Cjg 11:22, 30 September 2009 (BST): How about:

$f = $dataset->field( "foo" );
$v = $dataobj->value( "bar" );

$v is the value of the field OR undef if the field is not defined. undef may also mean the field exists but is a non-multiple field with no value. If you care about the existence of the field you test "defined $f".

This will still cause plugins to crash in the situation where they expected a list reference but got undef, but is pretty good. The other option would be for value() to return undef rather than [], to force people to test for it.

  • Cjg 12:04, 7 October 2009 (BST): Better still, make it a config option controlling what happens when you call $v = $dataobj->value( "bar" ); and bar does not exist as a field. It defaults to "abort" and can also be set to "warn" or "ignore". The abort should give info about how to fix the code AND say which config option makes it ignore the problem.
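
A minimal sketch of the config-option idea above, not a proposed implementation. The helper name lookup_value and the option name missing_field_policy are hypothetical, and plain hashes stand in for the dataset and the data object. It follows the convention discussed here: [] for an empty multiple field, undef for an unset single field, and abort/warn/ignore when the field does not exist.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $dataobj->value( $fieldname ).
# $fields maps field name => { multiple => 0|1 }, $data holds the stored values,
# $policy is "abort" (default), "warn" or "ignore".
sub lookup_value
{
    my( $fields, $data, $fieldname, $policy ) = @_;
    $policy = "abort" if !defined $policy;

    if( !exists $fields->{$fieldname} )
    {
        # missing_field_policy is an invented option name, for illustration only
        my $msg = "Field '$fieldname' does not exist. Test that the field is defined first, "
            . "or set missing_field_policy to 'warn' or 'ignore'.\n";
        die $msg if $policy eq "abort";
        warn $msg if $policy eq "warn";
        return undef;
    }

    my $value = $data->{$fieldname};
    # a multiple field with no values gives [], never undef
    return [] if $fields->{$fieldname}->{multiple} && !defined $value;
    return $value;
}

my $fields = { title => { multiple => 0 }, creators => { multiple => 1 } };
my $data   = { title => "Badgers" };

my $creators = lookup_value( $fields, $data, "creators" );            # multiple, unset
my $missing  = lookup_value( $fields, $data, "badgers", "ignore" );   # no such field
my $survived = eval { lookup_value( $fields, $data, "badgers" ); 1 }; # default policy aborts
print "aborted: $@" if !$survived;
```

With "ignore", an import/export plugin keeps running when an admin has removed a field, while the default still catches typos in field names during development.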

EPrints

$ep = EPrints->new();
@ids = $ep->repository_ids; # list active repository ids
$repo = $ep->repository( "devel", noise=>1 );
$repo = $ep->current_repository(); # from Apache::Request URI
EPrints->abort( $message );

Repository

$xml = $repo->xml;
$dataset = $repo->dataset( "user" );
$user = $repo->current_user;
$query = $repo->query;
$current_page_url = $repo->current_url( host => 1, path => 1, query => 1, etc. );
$config_element = $repo->config( $key, [@subkeys] );
$repo->log( $message );
$string = $repo->query->param( "X" );
$repo->redirect( $url );
$eprint = $repo->eprint( 23 );
$user = $repo->user( 23 );
$user = $repo->user_by_username( "cjg" );
$user = $repo->user_by_email( 'cjg@ecs.soton.ac.uk' );
  • TDB: config is abbr. of "configuration", is there any name conflict on just 'conf' (keep same name as get_conf)?

Dataset

$dataset = $repo->dataset( "eprint" );
$string = $dataset->base_id; # eprint
$string = $dataset->id; # inbox

$repo = $dataobj->repository;

$dataobj = $dataset->create_dataobj( $data );
$user = $dataset->dataobj( 23 );

$search = $dataset->prepare_search( %options );
$list = $dataset->search( %options ); # prepare_search( %options )->execute
$list = $dataset->search; # match ALL

$metafield = $dataset->field( $fieldname );
$metafield = $dataset->key_field;
@metafields = $dataset->fields; 

$dataset->search->map( sub {}, $ctx );
$n = $dataset->search->count; 
$ids = $dataset->search->ids;
$list = $dataset->list( \@ids );

list

$n = $list->count;
$list->map( sub {}, $ctx );
$dataobj = $list->item( offset );
@dataobjs = $list->slice( offset, length ); 
# tdb: consistent with creating a list: unbounded returns should use array ref?
\@ids = $list->ids;

XML

$doc = $xml->parse_string( $string );
$doc = $xml->parse_file( $filename );
$doc = $xml->parse_url( $url );

$utf8_string = $xml->to_string( $dom_node, %opts );

$dom_node = $xml->clone( $dom_node ); # deep
$dom_node = $xml->clone_node( $dom_node ); # shallow

$dom_node = $xml->contents_of( $dom_node ); # clone and return child nodes
$utf8_string = $xml->text_contents_of( $dom_node ); # Return text child nodes as a string

$dom_node = $xml->create_element( $name, %attr );
$dom_node = $xml->create_text_node( $value );
$dom_node = $xml->create_comment( $value );
$dom_node = $xml->create_document_fragment;

$bool = $xml->is( $dom_node, "Text" );

$xml->dispose( $dom_node );

XHTML

$xhtml = $repo->xhtml;

$utf8_string = $xhtml->to_xhtml( $dom_node, %opts ); # remove NS prefixes, fix <script> etc.

$xhtml_dom_node = $xhtml->input_field( $name, $value, %opts ); # nb. type & noenter are now options.
$xhtml_dom_node = $xhtml->hidden_field( $name, $value, %opts ); # tdb: this is used *a lot* and is well defined
$xhtml_dom_node = $xhtml->text_area_field( $name, $value, %opts ); # tdb: value becomes a child text node 
$xhtml_dom_node = $xhtml->form( $method, $url );

$xhtml_dom_node = $xhtml->data_element( $name, $value, %opts ); # tdb: render_data_element

$page = $xhtml->page( $map, %opts );

  • Still under discussion
# $repo->xhtml->phrase( $phrase_id )
$xhtml_dom_node = $xhtml->phrase( $phrase_id, %pins );
# is this an XHTML function or move to Repository?
$utf8string = $xhtml->text_phrase( $phrase_id, %pins );

Page

$page->send( %options ); 
$page->write_to_file( $filename );

DataObj

$dataobj = $dataset->dataobj( $id );
$dataobj->delete;
$dataobj->commit( $force );

$dataset = $dataobj->dataset;
$repo = $dataobj->repository;

$id = $dataobj->id;
$dataobj->set_value( $fieldname, $value );
$value = $dataobj->value( $fieldname );
\@value = $dataobj->value( $fieldname ); # multiple
$boolean = $dataobj->is_set( $fieldname );

$xhtml = $dataobj->render_value( $fieldname );
$xhtml = $dataobj->render_citation( $style, %opts );

$uri = $dataobj->uri;
$url = $dataobj->url;

$string = $dataobj->export( $plugin_id, %opts );
$dataobj = $dataobj->create_subobject( $fieldname, $epdata );

MetaField

Now has a handle on both its repository/session AND dataset.

my $field = $dataset->field( $fieldname );

$dataset = $field->dataset;
$repo = $field->repository;

$field->set_property( $property, $value );
$value = $field->property( $property );

$name = $field->name;
$type = $field->type;

$xhtml = $field->render_name;
$xhtml = $field->render_help;
$xhtml = $field->render_value_label( $value );

$values = $field->all_values( %opts );
$sorted_list = $field->sort_values( $unsorted_list );

Deferred from 0.9

tree_to_utf8

$utf8_string = $xml->to_text( $dom_node, %opts ); # nee tree_to_utf8
  • What do we rename tree_to_utf8 to?
    • currently $xml->to_text()
    • CJG: I vote to move it to $xhtml
    • Also to_text() needs renaming
      • CJG: It might be better to actually defer adding it until we have a good name!
      • CJG: Or we could have $xhtml->to_text() which applies basic markup and $xml->to_text() which doesn't!
        • TDB: $xhtml->to_text_dump( $dom_node, %opts ); # dump == elinks verb
        • TDB: $xhtml->dump( $dom_node, %opts );

$dataobj->render

  • ($xhtml, $title, $head_elements) = $dataobj->render; is ugly in the way it returns the result
    • CJG: I vote to make a new function with a better name & return values
    • Can't just change what it returns as it's an established library function
    • CJG: Not sure what to name it to. render_page?
      • This would be confusing as it's not related to the xhtml page function.
        • maybe render_page_parts?
      • CJG: Either way should return a hash or a ( body=>...., title=>...., head=>.... )
    • TDB: Should this be:

$xhtml = $dataobj->render_head_elements; $xhtml = $dataobj->render_title; $xhtml = $dataobj->render_body;

      • CJG: That sounds sane, but the current eprint_render.pl function returns all 3 in one go! We could split it up in the 3.2 default config, but make it also work with plain old eprint_render.pl -- actually, it feels like $c->{eprint_render} should become a hashref:
$c->{eprint_render}->{body} $c->{eprint_render}->{title} etc... nice and extendable!
      • Seb: $parts = $dataobj->render_parts? $parts->{head} $parts->{title} $parts->{body} I'd rather use a hash ref than a hash.... I'm not sure how perl handles returned values, but if done via the stack, it makes more sense to send back a ref (cf in C -> "object *").
        • CJG: The stack isn't significant in a high level function like this, only in lower ones like make_element. So % or \% is style, rather than optimisation.
      • Ben W: Thinking about the way that pages are built though, I'm not sure I want to get all those things at once, and functions that return multiple things usually end up being a pain to remember or extend. How about this?
# Build the <head> section
$dataobj1->render_head;
$dataobj2->render_head;

# Build the rest of it
$dataobj1->render_body;
$dataobj2->render_body;
      • TDB: We currently have a half-replacement of eprint-render with a citation. I would rather have the 3.2 default to be XML that calls eprint_render. We can then add XML templates for "title" and "head" which also call eprint_render. (And just take the hit on the occasional triple call). The goal should be to remove Perl from the eprint page rendering - in future we'll have plugins that will poke bits into that page.
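
A rough sketch of how the hash-ref and three-accessor suggestions above could coexist, assuming a legacy render callback that still returns (body, title, head) in one go. The package Sketch::DataObj, the method names and the caching are all hypothetical, and plain strings stand in for DOM nodes. Caching the parts is one way to avoid the triple call, at the cost of stale output if the object changes after the first render.

```perl
use strict;
use warnings;

package Sketch::DataObj;

sub new
{
    my( $class, %data ) = @_;
    return bless { data => {%data}, calls => 0 }, $class;
}

# Stand-in for the existing render callback, which returns all three parts at once.
sub _legacy_render
{
    my( $self ) = @_;
    $self->{calls}++;
    my $title = $self->{data}->{title};
    return( "<p>$title</p>", $title, "<meta name='title' content='$title' />" );
}

# Return the parts as a hash ref; the legacy callback runs at most once per object.
sub render_parts
{
    my( $self ) = @_;
    if( !defined $self->{parts} )
    {
        my( $body, $title, $head ) = $self->_legacy_render;
        $self->{parts} = { body => $body, title => $title, head => $head };
    }
    return $self->{parts};
}

# Separate accessors, all reading from the same cached parts.
sub render_head  { return $_[0]->render_parts->{head}; }
sub render_title { return $_[0]->render_parts->{title}; }
sub render_body  { return $_[0]->render_parts->{body}; }

package main;

my $obj = Sketch::DataObj->new( title => "Badgers" );
print $obj->render_title, "\n";
print $obj->render_body, "\n";
```

A caller building a page from several objects can then ask each one for render_head first and render_body later without caring how the parts were produced.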

Stage 1.0 API

This is still required to make the basic API complete, but we'll worry about it once we've got the stuff above more or less finalised.

search

TERM_EQUALS = "EQ"
TERM_INDEX = "IN"
TERM_EXACT = "EX"

MATCH_ALL = "ALL"
MATCH_ANY = "ANY"

$search = $dataset->prepare_search( order => "-date", satisfy_all=>1 );
$search = $dataset->prepare_search( terms => [], filters => [], ... );

$list = $search->execute;

# constructor interface
$s = $dataset->prepare_search( order => "-date", satisfy_all => 1, terms => [
 { fields => "type", values => [qw( article book )], op => TERM_EQUALS, match => MATCH_ANY },
 { fields => [qw( title abstract )], values => ["dogbert dilbert"], op => TERM_INDEX, match => MATCH_ALL },
 { fields => "userid", values => "52", op => TERM_EXACT },
] );
  • tdb: meta_fields/fields/field, value/values?
    • cjg: could have choice of fields=>["",""] or field=>""
# method interface
$s = $dataset->prepare_search( order => "-date", satisfy_all => 1 );
$s->add_term( "type", [qw( article book )], op => TERM_EQUALS, match => MATCH_ANY );
$s->add_term( [qw( title abstract )], ["dogbert dilbert"], op => TERM_INDEX, match => MATCH_ALL );
$s->add_term( "userid", "52", op => TERM_EXACT );
$list = $s->execute;

print $list->count, "\n";
  • tdb: I would like to make the value argument not automagically-split on ANY but explicitly use an array ref of values to match.
    • cjg: Agreed.
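
A small sketch of the two points agreed above: accepting either field or fields (and value or values), and never splitting a value on whitespace. The helper normalise_term is hypothetical, and the defaults chosen for op and match are assumptions rather than anything decided on this page.

```perl
use strict;
use warnings;

use constant {
    TERM_EQUALS => "EQ",
    TERM_INDEX  => "IN",
    TERM_EXACT  => "EX",
    MATCH_ALL   => "ALL",
    MATCH_ANY   => "ANY",
};

# Hypothetical helper: coerce a term spec into
# { fields => [...], values => [...], op => ..., match => ... }.
sub normalise_term
{
    my( %term ) = @_;

    my $fields = exists $term{fields} ? $term{fields} : $term{field};
    my $values = exists $term{values} ? $term{values} : $term{value};
    die "term needs field(s) and value(s)\n" if !defined $fields || !defined $values;

    return {
        fields => ref( $fields ) eq "ARRAY" ? [@$fields] : [$fields],
        # values are never split on whitespace; pass an array ref to match several
        values => ref( $values ) eq "ARRAY" ? [@$values] : [$values],
        op     => defined $term{op} ? $term{op} : TERM_EQUALS,      # assumed default
        match  => defined $term{match} ? $term{match} : MATCH_ALL,  # assumed default
    };
}

my $t = normalise_term( field => "type", values => [qw( article book )], match => MATCH_ANY );
print join( ",", @{ $t->{values} } ), "\n";

# "dogbert dilbert" stays a single value rather than being split into two
my $u = normalise_term( fields => [qw( title abstract )], value => "dogbert dilbert", op => TERM_INDEX );
print scalar( @{ $u->{values} } ), "\n";
```

Both the constructor interface and add_term could funnel through something like this, so the two styles accept the same spellings.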

Maybe

$s->add_term( $dataset->field( "type" ), [qw/ article book/], merge => MERGE_ANY );
  • cjg: I'm not convinced we need this.

SubObject

  • tdb: superclass of Document, File, History, SavedSearch
$dataobj = $dataobj->parent;

EPrint

$eprint->set_status( "inbox" );
  • tdb: consistency - ->documents ~ ->value( "documents" ) and ->value() returns array ref for multiple
my $docs = $eprint->documents( %opts ); # isVolatileVersionOf => 0 ?
foreach my $document ( @$docs ) { .. }

# create a new document on $eprint 
$doc_data = { ... };
$doc = $eprint->create_subobject( "documents", $doc_data ); 
# eprint to which this document belongs
$eprint = $doc->parent;

User

$user = $repository->dataset( "user" )->dataobj( 23 ); 
  • tdb: is this accessor or action? c.f. send_page()
    • It's an action. Cjg 20:04, 17 September 2009 (BST)
$user->send_email( .... )

Subject

$subject = $repository->dataset("subject")->dataobj( "FOO" );
\@subjects = $subject->children;
\@subjects = $subject->parents; 
  • CJG: issue with confusion with subobjects like document, or are these really subobjects? Maybe child_subjects parent_subjects to avoid overlapping names?
  • tdb: subobject relation = if you remove parent, children are removed - is this true of subjects? Could extend subobject concept to be M:N
    • $subject->value( "children" ) = ->children
    • $subject->value( "parents" ) = ->parents

Document

$doc = $repo->dataset( "document" )->dataobj( 23 );

# delete a document object *forever*:
$ok = $doc->delete;

$url = $doc->url( [$file] );
  • tdb: is $file ^^ a file object or filename?
    • Cjg 20:05, 17 September 2009 (BST): Duh, we should just use $doc->file_by_filename( $file )->get_url;
# change the file which is used as the URL for the document.
$doc->set_value( "main", "foo.html" );

# dealing with files
$file = $doc->file_by_filename( $filename ); # nee get_stored_file
\@files = $doc->value( "files" );
$file->value( "filename" );
$file->value( "filesize" );

# delete a file
$doc->file_by_filename( $filename )->delete;

# delete all files
$files = $doc->value( "files" );
foreach my $file (@$files) { $file->delete; } 

# icons and previews???? These need work!
$xhtml = $doc->render_icon_link( %opts );
$xhtml = $doc->render_preview_link( %opts );

Maybe

# Add files to the document
# tdb: these should be deprecated from Document and moved to Import/Screen plugins
$success = $doc->upload( $filehandle, $filename [, $preserve_path [, $filesize ] ] );
$success = $doc->upload_archive( $filehandle, $filename, $archive_format );
$success = $doc->add_archive( $file, $archive_format );
$success = $doc->add_directory( $directory );
$success = $doc->upload_url( $url );

File

$file = $history->file_by_filename( "dataobj.xml" );
$file->retrieve( sub { ... }, $info ); # callback-based file content retrieval
  • I don't know enough about this method to know what we *must* have in. Less is better IMO, though. Cjg 01:23, 17 September 2009 (BST)
  • tdb: files should probably be only created via a parent object