Difference between revisions of "New Features in EPrints 3.2"

From EPrints Documentation
(Preservation Planning Capabilities)
(66 intermediate revisions by 6 users not shown)
DRAFT DOCUMENT (not yet approved by whole EPrints 3.2 team)
+
[[Category:Releases]]
 +
{{releasenotes}}
  
= EPrints Edit locking =  
+
==NOTE==
 +
* We now recommend LibXML in preference to GDOME. It's less buggy, and easier to install.
 +
* The upgrade may take several hours, as it cleans up Unicode issues in the database.
  
Allows you to lock editing to certain users and sessions
+
==Database==
 +
* In addition to MySQL, EPrints 3.2 now supports Oracle and Postgres
  
Expected: Beta 1
+
==API==
 +
* This release features a formal API. Not all functionality is yet available via the API, but will be added slowly and carefully in future releases.
 +
* EPrints::Session, the bugbear of EPrints internals, has been merged into EPrints::Repository. All old code will still work.
  
= Plug-in Based Storage Layer and Storage Controller =
+
==Documents==
 +
* Thumbnails are now documents in their own right
 +
* Built in document-format icons, as well as those you configure yourself
 +
* Thumbnailing now happens in the background as part of the indexer process
  
The EPrints Storage Layer is evolving to enable easy plug-and-play with many storage platforms, including local and multiple institutional storage as well as cloud storage. The Storage Controller lets you use multiple storage platforms simultaneously, define rules for what is stored on each platform, manage these platforms, and migrate resources between them as required. More information, including the current API, can be found on the [[StorageController]] page.
+
==Deposit Interface==
 +
* Edit locking locks records, reducing the risk of two people editing a record at the same time.
 +
* Option to extract metadata and images from OpenXML files (.docx and .pptx)
 +
* Offers options to users and editors on the deposit screen if there are problems
 +
* Document upload screen has been redesigned to be clearer.
 +
* Split document uploading into adding a new document and editing existing documents
 +
* The documents inside an EPrint may now be re-ordered
 +
* Progress bar on file upload
 +
* Document upload methods (file, url, zip etc.) are now plugin-based and can be extended
 +
* When attempting to deposit an eprint that has problems, a Save button is shown
 +
* Made it an option to provide action buttons top and bottom in workflow
 +
* Added support for "input_boxes" property to the workflow, so you can now specify the number of input boxes to show for multiple fields
 +
* epc: no longer crashes eprints on bad scripts, just reports an error
  
Expected: Beta 1
+
==Search & Indexing==
 +
* The search library has been entirely re-written to reduce use of cache tables and to improve performance. Simple searches are now over ten times faster.
 +
* The indexer now uses plugins, so you can schedule other tasks, like thumbnail conversion, to be done in the background.
 +
* Added config option "cache_max" to limit the cachemap tables used
 +
* Added --clear option to cleanup old/broken indexer jobs
  
= SWORD 2 (1.3 Specification Support) =
+
==Unicode==
 +
* EPrints' use of Unicode has been significantly improved.
  
Conforming to the new standards set out by the SWORD project, EPrints 3.2 will include compatibility for the new features.
+
==REST==
 +
* A "REST" style interface to objects, via /rest/eprint/23/title.txt, for example. This can also support "PUT" to alter fields!
  
Expected: Beta 1
+
==SWORD2==
 +
* SWORD2 (1.3 Specification) is supported.
  
= Preservation Planning Capabilities =  
+
==Linked Data Support==
 +
* Ability to establish arbitrary relations between objects or provide additional metadata in triple form.
  
Allows EPrints to be linked with file classification tools (primarily DROID) and risk analysis services (PRONOM) which can then not only profile the content of your repository but also identify risks to objects contained within it.
+
==Collections Support==
 +
* Collections can be built via use of linked data, object ids and relationships.  
  
More information on this can be found on the [[Preservation in EPrints 3.2]] page.
+
== Semantic Web / Linked Data (RDF) ==
  
Expected: Beta 1
+
We have made a (difficult) decision to move these features to 3.2.1 (due out soon after 3.2.0) because testing showed it caused a significant slow down.
  
= Arbitrary metadata linking capabilities =
+
We're rewriting it to do the same thing but with much less overhead!
  
Allows users to expand their data model with custom predicates that link a resource to other resources. One example is the derivedFrom predicate, which we already use.
+
==Storage Layer==
 +
* Now uses plugins to store files
 +
** Local Filesystem
 +
** Amazon S3
 +
** Sun Cloud Storage
  
Expected: Beta 1
+
==Speed==
 +
* Search & Indexing much faster
 +
* Import is faster
 +
* Other parts of the code have been audited for speed, and optimised.
  
= docx, xlsx, pptx MS Office XML compatibility =
+
==Import==
 +
* Modified Import UI to allow a per-plugin/single/bulk workflow
  
Upon upload of these file types EPrints 3.2 will automatically fill in much of the metadata such as title, authors and abstract if possible. 3.2 will also be able to pull these files apart offering optional access to the content within them such as embedded pictures.  
+
==EPC & EPrints Script==
 +
* New EPC tag: epc:debug, which is like print but sends the XML to STDERR for debugging purposes.
 +
* New EPC tag: epc:set, which defines a variable inside its scope.
 +
* Improvements to the epc:foreach processing (better handling of multiple object types in lists)
 +
* Added "limit" option to epc:foreach to limit the number
 +
* Inside <epc:foreach> blocks an $index variable is set, allowing you to test which iteration it's on.
 +
* New EPScript methods: citation_link, dataset, related_objects, url, doc_size, is_public, thumbnail_url, preview_link, icon, human_filesize, control_url, contact_email, property, substr, filter_compound_list, to_data_array, pretty_list, array_concat, action_list, action_button, action_title, action_description, action_icon
 +
* New Script methods:
 +
** $data.property($key) which takes a string and returns a property from a hash or dataobj.
 +
** $eprint.documents() which returns all the "real" (non-volatile) documents.
 +
* New Script inline math functions: + - / * %
 +
* New EPrints Script datatype: DATA_ARRAY: Represents a list of tuples of [$value, $epscript_type]
  
Expected: Beta 1 (tentative)
+
==OAI==
 +
* Stateless OAI Interface means no timing-out
 +
* Support for multiple constraints in custom OAI sets
  
= Enhancements to repository web site management =
+
==Unit Tests==
 +
* We have introduced unit-tests to improve both the short and long term quality of our code.
 +
==Metadata Types==
 +
* Counter (incrementing value)
 +
* Timestamp (defaults to the current time)
 +
* UUID
 +
* MetaField::Search now has two properties:
 +
** "namedfields" which is an array ref of field names to search OR
 +
** "namedfields_config" which is the name of a config variable
 +
* MetaField::Search can now be used in any workflow (not hard-coded to editpermfields)
 +
* A captcha pseudo-field based on http://recaptcha.net/
 +
* added "repeat_secret" property to secret fields that will render a confirmation box which is checked with validate()
 +
* Storable (store arbitrary Perl structures - internal use)
  
Building on the push in 3.1 to make it easy for the repository manager to edit and change the repository configuration without needing access to the configuration files themselves, we are taking that a step further. In 3.2 we intend to allow full look and feel (branding) editing of the main EPrints web pages and templates to be done externally to EPrints in tools such as Dreamweaver and Amaya. There will also be a complementary way of uploading new image files.
+
==Administration Interface==
 +
* Converted Admin screen into several tabs.
 +
* Improved the BatchEdit interface
 +
* Show a progress bar while records are updated during batch edit
  
This editing capability is also complemented by two links which appear on certain pages enabling the administrator to directly edit the page look and feel as well as the phrases on that page.
+
==Editorial Interface==
 +
* Improved "Review" Screen
 +
* The "Review buffer" can now be filtered for better management of large review buffers.
 +
* Editors are given the "Move to Review" button if there are problems
  
Expected: Beta 1
+
==User Defined Datasets==
 +
* Allows 3rd party tools to create their own additional datasets
 +
* Suite of interface screens to work with these new datasets
  
= Abstract Page Improvements =  
+
==Command Line Tools==
 +
* Allow eprint ids to be specified for redo_thumbnails
  
Lightboxes have been added to the abstract pages for easier previewing of documents.
+
==Export ==
 +
* Added support for OAI-ORE
 +
* Added support for JSONP
 +
* Added support for an 'n' argument to search exports
 +
* Added arguments support to export plugins. Passed by CGI arguments on abstract search or by the --arg option in bin/export
  
Expected: Beta 1
+
==Abstract Page==
 +
* Now generated with a citation
 +
* Shows an "action list", so plugins can register to appear on this page
  
= User definable datasets =  
+
==Phrases==
 +
* Primary method of editing phrases is now the web interface
 +
* Added "ref" option to phrases, which will cause the referenced phrase to be used instead - Equivalent to calling the referenced phrase directly
  
Allows you to expand the core EPrints data model with whole new types of data and datasets which can be indexed and used in searches.
+
==Views==
 +
* Entire rendering of item lists and menus can be over-ridden by a function
  
Expected: Beta 1 (command line only)
+
==Misc. Changes ==
 +
* Can now disable a repository through a system configuration setting
 +
* Refactored DataObj::get_defaults so that you can now specify default values through a "default_value" property
 +
* Most of get_defaults() can now be specified through the metafield spec.
 +
* Can now apply multiple changes to the same field (???? I assume this means metafield?)
 +
* Preference field for users (to store k/v pairs in)
 +
* Simplified Apache configuration: generate_apacheconf will no longer overwrite existing files
  
= IR Stats =  
+
==Key Bugfixes==
 +
* Fixed login/logout pages not using phrases
 +
* Fixed spurious history objects being created on document upload
 +
* Fixed an HTML insertion bug in the <title> element [Brian D. Gregg]
 +
* Fixed schema errors in uketd_dc and METS/MODS export plugins
 +
* Fixed bug in Compound creation of Set types that squashed the set options
 +
* Fixed order static directories are searched to: repository->theme->system
 +
* Support long values in browse views by using the MD5 of the value
 +
* Subject inputform component can now be used with singular values
 +
* Fixed bug that is_advertised property on export plugins was being ignored.
 +
* Fixed bug in indexer which meant it didn't index in a round-robin fashion.
 +
* Fixed export not respecting metadata visibility
  
Institutional Repository stats are becoming an even more important part of the repository and we hope to have these in the final 3.2 release.
 
  
Expected: Beta 2
 
  
= EPrints Scheduler =  
+
= Changes to repository configuration =
 +
We've made some changes to the configuration of a new repository. These will not be automatically applied to your current repositories when upgrading.
  
Allows tracking of events which happen as well as scheduling new events which need to take place to maintain your repository. Investigation is under way into the power of such a system and if it can be interfaced with desktop calendar programs such as iCal and Google Calendar.
+
== Suggested Changes ==
  
Expected: Beta 2
+
If upgrading from 3.1 to 3.2, the following changes to your own configuration are suggested to gain the features described above.
  
= Shelves of EPrints =
+
* cp lib/defaultcfg/cfg.d/rdf* archives/YOURID/cfg/cfg.d/
 +
* run epadmin recommit
 +
* edit cfg/lang/en/static/index.xpage and add the following to the <xpage:head> section.
 +
<xpage:head>
 +
  <link rel="alternate" type="application/rss+xml" title="Items in {phrase('archive_name')}" href="{$config{http_cgiurl}}/latest_tool?output=RSS2"></link>
 +
  <link rel="alternate" type="application/atom+xml" title="Items in {phrase('archive_name')}" href="{$config{http_cgiurl}}/latest_tool?output=Atom"></link>
 +
  <link rel="alternate" type="application/rdf+xml" title="Repository Summary RDF+XML" href="{$config{http_cgiurl}}/repositoryinfo/RDFXML/devel.rdf"></link>
 +
  <link rel="alternate" type="text/n3" title="Repository Summary RDF+N3" href="{$config{http_cgiurl}}/repositoryinfo/RDFN3/devel.n3"></link>
 +
</xpage:head>
 +
(you may already have the rss & atom bits)
  
No details to be released on this yet
+
* This list has NOT been completed yet, we're working on it!
  
Expected: Beta 2 (tentative)
+
==New and Altered Config options==
  
= Issues Raising and Resolving Tool =
+
These need documenting and noting which ones we recommend setting/altering when upgrading.
  
No description currently
+
* Set "hide_document_conversion" to hide the Convert link on the document workflow
 
+
* Broke up SystemSettings into logically named files
Expected: Beta 2 (tentative)
+
* Can now disable a repository through a system configuration setting
 +
* Moved most of eprint_render.pl into a citation file: summary_page.xml
 +
* Updated defaults views.pl to show current configuration style
 +
* Improved document_upload.pl layout to make it easier to add/remove suffix to mimetype mappings.
 +
* Added URI to EPrint Summary Page
 +
* Added RDF+XML and N3+NT Document formats
 +
* New metafield option: $defaults{render_max_search_values} = 5;
 +
* Added "show_help" option to workflow component to disable collapsing Usage: show_help={always,toggle,never}
 +
* Added config option "cache_max" to limit the cachemap tables used
 +
* user defined datasets
 +
* Made it an option to provide action buttons top and bottom in workflow
 +
**$c->{locking}->{eprint}->{enable} = 1;
 +
**$c->{locking}->{eprint}->{timeout} = 600;
 +
*REST privs
 +
*check registration email callback
 +
*epc:debug, epc:set, changes to epc:foreach
 +
*lots of eprints script functions (see list above)
 +
*views.pl
 +
** "DEFAULT;render_fn=render_view_items_3col_boxes",
 +
** render_menu => "render_view_menu_3col_boxes"
 +
** ranges & variations were introduced in 3.1.? but need documenting.
 +
* Storage plugins
 +
* adding actions to abstract page
 +
* select targz,zip,plain etc. in workflow/upload

Revision as of 16:15, 26 July 2012

NOTE

  • We now recommend LibXML in preference to GDOME. It's less buggy, and easier to install.
  • The upgrade may take several hours, as it cleans up Unicode issues in the database.

Database

  • In addition to MySQL, EPrints 3.2 now supports Oracle and Postgres

API

  • This release features a formal API. Not all functionality is yet available via the API, but will be added slowly and carefully in future releases.
  • EPrints::Session, the bugbear of EPrints internals, has been merged into EPrints::Repository. All old code will still work.

Documents

  • Thumbnails are now documents in their own right
  • Built in document-format icons, as well as those you configure yourself
  • Thumbnailing now happens in the background as part of the indexer process

Deposit Interface

  • Edit locking locks records, reducing the risk of two people editing a record at the same time.
  • Option to extract metadata and images from OpenXML files (.docx and .pptx)
  • Offers options to users and editors on the deposit screen if there are problems
  • Document upload screen has been redesigned to be clearer.
  • Split document uploading into adding a new document and editing existing documents
  • The documents inside an EPrint may now be re-ordered
  • Progress bar on file upload
  • Document upload methods (file, url, zip etc.) are now plugin-based and can be extended
  • When attempting to deposit an eprint that has problems, a Save button is shown
  • Made it an option to provide action buttons top and bottom in workflow
  • Added support for "input_boxes" property to the workflow, so you can now specify the number of input boxes to show for multiple fields
  • epc: no longer crashes eprints on bad scripts, just reports an error
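
The edit-locking behaviour listed above can be pictured with a small model. This is only an illustrative sketch in Python, not EPrints code: the class name `EditLock`, the rule that the lock holder's own requests refresh the lock, and the reading of the timeout as seconds are all assumptions. The default of 600 mirrors the `timeout` value shown in the configuration notes on this page.

```python
import time


class EditLock:
    """Toy model of a per-record edit lock that expires after a timeout."""

    def __init__(self, timeout=600):
        self.timeout = timeout      # assumed to be seconds
        self.holder = None          # user currently holding the lock
        self.acquired_at = None     # time the lock was last taken or refreshed

    def acquire(self, user, now=None):
        """Return True if `user` may edit the record at time `now`."""
        now = time.time() if now is None else now
        expired = (
            self.holder is not None
            and now - self.acquired_at >= self.timeout
        )
        if self.holder is None or expired or self.holder == user:
            # Hypothetical rule: the holder's own requests refresh the lock.
            self.holder = user
            self.acquired_at = now
            return True
        return False
```

With this model, a second user is refused while the first user's lock is fresh and is let in once the timeout has passed.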

Search & Indexing

  • The search library has been entirely re-written to reduce use of cache tables and to improve performance. Simple searches are now over ten times faster.
  • The indexer now uses plugins, so you can schedule other tasks, like thumbnail conversion, to be done in the background.
  • Added config option "cache_max" to limit the cachemap tables used
  • Added --clear option to cleanup old/broken indexer jobs

Unicode

  • EPrints' use of Unicode has been significantly improved.

REST

  • A "REST" style interface to objects, via /rest/eprint/23/title.txt, for example. This can also support "PUT" to alter fields!
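
As a rough illustration of the URL pattern above, the following Python sketch builds a field URL of the form `/rest/eprint/23/title.txt` and prepares (but does not send) a PUT request for it. The host `example.org`, the helper names, and the plain-text content type are assumptions made for the example; a real repository will very likely also require authentication and suitable privileges, which are not shown.

```python
from urllib.parse import quote
from urllib.request import Request


def rest_field_url(base_url, dataset, object_id, field, suffix="txt"):
    """Build a URL such as <base>/rest/eprint/23/title.txt."""
    path = "/rest/{}/{}/{}.{}".format(
        quote(dataset), int(object_id), quote(field), suffix
    )
    return base_url.rstrip("/") + path


def build_put_request(url, value):
    """Prepare a PUT request carrying the new field value; nothing is sent."""
    return Request(
        url,
        data=value.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; charset=utf-8"},  # assumed type
    )


url = rest_field_url("http://example.org/", "eprint", 23, "title")
request = build_put_request(url, "A corrected title")
```

Passing `request` to `urllib.request.urlopen` would actually issue the PUT; that step is left out so the sketch has no side effects.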

SWORD2

  • SWORD2 (1.3 Specification) is supported.

Linked Data Support

  • Ability to establish arbitrary relations between objects or provide additional metadata in triple form.

Collections Support

  • Collections can be built via use of linked data, object ids and relationships.

Semantic Web / Linked Data (RDF)

We have made a (difficult) decision to move these features to 3.2.1 (due out soon after 3.2.0) because testing showed it caused a significant slow down.

We're rewriting it to do the same thing but with much less overhead!

Storage Layer

  • Now uses plugins to store files
    • Local Filesystem
    • Amazon S3
    • Sun Cloud Storage
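
One way to picture a plugin-based storage layer is a controller that hands each file to one or more interchangeable backends. The Python sketch below is a conceptual model only and is not the EPrints storage API: `MemoryStore`, `StorageController`, the rule function, and the `migrate` helper are hypothetical names, and an in-memory dictionary stands in for a real backend such as the local filesystem or a cloud service.

```python
class MemoryStore:
    """Stand-in for a storage plugin; keeps file contents in a dict."""

    def __init__(self, name):
        self.name = name
        self.files = {}

    def store(self, filename, data):
        self.files[filename] = data

    def retrieve(self, filename):
        return self.files.get(filename)

    def delete(self, filename):
        self.files.pop(filename, None)


class StorageController:
    """Routes files to plugins according to a caller-supplied rule."""

    def __init__(self, plugins, choose):
        self.plugins = {plugin.name: plugin for plugin in plugins}
        self.choose = choose  # rule: (filename, data) -> list of plugin names

    def store(self, filename, data):
        names = self.choose(filename, data)
        for name in names:
            self.plugins[name].store(filename, data)
        return names

    def retrieve(self, filename):
        # Return the first copy found in any plugin.
        for plugin in self.plugins.values():
            data = plugin.retrieve(filename)
            if data is not None:
                return data
        return None

    def migrate(self, filename, source, target):
        """Copy a file to the target plugin, then remove it from the source."""
        data = self.plugins[source].retrieve(filename)
        if data is None:
            raise KeyError(filename)
        self.plugins[target].store(filename, data)
        self.plugins[source].delete(filename)
```

Because every backend exposes the same three methods, adding a new platform in this model means writing one more class rather than changing the controller.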

Speed

  • Search & Indexing much faster
  • Import is faster
  • Other parts of the code have been audited for speed, and optimised.

Import

  • Modified Import UI to allow a per-plugin/single/bulk workflow

EPC & EPrints Script

  • New EPC tag: epc:debug, which is like print but sends the XML to STDERR for debugging purposes.
  • New EPC tag: epc:set, which defines a variable inside its scope.
  • Improvements to the epc:foreach processing (better handling of multiple object types in lists)
  • Added "limit" option to epc:foreach to limit the number
  • Inside <epc:foreach> blocks an $index variable is set, allowing you to test which iteration it's on.
  • New EPScript methods: citation_link, dataset, related_objects, url, doc_size, is_public, thumbnail_url, preview_link, icon, human_filesize, control_url, contact_email, property, substr, filter_compound_list, to_data_array, pretty_list, array_concat, action_list, action_button, action_title, action_description, action_icon
  • New Script methods:
    • $data.property($key) which takes a string and returns a property from a hash or dataobj.
    • $eprint.documents() which returns all the "real" (non-volatile) documents.
  • New Script inline math functions: + - / * %
  • New EPrints Script datatype: DATA_ARRAY: Represents a list of tuples of [$value, $epscript_type]
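
For readers unfamiliar with the "limit" option and the $index variable of epc:foreach, the idea is roughly that of a bounded, counted loop. The Python sketch below is an analogy rather than EPC syntax; the function name `foreach` is made up, and counting from zero is an assumption, since these notes do not say where $index starts.

```python
from itertools import islice


def foreach(items, limit=None):
    """Yield (index, item) pairs, stopping after `limit` items if given."""
    return enumerate(islice(items, limit))


# Render at most two creators, treating the first one specially.
for index, creator in foreach(["Smith", "Jones", "Patel"], limit=2):
    prefix = "" if index == 0 else "; "
    print(prefix + creator, end="")
print()
```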

OAI

  • Stateless OAI Interface means no timing-out
  • Support for multiple constraints in custom OAI sets

Unit Tests

  • We have introduced unit-tests to improve both the short and long term quality of our code.

Metadata Types

  • Counter (incrementing value)
  • Timestamp (defaults to the current time)
  • UUID
  • MetaField::Search now has two properties:
    • "namedfields" which is an array ref of field names to search OR
    • "namedfields_config" which is the name of a config variable
  • MetaField::Search can now be used in any workflow (not hard-coded to editpermfields)
  • A captcha pseudo-field based on http://recaptcha.net/
  • added "repeat_secret" property to secret fields that will render a confirmation box which is checked with validate()
  • Storable (store arbitrary Perl structures - internal use)

Administration Interface

  • Converted Admin screen into several tabs.
  • Improved the BatchEdit interface
  • Show a progress bar while records are updated during batch edit

Editorial Interface

  • Improved "Review" Screen
  • The "Review buffer" can now be filtered for better management of large review buffers.
  • Editors are given the "Move to Review" button if there are problems

User Defined Datasets

  • Allows 3rd party tools to create their own additional datasets
  • Suite of interface screens to work with these new datasets

Command Line Tools

  • Allow eprint ids to be specified for redo_thumbnails

Export

  • Added support for OAI-ORE
  • Added support for JSONP
  • Added support for an 'n' argument to search exports
  • Added arguments support to export plugins. Passed by CGI arguments on abstract search or by the --arg option in bin/export
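
JSONP in general means wrapping a JSON document in a call to a function named by the client, so that a browser can load it through a script tag. The Python sketch below shows that general technique only; it is not the EPrints export plugin, and the function name `to_jsonp` and the callback validation pattern are assumptions. The name of the request parameter that EPrints uses to receive the callback is not stated here and is deliberately left out.

```python
import json
import re

# Accept only plain (optionally dotted) JavaScript identifiers as callbacks,
# so that a request cannot inject arbitrary script into the response.
_CALLBACK_RE = re.compile(
    r"^[A-Za-z_$][A-Za-z0-9_$]*(\.[A-Za-z_$][A-Za-z0-9_$]*)*$"
)


def to_jsonp(data, callback):
    """Serialise `data` as JSON wrapped in a call to `callback`."""
    if not _CALLBACK_RE.match(callback):
        raise ValueError("unsafe JSONP callback name: %r" % callback)
    return "{}({});".format(callback, json.dumps(data))
```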

Abstract Page

  • Now generated with a citation
  • Shows an "action list", so plugins can register to appear on this page

Phrases

  • Primary method of editing phrases is now the web interface
  • Added "ref" option to phrases, which will cause the referenced phrase to be used instead - Equivalent to calling the referenced phrase directly
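
The "ref" behaviour can be thought of as an alias lookup: asking for a phrase that carries a reference returns the referenced phrase instead. The following Python sketch models that idea with a plain dictionary. The data layout, the function name `resolve_phrase`, the example phrase ids, and the handling of chained or circular references are assumptions for illustration, not a description of how EPrints stores or resolves phrases.

```python
def resolve_phrase(phrases, phrase_id):
    """Return the text for `phrase_id`, following any "ref" entries."""
    seen = set()
    while True:
        if phrase_id in seen:
            raise ValueError("circular phrase reference at %r" % phrase_id)
        seen.add(phrase_id)
        entry = phrases[phrase_id]
        if "ref" not in entry:
            return entry["text"]
        phrase_id = entry["ref"]  # behave as if the target had been requested


phrases = {
    "button_save": {"text": "Save"},             # hypothetical phrase ids
    "workflow_save": {"ref": "button_save"},
}
print(resolve_phrase(phrases, "workflow_save"))
```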

Views

  • Entire rendering of item lists and menus can be over-ridden by a function

Misc. Changes

  • Can now disable a repository through a system configuration setting
  • Refactored DataObj::get_defaults so that you can now specify default values through a "default_value" property
  • Most of get_defaults() can now be specified through the metafield spec.
  • Can now apply multiple changes to the same field (???? I assume this means metafield?)
  • Preference field for users (to store k/v pairs in)
  • Simplified Apache configuration: generate_apacheconf will no longer overwrite existing files

Key Bugfixes

  • Fixed login/logout pages not using phrases
  • Fixed spurious history objects being created on document upload
  • Fixed an HTML insertion bug in the <title> element [Brian D. Gregg]
  • Fixed schema errors in uketd_dc and METS/MODS export plugins
  • Fixed bug in Compound creation of Set types that squashed the set options
  • Fixed order static directories are searched to: repository->theme->system
  • Support long values in browse views by using the MD5 of the value
  • Subject inputform component can now be used with singular values
  • Fixed bug that is_advertised property on export plugins was being ignored.
  • Fixed bug in indexer which meant it didn't index in a round-robin fashion.
  • Fixed export not respecting metadata visibility
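
The MD5 fix for long browse-view values rests on a simple property: a hex MD5 digest is always 32 characters, however long the input. The Python sketch below illustrates that property only. The function name `browse_key`, the 64-character threshold, and the choice to leave short values untouched are hypothetical; the notes do not say when or how EPrints applies the hash.

```python
import hashlib

MAX_LEN = 64  # hypothetical threshold, not taken from EPrints


def browse_key(value, max_len=MAX_LEN):
    """Return `value` itself if short, else its fixed-length MD5 hex digest."""
    if len(value) <= max_len:
        return value
    return hashlib.md5(value.encode("utf-8")).hexdigest()
```

The same input always gives the same key, so a page generated for a long value can be found again from the value alone.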


Changes to repository configuration

We've made some changes to the configuration of a new repository. These will not be automatically applied to your current repositories when upgrading.

Suggested Changes

If upgrading from 3.1 to 3.2, the following changes to your own configuration are suggested to gain the features described above.

  • cp lib/defaultcfg/cfg.d/rdf* archives/YOURID/cfg/cfg.d/
  • run epadmin recommit
  • edit cfg/lang/en/static/index.xpage and add the following to the <xpage:head> section.
<xpage:head>
  <link rel="alternate" type="application/rss+xml" title="Items in {phrase('archive_name')}" href="{$config{http_cgiurl}}/latest_tool?output=RSS2"></link>
  <link rel="alternate" type="application/atom+xml" title="Items in {phrase('archive_name')}" href="{$config{http_cgiurl}}/latest_tool?output=Atom"></link>
  <link rel="alternate" type="application/rdf+xml" title="Repository Summary RDF+XML" href="{$config{http_cgiurl}}/repositoryinfo/RDFXML/devel.rdf"></link>
  <link rel="alternate" type="text/n3" title="Repository Summary RDF+N3" href="{$config{http_cgiurl}}/repositoryinfo/RDFN3/devel.n3"></link>
</xpage:head>

(you may already have the rss & atom bits)

  • This list has NOT been completed yet, we're working on it!

New and Altered Config options

These need documenting and noting which ones we recommend setting/altering when upgrading.

  • Set "hide_document_conversion" to hide the Convert link on the document workflow
  • Broke up SystemSettings into logically named files
  • Can now disable a repository through a system configuration setting
  • Moved most of eprint_render.pl into a citation file: summary_page.xml
  • Updated defaults views.pl to show current configuration style
  • Improved document_upload.pl layout to make it easier to add/remove suffix to mimetype mappings.
  • Added URI to EPrint Summary Page
  • Added RDF+XML and N3+NT Document formats
  • New metafield option: $defaults{render_max_search_values} = 5;
  • Added "show_help" option to workflow component to disable collapsing Usage: show_help={always,toggle,never}
  • Added config option "cache_max" to limit the cachemap tables used
  • user defined datasets
  • Made it an option to provide action buttons top and bottom in workflow
    • $c->{locking}->{eprint}->{enable} = 1;
    • $c->{locking}->{eprint}->{timeout} = 600;
  • REST privs
  • check registration email callback
  • epc:debug, epc:set, changes to epc:foreach
  • lots of eprints script functions (see list above)
  • views.pl
    • "DEFAULT;render_fn=render_view_items_3col_boxes",
    • render_menu => "render_view_menu_3col_boxes"
    • ranges & variations were introduced in 3.1.? but need documenting.
  • Storage plugins
  • adding actions to abstract page
  • select targz,zip,plain etc. in workflow/upload