New Features in EPrints 3.1
Revision as of 10:43, 10 April 2008
- 1 New Features
- 1.1 Redesigned "Manage Deposits" and "Review" pages
- 1.2 Issues tracking system
- 1.3 Views
- 1.4 Submission Process
- 1.5 Export
- 1.6 Import
- 1.7 Search
- 1.8 Database
- 1.9 Configuration
- 1.10 Administration
- 1.11 Semantic Web/Complex Objects
- 1.12 Indexing
- 1.13 Autocompletion
- 1.14 Searching
- 1.15 Improved privilege handling
- 1.16 Templates
- 1.17 Toolbox
- 1.18 New field-rendering tools
- 1.19 XML based Scripting
- 1.20 Handy Hooks
- 1.21 Uncategorised features
- 2 Changes to repository configuration
- 2.1 Suggested Changes
- 2.2 Other Changes to the default configuration
Important note. Some new features will be enabled in NEW repositories, but if you upgrade you will need to modify your configuration to use them. We strongly recommend you review the "changes to default configuration" section at the end of this page, and apply them to your configuration, where appropriate.
Redesigned "Manage Deposits" and "Review" pages
- Columns may now be moved left and right.
- Columns may be deleted.
- A column may be added for any eprint field.
- Changes to columns are saved on the user record.
- Added icons for common actions (edit, deposit, etc.)
Issues tracking system
The new issues system allows the discovery of eprints with issues. For example, duplicate titles, or an item from five years ago still listed as "in press". You can configure the issues system via a simple XML file, similar to the citation file format. More richly specified issues discovery can be achieved using the new issues-plugins system.
- An issues_audit script is run nightly to discover and note issues. Issues which were present last time but are absent today are marked as "resolved", so you can review the resolution time. This audits items in the live archive and the review buffer.
- An Issues tab on the eprint control page shows the currently logged issues, and also a live list of issues (although not all issues, such as a search for similar titles, can currently be detected "on the fly").
- An Issues Search tool to search the
- New "Issues" system and plugins, with a tab on eprints and an issues search. You can order the list of items with issues by the number of issues, or by the most recently discovered issues.
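The "resolved" bookkeeping done by the nightly audit can be pictured as a comparison of two sets. This is a minimal Python sketch rather than EPrints code: the function name, the tuple representation of an issue, the issue type names and the "discovered" and "ongoing" labels are all assumptions made for illustration.

```python
def audit_issues(previous, current):
    """Compare the issues from the last run with those found today.

    Both arguments are sets of (eprint_id, issue_type) tuples. This
    representation is hypothetical and chosen only for the example.
    """
    return {
        "resolved": previous - current,    # there last time, gone today
        "discovered": current - previous,  # new since the last run
        "ongoing": previous & current,     # still present
    }


last_night = {(12, "duplicate_title"), (40, "stale_in_press")}
tonight = {(40, "stale_in_press"), (57, "duplicate_title")}
result = audit_issues(last_night, tonight)
print(sorted(result["resolved"]))  # prints [(12, 'duplicate_title')]
```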
Planned extensions to the issues system
These are not yet implemented, but we may provide them as a set of plugins for 3.1.
- Tool to allow QA staff to mark auto-discovered issues as "ignored". For example, two items which really DO have the same title.
- Tool to allow staff to add issues by hand, and mark these issues as resolved.
Views
The views part of EPrints has had a significant rewrite.
- Significant speed improvements
- The epadmin refresh_views command and a button on the Admin web page both cause all view pages to be regenerated the next time they are requested.
- Groupings: You may configure alternate groupings for the views pages. Eg. group results by type, or by creator's first initial.
- Views can be set to regenerate if they are requested and the file on disk is older than a specified age.
- Generate_views can be limited to just one view or language. Also just the menu pages, or just the lists-of-citations pages.
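The age-based regeneration of view pages boils down to comparing a file's modification time with a limit. A rough Python sketch of that check follows. It is not how EPrints implements it: the function name and the max_age_seconds parameter are invented for the example.

```python
import os
import time


def view_is_stale(path, max_age_seconds, now=None):
    """Return True if the cached page is missing or older than the limit."""
    if not os.path.exists(path):
        return True  # nothing on disk yet, so it must be generated
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) > max_age_seconds
```

A request handler built on this idea would call the check before serving the file and regenerate the page only when it returns True.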
Submission Process
- Document Upload now allows upload+unpacking of .tgz or .zip files.
- Document Upload offers "capture from URL".
- Documents now have an option to convert them to any format available in the convert plugins.
- There's a convert plugin to turn .doc into .pdf, which is available from the upload screen convert option.
- Also plugins to convert PDFs and images to various image formats (png, jpg etc.)
- Option to allow import plugins to capture full text from URLs in the imported data.
Export
- Existing export plugins have been improved in terms of speed and memory usage.
- The BibTeX export no longer requires an external module.
- EAP (SWAP) Plugin - The EPrints Application Profile.
- XSLT based export plugins support.
- IDs Plugin - just export the IDs of each item.
- ListUserEmails Plugin - lists the emails of a set of users. Only available to repository staff. This replaces the list_user_emails command.
- DatabaseSchema Plugin - exports the database schema.
- Plugin to Export history dataobjects in ical format. Potentially useful for preserving/sharing history in a standard format.
Import
- XSLT import plugin system
- Import scripts can now download documents from a URL (mainly aimed at the EP3 XML format).
- Added import dataset for keeping track of imports.
- In the web interface, you can import from a file instead of just cut-and-pasting metadata.
- Also, the Import screen has an import full text option, if web imports are enabled in the config.
- Added importid to eprints to link them to the related import object.
- Added source_id to eprints to describe the id in the source repository.
Search
- Search by a sub-object field (eg. search eprints by the format of their documents)
- Search by a related-object field (eg. search eprints by the name of the depositing user)
Database
- Abstracted database layer to allow support of other databases.
- Oracle Support!
- Writing objects which have not changed is now optimised to not actually write, speeding up some parts of the system.
- EPrints fields may be marked as volatile. These fields can be modified without causing a change to the last_modified time of the eprint, the revision_id does not increase and no history event is created. These fields are useful for storing values which change frequently, such as citation counts or hit counts imported from an external tool.
- Messages and login_tickets are now datasets.
- epadmin has a command to create anything missing in the database. Handy if you want to add a field: you no longer have to fiddle with the SQL.
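The behaviour of volatile fields, and of skipping writes for unchanged values, can be illustrated with a toy record class. This is only a Python sketch under stated assumptions: the class, its attribute names and the choice of "hit_count" as the volatile field are hypothetical and do not mirror the EPrints data model.

```python
from datetime import datetime, timezone


class Record:
    """Toy record; 'hit_count' is a hypothetical volatile field."""

    VOLATILE = {"hit_count"}

    def __init__(self, **values):
        self.values = dict(values)
        self.revision = 1
        self.last_modified = datetime.now(timezone.utc)
        self.history = []

    def set_value(self, field, value):
        if self.values.get(field) == value:
            return  # unchanged values are not written at all
        self.values[field] = value
        if field in self.VOLATILE:
            return  # volatile: no revision bump, timestamp or history event
        self.revision += 1
        self.last_modified = datetime.now(timezone.utc)
        self.history.append((self.revision, field))


record = Record(title="A Study", hit_count=0)
record.set_value("hit_count", 42)            # revision stays at 1
record.set_value("title", "A Better Study")  # revision becomes 2
```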
Configuration
- New form based interface for admins to add metadata fields via the web.
- Admins may view configuration via the web.
- Admins may modify configuration via the web.
- Phrases, Citations, Workflows, Templates and static files are automatically re-read if they are altered. This means no more running generate_static every five minutes (although you still need it to ADD files)
- Buttons to refresh the configuration, the views and the abstracts, via the Admin page.
- When creating a new repository, a db connection error doesn't abort the process, and you are offered a random password to use for the db account.
- Errors and warnings in executing code from .pl files now show the correct line number and filename, making debugging less annoying.
- epadmin erase_data now recreates all tables, rather than dropping and recreating the entire database.
- Added plugin-masking so a local plugin can take the id of an existing plugin even though it has a different Perl class.
- epadmin has a command to create anything missing in the database. Handy if you want to add a field.
- Plugins can now be installed in a repository-specific plugins dir, although the perl package names must be unique.
Administration
- Batch editing: An administrator can perform a search and then set or modify values on all items which match the search.
- New buttons on the admin screen:
- send a test email (to check outgoing email is working).
- refresh the abstracts (on request, not right now)
- refresh the views (on request, not right now)
- reload all the configuration files.
- Staff can now queue an eprint for re-indexing via a button in the list of EPrint actions.
- New status screen to show database schema (replacing explain_sql)
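The batch editing feature listed above follows a simple pattern: select items with a search, then apply the same change to every match. A minimal Python sketch of that pattern is shown here. The function, the dictionary-based items and the field values are assumptions for illustration and are unrelated to the actual EPrints interface.

```python
def batch_edit(items, matches, changes):
    """Apply `changes` to every item for which `matches(item)` is true.

    Returns the ids of the items that were edited.
    """
    edited = []
    for item in items:
        if matches(item):
            item.update(changes)
            edited.append(item["eprintid"])
    return edited


eprints = [
    {"eprintid": 1, "type": "article", "ispublished": "inpress"},
    {"eprintid": 2, "type": "thesis", "ispublished": "unpub"},
    {"eprintid": 3, "type": "article", "ispublished": "inpress"},
]
changed = batch_edit(
    eprints,
    lambda e: e["ispublished"] == "inpress",
    {"ispublished": "pub"},
)
print(changed)  # prints [1, 3]
```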
Semantic Web/Complex Objects
- We've assigned URIs to each object in each dataset, and you can get at them using $eprint->uri, $user->uri etc.
- EP3 XML Export now adds the URI to each record.
- URIs take the format <ARCHIVE_URL>/id/<datasetid>/<objectid>, eg. http://foo.eprints.org/id/eprint/23
- URIs for eprints redirect to the abstract page for that eprint.
- URIs for documents redirect to the base URL of that document.
- Added a "Content" field to documents to describe their relationship to the main record.
- Added a "Relation" field to documents and eprints. This is a list of URIs and relationship types; using several eprint records, complex objects can be described.
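The URI pattern described in this list is easy to reproduce outside EPrints, for example in a script that needs to link to records. The helper below is a Python sketch, and its name and parameters are our own invention. Inside EPrints itself you would use the uri method mentioned above.

```python
def object_uri(archive_url, dataset_id, object_id):
    """Build <ARCHIVE_URL>/id/<datasetid>/<objectid>."""
    return f"{archive_url.rstrip('/')}/id/{dataset_id}/{object_id}"


print(object_uri("http://foo.eprints.org", "eprint", 23))
# prints http://foo.eprints.org/id/eprint/23
```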
Indexing
- Significant improvements to the speed at which items are indexed.
Autocompletion
- Can now auto-complete on "select" fields.
- New options for autocompletion to target cells in the row being autocompleted, and to hide & show HTML elements when the auto-completion is triggered.
Searching
- Float fields are now searchable (and searches can be mixed with ints)
- The search CGI now understands the "EX" flag for exact matching. Not available via the form interface, but handy for some scripts.
- Made it possible for search results to show zero matches (so you can create an export).
Improved privilege handling
First, a quick refresher on the way privileges, roles and usertypes relate to each other:
- A privilege is a very fine-grained right. It allows, for example, a user to view the "history" tab on eprints they submitted, while the eprint is in the inbox. In this case "eprint/inbox/history:owner".
- A "role" is a bunch of related privs that may be assigned to a user or usertype.
- A "user" has a "usertype". The user_roles.pl config file defines what roles are assigned to each usertype.
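To make the relationship concrete, here is a small Python sketch of how a usertype resolves to a set of privileges through its roles. It is purely illustrative: the role names, the usertype-to-role mapping and all privilege strings other than "eprint/inbox/history:owner" are made up, and the real mapping lives in user_roles.pl.

```python
# Hypothetical roles: each is just a named bundle of privileges.
ROLES = {
    "deposit": {"eprint/inbox/history:owner", "example/deposit-priv"},
    "review": {"example/review-priv"},
}

# Hypothetical usertypes and the roles assigned to each.
USERTYPE_ROLES = {
    "user": ["deposit"],
    "editor": ["deposit", "review"],
}


def privileges_for(usertype):
    """Collect every privilege granted by the roles of a usertype."""
    privs = set()
    for role in USERTYPE_ROLES.get(usertype, []):
        privs |= ROLES.get(role, set())
    return privs


def allowed(usertype, priv):
    return priv in privileges_for(usertype)


print(allowed("user", "eprint/inbox/history:owner"))  # prints True
print(allowed("user", "example/review-priv"))         # prints False
```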
Now for the changes in 3.1:
- You can add (and remove) individual privs in the user_roles.pl
- Also a new field in User can be used to assign
- New roles can be configured (roles are a set of related privileges)
- And you can define a new role. These are hats for assigning to individual users. For example an "edit item in buffer, but don't approve" hat.
Templates
Additional templates can be added and used instead of the default template. You can configure a search and view to use such a template. You can also apply a different template to a static page, abstract page or screen plugin. A good application of this new feature would be to create a view and search filtered to only show items from the university maths department, and give that view and search a template in the style of the maths dept.
Toolbox
The toolbox is a command line tool that allows eprints and documents to be searched, queried and modified. It can even add files to documents. This is potentially very powerful, as it provides a way to script reading and writing data without learning Perl. There's also a CGI interface, but it should be carefully secured before use.
New field-rendering tools
We've added some new field renderers to EPrints::Extras. If you're upgrading, you'll need to add these to eprint_fields.pl by hand.
- renderer for URLs which truncates very long URLs at the end
- renderer for URLs which truncates very long URLs in the middle
- renderer for the related-urls field to lay out the values more sensibly.
- render_possible_doi links to the DOI resolver if the field appears to be a DOI, otherwise doesn't.
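The logic behind these renderers is simple enough to sketch. The Python below is an illustration of the ideas only, not a port of the EPrints::Extras code: the function names, the 50-character default, the DOI pattern and the resolver URL are all assumptions, and the truncation helpers assume a limit of at least 4.

```python
import re


def truncate_end(url, limit=50):
    """Keep the start of a long URL and mark the cut with '...'."""
    if len(url) <= limit:
        return url
    return url[: limit - 3] + "..."


def truncate_middle(url, limit=50):
    """Keep both ends of a long URL and cut out the middle."""
    if len(url) <= limit:
        return url
    keep = limit - 3           # characters left after the '...'
    head = (keep + 1) // 2     # slightly favour the start of the URL
    tail = keep - head
    return url[:head] + "..." + url[len(url) - tail:]


# Assumed shape of a DOI: optional "doi:" prefix, then 10.<digits>/<suffix>.
DOI_PATTERN = re.compile(r"^(?:doi:)?(10\.\d{4,9}/\S+)$", re.IGNORECASE)


def possible_doi_link(value):
    """Return a resolver link if the value looks like a DOI, else None."""
    match = DOI_PATTERN.match(value.strip())
    if match is None:
        return None
    return "https://doi.org/" + match.group(1)
```

Truncating in the middle keeps the file name at the end of the URL visible, which is often the most recognisable part.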
XML based Scripting
This is a new tag for XML files: <epc:foreach> takes a list (eg. the value of a "multiple" field) and returns the contents of the tag once for each value in the list, eg.
<ol>
  <epc:foreach expr="creators_name" iterator="name">
    <li>(<epc:print expr="$name" />)</li>
  </epc:foreach>
</ol>
which would resolve to something like:
<ol>
  <li>(Smith, John)</li>
  <li>(Jones, Davy)</li>
</ol>
The strlen() function just returns the length of a string. Useful in the issues.xml for identifying very long or short strings.
<epc:print expr="title.strlen()" />
would return the number of characters in the title field.
The today() function takes no parameters and returns the current date.
DATE.datemath( CHANGE, TYPE )
CHANGE is a positive or negative value and TYPE is "year", "month" or "day". Returns the resulting date. For example:
today().datemath( -6, "month" )
would return a date six months before today.
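For readers who want to check what a given datemath call should produce, the same arithmetic can be sketched in Python. This is not the EPrints implementation. In particular, clamping the day to the end of a shorter month (so 31 March minus one month gives the last day of February) is our assumption, as the behaviour in that case is not described here.

```python
import calendar
from datetime import date, timedelta


def datemath(start, change, unit):
    """Shift `start` by `change` units, where unit is year, month or day."""
    if unit == "day":
        return start + timedelta(days=change)
    if unit == "month":
        # Count months from year 0 so that borrowing across years is easy.
        months = start.year * 12 + (start.month - 1) + change
        year, month = divmod(months, 12)
        month += 1
    elif unit == "year":
        year, month = start.year + change, start.month
    else:
        raise ValueError("unit must be 'year', 'month' or 'day'")
    # Assumption: clamp the day if the target month is shorter.
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


print(datemath(date(2008, 4, 10), -6, "month"))  # prints 2007-10-10
```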
Handy Hooks
Hooks are easy ways to add some code to be run on certain events. New hooks are:
- When a user logs out.
- When the thumbnails for a document are (re)generated.
Uncategorised features
- Abstract pages can now be refreshed using epadmin or the web-interface.
- oai_accesslog script to allow users with a given privilege to get access to access-logs via OAI.
- New random data generating script to create any amount of random eprints.
- New types of "thumbnail" format can be added. You're not restricted to the ones we think are useful - eg. "fullsize" is handy to add to get a large image of the front page of documents.
- Added EPrints::Extras::english_title_orderkey - which can be used to cause a field to be sorted, ignoring a leading a/an/the.
- Lists of checkboxes which are more than 5 items long are now rendered as two columns. This makes search pages much more tidy.
- Added config option to set the citation style used in saved search emails.
- New document icons for zip, tgz, rtf, xml, ppt, video, audio.
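The idea behind the title order key mentioned in this list, sorting while ignoring a leading a/an/the, can be shown in a few lines. The Python below is a sketch of the technique rather than a translation of EPrints::Extras::english_title_orderkey. The function name, the regular expression and the lower-casing are our own choices.

```python
import re

# Match a leading article only when it is a whole word followed by a space.
_LEADING_ARTICLE = re.compile(r"^(?:a|an|the)\s+", re.IGNORECASE)


def title_orderkey(title):
    """Return a sort key that ignores a leading 'a', 'an' or 'the'."""
    return _LEADING_ARTICLE.sub("", title.strip()).lower()


titles = ["The Zebra", "An Apple", "Banana", "A Cherry"]
print(sorted(titles, key=title_orderkey))
# prints ['An Apple', 'Banana', 'A Cherry', 'The Zebra']
```

Because the pattern requires whitespace after the article, a title such as "Analysis of Variance" is left intact.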
Changes to repository configuration
We've made some changes to the configuration of a new repository. These will not be automatically applied to your current repositories when upgrading.
Suggested Changes
If upgrading from 3.0 to 3.1, the following changes to your own configuration are suggested to gain the features described above.
New Document Formats
Added new default document formats: zip, tgz, rtf, xml, ppt, bz2, audio and video.
The "subheadings" option has been replaced by the variations heading. This only allows one level of subheading but is much faster.
The way the titles of views pages are configured has changed.
The title of the first page of the view with id "subjects" is now:
<epp:phrase id="viewtitle_eprint_subjects_menu_1">Browse by Subject and Year</epp:phrase>
If there's a second level menu, configure it with:
<epp:phrase id="viewtitle_eprint_subjects_menu_2">Browse by Year where Subject is "<epc:pin name="value1" />"</epp:phrase>
<epp:phrase id="viewtitle_eprint_subjects_list">Items where Subject is "<epc:pin name="value1" />" and Year is <epc:pin name="value2" /> (Grouped by <epc:pin name="grouping" />)</epp:phrase>
A new version of cfg.d/indexing.pl was added in the 3.0 series, but if you're using an early 3.0 version of this file, then replace it with the one that comes with 3.1 - it's much faster.
- Added documents.format to advanced search.
- Changed all searches on "userid" to search "userid.username" instead. Another good option might be userid.name or even userid.usertype.
- Most searches are now configured by default to show zero results (rather than returning to the search form). The option to add to searches in search.pl is show_zero_results => 1. This is handy for people who want to make a saved search, even if there are no results yet.
- Added the "edit-config" role to admin users.
- added "content" to the "Upload" component, as the first field (just before "format").
This is a new file which describes the options in the new "document.content" field. Add this file if you enabled the field in the workflow.
- Removed item_fields and review_fields from the user workflow as these can now be modified more easily from the screens themselves.
- Added user.roles to the "usertype" stage, so it's only editable by people who can also set the usertype.
This is a new (optional) configuration file which defines some simple issues to warn about in live eprints.
We applied some of the new Extras render and order methods:
- title field: 'make_single_value_orderkey' => 'EPrints::Extras::english_title_orderkey' - so when ordering by title, leading "a", "an" and "the" are ignored. If you add this as an upgrade you'll need to run epadmin reorder repositoryid eprint
- id_number field: 'render_value' => 'EPrints::Extras::render_possible_doi' - which links to the DOI resolver if the field looks like a DOI.
- official_url: 'render_value' => 'EPrints::Extras::render_url_truncate_end' - this now chops the end of very long URLs for nicer page rendering.
- related_url: 'render_value' => 'EPrints::Extras::render_related_url' - this is a renderer specifically for this field. It links the word "author", or "publisher", etc. to the URL. If no type is specified, it links the URL (truncated if over 40 chars).
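The related_url behaviour described above (link the type word if there is one, otherwise link the URL and shorten it when it is over 40 characters) can be sketched as follows. This is a Python illustration under stated assumptions, not the EPrints renderer: the function signature, the HTML it returns and the way the URL is shortened are guesses made for the example.

```python
from html import escape


def render_related_url(url, url_type=None, limit=40):
    """Return an HTML link labelled with the type, or with the URL itself."""
    if url_type:
        label = url_type                  # eg. "author" or "publisher"
    elif len(url) > limit:
        label = url[: limit - 3] + "..."  # assumed truncation style
    else:
        label = url
    return f'<a href="{escape(url, quote=True)}">{escape(label)}</a>'


print(render_related_url("http://example.org/a", "publisher"))
# prints <a href="http://example.org/a">publisher</a>
```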
Other Changes to the default configuration
These are changes from 3.0 which you probably don't need to bother applying when upgrading.
- Changed thesis to only be set to "unpub" if it's not yet defined. Theses are not always unpublished.
The contributor field is like the creators field, but each contributor has a "type" set (with a default list of about 200 possible types of contribution). This field is not added to the workflow by default, as it's more than most sites need. We suggest trimming the contributor type list down, but we started with a big list to ensure commonality of the values over all repositories using it.
- eprint_fields.pl: added contributor field
- namedsets/contributor_type: List of all possible types of contributor.
- lang/en/phrases/eprint_fields.xml: Added related phrases.
- workflows/eprint/default.xml: Added the field in a comment, so people can add it if needed.
- Improved the message at the top of the default homepage to be more friendly and link to a "what next?" page in the wiki.
- Some cfg.d/*.pl files gained additional example code. Eg. user_roles.pl