Entire Manual

From EPrints Documentation

Warning This page is under development as part of the EPrints 3.4 manual. It may still contain content specific to earlier versions.

This page was generated on 2023-11-28



Manual Sections

What is EPrints?

EPrints 3 is generic repository building software developed by the University of Southampton. It is intended to create a highly configurable web-based repository.

EPrints is often used as an open archive for research papers, and the default configuration reflects this, but it is also used for other things such as images, research data, audio archives - anything that can be stored digitally.

The EPrints series began in early 2000 and is in use by over 200 sites!

Should I be installing EPrints 3, how much effort will it take?

Start by looking at our live, online server Demoprints to get a feel for what the software does.

You can get a vanilla install up and running quite easily; the installation notes on the wiki should help you over any snags relating to your operating system. You'll need a UNIX-like machine (Linux is good), and a root password is helpful.

The task which will take longest is actually deciding what you want your repository to do (and not do). Many sites want to make significant customisations. EPrints creates a repository with a sensible default, but all our users want something slightly different.

Installing and configuring the software isn't too hard, and we're working on admin tools to make it even easier.

The time taken to run the archive day to day depends on your own policy. Do you want a very light touch on the data submitted, or a formal review process on each item? That's up to you!

What will it run on?

We develop EPrints on Red Hat Linux (both Fedora Core and Enterprise), but it is used on any number of Linux distributions and other UNIX-like systems, including OS X. Thanks to support from Microsoft, it also runs on Windows Vista and XP.

EPrints doesn't require any unusual hardware. It's slightly easier to run on a dedicated machine, but that's not essential, and should not affect performance.

Don't forget to budget for a backup system: your data is valuable!

Required Software

What Additional Software does EPrints Require?

In brief, EPrints minimally requires Apache (with mod_perl), MySQL and Perl with some extra modules. Various utilities like wget, tar and unzip would also be useful.

EPrints bundles some Perl modules which it uses, to save you installing them.

Where to get the Required Software

Apache, MySQL, Perl and mod_perl are all provided as operating system level packages that can be installed on EPrints' Recommended Platforms. If you wish to install on a platform that is not recommended, then you will need to determine the best way to install these applications. It may be possible to infer comparable packages for your platform by checking the dependencies installed on Red Hat based and Debian based Linux.

Other Tools

File uploads

wget, tar, gunzip and unzip are required to allow users to upload files as .tar.gz or .zip, or to capture them from a URL.

These all come installed with most modern versions of Linux. If you cannot get them working, you can remove the option by editing "archive_formats" in SystemSettings.pm.

If there are problems you may need to tweak how these are invoked in SystemSettings.pm.
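
A quick way to see whether the four upload tools are available is to look each one up on the PATH. The following is a minimal sketch, not part of EPrints; the function name check_tools is made up for this example.

```sh
# Report which of the named tools can be found on PATH.
# Returns non-zero if any of them is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# The four tools named above for file uploads.
check_tools wget tar gunzip unzip || echo "Some upload tools are missing."
```

If a tool is reported missing, either install it or remove the matching entry from "archive_formats" as described above.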

Full Text Indexing

The full text indexer requires various tools to extract plain (UTF-8) text from each kind of document. These tools may or may not already be installed on your system. EPrints uses them to build a "words" file for each document, containing the text of the document in UTF-8. If it can't run the tool, the "words" file will be empty and EPrints will not retry creating it unless you manually remove it.
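
Once a missing tool has been installed, the empty "words" files need to be found before they can be removed. The sketch below assumes the files are literally named words and live somewhere below a directory you pass in; both the function name and the scratch directory are made up for this example, so check where your installation keeps these files before deleting anything.

```sh
# List empty files named "words" below the given directory.
find_empty_words() {
    find "$1" -type f -name words -size 0
}

# Demonstration on a scratch directory standing in for a document store.
demo=$(mktemp -d)
mkdir -p "$demo/doc1" "$demo/doc2"
: > "$demo/doc1/words"                   # empty: text extraction failed
echo "some text" > "$demo/doc2/words"    # non-empty: extraction worked
find_empty_words "$demo"                 # lists only doc1/words
rm -rf "$demo"
```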


PDF

Full text indexing of PDF documents requires the pdftotext application, provided by the poppler-utils Deb or RPM package.

Microsoft Word

Full text indexing of Microsoft Word documents is provided by the antiword Deb or RPM package. The RPM package is available through the forensics RPM repository.


HTML

Full text indexing of HTML documents requires the lynx text-based browser, provided by the lynx Deb or RPM package.

LaTeX Tools

There is an optional feature which allows you to instruct EPrints to look in certain fields (e.g. title and abstract) for strings that look like LaTeX equations and render them as images. These tools are only required if you want to use this feature.

These are provided by the tetex-latex and ImageMagick RPMs or the texlive-base, texlive-bin and imagemagick Deb packages.

This is a "cosmetic" feature: it only affects the rendering of information, so you can always add it later if you want to save time initially.

Other Platforms

Often the best way to find packages for other platforms is to use a search engine to look for the Red Hat or Ubuntu Linux package name along with the name of your platform (e.g. antiword Arch Linux). If your platform does not have comparable packages, then the next best option is to download the software from its official site. Below are links to the download pages for the essential components of EPrints:

Installing MySQL

Installing mod_perl

Installing Perl Modules

Installing GDOME

In future versions of EPrints, XML::LibXML will be the only XML library supported, as it is about as efficient as XML::GDOME but can be installed more easily on various Linux operating systems. It is therefore best to install XML::LibXML rather than XML::GDOME, as using XML::GDOME will make it more difficult to upgrade to future versions of EPrints.


This section describes installing EPrints on a generic Linux distribution. It is recommended that EPrints is installed on either Red Hat based or Debian based Linux.


EPrints can be downloaded from https://files.eprints.org/


If you are upgrading an existing installation of EPrints, please see the Upgrading section of this manual.

EPrints must be installed as the same user that the Apache web server runs as. We suggest you install it as user "eprints" and group "eprints". Under some UNIX platforms, creating a user and group can be done using the "adduser" command. Otherwise refer to your operating system documentation.

Unpack the eprints tar.gz file:

% gunzip eprints-3.something.tar.gz
% tar xf eprints-3.something.tar

Now run the "configure" script. This is a /bin/sh script which will attempt to locate various parts of your system such as the perl binary. It will also check your system for required components.

% cd eprints-3.something
% ./configure

By default the system installs as user and group "eprints". You will need to change this if you are not installing as either "root" or "eprints".

The configure script accepts a number of options.

List all the options (many are intended for compiled software and are ignored).


Where to install EPrints (or look for a version to upgrade). By default /opt/eprints3/
Use HOST to deliver mail. If the server running EPrints has an MTA such as exim or sendmail, you can specify localhost. If you do not specify this option, you will get a warning to configure it later.
Install eprints to run as USER. By default "eprints".
Install eprints to run as GROUP. By default "eprints".


Path of perl interpreter (in case configure can't find it, or you have more than one and want to use a specific one).
Use VIRTUALHOST rather than * for apache VirtualHost directives.
An alternate path to search for the required binaries.
Disable disk free space calls. These can cause problems on some platforms, notably 64-bit.


Use Apache 1.x.x instead of 2.x.x, but EPrints 3 does not support this.

Once you are happy with your configuration you may install eprints by running install.pl:

% ./install.pl

Now you should edit the configuration file for your copy of Apache. This is often /usr/local/apache/conf/httpd.conf or /etc/httpd/conf/httpd.conf.

Add this line: (If you didn't install eprints in /opt/eprints3/ replace that with the location on your system).

Include /opt/eprints3/cfg/apache.conf

Note that this file is only available after you have created your archive via epadmin create. See Running epadmin for more information on creating an archive.

You need to make sure the Apache user has read/write access to the installation directory (/opt/eprints3). This user must be the same as the user you installed EPrints as. We recommend configuring Apache to run as:

User eprints
Group eprints
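
To confirm that the installation directory really belongs to the user Apache runs as, you can compare numeric owner IDs. This is a rough sketch only: the helper names owner_uid and owned_by are invented for the example, and it assumes the default /opt/eprints3 location and the "eprints" user suggested above.

```sh
# Print the numeric owner (uid) of a file or directory, or nothing if it is missing.
owner_uid() {
    ls -ldn "$1" 2>/dev/null | awk '{ print $3 }'
}

# Succeed only if the path exists and is owned by the named user.
owned_by() {
    want=$(id -u "$2" 2>/dev/null) || return 1
    have=$(owner_uid "$1")
    [ -n "$have" ] && [ "$have" = "$want" ]
}

if owned_by /opt/eprints3 eprints; then
    echo "/opt/eprints3 is owned by eprints"
else
    echo "/opt/eprints3 is missing or not owned by eprints"
fi
```

Ownership alone does not prove read/write access, so treat this as a first check rather than a complete one.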

API Overview


EPrints is written in the Perl language. Sometimes you may need to write your own Perl code using the EPrints Application Programming Interface (API).

Reasons you may need to write your own EPrints code:

  • customising the way the EPrints summary pages are rendered
  • writing your own command-line script to control EPrints in some way
  • writing a new CGI script (dynamic web page)
  • writing a plugin

Key Sections

  • Core API provides a basic introduction to using the EPrints API.
  • StyleGuide gives guidance on how to code in a compatible style.
  • Modules discusses the framework for Plugins you write for the community.

Principal Modules

This is a list of the principal modules in the EPrints API. These modules are the primary means of connecting to and manipulating objects in the repository.


A repository is an EPrints archive with its own website, configuration and data. One install of the EPrints software can run several separate repositories, sharing code but with totally different configurations. Before EPrints 2.4 this was known as EPrints::Archive. This was changed to avoid confusion with the eprint status of "archive".


The connection to the MySQL (or other database) back end. Datasets are stored in the MySQL system, but you do not have to address it directly.


A dataset is a collection of items of the same type. It can be searched.

Some datasets share the same "config id". The "config id" is used to get information about the dataset from the archive config: inbox, buffer, archive and deletion all have the same metadata fields and types.

Core datasets are:

  • eprint: EPrint records are the core of the system.
  • user: Users registered with the system.
  • subject: The subject tree.
  • document: Documents belonging to EPrints. Every document is part of an EPrint record.
  • subscription: Subscriptions made by users. Every subscription is a part of a subscription record.
  • history: Stores actions performed on records.

In addition to these datasets there are four virtual datasets: inbox, buffer, archive and deletion. These act just like "eprint" except that they are filtered to contain only records with the corresponding status.

Note that prior to 2.4 the "eprint" dataset was virtual, rather than "inbox", "buffer" etc. The history dataset was introduced in 2.4.


A single field in a dataset. Each dataset has a few "system" fields which EPrints uses to manage the system, and then any number of archive-specific fields which you may configure.


The "super class" of subjects, users, eprints and documents etc. In the very core of the system these are all treated identically and much of the configuration and methods of these classes of "thing" are identical. We use the term item to speak about the general case.

type or user-type or eprint-type 
Users, eprints and documents all have a "type". This controls how they are "cited" and, for users and eprints, which fields may be edited and which are required.


A document is a single format of an eprint, e.g. HTML, PDF, PS etc. It can contain more than one file, for example HTML may contain more than one html page + image files. The actual files are stored in the filesystem. Pre 2.4 this was known as EPrints::Document.


An eprint is a record in the system which has one or more documents and some metadata. Usually, more than one document is to provide the same information in multiple formats, although this is not compulsory. Pre 2.4 this was known as EPrints::EPrint.


Some software refers to this concept as alerts.

A stored search which is performed every day/week/month and any new results are then mailed to the user who owns the subscription.

This diagram does not show "Subscription". Subscription is a subclass of DataObj (like EPrint, User etc.). A subscription is associated with one User. A user is associated with 0..n Subscriptions.


A subject has an id and a list of who its parents are. There is a built-in subject with the id "ROOT" to act as the top level. A subject can have more than one parent to allow you to create a rich lattice, rather than just a tree, but loops are not allowed.


A user registered with the system (not necessarily the author of the eprints they deposit). Pre 2.4 this was known as EPrints::User.

EPrints Configuration


This page deals with configuring the software.

See also: repository configuration

EPrints General Configuration

This section describes all the configuration files in the EPrints system which do not relate to any specific archive.

EPrints Configuration Directory

The general EPrints configuration directory is usually /opt/eprints2/cfg/ and contains the following files:

This file is generated by generate_apacheconf. See the documentation of generate_apacheconf for more information.
This file is generated and overwritten by generate_apacheconf. Do not edit it directly. See the documentation of generate_apacheconf for more information.
This file is generated and overwritten by generate_apacheconf. Do not edit it directly. See the documentation of generate_apacheconf for more information.
This XML file contains an (exhaustive) list of all ISO language ID's and their names.
One of these files per language needed for any archive in this system. These files contain the phrases needed to render the website and email in each language, not counting names of things like metadata fields which vary between archives. It should not be edited by hand, but may be overridden. See the instructions on phrase files in the archive config documentation.
Described below.


This is a perl module which is created and edited by the eprints installer script when installing or upgrading EPrints. It's found in perl-lib/EPrints/

SystemSettings contains system specific things:

The root directory of your eprints install. Normally /opt/eprints2/
A hash of the path of various external commands such as sendmail and wget.
A hash of how eprints is to invoke various external commands. The variables with uppercase names - $(FOO) - are replaced with parameters from eprints, the lowercase names - $(sendmail) - are replaced with the strings in executables.
An array of IDs of archive formats offered in the upload document page. For each there must be an entry in archive_extension and in invocation; $(DIR) is where EPrints wants the contents of the archive and $(ARC) is the archive file.
The id of the current eprints version.
The human readable version number.
The UNIX user eprints will run as. Usually "eprints".
The UNIX group eprints will run as. Usually "eprints".
virtualhost (Since v2.1) 
If this is set, it is used for the VirtualHostName in the Apache configuration files. (By default EPrints uses "*").
disable_df (Since v2.1) 
If this is set to 1 then this disables the parts of EPrints which use the df call (disk free). If the "configure" script tested the "df" command and found that it failed then this option will initially be set to 1, otherwise 0.
enable_gdome (Since v2.2) 
If this is set to 1 then it enables the use of the XML::GDOME module, rather than XML::DOM. XML::GDOME is faster and less memory intensive but depends on a number of other libraries and modules which are not worth installing for a trial system.
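
The placeholder scheme described above for invocations, where $(sendmail)-style names come from the executables hash and $(FOO)-style names are parameters supplied by EPrints, can be illustrated with a small shell function. This is only a sketch of the idea, not how EPrints implements it; the function name, the template and the paths are all invented for the example.

```sh
# Expand $(NAME) placeholders in a command template.
# Usage: expand_invocation TEMPLATE NAME=VALUE...
expand_invocation() {
    result=$1
    shift
    for pair in "$@"; do
        name=${pair%%=*}
        value=${pair#*=}
        # "$(" is literal in a basic regular expression here. "|" is the sed
        # delimiter, so values must not contain "|", "&" or backslashes.
        result=$(printf '%s\n' "$result" | sed "s|\$($name)|$value|g")
    done
    printf '%s\n' "$result"
}

# Lowercase name = an executable path, uppercase names = per-call parameters.
expand_invocation '$(tar) xf $(ARC) -C $(DIR)' \
    tar=/bin/tar ARC=/tmp/upload.tar DIR=/tmp/unpacked
# prints: /bin/tar xf /tmp/upload.tar -C /tmp/unpacked
```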

Repository Configuration


EPrints Archive Configuration

This section describes all the configuration files in a single archive in the EPrints system.

Primary archive configuration file

Once you have created an EPrints archive, the information you entered is placed in an XML file in /opt/eprints2/archives/ with the name archiveid.xml. This file is documented later in this section.

Archive configuration directory

The bulk of the archive configuration is copied from /opt/eprints2/defaultcfg/ into the archive's own configuration directory (usually /opt/eprints2/archives/archiveid/cfg/). This directory will usually contain the following files and directories:

This file is generated by generate_apacheconf. See the documentation of generate_apacheconf for more information.
apachevhost.conf (added v2.2) 
This file is generated by generate_apacheconf. See the documentation of generate_apacheconf for more information.
The general configuration items which don't fit anywhere else are in this perl module. It is described fully later in this section of documentation. This module "requires" the other 5 perl modules. They are in separate files to make them easier to get to grips with.
This module configures the metadata fields and the default values.
This module configures how the archive exports itself via the Open Archives protocol.
This module contains subroutines which handle rendering the data into XHTML (mostly) for display as webpages.
This module handles turning UTF8 text strings into lists of index words for free text searches.
This module contains subroutines which check the metadata for problems.
This file is generated and overwritten by generate_apacheconf. Do not edit it directly. See the documentation of generate_apacheconf for more information.
One of these files for each languageid supported by this archive. These XML files describe how to turn metadata for an item into a citation (with markup). They are described fully later in this section of documentation.
One of these files for each languageid supported by this archive. These DTD files are generated automatically just before eprints loads the archive's configuration and should not be edited directly.
This XML file describes the various types of eprints, users etc. and which metadata fields are required or relevant to each. It is described fully later in this section of documentation.
One of these files for each languageid supported by this archive. These XML files contain all the phrases which are specific to this archive, such as the titles of metadata fields. They are described fully later in this section of documentation.
This XML file just contains the horizontal divider used in webpages created by the system. It is described fully later in this section of documentation.
This directory contains the data needed to create the static webpages such as the homepage, and about page. It is described fully later in this section of documentation.
This file contains the initial subjects for the system. It is described fully in the documentation for import_subjects.
One of these files for each languageid supported by this archive. These XML/XHTML files describe the outline for webpages for this system. They are described fully later in this section of documentation.

XML Config Files in EPrints

This section contains some general information about the XML archive config files: template, phrases, ruler and citations. metadata-types.xml uses XML but these comments do not apply.


These files use HTML elements (and other elements too). XHTML is a fairly new version of HTML which is backwards compatible with HTML 4 but written using XML, not SGML. This means that it is much stricter but less ambiguous and easier to parse and modify. Assuming you know HTML, the main differences are as follows:

All tags must be closed 
All elements must be closed, even ones such as <li>. Tags which do not have a close tag in HTML, like <br> or <img src="foo"> still must be closed eg. <img src="foo"></img> - this can be abbreviated as: <img src="foo" />
All tags and attributes must be lower case 
Self-explanatory.
Strict definition of what tags may appear within others 
Not actually checked by EPrints. It will let any rubbish past as long as it's valid XML. But that's no reason to be naughty.
All attributes must be wrapped in quotes 
In HTML the values of attributes do not have to be wrapped in quotes, but in XML (and therefore XHTML) they do.
All attributes must have a value 
In HTML some attributes do not require a value, for example <hr noshade>. In XHTML it is represented as <hr noshade="noshade" />.

So in summary, the HTML:

<img SRC=someurl>
<HR NOSHADE WIDTH=2>
<P>Foo bar</P>

should become in XHTML:

<img src="someurl" />
<hr noshade="noshade" width="2" />
<p>Foo bar</p>

And that's more or less it. See http://www.w3c.org/ for a complete description.

Language specific files.

Phrases, templates and citations have one instance per supported language. This allows the system to generate pages and emails in more than one language. Supporting a new language will require translating all the English config files currently shipped. If you do intend to do this (lots of work!) please get in touch with the eprints admin so that we can avoid duplicated effort.

Extra Entities

The XML files all use a DTD which defines a few extra entities. Entities are items in XML (or HTML) which start with "&" and end with ";" like &amp;. These additional entities come from the entities DTD file created by generate_entities. One DTD is created per language, although currently the only variation is the archive name.

The name of the archive in the current language.
The administrators email address.
The base URL of the system (without a trailing slash)
The base URL of the CGI directory (without a trailing slash)
The URL of the system homepage.
The URL of the user homepage.
The current EPrints version.
The XHTML of the standard divider.
Any XHTML character entity (since EPrints v2.1) 
You may now use any XHTML character entity, eg. &nbsp; &eacute; &euro;.
User configured entities 
You can generate your own entities by modifying the function which generates them in ArchiveConfig.pm

None of these entities are available in the citations file or the ruler file.

Name Spaces and XHTML

These files contain a mixture of custom tags and XHTML. To keep these distinct the XML files contain a namespace definition in the first element. The practical upshot is that all of EPrints' own tags have the prefix "ep:". The namespace information is actually ignored by the current version of the eprints system.

example of mixed tags (and entities):

<ep:phrase ref="lib/session:contact"><p>Feel free to contact 
<a href="mailto:&adminemail;">&archivename; administration</a>
with details.</p></ep:phrase>
eprints elements: phrase
xhtml elements: p, a
eprints entities: adminemail, archivename

The Primary Archive Configuration File

This XML file appears in the archives/ directory, usually /opt/eprints2/archives/. It describes the most basic details about the archive. It is generated (and modified) by configure_archive and will not normally need to be edited.

EPrints looks in this directory for XML files and attempts to load them all when starting the webserver.

This file should be chmod'd so that it can not be read by random users as it contains the database password.
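
As a rough sketch of what that might look like, the function below removes all access for "other" users while leaving the file readable by its owner and group. The mode 640 and the function name are assumptions for this example; it presumes the file is owned by the user and group EPrints runs as, and the scratch file stands in for a real file such as /opt/eprints2/archives/foo.xml.

```sh
# Make a config file readable and writable by its owner, readable by its
# group, and inaccessible to everyone else.
restrict_config() {
    chmod 640 "$1"
}

# Demonstration on a scratch file rather than a real archive config.
cfg=$(mktemp)
restrict_config "$cfg"
ls -l "$cfg"    # the mode column should start with -rw-r-----
rm -f "$cfg"
```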

The top level element is "archive" which has the attribute "id" which is the id of the archive. It should be the same as the filename. If this file is foo.xml then the id should be foo.

<archive> contains a list of XML tags enclosing some text. eg.


The following tags are expected in no special order:

The hostname of this archive.
<alias redirect="yes-or-no"> 
This is optional and may be repeated. It has the attribute "redirect" which may be set to yes or no. This controls what virtual hosts are supported and if they should redirect to the main <host>.
The ISO id of a language supported by this archive. Repeatable. One of these should also be the defaultlanguage. See below.
The port number that the server is running on. Usually 80.
The directory from the root of the server name. Usually /
The filesystem path of the rest of the archive configuration.
The path to the perl module which does the main configuration (ArchiveConfig.pm)
The name of the MySQL database. Usually the same as the archive ID.
The host on which MySQL is running. Usually localhost.
An optional MySQL port, if it's not the standard one. Should be empty if we are to use the default.
An optional MySQL socket. Should be empty if we are to use the default.
The username to use when connecting to MySQL, usually "eprints".
The password to use to connect to MySQL.
One of the supported languages. This is the default for this archive.
The email address of the archive administrator. I strongly suggest that this is an alias rather than a personal email address. If all your webpages contain "bob@footle.edu" and bill takes over from bob you would have to regenerate every page with "bill@footle.edu". Much better to set up an email alias or forward from "archive-support@footle.edu" and point it at bob (for now). Heed these words spoken from grim experience!
<archivename language="langcode"> 
The name of the archive. This has an attribute "language", the value of which is an ISO language id. There should be one of these archivename elements per supported language. eg.

   <archivename language="en">White Lemur</archivename>
   <archivename language="fr">La Archive d'Lemur Blanc</archivename>

(apologies to the french, human languages aren't my strong suit)

<securehost> (since v2.2) 
Used for experimental https support.
<securepath> (since v2.2) 
Used for experimental https support.


This module imports the other 5 perl modules. It allows lots of little tweaks to the system, which are all commented in the file.

It includes options to hide various features you may not want and to customise the browse, search and subscription functions.

Also you can customise what each type of user can and can't do, and how they authenticate their passwords.

This configuration file contains perl methods which are called when a session starts and ends, to log things, to generate the entities for the entities file and to handle security on non-public files.

Browse Views

The browse views are generated by the script "generate_views" and what that script does is configured by the "browse_views" item in the config.

It is a reference to a perl array [], each item of which is a hash {}.

The hash has 3 required properties and a number of optional ones.

id (required) 
The ID of this view. The view will be placed in a subdirectory of /view/ of this name. The ID is also used to identify the full name of this view in the phrase file: id=>"foo" would find its title in the phrase "viewname_eprint_foo".
fields (required) 
The list of the names of the fields to browse, separated by a slash "/". This should normally be a single field unless you want to merge the values of two fields. The id part of a field may be specified by appending ".id" to the fieldname.
order (required) 
A list of fields to sort by in order of priority, separated by slashes "/". A minus sign "-" prefixing the fieldname indicates reverse sorting on that field.
Should we make a page for the "unset" condition? A page for items which do not have a year set may be useful. But for other fields this may be meaningless. Set it to 1 for true.
Generate a file for every value, ending in ".include" which contains the XHTML of the citations of records and the number of records, but without wrapping the site standard template around it.
Normally the system generates a page like that described for "include" with a .html suffix and the site template. If nohtml is set to 1 then it won't.
Normally the citation used is that for the "type" of eprint. If this is set then that citation (from the citations file) will be used for all items. This allows for some clever stuff if you want to make page which can get sucked into another website.

Normally the system puts a paragraph tag around each citation, but if you use a custom citation this will not happen.

Do not include the count of how many items at the top of the page.
The system generates an index.html in /view/ with a list of all the browse views available. Setting nolink to 1 will hide this item.
Do not generate an index.html file in /view/foo/ listing all the values of the view and linking to their respective pages.
notimestamp (since v2.2) 
Do not add the timestamp at the bottom of the view page.
hideempty (since v2.2) 
Only applicable to subjects. This option will suppress subjects which do not have any records in them. This is useful on "young" archives which look very empty if you have a large subject tree and only a few records, and those clustered in 3 or 4 subjects.

The most common view is to browse by subject:

{ id=>"subject", allow_null=>0, fields=>"subjects", 
   order=>"title/authors", hideempty=>1 }

A more complex example generates a view on author and editor IDs which is not advertised but may be captured by some other software to build staff CV pages.

{ id=>"person", allow_null=>0, fields=>"authors.id/editors.id", 
   nohtml=>1, nolink=>1, noindex=>1, include=>1, 
   order=>"-year/title" }

For my example person id "wh" this will generate a webpage called /view/person/wh.include (and one for each other value of authors or editors IDs) which can be captured by an external automated system.
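
To make the meaning of an order value such as "-year/title" concrete, the sketch below applies the same ordering (year descending, then title ascending) to a few made-up records using sort(1). It only illustrates the ordering; it is not how generate_views sorts, and the function name and the pipe-separated record format are invented for the example.

```sh
# Sort "year|title" lines the way order=>"-year/title" describes:
# field 1 (year) numerically in reverse, then field 2 (title) ascending.
sort_year_desc_title() {
    LC_ALL=C sort -t '|' -k1,1nr -k2,2
}

printf '%s\n' '2001|Zebra' '2003|Apple' '2001|Aardvark' | sort_year_desc_title
# prints:
# 2003|Apple
# 2001|Aardvark
# 2001|Zebra
```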

User Privs

The user permission configuration allows you to set what each type of user can and can't do. The user home page will only show a user the options which they are allowed to use.

New types of user, and which data about themselves they can edit, are set in metadata-fields.xml.

Permissions are set by "type" of user. By default there are 3 kinds of user: "user", "editor" and "admin".

Admin can, by default, do everything.

subscription (since EPrints v2.1) 
If included then this kind of user can create subscriptions.
Reset their password via the web registration system.
Submit items into the archive.
View the archive status page.
User can edit then approve submitted items into the main archive, or delete them, or return them to sender. Also can remove items from the archive back into the edit buffer for corrections, and move records into the deleted table (delete them).
User can perform a "staff search" of user or eprint records and view ALL the metadata.
User can edit the subject tree via the online interface.
User can edit other users records.
User can change their email address via the web interface. This is safer than allowing them to edit it directly, as it ensures they can only set it to an address at which they actually receive mail (it mails them a confirmation PIN).
This allows the sinister feature which lets you log in as someone else. It still requires a password. This is useful if you want to perform admin tasks as a super user, then log-in as a normal user to deposit items.
no_edit_own_record (since v2.2) 
This suppresses the "edit my user record" option. This may be useful if you disable web-registration and import the user records from some other database.


Fields Configuration

Metadata is data about data: the information which we store to describe each record (eprint) in the system. Users also have metadata.

This module is the configuration for the metadata. This is probably the most important part of the system.

See the chapter on metadata for all the configuration options.


This section of the file contains subroutines which are called to set default values for Users, Documents and EPrints.


These functions let you set automatic fields. This allows you to make fields which are updated automatically each time the item (User/EPrints/Document) is committed to the database.

This allows you to create "compound" fields. Such fields are created by processing the values of other fields rather than being edited directly.

For example, if you wanted to make an automatic int field which contains the number of authors, you could add the following to set_eprint_automatic_fields:

# no authors at all will be undef, not [] so check first
if( $eprint->is_set( "authors" ) )
{
       my $auths = $eprint->get_value( "authors" );
       $eprint->set_value( "authcount" , scalar @{$auths} );
}
else
{
       $eprint->set_value( "authcount" , 0 );
}


This module configures how the archive exports its data via the OAI protocol.

For more information on the how and why of OAI see http://www.openarchives.org/

OAI allows a harvester to request the metadata from your archive and other archives to provide a federated search. The next time the harvester harvests your archive it only has to ask for items which have changed or been added since it last asked.

The current version of EPrints supports OAI v2.0. OAI version one is no longer supported.

The base URL for your OAI v2.0 interface will be http://archivepath/perl/oai2

If you want to use the OAI system then you need to fill in the blanks, such as policy and the OAI-id of the archive.

You may create OAI sets in a similar manner to "browse views" in ArchiveConfig.pm.

If you want to change the way that an EPrint is mapped into Dublin Core then edit the make_metadata_oai_dc function, which returns a DOM XML object.

To add a new metadata type you need to add a new mapping function and add entries to the namespaces, schemas and functions items near the top of the file.


This module contains functions which turn data into XHTML for displaying on the web.

If you want to change the way a user info page, or an eprint "abstract" page is rendered then here's the place to do it.

There are also "full" versions of these functions which display all the internal variables and things. These are the views which the editors and admin see.

The XHTML is generated using DOM (Document Object Model), but EPrints provides some functions for easily generating XHTML DOM. The only DOM method you should need to use is appendChild, which adds a child node to the element it is called on.

EPrints API functions which return XHTML objects.

Note, all text strings should be in UTF-8.


my $page = $session->make_doc_fragment();
my $h1 = $session->make_element( "h1" );
$h1->appendChild( $session->make_text( "Title" ) );
$page->appendChild( $h1 );
$page->appendChild( $session->make_element(
       "img",
       src=>"/images/cheese.gif",
       width=>128,
       height=>53 ) );

$page now contains:

<h1>Title</h1><img src="/images/cheese.gif" width="128" height="53" />

Many of the EPrints modules are fully documented. For an example try running:

% perldoc /opt/eprints2/perl_lib/EPrints/Archive.pm

The functions most useful for extracting and rendering information are documented here:

$session->make_text( $text )  
Returns a DOM object representing that text.
$session->make_doc_fragment()  
Returns a document fragment. This renders to nothing but is a container to which you can add stuff.
$session->make_element( $name, %opts )  
Makes a simple XHTML element. %opts is an optional series of attributes.

To make <h1 class="foo">...</h1> you would call:

$session->make_element( "h1", class=>"foo" );

Returns the default ruler for the archive (from ruler.xml).
$session->render_link( $uri, $target )  
Returns the XHTML element (with URI properly escaped):

<a href="uri"></a>

Which you can appendChild stuff into. If $target is specified then a target attribute is included - to make it pop up a new window.

$item->render_value( $fieldname, $showall )  
$item is either an EPrint, a User or a Document.

$fieldname is the name of the field you want to render. If $showall is 1 then ALL values are rendered in a multilang field.

$item->render_citation( $style )  
Renders the citation of the item using the citation for the item's type from the citation file.

If $style is set then it uses the citation with that id instead.

$item->render_citation_link( $style )  
This renders a citation as above, but links it to the url of the item.
This renders a simple description of the item using the default citation for this dataset eg. for eprint it uses citation type "eprint".
$session->html_phrase( $phraseid, %opts )  
Returns the item from the phrase file. If you don't care about supporting multiple languages then just use make_text instead, it's easier.

It looks first in the archive phrase file for the current language, then in the archive phrase file for English, then in the system phrase file for the current language, and finally in the system phrase file for English. The %opts are a series of DOM elements to place in the "pin" items in the phrase file.

Some other useful functions you may need

$item->get_value( $fieldname, $no_id )  
Returns the value of field $fieldname from the item. An optional second parameter may be set to 1 to return the value without the "id" part, to keep things simple.
$item->is_set( $fieldname )  
Returns true if the field is set on this object, false otherwise.
Return an array of the document objects belonging to this eprint.


You probably won't need to change this module unless you want to modify how EPrints searches for words in strings.

When a record is added to the system, EPrints uses this module to turn a string into a list of values which are indexed. By default these are words with 3 letters or more, except some predefined stop words. It also turns Latin characters with accents into their plain ASCII (no acute/grave) versions.

It then does the same with the search string and looks for these keys.


The rain in spain falls mainly on the plains.

Is turned (by default) into the keys:

rain spain fall mainly plain

Thus searching for "rain" or "plain" or "plains" or "MaiNlY" will all match this string.
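The steps described above can be sketched in a few lines of standalone Perl. This is only an illustration of the idea, not the code shipped in the module: the subroutine name extract_keys and the three-word stop list are assumptions, and the accent-folding step is left out for brevity.

```perl
use strict;
use warnings;

# Hypothetical stop-word list for illustration; the real default list is longer.
my %STOP_WORDS = map { $_ => 1 } qw( the and for );

# Turn a string into index keys: lower-cased words of three or more
# letters, minus stop words, with a single trailing "s" removed.
sub extract_keys
{
    my( $text ) = @_;

    my @keys;
    foreach my $word ( split /[^a-z]+/, lc $text )
    {
        next if length( $word ) < 3;    # too short to index
        next if $STOP_WORDS{$word};     # predefined stop word
        $word =~ s/s$//;                # "plains" becomes "plain"
        push @keys, $word;
    }
    return @keys;
}

# prints: rain spain fall mainly plain
print join( " ", extract_keys( "The rain in spain falls mainly on the plains." ) ), "\n";
```

Because the search string goes through the same routine, "plains" and "MaiNlY" reduce to the keys "plain" and "mainly" and so match the indexed record.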

You may wish to add your own "stop words". eg. If you are running an archive about badgers, a search for the word "badger" will return almost all the records.

At a more complex level you may wish to add handling for non-european character sets (I have no idea how well the default setting will work on these), or do "stemming" - removing "ed", "ing", "ies", "s" etc. from the end of words so that "land" will match "land", "landed", "landing" and "lands". (It currently removes 's').

Another suggestion is using soundex or similar techniques to match words which sound similar.

Changing the indexing on a live system will require you to regenerate the indexes using the reindex script. (If you don't then some of the search results will be wrong).


This module handles validating data entered by users. Each subroutine is described in more detail in the module itself.

Each subroutine returns a list of DOM elements, each of which describes a single problem. Any problems will prevent the user from continuing with editing until they correct them.

As with the rendering functions, if you don't care about making this work in more than one language then you can just make the DOM items by calling $session->make_text( "problem explanation" )

The eprint & document validation routines have a flag $for_archive which, if true, indicates that the item is being checked before going into the actual archive. You can use this to force an editor to enter fields which the user may leave blank.

Validation Functions

Called for all fields. Use it to check individual field values. By default it checks that URLs look OK.
Check the metadata of an eprint. Use this to test dependencies between fields. eg. if you have a requirement that field "A" OR field "B" must be set.
Validate the whole eprint. The last part of the validation of an eprint.
Validate the metadata of the document (as with eprint_meta)
Validate the whole document, files and metadata.
Validate a user record.
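As a rough sketch of what such a routine might look like, the following checks the 'field "A" OR field "B"' dependency mentioned above and applies a stricter rule when $for_archive is true. The subroutine name, the argument order and the field names field_a and field_b are assumptions made for illustration; check the module itself for the real signatures. Only is_set and make_text, both described earlier, are used.

```perl
use strict;
use warnings;

# Hypothetical sketch: the name, the argument order and the field names
# "field_a" and "field_b" are assumptions, not the shipped code.
sub validate_eprint_meta
{
    my( $eprint, $session, $for_archive ) = @_;

    my @problems = ();

    # Dependency between fields: at least one of the two must be set.
    if( !$eprint->is_set( "field_a" ) && !$eprint->is_set( "field_b" ) )
    {
        push @problems, $session->make_text(
            "Please fill in either field A or field B." );
    }

    # Stricter check, applied only when the item is about to enter the archive.
    if( $for_archive && !$eprint->is_set( "field_a" ) )
    {
        push @problems, $session->make_text(
            "Field A is required before the item can be archived." );
    }

    # An empty list means the item passed validation.
    return @problems;
}
```

Returning plain text nodes keeps the example short; a multi-language site would build each problem with html_phrase instead.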


The citations file describes how to render an item (eprint/user/whatever) into a short piece of XHTML. Each citation has a "type". There are 3 kinds of citation:

default citation 
This is a very short description of the item. Usually "the title or failing that, the id". The type id is just the name of the dataset. eg. "eprint"
type citation 
These are richer descriptions which vary between type of eprint, user or document. The type id is dataset_type eg. eprint_preprint.
other citation 
Used by custom browse views. Any name you like.

The citation file contains a list of citation elements:

<ep:citation type="..."> Each one may contain text and tags. The text may also include the names of fields in the record being rendered. These names should be between @ symbols. eg. @authors@ or @title@. These will be replaced with a rendered version of the value in that field. (if you need an actual @ symbol for some reason two @@ with nothing inside will be rendered as a single @).

Note. The @title@ style was introduced in EPrints 2.2. Before that this file used XML entities such as &title; but this caused problems and didn't solve any. Use of entities is still supported, but deprecated.

In addition you may use XHTML elements and the following elements in the eprints namespace. These elements are always removed, but they control whether their contents are kept or not. Conditional elements may be placed inside each other since v2.2.

This element is replaced with an XHTML anchor linking to the item. If this citation is being rendered without a link then it is just removed (but not the contents).
The contents of this element are only preserved if we are rendering this citation as a link. Maybe an icon which you don't want if it's not a link.
The opposite of iflink.
<ep:ifset ref="fieldname">  
The contents of this element are only preserved if the field "fieldname" has a value.
<ep:ifnotset ref="fieldname">  
The contents of this element are only preserved if the field "fieldname" does not have a value.
<ep:ifmatch name="fieldname(s)" value="searchparam">  
This is the swiss army knife of the world of conditional rendering. It is also a bit complicated, and few people will need to use it. This actually works like a single search element. The attributes are:
This is the name of one or more fields, specified as in the search fields configuration. eg. "title/abstract"
This is a value to search for. Treated like the value entered in a search field.
merge (optional) 
Can be ANY or ALL. Works like the match all? in a search form.
match (optional) 
Can be IN, EQ, or EX. In, Equal or Exact. Exact on subjects means that subject, but not any below it in the hierarchy.

For example:

@year@<ep:ifmatch name="year" value="-1949"> (approx)</ep:ifmatch>

This will render (approx) after years before 1950. Neat eh?

<ep:ifnotmatch name="fieldname(s)" value="searchparam">  
Like ifmatch but only includes the values inside if the search does not match.


This file allows you to configure the types of eprint, user, document and document security level.

When you add a new type you should add its name to the archive phrases file(s). The phraseid is "dataset_typename_typename" eg. "document_typename_pdf". You should also add a new citation to the citations file. Any fields which are not required but appear in the citation should probably be inside a <ep:ifset> so that you don't see "UNSPECIFIED" if they are not, er, specified.

The main element is "metadatatypes". This contains a list of "dataset" elements each of which has a name attribute.

The "type" elements in user and eprint "dataset"s should contain a list of "field" elements. This describes the fields which may be edited for this type and the order that they appear on the form.

You may include system fields in this list, but be careful if you do.

Multi-page metadata (2.3.0+)

You may optionally add <page name="pagename" /> elements to the field list. These break the submission process into smaller stages. The pagename is used to identify the sub-page, for purposes of validation etc. Pages only have an effect on eprint types, not user, document etc.

See the section on paged metadata.

Attributes for "field" element

name (May not be omitted) 
The name of the metadata field.
If set to "yes" then this field may not be left blank. Some system fields are always required no matter how this is set.
This field only appears on the "editor" edit eprint form, not the user one. Or, in the case of the user dataset, the staff edit-user page.

The "security" dataset

This is a handy place to define the security levels. The type with no name is special. It is the "public" security type. All other types will require a valid username and password. Whether that username is acceptable for a given document is decided by the can_user_view_document subroutine in ArchiveConfig.pm

The "document" dataset

By default eprints requires at least one of ps, pdf, ascii or html to be uploaded before an eprint is valid. You may change this list in ArchiveConfig.pm - any more complicated conditions will have to be checked in the eprint validation subroutine.


This file contains a list of XML "phrases". Everything EPrints "says" to users is stored in this file and its system-level counterpart. If you want the site to run in more than one language, you need one phrase file per language.

The phrase file is XML and contains a toplevel "phrases" element. This contains the list of phrases.

Each phrase has a "ref" attribute to identify it and contains text and optionally some XHTML tags. It may also contain eprints entities such as &archivename; and also some phrases should contain "pin" elements, described below.

The phrases in the archive phrase file are specific to that archive, the system phrase file contains non-archive specific phrases. The id's of most of the phrases in the archive phrases are generated from the id's of the fields, datasets, types etc.

The archive phrase file contains: names of dataset types, names of metadata fields, help on entering each metadata field, the names of options in "set" fields, the description of different search ordering options, names of browse views, phrases used in the render and validation routines, mail which EPrints sends out and phrases which override those in the system file.


Some phrases need some "pin" elements to show eprints where to insert values. Usually pins don't contain any elements but occasionally they do when they represent what to place a link around.

Overriding System Phrases

If you don't like some of the phrases in the main system phrases file you can override them by creating a phrase with the same "ref" in the archive file.

Don't edit the system file. If you upgrade EPrints to a newer version it will be overwritten.


EPrints sends out emails when a user registers/changes their password, when a user changes their email, when a deposited item is rejected/deleted by an editor and when the system is low on resources. These mails can be customised in the phrase file.

Make sure you wrap your text in paragraph <p> tags. EPrints will automatically word wrap these in the email.

<hr /> elements in a mail are turned into a line of dashes.

When eprints sends a mail it will send it as plain ASCII text, unless it contains latin-1 elements, in which case it will be latin-1 encoded. If it contains unicode characters not in the latin-1 charset then it will be utf-8 encoded.


This file configures the horizontal divider which eprints uses, which is inserted in place of &ruler;

If you have no great dislike of <hr /> horizontal rulers then you can leave it alone.

You can't use entities like &frontpage; in ruler.

The static/ directory

This directory contains the static pages for the site - the frontpage, the help pages, images, the stylesheet etc.

static/ contains one directory per language, eg. en. Plus a general directory which contains files which don't need translating like images and the stylesheet.

When you run the generate_static command it copies the files for each language, and the general dir, into the static site for that language.

See the generate_static documentation for more details.


This file is not used by the core eprints system. It is used by import_subjects to set up the initial subjects. For more information see the instructions for import_subjects.


This file is the shell of every page in the system. It is more or less a normal XHTML page but you can use the eprints &foo; entities in it and it should contain "pin" elements like a phrase. The pins it should contain are:

<ep:pin ref="title" />  
This is where to put the title of the page. It can be used more than once - in the title in the page header and somewhere in the body. If placing it in the title in the head of the page you must use the additional attribute textonly="yes" which only works here. It removes images from the title (which can happen if using the "Latex" mode).
<ep:pin ref="head" />  
This goes somewhere in the head of the page. It shows eprints where to insert the "meta" and "link" elements.
<ep:pin ref="pagetop" />  
This goes at the top of the body. It is sometimes used as a "target".
<ep:pin ref="page" />  
Where to place the bulk of the content of the page.


Manual Sections

Metadata Field Types

There are many different types of metadata field. The type controls how a field is rendered, indexed, searched and so forth. A field always has a type and a name property, and usually has several more. Most properties are documented on this page, but some properties are only available to certain types of field, and they are listed on the page for that field.

Some of these subclasses provide very rich features, others very simple. For example the url field works just like the text field except that it's only valid if it looks like a url and when rendered it is a hyper-link.

A metadata field describes one field of data in one type of Data Object. For example the "title" field of an EPrint Object or the "email" field in a User Object.

Every Data Object has system fields (which are set by the system, and not alterable), but the User Object and EPrint Object have additional fields which are configured on a per-repository basis.

These can be customised in the user_fields.pl and eprint_fields.pl files. Note that changing these files does not automatically modify the underlying database so should (generally) only be done before the database is created. Some metadata properties do not affect the database, and are marked as such.

If you add or remove fields, or modify a property which affects the database then you'll need to alter the database to match. In 3.0 this must be done by hand, but we have plans to build a tool to do this for you.


This is the list of useful field types. Under it are listed the other field types, which are included only for completeness and are not intended to be used as part of the configuration.

Some field types inherit the properties of another, and then modify them in some way. For example the namedset field works like a set field except that it gets its options from a namedsets file not from the options=>[] in the field properties.

  • Basic metadata field - this is abstract, fields must be one of the types listed below...
    • Boolean - TRUE or FALSE (or can be unset, of course).
    • Compound - virtual field, joins together several "multiple" fields, e.g. author_name and author_email.
      • Dataobjref - references another data object.
      • Multilang - allows language variants of a field, e.g. titles in French, German and/or English.
      • Relation - stores a typed relationship with something represented by a URI.
    • Date - stores a date
      • Time - stores a date and time
    • Float - stores a floating-point value
      • Decimal - stores a decimal number. Specifying the length of number before and after the decimal point.
    • Id - like basic text field but search only finds exact matches
      • Id (case-insensitive) - like Id field but search finds exact matches ignoring case (use for usernames, email addresses, etc.)
      • Keywords - stores as longtext but searchable as exact individual keyword phrases
      • Recaptcha - virtual field to display a reCAPTCHA to prevent spamming of public input forms.
      • Text - the basic text field. Maximum 255 bytes. nb. UTF-8 means some chars take more than one byte.
        • Longtext - like text but allows much longer text (65,000 bytes).
        • Pagerange - a range of pages from one number to another.
        • Secret - used to store passwords and other secrets.
        • Set - a limited set of options
          • Namedset - like a normal set, but takes its options from a namedset configuration file.
          • Subject - possible values are taken from the Subject hierarchy.
          • Base64 - stores Base64 encoded data.
            • Image - stores image encoded in Base64 data.
      • Url - stores a URL.
      • Uuid - stores a UUID.
    • Int - an integer value
      • Bigint - a large integer value (can be greater than 2,147,483,647 or less than -2,147,483,648).
      • Counter - an auto-incrementing integer value.
      • Itemref - a reference to another Data Object (e.g. a user or other eprint)
      • Pagerange - a pagerange, e.g. 122-130
    • Multipart - Stores multiple sub-fields like a person's name.
      • Name - Stores a person's name broken up into logical parts.
    • Subobject - Stores another data object under a parent data object.

Internal-use and Deprecated Field Types


Note that true/false properties use 1 and 0 to indicate their setting.

Some properties can be temporarily set or overridden by the Workflow Format and Citation Format files.

Core Properties

Name Default Value Required Description Notes
name n/a YES This is the internal name of the field. It should only contain alphanumeric characters and underscores. It will be used to identify this field in scripts, other configuration files, in the database, and in the XML export/import system, etc. This property is not required when defining sub-fields of Compound fields where sub_name should be used. This property affects the database structure. It must be unique within the Data Object (so the EPrint Object cannot have two fields called email, but the EPrint Object and User Object can each have a field with the same name).
type n/a YES This sets the type of the metafield, which in turn affects what other properties it may have. This property affects the database structure. The value must be one of the metafield types listed above.
multiple 0 NO This indicates if this field is a single value or a list of values. E.g. title is only a single Longtext field but creators is a multiple Compound field. This property affects the database structure. In the database a non-multiple field is stored in one (or more) columns in the main object table, but a multiple field gets its own table.
readonly 0 NO Whether to prevent this field from being edited in the workflow. This is useful if you want to display the pre-generated value(s) for this field for reference whilst other fields are being edited.
sql_index 1 NO When the database is created this property indicates that an SQL index should be created to speed searching. This property affects the database structure. Different field types override the default value with the sensible option for that type of field. It is not worth putting a SQL index on a field that is only ever searched for words in it (like title or abstract) but it is worth indexing fields whose values are explicitly searched for, or where ranges are searched (e.g. Date fields, Set fields etc.). It is unlikely you will need to set this by hand. You could change it after the database has been created but this will not update the database nor have any other effect.
sub_name undef YES This is a special property which is required instead of the name property for the sub-fields inside Compound fields. This property affects the database structure. The actual name of these fields is then forced to be parent field name + '_' + sub_name. E.g. the Compound field creators has a sub-field with sub_name => 'name'. In this case the actual name of the name field in the system, database etc. is creators_name.
virtual 0 NO Whether this field calculates its value rather than storing it in the database. Compound fields are virtual fields, whereas their sub-fields are not, as they store values in the database. Other types of virtual field will require a render_value to be specified, as with no value stored the default render method will have nothing to display.
volatile 0 NO Whether the field is liable to change frequently. Setting volatile => 1 will prevent new revisions being created and avoid other post commit events from being triggered, such as re-indexing.
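
To show how the core properties fit together, here is a hedged sketch of two field definitions as they might appear in eprint_fields.pl. It assumes the usual layout in which fields are listed in $c->{fields}->{eprint} and $c is the configuration hash supplied by EPrints; the field names and the fields key that holds the sub-field list are also assumptions for illustration. Treat it as an illustrative fragment rather than a drop-in configuration.

```perl
# Hypothetical entries for eprint_fields.pl. $c is assumed to be the
# configuration hash that EPrints supplies to its configuration files.
push @{ $c->{fields}->{eprint} },
{
    # A single-valued integer field, stored as a column in the main table.
    name => 'authcount',
    type => 'int',
    sql_index => 1,
},
{
    # A multiple Compound field: virtual itself, while its sub-fields are
    # stored as contributors_demo_name and contributors_demo_email.
    name => 'contributors_demo',
    type => 'compound',
    multiple => 1,
    fields => [
        { sub_name => 'name',  type => 'name' },
        { sub_name => 'email', type => 'text' },
    ],
};
```

Both definitions set properties that affect the database structure, so they are best added before the database is created.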

Rendering Properties

These properties affect how values of the metadata in this field are rendered.

Certain of these properties can be turned on temporarily by the Citation Format files - render_magicstop for example.

Name Default Value Required Description Notes
as_list undef NO Whether to display a collection of sub-field values as a table row or a separate list in the input form This is only applicable for Compound fields that have multiple => 1. It is useful where the length of the table row would exceed the width of a typical user's window.
browse_link undef NO This is the name of a view which values of this field should be linked to. E.g. if there was a Browse by Publishers view configured named pubs, then adding browse_link => 'pubs' to the publisher field would cause it to be linked into the browse view page for the named publisher whenever it is rendered.
render_custom undef NO Whether to use a pre-defined way of rendering the value for this field. E.g. for Name fields by default the name will link to the creators browse view for that name. This property can be re-used within bespoke render functions to specify whether some custom way of rendering this field's value (e.g. with a link) should be used.
render_dont_link 0 NO Whether rendered field is not encapsulated in a hyperlink. Currently only affects Url fields and Email fields.
render_dynamic 0 NO Whether the rendering of this field can use JavaScript to make it dynamic. limit_names_shown.pl uses this property to determine if the list of hidden creators/editors can be expanded to show all creators/editors.
render_limit undef NO How many values for this field should be displayed. limit_names_shown.pl uses this property to determine how many creators/editors to display. If undef just render all values.
render_magicstop 0 NO Whether to render a full stop at the end of this field, unless the last character is a dot, question mark or exclamation mark. This helps avoid the ugly World without Cheese?. effect you get when titles end in ? or !.
render_noreturn 0 NO Whether CR (Carriage Return) and LF (Line Feed) characters are turned into normal spaces.
render_quiet 0 NO Whether to prevent a big ugly UNSPECIFIED being rendered if field is unset. E.g. setting render_quiet => 1 on a field means it just gets rendered as nothing if it is unset.
render_single_value undef NO The value of this property is the name of a function to call to render individual values from this field. For a multiple field this is called once per value in the list of values. The function should take the following parameters: ($session, $field, $value, $object). It should return a XHTML DOM object of the rendered value.
render_value undef NO The value of this property is the name of a function to call to render the field as a whole. As with render_single_value, but this gets passed the entire list of values (an array reference) if it is a multiple field. Parameters passed are: ( $session, $self, $value, $all_langs, $no_link, $object ). $all_langs indicates that all language variants should be shown and is only really useful for Multilang fields. $no_link being true is a request to place no hyperlinks in the resulting HTML. The function should return an XHTML DOM object of the rendered value.
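
A minimal sketch of a render_single_value function is shown below. The name render_upper_case and the CSS class are hypothetical; the parameter order follows the description above, and the only API calls used are make_element, make_text and appendChild.

```perl
use strict;
use warnings;

# Hypothetical renderer: wraps a single value in a <span> and upper-cases it.
# Called once per value when the field is a multiple field.
sub render_upper_case
{
    my( $session, $field, $value, $object ) = @_;

    my $span = $session->make_element( "span", class=>"demo_upper" );
    $span->appendChild( $session->make_text( uc $value ) );

    # The return value must be an XHTML DOM object.
    return $span;
}
```

A field would then opt in by setting render_single_value => 'render_upper_case' in its definition.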

Input and Validation Properties

Name Default Value Required Description Notes
default_value undef NO The default value to set for this field. This is mainly used for system fields. For custom fields it is better to use eprint_fields_default.pl or similar.
expanded_subjects [] NO Subjects to show un-collapsed in the subject tree field in the workflow. This is only applicable for Subject fields. All the fields listed will have their paths in the subject tree expanded so they can be seen, making them easier to find.
false_first 0 NO Display the false option before the true option in the input form. This is only applicable to Boolean fields. By default true option is always displayed before the false option.
fromform undef NO The inverse of toform. This takes the value from the form and converts it into the value that will be stored in the database. This function is passed the parameters: $value, $session, $object, $basename where $value is the value entered on the HTML form, and the return value is the value to be stored in the database. This function is not called when editing the eprint is cancelled.
get_item undef NO A bespoke function for how to lookup the Data object based on the stored value. Only applicable to Itemref fields.
help_xhtml undef NO An XHTML DOM object to use for the help text for this field. This can only be set via the Workflow Format configuration not via the metadata field directly. This is so that the workflow can conditionally change the help on a field. If you need to change the help text based on the eprint type, then you can just create a bespoke phrase with the format "eprint_fieldhelp_" + fieldname + "." + eprint_type (e.g. eprint_fieldhelp_id_number.article).
input_add_boxes 2 NO The number of rows to add when clicking the More input rows button for a field that sets multiple => 1. The default value for this property is taken from cfg.d/field_property_defaults.pl.
input_boxes 3 NO The number of input rows to initially show for a field that sets multiple => 1. The default value for this property is taken from cfg.d/field_property_defaults.pl.
input_cols 60 NO The number of columns in an HTML form field. The default value for this property is taken from cfg.d/field_property_defaults.pl. This in combination with the maxlength property determines the value for size attribute for <input> HTML fields or the cols attribute for <textarea> HTML fields (used by Longtext fields).
input_lookup_params undef NO Additional parameters to pass to the input_lookup_url. E.g. an indication of which autocomplete file to use.
input_lookup_url undef NO The URL to use for autocompletion. This is generally set using the workflow configuration rather than directly in the field configuration. The URL must be on the same server hostname as the repository.
input_ordered 1 NO Whether the ordering of values needs to be captured. In some multiple => 1 fields, such as creators, the order of the values is important and by default numbers are shown to the left of input rows and to the right are move up and move down arrows. However, with some multiple => 1 fields the order is not important, in which case you can set this to 0 to stop the arrows and numbers being shown.
input_rows 10 NO The number of rows in an HTML form field. The default value for this property is taken from cfg.d/field_property_defaults.pl. This property determines the value for size attribute for <select> HTML fields (used by Set fields) or the rows attribute for <textarea> HTML fields (used by Longtext fields).
maxlength 255 NO The maximum allowed length in characters for a value. This can be a very simple validation check. Also, it may confuse users to be allowed to type in 255 characters in a field intended for something like a postcode/zipcode.
maxwords undef NO The maximum number of words that should be entered for this field. This field is only applicable to Longtext_counter fields. It does not restrict the number of words, it just displays this limit next to a dynamic counter of the number of words already entered.
render_input undef NO The name of a function which will render the input for this field. This can be difficult to use as it must return the same CGI parameters as the default input form would have. It is easiest on simple fields. The subroutine is passed the following parameters: ( $field, $session, $current_value, $dataset, $staff, $hidden_fields, $object, $basename ). It should return the XHTML DOM object of the chunk of HTML form.
required 0 NO Whether the field must have a value set. If this is set to 1 then the field is always marked as required, no matter what the Workflow Format configuration says.
separator undef NO What character to use to separate elements of the value for the field for purposes of search. Only used by default for Keywords fields.
show_help undef NO How to display the help text for this field in the input form. Can be one of three values: always, never or toggle. Toggle (allow to expand or collapse) is used if not explicitly defined. This can only be overridden in Workflow Format configuration.
title_xhtml undef NO An XHTML DOM object to use for the title for this field. This can only be set via the Workflow Format configuration not via the metadata field directly. This is so that the workflow can conditionally change the title of a field. If you need to change the title based on the eprint type, then you can just create a bespoke phrase with the format "eprint_fieldname_" + fieldname + "." + eprint_type (e.g. eprint_fieldname_id_number.article).
toform undef NO This function is allowed to modify the current value which appears in the form. E.g. if your database stores userids in a field, but you want to allow people to edit them as usernames, then this function can be used to take the current value (a userid) and return the associated username. This value is what appears in the field in the search form. It is passed: $value, $session, $object, $basename and returns the user-facing version of $value.

Ordering, Indexing and Searching

Name Default Value Required Description Notes
make_single_value_orderkey undef NO The orderkey function (potentially language specific) used to order by this field. This property allows you to define a function or refer to a predefined function to override EPrints' default orderkey generation. This property is passed each value from multiple fields, in turn. It is passed: $field, $value, $dataset and returns an ordervalue string.
make_value_orderkey undef NO Like make_single_value_orderkey but this is passed the array reference for a multiple field rather than just single values. It should return the orderkey string for the entire value. It is passed: $field, $value, $session, $langid, $dataset.
match EQ NO How to match the value(s) of this field against search terms. This property can be EQ, EX, IN or SET. The default, EQ, treats the search term as a single string and matches only if the whole search term matches the field value (or one of its values if multiple => 1). This can be modified for the field in the search form configuration.
merge ALL NO Whether this field's value(s) must match any or all of the search terms. This property can be ALL or ANY. The default, ALL, means all search terms have to match the value(s) in this field. This can be modified for the field in the search form configuration. For certain values of match this can also be changed by the user in the search form itself.
text_index 0 NO Whether the indexer considers this field for full-text indexing. Some types of metadata field have a default of 1, e.g. Text fields and Longtext fields.
search_cols 40 NO How many characters (columns) wide the input field for searching this field is. The default value for this property is taken from cfg.d/field_property_defaults.pl. If one search field searches more than one field, then the properties from the first field listed are used.

Other Properties

This may be applicable to text
Name Default Value Required Description Notes
allow_null 1 NO Whether the value(s) stored in the database when no input is entered should be NULL or an appropriate default value based on its database field type. This should generally never be set to 0 and certainly should not be changed to 0 after the field has been added to the database. You are much better off configuring a default value using eprint_fields_default.pl or similar.
can_clone 1 NO Whether the value(s) for this field should be cloned if a new record is created. This property is mostly used by system fields such as dir or datestamp. It is applicable when an eprint is cloned using the Use as template or New version action buttons.
export_as_xml 1 NO Whether the field value(s) should be exported in an XML export. This is handy to suppress either confidential or confusing fields, like the fileinfo system field.
import 1 NO Whether new data objects created from an import can set values for the field. E.g. eprintid, dir are determined when the eprint record is created. Imported metadata would not choose an appropriate value for such fields.
replace_core 0 NO Whether the field configuration should replace the existing core (system) field with the same name. This is useful for particularly bespoke requirements. It should be used with great care, as system fields are usually hard-coded because they should not be changed.
show_in_fieldlist 1 NO Whether to allow this field to appear in fields lists. If set to 0 will prevent this field appearing in Fields field lists. This is primarily to allow you to remove it from the list of fields in the user configuration which are used to control which fields appear as columns in the Items and Review screens.
show_in_html 1 NO Whether this field is shown in the Details tab of the eprint control page. Setting this to 0 is mostly used to hide confusing internal system fields like dir.

Internal Properties

These are set by the system. Editing them by hand will do strange things.

Name Default Value Required Description Notes
confid n/a YES The ID of the dataset to which this field belongs. The value for this property is automatically set when the field is loaded. It is used to work out what phrase ids etc. it uses.
join_path undef NO How to join a field that references a different Data Object. This should never be defined within a field's configuration. This is built by search to support building a database query to perform a user's search.
parent undef NO A reference to the actual parent Compound field object. This is set automatically as a reference to the parent field object.
parent_name undef NO The name of the parent Compound field to which a sub-field belongs. This is set automatically to be the name of the parent field.
provenance undef NO Where this field configuration was generated. Typically any field configuration either defined within a Data object or in a configuration file (in a cfg.d directory) will leave this as undef by not specifying this property. However, fields created using Manage Metadata Fields will set this to user so it is clear from where this field was created.

Deprecated Properties

Do not use these!

Name Default Value Required Description Notes
input_advice_below undef NO Help text to put directly below the form input field. Deprecated. Defined but no longer functional.
input_advice_right undef NO Help text to put directly to the right of the form input field. Deprecated. Defined but no longer functional.
input_assist undef NO Provides input assistance. Deprecated. Defined but no longer functional.
requiredlangs [] NO The natural languages that should be used in the value(s) for this field. Deprecated. Defined but no longer functional.
sql_langid undef NO The language ID for SQL. Deprecated. Defined but no longer functional.
sql_sorted 0 NO Whether SQL should be sorted. Deprecated. Defined but no longer functional.

Required Phrases

These are phrases which you need to define in the local repository phrases file to control how this field renders. Some types of field (eg. set fields) require further phrases in addition to the ones listed below.

The actual name of the field, as it will appear to users, is stored in

datasetid + "_fieldname_" + fieldname

The default help to display, when the field is being input, is stored in

datasetid + "_fieldhelp_" + fieldname

For example:

   <epp:phrase id="eprint_fieldname_abstract">Abstract</epp:phrase>
   <epp:phrase id="eprint_fieldhelp_abstract">A summary of the item's content. 
      If the item has a formal abstract then that is what should be entered 
      here. No complicated text formatting is possible.</epp:phrase>


Most fields have a representation in the SQL database using one or more columns. The sub-pages for each field type give the details.


When you request (or set) a value of a metadata field, it is usually handled as a perl scalar (which is a string or number).

ALL values passed around in the API should be encoded in utf-8 or BAD THINGS may happen.

For example,

$eprint->set_value( "title", "For Us, The Living" );

Sets the title to the given string.

my $foo = $eprint->get_value( "title" );

Sets $foo to the string of the title, eg. "For Us, The Living".

Multiple Fields

If a field is set to multiple, then instead of a single value, a reference to an array of values is used. Eg.

$eprint->set_value( "corp_creators", [ "Jims Research", "Jones Research" ] );

Other Exceptions

See the specific page for the full details.


Example field definitions that can be copied into a configuration file and edited as appropriate.



Multi Page Metadata Input (v2.3.0+)

If you want to split the metadata input into more than one page, you can, by adding <page name="foo" /> elements in between <field> elements in metadata-types.xml.

The "name" attribute is used so that EPrints knows which page it's currently on. It can also be used to define a custom title for a page of fields, and to specify validation requirements for that page.

Metafield input page name

Eg. The title of a metadata input page is taken from the phrase "metapage_title_pagename". It may have any of the following pins:

The type of the current submission. Article, Book, or whatever.
The ID number of the current submission.
The short description of the item. Usually the title.

Per-page Validation

The simple validation will be checked for each field on the sub page. This means that an invalid URL will raise a problem and not let the submitter continue. However, if you have a more complex validation issue, such as an exclusion or a co-dependency, you will need to edit the ArchiveValidateConfig.pm config file, and edit this subroutine:

sub validate_eprint_meta_page
{
       my( $eprint, $session, $page, $for_archive ) = @_;

       # Collect one XHTML object per problem found on this page.
       my @problems = ();

       return @problems;
}

The options are as for validate_eprint_meta except that $page is the sub-page to validate. @problems should be an array of XHTML objects describing any problems with the data submitted for that page.

Submission Customisation XX

Filters XX

Searches XX


Latest Tool XX

Metadata Field Render Options (v2.3.0)

Render options are settings for a metadata field which control how it is rendered (but nothing else). Some render options are only meaningful for certain types.

Setting in Metadata Fields Configuration

Render options can be specified as properties of a metadata field in ArchiveMetadataFieldsConfig.pm, in which case they apply to that field (unless overridden). In this case they are a hash reference, for example:

{ name => "creators", type => "name", render_opts=>{ order=>"gf" } },

This sets the "order" render option of the creators field to be "gf".

Setting in views and citations

Render options can also be specified in views and citations, if you want them to apply only in the given view or citation. For example, in citations:


Magicstop is a boolean option so this is the same as saying:


In views you can use


To make a view that browses by the values of a date field as if it were a "year" field.

Available options

Boolean options with no value default to true (1).

Boolean. Applies to text and longtext fields. If true then render the value with a full stop on the end unless the value already ends with "." "!" or "?". Handy for getting citations right.
Boolean. Applies to text and longtext fields. Turns all Carriage Return and Line Feed characters into whitespace. Handy when you have authors entering titles with linebreaks in which should only be displayed under some circumstances.
"gf" or "fg". Applies to name fields. Override how this name field will be rendered. Either "given-name family-name" or "family-name, given-name".
Boolean. If true and the value is not set, don't print the ugly "UNSPECIFIED", just print an empty string.
"day", "month" or "year". Default is "day". Applies to date fields only. Resolution at which to deal with the dates. @foo;res=year@ will always render just the year part of the "foo" field.

Trouble Shooting


This section covers some things which can go wrong and why. If you have a suggestion for this section, let us know!

It will grow as people suggest new problems and solutions. Check the http://www.eprints.org/ website for the latest version.

Installation of EPrints and Required Software

Apache Crashes with a segmentation fault

Possible cause: apache linked against "expat" library. If you did not install apache from source then it is possible it was linked against the "expat" library. The problem arises because it is also linked against mod_perl, and when we use the XML::Parser module, that is also linked against expat. Two copies of expat in one apache make it seg-fault.

(Under SuSE Linux) Apache has problems compiling the mod rewrite module

With an error something like:

In file included from mod_rewrite.c:93:
mod_rewrite.h:133: ndbm.h: No such file or directory

Possible cause: Missing the ndbm library which is required (for some reason).

Solution: It comes as part of gdbm which is free. If working from a package you need gdbm-devel to get the header files (.h files).

(under debian sarge with apache2) Apache::const can't be located correctly when executing ./configure.

Solution: execute

export PERLLIB=/usr/lib/perl5/Apache2/

in a sh environment before ./configure.

Setting Up and Configuring a New Archive

System gives a "500 Internal Error" when viewing advanced search or submitting a document

Possible cause: No Subjects, Bug in code.

Solution: Run generate_subjects

If this fails: Look at your apache error log for clues. If reporting a bug, include the errors from the apache error log (often, but not always, found at /usr/local/apache/logs/error_log).


See Common MySQL Problems with EPrints.


Solution: Build apache following the detailed instructions in the "required software" section of the documentation.

Changes to the configuration didn't appear on the website

Possible cause: Several.

Solution: Rebuild everything by re-running (for the archive in question) generate_static, generate_views, generate_apacheconf then stop and start apache. generate_abstracts can take a long time, so don't run it unless you want to update the abstracts themselves.

Browse View page gives a "404 not found" error or fails to update.

Possible Cause: You didn't run the script which makes them!

Solution: Run generate_views, and ideally set it up to run automatically: see "Browse Views" in the installation section.

Apache takes a really long time to start (over a minute) and so do the command line scripts.

Possible Cause: EPrints loads several XML files at start up, and for some reason this requires a DNS lookup. If DNS lookup is unavailable then it has to time out.

Solution: Make sure that the machine can perform DNS look-ups.

The same page is repeatedly returned when submitting forms under Apache 2/mod_perl 2.0.0RC4

Ensure you have an up to date version of CGI (3.08+).

A Note on SELinux

Security-Enhanced Linux (SELinux) adds an additional security layer above that of Unix's U/G/O. By default RedHat installations prevent Apache from accessing files outside of /var/www/html and /tmp (and user's home directories?).

If you run your EPrints Apache server as user eprints this isn't an issue, however if you run Apache as apache you will need to run:

chcon -R -t httpd_sys_content_t /opt/eprints2/

To allow the apache process to access the eprints files (in addition to any Unix permission changes necessary).

If you don't do this, you will not be able to start Apache after you have modified it to include the eprints Apache configuration files; it will give you error messages saying that it was unable to create certain files.

(Ref. http://www.cavebear.com/cbblog-archives/000148.html)



Why Backup?

It is almost certain that you will be storing valuable information in your EPrints server. Even assuming that the EPrints code is 100% bug free and that you will never delete 8000 records when you run the wrong script at 3am, you still need to back up! Drives and fans break. Computers get stolen. Server rooms get flooded (that happened to us!). Buildings burn down (we lost an EPrints server that way).

What to Backup

You need to backup two things.

The /opt/eprints3/ directory (or whatever you called it). Not all the subdirectories have to be backed up, but it is much easier to backup the whole thing. Make sure that you back up any (symbolic) linked directory too.

Each MySQL database which your archives use. See the MySQL manual for more information on backing up MySQL databases. The mysqldump command will dump the whole of a database as a big list of SQL commands to re-create it.
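
As a rough sketch of the two items above, the following shell functions print (and optionally run) one mysqldump command and one tar command per backup. This script is not shipped with EPrints. The database name "myarchive", the backup directory and the function names are illustrative assumptions, and mysqldump is assumed to find its credentials in ~/.my.cnf. The tar -h flag follows symbolic links, so linked directories are included in the copy.

```shell
#!/bin/sh
# Hypothetical backup helper; not part of EPrints.
# Assumptions: EPrints lives in /opt/eprints3, the archive's MySQL database
# is called "myarchive", paths contain no spaces, and mysqldump can read
# its credentials from ~/.my.cnf.

EPRINTS_DIR="${EPRINTS_DIR:-/opt/eprints3}"
BACKUP_DIR="${BACKUP_DIR:-/var/backups/eprints}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = also run them

# Print the two backup commands for one database and one date stamp.
backup_commands() {
    db="$1"
    stamp="$2"
    echo "mysqldump $db | gzip > $BACKUP_DIR/$db-$stamp.sql.gz"
    # -h makes tar follow symbolic links instead of storing the link itself.
    echo "tar -czhf $BACKUP_DIR/eprints-$stamp.tar.gz $EPRINTS_DIR"
}

# Print each command, and run it unless DRY_RUN is 1.
run_backup() {
    backup_commands "$1" "$2" | while IFS= read -r cmd; do
        echo "$cmd"
        if [ "$DRY_RUN" != "1" ]; then
            sh -c "$cmd" || exit 1
        fi
    done
}

run_backup myarchive "$(date +%Y-%m-%d)"
```

With the default DRY_RUN=1 the script only prints the commands so they can be reviewed; setting DRY_RUN=0 runs them. Call run_backup once per archive database.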

Best Practice

We strongly recommend that you:

  • Regularly backup your EPrints archive and database.
  • Keep multiple sets of backups following the rule '3 - 2 - 1', i.e.
    • keep 3 sets (1 original + 2 copies)
    • on at least 2 media,
    • but not in 1 place!
  • Keep a recent backup physically separate from the archive - either in another room or ideally at another site (e.g. take it home).
  • Regularly check that you can actually restore from your backup. It's not uncommon for people to produce a daily backup for years without checking it. When they come to need it, they discover that something has gone wrong and the backup is useless.
  • Assume that you will be restoring to different hardware - the tape drive may be stolen or melted too, and you'll be unable to get one just the same because they stopped making them! Check that your backups work on hardware other than that used to create them.
  • Decide who is responsible for backups. Their responsibilities should include making sure that the above policies are implemented even if they are ill or unavailable and making sure that someone else knows how to take over making and restoring the backups if they leave or are hit by a bus.

If you can't do all of these, which is admittedly a lot of extra work, then do as many as you can.

Fortune favours the backed-up. It always seems to be the un-backed-up systems that have disk crashes. Life's like that ...
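
One small part of checking a backup can be automated: confirming that each backup file is at least readable. The sketch below assumes gzip-compressed SQL dumps and tarballs, and the function name is made up for illustration. Passing these checks only shows that the files are intact; it does not replace a real test restore onto different hardware.

```shell
#!/bin/sh
# Hypothetical sanity check for backup files; not part of EPrints.
# It only proves that the files can be read, not that a full restore works.

# Succeeds if the file is non-empty, the gzip stream is intact and,
# for tarballs, the archive listing can be read to the end.
check_backup_file() {
    file="$1"
    [ -s "$file" ] || { echo "MISSING OR EMPTY: $file"; return 1; }
    gzip -t "$file" 2>/dev/null || { echo "CORRUPT: $file"; return 1; }
    case "$file" in
        *.tar.gz)
            tar -tzf "$file" >/dev/null 2>&1 || { echo "BAD TAR: $file"; return 1; }
            ;;
    esac
    echo "OK: $file"
}

# Check every backup file named on the command line.
for f in "$@"; do
    check_backup_file "$f"
done
```

Saved under a name of your choosing, it could be run from cron against the newest backup files, with anything other than an "OK:" line treated as a reason to investigate.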

Contact Information


Bug Report Policy

We use GitHub to record bugs and issues for EPrints. You can search there or the eprints-tech mailing list for existing bugs and possible solutions.

If you identify a new bug or "issue" (issues are not bugs, but are things which could be clearer or better) please post a message to the eprints-tech mailing list - include all the information you can: what version of eprints, operating system etc.

If you think the bug has security implications (i.e. it shouldn't be made public) please email support@eprints.org.

eprints-tech Mailing List

eprints-tech is the mailing list for technical queries or feedback. It can also have general queries, but most traffic is of a technical nature.

To subscribe send an email with a blank message body to sympa@ecs.soton.ac.uk with the subject line:

SUBSCRIBE eprints-tech your name

You do not have to provide your name, so SUBSCRIBE eprints-tech should be sufficient. You can provide a single name or multiple names but it is preferable to use the format of given name followed by family name, e.g. Joe Bloggs.

For more options to manage your subscription to the eprints-tech list click here.

To view the archives of the list go to https://www.eprints.org/eptech/

August 2023 update: Currently these archives are not being updated with the latest posts to the list. We are working on sorting this out and hopefully providing a new interface to the list archive to make it easier to search.


The following twitter accounts and hashtags are currently used:

EPrints User Groups

For general or language dependent EPrints discussion



Upgrade between same series versions of EPrints

If you are upgrading between versions in the same series of EPrints (e.g. 3.3.15 to 3.3.16 or 3.4.2 to 3.4.3) then the upgrade process is somewhat more straightforward than upgrading between EPrints series (e.g. 3.3.16 to 3.4.3). This page was originally written for upgrading between 3.3 series versions; Upgrading between EPrints 3.4 versions provides additional information specific to the EPrints 3.4 series.

From a Linux Package Manager

If you have installed from the Deb or RPM package, then you can just upgrade this. However, it is recommended that you take a backup before doing this. Also, if this is a production server, it is recommended that you do this during an extended period of scheduled downtime and that you have previously tested the upgrade on a pre-production server.

Once the package upgrade has completed, make sure to run both epadmin upgrade and epadmin update on all archives. The package manager may only tell you that you need to run epadmin upgrade, but epadmin update is also needed to ensure that fields newly added to data objects are incorporated.

/opt/eprints3/bin/epadmin upgrade ARCHIVEID
/opt/eprints3/bin/epadmin update ARCHIVEID
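
Because both commands have to be run for every archive, a small loop can save typing. This sketch is not part of EPrints: it assumes that every directory under /opt/eprints3/archives is one archive named after its ARCHIVEID, and the function names are made up for illustration. It only prints the commands, so they can be reviewed before being run as the eprints user.

```shell
#!/bin/sh
# Hypothetical helper; not part of EPrints.
# Assumption: every directory under $EPRINTS_DIR/archives is one archive,
# named after its ARCHIVEID.

EPRINTS_DIR="${EPRINTS_DIR:-/opt/eprints3}"

# List archive IDs, one per line.
list_archives() {
    for dir in "$EPRINTS_DIR"/archives/*/; do
        [ -d "$dir" ] || continue
        basename "$dir"
    done
}

# Print the upgrade and update commands for every archive.
upgrade_commands() {
    list_archives | while IFS= read -r id; do
        echo "$EPRINTS_DIR/bin/epadmin upgrade $id"
        echo "$EPRINTS_DIR/bin/epadmin update $id"
    done
}

# Prints two lines per archive; nothing is executed.
upgrade_commands
```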

From GitHub

If you have installed with GitHub, the preferred method for production repositories, you can update to the latest version using the following commands, substituting y for the version you want and x for the 3 or 4 depending on whether you are upgrading between EPrints 3.3 or 3.4 series:

git fetch origin
git merge tags/v3.x.y

This should warn you if you have made any locally uncommitted changes. You may need to use git stash to move these temporarily and reintegrate after updating from GitHub.

Once you have updated to the intended tagged release, resolved any conflicts and reintegrated any local uncommitted changes, make sure to run both epadmin upgrade and epadmin update on all archives and then reload the webserver and EPrints indexer, e.g.

/opt/eprints3/bin/epadmin upgrade ARCHIVEID
/opt/eprints3/bin/epadmin update ARCHIVEID
apachectl graceful
/opt/eprints3/bin/indexer restart

From a Tarball

Although not impossible it is strongly discouraged to upgrade from a tarball. One major reason not to do this is that if you have made any local changes other than to your original archive's configuration, it will be difficult to ensure all these modifications are retained. However, if you wish to do this, you can follow the instructions below, (as the eprints user unless otherwise stated):

Downloading and Unpacking 3.3

  1. Download the tarball from files.eprints.org, (e.g. https://files.eprints.org/2306/5/eprints-3.3.16.tar.gz)
  2. As the root user, unpack the tarball under the same parent path as your existing EPrints and change the ownership of this whole directory structure
    cd /opt
    tar -xzvf eprints-3.3.16.tar.gz
    mv eprints-3.3.16 eprints3_new
    chown -R eprints:eprints /opt/eprints3_new

Downloading and Unpacking 3.4

  1. Download the main tarball from files.eprints.org, (e.g. https://files.eprints.org/2551/7/eprints-3.4.3.tar.gz)
  2. Download the publications tarball from files.eprints.org, (e.g. https://files.eprints.org/2551/8/eprints-3.4.3-flavours.tar.gz)
  3. As the root user, unpack both tarballs under the same parent path as your existing EPrints and change the ownership of this whole directory structure
    cd /opt
    tar -xzvf /home/eprints/eprints-3.4.3.tar.gz
    tar -xzvf /home/eprints/eprints-3.4.3-flavours.tar.gz
    mv eprints-3.4.3 eprints3_new
    chown -R eprints:eprints /opt/eprints3_new

Installing and Configuring

  1. As the root user, stop Apache and EPrints' indexer.
    apachectl graceful-stop
    /opt/eprints3/bin/epindexer stop
  2. Move the archive(s) from your old EPrints to your new one, e.g.
    mv /opt/eprints3/archives/* /opt/eprints3_new/archives/
  3. Copy SystemSettings.pm to your new EPrints as well as site-wide settings
    cp -p /opt/eprints3/perl_lib/EPrints/SystemSettings.pm /opt/eprints3_new/perl_lib/EPrints/
    cp -pR /opt/eprints3/cfg /opt/eprints3_new/cfg
    cp -pR /opt/eprints3/site_lib /opt/eprints3_new/site_lib
  4. If you are aware of any local modifications to files under your EPrints path (e.g. /opt/eprints3) but outside the archives directory then compare these files using diff to work out how your local changes can be integrated. E.g.
    diff /opt/eprints3/perl_lib/EPrints/MetaField/Date.pm /opt/eprints3_new/perl_lib/EPrints/MetaField/Date.pm
  5. As the root user, switch round the old and new versions of EPrints, e.g.
    mv /opt/eprints3 /opt/eprints3_old
    mv /opt/eprints3_new /opt/eprints3
  6. Run epadmin test to check there are no configuration issues, e.g.
    /opt/eprints3/bin/epadmin test
  7. Run epadmin upgrade and epadmin update to deploy any structural changes or addition of fields to the database, e.g.
    /opt/eprints3/bin/epadmin upgrade ARCHIVEID
    /opt/eprints3/bin/epadmin update ARCHIVEID
  8. Restart EPrints' indexer.
    /opt/eprints3/bin/epindexer start
  9. As the root user, restart Apache.
    apachectl restart
  10. Don't forget to reinstall your Bazaar packages.
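
Before switching the old and new versions round, it can be worth confirming that the earlier move and copy steps actually landed in the new tree. The sketch below is a hypothetical pre-flight check, not part of EPrints; it assumes the new copy is at /opt/eprints3_new as in the steps above and checks only for SystemSettings.pm and for at least one archive directory.

```shell
#!/bin/sh
# Hypothetical pre-flight check; not part of EPrints.
# Assumption: the new copy is at /opt/eprints3_new, as in the steps above.

# Report anything that should have been moved or copied into the new tree
# before the old and new directories are switched round.
preflight() {
    new="$1"
    status=0
    if [ ! -f "$new/perl_lib/EPrints/SystemSettings.pm" ]; then
        echo "missing: $new/perl_lib/EPrints/SystemSettings.pm"
        status=1
    fi
    found=0
    for dir in "$new"/archives/*/; do
        [ -d "$dir" ] && found=1
    done
    if [ "$found" = 0 ]; then
        echo "no archives found in $new/archives"
        status=1
    fi
    [ "$status" = 0 ] && echo "ok: $new"
    return "$status"
}

preflight "${1:-/opt/eprints3_new}" || echo "fix the problems above before switching"
```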

Upgrading to EPrints 3.3

Upgrading from older 3.1 and 3.2 versions of EPrints to 3.3 is somewhat more involved. Please see the following guides:

Upgrading to EPrints 3.4

To upgrade to EPrints 3.4 you will need to already be running a version of EPrints 3.3, ideally at least 3.3.12, which has the following guide:



A Brief History of EPrints

The EPrints project was created by Professor Stevan Harnad.

April 18th 2021 
EPrints 3.4.3 released.
July 11th 2020 
EPrints 3.4.2 released.
April 5th 2019 
EPrints 3.4.1 released.
May 4th 2018 
EPrints 3.4 released.
September 16 2011 
EPrints 3.3 released.
March 10 2010 
GNU EPrints 3.2 released.
September 8 2008 
GNU EPrints 3.1 released.
December 18 2006 
GNU EPrints 3.0 RC-1 released.
December 5 2006 
GNU EPrints 3.0 Beta-3 released.
November 14 2006 
GNU EPrints 3.0 Beta-2 released.
October 26 2006 
GNU EPrints 3.0 Beta-1 released.
July 25 2005 
GNU EPrints 2.3.13 released.
May 24 2005 
GNU EPrints 2.3.12 released.
March 8 2005 
GNU EPrints 2.3.11 released.
March 2 2005 
GNU EPrints 2.3.10 released.
February 17 2005 
GNU EPrints 2.3.9 released.
February 16 2005 
GNU EPrints 2.3.8 released.
November 25 2004 
GNU EPrints 2.3.7 released.
August 9 2004 
GNU EPrints 2.3.6 released.
August 6 2004 
GNU EPrints 2.3.5 released.
July 6 2004 
GNU EPrints 2.3.4 released.
March 4 2004 
GNU EPrints 2.3.3 released.
February 25 2004 
GNU EPrints 2.3.2 released.
February 5 2004 
GNU EPrints 2.3.1 released.
January 12 2004 
GNU EPrints 2.3.0 released.
October 31 2002 
GNU EPrints 2.2 (Pumpkin) released. Added subject editors and GDOME support.
July 4 2002 
GNU EPrints 2.1 (Pineapple) released. Added subscriptions and OAI 2.0 support.
July 1 2002 
EPrints officially joins GNU Project.
Apr 17 2002 
EPrints 2.0.1 (Tuna) released. Mostly bugfixes.
Feb 14 2002 
EPrints 2.0 (Olive) released.
Jan 2002 
EPrints 2 Alpha-2 (Pepperoni) released.
August 2001 
EPrints 2 Alpha-1 (Anchovy) released.
June 2001 
Mike Jewell joins EPrints, working primarily on installer software
January 2001 
EPrints 1.1 released, contains OAI 1.0 support

Work begins on EPrints 2

November 2000 
EPrints 1.0 released, contains OAI 0.2 support

Rob Tansley leaves the EPrints Project. Christopher Gutteridge joins the EPrints Project.

September 2000 
EPrints beta-2 released
June 2000 
EPrints beta-1 released

Cogprints archive created. http://cogprints.soton.ac.uk/

April 2000 
Rob Tansley begins work on EPrints
October 1999 
A turnkey repository platform promised by Stevan Harnad & Les Carr at initial OAI (UPS) meeting in Santa Fe.