Repository configuration

Warning: This page is under development as part of the EPrints 3.4 manual. It may still contain content specific to earlier versions.

EPrints Archive Configuration

This section describes all the configuration files in a single archive in the EPrints system.

Primary archive configuration file

Once you have created an EPrints archive, the information you entered is placed in an XML file in /opt/eprints2/archives/ with the name archiveid.xml. This file is documented later in this section.

Archive configuration directory

The bulk of the archive configuration is copied from /opt/eprints2/defaultcfg/ into the archive's own configuration directory (usually /opt/eprints2/archives/archiveid/cfg/). This directory will usually contain the following files and directories:

This file is generated by generate_apacheconf. See the documentation of generate_apacheconf for more information.
apachevhost.conf (added v2.2) 
This file is generated by generate_apacheconf. See the documentation of generate_apacheconf for more information. 
The general configuration items which don't fit anywhere else are in this perl module. It is described fully later in this section of documentation. This module "requires" the other 5 perl modules. They are in separate files to make them easier to get to grips with.
This module configures the metadata fields and the default values. 
This module configures how the archive exports itself via the Open Archives protocol. 
This module contains subroutines which handle rendering the data into XHTML (mostly) for display as webpages. 
This module handles turning UTF8 text strings into lists of index words for free text searches. 
This module contains subroutines which check the metadata for problems.
This file is generated and overwritten by generate_apacheconf. Do not edit it directly. See the documentation of generate_apacheconf for more information.
One of these files for each languageid supported by this archive. These XML files describe how to turn metadata for an item into a citation (with markup). They are described fully later in this section of documentation.
One of these files for each languageid supported by this archive. These DTD files are generated automatically just before eprints loads the archive's configuration and should not be edited directly.
This XML file describes the various types of eprints, users etc. and which metadata fields are required or relevant to each. It is described fully later in this section of documentation.
One of these files for each languageid supported by this archive. These XML files contain all the phrases which are specific to this archive, such as the titles of metadata fields. They are described fully later in this section of documentation.
This XML file just contains the horizontal divider used in webpages created by the system. It is described fully later in this section of documentation.
This directory contains the data needed to create the static webpages such as the homepage, and about page. It is described fully later in this section of documentation.
This file contains the initial subjects for the system. It is described fully in the documentation for import_subjects.
One of these files for each languageid supported by this archive. These XML/XHTML files describe the outline for webpages for this system. They are described fully later in this section of documentation.

XML Config Files in EPrints

This section contains some general information about the XML archive config files: template, phrases, ruler and citations. metadata-types.xml uses XML but these comments do not apply.


These files use HTML elements (and other elements too). XHTML is a fairly new version of HTML which is backwards compatible with HTML 4 but written using XML, not SGML. This means that it is much stricter but less ambiguous and easier to parse and modify. Assuming you know HTML, the main differences are as follows:

All tags must be closed 
All elements must be closed, even ones such as <li>. Tags which do not have a close tag in HTML, like <br> or <img src="foo"> still must be closed eg. <img src="foo"></img> - this can be abbreviated as: <img src="foo" />
All tags and attributes must be lower case 
Self-explanatory.
Strict definition of what tags may appear within others 
Not actually checked by EPrints. It will let any rubbish past as long as it's valid XML. But that's no reason to be naughty.
All attributes must be wrapped in quotes 
In HTML the values of attributes do not have to be wrapped in quotes, but in XML (and therefore XHTML) they do.
All attributes must have a value 
In HTML some attributes do not require a value, for example <hr noshade>. In XHTML it is represented as <hr noshade="noshade" />

So in summary, the HTML:

<img SRC=someurl>
<HR NOSHADE WIDTH=2>
<P>Foo bar</P>

should become in XHTML:

<img src="someurl" />
<hr noshade="noshade" width="2" />
<p>Foo bar</p>

And that's more or less it. See the XHTML specification for a complete description.

Language specific files.

phrases, templates and citations have one instance per supported language. This allows the system to generate pages and emails in more than one language. Supporting a new language will require translating all the English config files currently shipped. If you do intend to do this (lots of work!) please get in touch with the eprints admin so that we can avoid duplicated effort.

Extra Entities

The XML files all use a DTD which defines a few extra entities. Entities are items in XML (or HTML) which start with "&" and end with ";" like &amp;. These additional entities come from the entities DTD file created by generate_entities. One DTD is created per language, although currently the only variation is the archive name.

The name of the archive in the current language.
The administrators email address.
The base URL of the system (without a trailing slash)
The base URL of the CGI directory (without a trailing slash)
The URL of the system homepage.
The URL of the user homepage.
The current EPrints version.
The XHTML of the standard divider.
Any XHTML character entity (since EPrints v2.1) 
You may now use any XHTML character entity, eg. &nbsp; &eacute; &euro;.
User configured entities 
You can generate your own entities by modifying the function which generates them in

None of these entities are available in the citations file or the ruler file.

Name Spaces and XHTML

These files contain a mixture of custom tags and XHTML. To keep these distinct the XML files contain a name space definition in the first element. The practical upshot is that all of EPrints' own tags have the prefix "ep:". The namespace information is actually ignored by the current version of the eprints system.

example of mixed tags (and entities):

<ep:phrase ref="lib/session:contact"><p>Feel free to contact 
<a href="mailto:&adminemail;">&archivename; administration</a> 
with details.</p></ep:phrase>
eprints elements: phrase
xhtml elements: p, a
eprints entities: adminemail, archivename

The Primary Archive Configuration File

This XML file appears in the archives/ directory, usually /opt/eprints2/archives/. It describes the most basic details about the archive. It is generated (and modified) by configure_archive and will not normally need to be edited.

EPrints looks in this directory for XML files and attempts to load them all when starting the webserver.

This file should be chmod'd so that it can not be read by random users as it contains the database password.

The top level element is "archive" which has the attribute "id" which is the id of the archive. It should be the same as the filename. If this file is foo.xml then the id should be foo.

<archive> contains a list of XML elements, each enclosing some text, eg. a <host> element enclosing the hostname of the archive.


The following tags are expected in no special order:

The hostname of this archive.
<alias redirect="yes-or-no"> 
This is optional and may be repeated. It has the attribute "redirect" which may be set to yes or no. This controls what virtual hosts are supported and if they should redirect to the main <host>.
The ISO id of a language supported by this archive. Repeatable. One of these should also be the defaultlanguage. See below.
The port number that the server is running on. Usually 80.
The directory from the root of the server name. Usually /
The filesystem path of the rest of the archive configuration.
The path to the perl module which does the main configuration (
The name of the MySQL database. Usually the same as the archive ID.
The host on which MySQL is running. Usually localhost.
An optional MySQL port, if it's not the standard one. Should be empty if we are to use the default.
An optional MySQL socket. Should be empty if we are to use the default.
The username to use when connecting to MySQL, usually "eprints".
The password to use to connect to MySQL.
One of the supported languages. This is the default for this archive.
The email address of the archive administrator. I strongly suggest that this is an alias rather than a personal email address. If all your webpages contain Bob's personal address and Bill takes over from Bob, you would have to regenerate every page with Bill's address. Much better to set up an email alias or forward and point it at Bob (for now). Heed these words spoken from grim experience!
<archivename language="langcode"> 
The name of the archive. This has an attribute "language" the value of which is an iso language id. There should be one of these archivename elements per supported language. eg.

   <archivename language="en">White Lemur</archivename>
   <archivename language="fr">La Archive d'Lemur Blanc</archivename>

(apologies to the French, human languages aren't my strong suit)

<securehost> (since v2.2) 
Used for experimental https support.
<securepath> (since v2.2) 
Used for experimental https support.

This module imports the other 5 perl modules. It allows lots of little tweaks to the system, which are all commented in the file.

It includes options to hide various features you may not want and to customise the browse, search and subscription functions.

Also you can customise what each type of user can and can't do, and how they authenticate their passwords.

This configuration file contains perl methods which are called when a session starts and ends, to log things, to generate the entities for the entities file, and to control access to non-public files.

Browse Views

The browse views are generated by the script "generate_views" and what that script does is configured by the "browse_views" item in the config.

It is a reference to a perl array [], each item of which is a hash {}.

The hash has 3 required properties and a number of optional ones.

id (required) 
The ID of this view - the view will be placed in a subdirectory of /view/ of this name. The ID is also used to identify the full name of this view in the phrase file. id=>"foo" would find its title in the phrase "viewname_eprint_foo"
fields (required) 
The list of the names of the fields to browse, separated by a slash "/". This should normally be a single field unless you want to merge the values of two fields. The id part of a field may be specified by appending ".id" to the fieldname.
order (required) 
A list of fields to sort by in order of priority, separated by slashes "/". A minus sign "-" prefixing the fieldname indicates reverse sorting on that field.
Should we make a page for the "unset" condition? A page for items which do not have a year set may be useful. But for other fields this may be meaningless. Set it to 1 for true.
Generate a file for every value, ending in ".include" which contains the XHTML of the citations of records and the number of records, but without wrapping the site standard template around it.
Normally the system generates a page like that described for "include" with a .html suffix and the site template. If nohtml is set to 1 then it won't.
Normally the citation used is that for the "type" of eprint. If this is set then that citation (from the citations file) will be used for all items. This allows for some clever stuff if you want to make page which can get sucked into another website.

Normally the system puts a paragraph tag around each citation, but if you use a custom citation this will not happen.

Do not include the count of how many items at the top of the page.
The system generates an index.html in /view/ with a list of all the browse views available. Setting nolink to 1 will hide this item.
Do not generate an index.html file in /view/foo/ listing all the values of the view and linking to their respective pages.
notimestamp (since v2.2) 
Do not add the timestamp at the bottom of the view page.
hideempty (since v2.2) 
Only applicable to subjects. This option will suppress subjects which do not have any records in them. This is useful on "young" archives which look very empty if you have a large subject tree and only a few records, and those clustered in 3 or 4 subjects.
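
To illustrate the "order" setting described above, here is a minimal sketch of how a specification such as "-year/title" could be applied to a list of records. It uses plain perl hashes rather than eprint objects; sort_by_order and the sample records are made up for illustration, and this is not a description of how generate_views is actually implemented.

```perl
use strict;
use warnings;

# Hypothetical helper: sort hash records by an order string such as
# "-year/title" (slash separated, leading minus means reverse).
sub sort_by_order {
    my ( $order, @records ) = @_;

    # Turn the order string into a list of [ fieldname, reverse? ] pairs.
    my @keys;
    foreach my $part ( split m{/}, $order ) {
        my $field   = $part;
        my $reverse = ( $field =~ s/^-// ) ? 1 : 0;
        push @keys, [ $field, $reverse ];
    }

    return sort {
        my $cmp = 0;
        foreach my $key (@keys) {
            my ( $field, $reverse ) = @{$key};
            # Plain string comparison; unset values sort as "".
            $cmp = ( $a->{$field} // "" ) cmp( $b->{$field} // "" );
            $cmp = -$cmp if $reverse;
            last if $cmp;    # first field that differs decides
        }
        $cmp;
    } @records;
}

my @records = (
    { title => "Badgers",   year => "2001" },
    { title => "Aardvarks", year => "2003" },
    { title => "Cats",      year => "2003" },
);
my @sorted = sort_by_order( "-year/title", @records );
print join( ", ", map { $_->{title} } @sorted ), "\n";
```

Newest years come first, and records from the same year fall back to the second field, title.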

The most common view is to browse by subject:

{ id=>"subject", allow_null=>0, fields=>"subjects", 
   order=>"title/authors", hideempty=>1 }

A more complex view generates a view on author & editor ID's which are not advertised but may be captured by some other software to build staff CV pages.

{ id=>"person", allow_null=>0, fields=>"", 
   nohtml=>1, nolink=>1, noindex=>1, include=>1, 
   order=>"-year/title" }

For my example person id "wh" this will generate a webpage called /view/person/wh.include (and one for each other value of authors or editors ID's) which can be captured by an external automated system.
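
Putting the pieces together, "browse_views" is a reference to an array of hashes like the ones above. The following sketch assumes the configuration is held in a hash reference called $c (a name chosen here for illustration only); the "year" view and the checking loop are also hypothetical and just show the shape of the data.

```perl
use strict;
use warnings;

my $c = {};    # stand-in for the archive configuration (assumed name)

# A reference to an array [], each item of which is a hash {}.
$c->{browse_views} = [
    { id => "year", allow_null => 1, fields => "year",
      order => "title/authors" },
    { id => "subject", allow_null => 0, fields => "subjects",
      order => "title/authors", hideempty => 1 },
];

# Check that every view has the three required properties.
my @missing;
foreach my $view ( @{ $c->{browse_views} } ) {
    foreach my $key (qw( id fields order )) {
        push @missing, $key unless defined $view->{$key};
    }
}
print scalar(@missing) ? "missing: @missing\n" : "all views look complete\n";
```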

User Privs

The user permission configuration allows you to set what each type of user can and can't do. The user home page will only show a user the options which they are allowed to use.

New types of user, and which data about themselves they can edit, are set in metadata-types.xml.

Permissions are set by "type" of user. By default there are 3 kinds of user: "user", "editor" and "admin".

Admin can, by default, do everything.

subscription (since EPrints v2.1) 
If included then this kind of user can create subscriptions.
Reset their password via the web registration system.
Submit items into the archive.
View the archive status page.
User can edit then approve submitted items into the main archive, or delete them, or return them to sender. Also can remove items from the archive back into the edit buffer for corrections, and move records into the deleted table (delete them).
User can perform a "staff search" of user or eprint records and view ALL the metadata.
User can edit the subject tree via the online interface.
User can edit other users records.
User can change their email address via the web interface. This is safer than allowing them to edit it directly as it ensures they cannot set it to an address at which they do not receive mail (it mails them a confirmation pin number).
This allows the sinister feature which lets you log in as someone else. It still requires a password. This is useful if you want to perform admin tasks as a super user, then log-in as a normal user to deposit items.
no_edit_own_record (since v2.2) 
This suppresses the "edit my user record" option. This may be useful if you disable web-registration and import the user records from some other database.

Fields Configuration

Metadata is data about data: the information which we store to describe each record (eprint) in the system. Users also have metadata.

This module is the configuration for the metadata. This is probably the most important part of the system.

See the chapter on metadata for all the configuration options.


This section of the file contains subroutines which are called to set default values for Users, Documents and EPrints.


These functions let you set automatic fields. This allows you to make fields which are updated automatically each time the item (User/EPrints/Document) is committed to the database.

This allows you to create "compound" fields. Such fields are created by processing the values of other fields rather than being edited directly.

For example, if you wanted to make an automatic int field which contains the number of authors, you could add the following to set_eprint_automatic_fields:

# no authors at all will be undef, not [] so check first
if( $eprint->is_set( "authors" ) )
{
       my $auths = $eprint->get_value( "authors" );
       $eprint->set_value( "authcount" , scalar @{$auths} );
}
else
{
       $eprint->set_value( "authcount" , 0 );
}

This module configures how the archive exports its data via the OAI protocol.

For more information on the how and why of OAI see

OAI allows a harvester to request the metadata from your archive and other archives to provide a federated search. The next time the harvester harvests your archive it only has to ask for items which have changed or been added since the last time it asked.

The current version of EPrints supports OAI v2.0. OAI version one is no longer supported.

The base URL for your OAI v2.0 interface will be http://archivepath/perl/oai2

If you want to use the OAI system then you need to fill in the blanks, such as policy and the OAI-id of the archive.

You may create OAI sets in a similar manner to "browse views" in

If you want to change the way that an EPrint is mapped into Dublin Core then edit make_metadata_oai_dc, which returns a DOM XML object.

To add a new metadata type you need to add a new mapping function and add entries to the namespaces, schemas and functions items near the top of the file.

This module contains functions which turn data into XHTML for displaying on the web.

If you want to change the way a user info page, or an eprint "abstract" page is rendered then here's the place to do it.

There are also "full" versions of these functions which display all the internal variables and things. These are the views which the editors and admin see.

The XHTML is generated using DOM (Document Object Model), but eprints provides some functions for easily generating XHTML DOM. The only method of DOM you should need to use is appendChild - which adds an element to this element.

EPrints API functions which return XHTML objects.

Note, all text strings should be in UTF-8.


my $page = $session->make_doc_fragment(); 
my $h1 = $session->make_element( "h1" );
$h1->appendChild( $session->make_text( "Title" ) );
$page->appendChild( $h1 );
$page->appendChild( $session->make_element( "img",
       src=>"/images/cheese.gif",
       width=>128,
       height=>53 ) );

$page now contains:

<h1>Title</h1><img src="/images/cheese.gif" width="128" height="53" />

Many of the EPrints modules are fully documented. For an example try running:

% perldoc /opt/eprints2/perl_lib/EPrints/

The functions most useful for extracting and rendering information are documented here:

$session->make_text( $text )  
Returns a DOM object representing that text.
Returns a document fragment. This renders to nothing but is a container to which you can add stuff.
$session->make_element( $name, %opts )  
Makes a simple XHTML element. %opts is an optional series of attributes.

To make <h1 class="foo">...</h1> you would call:

$session->make_element( "h1", class=>"foo" );

Returns the default ruler for the archive (from ruler.xml).
$session->render_link( $uri, $target )  
Returns the XHTML element (with URI properly escaped):

<a href="uri"></a>

Which you can appendChild stuff into. If $target is specified then a target attribute is included - to make it pop up a new window.

$item->render_value( $fieldname, $showall )  
$item is either an EPrint, a User or a Document.

$fieldname is the name of the field you want to render. If $showall is 1 then ALL values are rendered in a multilang field.

$item->render_citation( $style )  
Renders the citation of the item using the citation for the item's type from the citation file.

If $style is set then it uses the citation with that id instead.

$item->render_citation_link( $style )  
This renders a citation as above, but links it to the url of the item.
This renders a simple description of the item using the default citation for this dataset eg. for eprint it uses citation type "eprint".
$session->html_phrase( $phraseid, %opts )  
Returns the item from the phrase file. If you don't care about supporting multiple languages then just use make_text instead, it's easier.

It looks first in the archive phrase file for the current language. Then in the archive phrase file for English. Then in the system phrase file for the current language. Then in the system phrase file for English. The %opts are a series of DOM elements to place in the "pin" items in the phrase file.
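
The lookup order can be pictured as a short chain of fallbacks. This is only a sketch: nested perl hashes stand in for the four phrase files, plain strings stand in for the DOM objects that html_phrase really returns, and find_phrase and the phrase ids are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the phrase files: $phrases{source}{lang}{id}.
my %phrases = (
    archive => {
        fr => { "page:title" => "Bienvenue" },
        en => { "page:title" => "Welcome", "page:about" => "About us" },
    },
    system => {
        fr => { "lib:error" => "Erreur" },
        en => { "lib:error" => "Error", "lib:ok" => "OK" },
    },
);

# Try archive/current language, archive/en, system/current language,
# system/en, in that order, and return the first phrase found.
sub find_phrase {
    my ( $phraseid, $langid ) = @_;
    foreach my $try ( [ "archive", $langid ], [ "archive", "en" ],
                      [ "system",  $langid ], [ "system",  "en" ] )
    {
        my ( $source, $lang ) = @{$try};
        my $text = $phrases{$source}{$lang}{$phraseid};
        return $text if defined $text;
    }
    return;    # no such phrase anywhere
}

print find_phrase( "page:about", "fr" ), "\n";    # found in archive/en
```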

Some other useful functions you may need

$item->get_value( $fieldname, $no_id )  
Returns the value of field $fieldname from the item. An optional second parameter may be set to 1 to return the value without the "id" part, to keep things simple.
$item->is_set( $fieldname )  
Returns true if the field is set on this object, false otherwise.
Return an array of the document objects belonging to this eprint.

You probably won't need to change this module unless you want to modify how eprints searches for words in strings.

When a record is added to the system eprints uses this module to turn a string into a list of values which are indexed. By default these are words with 3 letters or more, except some predefined stop words. It also turns latin characters with acutes into their plain ASCII (no acute/grave) versions.

It then does the same with the search string and looks for these keys.


The rain in spain falls mainly on the plains.

Is turned (by default) into the keys:

rain spain fall mainly plain

Thus searching for "rain" or "plain" or "plains" or "MaiNlY" will all match this string.
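
As a rough illustration of the behaviour described above, the sketch below turns a string into index keys using nothing but core perl. It is not the code from the indexing module: index_keys and the stop word list are assumptions made for the example, and the handling of accented characters is left out.

```perl
use strict;
use warnings;

# Hypothetical stop word list; the real list lives in the indexing module.
my %stop_words = map { $_ => 1 } qw( the and for );

# Turn a string into index keys: lower case, 3 letters or more,
# no stop words, trailing "s" removed.
sub index_keys {
    my ($text) = @_;
    my @keys;
    foreach my $word ( split /[^a-z0-9]+/, lc $text ) {
        next if length($word) < 3;
        next if $stop_words{$word};
        $word =~ s/s$//;
        push @keys, $word;
    }
    return @keys;
}

print join( " ", index_keys("The rain in spain falls mainly on the plains.") ), "\n";
```

Because the search string goes through the same function, "plains" and "MaiNlY" end up as the keys "plain" and "mainly" and so match the indexed record.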

You may wish to add your own "stop words". eg. If you are running an archive about badgers, a search for the word "badger" will return almost all the records.

At a more complex level you may wish to add handling for non-european character sets (I have no idea how well the default setting will work on these), or do "stemming" - removing "ed", "ing", "ies", "s" etc. from the end of words so that "land" will match "land", "landed", "landing" and "lands". (It currently removes 's').

Another suggestion is using soundex or similar techniques to match words which sound similar.

Changing the indexing on a live system will require you to regenerate the indexes using the reindex script. (If you don't then some of the search results will be wrong).

This module handles validating data entered by users. Each subroutine is described in more detail in the module itself.

Each subroutine returns a list of DOM elements, each of which describes a single problem. Any problems will prevent the user from continuing with editing until they are corrected.

As with the rendering functions, if you don't care about making this work in more than one language then you can just make the DOM items by calling $session->make_text( "problem explanation" )

The eprint & document validation routines have a flag $for_archive which, if true, indicates that the item is being checked before going into the actual archive. You can use this to force an editor to enter fields which the user may leave blank.

Validation Functions

Called for all fields. Use it to check individual field values. By default checks that url's look OK.
Check the metadata of an eprint. Use this to test dependencies between fields. eg. if you have a requirement that field "A" OR field "B" must be set.
Validate the whole eprint. The last part of the validation of an eprint.
Validate the metadata of the document (as with eprint_meta)
Validate the whole document, files and metadata.
Validate a user record.
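
The general shape of a validation routine can be sketched without the EPrints API. In the example below a plain hash stands in for the eprint and plain strings stand in for the DOM elements (which you would normally build with $session->make_text or $session->html_phrase). The subroutine name and the "isbn" and "issn" field names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an eprint: a plain hash of field values.
# Returns a list of problem descriptions; an empty list means "valid".
sub validate_meta {
    my ( $values, $for_archive ) = @_;
    my @problems;

    # Dependency between fields: at least one of the two must be set.
    unless ( defined $values->{isbn} || defined $values->{issn} ) {
        push @problems, "Please enter either an ISBN or an ISSN.";
    }

    # Stricter check applied only when the item is going into the
    # actual archive.
    if ( $for_archive && !defined $values->{year} ) {
        push @problems, "A year is required before the item can be archived.";
    }

    return @problems;
}

my @problems = validate_meta( { title => "On Badgers" }, 1 );
print "$_\n" foreach @problems;
```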


The citations file describes how to render an item (eprint/user/whatever) into a short piece of XHTML. Each citation has a "type". There are 3 kinds of citation:

default citation 
This is a very short description of the item. Usually "the title or failing that, the id". The type id is just the name of the dataset. eg. "eprint"
type citation 
These are richer descriptions which vary between type of eprint, user or document. The type id is dataset_type eg. eprint_preprint.
other citation 
Used by custom browse views. Any name you like.

The citation file contains a list of citation elements:

<ep:citation type="...">

Each one may contain text and tags. The text may also include the names of fields in the record being rendered. These names should be between @ symbols, eg. @authors@ or @title@. These will be replaced with a rendered version of the value in that field. (If you need an actual @ symbol for some reason, two @@ with nothing inside will be rendered as a single @.)

Note. The @title@ style was introduced in EPrints 2.2. Before that this file used XML entities such as &title; but this caused problems and didn't solve any. Use of entities is still supported, but deprecated.
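
The @fieldname@ substitution can be illustrated with a few lines of perl. This is a simplified sketch, not the EPrints implementation: fill_citation is a made-up name, field values are plain strings rather than rendered XHTML, and the citation text is a plain string rather than XML.

```perl
use strict;
use warnings;

# Replace @fieldname@ with the field's value and @@ with a literal "@".
# Fields without a value are shown as "UNSPECIFIED".
sub fill_citation {
    my ( $citation, $values ) = @_;
    $citation =~ s{\@(\w*)\@}{
        $1 eq "" ? "\@" : ( $values->{$1} // "UNSPECIFIED" )
    }gex;
    return $citation;
}

my $out = fill_citation(
    '@authors@ (@year@) @title@. Contact admin@@example.org',
    { authors => "Lemur, W.", year => "2003", title => "On Badgers" },
);
print "$out\n";
```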

In addition you may use XHTML elements and the following elements in the eprints namespace. These elements are always removed but they control whether their contents are kept or not. Conditional elements may be placed inside each other since v2.2.

This element is replaced with an XHTML anchor linking to the item. If this citation is being rendered without a link then it is just removed (but not the contents).
The contents of this element are only preserved if we are rendering this citation as a link. Maybe an icon which you don't want if it's not a link.
The opposite of iflink.
<ep:ifset ref="fieldname">  
The contents of this element are only preserved if the field "fieldname" has a value.
<ep:ifnotset ref="fieldname">  
The contents of this element are only preserved if the field "fieldname" does not have a value.
<ep:ifmatch name="fieldname(s)" value="searchparam">  
This is the swiss army knife of the world of conditional rendering. It is also a bit complicated, and few people will need to use it. This actually works like a single search element. The attributes are:
This is the name of one or more fields, specified as in the search fields configuration. eg. "title/abstract"
This is a value to search for. Treated like the value entered in a search field.
merge (optional) 
Can be ANY or ALL. Works like the match all? in a search form.
match (optional) 
Can be IN, EQ, or EX. In, Equal or Exact. Exact on subjects means that subject, but not any below it in the hierarchy.

For example:

@year@<ep:ifmatch name="year" value="-1949"> (approx)</ep:ifmatch>

This will render (approx) after years before 1950. Neat eh?

<ep:ifnotmatch name="fieldname(s)" value="searchparam">  
Like ifmatch but only includes the values inside if the search does not match.


This file allows you to configure the types of eprint, user, document and document security level.

When you add a new type you should add its name to the archive phrases file(s). The phraseid is "dataset_typename_typename" eg. "document_typename_pdf". You should also add a new citation to the citations file. Any fields which are not required but appear in the citation should probably be inside a <ep:ifset> so that you don't see "UNSPECIFIED" if they are not, er, specified.

The main element is "metadatatypes". This contains a list of "dataset" elements each of which has a name attribute.

The "type" elements in user and eprint "dataset"s should contain a list of "field" elements. This describes the fields which may be edited for this type and the order that they appear on the form.

You may include system fields in this list, but be careful if you do.

Multi-page metadata (2.3.0+)

You may optionally add <page name="pagename" /> elements to the field list. These break the submission process into smaller stages. The pagename is used to identify the sub-page, for purposes of validation etc. Pages only have an effect on eprint types, not user, document etc.

See the section on paged metadata.

Attributes for "field" element

name (May not be omitted) 
The name of the metadata field.
If set to "yes" then this field may not be left blank. Some system fields are always required no matter how this is set.
This field only appears on the "editor" edit eprint form, not the user one. Or, in the case of the user dataset, the staff edit-user page.

The "security" dataset

This is a handy place to define the security levels. The type with no name is special. It is the "public" security type. All other types will require a valid username and password. Whether that username is acceptable for a given document is decided by the can_user_view_document subroutine in

The "document" dataset

By default eprints requires at least one of ps, pdf, ascii or html to be uploaded before an eprint is valid. You may change this list in - any more complicated conditions will have to be checked in the eprint validation subroutine.


This file contains a list of XML "phrases". Everything eprints "says" to users is stored in this file and its system-level counterpart. If you want the site to run in more than one language, you need one phrase file per language.

The phrase file is XML and contains a toplevel "phrases" element. This contains the list of phrases.

Each phrase has a "ref" attribute to identify it and contains text and optionally some XHTML tags. It may also contain eprints entities such as &archivename; and also some phrases should contain "pin" elements, described below.

The phrases in the archive phrase file are specific to that archive, the system phrase file contains non-archive specific phrases. The id's of most of the phrases in the archive phrases are generated from the id's of the fields, datasets, types etc.

The archive phrase file contains: names of dataset types, names of metadata fields, help on entering each metadata field, the names of options in "set" fields, the description of different search ordering options, names of browse views, phrases used in the render and validation routines, mail which eprints sends out and phrases which override those in the system file.


Some phrases need some "pin" elements to show eprints where to insert values. Usually pins don't contain any elements but occasionally they do when they represent what to place a link around.

Overriding System Phrases

If you don't like some of the phrases in the main system phrases file you can override them by creating a phrase with the same "ref" in the archive file.

Don't edit the system file; if you upgrade eprints to a newer version it will be overwritten.


EPrints sends out emails when a user registers/changes their password, when a user changes their email, when a deposited item is rejected/deleted by an editor and when the system is low on resources. These mails can be customised in the phrase file.

Make sure you wrap your text in paragraph <p> tags. EPrints will automatically word wrap these in the email.

<hr /> elements in a mail are turned into a line of dashes.

When eprints sends a mail it will send it as plain ASCII text, unless it contains latin-1 elements, in which case it will be latin-1 encoded. If it contains unicode characters not in the latin-1 charset then it will be utf-8 encoded.
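
The rule above amounts to picking the narrowest character set that can hold every character in the mail. A minimal sketch in plain perl follows. mail_charset is a hypothetical name, the charset labels are the usual MIME names rather than anything taken from the EPrints source, and the sketch says nothing about how eprints itself makes the decision.

```perl
use strict;
use warnings;

# Pick the narrowest charset that can represent every character in $text.
sub mail_charset {
    my ($text) = @_;
    return "us-ascii"   unless $text =~ /[^\x{00}-\x{7F}]/;
    return "iso-8859-1" unless $text =~ /[^\x{00}-\x{FF}]/;
    return "utf-8";
}

print mail_charset("Your password has been reset"), "\n";    # plain ASCII
print mail_charset("R\x{E9}sum\x{E9} received"),    "\n";    # e-acute is in latin-1
print mail_charset("Price: \x{20AC}10"),            "\n";    # euro sign is not
```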


This file configures the horizontal divider which eprints uses, which is inserted in place of &ruler;

If you have no great dislike of <hr /> horizontal rulers then you can leave it alone.

You can't use entities like &frontpage; in ruler.

The static/ directory

This directory contains the static pages for the site - the frontpage, the help pages, images, the stylesheet etc.

static/ contains one directory per language, eg. en, plus a general directory which contains files which don't need translating, like images and the stylesheet.

When you run the generate_static command it copies the files for each language, and the general dir, into the static site for that language.

See the generate_static documentation for more details.


This file is not used by the core eprints system. It is used by import_subjects to set up the initial subjects. For more information see the instructions for import_subjects.


This file is the shell of every page in the system. It is more or less a normal XHTML page but you can use the eprints &foo; entities in it and it should contain "pin" elements like a phrase. The pins it should contain are:

<ep:pin ref="title" />  
This is where to put the title of the page. It can be used more than once - in the title in the page header and somewhere in the body. If placing it in the title in the head of the page you must use the additional attribute textonly="yes" which only works here. It removes images from the title (which can happen if using the "Latex" mode).
<ep:pin ref="head" />  
This goes somewhere in the head of the page. It shows eprints where to insert the "meta" and "link" elements.
<ep:pin ref="pagetop" />  
This goes at the top of the body. It is sometimes used as a "target".
<ep:pin ref="page" />  
Where to place the bulk of the content of the page.