Entire Manual

From EPrints Documentation
Revision as of 12:18, 12 August 2009 by Tdb01r (Talk | contribs)

Warning This page is under development as part of the EPrints 3.0 manual. It may still contain content specific to earlier versions. Manuals for previous versions of EPrints are also available.

This page was generated on 2019-12-9

Introduction

Manual Sections

What is EPrints?

EPrints 3 is generic repository-building software developed by the University of Southampton. It is intended for creating highly configurable web-based repositories.

EPrints is often used as an open archive for research papers, and the default configuration reflects this, but it is also used for other things such as images, research data, audio archives - anything that can be stored digitally.

The EPrints series began in early 2000 and is in use by over 200 sites!

Should I be installing EPrints 3, and how much effort will it take?

Start by looking at http://demoprints3.eprints.org/ to get a feel for what the software does.

You can get a vanilla install up and running quite easily; the installation notes on the wiki should help you over any snags relating to your operating system. You'll need a UNIX-like machine (Linux is good), and a root password is helpful.

The task which will take longest is actually deciding what you want your repository to do (and not do). Many sites want to make significant customisations. EPrints creates a repository with a sensible default, but all our users want something slightly different.

Installing and configuring the software isn't too hard, and we're working on admin tools to make it even easier.

The time taken to run the archive day to day depends on your own policy. Do you want a very light touch on the data submitted, or a formal review process for each item? That's up to you!

What will it run on?

We develop EPrints on Red Hat Linux (both Fedora Core and Enterprise), but it is used on any number of Linux distributions and other UNIX-like systems, including OS X. Thanks to support from Microsoft, it also runs on Windows Vista and XP.

EPrints doesn't require any unusual hardware. It's slightly easier to run on a dedicated machine, but that's not essential, and should not affect performance.

Don't forget to budget for a backup system: your data is valuable!

Required Software

What Additional Software does EPrints Require?

In brief, EPrints minimally requires Apache (with mod_perl), MySQL and Perl with some extra modules. Various utilities like wget, tar and unzip would also be useful.

EPrints bundles some Perl modules which it uses, to save you installing them.


Where to get the Required Software

Apache, MySQL, Perl and mod_perl are all provided as operating system level packages that can be installed on EPrints' Recommended Platforms. If you wish to install on a platform that is not recommended, then you will need to determine the best way to install these applications. It may be possible to infer comparable packages for your platform by checking the dependencies installed on Red Hat based and Debian based Linux.


Other Tools

File uploads

wget, tar, gunzip and unzip are required to allow users to upload files as .tar.gz or .zip, or to capture them from a URL.

These all come installed with most modern versions of Linux. If you cannot get them working, you can remove the option by editing "archive_formats" in SystemSettings.pm.

If there are problems, you may need to tweak how these tools are invoked in SystemSettings.pm.


Full Text Indexing

The EPrints full text indexer requires various tools to extract plain (UTF-8) text from each kind of document. These tools may or may not already be installed on your system. EPrints uses them to build a "words" file for each document, containing the text of the document in UTF-8. If EPrints can't run the tool, the "words" file will be empty, and EPrints will not retry creating it unless you manually remove it.

PDF

Full text indexing of PDF documents requires the pdftotext application, provided by the poppler-utils Deb or RPM package.

Microsoft Word

Full text indexing of Microsoft Word documents is provided by the antiword Deb or RPM package. The RPM package is available through the forensics RPM repository.

HTML

Full text indexing of HTML documents requires the lynx text-based browser, provided by the lynx Deb or RPM package.
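
A quick way to see which of the extraction tools named above are present is to look for them on the PATH. The following is a minimal sketch and not part of EPrints: the function name find_indexing_tools is hypothetical, and the command names are simply those mentioned in this section.

```python
import shutil

# Command names taken from this section; adjust them if your platform
# installs the tools under different names.
INDEXING_TOOLS = {
    "PDF": "pdftotext",
    "Microsoft Word": "antiword",
    "HTML": "lynx",
}


def find_indexing_tools(tools=INDEXING_TOOLS):
    """Return {document type: full path or None} for each extraction tool."""
    return {doc_type: shutil.which(cmd) for doc_type, cmd in tools.items()}


if __name__ == "__main__":
    for doc_type, path in find_indexing_tools().items():
        print(f"{doc_type}: {path or 'not found'}")
```

A tool reported as "not found" is a likely cause of empty "words" files for that document type.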


LaTeX Tools

There is an optional feature which allows you to instruct EPrints to look in certain fields (e.g. title and abstract) for strings that look like LaTeX equations and render them as images. These tools are only required if you want to use this feature.

These are provided by the tetex-latex and ImageMagick RPMs or the texlive-base, texlive-bin and imagemagick Deb packages.

This is a "cosmetic" feature: it only affects the rendering of information, so you can always add it later if you want to save time initially.


Other Platforms

Often the best way to find packages for other platforms is to use a search engine to look for the Red Hat or Ubuntu Linux package name along with the name of your platform (e.g. antiword Arch Linux). If your platform does not have comparable packages, the next best option is to download the tool from its official site. Below are links to the download pages for the essential components of EPrints:

Installing MySQL

Install a recent version of MySQL 3. You will need the .h and library files later to install the MySQL Perl module. MySQL 4 is due soon, but we are not making plans to support it yet (if you try EPrints with MySQL 4 and it works, please let us know).

If installing from RPM you require: mysql-server, mysql-devel and mysql RPMs.

Compatibility

EPrints 2.3 was tested with: 3.23.29a-gamma

Installing mod_perl

Apache is the most commonly used webserver in the world, and it's free! EPrints requires Apache to be configured with mod_perl, which allows Apache modules to be written entirely in Perl and so greatly improves efficiency.

Get Apache from http://httpd.apache.org/dist/httpd/

EPrints requires that the apache module mod_perl is enabled.

Apache with mod_perl Installation - Step by Step

  • Download the mod_perl and Apache sources
  • Make mod_perl. I use this command (in the mod_perl src dir):

% perl Makefile.PL APACHE_PREFIX=/usr/local/apache \
APACHE_SRC=../apache-1.3.14/src DO_HTTPD=1 USE_APACI=1 \
EVERYTHING=1

Remember to change ../apache-1.3.14/src to wherever your Apache source is relative to this directory. The backslashes at the ends of the lines allow a single command to be split over multiple lines.

  • Make and install apache. From the mod_perl src dir, I use:

% make
% make install

(mod_perl should already have run the Apache ./configure script for us.)

Compatibility notes

EPrints 2.3 Tested with: apache 1.3.14 with mod_perl 1.25

Installing Perl Modules

EPrints is currently being developed with Perl 5.6.1. There are currently no plans to make EPrints run under Perl 6, on the theory of if-it-ain't-broke-don't-fix-it.

Some Perl modules are bundled with the EPrints2 package; others must be installed by you.

Installing a Perl Module

This describes how to install a simple Perl module; some require a bit more effort. We will use the non-existent FOO module as an example.

Some modules can be installed directly from CPAN. That's great when it works. It doesn't always work, but it's the quickest and easiest way, so give it a go first. To install a Perl module from CPAN, run:


% perl -MCPAN -e 'install Foo::Bar'

Where Foo::Bar is the module you're installing.

I would like to make a list of which modules do/don't install OK from CPAN. If you're reading this before the end of Jan 2003, send me (Christopher Gutteridge) any comments on which ones worked, and on what operating system.

Download the archive
Either from cpan.org, or from the tools directory on eprints.org described at the top of this chapter. Our example archive is FOO-5.23.tar.gz.
Unpack the archive:

% gunzip FOO-5.23.tar.gz
% tar xf FOO-5.23.tar

Enter the directory this creates:

% cd FOO-5.23

Run the following commands:

% perl Makefile.PL
% make
% make test
% make install

Perl Modules Bundled with EPrints

You don't have to install these. They are included as part of the EPrints distribution.

XML::DOM, XML::RegExp, Filesys::DiskSpace, URI, Apache::AuthDBI, Unicode::Normalize, Proc::Reliable.

Please note that these modules are not part of the EPrints system and are only included to make things easier. XML::DOM has had a few lines commented out to prevent it requiring additional modules.

Required Perl Modules (Which you will probably have to install)

These modules are not built into EPrints - you must install them yourself. We recommend installing them in the order they are listed.

Data::ShowTable 
MySQL Interface Module requires this.
DBI 
Tested with: v1.14

MySQL Interface Module requires this.

Msql-Mysql Module 
Tested with: v1.2215

This one can be tricky. It requires access to .h and library files from MySQL. I install MySQL from source first, but some installs of MySQL don't put the lib and include dirs where this module expects. The answer to the first question is that you only need MySQL support. Under Red Hat's GNU/Linux distribution, the zlib-devel RPM should be installed before you install this module.

MIME::Base64 
Tested with: v2.11

Unicode::String requires this.

Unicode::String 
Used for Unicode support. No known problems. Tested with v2.06.
XML::Parser 
Tested with v2.30

Used to parse XML files. Requires the expat library. A .tar.gz and an RPM are available in the tools dir on eprints.org.

Apache 
The Perl Apache.pm module is actually part of mod_perl - installing mod_perl as part of Apache should also have installed the Perl Apache module.

Since version 2.3.7 the modules "Apache::Request" and "Apache::Test" (aka "libapreq") are no longer required. They were a pain to install and the software has been redesigned to not use them at all.

Required Perl Modules (Which you will probably already have)

Most Perl 5.6 or later systems should already include the following modules, but you may have to install some by hand on certain platforms.

CGI, Carp, Cwd, Data::Dumper, Digest::MD5, File::Basename, File::Copy, File::Find, File::Path, Getopt::Long, Pod::Usage, Sys::Hostname.

Installing GDOME

Since EPrints 2.2 you may use either XML::DOM or XML::GDOME. XML::GDOME is recommended as it's faster and uses much less RAM, but it does require you to install a whole lot of extra libraries and perl modules. If you are running a pilot or demonstration service then XML::DOM is fine, and you can always switch over later by installing the required tools and setting the GDOME flag in perl_lib/EPrints/SystemSettings.pm

Additional Libraries Required for GDOME support

libxml2
libxml2-devel

either get the tarball from: ftp://ftp.gnome.org/pub/GNOME/sources/libxml2/

or the RPMs (but we have had problems with complex RPM dependencies):


http://rpmfind.net/linux/rpm2html/search.php?query=libxml2
http://rpmfind.net/linux/rpm2html/search.php?query=libxml2-devel

The GDOME Library

Obtain this from


http://gdome2.cs.unibo.it/#downloads

You may either use the RPMs (gdome2 and gdome2-devel) or the tarball.

Additional Perl Modules Required for GDOME support

XML-LibXML-Common
XML-NamespaceSupport
XML-GDOME

All of which are in http://www.cpan.org/modules/by-module/XML/

Installation

This section describes installing EPrints on a generic Linux distribution. It is recommended that EPrints is installed on either Red Hat based or Debian based Linux.

Installation

If you are upgrading an existing installation of EPrints, please see the Upgrading section of this manual.

EPrints needs to be installed as the same user that the Apache webserver runs as. We suggest you install it as user "eprints" and group "eprints". On some UNIX platforms, creating a user and group can be done using the "adduser" command. Otherwise refer to your operating system documentation.

Unpack the eprints tar.gz file:

% gunzip eprints-3.something.tar.gz
% tar xf eprints-3.something.tar

Now run the "configure" script. This is a /bin/sh script which will attempt to locate various parts of your system such as the perl binary. It will also check your system for required components.

% cd eprints-3.something
% ./configure

By default the system installs as user and group "eprints". You will need to change this if you are not installing as either "root" or "eprints".

The configure script accepts a number of options.

--help 
List all the options (many are intended for compiled software and are ignored).

Recommended:

--prefix=PREFIX 
Where to install EPrints (or look for a version to upgrade). By default /opt/eprints3/
--with-smtp-server=[HOST] 
Use HOST to deliver mail. If the server running EPrints has an MTA such as exim or sendmail, you can specify localhost. If you do not specify this option, you will get a warning to configure it later.
--with-user=[USER] 
Install eprints to run as USER. By default "eprints".
--with-group=[GROUP] 
Install eprints to run as GROUP. By default "eprints".

Optional:

--with-perl=[PATH] 
Path of perl interpreter (in case configure can't find it, or you have more than one and want to use a specific one).
--with-virtualhost=[VIRTUALHOST] 
Use VIRTUALHOST rather than * for apache VirtualHost directives.
--with-toolpath=[PATH] 
An alternate path to search for the required binaries.
--disable-diskfree 
Disable disk free space calls. These can cause problems on some platforms, notably 64-bit.

Deprecated:

--with-apache=1 
Use Apache 1.x.x instead of 2.x.x. EPrints 3 does not support this.

Once you are happy with your configuration you may install eprints by running install.pl:

% ./install.pl

Now you should edit the configuration file for your copy of Apache. This is often /usr/local/apache/conf/httpd.conf or /etc/httpd/conf/httpd.conf.

Add this line (if you didn't install EPrints in /opt/eprints3/, replace that with the location on your system):

Include /opt/eprints3/cfg/apache.conf

Note that this file is only available after you have created your archive via epadmin create. See Running epadmin for more information on creating an archive.

You need to make sure the Apache user has read/write access to the installation directory (/opt/eprints3). The user must be the same as the user you installed EPrints as. We recommend configuring Apache to run as:

User eprints
Group eprints

API Overview

API: Core API

EPrints is written in the Perl language. Sometimes you may need to write your own Perl code using the EPrints Application Programming Interface (API).

Reasons you may need to write your own EPrints code:

  • customising the way the EPrints summary pages are rendered
  • writing your own command-line script to control EPrints in some way
  • writing a new CGI script (dynamic web page)
  • writing a plugin

Key Sections

  • Core API provides a basic introduction to using the EPrints API.
  • StyleGuide gives guidance on how to code in a compatible style.
  • Modules discusses the framework for Plugins you write for the community.

Principal Modules

This is a list of the principal modules in the EPrints API. These modules are the primary means of connecting to and manipulating objects in the repository.


EPrints::Repository

A repository is an EPrints archive with its own website, configuration and data. One install of the EPrints software can run several separate repositories, sharing code but with totally different configurations. Before EPrints 2.4 this was known as EPrints::Archive. This was changed to avoid confusion with the eprint status of "archive".

EPrints::Database

The connection to the MySQL (or other database) back end. Datasets are stored in the MySQL system, but you do not have to address it directly.

EPrints::DataSet

A dataset is a collection of items of the same type. It can be searched.

Some datasets share the same "config id". The "config id" is used to get information about the dataset from the archive config: inbox, buffer, archive and deletion all have the same metadata fields and types.

Core datasets are:

DATASET ID    COMMENT
eprint        EPrint records are the core of the system.
user          Users registered with the system.
subject       The subject tree.
document      Documents belonging to EPrints. Every document is part of an EPrint record.
subscription  Subscriptions made by users. Every subscription is a part of a subscription record.
history       Stores actions performed on records.

In addition to these datasets there are four virtual datasets: inbox, buffer, archive and deletion. These act just like "eprint" except that they are filtered to contain only records with the corresponding status.

Note that prior to 2.4 the "eprint" dataset was virtual, rather than "inbox", "buffer" etc. The history dataset was introduced in 2.4.
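
The relationship between the "eprint" dataset and the four virtual datasets can be pictured as a filter on status. The sketch below is illustrative only: it does not use the EPrints API, and the record layout and the eprint_status key are assumptions made for the example.

```python
VIRTUAL_DATASETS = ("inbox", "buffer", "archive", "deletion")


def virtual_dataset(eprints, status):
    """Return the records from `eprints` whose status matches `status`."""
    if status not in VIRTUAL_DATASETS:
        raise ValueError(f"unknown virtual dataset: {status}")
    return [rec for rec in eprints if rec["eprint_status"] == status]


# Hypothetical records standing in for the "eprint" dataset.
eprints = [
    {"eprintid": 1, "eprint_status": "inbox"},
    {"eprintid": 2, "eprint_status": "archive"},
    {"eprintid": 3, "eprint_status": "archive"},
]

archive_ids = [rec["eprintid"] for rec in virtual_dataset(eprints, "archive")]
print(archive_ids)  # [2, 3]
```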

EPrints::MetaField

A single field in a dataset. Each dataset has a few "system" fields which EPrints uses to manage the system, and then any number of archive-specific fields which you may configure.

EPrints::DataObj

The "super class" of subjects, users, eprints and documents etc. In the very core of the system these are all treated identically and much of the configuration and methods of these classes of "thing" are identical. We use the term item to speak about the general case.

type or user-type or eprint-type 
users, eprints and documents all have a "type". This controls how they are "cited" and also for users and eprints it controls what fields may be edited, and which are required.

EPrints::DataObj::Document

A document is a single format of an eprint, e.g. HTML, PDF, PS etc. It can contain more than one file, for example HTML may contain more than one html page + image files. The actual files are stored in the filesystem. Pre 2.4 this was known as EPrints::Document.

EPrints::DataObj::EPrint

An eprint is a record in the system which has one or more documents and some metadata. Usually, multiple documents are used to provide the same information in multiple formats, although this is not compulsory. Pre 2.4 this was known as EPrints::EPrint.

EPrints::DataObj::SavedSearch

Some software refers to this concept as alerts.

A stored search which is performed every day/week/month and any new results are then mailed to the user who owns the subscription.

This diagram does not show "Subscription". Subscription is a subclass of DataObj (like EPrint, User etc.). A subscription is associated with one User. A user is associated with 0..n Subscriptions.

EPrints::DataObj::Subject

A subject has an id and a list of who its parents are. There is a built-in subject with the id "ROOT" to act as the top level. A subject can have more than one parent to allow you to create a rich lattice, rather than just a tree, but loops are not allowed.
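
The "no loops" rule can be checked by following parent links and seeing whether any subject can reach itself. This is a sketch under stated assumptions rather than the check EPrints performs: the parents mapping and the has_loop function are hypothetical names introduced for illustration.

```python
def has_loop(parents):
    """Return True if any subject can reach itself by following parent links.

    `parents` maps a subject id to the list of its parent ids.
    """
    def reaches(start, current, seen):
        for parent in parents.get(current, []):
            if parent == start:
                return True
            if parent not in seen:
                seen.add(parent)
                if reaches(start, parent, seen):
                    return True
        return False

    return any(reaches(subject, subject, set()) for subject in parents)


# A lattice: "mathphys" has two parents, which is allowed.
lattice = {
    "ROOT": [],
    "physics": ["ROOT"],
    "maths": ["ROOT"],
    "mathphys": ["physics", "maths"],
}
print(has_loop(lattice))  # False

# A loop: "a" and "b" are each other's parent, which is not allowed.
print(has_loop({"a": ["b"], "b": ["a"]}))  # True
```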

EPrints::DataObj::User

A user registered with the system (not necessarily the author of the eprints they deposit). Pre 2.4 this was known as EPrints::User.

EPrints Configuration

This page deals with configuring the software.

See also: repository configuration

EPrints General Configuration

This section describes all the configuration files in the EPrints system which do not relate to any specific archive.

EPrints Configuration Directory

The general EPrints configuration directory is usually /opt/eprints2/cfg/ and contains the following files:

apache.conf 
This file is generated by generate_apacheconf. See the documentation of generate_apacheconf for more information.
auto-apache.conf 
This file is generated and overwritten by generate_apacheconf. Do not edit it directly. See the documentation of generate_apacheconf for more information.
auto-apache-includes.conf 
This file is generated and overwritten by generate_apacheconf. Do not edit it directly. See the documentation of generate_apacheconf for more information.
languages.xml 
This XML file contains an (exhaustive) list of all ISO language IDs and their names.
system-phrases-languageid.xml 
There is one of these files for each language needed by any archive in this system. These files contain the phrases needed to render the website and email in each language, not counting names of things like metadata fields, which vary between archives. They should not be edited by hand, but may be overridden. See the instructions on phrase files in the archive config documentation.
SystemSettings.pm 
Described below.

SystemSettings.pm

This is a Perl module which is created and edited by the EPrints installer script when installing or upgrading EPrints. It's found in perl_lib/EPrints/.

SystemSettings contains system specific things:

base_path 
The root directory of your eprints install. Normally /opt/eprints2/
executables 
A hash of the path of various external commands such as sendmail and wget.
invocation 
A hash of how eprints is to invoke various external commands. The variables with uppercase names - $(FOO) - are replaced with parameters from eprints, the lowercase names - $(sendmail) - are replaced with the strings in executables.
archive_formats 
An array of IDs of archive formats offered in the upload document page. For each there must be an entry in archive_extension and in invocation; $(DIR) is where EPrints wants the contents of the archive and $(ARC) is the archive file.
version_id  
The id of the current eprints version.
version  
The human readable version number.
user  
The UNIX user eprints will run as. Usually "eprints".
group  
The UNIX group eprints will run as. Usually "eprints".
virtualhost (Since v2.1) 
If this is set, it is used for the VirtualHostName in the Apache configuration files. (By default EPrints uses "*").
disable_df (Since v2.1) 
If this is set to 1 then this disables the parts of EPrints which use the df call (disk free). If the "configure" script tested the "df" command and found that it failed, then this option will initially be set to 1, otherwise 0.
enable_gdome (Since v2.2) 
If this is set to 1 then it enables the use of the XML::GDOME module, rather than XML::DOM. XML::GDOME is faster and less memory intensive but depends on a number of other libraries and modules which are not worth installing for a trial system.
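
The placeholder substitution described for "invocation" can be sketched as follows. This is an illustration in Python rather than the Perl that EPrints itself uses; the expand_invocation function, the paths and the command template are all hypothetical, and the real values live in SystemSettings.pm.

```python
import re


def expand_invocation(template, executables, params):
    """Replace $(name) placeholders in `template`.

    Lowercase names are looked up in `executables`,
    uppercase names in `params`.
    """
    def lookup(match):
        name = match.group(1)
        source = params if name.isupper() else executables
        return source[name]

    return re.sub(r"\$\((\w+)\)", lookup, template)


# Hypothetical settings, for illustration only.
executables = {"unzip": "/usr/bin/unzip"}
invocation = {"zip": "$(unzip) $(ARC) -d $(DIR)"}

command = expand_invocation(
    invocation["zip"],
    executables,
    {"ARC": "/tmp/upload.zip", "DIR": "/tmp/unpacked"},
)
print(command)  # /usr/bin/unzip /tmp/upload.zip -d /tmp/unpacked
```

The sketch does no shell quoting, so it should not be used as-is with untrusted file names.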

Repository Configuration

EPrints Archive Configuration

This section describes all the configuration files in a single archive in the EPrints system.

Primary archive configuration file

Once you have created an EPrints archive, the information you entered is placed in an XML file in /opt/eprints2/archives/ with the name archiveid.xml - this file is documented later in this section.

Archive configuration directory

The bulk of the archive configuration is copied from /opt/eprints2/defaultcfg/ into the archive's own configuration directory (usually /opt/eprints2/archives/archiveid/cfg/). This directory will usually contain the following files and directories:

apache.conf 
This file is generated by generate_apacheconf. See the documentation of generate_apacheconf for more information.
apachevhost.conf (added v2.2) 
This file is generated by generate_apacheconf. See the documentation of generate_apacheconf for more information.
ArchiveConfig.pm 
The general configuration items which don't fit anywhere else are in this Perl module. It is described fully later in this section of documentation. This module "requires" the other 5 Perl modules. They are in separate files to make them easier to get to grips with.
ArchiveMetadataFieldsConfig.pm 
This module configures the metadata fields and the default values.
ArchiveOAIConfig.pm 
This module configures how the archive exports itself via the Open Archives protocol.
ArchiveRenderConfig.pm 
This module contains subroutines which handle rendering the data into XHTML (mostly) for display as webpages.
ArchiveTextIndexingConfig.pm 
This module handles turning UTF8 text strings into lists of index words for free text searches.
ArchiveValidateConfig.pm 
This module contains subroutines which check the metadata for problems.
auto-apache.conf 
This file is generated and overwritten by generate_apacheconf. Do not edit it directly. See the documentation of generate_apacheconf for more information.
citations-languageid.xml 
One of these files for each languageid supported by this archive. These XML files describe how to turn metadata for an item into a citation (with markup). They are described fully later in this section of documentation.
entities-languageid.dtd 
One of these files for each languageid supported by this archive. These DTD files are generated automatically just before EPrints loads the archive's configuration and should not be edited directly.
metadata-types.xml 
This XML file describes the various types of eprints, users etc. and which metadata fields are required or relevant to each. It is described fully later in this section of documentation.
phrases-languageid.xml 
One of these files for each languageid supported by this archive. These XML files contain all the phrases which are specific to this archive, such as the titles of metadata fields. They are described fully later in this section of documentation.
ruler.xml 
This XML file just contains the horizontal divider used in webpages created by the system. It is described fully later in this section of documentation.
static/ 
This directory contains the data needed to create the static webpages such as the homepage, and about page. It is described fully later in this section of documentation.
subjects 
This file contains the initial subjects for the system. It is described fully in the documentation for import_subjects.
template-languageid.xml 
One of these files for each languageid supported by this archive. These XML/XHTML files describe the outline for webpages for this system. They are described fully later in this section of documentation.

XML Config Files in EPrints

This section contains some general information about the XML archive config files: template, phrases, ruler and citations. metadata-types.xml uses XML but these comments do not apply.

XHTML

These files use HTML elements (and other elements too). XHTML is a fairly new version of HTML which is backwards compatible with HTML 4 but written using XML, not SGML. This means that it is much stricter but less ambiguous and easier to parse and modify. Assuming you know HTML, the main differences are as follows:

All tags must be closed 
All elements must be closed, even ones such as <li>. Tags which do not have a close tag in HTML, like <br> or <img src="foo"> still must be closed eg. <img src="foo"></img> - this can be abbreviated as: <img src="foo" />
All tags and attributes must be lower case 
Self-explanatory.
Strict definition of what tags may appear within others 
Not actually checked by EPrints. It will let any rubbish past as long as it's valid XML. But that's no reason to be naughty.
All attributes must be wrapped in quotes 
In HTML the values of attributes do not have to be wrapped in quotes, but in XML (and therefore XHTML) they do.
All attributes must have a value 
In HTML some attributes do not require a value, for example <hr noshade>. In XHTML this is represented as <hr noshade="noshade" />.

So in summary, the HTML:

<img SRC=someurl>
<hr NOSHADE WIDTH=2>
<P>Foo bar</P>

should become in XHTML:

<img src="someurl" />
<hr noshade="noshade" width="2" />
<p>Foo bar</p>

And that's more or less it. See http://www.w3c.org/ for a complete description.
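
Since EPrints only needs these files to be valid XML, a quick well-formedness test can catch most of the mistakes listed above before the files are loaded. The sketch below uses Python's standard XML parser on the two examples from this section; the is_well_formed helper is a hypothetical name, and the check covers XML syntax only (it would not, for instance, complain about upper case tag names).

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if `fragment` parses as XML once wrapped in one root element."""
    try:
        ET.fromstring(f"<root>{fragment}</root>")
        return True
    except ET.ParseError:
        return False


html_style = '<img SRC=someurl>\n<hr NOSHADE WIDTH=2>\n<P>Foo bar</P>'
xhtml_style = (
    '<img src="someurl" />\n'
    '<hr noshade="noshade" width="2" />\n'
    '<p>Foo bar</p>'
)

print(is_well_formed(html_style))   # False
print(is_well_formed(xhtml_style))  # True
```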

Language-specific files

Phrases, templates and citations have one instance per supported language. This allows the system to generate pages and emails in more than one language. Supporting a new language will require translating all the English config files currently shipped. If you do intend to do this (lots of work!) please get in touch with the eprints admin so that we can avoid duplicated effort.

Extra Entities

The XML files all use a DTD which defines a few extra entities. Entities are items in XML (or HTML) which start with "&" and end with ";" like &amp;. These additional entities come from the entities DTD file created by generate_entities. One DTD is created per language, although currently the only variation is the archive name.

&archivename; 
The name of the archive in the current language.
&adminemail; 
The administrators email address.
&base_url; 
The base URL of the system (without a trailing slash)
&perl_url; 
The base URL of the CGI directory (without a trailing slash)
&frontpage; 
The URL of the system homepage.
&userhome; 
The URL of the user homepage.
&version; 
The current EPrints version.
&ruler; 
The XHTML of the standard divider.
Any XHTML character entity (since EPrints v2.1) 
You may now use any XHTML character entity, eg. &nbsp; &eacute; &euro;.
User configured entities 
You can generate your own entities by modifying the function which generates them in ArchiveConfig.pm

None of these entities are available in the citations file or the ruler file.

Name Spaces and XHTML

These files contain a mixture of custom tags and XHTML. To keep these distinct, the XML files contain a namespace definition in the first element. The practical upshot is that all of EPrints' own tags have the prefix "ep:". The namespace information is actually ignored by the current version of the EPrints system.

example of mixed tags (and entities):


<ep:phrase ref="lib/session:contact"><p>Feel free to contact 
<a href="mailto:&adminemail;">&archivename; administration</a> 
with details.</p></ep:phrase>
 
eprints elements: phrase
xhtml elements: p, a
eprints entities: adminemail, archivename
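
To see how the prefixed tags, the XHTML tags and the entities fit together, the phrase above can be parsed with a DTD that declares the two entities. This is only a sketch: the namespace URI for "ep:" and the entity values are made up for the example, and unlike EPrints (which ignores the namespace information) Python's parser does resolve namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and entity values, chosen only for this example.
EP_NS = "http://example.org/ep"
XHTML_NS = "http://www.w3.org/1999/xhtml"

document = f"""<!DOCTYPE phrase [
  <!ENTITY adminemail "archive-support@footle.edu">
  <!ENTITY archivename "White Lemur">
]>
<ep:phrase xmlns:ep="{EP_NS}" xmlns="{XHTML_NS}" ref="lib/session:contact"><p>Feel free to contact
<a href="mailto:&adminemail;">&archivename; administration</a>
with details.</p></ep:phrase>"""

root = ET.fromstring(document)

# The <a> element lives in the XHTML namespace, inside the ep:phrase element.
link = root.find(f"{{{XHTML_NS}}}p/{{{XHTML_NS}}}a")

print(root.tag == f"{{{EP_NS}}}phrase")  # True
print(link.get("href"))                  # mailto:archive-support@footle.edu
print(link.text)                         # White Lemur administration
```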

The Primary Archive Configuration File

This XML file appears in the archives/ directory, usually /opt/eprints2/archives/. It describes the most basic details about the archive. It is generated (and modified) by configure_archive and will not normally need to be edited.

EPrints looks in this directory for XML files and attempts to load them all when starting the webserver.

This file should be chmod'd so that it cannot be read by random users, as it contains the database password.

The top level element is "archive" which has the attribute "id" which is the id of the archive. It should be the same as the filename. If this file is foo.xml then the id should be foo.

<archive> contains a list of XML tags enclosing some text. eg.


 <host>stoatprints.org</host>

The following tags are expected in no special order:

<host> 
The hostname of this archive.
<alias redirect="yes-or-no"> 
This is optional and may be repeated. It has the attribute "redirect" which may be set to yes or no. This controls what virtual hosts are supported and if they should redirect to the main <host>.
<language> 
The ISO id of a language supported by this archive. Repeatable. One of these should also be the defaultlanguage. See below.
<port> 
The port number that the server is running on. Usually 80.
<urlpath> 
The directory from the root of the server name. Usually /
<archiveroot> 
The filesystem path of the rest of the archive configuration.
<configmodule> 
The path to the perl module which does the main configuration (ArchiveConfig.pm)
<dbname> 
The name of the MySQL database. Usually the same as the archive ID.
<dbhost> 
The host on which MySQL is running. Usually localhost.
<dbport> 
An optional MySQL port, if it's not the standard one. Should be empty if we are to use the default.
<dbsock> 
An optional MySQL socket. Should be empty if we are to use the default.
<dbuser> 
The username to use when connecting to MySQL, usually "eprints".
<dbpass> 
The password to use to connect to MySQL.
<defaultlanguage> 
One of the supported languages. This is the default for this archive.
<adminemail> 
The email address of the archive administrator. I strongly suggest that this is an alias rather than a personal email address. If all your webpages contain "bob@footle.edu" and bill takes over from bob you would have to regenerate every page with "bill@footle.edu". Much better to set up an email alias or forward from "archive-support@footle.edu" and point it at bob (for now). Heed these words spoken from grim experience!
<archivename language="langcode"> 
The name of the archive. This has an attribute "language" the value of which is an iso language id. There should be one of these archivename elements per supported language. eg.

   <archivename language="en">White Lemur</archivename>
   <archivename language="fr">La Archive d'Lemur Blanc</archivename>

(apologies to the french, human languages aren't my strong suit)

<securehost> (since v2.2) 
Used for experimental https support.
<securepath> (since v2.2) 
Used for experimental https support.

ArchiveConfig.pm

This module imports the other 5 perl modules. It allows lots of little tweaks to the system, which are all commented in the file.

It includes options to hide various features you may not want and to customise the browse, search and subscription functions.

Also you can customise what each type of user can and can't do, and how they authenticate their passwords.

This configuration file contains perl methods which are called when a session starts and ends, to log things, to generate the entities for the entities file, and to control security on non-public files.

Browse Views

The browse views are generated by the script "generate_views" and what that script does is configured by the "browse_views" item in the config.

It is a reference to a perl array [], each item of which is a hash {}.

The hash has 3 required properties and a number of optional ones.

id (required) 
The ID of this view - the view will be placed in a subdirectory of /view/ of this name. The ID is also used to identify the full name of this view in the phrase file. id=>"foo" would find its title in the phrase "viewname_eprint_foo"
fields (required) 
The list of the names of the fields to browse, separated by a slash "/". This should normally be a single field unless you want to merge the values of two fields. The id part of a field may be specified by appending ".id" to the fieldname.
order (required) 
A list of fields to sort by in order of priority, separated by slashes "/". A minus sign "-" prefixing the fieldname indicates reverse sorting on that field.
allow_null 
Should we make a page for the "unset" condition? A page for items which do not have a year set may be useful. But for other fields this may be meaningless. Set it to 1 for true.
include 
Generate a file for every value, ending in ".include" which contains the XHTML of the citations of records and the number of records, but without wrapping the site standard template around it.
nohtml 
Normally the system generates a page like that described for "include" with a .html suffix and the site template. If nohtml is set to 1 then it won't.
citation 
Normally the citation used is that for the "type" of eprint. If this is set then that citation (from the citations file) will be used for all items. This allows for some clever stuff if you want to make a page which can get sucked into another website.

Normally the system puts a paragraph tag around each citation, but if you use a custom citation this will not happen.

nocount 
Do not include the count of how many items at the top of the page.
nolink 
The system generates an index.html in /view/ with a list of all the browse views available. Setting nolink to 1 will hide this item.
noindex 
Do not generate an index.html file in /view/foo/ listing all the values of the view and linking to their respective pages.
notimestamp (since v2.2) 
Do not add the timestamp at the bottom of the view page.
hideempty (since v2.2) 
Only applicable to subjects. This option will suppress subjects which do not have any records in them. This is useful on "young" archives which look very empty if you have a large subject tree and only a few records, and those clustered in 3 or 4 subjects.

The most common view is to browse by subject:


{ id=>"subject", allow_null=>0, fields=>"subjects", 
   order=>"title/authors", hideempty=>1 }

A more complex view generates a view on author & editor ID's which are not advertised but may be captured by some other software to build staff CV pages.


{ id=>"person", allow_null=>0, fields=>"authors.id/editors.id", 
   nohtml=>1, nolink=>1, noindex=>1, include=>1, 
   order=>"-year/title" }

For my example person id "wh" this will generate a webpage called /view/person/wh.include (and one for each other value of authors or editors ID's) which can be captured by an external automated system.
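As a rough sketch of how the pieces fit together, the two views above could sit in one array alongside a simple browse-by-year view. The variable name $c for the configuration hash and the "year" view itself are assumptions made for illustration; check your own ArchiveConfig.pm for the exact place the "browse_views" item is set.

```perl
use strict;
use warnings;

# Sketch only: $c stands in for the archive configuration hash
# (the variable name is an assumption, not taken from the manual).
my $c = {};

$c->{browse_views} = [
    # Hypothetical year view: newest first, with a page for records
    # which have no year set. Its title would come from the phrase
    # "viewname_eprint_year".
    { id=>"year", allow_null=>1, fields=>"year",
      order=>"-year/title" },

    # Browse by subject, hiding subjects which have no records.
    { id=>"subject", allow_null=>0, fields=>"subjects",
      order=>"title/authors", hideempty=>1 },

    # Unadvertised per-person ".include" files for other software.
    { id=>"person", allow_null=>0, fields=>"authors.id/editors.id",
      nohtml=>1, nolink=>1, noindex=>1, include=>1,
      order=>"-year/title" },
];
```

Remember that the pages are only written when "generate_views" is run, so a change here has no visible effect until then.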

User Privs

The user permission configuration allows you to set what each type of user can and can't do. The user home page will only show a user the options which are available to them.

New types of user, and which data about themselves they can edit, are set in metadata-types.xml.

Permissions are set by "type" of user. By default there are 3 kinds of user: "user", "editor" and "admin".

Admin can, by default, do everything.

subscription (since EPrints v2.1) 
If included then this kind of user can create subscriptions.
set-password 
Reset their password via the web registration system.
deposit 
Submit items into the archive.
view-status 
View the archive status page.
editor 
User can edit then approve submitted items into the main archive, or delete them, or return them to sender. Also can remove items from the archive back into the edit buffer for corrections, and move records into the deleted table (delete them).
staff-view 
User can perform a "staff search" of user or eprint records and view ALL the metadata.
edit-subject 
User can edit the subject tree via the online interface.
edit-user 
User can edit other users' records.
change-email 
User can change their email address via the web interface. This is safer than allowing them to edit it directly as it ensures they cannot set it to an address at which they do not receive mail (it mails them a confirmation pin number).
change-user 
This allows the sinister feature which lets you log in as someone else. It still requires a password. This is useful if you want to perform admin tasks as a super user, then log-in as a normal user to deposit items.
no_edit_own_record (since v2.2) 
This suppresses the "edit my user record" option. This may be useful if you disable web-registration and import the user records from some other database.

ArchiveMetadataFieldsConfig.pm

Fields Configuration

Metadata is data about data: the information which we store to describe each record (eprint) in the system. Users also have metadata.

This module is the configuration for the metadata. This is probably the most important part of the system.

See the chapter on metadata for all the configuration options.

Defaults

This section of the file contains subroutines which are called to set default values for Users, Documents and EPrints.

Automatics

These functions let you set automatic fields. This allows you to make fields which are updated automatically each time the item (User/EPrints/Document) is committed to the database.

This allows you to create "compound" fields. Such fields are created by processing the values of other fields rather than being edited directly.

For example, if you wanted to make an automatic int field which contains the number of authors, you could add the following to set_eprint_automatic_fields:


# no authors at all will be undef, not [] so check first
if( $eprint->is_set( "authors" ) )
{
       my $auths = $eprint->get_value( "authors" );
       $eprint->set_value( "authcount" , scalar @{$auths} );
}
else
{
       $eprint->set_value( "authcount" , 0 );
}

ArchiveOAIConfig.pm

This module configures how the archive exports its data via the OAI protocol.

For more information on the how and why of OAI see http://www.openarchives.org/

OAI allows a harvester to request the metadata from your archive and other archives to provide a federated search. The next time the harvester harvests your archive it only has to ask for items which have changed or been added since the last time it asked.

The current version of EPrints supports OAI v2.0. OAI version one is no longer supported.

The base URL for your OAI v2.0 interface will be http://archivepath/perl/oai2

If you want to use the OAI system then you need to fill in the blanks, such as policy and the OAI-id of the archive.

You may create OAI sets in a similar manner to "browse views" in ArchiveConfig.pm.

If you want to change the way that an EPrint is mapped into Dublin Core then edit the make_metadata_oai_dc function, which returns a DOM XML object.

To add a new metadata type you need to add a new mapping function and add entries to the namespaces, schemas and functions items near the top of the file.

ArchiveRenderConfig.pm

This module contains functions which turn data into XHTML for displaying on the web.

If you want to change the way a user info page, or an eprint "abstract" page is rendered then here's the place to do it.

There are also "full" versions of these functions which display all the internal variables and things. These are the views which the editors and admin see.

The XHTML is generated using DOM (Document Object Model), but eprints provides some functions for easily generating XHTML DOM. The only method of DOM you should need to use is appendChild - which adds an element to this element.

EPrints API functions which return XHTML objects.

Note, all text strings should be in UTF-8.

Example:


my $page = $session->make_doc_fragment(); 
my $h1 = $session->make_element( "h1" );
$h1->appendChild( $session->make_text( "Title" ) );
$page->appendChild( $h1 );
$page->appendChild( 
   $session->make_element( 
      "img",
       src=>"/images/cheese.gif",
       width=>128,
       height=>53 ) );

$page now contains:


<h1>Title</h1><img src="/images/cheese.gif" width="128" height="53" />

Many of the EPrints modules are fully documented. For an example try running:


% perldoc /opt/eprints2/perl_lib/EPrints/Archive.pm

The functions most useful for extracting and rendering information are documented here:

$session->make_text( $text )  
Returns a DOM object representing that text.
$session->make_doc_fragment()  
Returns a document fragment. This renders to nothing but is a container to which you can add stuff.
$session->make_element( $name, %opts )  
Makes a simple XHTML element. %opts is an optional series of attributes.

To make <h1 class="foo">...</h1> you would call:

$session->make_element( "h1", class=>"foo" );

$session->render_ruler();  
Returns the default ruler for the archive (from ruler.xml).
$session->render_link( $uri, $target )  
Returns the XHTML element (with URI properly escaped):

<a href="uri"></a>

Which you can appendChild stuff into. If $target is specified then a target attribute is included - to make it pop up a new window.

$item->render_value( $fieldname, $showall )  
$item is either an EPrint, a User or a Document.

$fieldname is the name of the field you want to render. If $showall is 1 then ALL values are rendered in a multilang field.

$item->render_citation( $style )  
Renders the citation of the item using the citation for the item's type from the citation file.

If $style is set then it uses the citation with that id instead.

$item->render_citation_link( $style )  
This renders a citation as above, but links it to the url of the item.
$item->render_description()  
This renders a simple description of the item using the default citation for this dataset eg. for eprint it uses citation type "eprint".
$session->html_phrase( $phraseid, %opts )  
Returns the item from the phrase file. If you don't care about supporting multiple languages then just use make_text instead, it's easier.

It looks first in the archive phrase file for the current language, then in the archive phrase file for English, then in the system phrase file for the current language, and finally in the system phrase file for English. The %opts are a series of DOM elements to place in the "pin" items in the phrase file.
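The lookup order for a phrase can be illustrated with a small standalone sketch. This is not the EPrints implementation: the nested hash, the find_phrase function and the phrase ids are all hypothetical, and plain strings stand in for the DOM fragments the real function returns.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the phrase files, keyed by level
# ("archive" or "system") and then by language id. Each value just
# names the file it came from so the fallback is easy to see.
my $phrases = {
    archive => {
        fr => { greeting => "archive/fr" },
        en => { greeting => "archive/en", help => "archive/en" },
    },
    system => {
        fr => { contact => "system/fr" },
        en => { contact => "system/en", goodbye => "system/en" },
    },
};

# Try the archive files before the system files, and within each
# level try the current language before English.
sub find_phrase
{
    my( $files, $phraseid, $langid ) = @_;

    foreach my $level ( "archive", "system" )
    {
        foreach my $lang ( $langid, "en" )
        {
            my $text = $files->{$level}->{$lang}->{$phraseid};
            return $text if defined $text;
        }
    }
    return undef;   # phrase not defined anywhere
}

foreach my $id ( "greeting", "help", "contact", "goodbye" )
{
    print "$id: ", find_phrase( $phrases, $id, "fr" ), "\n";
}
```

With "fr" as the current language, each of the four ids in this sample data is answered by a different file, one for each step of the search order.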

Some other useful functions you may need

$item->get_value( $fieldname, $no_id )  
Returns the value of field $fieldname from the item. An optional second parameter may be set to 1 to return the value without the "id" part, to keep things simple.
$item->is_set( $fieldname )  
Returns true if the field is set on this object, false otherwise.
$eprint->get_all_documents()  
Return an array of the document objects belonging to this eprint.

ArchiveTextIndexingConfig.pm

You probably won't need to change this module unless you want to modify how eprints searches for words in strings.

When a record is added to the system, eprints uses this module to turn a string into a list of values which are indexed. By default these are words with 3 letters or more, except some predefined stop words. It also turns latin characters with acutes into their plain ascii (no acute/grave) versions.

It then does the same with the search string and looks for these keys.

Example:


The rain in spain falls mainly on the plains.

Is turned (by default) into the keys:


rain spain fall mainly plain

Thus searching for "rain" or "plain" or "plains" or "MaiNlY" will all match this string.

You may wish to add your own "stop words". eg. If you are running an archive about badgers, a search for the word "badger" will return almost all the records.

At a more complex level you may wish to add handling for non-european character sets (I have no idea how well the default setting will work on these), or do "stemming" - removing "ed", "ing", "ies", "s" etc. from the end of words so that "land" will match "land", "landed", "landing" and "lands". (It currently removes 's').

Another suggestion is using soundex or similar techniques to match words which sound similar.

Changing the indexing on a live system will require you to regenerate the indexes using the reindex script. (If you don't then some of the search results will be wrong).
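To make the default behaviour described above concrete, here is a minimal standalone sketch which reproduces the "rain in spain" example. It is not the code from the module: the function name index_keys and the short stop-word list are assumptions, the trailing "s" rule is deliberately crude, and the folding of accented characters is left out.

```perl
use strict;
use warnings;

# Assumed stop words for illustration; the real list is in the module.
my %STOP_WORDS = map { $_ => 1 } qw( the and for with from );

# Turn a string into the list of keys that would be indexed.
sub index_keys
{
    my( $string ) = @_;

    my @keys;
    # Lower-case the string and break it on anything that isn't a letter.
    foreach my $word ( split /[^a-z]+/, lc $string )
    {
        next if length( $word ) < 3;    # keep words of 3 letters or more
        next if $STOP_WORDS{$word};     # skip stop words
        $word =~ s/s$//;                # drop a single trailing "s"
        push @keys, $word;
    }
    return @keys;
}

print join( " ", index_keys( "The rain in spain falls mainly on the plains." ) ), "\n";
# prints: rain spain fall mainly plain
```

Because the search string is passed through the same function, "plains" and "MaiNlY" become the keys "plain" and "mainly" and so match the indexed record.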

ArchiveValidateConfig.pm

This module handles validating data entered by users. Each subroutine is described in more detail in the module itself.

Each subroutine returns a list of DOM elements, each of which describes a single problem. Any problems will prevent the user from continuing with editing until they correct them.

As with the rendering functions, if you don't care about making this work in more than one language then you can just make the DOM items by calling $session->make_text( "problem explanation" )

The eprint & document validation routines have a flag $for_archive which, if true, indicates that the item is being checked before going into the actual archive. You can use this to force an editor to enter fields which the user may leave blank.

Validation Functions

validate_field 
Called for all fields. Use it to check individual field values. By default checks that url's look OK.
validate_eprint_meta 
Check the metadata of an eprint. Use this to test dependencies between fields. eg. if you have a requirement that field "A" OR field "B" must be set.
validate_eprint 
Validate the whole eprint. The last part of the validation of an eprint.
validate_document_meta 
Validate the metadata of the document (as with eprint_meta)
validate_document 
Validate the whole document, files and metadata.
validate_user 
Validate a user record.
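The following sketch shows the general shape such a routine might take: one check for a dependency between fields, and one stricter check applied only when $for_archive is true. The argument order, the field names and the messages are assumptions, and the two Stub packages are not EPrints classes; they exist only so the sketch can be run on its own.

```perl
use strict;
use warnings;

# --- Stand-ins so the sketch runs outside EPrints (not real classes) ---
package StubSession;
sub new { my( $class ) = @_; return bless {}, $class; }
# The real make_text returns a DOM text node; a string will do here.
sub make_text { my( $self, $text ) = @_; return $text; }

package StubEPrint;
sub new { my( $class, %values ) = @_; return bless { %values }, $class; }
sub is_set { my( $self, $fieldname ) = @_; return defined $self->{$fieldname}; }

package main;

# Assumed signature: check the comments in the module for the real one.
sub validate_eprint_meta
{
    my( $eprint, $session, $for_archive ) = @_;
    my @problems;

    # Dependency between fields: at least one of the two must be set.
    unless( $eprint->is_set( "authors" ) || $eprint->is_set( "editors" ) )
    {
        push @problems, $session->make_text( "Please give an author or an editor." );
    }

    # Only insist on a year when the item is going into the main archive.
    if( $for_archive && !$eprint->is_set( "year" ) )
    {
        push @problems, $session->make_text( "A year is required before archiving." );
    }

    return @problems;
}

my $session = StubSession->new;
my $eprint = StubEPrint->new( editors => [ "wh" ] );
print scalar( my @found = validate_eprint_meta( $eprint, $session, 1 ) ), " problem(s)\n";
```

An empty list means the item passed. If multi-language support matters, the problem descriptions would come from the phrase file rather than from make_text.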

citations-languageid.xml

The citations file describes how to render an item (eprint/user/whatever) into a short piece of XHTML. Each citation has a "type". There are 3 kinds of citation:

default citation 
This is a very short description of the item. Usually "the title or failing that, the id". The type id is just the name of the dataset. eg. "eprint"
type citation 
These are richer descriptions which vary between type of eprint, user or document. The type id is dataset_type eg. eprint_preprint.
other citation 
Used by custom browse views. Any name you like.

The citation file contains a list of citation elements:


<ep:citation type="...">

Each one may contain text and tags. The text may also include the names of fields in the record being rendered. These names should be between @ symbols. eg. @authors@ or @title@. These will be replaced with a rendered version of the value in that field. (If you need an actual @ symbol for some reason, two @@ with nothing inside will be rendered as a single @.)

Note. The @title@ style was introduced in EPrints 2.2. Before that this file used XML entities such as &title; but this caused problems and didn't solve any. Use of entities is still supported, but deprecated.
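The @fieldname@ substitution can be pictured with a short standalone sketch. It works on plain strings rather than XHTML, and the function name fill_citation and the sample values are hypothetical; it only illustrates the replacement rule, including the @@ escape and the "UNSPECIFIED" text shown for an unset field.

```perl
use strict;
use warnings;

# Replace each @fieldname@ in a template with that field's value.
# An empty name (@@) gives a literal @, and a field with no value
# gives the text "UNSPECIFIED".
sub fill_citation
{
    my( $template, $values ) = @_;

    $template =~ s{\@([a-z_]*)\@}{
        $1 eq ""                ? "\@"
      : defined $values->{$1}   ? $values->{$1}
      :                           "UNSPECIFIED"
    }gex;

    return $template;
}

my %record = ( authors => "Smith, J.", year => 2003, title => "Cheese" );
print fill_citation( '@authors@ (@year@) @title@ @@', \%record ), "\n";
# prints: Smith, J. (2003) Cheese @
```

In a real citation you would normally wrap optional fields in the conditional elements so that "UNSPECIFIED" never appears.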

In addition you may use XHTML elements and the following elements in the eprints namespace. These elements are always removed but they control whether their contents are kept or not. Conditional elements may be placed inside each other since v2.2.

<ep:linkhere>  
This element is replaced with an XHTML anchor linking to the item. If this citation is being rendered without a link then it is just removed (but not the contents).
<ep:iflink>  
The contents of this element are only preserved if we are rendering this citation as a link. Maybe an icon which you don't want if it's not a link.
<ep:ifnotlink>  
The opposite of iflink.
<ep:ifset ref="fieldname">  
The contents of this element are only preserved if the field "fieldname" has a value.
<ep:ifnotset ref="fieldname">  
The contents of this element are only preserved if the field "fieldname" does not have a value.
<ep:ifmatch name="fieldname(s)" value="searchparam">  
This is the swiss army knife of the world of conditional rendering. It is also a bit complicated, and few people will need to use it. This actually works like a single search element. The attributes are:
name 
This is the name of one or more fields, specified as in the search fields configuration. eg. "title/abstract"
value 
This is a value to search for. Treated like the value entered in a search field.
merge (optional) 
Can be ANY or ALL. Works like the match all? in a search form.
match (optional) 
Can be IN, EQ, or EX. In, Equal or Exact. Exact on subjects means that subject, but not any below it in the hierarchy.

For example:

@year@<ep:ifmatch name="year" value="-1949"> (approx)</ep:ifmatch>

This will render (approx) after years before 1950. Neat eh?

<ep:ifnotmatch name="fieldname(s)" value="searchparam">  
Like ifmatch but only includes the values inside if the search does not match.

metadata-types.xml

This file allows you to configure the types of eprint, user, document and document security level.

When you add a new type you should add its name to the archive phrases file(s). The phraseid is "dataset_typename_typename" eg. "document_typename_pdf", and you should add a new citation to the citations file. Any fields which are not required but appear in the citation should probably be inside a <ep:ifset> so that you don't see "UNSPECIFIED" if they are not, er, specified.

The main element is "metadatatypes". This contains a list of "dataset" elements each of which has a name attribute.

The "type" elements in user and eprint "dataset"s should contain a list of "field" elements. This describes the fields which may be edited for this type and the order that they appear on the form.

You may include system fields in this list, but be careful if you do.

Multi-page metadata (2.3.0+)

You may optionally add <page name="pagename" /> elements to the field list. These break the submission process into smaller stages. The pagename is used to identify the sub-page, for purposes of validation etc. Pages only have an effect on eprint types, not user, document etc.

See the section on paged metadata.

Attributes for "field" element

name (May not be omitted) 
The name of the metadata field.
required 
If set to "yes" then this field may not be left blank. Some system fields are always required no matter how this is set.
staffonly 
This field only appears on the "editor" edit eprint form, not the user one. Or, in the case of the user dataset, the staff edit-user page.

The "security" dataset

This is a handy place to define the security levels. The type with no name is special. It is the "public" security type. All other types will require a valid username and password. Whether that username is acceptable for a given document is decided by the can_user_view_document subroutine in ArchiveConfig.pm

The "document" dataset

By default eprints requires at least one of ps, pdf, ascii or html to be uploaded before an eprint is valid. You may change this list in ArchiveConfig.pm - any more complicated conditions will have to be checked in the eprint validation subroutine.

phrases-languageid.xml

This file contains a list of XML "phrases". Everything eprints "says" to users is stored in this file and its system-level counterpart. If you want the site to run in more than one language, you need one phrase file per language.

The phrase file is XML and contains a toplevel "phrases" element. This contains the list of phrases.

Each phrase has a "ref" attribute to identify it and contains text and optionally some XHTML tags. It may also contain eprints entities such as &archivename; and also some phrases should contain "pin" elements, described below.

The phrases in the archive phrase file are specific to that archive, the system phrase file contains non-archive specific phrases. The id's of most of the phrases in the archive phrases are generated from the id's of the fields, datasets, types etc.

The archive phrase file contains: names of dataset types, names of metadata fields, help on entering each metadata field, the names of options in "set" fields, the description of different search ordering options, names of browse views, phrases used in the render and validation routines, mail which eprints sends out and phrases which override those in the system file.

pins

Some phrases need some "pin" elements to show eprints where to insert values. Usually pins don't contain any elements but occasionally they do when they represent what to place a link around.

Overriding System Phrases

If you don't like some of the phrases in the main system phrases file you can override them by creating a phrase with the same "ref" in the archive file.

Don't edit the system file: if you upgrade eprints to a newer version it will be overwritten.

Emails

EPrints sends out emails when a user registers/changes their password, when a user changes their email, when a deposited item is rejected/deleted by an editor and when the system is low on resources. These mails can be customised in the phrase file.

Make sure you wrap your text in paragraph <p> tags. EPrints will automatically word wrap these in the email.

<hr /> elements in a mail are turned into a line of dashes.

When eprints sends a mail it will send it as plain ASCII text, unless it contains latin-1 elements, in which case it will be latin-1 encoded. If it contains unicode characters not in the latin-1 charset then it will be utf-8 encoded.
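The choice of encoding described above amounts to picking the smallest character set which can hold every character in the message. A minimal sketch, assuming the text is already a Perl character string; the function name mail_charset and the exact charset labels are assumptions, not taken from the EPrints source.

```perl
use strict;
use warnings;

# Pick the narrowest character set that can represent the text.
sub mail_charset
{
    my( $text ) = @_;

    # Nothing above code point 127: plain ASCII is enough.
    return "us-ascii" unless $text =~ /[^\x00-\x7F]/;

    # Nothing above code point 255: latin-1 can hold it.
    return "iso-8859-1" unless $text =~ /[^\x00-\xFF]/;

    # Otherwise fall back to UTF-8.
    return "utf-8";
}

print mail_charset( "Your deposit was rejected." ), "\n";
print mail_charset( "Caf\x{e9}" ), "\n";
print mail_charset( "\x{3b1}\x{3b2}" ), "\n";
```

The three sample strings are plain English, a word with an e-acute, and two Greek letters, which fall into the three cases in turn.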

ruler.xml

This file configures the horizontal divider which eprints uses, which is inserted in place of &ruler;

If you have no great dislike of <hr /> horizontal rulers then you can leave it alone.

You can't use entities like &frontpage; in ruler.

The static/ directory

This directory contains the static pages for the site - the frontpage, the help pages, images, the stylesheet etc.

static/ contains one directory per language (eg. en), plus a general directory which contains files which don't need translating, like images and the stylesheet.

When you run the generate_static command it copies the files for each language, and the general directory, into the static site for that language.

See the generate_static documentation for more details.

subjects

This file is not used by the core eprints system. It is used by import_subjects to set up the initial subjects. For more information see the instructions for import_subjects.

template-languageid.xml

This file is the shell of every page in the system. It is more or less a normal XHTML page but you can use the eprints &foo; entities in it and it should contain "pin" elements like a phrase. The pins it should contain are:

<ep:pin ref="title" />  
This is where to put the title of the page. It can be used more than once - in the title in the page header and somewhere in the body. If placing it in the title in the head of the page you must use the additional attribute textonly="yes" which only works here. It removes images from the title (which can happen if using the "Latex" mode).
<ep:pin ref="head" />  
This goes somewhere in the head of the page. It shows eprints where to insert the "meta" and "link" elements.
<ep:pin ref="pagetop" />  
This goes at the top of the body. It is sometimes used as a "target".
<ep:pin ref="page" />  
Where to place the bulk of the content of the page.

Metadata

Metadata Field Types

There are many different types of metadata field. The type controls how a field is rendered, indexed, searched and so forth. A field always has a type and a name property, and usually has several more. Most properties are documented on this page, but some properties are only available to certain types of field, and they are listed on the page for that field.

Some of these subclasses provide very rich features, others very simple. For example the url field works just like the text field except that it's only valid if it looks like a url and when rendered it is a hyper-link.

A metadata field describes one field of data in one type of Data Object. For example the "title" field of an EPrint Object or the "email" field in a User Object.

Every Data Object has system fields (which are set by the system, and not alterable), but the User Object and EPrint Object have additional fields which are configured on a per-repository basis.

These can be customised in the user_fields.pl and eprint_fields.pl files. Note that changing these files does not automatically modify the underlying database so should (generally) only be done before the database is created. Some metadata properties do not affect the database, and are marked as such.

If you add or remove fields, or modify a property which affects the database then you'll need to alter the database to match. In 3.0 this must be done by hand, but we have plans to build a tool to do this for you.

Default values marked *config indicate that the default value for the repository may be modified in the configuration file field_property_defaults.pl


Inheritance

This is the list of useful field types. Below it are listed the other field types, which are included just for completeness and are not intended to be used as part of the configuration.

Some field types inherit the properties of another, and then modify them in some way. For example the namedset field works like a set field except that it gets its options from a namedsets file not from the options=>[] in the field properties.

  • Basic metadata field - this is abstract, fields must be one of the types listed below...
    • Boolean - TRUE or FALSE (or can be unset, of course)
    • Compound - virtual field, joins together several "multiple" fields, e.g. author_name and author_email
      • Multilang - allows language variants of a field, e.g. titles in French, German and/or English.
    • Date - stores a date
      • Time - stores a date and time
    • Float - stores a floating-point value
    • Int - a positive integer value
    • Search - a serialised search
    • Set - a limited set of options
      • Namedset - like a normal set, but takes its options from a namedset configuration file.
      • Subject - possible values are taken from the Subject hierarchy.
    • Text - the basic text field. Maximum 255 bytes. nb. UTF-8 means some characters take more than one byte.
      • Email - an email address
      • Longtext - like text but allows much longer text (65,000 bytes)
      • Name - Stores a person's name broken up into logical parts.
      • Url - stores a URL

Internal-use and Deprecated Field Types

  • Basic metadata field
    • File - virtual field representing the files in a document
    • Id
    • Int
      • Year - deprecated (do not use)
    • Search - a serialised search
    • Set
      • Arclanguage - as for set, but the options are the valid languages of this repository
      • Fields - as for set, but the options are the fields in a dataset.
      • Langid - used internally by multilang fields to store the language id.
    • Subobject - a virtual field, similar to itemref, but representing an object or objects which are sub-parts of the current object (as opposed to just related in some way)
    • Text
      • Fulltext - virtual field used to represent the full text of an eprint
      • Secret - used to store passwords

Properties

Note that true/false properties use 1 and 0 to indicate their setting.

Some properties can be temporarily set or overridden by the Workflow Format and Citation Format files.

Core Properties

name default description
name n/a This property is always required (except on sub-fields of compound fields). This property affects the database structure. This is the internal name of the field. It should only contain a-z and underscores. It will be used to identify this field in scripts, other configuration files, in the database, and in the XML export/import system, etc. It must be unique within the object (so the EPrint Object can't have two fields called "email", but the EPrint Object and User Object could each have a field of the same name).
type n/a This property is always required. This property affects the database structure. This sets the type of the metafield, which in turn affects what other properties it may have. The value must be one of the metafield types listed above.
multiple 0 This property affects the database structure. This indicates if this field is a single value or a list of values. eg. "title" is only a single longtext field but "creators" is a multiple name field. In the database a non-multiple field is stored in one (or more) columns in the main object table, but a multiple field gets its own table.
sql_index 1 When the database is created this property indicates that an SQL index should be created to speed searching. Different field types override the default value with the sensible option for that type of field. It's not worth putting an SQL index on a field that is only ever searched for words in it (like title or abstract) but it is worth indexing fields whose values are explicitly searched for, or where ranges are searched - date fields, set fields etc. It's unlikely you'll need to set this by hand. You could change it after the database has been created; it won't break anything. In fact, it won't do anything at all.
sub_name undef This property affects the database structure. This is a special property which is required INSTEAD of the name field for the sub fields inside compound fields. The actual name of these fields is then forced to be parent field name+"_"+sub_name. For example in compound field "creators" is a sub field with sub_name "name". In this case the actual name of the name field in the system, database etc. is creators_name.
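As an illustration of the core properties, a pair of field definitions might look like the sketch below. The $c variable, the {fields}->{eprint} location and the "fields" property used to list the sub-fields of a compound field are assumptions here; compare with the eprint_fields.pl shipped with your repository before copying anything.

```perl
use strict;
use warnings;

# Sketch only: $c stands in for the repository configuration hash.
my $c = {};

$c->{fields}->{eprint} = [
    # A single-valued field, stored in the main object table.
    { name=>"title", type=>"longtext", multiple=>0 },

    # A multiple compound field, which gets its own table.
    { name=>"creators", type=>"compound", multiple=>1,
      fields=>[
          # Sub-fields use sub_name instead of name; the actual
          # names become creators_name and creators_id.
          { sub_name=>"name", type=>"name" },
          { sub_name=>"id", type=>"text" },
      ] },
];
```

Since name, type, multiple and sub_name all affect the database structure, changes like these are best made before the database is created.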


Rendering Properties

These properties affect how values of the metadata in this field are rendered.

Certain of these properties can be turned on temporarily by the Citation Format files - render_magicstop for example.

name default description
browse_link undef This is the name of a view which values of this field should be linked to. For example if there was a browse by publishers view configured named "pubs", then adding browse_link=>"pubs" to the publisher field would cause it to be linked to the page for the named publisher whenever it is rendered.
render_quiet 0 Normally if a field is rendered and it isn't set, it is rendered as a big ugly "UNSPECIFIED". Setting render_quiet on a field means it just gets rendered as nothing if it's empty.
render_magicstop 0 If true then this renders a full stop at the end of this field, unless the last character is a dot, question mark or exclamation mark. This helps avoid the ugly "World without Cheese?." effect you get when titles end in ? or !.
render_noreturn 0 If true then all CRs and LFs are turned into normal spaces.
render_dont_link 0 Set this to true to stop this field hyperlinking itself when rendered. Currently only affects url fields and email fields.
render_single_value undef The value of this property is the name of a subroutine to call to render values from this field. For a multiple field this is called once per value in the list of values. The function should take the following parameters: ( $session, $field, $value, $object). It should return an XHTML DOM object of the rendered value.
render_value undef As with render_single_value, but this gets passed the entire list of values (an array reference) if it's a multiple field. Parameters passed are: ( $session, $self, $value, $all_langs, $no_link, $object ). $all_langs indicates that all language variants should be shown - only really useful for multilang fields. $no_link being true is a request to place no hyperlinks in the resulting HTML. It should return an XHTML DOM object of the rendered value.
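
As a rough illustration of render_single_value, the sketch below shows a callback that renders each value in upper case. The subroutine name render_shouting_value and the field it is attached to are hypothetical; make_text is the session method that wraps a string in a DOM text node. Treat it as a sketch under those assumptions, not a definitive implementation.

```perl
use strict;
use warnings;

# Hypothetical callback for the render_single_value property.
# It is called once per value with ( $session, $field, $value, $object ).
sub render_shouting_value
{
    my( $session, $field, $value, $object ) = @_;

    # Defensive: render an unset value as an empty text node.
    return $session->make_text( "" ) if !defined $value;

    # make_text() wraps a plain string in a DOM text node.
    return $session->make_text( uc( $value ) );
}

# Assumed field definition (the field name "publisher" is illustrative only):
# { name => "publisher", type => "text", render_single_value => "render_shouting_value" },
```

Because the callback only builds and returns a DOM node, it can be exercised with any object that provides make_text.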

Input and Validation Properties

name default description
required 0 If this is set to true then the field is always marked as required, no matter what the workflow says.
input_add_boxes 2 *config The number of rows to add when clicking the "more rows" button in a multiple or multilang field.
input_boxes 3 *config The number of input rows to initially show in a multiple field.
input_cols 60 *config The number of columns in a text input field.
input_rows 10 *config For longtext input fields, the number of rows of input to show. For set fields this is the number of items to show in a select menu.
input_lookup_url undef The URL to use for autocompletion. This is generally set using the workflow configuration rather than directly in the field configuration. The URL must be on the same server hostname as the repository.
input_lookup_params undef Additional parameters to pass to the input_lookup_url. For example an indication of which autocomplete file to use.
input_ordered 1 This is true by default. In some multiple fields, such as creators, the order of the values is important and by default numbers are shown to the left of input rows and to the right are "move up" and "move down" arrows. However, with some multiple fields the order is not important in which case you can set this to zero to stop the arrows and numbers being shown.
render_input undef The name of a subroutine which will render the input for this field. This is a bit tricky to use as it must return the same CGI parameters as the default input form would have. It's easiest on simple fields. The subroutine is passed the following parameters ( $field, $session, $current_value, $dataset, $staff, $hidden_fields, $object, $basename ). It should return the XHTML DOM object of the chunk of HTML form.
maxlength 255 This is a limit to the maximum allowed size of a value. It may be useful, for example, as a very simple validation check. Also it may confuse users to be allowed to type in 255 characters in a "postcode/zipcode" field.
toform undef This function is allowed to modify the current value which appears in the form. For example, if your database stores userids in a field, but you want to allow people to edit them as usernames, then this function can be used to take the current value (a userid) and return the associated username. This value is what appears in the field in the search form. It is passed ( $value, $session, $object, $basename ) and returns the user-facing version of $value. Note: as of version 3.3.13 this only gets passed $value and $session in Metafield.pm.
fromform undef The inverse of toform. This takes the value from the form and converts it into the value that will be stored in the database. It is passed the parameters ( $value, $session ) where $value is the value entered on the web form, and the return value is the value to be stored in the database. This function is not called when editing the eprint is cancelled.
help_xhtml undef This can only be set via the Workflow Format configuration not via the metadata field directly. It is used to contain the XHTML to use as the help for this field. This is so that the workflow can conditionally change the help on a field.
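
The userid/username example given for toform and fromform could be sketched roughly as follows. The two lookup hashes are hypothetical stand-ins for a real user lookup, and the subroutine names are made up for illustration; only the ( $value, $session ) parameters come from the descriptions above.

```perl
use strict;
use warnings;

# Hypothetical in-memory lookup standing in for a real user lookup.
my %USERNAME_OF = ( 1 => "alice", 2 => "bob" );
my %USERID_OF   = reverse %USERNAME_OF;

# toform: stored value (userid) -> value shown in the form (username).
sub userid_toform
{
    my( $value, $session ) = @_;
    return undef if !defined $value;
    # Fall back to the raw value if the userid is unknown.
    return exists $USERNAME_OF{$value} ? $USERNAME_OF{$value} : $value;
}

# fromform: value typed into the form (username) -> value to store (userid).
sub userid_fromform
{
    my( $value, $session ) = @_;
    return undef if !defined $value;
    # Unknown usernames yield undef, i.e. nothing is stored.
    return $USERID_OF{$value};
}
```

The two functions are deliberately inverses of each other for known users, so a value that is displayed and then saved unchanged is stored exactly as it was.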

Ordering, Indexing and Searching

name default description
text_index 0 If set to true then the indexer considers this field for full text indexing. Otherwise not. Some types of metadata field have a default of true, for example text and longtext.
search_cols 40 *config How many columns (characters) wide the input field for searching this type of field is. If one search field searches more than one field then the properties from the first listed field are used.
make_single_value_orderkey undef The orderkey is the (potentially language specific) string used to order by this field. This property allows you to define a method to override the default eprints orderkey generation. This property is passed each value from multiple fields, in turn. It is passed ($field, $value) and returns an ordervalue string.
make_value_orderkey undef As with make_single_value_orderkey but this is passed the array reference for a multiple field rather than just single values. It should return the orderkey string for the entire value. It is passed ( $field, $value, $session, $language_id ).
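
A make_single_value_orderkey callback might look something like the sketch below, which orders titles without regard to case or a leading English article. The subroutine name is hypothetical and the article list is English-only; a real repository would need to consider other languages.

```perl
use strict;
use warnings;

# Hypothetical make_single_value_orderkey callback: called with
# ( $field, $value ), returns the string used when sorting by this field.
sub title_orderkey
{
    my( $field, $value ) = @_;
    return "" if !defined $value;

    my $key = lc $value;
    $key =~ s/^\s+//;               # ignore leading whitespace
    $key =~ s/^(?:the|an|a)\s+//;   # ignore a leading English article
    return $key;
}
```

With this key, "The Living" sorts under "living", while "Another Day" is left alone because "an" is only stripped when followed by whitespace.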


Other Properties

name default description
can_clone 1 If this is set to false then this field is not copied when the object is cloned. This is mostly used by system fields such as "dir" or "datestamp".

show_in_fieldlist 1 Set this to false to prevent this field appearing in field lists. This is primarily to allow you to remove it from the list of fields in the user configuration which are used to control which fields appear as columns in the Items and Review screens.
show_in_html 1 If set to false then this field is not shown in the Details tab of the eprint control page. This is mostly used to hide confusing internal system fields like "dir".
export_as_xml 1 Set this to false to prevent the field being exported in the XML export. This is handy to suppress either confidential or confusing fields, like the fileinfo system field.
import 1 If set to false then new eprints can't be created with this value. For example "eprintid", "dir" and so forth have this set to false. This can also prevent fields being set when import tools are used.


Internal Properties

These are set by the system. Editing them by hand will do strange things.

name default description
parent_name undef On subfields of compound fields this is set automatically to be the name of the parent field.
confid *special This is set to the id of the dataset that this field belongs to and is used to work out what phrase ids etc. it uses.


Deprecated or Buggy Properties

Don't use these!

name default description
input_advice_right undef Do not use.
input_assist undef Do not use.
requiredlangs [] Do not use.
allow_null 0 Do not use. Planned for use with compound fields, but not implemented in 3.0.


Required Phrases

These are phrases which you need to define in the local repository phrases file to control how this field renders. Some types of field (eg. set fields) need further phrases in addition to the ones listed below.

The actual name of the field, as it will appear to users is stored in

datasetid + "_fieldname_" + fieldname

The default help to display, when the field is being input, is stored in

datasetid + "_fieldhelp_" + fieldname

For example:

   <epp:phrase id="eprint_fieldname_abstract">Abstract</epp:phrase>
   <epp:phrase id="eprint_fieldhelp_abstract">A summary of the item's content. 
      If the item has a formal abstract then that is what should be entered 
      here. No complicated text formatting is possible.</epp:phrase>


Database

Most fields have a representation in the SQL database using one or more columns. The sub-pages for each field type give the details.


API

When you request (or set) a value of a metadata field, it is usually handled as a perl scalar (which is a string or number).

ALL values passed around in the API should be encoded in utf-8 or BAD THINGS may happen.

For example,

$eprint->set_value( "title", "For Us, The Living" );

Sets the title to the given string.

my $foo = $eprint->get_value( "title" );

Sets $foo to the string of the title, eg. "For Us, The Living".

Multiple Fields

If a field is set to multiple, then instead of a single value, a reference to an array of values is used. Eg.

$eprint->set_value( "corp_creators", [ "Jims Research", "Jones Research" ] );
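
Reading a multiple field back works the same way in reverse: get_value returns an array reference. The helper below is a small sketch of handling that; the subroutine name is hypothetical, and $eprint is assumed to be any object offering get_value as in the examples above.

```perl
use strict;
use warnings;

# Joins the values of a multiple field into one string.
# $eprint is assumed to provide get_value(), as in the examples above.
sub join_multiple_value
{
    my( $eprint, $fieldname ) = @_;

    # For a multiple field get_value() returns an array reference.
    my $values = $eprint->get_value( $fieldname );
    return "" if !defined $values;

    return join( "; ", @{$values} );
}
```

For the corp_creators value set above, this would give a single string with the two names separated by "; ".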

Other Exceptions

See the specific page for the full details.

Functions

Multi Page Metadata Input (v2.3.0+)

If you want to split the metadata input into more than one page, you can, by adding <page name="foo" /> elements in between <field> elements in metadata-types.xml.

The "name" attribute is used so that EPrints knows which page it's currently on. It can also be used to define a custom title for a page of fields, and to specify validation requirements for that page.

Metafield input page name

Eg. The title of a metadata input page is taken from the phrase "metapage_title_pagename". It may have any of the following pins:

type 
The type of the current submission. Article, Book, or whatever.
eprintid 
The ID number of the current submission.
desc 
The short description of the item. Usually the title.

Per-page Validation

The simple validation will be checked for each field on the sub page. This means that an invalid URL will raise a problem and not let the submitter continue. However if you have a more complex validation issue, such as an exclusion or a co-dependency, you will need to edit the ArchiveValidateConfig.pm config file, and edit this subroutine:


sub validate_eprint_meta_page
{
       my( $eprint, $session, $page, $for_archive ) = @_;

       my @problems = ();

       return @problems;
}

The options are as for validate_eprint_meta except that $page is the sub-page to validate. @problems should be an array of XHTML objects describing any problems with the data submitted for that page.
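
As a sketch of a co-dependency check, the version below reports a problem when a place of publication is given without a publisher. The page name "pubinfo" and the field names "publisher" and "place_of_pub" are hypothetical and should be replaced with your own. make_text is used for brevity; a real configuration would more likely build the message from a phrase so that it can be translated.

```perl
use strict;
use warnings;

# Sketch of a per-page check. The page name "pubinfo" and the field names
# "publisher" and "place_of_pub" are hypothetical; substitute your own.
sub validate_eprint_meta_page
{
    my( $eprint, $session, $page, $for_archive ) = @_;

    my @problems = ();

    if( defined $page && $page eq "pubinfo" )
    {
        my $publisher = $eprint->get_value( "publisher" );
        my $place     = $eprint->get_value( "place_of_pub" );

        my $has_place     = defined $place && $place ne "";
        my $has_publisher = defined $publisher && $publisher ne "";

        # Co-dependency: a place of publication needs a publisher.
        if( $has_place && !$has_publisher )
        {
            push @problems, $session->make_text(
                "A place of publication was given without a publisher." );
        }
    }

    return @problems;
}
```

Pages other than "pubinfo" return an empty list, so the submitter is only stopped on the page where the two fields are entered.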

Submission Customisation XX

Filters XX

Searches XX

OAI XX

Latest Tool XX

Metadata Field Render Options (v2.3.0)

Render options are settings for a metadata field which control how it is rendered (but nothing else). Some render options are only meaningful for certain types.

Setting in Metadata Fields Configuration

Render options can be specified as properties of a metadata field in ArchiveMetadataFieldsConfig.pm in which case they apply to that field (unless overridden). In this case they are a hash reference, for example:


{ name => "creators", type => "name", render_opts=>{ order=>"gf" } },

This sets the "order" render option of the creators field to be "gf".

Setting in views and citations

Render options can also be specified in views and citations, if you want them to apply only in the given view or citation. For example, in citations:


@title;magicstop@

Magicstop is a boolean option so this is the same as saying:


@title;magicstop=1@

In views you can use


"some_date_field;res=year"

To make a view that browses by the values of a date field as if it were a "year" field.

Available options

Boolean options with no value default to true (1).

magicstop 
Boolean. Applies to text and longtext fields. If true then render the value with a full stop on the end unless the value already ends with "." "!" or "?". Handy for getting citations right.
noreturn 
Boolean. Applies to text and longtext fields. Turns all Carriage Return and Line Feed characters into whitespace. Handy when you have authors entering titles with linebreaks in which should only be displayed under some circumstances.
order 
"gf" or "fg". Applies to name fields. Override how this name field will be rendered. Either "given-name family-name" or "family-name, given-name".
quiet 
Boolean. If true and the value is not set, don't print the ugly "UNSPECIFIED", just print an empty string.
res 
"day", "month" or "year". Default is "day". Applies to date fields only. Resolution at which to deal with the dates. @foo;res=year@ will always render just the year part of the "foo" field.

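To make the effect of the magicstop and noreturn options concrete, the sketch below mimics their documented behaviour on a plain string. It is illustrative only: the subroutine name is hypothetical and this is not the code EPrints itself runs, which works on rendered field values, not bare strings.

```perl
use strict;
use warnings;

# Illustrative only: mimics the documented behaviour of two render options
# on a plain string; it is not the code EPrints itself runs.
sub apply_render_opts
{
    my( $text, %opts ) = @_;

    # noreturn: every CR and LF becomes a space.
    $text =~ tr/\r\n/ / if $opts{noreturn};

    # magicstop: add a full stop unless the text is empty
    # or already ends in . ! or ?
    if( $opts{magicstop} && length( $text ) && $text !~ /[.!?]\z/ )
    {
        $text .= ".";
    }

    return $text;
}
```

For example, apply_render_opts( "World without Cheese?", magicstop => 1 ) leaves the title unchanged, while a title with no closing punctuation gains a full stop.
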
Trouble Shooting

This section covers some things which can go wrong and why. If you have a suggestion for this section, let us know!

It will grow as people suggest new problems and solutions. Check the http://www.eprints.org/ website for the latest version.

Installation of EPrints and Required Software

Apache Crashes with a segmentation fault

Possible cause: apache linked against "expat" library. If you did not install apache from source then it is possible it was linked against the "expat" library. The problem arises because apache is also linked against mod_perl, and when we use the XML::Parser module, that is also linked against expat. Two copies of expat in one apache make it seg-fault.

(Under SuSE Linux) Apache has problems compiling the mod_rewrite module

With an error something like:


In file included from mod_rewrite.c:93:
mod_rewrite.h:133: ndbm.h: No such file or directory

Possible cause: Missing the ndbm library which is required (for some reason).

Solution: It comes as part of gdbm which is free. If working from a package you need gdbm-devel to get the header files (.h files).

(under debian sarge with apache2) Apache::const can't be located correctly when executing ./configure.

Solution: execute

export PERLLIB=/usr/lib/perl5/Apache2/

in a sh environment before ./configure.

Setting Up and Configuring a New Archive

System gives a "500 Internal Error" when viewing advanced search or submitting a document

Possible cause: No Subjects, Bug in code.

Solution: Run generate_subjects

If this fails: Look at your apache error log for clues. If reporting a bug, include the errors from the apache error log (often, but not always, found at /usr/local/apache/logs/error_log).

General

Solution: Build apache following the detailed instructions in the "required software" section of the documentation.

Changes to the configuration didn't appear on the website

Possible cause: Several.

Solution: Rebuild everything by re-running (for the archive in question) generate_static, generate_views, generate_apacheconf then stop and start apache. generate_abstracts can take a long time, so don't run it unless you want to update the abstracts themselves.

Browse View page gives a "404 not found" error or fails to update.

Possible Cause: You didn't run the script which makes them!

Solution: Run generate_views, and ideally set it up to run automatically: see "Browse Views" in the installation section.

Apache takes a really long time to start (over a minute) and so do the command line scripts.

Possible Cause: EPrints loads several XML files at start up, and for some reason this requires a DNS lookup. If DNS lookup is unavailable then it has to time out.

Solution: Make sure that the machine can perform DNS look-ups.


The same page is repeatedly returned when submitting forms under Apache 2/mod_perl 2.0.0RC4

Ensure you have an up to date version of CGI (3.08+).

A Note on SELinux

Security-Enhanced Linux (SELinux) adds an additional security layer above that of Unix's U/G/O permissions. By default RedHat installations prevent Apache from accessing files outside of /var/www/html and /tmp (and user's home directories?).

If you run your EPrints Apache server as user eprints this isn't an issue, however if you run Apache as apache you will need to run:

chcon -R -t httpd_sys_content_t /opt/eprints2/

To allow the apache process to access the eprints files (in addition to any Unix permission changes necessary).

If you don't do this, you will not be able to start Apache after you have modified it to include the eprints Apache configuration files; it will give you error messages saying that it was unable to create certain files.

(Ref. http://www.cavebear.com/cbblog-archives/000148.html)

Backups

Why Backup?

It is almost certain that you will be storing valuable information in your EPrints server. Even assuming that the EPrints code is 100% bug free and that you will never delete 8000 records when you run the wrong script at 3am, you still need to back up! Drives and fans break. Computers get stolen. Server rooms get flooded (that happened to us!). Buildings burn down (we lost an EPrints server that way).

What to Backup

You need to backup two things.

The /opt/eprints3/ directory (or whatever you called it). Not all the subdirectories have to be backed up, but it is much easier to backup the whole thing. Make sure that you back up any (symbolic) linked directory too.

Each MySQL database which your archives use. See the MySQL manual for more information on backing up MySQL databases. The mysqldump command will dump the whole of a database as a big list of SQL commands to re-create it.

Best Practice

We strongly recommend that you:

  • Regularly backup your EPrints archive and database.
  • Keep multiple sets of backups following the rule '3 - 2 - 1', i.e.
    • keep 3 sets (1 original + 2 copies)
    • on at least 2 media,
    • but not in 1 place!
  • Keep a recent backup physically separate from the archive - either in another room or ideally at another site (e.g. take it home).
  • Regularly check that you can actually restore from your backup. It's not uncommon for people to produce a daily backup for years without checking it. When they come to need it, they discover that something has gone wrong and the backup is useless.
  • Assume that you will be restoring to different hardware - the tape drive may be stolen or melted too, and you'll be unable to get one just the same because they stopped making them! Check that your backups work on hardware other than that used to create them.
  • Decide who is responsible for backups. Their responsibilities should include making sure that the above policies are implemented even if they are ill or unavailable and making sure that someone else knows how to take over making and restoring the backups if they leave or are hit by a bus.

If you can't do all of these, which is admittedly a lot of extra work, then do as many as you can.

Fortune favours the backed-up. It always seems to be the un-backed-up systems that have disk crashes. Life's like that ...

Contact Information

Bug Report Policy

We use GitHub to record bugs and issues for EPrints. You can search there or the eprints-tech mailing list for existing bugs and possible solutions.

If you identify a new bug or "issue" (issues are not bugs, but are things which could be clearer or better) please post a message to the eprints-tech mailing list - include all the information you can: what version of eprints, operating system etc.

If you think the bug has security implications (i.e. it shouldn't be made public) please email support@eprints.org.

eprints-tech Mailing List

eprints-tech is the mailing list for technical queries or feedback. It can also have general queries, but most traffic is of a technical nature.

To subscribe send an email with a blank message body to eprints-tech-join@ecs.soton.ac.uk or follow the configuration instructions.

To browse the archives by month view their Mailman archives; to search use a search engine on site:http://threader.ecs.soton.ac.uk/lists/eprints_tech .

Twitter

The following twitter accounts and hashtags are currently used:

EPrints User Groups

For general or language dependent EPrints discussion

Upgrading

Upgrade from 3.3.x to 3.3.y

From a Linux Package Manager

If you have installed from the Deb or RPM package, then you can just upgrade the package. However, it is recommended that you make a backup before doing this. Also, if this is a production server, do the upgrade during an extended period of scheduled downtime, having previously tested it on a pre-production server.

From GitHub

If you have installed from GitHub you can update to the latest version using the following commands, substituting y for the version you want:

git fetch origin
git merge tags/v3.3.y

This should warn you if you have made any local uncommitted changes. You may need to use git stash to move these aside temporarily and reintegrate them after updating from GitHub.

From a Tarball

Although not impossible it is strongly discouraged to upgrade from a tarball. One major reason not to do this is that if you have made any local changes other than to your original archive's configuration, it will be difficult to ensure all these modifications are retained. However, if you wish to do this, you can follow the instructions below, (as the eprints user unless otherwise stated):

1. Download the tarball from files.eprints.org, (e.g. http://files.eprints.org/1073/2/eprints-3.3.15.tar.gz)

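2. As the root user, unpack the tarball under the same parent path as your existing EPrints and change the ownership of this whole directory structure. The tar command below assumes the tarball contains a single top-level directory, which --strip-components=1 removes:

mkdir /opt/eprints3_new
tar -xzvf eprints-3.3.15.tar.gz -C /opt/eprints3_new --strip-components=1
chown -R eprints:eprints /opt/eprints3_new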

3. Move the archive(s) from your old EPrints to your new one, e.g.:

mv /opt/eprints3/archives/* /opt/eprints3_new/archives/

4. Copy SystemSettings.pm to your new EPrints

cp /opt/eprints3/perl_lib/EPrints/SystemSettings.pm /opt/eprints3_new/perl_lib/EPrints/

5. If you are aware of any local modifications to files under your EPrints path (e.g. /opt/eprints3) but outside the archives directory then compare these files using diff to work out how your local changes can be integrated. E.g.

diff /opt/eprints3/perl_lib/EPrints/MetaField/Date.pm /opt/eprints3_new/perl_lib/EPrints/MetaField/Date.pm 

6. As the root user, switch round the old and new versions of EPrints, e.g.

mv /opt/eprints3 /opt/eprints3_old
mv /opt/eprints3_new /opt/eprints3

7. Run epadmin test to check there are no configuration issues, e.g.:

/opt/eprints3/bin/epadmin test

8. As the root user, restart Apache.

apachectl restart


Upgrading EPrints to 3.3

Upgrading from older 3.1 and 3.2 versions of EPrints to 3.3 is somewhat more involved. Please see the following guides:

History

A Brief History of EPrints

The EPrints project was created by Professor Stevan Harnad.


September 16 2011 
EPrints 3.3 released.
March 10 2010 
GNU EPrints 3.2 released.
September 8 2008 
GNU EPrints 3.1 released.
December 18 2006 
GNU EPrints 3.0 RC-1 released.
December 5 2006 
GNU EPrints 3.0 Beta-3 released.
November 14 2006 
GNU EPrints 3.0 Beta-2 released.
October 26 2006 
GNU EPrints 3.0 Beta-1 released.
July 25 2005 
GNU EPrints 2.3.13 released.
May 24 2005 
GNU EPrints 2.3.12 released.
March 8 2005 
GNU EPrints 2.3.11 released.
March 2 2005 
GNU EPrints 2.3.10 released.
February 17 2005 
GNU EPrints 2.3.9 released.
February 16 2005 
GNU EPrints 2.3.8 released.
November 25 2004 
GNU EPrints 2.3.7 released.
August 9 2004 
GNU EPrints 2.3.6 released.
August 6 2004 
GNU EPrints 2.3.5 released.
July 6 2004 
GNU EPrints 2.3.4 released.
March 4 2004 
GNU EPrints 2.3.3 released.
February 25 2004 
GNU EPrints 2.3.2 released.
February 5 2004 
GNU EPrints 2.3.1 released.
January 12 2004 
GNU EPrints 2.3.0 released.
October 31 2002 
GNU EPrints 2.2 (Pumpkin) released. Added subject editors and GDOME support.
July 4 2002 
GNU EPrints 2.1 (Pineapple) released. Added subscriptions and OAI 2.0 support.
July 1 2002 
EPrints officially joins GNU Project.
Apr 17 2002 
EPrints 2.0.1 (Tuna) released. Mostly bugfixes.
Feb 14 2002 
EPrints 2.0 (Olive) released.
Jan 2002 
EPrints 2 Alpha-2 (Pepperoni) released.
August 2001 
EPrints 2 Alpha-1 (Anchovy) released.
June 2001 
Mike Jewell joins EPrints, working primarily on installer software
January 2001 
EPrints 1.1 released, contains OAI 1.0 support

Work begins on EPrints 2

November 2000 
EPrints 1.0 released, contains OAI 0.2 support

Rob Tansley leaves the EPrints Project. Christopher Gutteridge joins the EPrints Project.

September 2000 
EPrints beta-2 released
June 2000 
EPrints beta-1 released

Cogprints archive created. http://cogprints.soton.ac.uk/

April 2000 
Rob Tansley begins work on EPrints
October 1999 
A turnkey repository platform promised by Stevan Harnad & Les Carr at initial OAI (UPS) meeting in Santa Fe.