EPrints Glossary

From EPrints Documentation

A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z

EPrints repository software uses a long list of terminology. Sometimes this terminology overloads existing terminology in similar realms. Other times different terms are used interchangeably to mean the same or similar things. This glossary is intended to help clarify the various terminology you may come across whilst installing, configuring or using EPrints repository software.

A

Abstract page

An abstract page (sometimes referred to as a summary page) is a web page for an archive providing metadata about a particular eprint (a.k.a. publication record). Most prominently it displays the abstract for the publication, but it also shows its title, citation, links to download associated documents and further metadata in a summary table.

Abstract pages are cached. This cache is cleared when an eprint is modified. Otherwise, the cache can be cleared (so that a new cached version is generated on the next request) by running the following command or by clicking on the "Regenerate Abstracts" button in the admin menu:

EPRINTS_PATH/bin/epadmin refresh_abstracts ARCHIVEID

One or all abstract pages can be regenerated using the bin/generate_abstracts command.

Access

An access is a data object that represents an HTTP request accessing either an abstract page or a document.

Actions tab

The actions tab appears on the view page for a data object. However, the term is typically used in the context of an eprint. This tab includes buttons to carry out various actions against the data object. For an eprint this may include:

  • Depositing an item.
  • Creating a new version (i.e. one that succeeds the current item).
  • Using as a template to create a completely new item based on the current item.
  • Editing an item.
  • Removing an item (i.e. fully deleting it from the archive), with or without a notification.
  • Moving to the live archive.
  • Moving to the review buffer.
  • Reindexing the item.
  • Changing the depositing user of the item.
  • Exporting the item in one of various formats.

Admin menu

The admin menu is accessible only to repository admins. It provides access to various administrative tools to manage their repository.

Advanced search

An advanced search is a search over eprints in the live archive. Unlike simple search, it provides multiple input fields to search individually against particular metadata fields. Advanced search configuration can be found in eprint_search_advanced.pl.

Archive

Often conflated with repository. EPrints repository software provides a repository that can host multiple archives. Commonly a repository will only host a single archive, so the terms are often used interchangeably.

New archives can be created using the epadmin command, as follows:

EPRINTS_PATH/bin/epadmin create <zero|pub>

Here zero creates an archive with no flavour and pub creates an archive using the publication flavour. See Getting Started for more information.

Content, configuration and sometimes code for an archive will be found at its archive level, i.e.

EPRINTS_PATH/archives/ARCHIVEID/

E.g. /opt/eprints3/archives/example/

Archive ID

The ID of an archive. This is set by the first input when the archive is created using the bin/epadmin command line tool. This will also be the directory name for the archive under EPrints' archives/ sub-directory. Guides on this wiki often use the placeholder ARCHIVEID.

Archive level

This is a shorthand term to describe a configuration file that appears under the archive's directory structure, or an effect that will only apply to a specific archive rather than the repository as a whole (which may have multiple archives).

Archive name

The name of an archive. This is set by an early input when the archive is created using the bin/epadmin command line tool. The value set will be copied as a phrase to the archive's archive_name.xml under archives/ARCHIVEID/cfg/lang/en/phrases/.

B

Back end pages

Same as management pages.

Bazaar

The Bazaar hosts EPrints packages of functionality (EPMs), which can be installed on a repository via the EPrints Bazaar tool in the Admin web interface.

Bazaar plugin

See EPM.

Box plugin

This is a type of plugin that generates a box containing some content or functionality, typically to be displayed at the top, bottom, left or right of an eprint's abstract page. A common box plugin is the Tools plugin, which typically appears at the top of the abstract page and provides the user with various export formats for the eprint. Bazaar plugins often include box plugins, such as for Altmetric and the CORE Recommender.

Branding

Branding refers to changing the appearance of the web pages for your repository or just a single archive, typically to line up with the institutional branding used on your institution's other websites. Advice is available on Branding with confidence. Branding is typically done by editing template configuration files and adding your own bespoke CSS, JavaScript and images/icons. This can be done at an archive level or by creating a theme, so it can be applied to multiple archives hosted on the same repository.

Browse view

A browse view is a collection of pages that display menus and listings of all live archive eprints in an archive, organised by a particular metadata field. By default these fields are: year, subjects, divisions and creators. All the browse views can be found under the equivalent URL for your archive to the following:

https://example.eprints.org/view/

The configuration for browse views can be found in views.pl, which by default is found under flavours/pub_lib/cfg.d/ but should be copied to the archive's cfg/cfg.d/ for editing. See browse views category for more information.

Browse view pages are cached for a set period of time, by default 24 hours. However, these caches can be cleared earlier (so that a new cached version is generated on the next request) by running the following command or by clicking on the "Regenerate Views" button in the admin menu:

EPRINTS_PATH/bin/epadmin refresh_views ARCHIVEID

One, some or all browse views can be regenerated using the bin/generate_views command. Often this is run as a cron job (i.e. scheduled task) so browse view caches can be updated overnight, when the server hosting the archive is less busy.

Buffer

Shorthand for Review buffer.

C

Citation

In EPrints repository software a citation means something slightly different from what it means in research publication. In EPrints it refers to a text string describing a publication, which may more commonly be described in research publications as a reference.

EPrints repository software has citation style files, which describe how to generate the text string, not just for eprints but also for other data objects, each of which can have different citation style files for different purposes. These files can be found under lib/citations/, flavours/pub_lib/citations/ and the archive's cfg/citations/ directory.

More information is available about the citation formats, along with a training video about citation styles.

Compound metadata field

A compound metadata field is a type of metadata field that is made up of several sub-fields. A good example of this is the Relationships field for an eprint data object. This has two sub-fields: one for the type of relation and the second for the actual thing (represented by a URI) to which the eprint is related.

Core codebase

The core codebase refers to the main code directories of the EPrints repository software. In some contexts this refers to all directories except the archives directory which contains the configuration for individual archives. In other contexts this refers to just the following directories (i.e. excludes flavours, ingredients and site_lib, as well as non-code directories).

Contact page

This is a static page that is accessible on an archive at the equivalent URL to the following:

https://example.eprints.org/contact.html

This page is intended to provide contact information, so that those accessing the archive can contact those responsible for it to report problems or ask questions.

This page can be found at EPRINTS_PATH/lib/lang/en/static/contact.xpage but should be copied to the archive's cfg/lang/en/static/ directory for editing. The email address displayed on this page comes from the $c->{adminemail} setting in adminemail.pl.

Contributor

Sometimes an eprint has a contributor who could not usefully be described as a creator or editor. The contributor metadata field is intended to allow these people to be associated with the eprint. As well as being able to add the contributor's family and given name, the type of contributor can be specified. This list of types is defined in EPRINTS_PATH/lib/namedsets/contributor_type. If additional types are needed, then this file should be copied to the archive's cfg/namedsets/ directory for editing.

Creator

Creator (or creators) is a metadata field for an eprint. Sometimes this is alternatively referred to as author but creator is a more generic term better suited to audiovisual/artistic as well as more traditional research publication items.

A creator typically has several separate sub-fields:

  • Given name (a.k.a. first name)
  • Family name (a.k.a. surname)
  • ID (typically email address)

CRUD API

CRUD API (Create, Read, Update, Delete Application Programming Interface) refers to a programmatic interface used for the management of data objects. It is often also referred to as the REST API because, when it is called over HTTP, it meets the requirements of a REST API.

D

Data flavour

As of EPrints 3.4 there are different flavours to serve different purposes for a repository. The data flavour is intended for repositories hosting research data items, as opposed to research publications.

Data object

A data object is a single record made up of metadata fields. EPrints repository software has various types of data object. The most prominent are:

All the data objects of the same type make a dataset.

Dataset

A dataset is all the data objects of a particular type within a single repository (e.g. eprint, user, document, etc.). Virtual datasets exist for subsets of datasets, such as live archive, review buffer, etc.

Depositing user

This is the user who created the eprint and has deposited it to the review buffer.

Details tab

This is a tab in a data object's view page that lists all its metadata fields and their values. Where a workflow exists for the data objects these metadata fields are separated into separate sections, depending on the workflow stage in which they appear.

Division

This is a generic term to refer to a part of an organisation or institution, at any level of its hierarchy. A division may be a faculty, school, department, academic unit or even research group. If EPrints repository software is used outside an academic context other names for sub-divisions may be applicable.

Divisions is a default metadata field in the eprint data object. It allows one or more divisions to be selected from a hierarchical list provided by the archive's subject tree. This can in turn be used to generate a browse view for divisions.

Document

A document is a second class data object in EPrints repository software. Documents must be part of an eprint object. Sometimes the term is used interchangeably with the actual document file (e.g. a PDF, Word document, etc.). To avoid confusion, it is best to refer to a document within EPrints as a "document object". A document object can be associated with multiple file objects, typically because derived files are generated to provide preview and thumbnail images but also because multiple files can be uploaded for a single document object.

Document content

Document content is a metadata field for the document object. It allows the selection of a single value from a pre-defined list (i.e. it is a named set metadata field). The options provided are intended to help describe the version of the document (e.g. draft, submitted version, accepted, updated) within the publication lifecycle or other types of content that may be uploaded (e.g. supplemental, presentation, coverimage, metadata, bibliography, other). These are based on the Versions toolkit for authors but are analogous to other definitions, such as:

  • Author's Original Manuscript (AOM) is equivalent to submitted.
  • Accepted Manuscript (AM) is equivalent to accepted.
  • Version of Record (VoR) is equivalent to published.

Document security

Document security is a metadata field for the document object. It defines who can access the document file and any files in derived document objects (e.g. preview images). By default this is one of three values:

If one of the last two options is applied with an embargo that expires on a certain date, then the security value for that document will be updated to public on that date.

Documents directory

This is a directory found under the archive's directory structure (i.e. EPRINTS_PATH/archives/ARCHIVEID/documents/). Uploaded documents are stored here, under a sub-directory for the eprint to which they belong (e.g. eprint 1234 has documents sub-directory documents/00/00/12/34/). Each document is stored in its own two-digit sub-directory based on its pos metadata field, which is zero-padded if the number is less than 10. This sub-directory also stores derived "documents" for those uploaded (e.g. preview and thumbnail images). Revision files are stored in the eprint's own revisions sub-directory.
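The directory layout described above can be illustrated with a short Python sketch. It is not EPrints code: the function names are hypothetical, and the rule that an eprint ID is zero-padded to eight digits and split into pairs is an assumption inferred from the single example given (eprint 1234).

```python
from pathlib import PurePosixPath


def eprint_documents_dir(eprint_id: int) -> PurePosixPath:
    """Return the eprint's sub-directory relative to the archive directory.

    Assumption: the ID is zero-padded to eight digits and split into
    two-digit pairs, as the example for eprint 1234 suggests.
    """
    if not 0 <= eprint_id <= 99_999_999:
        raise ValueError("this sketch only handles IDs of up to eight digits")
    padded = f"{eprint_id:08d}"                         # 1234 -> "00001234"
    pairs = [padded[i:i + 2] for i in range(0, 8, 2)]   # ["00", "00", "12", "34"]
    return PurePosixPath("documents", *pairs)


def document_dir(eprint_id: int, pos: int) -> PurePosixPath:
    """Return the two-digit, zero-padded sub-directory for the document at position pos."""
    return eprint_documents_dir(eprint_id) / f"{pos:02d}"


print(eprint_documents_dir(1234))   # documents/00/00/12/34
print(document_dir(1234, 1))        # documents/00/00/12/34/01
```

The paths are relative to EPRINTS_PATH/archives/ARCHIVEID/.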

E

Editor

On EPrints repository software editor could refer to one of two different things:

  1. A type of EPrints user that has permissions to access some or all publication records in the review buffer to make amendments and either push the item to the live archive or return it to the user's workarea.
  2. A metadata field in the eprint data object to record editors of the publication, e.g. the editors for a book. Like creator, by default this field has family name, given name and ID (i.e. email address) sub-fields.

Editorial rights restriction

Same as user review scope.

Education flavour

This is a flavour of EPrints repository software version 3.4+. It is intended to produce repositories for open education, i.e. for hosting files such as presentation slides, videos of lectures/seminars, instructions for learning tasks, in-class quizzes, etc.

Embargo

Documents added to an eprint can have their access restricted. Sometimes, this access need only be restricted for a set amount of time. These documents can be embargoed until a set embargo date. The embargo can then be lifted on this date using the bin/lift_embargos script. More recently, embargoes allow a reason to be chosen from a predetermined set of options provided by the embargo_reason named set.
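The rule for lifting an embargo can be sketched in Python as follows. This illustrates the logic only and is not the bin/lift_embargos script: the Document class, the "restricted" security value and the choice to clear the date once the embargo is lifted are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Document:
    security: str                        # "public" or some restricted value (illustrative)
    embargo_date: Optional[date] = None  # None means no embargo is set


def lift_expired_embargo(doc: Document, today: date) -> bool:
    """Make the document public once its embargo date has been reached.

    Returns True if the document was changed.
    """
    if doc.security == "public" or doc.embargo_date is None:
        return False
    if today < doc.embargo_date:
        return False                     # embargo still in force
    doc.security = "public"
    doc.embargo_date = None              # assumption: the date is cleared once lifted
    return True


doc = Document(security="restricted", embargo_date=date(2024, 1, 1))
print(lift_expired_embargo(doc, today=date(2024, 1, 1)), doc.security)
```

Running a check like this once a day is what makes a scheduled task the natural way to lift embargoes.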

EPM

EPM (EPrints Package Manager) is a package of functionality, sometimes referred to as a Bazaar plugin, which can be installed in an automated fashion, typically from the Bazaar. It can sometimes refer to the pages for managing the installing of these packages, (e.g. the EPrints Bazaar page available through the Admin web interface).

EPrint

An eprint is a first class data object in EPrints repository software. Eprints are often referred to as items or publication records, as they are generally intended to hold metadata about a publication of some kind.

EPrint history

This is the history of an eprint shown in the history tab of the eprint's view page. It is generated from the eprint revision files generated each time an eprint is modified. These are compared to display differences between revisions to track the eprint's history for administration purposes.

EPrint locking

Whilst a user is editing an eprint it is best that another user does not try to edit it as well. When an eprint is locked, even permitted users will not be able to edit that record, unless they remove the edit lock. Locks are automatically released when a user saves or cancels editing an eprint or (by default) 1 hour after the user started editing the eprint.
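As a rough illustration of the timeout behaviour described above, the following Python sketch decides whether a user may start editing. The EditLock class and can_edit function are hypothetical names, not part of EPrints. Manually removing another user's lock is not modelled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

LOCK_TIMEOUT = timedelta(hours=1)   # the default timeout described above


@dataclass
class EditLock:
    user_id: int         # the user who holds the lock
    started: datetime    # when that user started editing


def can_edit(lock: Optional[EditLock], user_id: int, now: datetime) -> bool:
    """Return True if user_id may edit the eprint at time now."""
    if lock is None:
        return True                             # nobody is editing
    if lock.user_id == user_id:
        return True                             # the lock holder can carry on
    return now - lock.started >= LOCK_TIMEOUT   # an expired lock no longer blocks others


lock = EditLock(user_id=1, started=datetime(2024, 1, 1, 12, 0))
print(can_edit(lock, user_id=2, now=datetime(2024, 1, 1, 12, 30)))
print(can_edit(lock, user_id=2, now=datetime(2024, 1, 1, 13, 0)))
```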

EPrint revision

An EPrint revision is a numbered version of the eprint's metadata, with the number stored in the rev_number metadata field. Revisions of an eprint's metadata are exported as XML revision files and stored under the documents directory of the archive.

EPrint status

The status of an eprint represents the stage it has reached in the depositing lifecycle. By default in EPrints repository software this can be one of four states:

  1. User workarea - The eprint is still being edited by the depositing user as there is still metadata they need to add or change.
  2. Review buffer - The eprint has been submitted for review by an editor for the archive.
  3. Live archive - The eprint has been reviewed by the editor and they have deposited it into the live archive for public access.
  4. Retired - The eprint has been removed from the live archive as it is no longer appropriate for it to have public access.

EPrint type

This is the type of publication the eprint represents. The available types for a repository are specified in a named set file (lib/namedsets/eprint or flavours/pub_lib/namedsets/eprint), which can be copied to the archive's archives/ARCHIVEID/cfg/namedsets/ directory to add extra types for a specific archive. By default EPrints' publication flavour has the following type options:

article
book_section
monograph
conference_item
book
thesis
patent
artefact
exhibition
composition
performance
image
video
audio
dataset
experiment
teaching_resource
other

Type is by default presented on the first stage of the EPrint workflow, as the option chosen will affect the fields that can/need to be completed in later stages.

EPrint URI

The URI for an eprint is its globally unique identifier. It takes a form like:

http://example.eprints.org/id/eprint/1234

Technically, a URI only need be an identifier and therefore need not return the resource itself. However, in the case of EPrints repository software, this will return the resource. Prior to EPrints 3.4, this would be by redirecting to the eprint URL, but since then the publication flavour uses the same string for the eprint's URL and URI, with the old style URL redirecting to the longer URI format URL.

EPrint URL

The URL of an eprint is the location it can be found at on the World Wide Web. Historically with EPrints repository software it has taken a form like:

http://example.eprints.org/1234/

However, since EPrints 3.4, it has been possible to configure the URL to take a long format like that of the eprint URI:

http://example.eprints.org/id/eprint/1234

By using the following setting (typically in the archive's cfg/cfg.d/20_baseurls.pl):

$c->{use_long_url_format} = 1;

By making this change the old style URL will still work but will redirect to the new format. The reason for the change was to make it more apparent that the URL represents a publication within an EPrints repository. Web crawlers and bots often record links (i.e. URLs) they find on web pages but do not actually access them. In particular, Google Scholar would disregard these URLs without checking them, as there was nothing in the URL to indicate that they represented a publication. Adding /id/eprint/ into the URL makes it distinct from other common URLs of the form http://HOSTNAME/ID, which are mostly not URLs for EPrints abstract pages.
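The two URL shapes can be compared with a small Python sketch. The helper names are hypothetical and this is not how EPrints builds its URLs internally. It only reproduces the two formats shown above, with a flag standing in for the use_long_url_format setting.

```python
def short_eprint_url(base_url: str, eprint_id: int) -> str:
    """Old style URL, e.g. http://example.eprints.org/1234/"""
    return f"{base_url.rstrip('/')}/{eprint_id}/"


def long_eprint_url(base_url: str, eprint_id: int) -> str:
    """Long format URL, identical to the eprint URI."""
    return f"{base_url.rstrip('/')}/id/eprint/{eprint_id}"


def eprint_url(base_url: str, eprint_id: int, use_long_url_format: bool = False) -> str:
    """Pick the URL format according to the (illustrative) configuration flag."""
    if use_long_url_format:
        return long_eprint_url(base_url, eprint_id)
    return short_eprint_url(base_url, eprint_id)


print(eprint_url("http://example.eprints.org", 1234))
print(eprint_url("http://example.eprints.org", 1234, use_long_url_format=True))
```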

EPrint view page

The view page for an eprint is a multi-tabbed management page for that particular item. It can typically be reached by clicking on the View Item link towards the bottom of an eprint's abstract page. This page will tend to have the following tabs:

  1. Preview - What the abstract page will look like when the eprint is moved to the live archive.
  2. Details - The values for all the metadata fields for the eprint.
  3. Actions - All the actions that can be carried out against the eprint.
  4. History - The revision history of the eprint.
  5. Issues - The issues detected for this eprint, if any.

EPrint workflow

This is the workflow that pertains specifically to the eprint data object.

EPrints repository software

Often shortened to just EPrints but not to be confused with multiple eprint data objects (i.e. eprints). This is the actual software that can be used to create Open Access repositories and is available as Open Source from GitHub.

EPrints path

The path on your operating system under which EPrints repository software is installed. Guides on this wiki often use the placeholder EPRINTS_PATH to represent this. Typically the EPrints path will be either /opt/eprints3/ or /usr/share/eprints/, depending on how EPrints repository software was installed.

EPrints release

A release of the EPrints repository software.

EPrints version

Same as EPrints release but with specific reference to a version number.

EPrints XML

EPrints repository software has its own XML schema for defining how data objects should be described. This schema can be found at the equivalent URL to:

http://example.eprints.org/cgi/schema

Event plugin

A type of plugin that generates indexer tasks to get the indexer to perform some asynchronous and potentially long-running task.

Event queue

The event queue contains all generated and yet to be successfully completed indexer tasks. It can be accessed through the admin menu's System tools tab using the Status button and then clicking on the number next to the Background Task Queue. There is also the Event Queue Object, which is a data object to represent an individual indexer task.

Export plugin

This is a plugin designed to provide a formatted export of a data object's metadata, typically for an eprint. By default, EPrints repository software has export plugins for many different formats, which can be found in the perl_lib/EPrints/Plugin/Export/ and flavours/pub_lib/plugins/EPrints/Plugin/Export/ directories. Bespoke export plugins can also be added to the archive's cfg/plugins/EPrints/Plugin/Export/ directory or may be installed from the Bazaar into the lib/plugins/EPrints/Plugin/Export/ directory.

F

File

A file is a third class data object. This is because it is a sub-object of a document data object, which in turn is a sub-object of an eprint data object. A file data object stores metadata relevant to the file, such as its filename, size, MIME type and modified time, and a hash (by default MD5) to maintain the integrity of the file. Typically uploading a document to an eprint will only create a single file data object, but it is possible to add additional files to a document after it is uploaded.

File data objects are also used to capture XML revision files created for recording eprint revisions.
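To make the kind of metadata held by a file data object more concrete, here is a Python sketch that gathers similar values for a file on disk. It is not EPrints code: the describe_file function and the dictionary keys are illustrative names, and the fallback MIME type is an assumption for the example.

```python
import hashlib
import mimetypes
from pathlib import Path


def describe_file(path: Path) -> dict:
    """Collect filename, size, MIME type, modified time and an MD5 hash for a file."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    stat = path.stat()
    mime_type, _encoding = mimetypes.guess_type(path.name)
    return {
        "filename": path.name,
        "filesize": stat.st_size,
        "mime_type": mime_type or "application/octet-stream",  # illustrative fallback
        "mtime": stat.st_mtime,
        "hash": digest.hexdigest(),
        "hash_type": "MD5",
    }
```

Recomputing the hash later and comparing it with the stored value is how such a record can be used to detect that a file has changed or been corrupted.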

First class data object

Data objects in EPrints repository software take a particular order. First class data objects are those that are not sub-objects of other data objects, such as:

Other data objects may be described as either second class or third class. Usually a data object is only described as "first class" if it contains sub-objects. Technically, subject is a first class data object but, as it contains no sub-objects, it is unnecessary to make the distinction.

Flavour

From version 3.4 of EPrints repository software different flavours have been introduced. EPrints was originally designed as a repository for research publications but over the years it has been repurposed for different tasks, which has led to the codebase getting a little messy. So in EPrints 3.4 a separate sub-directory structure was created to store code and configuration that makes EPrints suitable for a particular task, (i.e. a flavour). There are three main flavours but most development has continued to focus on research publication:

  1. Publication flavour - For research publications. Provided in flavours/pub_lib/.
  2. Data flavour - For research data. Provided in flavours/data_lib/.
  3. Education flavour - For Open Education, i.e. teaching materials. Provided in flavours/edu_lib/.

As well as flavours in EPrints 3.4, there are also ingredients, which provide additional functionality, similar to a Bazaar plugin / EPM. Any EPrints flavour has an inc file found in its top-level directory (e.g. flavours/pub_lib/inc). This tells the flavour which paths to include. By default, this would include the flavour itself. It may also contain various ingredients and potentially site_lib if the repository has multiple archives, which require the same bespoke functionality. However, it is better if this bespoke functionality can be converted into ingredients to separate out discrete parts of functionality.

Front end pages

Same as public facing pages.

G

H

Help text

A help text is associated with a metadata field, providing advice on how this should be filled in as part of the workflow. This text is defined using a phrase. E.g. the phrase ID for the title metadata field of the eprint data object is eprint_fieldhelp_title.

History

A History Object is a second class data object. It is a sub-object of any first class data object that supports revisions. By default, this is only the eprint data object. A history data object contains a revision number, a timestamp, a reference to a user or an "actor" if the new revision is generated by a script and the "action" that is being carried out, (e.g. create, modify, move_buffer_to_archive, destroy, etc.)

History tab

This tab shows either the eprint history or the user history, both of which refer to changes between revision files. The tab can be found in the view page of the particular eprint or user. (In the case of users this tab is labelled "User History".)

Homepage

The homepage for an archive is found at the top-level URL for a repository. Usually, that is something like:

https://example.eprints.org/

The homepage is a static page, although it often contains dynamic elements, like the list of latest publications. By default, the file is located at either lib/lang/en/static/index.xpage or flavours/pub_lib/lang/en/static/index.xpage, with the latter taking precedence if it exists. Configuration for the archive ensures this page displays bespoke information for your archive, such as the archive name. However, you will most likely want to make changes to your homepage, so you should copy the appropriate index.xpage to your archive's cfg/lang/en/static/ directory, creating the directory if it does not already exist.

I

Import plugin

This is a plugin designed to provide a formatted import of metadata for a data object, typically for an eprint. By default, EPrints repository software has import plugins for numerous different formats, which can be found in the perl_lib/EPrints/Plugin/Import/ and flavours/pub_lib/plugins/EPrints/Plugin/Import/ directories. Bespoke import plugins can also be added to the archive's cfg/plugins/EPrints/Plugin/Import/ directory or may be installed from the Bazaar into the lib/plugins/EPrints/Plugin/Import/ directory.

Inbox

Same as User workarea.

Inc file

The inc file is the critical part of a flavour, which defines the paths that should be included for code and configuration. It is found in the top-level directory of the flavour (e.g. flavours/pub_lib/inc). Initially, from the first sub-version of EPrints 3.4, this will be just the path for the flavour (e.g. flavours/pub_lib) but increasingly core ingredients will be added to this (e.g. for providing integration with the Bazaar or for enabling different JavaScript libraries). This is the only file outside the archive's directory structure that should need to be modified, unless you have a site_lib. However, the default inc file will probably not need changing in the case of most repositories.

The ordering of paths within the inc file is important, as it determines which configuration files get loaded and which take precedence. If you have a configuration file with the same filename in the cfg.d/ directories for all three paths in the example inc file below, the file in site_lib will be used in preference to the other two:

flavours/pub_lib
ingredients/bazaar
site_lib

However, if you also have the same filename in the archive's cfg/cfg.d/ directory, this will take precedence, although only for that archive if you have a multi-archive repository.
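The precedence rules above can be sketched in Python. This illustrates the described behaviour only, not EPrints' actual loading code. The resolve_config function is a hypothetical name, and the rule that every later path in the inc file overrides every earlier one is an assumption generalised from the site_lib example.

```python
from typing import Dict, List, Optional, Set


def resolve_config(
    filename: str,
    inc_paths: List[str],
    files_by_path: Dict[str, Set[str]],
    archive_files: Set[str],
) -> Optional[str]:
    """Return the location of the configuration file that wins, or None if absent.

    inc_paths     -- paths in the order they appear in the inc file
    files_by_path -- filenames present in each path's cfg.d/ directory
    archive_files -- filenames present in the archive's cfg/cfg.d/ directory
    """
    # The archive's own copy always wins (for that archive only).
    if filename in archive_files:
        return f"archives/ARCHIVEID/cfg/cfg.d/{filename}"
    # Assumption: later inc entries take precedence over earlier ones.
    for path in reversed(inc_paths):
        if filename in files_by_path.get(path, set()):
            return f"{path}/cfg.d/{filename}"
    return None


inc = ["flavours/pub_lib", "ingredients/bazaar", "site_lib"]
present = {path: {"views.pl"} for path in inc}
print(resolve_config("views.pl", inc, present, archive_files=set()))
print(resolve_config("views.pl", inc, present, archive_files={"views.pl"}))
```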

Indexer

The indexer is used by EPrints repository software to carry out tasks asynchronously that may take a long time. Typically, these would be indexing tasks, which include indexing of both the metadata for an eprint data object and the full text (i.e. all the words in the PDF, Word, etc. document) for documents uploaded for those eprints.

The indexer can be managed through the admin menu, specifically its System Tools tab, which provides options to stop, start and force start the indexer. Also, the Status option takes you to a page that lists the number of tasks yet to be successfully completed by the indexer. This is commonly referred to as the event queue or background task queue.

Indexer task

An indexer task is a scheduled operation to be carried out by the indexer. It is also a data object in its own right, but in this context it is referred to as an event queue object, as it will appear in the event queue until it is successfully completed. An indexer task has several significant attributes:

  • Status - Whether the task is Waiting, In Progress or Failed.
  • Start time - The earliest time the indexer task can start.
  • Plugin - The event plugin that can perform this indexer task.
  • Action - The specific action of the Event plugin for this indexer task.
  • Parameters - The parameters needed to run the indexer task (e.g. the data object and potentially metadata fields to be used by the indexer task).
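The attributes listed above map naturally onto a small record type. The Python sketch below is not the EPrints event queue implementation. The class and function names are hypothetical, as are the plugin and action values in the usage line, and choosing waiting tasks in order of start time is an assumption made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List


class TaskStatus(Enum):
    WAITING = "Waiting"
    IN_PROGRESS = "In Progress"
    FAILED = "Failed"


@dataclass
class IndexerTask:
    plugin: str                  # the event plugin that can perform the task
    action: str                  # the specific action of that plugin
    start_time: datetime         # the earliest time the task can start
    params: List[str] = field(default_factory=list)
    status: TaskStatus = TaskStatus.WAITING


def runnable_tasks(queue: List[IndexerTask], now: datetime) -> List[IndexerTask]:
    """Return waiting tasks whose start time has passed, earliest first."""
    ready = [t for t in queue if t.status is TaskStatus.WAITING and t.start_time <= now]
    return sorted(ready, key=lambda t: t.start_time)


# Hypothetical plugin and action names, for illustration only.
task = IndexerTask("Event::Example", "do_something", datetime(2024, 1, 1, 9, 0), ["eprint/1234"])
print(runnable_tasks([task], now=datetime(2024, 1, 1, 10, 0)))
```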

Ingredient

Since EPrints version 3.4, the concept of ingredients has been introduced along with flavours. The main purpose of ingredients is to allow complex functionality to be developed but deployed in a way that does not make a mess of the existing core codebase by scattering files in different places, as some Bazaar plugins can do. Also, as Bazaar plugins are designed to be one-click install, they should require limited configuration, as this takes away from the ease of deployment that one-click install implies. Ingredients, by contrast, can be much more complex, although effort should be taken to avoid mixing distinctly separate pieces of functionality when they could be kept apart.

Typically, an ingredient would be deployed by checking out a tagged version of the Git repository for that ingredient on GitHub. Although that requires more technical knowledge than a one-click install, this is also inherently a safety feature, to avoid accidental installation of new functionality that could damage the data stored by the repository, which could happen when installing some of the more complex Bazaar plugins, without understanding of how they behave.

Issue

An issue is a single row in a compound multiple metadata field that is part of an eprint data object. It describes a potential problem discovered about a particular eprint. Unlike validation errors for specific metadata fields, which are displayed whilst navigating the eprint workflow, issues are determined by comparing eprints across the dataset or checking for more subjective problems. By default this includes:

  • Duplicate titles - Two or more eprints have the exact same title.
  • Similar titles - Two or more eprints share very similar titles.
  • Old but unpublished - Date set for eprint is more than two years in the past but its publication status is still not published.
  • Short family name - Family name for a creator is fewer than two characters long or not set. (Single name creators should always put this under family name).
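Two of the default checks listed above are simple enough to sketch in Python. This is an illustration rather than the code behind bin/issues_audit: the function names and the dictionary layout used for an eprint (eprintid, title, creators, family) are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List


def duplicate_title_groups(eprints: List[dict]) -> Dict[str, List[int]]:
    """Group eprint IDs by title and keep only titles shared by two or more eprints."""
    by_title = defaultdict(list)
    for eprint in eprints:
        by_title[eprint["title"]].append(eprint["eprintid"])
    return {title: ids for title, ids in by_title.items() if len(ids) > 1}


def short_family_names(eprint: dict) -> List[dict]:
    """Return creators whose family name is unset or fewer than two characters long."""
    return [
        creator
        for creator in eprint.get("creators", [])
        if len((creator.get("family") or "").strip()) < 2
    ]


records = [
    {"eprintid": 1, "title": "On Repositories", "creators": [{"family": "Smith", "given": "A."}]},
    {"eprintid": 2, "title": "On Repositories", "creators": [{"family": "", "given": "Plato"}]},
    {"eprintid": 3, "title": "Something Else", "creators": []},
]
print(duplicate_title_groups(records))
print(short_family_names(records[1]))
```

The duplicate title check has to look across the whole dataset, which is why issues are audited in a batch rather than during workflow validation of a single eprint.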

Issues are checked for across all eprints in the live archive and review buffer through a scheduled task running the bin/issues_audit script. This can be deployed with a cron job similar to the following in the eprints user's crontab (substituting EPRINTS_PATH and ARCHIVEID as appropriate):

38 23 * * * EPRINTS_PATH/bin/issues_audit ARCHIVEID --quiet

Additional types of issue can be added in one of two ways:

  1. Creating a new Issues plugin.
  2. Adding XML markup to the archive's cfg/issues.xml.

Issues for a particular eprint can be found under the Issues tab of its view page or searched for through the admin menu's Search issues tool (under the Editorial Tools tab).

Issues tab

This is a tab that appears on the eprint view page and lists the issues associated with that eprint.

Item

Same as eprint.

J

K

Key tools

Same as user menu.

L

Latest tool

The latest tool is a piece of functionality that allows JavaScript to be used to include, on a static page like the homepage, a dynamically updating list of the latest eprints to be put into the live archive. This is provided through a script that can be requested at the equivalent of the following URL:

http://example.eprints.org/cgi/latest_tool

It can be configured in the archive's cfg/cfg.d/latest_tool.pl to set the number of eprints shown and the citation style used.

License

A license (sometimes spelt licence) sets out the terms under which a document is made available in an archive. By default, EPrints repository software provides the Creative Commons (v4) licenses as options in the named set file licenses in lib/namedsets/. These include:

Live archive

Live archive refers to both the status of an eprint and a virtual dataset of all eprints that have their status set to live archive. When an editor reviews and is happy with an eprint, they will change its status to live archive, so it becomes publicly accessible.

M

Main menu

By default, the EPrints main menu contains links to the main pages of interest for a visitor to an archive. This includes:

  • Home - A link back to the archive's homepage.
  • About - A static page providing information about the archive.
  • Browse - A listing of browse views available for the archive.
    • Browse by Year
    • Browse by Subject
    • Browse by Division
    • Browse by Author

Management pages

These are the pages in an archive that can only be accessed by users after they have logged in. They include those pages accessible to regular users to add/edit their own eprints or manage their user profile, as well as more restricted administration pages such as those accessible through the admin menu.

These pages are sometimes alternatively referred to as back end pages.

Metadata

Metadata is all the values for the metadata fields for a particular data object.

Metadata field

A metadata field, sometimes shortened to metafield (or even just field), is part of a data object and stores a particular piece or pieces of information for that data object. Metadata fields have different types and various properties, roughly broken down into the following categories:

  • Core - This includes the name, aforementioned type, whether it should have multiple values and whether to add an SQL index for it.
  • Rendering - This includes how to render the metadata field within abstract pages, browse views and search results, where applicable.
  • Input and validation - This includes how to display the input for the metadata field within a workflow and any additional functionality for how this field is used or subsequently validated.
  • Ordering, indexing and searching - Whether the metadata field should be included in EPrints repository software's full-text indexing and how the data object can be ordered or searched over using this field.
  • Other - Miscellaneous properties, such as whether the metadata field's value(s) should be copied across to a cloned version of the data object or be able to be set through an import. Also, whether the field's value should appear in various management pages within the repository or in exports.

Minimal User

Sometimes also referred to as "min user". This is a type of user account that only has minimal access to an archive. Typically, these accounts are intended for those users that may need to access restricted documents but do not need to submit their own eprints. Also, if you have open user registration enabled, you may want to prevent newly registered users from submitting eprints until you can confirm you are happy with their registration.

Multiple metadata field

This is a metadata field that allows more than one value to be stored for it against a particular data object. This can be configured by setting the field definition's multiple attribute to 1. A good example of a multiple metadata field is the Roles field for the user data object. This allows multiple roles to be assigned to a user, to enable bespoke permissions for that user.

N

Named set

A named set refers to both a type of metadata field with the type name namedset and the file that defines the options for this type of metadata field. A named set metadata field is much like a set field. However, rather than defining the options within EPrints repository software's Perl-based configuration, they can be added to a simple text file where each option is on its own line. Eprint types are one example.

Named set files by default can be found in either the lib/namedsets/ or flavours/pub_lib/namedsets/ directories. If you need to edit them to add or remove options, then the file should be copied to the archive's cfg/namedsets/ directory before editing. It is generally advised not to modify an option in this file once your archive is in use. It is better to change the phrase that relates to that option. E.g. the type field for an eprint would have the phrase eprint_typename_article for the value article in flavours/pub_lib/namedsets/eprint.
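As a rough illustration of the file format and the phrase naming described above, the following Python sketch reads options from named-set text and builds the matching phrase ID. The phrase ID pattern is generalised from the single eprint_typename_article example, and skipping lines that start with "#" is an assumption; neither function is part of EPrints itself.

```python
def parse_named_set(text):
    """Return the option values from named-set text, one option per non-blank line."""
    options = []
    for line in text.splitlines():
        value = line.strip()
        # Assumption: blank lines and lines starting with '#' carry no option.
        if value and not value.startswith("#"):
            options.append(value)
    return options

def phrase_id(set_name, value):
    """Build the phrase ID for an option, following the eprint_typename_article example."""
    return f"{set_name}_typename_{value}"

# Hypothetical contents of a named set file called 'eprint'.
sample = "article\nbook_section\n\nmonograph\n"
options = parse_named_set(sample)
phrase_ids = [phrase_id("eprint", option) for option in options]
```

Because the phrase is looked up from the option value, changing the displayed text only requires editing the phrase, which is why renaming the option itself is discouraged once an archive is in use.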

O

OAI-PMH

Open Archives Initiative - Protocol for Metadata Harvesting (OAI-PMH) is a standard for harvesting metadata from Open Access repository archives like those provided by EPrints repository software. EPrints has a separate interface for accessing OAI-PMH, which is available at the equivalent of the following URL:

http://example.eprints.org/cgi/oai2
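A harvester talks to this interface by adding a verb parameter (and, for some verbs, further parameters) to the base URL. The Python sketch below only builds such request URLs; Identify, ListRecords and the oai_dc metadata prefix come from the OAI-PMH standard, while the helper name and the use of the example host are assumptions for illustration.

```python
from urllib.parse import urlencode

# Base URL taken from the example above; substitute your own archive's host.
BASE_URL = "http://example.eprints.org/cgi/oai2"

def oai_request_url(verb, **params):
    """Build an OAI-PMH request URL for the given verb and optional arguments."""
    query = {"verb": verb}
    query.update(params)
    return f"{BASE_URL}?{urlencode(query)}"

# Ask the archive to describe itself.
identify_url = oai_request_url("Identify")

# Ask for records in unqualified Dublin Core, which the standard requires every archive to support.
records_url = oai_request_url("ListRecords", metadataPrefix="oai_dc")
```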

Order key

An order key or orderkey is the (potentially language-specific) string used to order by a particular metadata field. EPrints repository software's default orderkey generation can be overridden by defining your own orderkey method and assigning it to the make_value_orderkey or make_single_value_orderkey property of the metadata field. This property is passed value(s) and returns (an) ordervalue string(s) after processing the value(s) for that metadata field. Typically, you may want to define your own order keys to manage special characters, maybe to avoid accented characters appearing separately in a browse view. Some examples of bespoke order keys can be found in make_orderkey.pl.
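The idea of normalising a value before ordering can be sketched outside EPrints as follows. This is a Python illustration of the concept only: real order key methods are Perl functions assigned to the field properties named above, and the function name and normalisation rules here are assumptions.

```python
import unicodedata

def make_orderkey(value):
    """Return a sort key with accents removed and case folded."""
    # Decompose accented characters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    # Drop the combining marks so that an accented letter sorts with its base letter.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

names = ["Zola", "Émile", "Eco", "Åberg"]
sorted_names = sorted(names, key=make_orderkey)
```

Without such a key, a plain code point sort would place "Åberg" and "Émile" after "Zola", which is the kind of separation in a browse view that bespoke order keys are meant to avoid.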

P

Plugin

These are Perl modules that "plug in" to the EPrints software to provide a particular piece of functionality. Default plugins for EPrints repository software can be found under the perl_lib/EPrints/Plugin/ and flavours/pub_lib/plugins/EPrints/Plugin/ directories. They break down into several categories. The most prominent being:

Other plugin categories include Box, Convert, Event, InputForm, Issues, Search and Storage.

Bespoke plugins can be written for an archive and located in the archive's cfg/plugins/EPrints/Plugin/ directory.

Often the term "Bazaar plugin" is used to refer to a package of functionality (a.k.a. EPM) that can be installed from the Bazaar. Although an EPM can often contain individual plugins, it is a complex set of files rather than a single plugin file. If an EPM does contain individual plugins, they will be installed to the lib/plugins/EPrints/Plugin/ directory.

Phrase

A phrase in EPrints repository software is some text or HTML markup associated with a placeholder ID, defined using EPrints' bespoke XML markup. Phrases have various purposes. One is to ensure consistent use of terminology throughout the archive. They are also essential for supporting multi-language archives. They are used for defining all metadata field names and their help texts. They are also used extensively in screen plugins to allow them to be customised with informative messages.

A prominent example of a phrase is an archive's name, which appears in the archive's cfg/lang/en/phrases/archive_name.xml, as follows:

<epp:phrase id="archive_name">Example Archive</epp:phrase>

Policies page

The policies page is a static page available at the equivalent URL to:

http://example.eprints.org/policies.html

The page contains policies for use of the archive typically based on OpenDOAR. This contains sections for:

  • Metadata Policy stating the access rights and permissions for information describing items in the repository, and the minimum metadata requirements.
  • Data Policy stating the access rights and (re-)use permissions for full-text and other full data items.
  • Content Policy stating the types and versions of documents and datasets held.
  • Submission Policy concerning eligible depositors, quality control and copyright statements.
  • Preservation Policy concerning the long-term retention, migration, and withdrawal protocols.

Preview image

This is a thumbnail or larger image that is a snapshot of the uploaded document. This is typically displayed on the abstract page, where hovering over it will display a larger version, which may be useful in determining if the document is the one the visitor expected. Preview images for a document can be generated from the command line using the epadmin command's redo_thumbnails option.

Preview tab

The preview tab is part of the eprint view page which displays a preview of the abstract page for the eprint.

Public facing pages

This is used to refer to web pages in an archive that are accessible to visitors without the need to log in. These are also sometimes referred to as front end pages.

Publication flavour

This is a flavour of EPrints repository software version 3.4+. It is intended to produce repositories for research publications. The code for this flavour was separated from the core codebase for EPrints 3.4, so that EPrints as a whole could be more easily customised for other purposes such as research data and open education. The separated code and configuration can be found under flavours/pub_lib/, where configuration files override those under lib/ but are overridden by those at an archive level.

Publication record

Same as eprint.

Q

R

Regular user

This is a user whose account provides them with standard access to an archive, allowing them to deposit eprints for review and manage their user profile. Within EPrints repository software this type of user account is just referred to as "user". When describing this user, the adjective "regular" is used to distinguish this specific type of user from users more generally.

Repository

This is an individual installation of the EPrints repository software onto a server. A repository may host many archives but they all use the same core codebase. Sometimes a repository will host only a single archive and therefore, the terms repository and archive are often used interchangeably and often the latter is referred to as a "repository archive".

Repository admin

Short for repository administrator, this is a level of user account that gives maximum access to an archive. It allows the user to carry out regular user actions such as depositing an eprint, editorial actions like reviewing eprints submitted by a regular user, and administrative actions, in particular those available through the admin menu.

Request

This is a data object that relates to a request made by a user or visitor about a specific eprint or document. The most common request is a request copy made by a visitor for access to a document that is restricted. Some archives may also be configured to allow users to submit change requests for eprints in the live archive, if they notice an error in their metadata.

Request copy

Sometimes also referred to as "request a copy", this is a button or link on an abstract page that allows a visitor to request a copy of a document that is currently restricted. If the eprint has its contact email metadata field set, the visitor's request will be emailed to that address. If that email address matches that of a user, the email sent will contain accept and reject links. The accept link will send an email response to the visitor containing a link that gives them temporary access to the restricted document.

REST API

EPrints repository software provides a standard REST API that allows third-party applications to create, read, update and destroy various data objects within an archive. In EPrints, the REST API provides similar functionality to the CRUD API, albeit under a URL path like:

https://example.eprints.org/rest/

Retired

This is an eprint status for eprints that have been in the live archive but have been removed for some reason. The advantage of retiring an eprint rather than deleting it, is that it can be restored, if it was removed erroneously from the live archive. Also, the abstract page will display a message saying the eprint has been removed rather than a "404 Not Found".

Review buffer

This is the eprint status of eprints that have been deposited by a user but need to be reviewed by an editor before being added to the live archive. Review buffer also refers to all eprints that currently have this status, which can be accessed via the Review link in the user menu.

Revision

Some first class data objects support revision files and history data objects to track their changes over time. By default there are only eprint revisions.

Revision file

This is a file on disk that stores EPrints XML markup for the metadata for a data object. By default, revision files are only generated for eprints and can be found within the revisions directory for that eprint's sub-directory under the documents directory. E.g.

EPRINTS_PATH/archives/ARCHIVEID/documents/disk0/00/00/00/01/revisions/1.xml

The revisions are used in an eprint's history tab to generate a listing of changes between one revision and the next.
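The example path above suggests that the eprint ID is zero-padded to eight digits and split into pairs to form the directory structure. The Python sketch below reproduces the example on that assumption; the padding rule, the fixed disk0 component and the function name are inferred from the single example rather than taken from the EPrints source.

```python
def revision_path(eprints_path, archive_id, eprint_id, revision):
    """Build the assumed on-disk path of a revision file for an eprint."""
    # Assumption: the ID is padded to eight digits and split into two-digit directories.
    padded = f"{eprint_id:08d}"
    pairs = [padded[i:i + 2] for i in range(0, 8, 2)]
    parts = [eprints_path, "archives", archive_id, "documents", "disk0",
             *pairs, "revisions", f"{revision}.xml"]
    return "/".join(parts)

# First revision of eprint 1, matching the example path above.
example = revision_path("EPRINTS_PATH", "ARCHIVEID", 1, 1)
```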

S

Saved search

This is a second class data object that records the criteria of a particular search made by a user. Its purpose is to allow the user to receive regular email updates about results that meet these criteria.

Screen plugin

This is a plugin designed to provide a "screen", which is effectively a dynamic page to allow the user to carry out some action on the archive. Some of the most significant screen plugins are for:

  1. Logging in and out of the archive.
  2. Managing a data object's workflow.
  3. Providing the admin menu.
  4. Reviewing eprints.
  5. Editing subjects.
  6. Searching for users or eprints.

By default, screen plugins can be found in the perl_lib/EPrints/Plugin/Screen/ directory. Bespoke screen plugins can also be added to the archive's cfg/plugins/EPrints/Plugin/Screen/ directory or may be installed from the Bazaar into the lib/plugins/EPrints/Plugin/Screen/ directory.

Second class data object

These are data objects that are sub-objects of a first class data object. These include:

Simple search

A simple search over eprints in the live archive. Unlike advanced search, this provides a single input field to search against multiple metadata fields. Simple search configuration can be found in eprint_search_simple.pl.

Site lib

Creating a directory in the EPrints path with the name site_lib will allow you to modify core code and configuration, with the changes applying to all of the archives in your repository. If you are running EPrints version 3.4+, be sure to also add site_lib to your flavour's inc file.

Sometimes you may need to modify the functionality of files in the core codebase but do not want to change them in place, as this would involve having to merge any future changes when you upgrade EPrints repository software. site_lib is particularly useful if you need to modify a non-plugin code file (plugin files can be overridden by archive-level versions), e.g. perl_lib/EPrints/Repository.pm, which can simply be copied to site_lib/plugins/EPrints/Repository.pm rather than being modified in place. When you do upgrade EPrints, you will still need to reconcile any updates to the perl_lib/ version of this file with your version in site_lib/plugins/. However, this will be a much more straightforward task than if you had just modified the version in perl_lib/, as both files will still be in place to compare. One particular use of site_lib is for inter-version patches from GitHub: when it comes to an upgrade, the versions of the files in site_lib/plugins/ can just be removed, as their modifications will be present in their perl_lib/ equivalents.

Staff search

Search across all eprints whatever their eprint status. See Admin/Editorial Tools/Search items for more information.

Static page

This is a page for an archive that contains static content that can be cached, although it may also contain dynamic elements through the use of JavaScript and Ajax. Examples of static pages are:

To clear the caches of old static pages and other static content (like JavaScript and CSS) and generate new cached versions, the bin/generate_static command can be used.

Other pages like browse view menus and listings and abstract pages are also cached, as they contain static content that only needs to be updated periodically or when the underlying eprint changes. These are sometimes also referred to as static pages but can be refreshed using the bin/epadmin command (or through the Admin menu) or regenerated using the bin/generate_views and bin/generate_abstracts commands.

Sub-object

This is a concept where a data object is part of another data object, such as a document being part of an eprint. In this example the document is a second class data object and the eprint is a first class data object.

EPrints::DataObj::SubObject is an abstract class within EPrints repository software, which second class (and third class) data object classes extend.

Subject

A subject is a data object that stores a node for the subject tree of an archive.

A subject has three separate metadata fields to describe the value it represents:

  • ID - A short text string providing the unique ID for that subject.
  • Name - The human-readable name for that subject.
  • Sort value - The value used to determine the ordering of the subject amongst its siblings in the subject tree (e.g. in a browse view).

A subject also contains a parent metadata field that links it into the rest of the subject tree and a depositable metadata field that indicates whether it is an option that can be selected when the subject tree, or part thereof, is used to provide a metadata field for another data object (e.g. eprint).
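To show how the ID, name, sort value and parent fields combine into a tree, here is a small Python sketch that renders subjects as an indented listing. The records, the dictionary keys and the single-parent simplification are assumptions for illustration and do not reflect how EPrints stores or loads subjects.

```python
from collections import defaultdict

# Hypothetical subject records; each one points at its parent by ID.
subjects = [
    {"id": "subjects", "name": "Subjects", "parent": None, "sortvalue": "0"},
    {"id": "Q", "name": "Science", "parent": "subjects", "sortvalue": "Q"},
    {"id": "QA", "name": "Mathematics", "parent": "Q", "sortvalue": "QA"},
    {"id": "B", "name": "Philosophy", "parent": "subjects", "sortvalue": "B"},
]

def render_tree(records, root_id):
    """Return indented subject names, ordering siblings by their sort value."""
    by_id = {rec["id"]: rec for rec in records}
    children = defaultdict(list)
    for rec in records:
        children[rec["parent"]].append(rec)

    lines = []

    def walk(node_id, depth):
        lines.append("  " * depth + by_id[node_id]["name"])
        for child in sorted(children[node_id], key=lambda rec: rec["sortvalue"]):
            walk(child["id"], depth + 1)

    walk(root_id, 0)
    return lines

tree_lines = render_tree(subjects, "subjects")
```

Note that "Philosophy" is listed before "Science" because of its sort value, even though it appears later in the input.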

Subject tree

This is made up of subject data objects and is used to provide hierarchical sets of options for subject metadata fields of other data objects, such as eprints. In particular:

  • eprint subjects - Records one or more subjects (e.g. from the Library of Congress or Dewey decimal classifications) that the publication represented by this eprint covers.
  • eprint divisions - Records one or more divisions within an organisation/institution (e.g. faculty, school, department, etc.) of which authors of the publication represented by the eprint are members.

The subject tree can be edited from the admin menu using the "Edit subject" page.

Succeeds

An eprint data object contains a metadata field called succeeds. This metadata field can store the ID of an eprint that the current eprint supersedes. This may be used where a new version of the publication is produced but you don't want to or cannot replace the document in the original eprint. By using the succeeds field, the original eprint will be removed from search and its abstract page will display a warning saying it is not the latest version of the eprint.
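The effect of the succeeds field on search results can be pictured with the following Python sketch, which drops any eprint that another eprint succeeds. The record layout and function name are hypothetical; EPrints applies this logic internally rather than through code like this.

```python
def latest_versions(eprints):
    """Return only those eprints that no other eprint in the list succeeds."""
    superseded = {e["succeeds"] for e in eprints if e.get("succeeds") is not None}
    return [e for e in eprints if e["eprintid"] not in superseded]

# Hypothetical records: eprint 2 is a new version of eprint 1.
records = [
    {"eprintid": 1, "succeeds": None},
    {"eprintid": 2, "succeeds": 1},
    {"eprintid": 3, "succeeds": None},
]
visible_ids = [e["eprintid"] for e in latest_versions(records)]
```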

Summary page

Same as abstract page. Sometimes even referred to as the abstract/summary page.

Summary table

A table on an abstract page that lists the names and values for specific Metadata fields for the eprint. These can be specified, typically in the archive's cfg/cfg.d/eprint_render.pl, under $c->{summary_page_metadata}.

Sword API

This is an API provided by EPrints repository software that complies with the SWORD specification for a RESTful API. It is similar to EPrints' CRUD and REST APIs but is intended to facilitate smoother interoperability between different types of Open Access repository, using an atompub based XML format and the Dublin Core schema for metadata terms.

T

Template

EPrints repository software uses templates to apply branding to a repository or individual archive. These templates provide the HTML structure for an archive's web pages and can be found in one of the following directories:

  1. lib/templates/
  2. flavours/pub_lib/lang/en/templates/
  3. archives/ARCHIVEID/cfg/templates/
  4. archives/ARCHIVEID/cfg/lang/en/templates/

However, there are additional directories where templates may be found if you use site_lib or a theme. If templates in several of these directories have the same name, the one in the directory with the highest number in the list above will be used.
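The precedence rule for the four numbered directories can be sketched in Python as follows. This models only the ordering described in this entry: the function, the "available" mapping and the omission of site_lib and theme directories are simplifying assumptions.

```python
# The four directories from the numbered list, lowest precedence first.
SEARCH_ORDER = [
    "lib/templates/",
    "flavours/pub_lib/lang/en/templates/",
    "archives/ARCHIVEID/cfg/templates/",
    "archives/ARCHIVEID/cfg/lang/en/templates/",
]

def resolve_template(name, available):
    """Return the path of the template that wins, or None if no directory has it.

    'available' maps a directory to the set of template file names it contains.
    """
    chosen = None
    for directory in SEARCH_ORDER:
        if name in available.get(directory, set()):
            # A later (higher-numbered) directory overrides an earlier one.
            chosen = directory + name
    return chosen

# Hypothetical layout: the archive overrides the default template from lib/.
available = {
    "lib/templates/": {"default.xml"},
    "archives/ARCHIVEID/cfg/templates/": {"default.xml", "mytemplate.xml"},
}
winner = resolve_template("default.xml", available)
```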

The main template file is called default.xml, which will be used if not otherwise specified in the XML markup for the static page. Also, other public facing pages such as simple and advanced search, abstract pages and browse views will use the default template. Since EPrints version 3.4, management pages will use the default_internal.xml template.

Bespoke template files can be defined and used (e.g. mytemplate.xml). To apply one to a static page, set the <xpage:template> element. To apply one to a browse view, set the template attribute for the view in views.pl. To apply one to abstract pages, set $template in the parameters returned by $c->{eprint_render} in eprint_render.pl. See Branding with confidence - Taking control for more information.

Theme

If you have multiple archives to which you want to apply the same branding, you may want to create a theme. Themes can be found at lib/themes/ and can be used by an archive by modifying the following setting in branding.pl:

$c->{theme} = 'THEMEID';

A theme can contain templates, phrases, static pages, JavaScript, CSS, images and icons. Also, as the theme has its own directory structure, it is easy to manage under version control and checkout across multiple repositories or even make into an EPM.

Third class data object

These are data objects that are sub-objects of second class data objects, which in turn are sub-objects of first class data objects. The main example of such a data object in EPrints repository software is a file data object.

File --sub-object of--> Document --sub-object of--> EPrint

U

User

A user is a first class data object in EPrints repository software. It represents a user of the repository and provides an account under which that user can log in. It stores metadata about that user, such as their username, given and family names, email address, department, etc.

User history

This is the history of changes the user has made to an eprint (or in fact any first class data object that supports revisions) and can be found in the user history tab on the user's view page. This is compiled by finding all history data objects for the particular user and then comparing their associated revision files with the previous revision files.

User menu

This is the menu available to users once they are logged in. (Although it does contain the "Login" and "Create account" links when a user is not logged in). This is sometimes also referred to as key tools, as that is the ID used in configuration. By default, this contains the following options:

Additional links (for screen plugins) can be added to the user menu by adding a line like the following to a configuration file in the archive's cfg/cfg.d/ directory:

$c->{plugins}->{"Screen::MyScreenPlugin"}->{appears}->{key_tools} = 1000;

The number (i.e. 1000) specifies the position at which the link to this screen plugin will appear relative to others. The higher the number, the later in the menu it will appear.
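The ordering rule can be illustrated with a short Python sketch that sorts plugin IDs by their position number. Only Screen::MyScreenPlugin and its position of 1000 come from the example above; the other plugin names and positions are hypothetical.

```python
# Positions as they might be set under ->{appears}->{key_tools} for each plugin.
key_tools_positions = {
    "Screen::MyScreenPlugin": 1000,
    "Screen::ExampleFirst": 100,   # hypothetical plugin
    "Screen::ExampleSecond": 200,  # hypothetical plugin
}

def menu_order(positions):
    """Return plugin IDs in the order their links would appear in the menu."""
    return [plugin for plugin, _ in sorted(positions.items(), key=lambda item: item[1])]

ordered = menu_order(key_tools_positions)
```

Choosing a large number such as 1000 is a simple way to place a bespoke link after the default entries without needing to know their exact positions.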

To add the user menu to a template, you need to include the following snippet of EPrints XML:

<epc:pin ref="login_status"/>

User registration

By default, EPrints repository software allows users to register by following the "Create account" link in the user menu. This allows them to create a user account, which must then be activated by clicking the link in an email sent to the email address of the newly registered user.

Often you may not want unrestricted user registration. If this is the case, you should copy registration.pl that can be found in lib/cfg.d/ to the archive level (i.e. the archive's cfg/cfg.d/ directory) and change the following setting to 0:

$c->{allow_web_signup} = 0;

User review scope

Editor users are allowed to review deposited eprints in the review buffer to check whether they can be moved to the live archive and to make any modifications required. Sometimes you may only want a particular editor to be able to review certain eprints, which are in their area of expertise. This can be done by setting the editorial rights restriction in their user profile. By default, restrictions can be set on the following three eprint fields:

However, this can be modified by copying user_review_scope.pl to the archive level (i.e. the archive's cfg/cfg.d/ directory) and editing the setting $c->{editor_limit_fields}.

User search

A form for searching across the users in an archive. See Search users for more information.

User profile

A page displaying the metadata about a user. This often links to a page where these metadata can be modified.

User workarea

This is the status of an eprint whilst it is being edited by the depositing user. This status is often also referred to as inbox. User workarea can sometimes refer to the Manage deposits page, where the current user can see a listing of all the eprints they have created.

V

View page

The view page displays information about a data object, typically an eprint. This is why "view page" is often used as shorthand for eprint view page.

Virtual dataset

A virtual dataset is one that is derived from a real dataset. In particular, EPrints repository software has virtual datasets for:

All of these are sub-sets of the eprint dataset where the eprint status is a particular value.

Additional virtual datasets can be configured for a repository or at an archive level.

Virtual metadata field

This is a metadata field whose values are not actually stored in the database but are generated dynamically by a function, usually from the values of one or more other metadata fields. This may be useful when editing the citation style as part of designing your archive's abstract page, as referencing a field there is very straightforward. A basic example might be that you want to display on the abstract page the number of documents for the eprint. Creating a virtual field for the eprint called num_docs might look as follows:

{
  name => 'num_docs',
  type => 'int',
  virtual => 1,
  render_value => 'get_num_docs',
},

Then you would define a function in a configuration file and assign it to $c->{get_num_docs}. You could then simply include this in the summary_page.xml eprint citation file as:

<epc:print expr="num_docs"/>

Visitor

A visitor to an archive is distinguished from a user, as they are not logged in and are accessing the archive anonymously. They probably just want to view an abstract page or download an open access publication document.

Volatile metadata field

This is a metadata field that is liable to change on a regular basis but does not really affect the overall content of the data object's metadata, so should not prompt a new revision to be generated. A metadata field can be made volatile by setting the following attribute in the field definition:

volatile => 1,

Metadata fields that are only intended for administrative purposes are often defined as volatile, such as the eprint edit_lock metadata field. If the edit_lock metadata field were non-volatile, a user taking an edit lock on an eprint would cause a new revision to be created for the eprint when nothing had yet changed. Worse still, if the edit lock was released without making any changes, a further revision would be created, which would be basically the same as the revision before the user took the edit lock.
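The decision described above can be sketched in Python as a comparison that ignores volatile fields. This is a conceptual model only: the function name, the dictionary representation of an eprint and the lock value are assumptions, and EPrints makes this decision in its own Perl code.

```python
# Fields treated as volatile in this sketch; edit_lock is the example from the text.
VOLATILE_FIELDS = {"edit_lock"}

def needs_new_revision(before, after, volatile=VOLATILE_FIELDS):
    """Return True if any non-volatile field differs between two metadata snapshots."""
    all_fields = before.keys() | after.keys()
    changed = {name for name in all_fields if before.get(name) != after.get(name)}
    return bool(changed - volatile)

original = {"title": "Open Access in Practice", "edit_lock": None}
locked = {"title": "Open Access in Practice", "edit_lock": "user-42"}  # hypothetical lock value
retitled = {"title": "Open Access in Practice (2nd ed.)", "edit_lock": "user-42"}

lock_only = needs_new_revision(original, locked)
real_change = needs_new_revision(original, retitled)
```

Taking the lock changes only a volatile field, so no revision is needed, whereas changing the title does call for one.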

W

Workflow

First class data objects in EPrints repository software have a workflow to allow the metadata they store to be edited via a sequence of forms. The most common of these is the eprint workflow for the eprint data object. By default the configuration for this workflow is written in XML and can be found in default.xml in either lib/workflows/eprint/ or flavours/pub_lib/workflows/eprint/. If this needs to be modified, it should be copied to the archive's cfg/workflows/eprint/ directory before editing. The various elements and attributes are described in the workflow format guide.

Workflow component

These provide sections (i.e. boxes) within each workflow stage. A workflow component contains one or more metadata fields or some other content to assist the metadata submission process. There are various types of workflow component that serve different purposes:

  • Documents - For managing document sub-objects.
  • Error - Used when there is an issue with the workflow component, (e.g. the metadata field included does not exist).
  • Field - This has various sub-types for including one, several or subjects type metadata fields.
  • Upload - For uploading a file, (e.g. a document to an eprint).
  • XHTML - Just some HTML markup.

Workflow stage

As a workflow is a sequence (or flow) of forms, a workflow stage defines one of these forms. Stages can be defined within the workflow format to produce the order of forms required. Each workflow stage contains one or more workflow components, which in turn include metadata fields for the relevant data object. Below is the EPrints XML markup for the default ordering of workflow stages for an eprint data object:

<flow>
  <stage ref="type"/>
  <stage ref="files"/>
  <stage ref="core"/>
  <stage ref="subjects"/>
</flow>
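The stage order can be read back out of this markup with a few lines of Python, shown here as a quick way to check a customised ordering. The sketch parses the snippet exactly as printed above; a full workflow file also contains namespaces and stage definitions that this simplified example does not handle.

```python
import xml.etree.ElementTree as ET

# The flow element exactly as shown above, without any namespace declarations.
FLOW_XML = """<flow>
  <stage ref="type"/>
  <stage ref="files"/>
  <stage ref="core"/>
  <stage ref="subjects"/>
</flow>"""

def stage_order(flow_xml):
    """Return the stage names in the order the forms would be presented."""
    root = ET.fromstring(flow_xml)
    return [stage.get("ref") for stage in root.findall("stage")]

stages = stage_order(FLOW_XML)
```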

X

Xapian search

Originally EPrints repository software created its own database index to support search. However, this was not particularly efficient or quick to return results, so EPrints started to use a Xapian search index to speed up simple search. (Advanced search still uses the database index.) Xapian also made it possible to provide relevance-based ordering, so that more relevant results could be displayed nearer the top of the search results. Xapian also gives the potential to provide facet-based search, where search results can be refined by limiting the values for a particular facet (i.e. for EPrints, a metadata field).

Y

Z