Difference between revisions of "Getting Started with EPrints 3"

From EPrints Documentation
(Backups)
m (formatting corrected)
(26 intermediate revisions by 12 users not shown)
Line 1: Line 1:
 +
[[Category:Manual]]
 +
[[Category:Management]]
 +
[[Category:Installation]]
 
==Creating an Archive==
 
==Creating an Archive==
  
EPrints 3 can run run multiple archives under one install. Multiple archives will require giving additional DNS aliases to the machine running EPrints, EPrints can then create all the parts of the apache configuration file needed to run the virtual hosts.
+
EPrints 3 can run multiple archives under one install. Multiple archives will require giving additional DNS aliases to the machine running EPrints, EPrints can then create all the parts of the apache configuration file needed to run the virtual hosts.
  
===Creating the Archive===
+
Alternatively you can use different ports to distinguish your different repositories hosted by the same server.
 +
 
 +
===Running epadmin===
 
Make sure MySQL is actually running.
 
Make sure MySQL is actually running.
  
Change to your eprints user (probably "eprints").
+
Change to your eprints user (probably ''eprints'').
  
Change directory to the eprints directory (<tt>/opt/eprints3</tt> by default) and run
+
Change directory to the eprints directory (<tt>/opt/eprints3</tt> by default for a source install and <tt>/usr/share/eprints</tt> for packaged installs) and run
  
<code>
+
bin/epadmin create
bin/epadmin create
 
</code>
 
  
You will get the following prompts (note that when you see something in [square brackets], it's the default value and can be selected by simply hitting enter)
+
If you are running EPrints 3.4 or later you will need a further parameter to define which flavour of repository you want.  The common two choices are either '''pub''' or '''zero''':
  
* Archive ID - the system name for your archive.  It's probably a good idea to think of something short and memorable. Once entered, an <tt>archive/<archive_id></tt> directory will be created, and the standard configuration files will be copied in.
+
bin/epadmin create pub
 +
 
 +
You will get the following prompts (note that when you see something in [square brackets], it's the default value and can be selected by simply hitting <enter>):
 +
 
 +
* Archive ID - the system name for your archive.  It's probably a good idea to think of something short and memorable. Once entered, an <tt>archives/<archive_id></tt> directory will be created, and the standard configuration files will be copied in.
 
* Configure vital settings - Hit enter to say 'yes'. This will lead to more prompting about core settings:
 
* Configure vital settings - Hit enter to say 'yes'. This will lead to more prompting about core settings:
 
** Hostname - What someone will type into a web browser to get to your archive.  Make sure that your systems team have a DNS alias pointing to your server for this.
 
** Hostname - What someone will type into a web browser to get to your archive.  Make sure that your systems team have a DNS alias pointing to your server for this.
** Webserver Port - Which port to you want to serve the archive on?  The default is 80, so unless you can think of a good reason not to, just hit enter to accept the default.
+
** Webserver Port - Which port do you want to serve the archive on?  The default is 80, so unless you can think of a good reason not to, just hit enter to accept the default.
 
** Alias - You can enter any number of aliases that will take users to this archive.  Enter a '#' when you don't want to enter any more.  You could have your archive served on <tt>eprints.myorganisation.org</tt> and <tt>eprints.myorg.org</tt>.  As with the Hostname, your systems team need to be informed about these aliases too.
 
** Alias - You can enter any number of aliases that will take users to this archive.  Enter a '#' when you don't want to enter any more.  You could have your archive served on <tt>eprints.myorganisation.org</tt> and <tt>eprints.myorg.org</tt>.  As with the Hostname, your systems team need to be informed about these aliases too.
 +
*** Redirect ''your chosen alias'' to ''Hostname'' - Yes, usually that is your intention.
 +
** Please enter the path part of the repository's base URL - Make it simple for you and your visitors ... and confirm the suggestion; multiple archives could be managed by different domains or ports!
 +
** If you will use https for your user pages (including login) enter the https hostname - No doubt, for secure usage you need another name.
 
** Administrator Email - Enter the email address of the repository administrator.  This will allow your repository users to send email to the right person.
 
** Administrator Email - Enter the email address of the repository administrator.  This will allow your repository users to send email to the right person.
 
** Archive Name - The full name of your archive.  By default, this will be used on many of the pages, and in the title bar of the browser.
 
** Archive Name - The full name of your archive.  By default, this will be used on many of the pages, and in the title bar of the browser.
** Write these core settings - If you don't say 'yes', then you entered all that data for nothing.
+
** Write these core settings - If you don't say 'yes', then you entered all that data for nothing (relaunch the process by <tt>epadmin config_core ''archive_id''</tt>).
* Configure database - EPrints makes extensive use of a MySQL database.  Enter 'yes' to configure this.
+
* Configure database - EPrints makes extensive use of a MySQL database.  Enter 'yes' to launch <tt>epadmin config_db ''archive_id''</tt>.
** Database Name - The internal name of your database.  It makes sense to use the Archive ID for this, but you don't have to.  You don't need to create this database, epadmin will do it for you.
+
** Database Name - The internal name of your database.  It makes sense to use the Archive ID for this, but you don't have to.  You don't need to create this database, <tt>epadmin</tt> will do it for you.
** MySQL Host - The address of the server that the database is running on.  If the database is on the same machine as the EPrints installation, enter 'localhost'.
+
** MySQL Host - The address of the server that the database is running on.  If the database is on the same machine as the EPrints installation, confirm 'localhost'.
 
** MySQL Port - You probably don't need to enter a value.  If you have problems connecting to the database, talk to your systems team.
 
** MySQL Port - You probably don't need to enter a value.  If you have problems connecting to the database, talk to your systems team.
 
** MySQL Socket - As with MySQL Port, it's unlikely that you need to enter anything.
 
** MySQL Socket - As with MySQL Port, it's unlikely that you need to enter anything.
** Database User - The username with which to log into the MySQL Database.  You don't need to create this user, epadmin will do it for you.  If you enter a MySQL username that already exists, it will be overwritten by epstats.
+
** Database User - The username with which to log into the MySQL Database.  You don't need to create this user, <tt>epadmin</tt> will do it for you.  If you enter a MySQL username that already exists, it will be overwritten by epstats!
** Database Password - The password for the Database User.
+
** Database Password - Just confirm the suggested password for the Database User, <tt>eprints</tt> will keep track of it!
 
** Write these database settings - You should write them, or you'll lose them.
 
** Write these database settings - You should write them, or you'll lose them.
** Create database <Database Name> - Say yes, and epadmin can create the database and populate it with all the right tables.  If you've already created a database and a user for this archive, say no.
+
* Create database <Database Name> - Say ''yes'' to launch <tt>epadmin create_db ''archive_id''</tt>, and <tt>epadmin</tt> can create the database and populate it with all the right tables.  If you've already created a database and a user for this archive, say 'no'.
** MySQL Root Password - To create the database and the user, epadmin needs the MySQL Root Password. This is not saved anywhere.  It is used to log into mysql, create the database and create the user with the right access rights. The password is then forgotten.
+
** Database Superuser Username - 'root' isn't a bad suggestion, but should match your specifications during the MySQL installation.
** Create database tables - say yes to have epadmin create all the database tables.
+
** Database Superuser Password - To create the database and the user, epadmin needs the MySQL Root Password (specified during MySQL installation). This is not saved anywhere.  It is only used to log into MySQL, create the database and create the user with useful access rights. The password is forgotten afterwards, because it's not needed anymore by this process!
* Create an initial user - It's a good idea to create a user account for yourself at this point.
+
* Create database tables - say 'yes' to have epadmin create all the database tables, i.e. initiate the same as with <tt>epadmin create_tables ''archive_id''</tt>.
** Enter a username - The username you will use to log into EPrints in your browser.
+
* Create an initial user - It's a good idea to create a user account for yourself at this point, but you can start the same process by <tt>epadmin create_user ''archive_id''</tt> anytime ...
** Select a user type (user|editor|admin) - There are three levels of user in EPrints.  You probably want to be an administrator, so enter 'admin'.
+
** Enter a username - The username you will use to log into EPrints on the home page of the new archive.
** Enter Password - A password for this user.  Remember to choose a password that will be hard for someone else to guess.
+
** Select a user type (user|editor|admin) - There are three user roles in EPrints.  You probably want to be an administrator, so enter 'admin'.
** Email - Enter your email address so that administrators can get in contact with you.
+
** Enter Password - A password for this user.  Remember to choose a password that will be hard for someone else to guess!
** Do you want to build the static web pages - There are a number of pages in EPrints which change very rarely.  These are the static pages.  The Home page and the About page are examples of static pages.  Stylesheets are also static.  These pages need to be built, so say 'yes'.
+
** Email - Enter your email address so that users can get in contact with you.
** Do you want to import the LOC subjects - If you will be using the Library Of Congress subject hierarchy, say 'yes'.  Otherwise you will need to create your own subject hierarchy.
+
* Do you want to build the static web pages - There are a number of pages in EPrints which change very rarely.  These are the static pages.  The Home page and the About page are examples of static pages.  Stylesheets are also static.  These pages need to be built, so say 'yes' (otherwise you have to start that process manually later by <tt>generate_static ''archive_id''</tt>).
* Do you want to update the apache config files? (you still need to add the 'Include' line) - Your archive has a number of files which it uses to configure the web server.  These should be updated, so say 'yes'.
+
* Do you want to import the LOC subjects - If you will be using the Library Of Congress subject hierarchy, say 'yes'.  Otherwise you will need to create your own subject hierarchy and import it using [http://wiki.eprints.org/w/API:bin/import_subjects <tt>import_subjects</tt>].
* Before exiting, epadmin will display information about configuring the webserver.
+
* Do you want to update the apache config files? (you still need to add the 'Include' line in the very central http.conf, e.g.) - Your archive has a number of files which it uses to configure the web server.  These should be updated, so say ''yes'' or start <tt>generate_apacheconf ''archive_id''</tt> later.
 +
* Before exiting, <tt>epadmin</tt> will display information about configuring the webserver.
 +
 
 +
====example configuration====
 +
 
 +
The script will run through a number of configuration options, an example of which is listed below. Please change the settings to suit your site configuration.
 +
 
 +
<pre>
 +
-bash-4.1$ ./bin/epadmin create
 +
 
 +
Create an EPrint Repository
 +
 
 +
Please select an ID for the repository, which will be used to create a directory
 +
and identify the repository. Lower case letters and numbers, may not start with
 +
a number. examples: "lemurprints" or "test3"
 +
 
 +
Archive ID? testrepo 
 +
We need to create /usr/share/eprints/archives/testrepo, doing it now…
 +
 
 +
Creating initial files:
 +
Installing: /usr/share/eprints/archives/testrepo/cfg
 +
Installing: /usr/share/eprints/archives/testrepo/cfg/lang
 +
[...]
 +
Installing: /usr/share/eprints/archives/testrepo/cfg/workflows/eprint
 +
 
 +
Ok. I've created the initial config files and directory structure.
 +
I've also created a "disk0" directory under documents/ if you want
 +
your full texts to be stored on a different partition then remove
 +
the disk0, and create a symbolic link to the directory you wish to
 +
store the full texts in. Additional links may be placed here to be
 +
used when the first is full.
 +
 
 +
 
 +
Configure vital settings? [yes] ?
 +
Core configuration for testrepo
 +
 
 +
 
 +
Please enter the fully qualified hostname of the repository.
 +
 
 +
For a production system we recommend against using the real hostname of the
 +
machine.
 +
 
 +
Example: testrepo.footle.ac.uk
 +
 
 +
Hostname? testprint
  
Open a browser, and enter the hostname in the address bar. You should see your new archive, ready to be [[Branding with confidence|branded]].
+
Please enter the port of the webserver. This is probably 80, but you may wish
 +
to run apache on a different port if you are experimenting.
  
==Running a Live Archive==
+
Webserver Port [80] ?
===Creating a crontab===
 
When you create an archive it will start out as a development system while you learn how to set it up (and your manager keeps changing his mind) but at some point (hopefully) you will declare your archive open for business.
 
  
At this point you should schedule certain scripts to run periodically. The best way to do this is to use "cron" which is an integral part of most UNIX systems.
+
Please enter all the aliases which could reach the repository, and indicate if
 +
you would like EPrints to write a Redirect Rule to redirect requests to this
 +
alias to the correct URL.
 +
Some suggestions:
 +
centos610.local
 +
centos610
 +
centos610
  
To set up cron, run (as the eprints user):
+
Enter a single hash (#) when you're done.
  
<code>
+
Alias (enter # when done) [#] ? testprint.local
% crontab -e
+
Redirect testprint.local to testprint [yes] ?
</code>
 
Exactly what to add to the cron table is described in the following sections - "Browse Views" and "Subscriptions".
 
  
There should be one set of crontab entries per archive.
+
Alias (enter # when done) [#] ?
  
===Backups===
+
Please enter the path part of the repository's base URL. This should probably
You should also have made sure that the system is being properly backed up. This is gone into in more detail [[Backups|elsewhere in the documentation]].
+
be '/'.
  
===OAI===
+
Path [/] ?
We would also encourage you to configure the OAI support for your archive and register it.
 
  
====Configuring====
+
If you will use https for your user pages (including login) enter the https hostname
 +
here, or leave blank when using http only.
  
The setting for OAI are held in the <tt>oai.pl</tt> file, in the <tt>eprints3/archives/<archive id>/cfg/cfg.d/</tt> directory.  This is a perl file, but don't let that daunt you. Some of the settings are set to sensible defaults.  This guide cover the essentials.  In the guide, pico is used to edit the files.  Feel free to use your favourite text editor instead if you would rather.
+
HTTPS Hostname [] ?
  
At the command prompt, backup then open the file:
 
>cd /opt/eprints3/archive/<archive id>/cfg/cfg.d
 
>cp oai.pl oai.backup
 
>pico oai.pl
 
  
The following need to be changed:
+
Administrator Email? someone@example.com
  
<strong>The archive ID</strong>. This needs to be unique, so check that it doesn't already exist at http://www.openarchives.org/.
+
Enter the name of the repository in the default language. If you wish to enter
 +
other titles for other languages or enter non ascii characters then you may
 +
enter something as a placeholder and edit the XML config file which this
 +
script generates.
  
Find the following line in oai.pl:
+
Archive Name [Test Repository] ?
$oai->{v2}->{archive_id} = "generic.eprints.org";
 
And change generic.eprints.org to something which identifies your repository.
 
  
<strong>Content Description</strong>. What does your repository contain?  Write a description, then find the lines:
+
Write these core settings? [yes] ?
$oai->{content}->{"text"} = latin1( <<END );
+
Wrote /usr/share/eprints/archives/testrepo/cfg/cfg.d/adminemail.pl
OAI Site description has not been configured.
+
Wrote /usr/share/eprints/archives/testrepo/cfg/cfg.d/10_core.pl
END
+
Wrote /usr/share/eprints/archives/testrepo/cfg/lang/en/phrases/archive_name.xml
Do not modify the first or last line in any way. Simply put your new text in the place of the middle line. This text can be as many lines as you wish, but it <em>must not</em> contain the word "END" at the start of a line.
 
  
Next you need to define a number of policies which will define how your repository may be used.  It may be helpful for you to visit http://www.opendoar.org/tools/en/policies.php which has a step-by-step process to create these policies.  It will even output EPrints 3 configuration code. which you can then copy and paste into the oai.pl file.  These policies are:
+
Configure database? [yes] ?
  
* <strong>Metadata Policy</strong>
+
Configuring Database for: testrepo
* <strong>Data Policy</strong>
+
Database Name [testrepo] ?
* <strong>Submission Policy</strong>
+
MySQL Host [localhost] ?
  
These are updated in exactly the same way as the <strong>Content Description</strong> section. Just look for the following lines:
+
You probably don't need to set socket and port (unless you do!?).
 +
MySQL Port (# for no setting) [#] ?
 +
MySQL Socket (# for no setting) [#] ?
 +
Database User [testrepo] ?
 +
Database Password [nxxxxuAw] ?
 +
Database Engine [MyISAM] ?
  
*$oai->{metadata_policy}->{"text"} = latin1( <<END );
+
Write these database settings? [yes] ? 
*$oai->{data_policy}->{"text"} = latin1( <<END );
+
Wrote /usr/share/eprints/archives/testrepo/cfg/cfg.d/database.pl
*$oai->{submission_policy}->{"text"} = latin1( <<END );
 
  
====Registering====
+
EPrints can create the database, and grant the correct permissions.
  
Once you register your archive (at http://www.openarchives.org) various search systems will be able to collect the metadata (titles, authors, abstract etc.) and allow more people to find records in your archive.
+
Create database "testrepo" [yes] ?
 +
Database Superuser Username [root] ?
 +
Database Superuser Password?
 +
Create database tables? [yes] ?
 +
Creating database tables...
 +
Set DB compatibility flag to '3.3.4'.
 +
Done creating database tables.
  
See http://www.openarchives.org/ for more information on the OAI protocol. For more information setting up the OAI interface archive see the section in this documentation about Configuring an Archive.
 
  
==Browse Views==
+
Create an initial user? [yes] ?
Once every so often you should run the "generate_views" script on each archive in your system to regenerate the browse views section of the site.
+
Creating a new user in testrepo
  
This is a set of static pages. By default one per subject, and one per year (only years with papers in that year not EVERY year ever!). Some users prefer to browse the system than search it. This also gives search engines a way to reach, and index, the abstract pages.
+
Enter a username [admin] ?
 +
Select a user type (user|editor|admin) [admin] ?   
 +
Enter Password?
 +
Email? first.last@example.org
  
See the ArchiveConfig.pm config notes on how to edit the views it generates.
 
  
See the How-To section for some suggestions on how to set up views.
+
Successfully created new user:
 +
      ID: 1
 +
Do you want to build the static web pages? [yes] ?
  
===But I don't want this feature...===
+
Starting EPrints Repository.
If you don't want to use this feature: don't, it's your archive. Remove the link from the template and front page. Don't run the generate_views script.
+
Connecting to DB ... done.
 +
mkdir /usr/share/eprints/archives/testrepo/html/en/codemirror/mode/velocity
 +
/usr/share/eprints/lib/static/codemirror/mode/velocity/velocity.js -> /usr/share/eprints/archives/testrepo/html/en/codemirror/mode/velocity/velocity.js
 +
mkdir /usr/share/eprints/archives/testrepo/html/en/style/images
 +
/usr/share/eprints/lib/static/style/images/action_edit.png -> /usr/share/eprints/archives/testrepo/html/en/style/images/action_edit.png
 +
[...]
 +
/usr/share/eprints/lib/static/style/images/action_unpack.png -> /usr/share/eprints/archives/testrepo/html/en/style/images/action_unpack.png
 +
/usr/share/eprints/lib/static/codemirror/lib/util/loadmode.js -> /usr/share/eprints/archives/testrepo/html/en/codemirror/lib/util/loadmode.js
 +
Ending EPrints Repository.
  
===Setting it up===
+
Do you want to import the LOC subjects? [yes] ?
This is best done by using the UNIX "cron" command (as user "eprints"). Cron will email "eprints" on that machine with the output, so best use the --quiet option so it only bothers you with errors.
 
  
How often you want to run this depends on the size of your archive, and how fast the contents changes. This feature is roughly order "n". Which means if you double the number of items in your archive then you double the time it takes to run (ish).
+
Starting EPrints Repository.
 +
Connecting to DB ... done.
 +
Importing from /usr/share/eprints/archives/testrepo/cfg/subjects...
 +
Done importing 280 subjects from /usr/share/eprints/archives/testrepo/cfg/subjects
 +
Reindexing subject dataset to set ancestor data
 +
Reindexing item: subject/A
 +
Reindexing item: subject/AC
 +
Reindexing item: subject/AI
 +
Reindexing item: subject/AM
 +
[...]
 +
Reindexing item: subject/sch_soc
 +
Reindexing item: subject/subjects
 +
Done reindexing
 +
Ending EPrints Repository.
  
Once an hour would seem a good starting point. If your archive gets real big, say more than 10000 records, then maybe once a day is more realistic - the one thing that you don't want to happen is for a new generate_views to start before the old one finishes as they will mess up each others output.
+
Exiting normally.
 +
Do you want to update the apache config files? (you still need to add the
 +
'Include' line) [yes] ?
 +
Wrote /usr/share/eprints/cfg/apache/testrepo.conf
  
Run generate_views on the command line to find out how long it takes.
+
You must restart apache for any changes to take effect!
  
and add the line
+
--------------------------------------------------------------------------
 +
That seemed to more or less work...
 +
--------------------------------------------------------------------------
  
 +
Now make any required changes to the cfg files.
  
<code>
+
Note that changing the metadata configuration may require the database
23 * * * * /opt/eprints2/bin/generate_views I<archiveid>
+
tables to be regenerated. epadmin erase_data will regenerate the
</code>
+
eprints and documents tables only. erase_data will regenerate everything.
This runs at 23 minutes past each hour. If you have more than one archive, don't make them all start rebuilding stuff at the same time, stagger it. Otherwise once an hour everything will slow down as it fights to run several intensive scripts at once.
+
(nb. these also do erase the contents of the tables, and any uploaded
 +
files).
  
See the crontab man page <tt>man 5 crontab</tt> for more information on using cron.
+
Make sure that your main apache config file contains the line:
  
==Subscriptions==
+
Include /usr/share/eprints/cfg/apache.conf
Subscriptions provide a way in which users of your system can receive regular updates, via email, when new items are added which match a search they specified.
 
  
To automate sending out these subscriptions you must add some entries in the crontab (as for views). You need one set of these per archive.
+
Then stop and start your webserver:
 +
Often:
 +
/etc/rc.d/init.d/httpd stop
 +
/etc/rc.d/init.d/httpd start
 +
(or maybe /usr/local/apache/bin/apachectl stop & start)
  
For example (with dookuprints being the name of the archive):
+
And then try connecting to your repository.
 +
--------------------------------------------------------------------------
  
 +
Don't forget to register your repository at http://roar.eprints.org/
  
<code>
+
-bash-4.1$
    # 00:15 every morning
 
    15 0 * * * /opt/eprints2/bin/send_subscriptions dookuprints daily
 
    # 00:30 every sunday morning
 
    30 0 * * 0 /opt/eprints2/bin/send_subscriptions dookuprints weekly
 
    # 00:45 every first of the month
 
    45 0 1 * * /opt/eprints2/bin/send_subscriptions dookuprints monthly
 
</code>
 
Note the spacing out so that all 3 don't start at once and hammer the database. You may wish to change the times, but we recommend early morning as the best time to send them (midnight-6am).
 
  
===But I don't want users to be able to do this!===
+
</pre>
Then remove the "subscription" power from each type of user in the archives ArchiveConfig.pm file.
 
  
==Default Configuration==
+
Finish apache's configuration before its final restart to be ready for opening a browser and watch the homepage of your new archive, ready to be [[Branding with confidence|branded]].
EPrints configures a new archive with a set of metadata fields aimed at an archive of research papers.
 
  
The initial "types" of eprint (book, poster, conference paper) are configured in metadata-types.xml
+
If you want to add some more users, use the command <tt>epadmin create_user <repository id></tt> or the admin's web dialog.
  
The initial subjects are a subset of the library of congress subjects. Feel free to totally replace them with your own subjects, but the more standard your subject tree the more useful your metadata will be to other people.
+
==Regular Maintenance==
  
The authors and editors have the "hasid" option set which allows people to optionally use a unique id for a person in addition to their name (names are NOT unique!) - this can be useful for generating "CV" pages (see the views how-to) and possibly for generating statistics. Without it you will never be sure which "John Smith" wrote that paper. If you don't like this feature remove the "hasid" from the authors and editors - this will require you to recreate the tables, erasing the archive, so decide before you start. If you want to be more clear about what information goes in that field, edit the phrases <tt>eprint_fieldname_authors_id</tt> and <tt>eprint_fieldname_editors_id</tt> in the archive phrase file(s).
+
EPrints front end web pages and abstracts '''are ''not'' automatically updated''' when you make changes to the repository.
 +
To apply your changes and update the web pages:
  
In general: Change it! It's not a recommended system setup, just a good starting point.
+
===Generate Views===
 +
  eprints@host$ bin/generate_views ''yourarchivename''
  
==New Configurations==
+
===Generate Statics===
If you are setting up more than one archive which are related to each other, a "community", you may wish to establish common subjects and metadata.
+
  eprints@host$ bin/generate_static ''yourarchivename''
  
Removing and adding types is easy. Removing and adding fields is a bit more work. All "screen" names of values are stored in the archives own "phrase file" which comes with phrases for the default config.
+
===Generate Abstracts===
 +
  eprints@host$ bin/generate_abstracts ''yourarchivename''
  
If you create a good default configuration for a different purpose or language(s) (and would like to share it), please contact the eprints admin who may want to put it online as an example or even include it as an alternate default in a later version.
+
Finally a restart of your apache server is recommended, because a lot of settings will only be read initially!

Revision as of 14:45, 7 August 2019

Creating an Archive

EPrints 3 can run multiple archives under one install. Running multiple archives requires giving additional DNS aliases to the machine running EPrints; EPrints can then create all the parts of the Apache configuration file needed to run the virtual hosts.

Alternatively, you can use different ports to distinguish the repositories hosted by the same server.
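As a rough illustration of the two approaches, the sketch below builds the address a visitor would type for a repository. The helper name `repo_url` and the hostnames are hypothetical and only serve the example; EPrints itself derives these values from the Hostname and Webserver Port prompts.

```shell
# Hypothetical helper: print the URL for a repository from its hostname
# and (optionally) its port. Port 80 is the default and is left out.
repo_url() {
    host=$1
    port=${2:-80}
    if [ "$port" = "80" ]; then
        printf 'http://%s/\n' "$host"
    else
        printf 'http://%s:%s/\n' "$host" "$port"
    fi
}

# Two repositories told apart by DNS alias (example hostnames):
repo_url eprints.myorganisation.org
repo_url theses.myorganisation.org
# Two repositories on the same hostname told apart by port:
repo_url eprints.myorganisation.org 8080
```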

Running epadmin

Make sure MySQL is actually running.
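One possible way to check this from the command line is `mysqladmin ping`, which exits with status 0 when the server answers. The wrapper below is only a sketch: the function name `check_mysql` and the `MYSQLADMIN` override variable are assumptions made for this example, and your system may offer a more convenient service manager command.

```shell
# Sketch: report whether a MySQL server answers on the default connection.
# MYSQLADMIN is a hypothetical override for the client binary's name or path.
check_mysql() {
    if "${MYSQLADMIN:-mysqladmin}" --silent ping >/dev/null 2>&1; then
        echo "MySQL is running"
    else
        echo "MySQL is not reachable" >&2
        return 1
    fi
}

# Typical use before creating an archive:
#   check_mysql && bin/epadmin create
```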

Change to your eprints user (probably eprints).

Change to the EPrints directory (/opt/eprints3 by default for a source install, /usr/share/eprints for packaged installs) and run:

bin/epadmin create
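If you are unsure which of the two locations applies to your machine, a small helper can look for the `bin/epadmin` script. This is a minimal sketch; the function name `find_eprints_root` is made up for the example, and only the two default paths mentioned above are assumed.

```shell
# Sketch: print the first directory that contains an executable bin/epadmin.
# With no arguments, the two default install locations are tried in turn.
find_eprints_root() {
    if [ "$#" -eq 0 ]; then
        set -- /opt/eprints3 /usr/share/eprints
    fi
    for dir in "$@"; do
        if [ -x "$dir/bin/epadmin" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    echo "no EPrints install found" >&2
    return 1
}

# Typical use, as the eprints user:
#   cd "$(find_eprints_root)" && bin/epadmin create
```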

If you are running EPrints 3.4 or later, you will need a further parameter to define which flavour of repository you want. The two common choices are pub and zero:

bin/epadmin create pub
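The version rule can be written down as a short helper that prints the matching command. This is only a sketch: the function name `epadmin_create_cmd` is hypothetical, and the version string is assumed to be supplied by hand in the form "3.3.16" or "3.4".

```shell
# Sketch: print the create command suited to a given EPrints version.
# The second argument is the flavour for 3.4 and later (defaults to "pub").
epadmin_create_cmd() {
    version=$1
    flavour=${2:-pub}
    major=${version%%.*}      # text before the first dot
    rest=${version#*.}        # text after the first dot
    minor=${rest%%.*}         # second version component
    if [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 4 ]; }; then
        echo "bin/epadmin create $flavour"
    else
        echo "bin/epadmin create"
    fi
}

epadmin_create_cmd 3.3.16      # no flavour parameter before 3.4
epadmin_create_cmd 3.4.2       # flavour defaults to pub
epadmin_create_cmd 3.4 zero    # explicit flavour
```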

You will get the following prompts (note that when you see something in [square brackets], it's the default value and can be selected by simply hitting <enter>):

  • Archive ID - the system name for your archive. It's probably a good idea to think of something short and memorable. Once entered, an archives/<archive_id> directory will be created, and the standard configuration files will be copied in.
  • Configure vital settings - Hit enter to say 'yes'. This will lead to more prompting about core settings:
    • Hostname - What someone will type into a web browser to get to your archive. Make sure that your systems team have a DNS alias pointing to your server for this.
    • Webserver Port - Which port do you want to serve the archive on? The default is 80, so unless you can think of a good reason not to, just hit enter to accept the default.
    • Alias - You can enter any number of aliases that will take users to this archive. Enter a '#' when you don't want to enter any more. You could have your archive served on eprints.myorganisation.org and eprints.myorg.org. As with the Hostname, your systems team need to be informed about these aliases too.
      • Redirect your chosen alias to Hostname - Say 'yes'; redirecting the alias to the Hostname is usually what you want.
    • Please enter the path part of the repository's base URL - Keep it simple for you and your visitors and accept the suggested default ('/'); multiple archives can be told apart by different hostnames or ports instead.
    • If you will use https for your user pages (including login) enter the https hostname - Enter the hostname to use for HTTPS, or leave it blank if you are using HTTP only.
    • Administrator Email - Enter the email address of the repository administrator. This will allow your repository users to send email to the right person.
    • Archive Name - The full name of your archive. By default, this will be used on many of the pages, and in the title bar of the browser.
    • Write these core settings - Say 'yes'; otherwise everything you have just entered is discarded (you can relaunch this step with epadmin config_core archive_id).
  • Configure database - EPrints makes extensive use of a MySQL database. Enter 'yes' to launch epadmin config_db archive_id.
    • Database Name - The internal name of your database. It makes sense to use the Archive ID for this, but you don't have to. You don't need to create this database, epadmin will do it for you.
    • MySQL Host - The address of the server that the database is running on. If the database is on the same machine as the EPrints installation, confirm 'localhost'.
    • MySQL Port - You probably don't need to enter a value. If you have problems connecting to the database, talk to your systems team.
    • MySQL Socket - As with MySQL Port, it's unlikely that you need to enter anything.
    • Database User - The username with which to log into the MySQL Database. You don't need to create this user, epadmin will do it for you. If you enter a MySQL username that already exists, it will be overwritten by epadmin!
    • Database Password - Just confirm the suggested password for the Database User; EPrints will keep track of it.
    • Write these database settings - You should write them, or you'll lose them.
  • Create database <Database Name> - Say yes to launch epadmin create_db archive_id, and epadmin can create the database and populate it with all the right tables. If you've already created a database and a user for this archive, say 'no'.
    • Database Superuser Username - 'root' is a sensible default, but use whichever superuser account you set up during the MySQL installation.
    • Database Superuser Password - To create the database and the user, epadmin needs the MySQL Root Password (specified during MySQL installation). This is not saved anywhere. It is only used to log into MySQL, create the database and create the user with useful access rights. The password is forgotten afterwards, because it's not needed anymore by this process!
  • Create database tables - say 'yes' to have epadmin create all the database tables, i.e. initiate the same as with epadmin create_tables archive_id.
  • Create an initial user - It's a good idea to create a user account for yourself at this point, but you can run epadmin create_user archive_id at any time.
    • Enter a username - The username you will use to log into EPrints on the home page of the new archive.
    • Select a user type (user|editor|admin) - There are three user roles in EPrints. You probably want to be an administrator, so enter 'admin'.
    • Enter Password - A password for this user. Remember to choose a password that will be hard for someone else to guess!
    • Email - Enter your email address so that users can get in contact with you.
  • Do you want to build the static web pages - There are a number of pages in EPrints which change very rarely. These are the static pages. The Home page and the About page are examples of static pages. Stylesheets are also static. These pages need to be built, so say 'yes' (otherwise you have to start that process manually later by generate_static archive_id).
  • Do you want to import the LOC subjects - If you will be using the Library Of Congress subject hierarchy, say 'yes'. Otherwise you will need to create your own subject hierarchy and import it using import_subjects.
  • Do you want to update the apache config files? (you still need to add the 'Include' line to the main Apache configuration file, e.g. httpd.conf) - Your archive has a number of files which it uses to configure the web server. These should be updated, so say 'yes' or run generate_apacheconf archive_id later.
  • Before exiting, epadmin will display information about configuring the webserver.
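The prompts above name the individual commands that the wizard chains together. As a rough sketch, the following prints those commands in the order the wizard asks about them, without running anything. The archive ID `testrepo` is a placeholder, and the `bin/` prefix and the archive ID argument on `import_subjects` are assumptions based on how the other scripts are invoked on this page.

```shell
#!/bin/sh
# Dry run: list the separate commands behind each stage of "epadmin create".
# Nothing is executed; the lines are only printed for reference.

ARCHIVE_ID="${ARCHIVE_ID:-testrepo}"   # placeholder archive ID

wizard_steps() {
    archive=$1
    # Stages handled by epadmin itself, in wizard order.
    for step in config_core config_db create_db create_tables create_user; do
        echo "bin/epadmin $step $archive"
    done
    # Stages handled by separate scripts.
    echo "bin/generate_static $archive"
    echo "bin/import_subjects $archive"
    echo "bin/generate_apacheconf $archive"
}

wizard_steps "$ARCHIVE_ID"
```

If you answered 'no' to one of the wizard's questions, the matching line shows what to run later from the eprints directory as the eprints user.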

Example Configuration

The script will run through a number of configuration options, an example of which is listed below. Please change the settings to suit your site configuration.

-bash-4.1$ ./bin/epadmin create

Create an EPrint Repository

Please select an ID for the repository, which will be used to create a directory
and identify the repository. Lower case letters and numbers, may not start with
a number. examples: "lemurprints" or "test3"

Archive ID? testrepo  
We need to create /usr/share/eprints/archives/testrepo, doing it now…

Creating initial files:
Installing: /usr/share/eprints/archives/testrepo/cfg
Installing: /usr/share/eprints/archives/testrepo/cfg/lang
[...]
Installing: /usr/share/eprints/archives/testrepo/cfg/workflows/eprint

Ok. I've created the initial config files and directory structure. 
I've also created a "disk0" directory under documents/ if you want
your full texts to be stored on a different partition then remove 
the disk0, and create a symbolic link to the directory you wish to
store the full texts in. Additional links may be placed here to be
used when the first is full.


Configure vital settings? [yes] ? 
Core configuration for testrepo


Please enter the fully qualified hostname of the repository. 

For a production system we recommend against using the real hostname of the 
machine. 

Example: testrepo.footle.ac.uk

Hostname? testprint

Please enter the port of the webserver. This is probably 80, but you may wish 
to run apache on a different port if you are experimenting.

Webserver Port [80] ? 

Please enter all the aliases which could reach the repository, and indicate if 
you would like EPrints to write a Redirect Rule to redirect requests to this
alias to the correct URL.
Some suggestions:
centos610.local
centos610
centos610

Enter a single hash (#) when you're done.

Alias (enter # when done) [#] ? testprint.local
Redirect testprint.local to testprint [yes] ? 

Alias (enter # when done) [#] ? 

Please enter the path part of the repository's base URL. This should probably
be '/'.

Path [/] ? 

If you will use https for your user pages (including login) enter the https hostname
here, or leave blank when using http only.

HTTPS Hostname [] ? 


Administrator Email? someone@example.com

Enter the name of the repository in the default language. If you wish to enter 
other titles for other languages or enter non ascii characters then you may
enter something as a placeholder and edit the XML config file which this
script generates.

Archive Name [Test Repository] ? 

Write these core settings? [yes] ? 
Wrote /usr/share/eprints/archives/testrepo/cfg/cfg.d/adminemail.pl
Wrote /usr/share/eprints/archives/testrepo/cfg/cfg.d/10_core.pl
Wrote /usr/share/eprints/archives/testrepo/cfg/lang/en/phrases/archive_name.xml

Configure database? [yes] ? 

Configuring Database for: testrepo
Database Name [testrepo] ? 
MySQL Host [localhost] ? 

You probably don't need to set socket and port (unless you do!?).
MySQL Port (# for no setting) [#] ? 
MySQL Socket (# for no setting) [#] ? 
Database User [testrepo] ? 
Database Password [nxxxxuAw] ? 
Database Engine [MyISAM] ? 

Write these database settings? [yes] ?  
Wrote /usr/share/eprints/archives/testrepo/cfg/cfg.d/database.pl

EPrints can create the database, and grant the correct permissions.

Create database "testrepo" [yes] ? 
Database Superuser Username [root] ? 
Database Superuser Password? 
Create database tables? [yes] ? 
Creating database tables...
Set DB compatibility flag to '3.3.4'.
Done creating database tables.


Create an initial user? [yes] ? 
Creating a new user in testrepo

Enter a username [admin] ?
Select a user type (user|editor|admin) [admin] ?    
Enter Password? 
Email? first.last@example.org


Successfully created new user:
       ID: 1
Do you want to build the static web pages? [yes] ? 

Starting EPrints Repository.
Connecting to DB ... done.
mkdir /usr/share/eprints/archives/testrepo/html/en/codemirror/mode/velocity
/usr/share/eprints/lib/static/codemirror/mode/velocity/velocity.js -> /usr/share/eprints/archives/testrepo/html/en/codemirror/mode/velocity/velocity.js
mkdir /usr/share/eprints/archives/testrepo/html/en/style/images
/usr/share/eprints/lib/static/style/images/action_edit.png -> /usr/share/eprints/archives/testrepo/html/en/style/images/action_edit.png
[...]
/usr/share/eprints/lib/static/style/images/action_unpack.png -> /usr/share/eprints/archives/testrepo/html/en/style/images/action_unpack.png
/usr/share/eprints/lib/static/codemirror/lib/util/loadmode.js -> /usr/share/eprints/archives/testrepo/html/en/codemirror/lib/util/loadmode.js
Ending EPrints Repository.

Do you want to import the LOC subjects? [yes] ? 

Starting EPrints Repository.
Connecting to DB ... done.
Importing from /usr/share/eprints/archives/testrepo/cfg/subjects...
Done importing 280 subjects from /usr/share/eprints/archives/testrepo/cfg/subjects
Reindexing subject dataset to set ancestor data
Reindexing item: subject/A
Reindexing item: subject/AC
Reindexing item: subject/AI
Reindexing item: subject/AM
[...]
Reindexing item: subject/sch_soc
Reindexing item: subject/subjects
Done reindexing
Ending EPrints Repository.

Exiting normally.
Do you want to update the apache config files? (you still need to add the
'Include' line) [yes] ? 
Wrote /usr/share/eprints/cfg/apache/testrepo.conf

You must restart apache for any changes to take effect!

--------------------------------------------------------------------------
That seemed to more or less work...
--------------------------------------------------------------------------

Now make any required changes to the cfg files. 

Note that changing the metadata configuration may require the database
tables to be regenerated. epadmin erase_eprints will regenerate the 
eprints and documents tables only. erase_data will regenerate everything.
(nb. these also do erase the contents of the tables, and any uploaded 
files).

Make sure that your main apache config file contains the line:

 Include /usr/share/eprints/cfg/apache.conf

Then stop and start your webserver:
Often:
 /etc/rc.d/init.d/httpd stop
 /etc/rc.d/init.d/httpd start
(or maybe /usr/local/apache/bin/apachectl stop & start)

And then try connecting to your repository.
--------------------------------------------------------------------------

Don't forget to register your repository at http://roar.eprints.org/

-bash-4.1$ 
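The transcript above mentions that the "disk0" directory under documents/ can be replaced by a symbolic link if full texts should live on a different partition. A minimal sketch of that step follows, assuming the archive is new and disk0 is still empty. The function name `relocate_disk0` and the target directory `/data/eprints-docs` are hypothetical.

```shell
#!/bin/sh
# Replace an empty disk0 directory with a symlink to another location.
# rmdir refuses to remove a non-empty directory, so stored files are never lost.

relocate_disk0() {
    docs_dir=$1    # e.g. /usr/share/eprints/archives/testrepo/documents
    target=$2      # directory on the other partition (must already exist)

    [ -d "$target" ] || { echo "target $target does not exist" >&2; return 1; }
    rmdir "$docs_dir/disk0" || return 1
    ln -s "$target" "$docs_dir/disk0"
}

# Example call (hypothetical target path), run as the eprints user:
# relocate_disk0 /usr/share/eprints/archives/testrepo/documents /data/eprints-docs
```

The target directory needs to be writable by the eprints user and the web server, just like the original disk0.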

Finish the Apache configuration and restart Apache. You can then open a browser and view the homepage of your new archive, which is ready to be branded.
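Before restarting, it can help to confirm that the main Apache configuration file really contains the Include line shown at the end of the transcript. The helper below is only a sketch: `has_include` is a hypothetical name, and the location of the main configuration file (here `/etc/httpd/conf/httpd.conf`) depends on your distribution.

```shell
#!/bin/sh
# Succeed if conf_file has an active (not commented out) line of the form
#   Include <include_path>
# The match is case-sensitive, so write the directive as "Include".

has_include() {
    conf_file=$1
    include_path=$2
    awk -v p="$include_path" '
        $1 == "Include" && $2 == p { found = 1 }
        END { exit !found }
    ' "$conf_file"
}

# Example call (paths are assumptions for a packaged install):
# has_include /etc/httpd/conf/httpd.conf /usr/share/eprints/cfg/apache.conf \
#     && echo "Include line present" \
#     || echo "Include line missing"
```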

If you want to add some more users, use the command epadmin create_user <repository id> or the admin's web dialog.

Regular Maintenance

EPrints front end web pages and abstracts are not automatically updated when you make changes to the repository. To apply your changes and update the web pages:

Generate Views

 eprints@host$ bin/generate_views yourarchivename

Generate Statics

 eprints@host$ bin/generate_static yourarchivename

Generate Abstracts

 eprints@host$ bin/generate_abstracts yourarchivename

Finally, restarting your Apache server is recommended, because many settings are only read when it starts.
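The three regeneration commands above can be wrapped in one small script so they always run in the same order. This is a sketch rather than an official EPrints tool: the function name `regenerate_all` is hypothetical, and the install path and archive name in the example call are placeholders.

```shell
#!/bin/sh
# Run generate_views, generate_static and generate_abstracts in order,
# stopping at the first one that fails.

regenerate_all() {
    root=$1       # EPrints directory, e.g. /opt/eprints3
    archive=$2    # archive ID

    for tool in generate_views generate_static generate_abstracts; do
        echo "Running $tool for $archive"
        "$root/bin/$tool" "$archive" || {
            echo "$tool failed" >&2
            return 1
        }
    done
}

# Example call, run as the eprints user (placeholder values):
# regenerate_all /opt/eprints3 yourarchivename
```

Restarting Apache is left out on purpose, since it usually needs different privileges from the eprints user.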