Getting Started
Latest revision as of 11:42, 4 October 2023
Creating an Archive
EPrints 3 can run multiple archives under one install. Running multiple archives requires giving additional DNS aliases to the machine running EPrints; EPrints can then create all the parts of the Apache configuration file needed to run the virtual hosts.
Creating MySQL eprints user
If you are running a modern version of Linux (e.g. Red Hat Enterprise Linux 9+ or Ubuntu 20.04+), the eprints user will not be able to connect to MySQL as the database root user without a password. Therefore, before running "epadmin create" later in these instructions, switch to the root user, open the MySQL client and run the following commands, substituting your own password for 'changeme':
CREATE USER 'eprints'@'localhost' IDENTIFIED BY 'changeme';
GRANT ALL PRIVILEGES ON *.* TO 'eprints'@'localhost' WITH GRANT OPTION;
When you are asked for the Database Superuser Username and Database Superuser Password, be sure to use the username and password set above. If you are running an older version of Linux where the steps above are not needed, you can probably use the default options for these two prompts.
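If you would rather not type the two statements by hand, they can be generated with the password filled in. The following is only a sketch: the function name make_eprints_user_sql is made up for illustration and is not part of EPrints, and it assumes the password contains no single quotes.

```shell
# Hypothetical helper (not part of EPrints): print the two SQL statements
# shown above with the given password substituted for 'changeme'.
# Assumes the password contains no single quote characters.
make_eprints_user_sql() {
  password=$1
  printf "CREATE USER 'eprints'@'localhost' IDENTIFIED BY '%s';\n" "$password"
  printf "GRANT ALL PRIVILEGES ON *.* TO 'eprints'@'localhost' WITH GRANT OPTION;\n"
}

# Print the statements so they can be reviewed before use.
make_eprints_user_sql 'changeme'
```

As root, the output could then be piped into the MySQL client (for example make_eprints_user_sql 'yourpassword' | mysql), assuming root can open the client without a password on your system.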
Creating the Archive
Make sure MySQL is actually running.
Change to your eprints user (probably "eprints").
Change directory to the eprints directory (/opt/eprints3 by default) and run
bin/epadmin create
If you are running EPrints version 3.4+ you will need to provide an extra parameter. Typically this would be pub, to create a publications flavour archive. E.g.
bin/epadmin create pub
Follow the prompts and answer the questions. (This is very different from how things worked under version 2; someone who knows how this all works will hopefully expand on this very bare-bones instruction.) For more detailed information about creating a repository, see Getting Started with EPrints 3.
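The checks listed before running epadmin (MySQL running, correct user, correct directory) can be sketched as a small pre-flight function. This is an illustration under assumptions, not something EPrints ships: the name preflight is made up, the defaults are the "eprints" user and /opt/eprints3 directory mentioned above, and it relies on the standard mysqladmin ping command being installed.

```shell
# Hypothetical pre-flight check (not part of EPrints).
# Usage: preflight [USER] [EPRINTS_DIR]
preflight() {
  want_user=${1:-eprints}
  eprints_dir=${2:-/opt/eprints3}

  # epadmin should be run as the eprints user, not as root.
  if [ "$(id -un)" != "$want_user" ]; then
    echo "Please run this as the $want_user user." >&2
    return 1
  fi

  # The epadmin script should exist under the install directory.
  if [ ! -x "$eprints_dir/bin/epadmin" ]; then
    echo "Cannot find bin/epadmin under $eprints_dir." >&2
    return 1
  fi

  # mysqladmin ping exits 0 when the server is up, even if access is denied.
  if ! mysqladmin ping >/dev/null 2>&1; then
    echo "MySQL does not appear to be running." >&2
    return 1
  fi

  return 0
}

# Example (commented out so nothing runs by accident):
# preflight && cd /opt/eprints3 && bin/epadmin create pub
```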
Creating the Database Tables and Website
These will have been created when you created the archive.
Running a Live Archive
Creating a crontab
When you create an archive it will start out as a development system while you learn how to set it up (and your manager keeps changing his mind) but at some point (hopefully) you will declare your archive open for business.
At this point you should schedule certain scripts to run periodically. The best way to do this is to use "cron" which is an integral part of most UNIX systems.
To set up cron, run (as the eprints user):
% crontab -e
Exactly what to add to the cron table is described in the following sections - "Browse Views" and "Subscriptions".
There should be one set of crontab entries per archive.
Backups
You should also make sure that the system is being properly backed up. This is covered in more detail elsewhere in the documentation.
OAI
We would also encourage you to configure the OAI support for your archive and register it. It's quite easy - pretty much fill in the blanks in the ArchiveOAIConfig.pm file in the archive configuration directory.
EPrints 2.1 supports OAI versions 1 and 2 at URL paths /perl/oai and /perl/oai2.
Once you register your archive (at http://www.openarchives.org) various search systems will be able to collect the metadata (titles, authors, abstract etc.) and allow more people to find records in your archive.
See http://www.openarchives.org/ for more information on the OAI protocol. For more information on setting up the OAI interface for your archive, see the section in this documentation about Configuring an Archive.
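Before registering, it can be worth checking that the OAI interface answers at all. The sketch below builds the URL for an "Identify" request, which is one of the standard OAI protocol verbs. The function name oai_identify_url is made up for illustration, the hostname is a placeholder, and /perl/oai2 is simply the path quoted above, so adjust it if your version differs.

```shell
# Hypothetical helper: build the OAI version 2 "Identify" request URL
# for a repository host. /perl/oai2 is the path quoted above.
oai_identify_url() {
  printf 'http://%s/perl/oai2?verb=Identify\n' "$1"
}

# Placeholder hostname; substitute your own archive's hostname.
oai_identify_url testrepo.footle.ac.uk
```

The URL can be opened in a browser or fetched with a tool such as curl, e.g. curl -s "$(oai_identify_url yourhost)"; a working interface should return an XML description of the archive.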
Browse Views
Once every so often you should run the "generate_views" script on each archive in your system to regenerate the browse views section of the site.
This is a set of static pages: by default, one per subject and one per year (only years that actually contain papers, not EVERY year ever!). Some users prefer browsing the system to searching it. This also gives search engines a way to reach, and index, the abstract pages.
See the ArchiveConfig.pm config notes on how to edit the views it generates.
See the How-To section for some suggestions on how to set up views.
But I don't want this feature...
If you don't want to use this feature: don't, it's your archive. Remove the link from the template and front page. Don't run the generate_views script.
Setting it up
This is best done by using the UNIX "cron" command (as user "eprints"). Cron will email "eprints" on that machine with the output, so best use the --quiet option so it only bothers you with errors.
How often you want to run this depends on the size of your archive and how fast the contents change. This feature is roughly order "n", which means that if you double the number of items in your archive then you double the time it takes to run (ish).
Once an hour would seem a good starting point. If your archive gets really big, say more than 10000 records, then maybe once a day is more realistic. The one thing that you don't want to happen is for a new generate_views to start before the old one finishes, as they will mess up each other's output.
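One way to guard against overlapping runs is to wrap the command in a simple lock. The following is a sketch of that idea, not an EPrints feature: the function name run_exclusive, the lock location and the exit code 75 are all arbitrary choices made for illustration.

```shell
# Hypothetical wrapper (not part of EPrints).
# Usage: run_exclusive LOCKDIR COMMAND [ARGS...]
# Runs COMMAND only if LOCKDIR does not already exist. mkdir is atomic,
# so two overlapping runs cannot both acquire the lock.
run_exclusive() {
  lockdir=$1
  shift
  if ! mkdir "$lockdir" 2>/dev/null; then
    echo "Previous run still in progress, skipping." >&2
    return 75   # arbitrary code meaning "try again later"
  fi
  "$@"
  rc=$?
  rmdir "$lockdir"   # release the lock once the command has finished
  return $rc
}

# Example (commented out; paths and archive ID are placeholders):
# run_exclusive /tmp/generate_views.lock /opt/eprints2/bin/generate_views dookuprints
```

Because cron runs plain commands, the function would need to live in a small wrapper script that cron calls. Note that if the machine crashes mid-run the lock directory is left behind and has to be removed by hand.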
Run generate_views on the command line to find out how long it takes.
Then edit the crontab (crontab -e, as the eprints user) and add the line
23 * * * * /opt/eprints2/bin/generate_views <archiveid>
This runs at 23 minutes past each hour. If you have more than one archive, don't make them all start rebuilding at the same time; stagger them. Otherwise, once an hour everything will slow down as it fights to run several intensive scripts at once.
See the crontab man page (man 5 crontab) for more information on using cron.
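To illustrate staggering, the sketch below prints one hourly generate_views entry per archive, each starting a fixed number of minutes after the previous one. The function name stagger_views_cron is made up, the archive IDs are examples only, and the --quiet option is the one mentioned above; treat the output as a starting point to paste into crontab -e, not as a definitive setup.

```shell
# Hypothetical helper (not part of EPrints).
# Usage: stagger_views_cron START_MINUTE STEP ARCHIVEID...
# Prints one hourly crontab line per archive, STEP minutes apart.
stagger_views_cron() {
  minute=$1
  step=$2
  shift 2
  for archive in "$@"; do
    printf '%d * * * * /opt/eprints2/bin/generate_views %s --quiet\n' "$minute" "$archive"
    # Wrap around so the minute field always stays within 0-59.
    minute=$(( (minute + step) % 60 ))
  done
}

# Three example archives, starting at 23 past and spaced 15 minutes apart.
stagger_views_cron 23 15 dookuprints lemurprints test3
```

With these arguments the three entries start at 23, 38 and 53 minutes past each hour. Pick a step that is longer than a single run takes on your system.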
Subscriptions
Subscriptions provide a way in which users of your system can receive regular updates, via email, when new items are added which match a search they specified.
To automate sending out these subscriptions you must add some entries in the crontab (as for views). You need one set of these per archive.
For example (with dookuprints being the name of the archive):
# 00:15 every morning
15 0 * * * /opt/eprints2/bin/send_subscriptions dookuprints daily

# 00:30 every sunday morning
30 0 * * 0 /opt/eprints2/bin/send_subscriptions dookuprints weekly

# 00:45 every first of the month
45 0 1 * * /opt/eprints2/bin/send_subscriptions dookuprints monthly
Note the spacing out so that all 3 don't start at once and hammer the database. You may wish to change the times, but we recommend early morning as the best time to send them (midnight-6am).
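Since you need one set of these entries per archive, it may help to generate them. The sketch below prints the same three lines for any archive ID, with an optional hour so that different archives can be sent at different times. The function name subscription_cron and the second archive ID are assumptions made for illustration.

```shell
# Hypothetical helper (not part of EPrints).
# Usage: subscription_cron ARCHIVEID [HOUR]
# Prints the daily, weekly and monthly send_subscriptions entries for one
# archive, 15 minutes apart, at the given hour (default 0, i.e. midnight).
subscription_cron() {
  archive=$1
  hour=${2:-0}
  bin=/opt/eprints2/bin/send_subscriptions
  printf '15 %d * * * %s %s daily\n'   "$hour" "$bin" "$archive"
  printf '30 %d * * 0 %s %s weekly\n'  "$hour" "$bin" "$archive"
  printf '45 %d 1 * * %s %s monthly\n' "$hour" "$bin" "$archive"
}

# First archive at midnight, a second (example) archive an hour later.
subscription_cron dookuprints
subscription_cron lemurprints 1
```

The printed lines can be pasted into the crontab with crontab -e. Keeping the hour between 0 and 5 stays within the early-morning window recommended above.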
But I don't want users to be able to do this!
Then remove the "subscription" power from each type of user in the archive's ArchiveConfig.pm file.
Default Configuration
EPrints configures a new archive with a set of metadata fields aimed at an archive of research papers.
The initial "types" of eprint (book, poster, conference paper) are configured in metadata-types.xml.
The initial subjects are a subset of the Library of Congress subjects. Feel free to totally replace them with your own subjects, but the more standard your subject tree, the more useful your metadata will be to other people.
The authors and editors have the "hasid" option set which allows people to optionally use a unique id for a person in addition to their name (names are NOT unique!) - this can be useful for generating "CV" pages (see the views how-to) and possibly for generating statistics. Without it you will never be sure which "John Smith" wrote that paper. If you don't like this feature remove the "hasid" from the authors and editors - this will require you to recreate the tables, erasing the archive, so decide before you start. If you want to be more clear about what information goes in that field, edit the phrases eprint_fieldname_authors_id and eprint_fieldname_editors_id in the archive phrase file(s).
In general: Change it! It's not a recommended system setup, just a good starting point.
Example Configuration
The "epadmin create" script will run through a number of configuration options, an example of which is listed below. Please change the settings to suit your site configuration.
-bash-4.1$ ./bin/epadmin create pub

Create an EPrint Repository

Please select an ID for the repository, which will be used to create a directory and identify the repository. Lower case letters and numbers, may not start with a number. examples: "lemurprints" or "test3"
Archive ID? testrepo
We need to create /usr/share/eprints/archives/testrepo, doing it now…
Creating initial files:
Installing: /usr/share/eprints/archives/testrepo/cfg
Installing: /usr/share/eprints/archives/testrepo/cfg/lang
[...]
Installing: /usr/share/eprints/archives/testrepo/cfg/workflows/eprint

Ok. I've created the initial config files and directory structure.

I've also created a "disk0" directory under documents/ if you want your full texts to be stored on a different partition then remove the disk0, and create a symbolic link to the directory you wish to store the full texts in. Additional links may be placed here to be used when the first is full.

Configure vital settings? [yes] ?

Core configuration for testrepo

Please enter the fully qualified hostname of the repository.
For a production system we recommend against using the real hostname of the machine.
Example: testrepo.footle.ac.uk
Hostname? testprint

Please enter the port of the webserver. This is probably 80, but you may wish to run apache on a different port if you are experimenting.
Webserver Port [80] ?

Please enter all the aliases which could reach the repository, and indicate if you would like EPrints to write a Redirect Rule to redirect requests to this alias to the correct URL.
Some suggestions:
centos610.local
centos610
centos610
Enter a single hash (#) when you're done.
Alias (enter # when done) [#] ? testprint.local
Redirect testprint.local to testprint [yes] ?
Alias (enter # when done) [#] ?

Please enter the path part of the repository's base URL. This should probably be '/'.
Path [/] ?

If you will use https for your user pages (including login) enter the https hostname here, or leave blank when using http only.
HTTPS Hostname [] ?
Administrator Email? someone@example.com

Enter the name of the repository in the default language. If you wish to enter other titles for other languages or enter non ascii characters then you may enter something as a placeholder and edit the XML config file which this script generates.
Archive Name [Test Repository] ?

Write these core settings? [yes] ?
Wrote /usr/share/eprints/archives/testrepo/cfg/cfg.d/adminemail.pl
Wrote /usr/share/eprints/archives/testrepo/cfg/cfg.d/10_core.pl
Wrote /usr/share/eprints/archives/testrepo/cfg/lang/en/phrases/archive_name.xml

Configure database? [yes] ?

Configuring Database for: testrepo
Database Name [testrepo] ?
MySQL Host [localhost] ?
You probably don't need to set socket and port (unless you do!?).
MySQL Port (# for no setting) [#] ?
MySQL Socket (# for no setting) [#] ?
Database User [testrepo] ?
Database Password [nxxxxuAw] ?
Database Engine [MyISAM] ?
Write these database settings? [yes] ?
Wrote /usr/share/eprints/archives/testrepo/cfg/cfg.d/database.pl

EPrints can create the database, and grant the correct permissions.
Create database "testrepo" [yes] ?
Database Superuser Username [root] ?
Database Superuser Password?
Create database tables? [yes] ?
Creating database tables...
Set DB compatibility flag to '3.3.4'.
Done creating database tables.

Create an initial user? [yes] ?
Creating a new user in testrepo
Enter a username [admin] ?
Select a user type (user|editor|admin) [admin] ?
Enter Password?
Email? first.last@example.org
Successfully created new user:
ID: 1

Do you want to build the static web pages? [yes] ?
Starting EPrints Repository.
Connecting to DB ... done.
mkdir /usr/share/eprints/archives/testrepo/html/en/codemirror/mode/velocity
/usr/share/eprints/lib/static/codemirror/mode/velocity/velocity.js -> /usr/share/eprints/archives/testrepo/html/en/codemirror/mode/velocity/velocity.js
mkdir /usr/share/eprints/archives/testrepo/html/en/style/images
/usr/share/eprints/lib/static/style/images/action_edit.png -> /usr/share/eprints/archives/testrepo/html/en/style/images/action_edit.png
[...]
/usr/share/eprints/lib/static/style/images/action_unpack.png -> /usr/share/eprints/archives/testrepo/html/en/style/images/action_unpack.png
/usr/share/eprints/lib/static/codemirror/lib/util/loadmode.js -> /usr/share/eprints/archives/testrepo/html/en/codemirror/lib/util/loadmode.js
Ending EPrints Repository.

Do you want to import the LOC subjects? [yes] ?
Starting EPrints Repository.
Connecting to DB ... done.
Importing from /usr/share/eprints/archives/testrepo/cfg/subjects...
Done importing 280 subjects from /usr/share/eprints/archives/testrepo/cfg/subjects
Reindexing subject dataset to set ancestor data
Reindexing item: subject/A
Reindexing item: subject/AC
Reindexing item: subject/AI
Reindexing item: subject/AM
[...]
Reindexing item: subject/sch_soc
Reindexing item: subject/subjects
Done reindexing
Ending EPrints Repository.
Exiting normally.

Do you want to update the apache config files? (you still need to add the 'Include' line) [yes] ?
Wrote /usr/share/eprints/cfg/apache/testrepo.conf
You must restart apache for any changes to take effect!

--------------------------------------------------------------------------
That seemed to more or less work...
--------------------------------------------------------------------------
Now make any required changes to the cfg files.

Note that changing the metadata configuration may require the database tables to be regenerated.

epadmin erase_data will regenerate the eprints and documents tables only. erase_data will regenerate everything. (nb. these also do erase the contents of the tables, and any uploaded files).

Make sure that your main apache config file contains the line:
Include /usr/share/eprints/cfg/apache.conf

Then stop and start your webserver:
Often:
/etc/rc.d/init.d/httpd stop
/etc/rc.d/init.d/httpd start
(or maybe /usr/local/apache/bin/apachectl stop & start)

And then try connecting to your repository.
--------------------------------------------------------------------------
Don't forget to register your repository at http://roar.eprints.org/
-bash-4.1$
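The transcript ends by reminding you that the main Apache config file must contain the Include line. A quick way to check this is sketched below. The function name has_eprints_include is made up for illustration, the default Include path is the one printed in the transcript, and the location of your main Apache config file is something you will need to supply yourself.

```shell
# Hypothetical check (not part of EPrints).
# Usage: has_eprints_include APACHE_CONF [INCLUDE_PATH]
# Succeeds if APACHE_CONF contains an uncommented Include line for
# INCLUDE_PATH (default: the path printed by epadmin in the transcript).
has_eprints_include() {
  conf_file=$1
  include_path=${2:-/usr/share/eprints/cfg/apache.conf}
  # Drop commented-out lines, then look for the literal Include directive.
  grep -v '^[[:space:]]*#' "$conf_file" | grep -Fq "Include $include_path"
}

# Example (commented out; the config file path is an assumption and
# varies between systems):
# has_eprints_include /etc/httpd/conf/httpd.conf || echo "Include line missing"
```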
New Configurations
If you are setting up several archives which are related to each other (a "community"), you may wish to establish common subjects and metadata.
Removing and adding types is easy. Removing and adding fields is a bit more work. All "screen" names of values are stored in the archive's own "phrase file", which comes with phrases for the default config.
If you create a good default configuration for a different purpose or language(s) (and would like to share it), please contact the eprints admin who may want to put it online as an example or even include it as an alternate default in a later version.