Webserver authentication

From EPrints Documentation
Revision as of 19:32, 15 March 2012 by Sp

How to configure EPrints for authentication via the webserver. This enables/provides for

  • re-use of externally managed ("enterprise") user accounts,
  • automated Just-In-Time provisioning ("on-access provisioning"), instead of Just-In-Case provisioning (separately managed batch processes),
  • Web Single Sign-On to EPrints (with Shibboleth, CAS/mod_cas, Kerberos or just about any mod_auth_* Module for Apache httpd).

With small changes (not yet included below) EPrints user types (User, Editor, Repository Administrator) could also be assigned dynamically, based on data from an external authoritative source (e.g. an LDAP directory via Net::LDAP or an RDBMS via DBI) or received as SAML attributes (in the case of Shibboleth).

Conceptual overview

Here's how this integration works conceptually (not in order of steps performed). Filenames in parentheses refer to the files in/from FIXME:

  1. Enable HTTPS for your webserver and the EPrints instance.
  2. Configure EPrints to send requests requiring authentication to a specific resource/URL. We'll assume /shibboleth/login below, but this could be any string of your choice and is not visible to people logging in! (auth.pl)
  3. Make EPrints proper ignore these requests (archives/[repo_id]/cfg/cfg.d/20_baseurls.pl)
  4. Require authentication (and possibly also authorization) in the webserver for access to this resource (eprints-httpd-auth.conf)
  5. Add code to the installation which handles those pre-authenticated requests (login), creates new users and sessions, and returns to the originally requested resource.

Note that the recipe below does not provide a parallel authentication method for EPrints: it completely replaces the default authentication method and login prompt. To change the user type of some (possibly newly created) user account to "repository administrator" afterwards, you can temporarily disable the new authentication method and log in as the EPrints admin with local/database authentication. Alternatively, make sure that a user account already exists within EPrints that is of user type "repository administrator" and has a username that can authenticate to the external authentication system. With Shibboleth you'd also need to make sure that this username is returned from the SAML Identity Provider and that the right (i.e., matching) username is mapped to httpd's REMOTE_USER variable.

A word of warning: Don't enable this in a production EPrints instance unless either all existing users have matching usernames in your external authentication system, or you can live with the consequences of any and all users being auto-created (again) with different usernames (thereby possibly losing access to all their data, roles, etc.). Updating usernames in EPrints or other migration strategies are not considered below. Conversely, make sure that none of the external usernames unintentionally match any local EPrints accounts with admin, "repository administrator" or "editor" privileges -- unless you expressly want them to have those privileges.

Prerequisites

The EPrints instance this has been tested with was deployed from the latest EPrints RPMs (which, at the time of writing, was at 3.3.7) on a newly installed RHEL6 machine.

TLS/SSL

SSL and HTTPS and Secure logins have been covered numerous times in this wiki, but most of the material is outdated and some seems horribly cumbersome (or both). Still, there's no point in creating yet another how-to for this so we'll keep this brief. TLS/SSL was enabled in the webserver by first installing the mod_ssl package and configuring a key pair in /etc/httpd/conf.d/ssl.conf.

When running epadmin create, supply a hostname for HTTPS connections. Contrary to a statement from Getting Started with EPrints 3 ("If you will use https for your user pages (including login) enter the https hostname - No doubt, for secure usage you need another name", my emphasis), this can and probably should be your main EPrints hostname, the one also used for plain HTTP.

(To change this after installation either edit archives/[repo_id]/cfg/cfg.d/10_core.pl or do as suggested in this file and run epadmin config_core [repo_id]. Note that leaving this to epadmin does not set or change any of the *root or *cgiroot statements which are mentioned in the wiki. Only $c->{securehost} seems to be needed.)

Since the EPrints RPM already contains /etc/httpd/conf.d/eprints.conf, EPrints should already be working fine over plain HTTP. To make EPrints available over TLS/SSL as well, include the auto-generated SSL-specific config inside httpd's existing SSL vhost as defined in /etc/httpd/conf.d/ssl.conf (assuming an RPM-based EPrints install):

Include /usr/share/eprints/cfg/apache_ssl.conf

Check httpd's config for syntax errors (apachectl -t) and restart httpd. EPrints should now work over both HTTP and HTTPS.
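The check-then-restart step can be wrapped so that httpd is only restarted when the syntax check passes. This is a minimal sketch, assuming apachectl is on the PATH and the RHEL6-style service command is used; the function name restart_httpd_if_valid is made up for illustration and is not part of EPrints or httpd.

```shell
# Hypothetical helper (the name is an assumption, not an EPrints or httpd tool).
# Restart httpd only if "apachectl -t" reports a clean configuration,
# so a typo in an Include line does not take the running server down.
restart_httpd_if_valid() {
    if apachectl -t; then
        service httpd restart
    else
        echo "httpd configuration has syntax errors; not restarting" >&2
        return 1
    fi
}
```

Run restart_httpd_if_valid as root after each change to the files under /etc/httpd/.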

Webserver authentication

To keep this document short(er) and generally useful, please refer to your webserver's or authentication system's documentation for installation and configuration. (Note: For Shibboleth you can install from RPMs as well.) We'll assume you have the desired unique identifier for a user available in httpd's REMOTE_USER variable, no matter what authentication system is used.

Configuration

Exclude resource from EPrints proper

Add the name of the resource where webserver authentication should happen to the end of archives/[repo_id]/cfg/cfg.d/20_baseurls.pl, e.g.:

$c->{rewrite_exceptions} = ['/shibboleth'];

After a service httpd reload, EPrints should no longer present its nicely formatted error message when you try to access this resource (compare with any other non-existing request URI). Instead you should see an ordinary HTTP 404 "File not found" error.

Add login script

Create a directory in your EPrints directory and copy the login script there (non-RPM based installs will possibly need to adjust user, paths and file ownership):

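# Work as the eprints user (opens a new login shell)
su - eprints
cd /usr/share/eprints/
# Directory that will later be mapped to the /shibboleth URL
mkdir shibboleth
# Install the login script and make it executable
cp /path/to/login shibboleth/
chown eprints:eprints shibboleth/login
chmod +x shibboleth/login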

Tell httpd about login script

Next include the content of the file eprints-httpd-auth.conf inside your SSL vhost webserver configuration in /etc/httpd/conf.d/ssl.conf, adapting file system paths as necessary:

<VirtualHost _default_:443>
ServerName https://your.hostname.example.org:443
[...]
Alias /shibboleth /usr/share/eprints/shibboleth
<Directory "/usr/share/eprints/shibboleth">
   SetHandler perl-script
   PerlHandler ModPerl::Registry
   PerlSendHeader Off
   Options ExecCGI FollowSymLinks
   
   AuthType shibboleth
   ShibRequestSetting requireSession 1
   require valid-user
</Directory>
[...]
</VirtualHost>

or copy eprints-httpd-auth.conf to /etc/httpd/conf/ (for example) and only reference it inside your SSL vhost with an include directive. Then you only have two includes in your otherwise unmodified SSL config:

Include /etc/httpd/conf/eprints-httpd-auth.conf
Include /usr/share/eprints/cfg/apache_ssl.conf

Either way, there's an Alias directive that makes the directory created before available in URL space as /shibboleth, then a content handler is defined for everything in this <Directory>, and finally authentication and authorization are enforced, using Shibboleth only as an example.

With Shibboleth you could restrict logins to specific groups of people, e.g. only faculty and staff from your institution, but not students. Or, if you're part of an Identity Federation, maybe you need to limit logins to members of specific institutions for some reason. You'd then replace require valid-user (which means anyone authenticated is also authorized to access the resource) with something like:

require affiliation ~ ^(faculty|staff)@example\.edu$

There are examples of the syntax in the Shibboleth documentation.
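To see which values the regular expression in the require line would accept, it can be tried outside the webserver first. This is only a sketch: it assumes the expression behaves like an extended regular expression (as grep -E interprets it) and tests a single affiliation value at a time. The real matching is done by the Shibboleth module, possibly against several attribute values, and the function name affiliation_allowed is hypothetical.

```shell
# Hypothetical helper: succeed if one affiliation value matches the
# expression used in the "require affiliation" example.
affiliation_allowed() {
    printf '%s\n' "$1" | grep -Eq '^(faculty|staff)@example\.edu$'
}

# Try a few sample values (all made up for illustration).
for value in faculty@example.edu staff@example.edu \
             student@example.edu faculty@example.education; do
    if affiliation_allowed "$value"; then
        echo "$value: allowed"
    else
        echo "$value: denied"
    fi
done
```

The anchors ^ and $ matter here: without them, a value such as faculty@example.education would also pass.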

For other WebSSO or authentication systems that don't provide any data about the subject to the webserver other than REMOTE_USER (e.g. CoSign or systems using Kerberos) you can combine some of them with authorization in the webserver, e.g. via mod_authnz_ldap.

Finally, instead of also including this config inside the non-SSL vhost we simply redirect all plain HTTP requests matching our resource to the SSL vhost, where authentication then kicks in. To do that we modify the auto-generated webserver config for EPrints in /usr/share/eprints/cfg/apache/[repo_id].conf and add three Rewrite directives (assuming mod_rewrite is available and has been loaded by the webserver, which usually is the default):

[...]
ServerAdmin you@example.org

RewriteEngine On
RewriteCond %{REQUEST_URI} ^/shibboleth/
RewriteRule ^(.+)$ https://your.hostname.example.org$1 [R]

<Location "">
[...]

Alternatively, to avoid any unintentional changes to the auto-generated config file, which could be overwritten by careless use of epadmin, you could assemble a config file for the non-SSL vhost yourself, based on the series of Includes starting with /etc/httpd/conf.d/eprints.conf and include that within your httpd config instead.
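To make the effect of the three Rewrite directives concrete, the decision they encode can be mirrored in a few lines of shell. This is a sketch for reasoning about the rule, not something httpd runs; the function name redirect_target is hypothetical, and the hostname is the same placeholder used in the config excerpt.

```shell
# Hypothetical helper mirroring the RewriteCond/RewriteRule pair:
# print the HTTPS redirect target for request paths below /shibboleth/,
# and fail (meaning: no redirect) for everything else.
redirect_target() {
    case "$1" in
        /shibboleth/*)
            printf 'https://your.hostname.example.org%s\n' "$1"
            ;;
        *)
            return 1
            ;;
    esac
}

redirect_target /shibboleth/login
redirect_target /view/ || echo "/view/ stays on plain HTTP"
```

Note that, like the RewriteCond pattern ^/shibboleth/, this only covers paths with a slash after /shibboleth, which is sufficient for /shibboleth/login.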

Tell EPrints about it

To finally activate the switch to webserver-based authentication, copy auth.pl to archives/[repo_id]/cfg/cfg.d/ and reload httpd. To deactivate it again, if you need to, remove the file from that directory and reload httpd.