Webserver authentication

From EPrints Documentation

Revision as of 15:10, 30 June 2012

How to configure EPrints for authentication via the webserver. This enables/provides for

  • re-use of externally managed ("enterprise") user accounts,
  • automated Just-In-Time provisioning ("on-access provisioning"), instead of Just-In-Case (separately managed batch processes),
  • Web Single Sign-On to EPrints (with Shibboleth, CAS/mod_cas, Kerberos or just about any mod_auth_* Module for Apache httpd).

With small changes (not included below) EPrints user types (User, Editor, Repository Administrator) could also be assigned dynamically, based on data from an external authoritative source (e.g. an LDAP directory via Net::LDAP or an RDBMS via DBI) or received as SAML attributes (in the case of Shibboleth).

Conceptual overview

Here's how this integration works conceptually (not in order of steps performed). Filenames in parentheses refer to the files from http://files.eprints.org/738/

  1. Enable HTTPS for your webserver and the EPrints instance.
  2. Configure EPrints to send requests requiring authentication to a specific resource/URL. We'll assume /shibboleth/login below, but this could be any string of your choice. (auth.pl)
  3. Make EPrints proper ignore these requests (archives/[repo_id]/cfg/cfg.d/20_baseurls.pl)
  4. Require authentication (and possibly also authorization) in the webserver for access to this resource (eprints-httpd-auth.conf)
  5. Add code to the installation which handles those pre-authenticated requests, creates new users and sessions, and returns to the originally requested resource. (login)

Note that the recipe below does not provide a parallel authentication method for EPrints: it completely replaces the default authentication method and login prompt. To promote a (possibly newly created) user account to "repository administrator" afterwards, you can temporarily disable the new authentication method, log in as the EPrints admin with local/database authentication, and change that account's user type. Alternatively, make sure that a user account already exists within EPrints that is of user type "repository administrator" and has a username that can authenticate to the external authentication system (with Shibboleth you'd also need to make sure that this username is returned from the SAML Identity Provider and that the matching username is mapped to httpd's REMOTE_USER variable).

A consequence of external authentication is that no self-registration of user accounts within EPrints is possible (or necessary, as many would see it) anymore, unless the system providing external authentication itself offers self-registration.

A word of warning: Don't enable this in a production EPrints instance unless either all existing users have matching usernames in your external authentication system, or you can live with the consequences of any and all users being auto-created (again) with different usernames (thereby possibly losing access to all their data, roles, etc.). Updating usernames in EPrints or other migration strategies are not considered below. Conversely, make sure that none of the external usernames unintentionally match any local EPrints accounts with admin, "repository administrator" or "editor" privileges -- unless you expressly want them to have those privileges.
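One way to gauge this risk before switching is to compare the two sets of usernames. The sketch below assumes you have already exported them yourself into two plain-text files, one username per line. The export step, the file names and the function name are hypothetical and not part of the package. The comparison is byte-exact, so differences in case count as different names.

```shell
# Print the usernames that appear in both lists (one username per line).
# Both input files are assumptions: export them however suits your systems.
common_usernames() {
    external_list=$1
    local_list=$2
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 1
    LC_ALL=C sort -u "$external_list" > "$tmp_a"
    LC_ALL=C sort -u "$local_list" > "$tmp_b"
    # comm -12 keeps only the lines present in both sorted inputs
    LC_ALL=C comm -12 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}
```

Called as common_usernames external-users.txt eprints-users.txt, it lists the names under which an externally authenticated user would end up in an existing local account. Any privileged local account in that list deserves a second look, and local accounts missing from it would no longer be reachable through the external login.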

Prerequisites

The EPrints instance this has been tested with was deployed from the latest EPrints RPMs (which, at the time of writing, was at 3.3.7) on a newly installed RHEL6 machine. All paths and file names are hence based on an RPM install on RHEL5/6 and will need to be adapted to your webserver and EPrints installation and configuration.

Download the files from http://files.eprints.org/738/ and unpack them to a directory of choice, e.g.

cd /tmp
wget http://files.eprints.org/738/1/webserver-auth.tgz
tar xvzf webserver-auth.tgz
cd webserver-auth

TLS/SSL

SSL and HTTPS and Secure logins have been covered numerous times in this wiki, but most of the material is outdated and some seems horribly cumbersome (or both). Still, there's no point in creating yet another how-to for this so we'll keep this brief. TLS/SSL was enabled in the webserver by first installing the mod_ssl package and configuring a key pair in /etc/httpd/conf.d/ssl.conf.

When running epadmin create, supply a hostname for https connections. Contrary to a statement from Getting Started with EPrints 3 ("If you will use https for your user pages (including login) enter the https hostname - No doubt, for secure usage you need another name", my emphasis), this can and probably should be your main EPrints hostname, the one also used for plain HTTP.

(To change this after installation either edit archives/[repo_id]/cfg/cfg.d/10_core.pl or do as suggested in this file and run epadmin config_core [repo_id]. Note that leaving this to epadmin does not set or change any of the *root or *cgiroot statements which are mentioned in the wiki. Only $c->{securehost} seems to be needed.)

Since the EPrints RPM already contains /etc/httpd/conf.d/eprints.conf, EPrints should already be working fine over plain HTTP. To make EPrints available over TLS/SSL as well, include the auto-generated SSL-specific config inside httpd's existing SSL vhost as defined in /etc/httpd/conf.d/ssl.conf:

Include /usr/share/eprints/cfg/apache_ssl.conf

Check httpd's config for syntax errors (apachectl -t) and restart httpd. EPrints should now work over both http and https.
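The two steps can be chained so that httpd is only restarted when the syntax check passes. This is a minimal sketch, assuming the apachectl and service commands of a RHEL-style install and root privileges. The function name is made up for illustration.

```shell
# Restart httpd only if its configuration passes the syntax check.
check_and_restart_httpd() {
    if apachectl -t; then
        service httpd restart
    else
        echo "httpd configuration has errors; not restarting" >&2
        return 1
    fi
}
```

The same pattern works for the service httpd reload calls used further down.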

Webserver authentication

To keep this document short(er) and generally useful, please refer to your webserver's or authentication system's documentation for installation and configuration. (Note: For Shibboleth you can install from RPMs as well.) We'll assume you have the desired unique identifier for a user available in httpd's REMOTE_USER variable, no matter what authentication system is used.

Configuration

Exclude a resource from EPrints proper

Add the name of the resource where webserver authentication should happen to the end of archives/[repo_id]/cfg/cfg.d/20_baseurls.pl, e.g.:

$c->{rewrite_exceptions} = ['/shibboleth'];

After a service httpd reload EPrints should not present the usual, nicely formatted error message when trying to access this resource (compare with any other non-existing request URI). Instead you should see an ordinary HTTP 404 "File not found" error.

Add the login script

There are two example scripts provided in the package:

  • login-noprovisioning, which fails logins for users not found in your EPrints instance and simply redirects them to a page of your choice. Provisioning user accounts needs to happen via some other process (e.g. manually, batch processes, etc.)
  • login-autocreate, which automatically creates local EPrints users (of type "user") after successful external authentication.

Both variants can be extended according to local needs and capabilities, e.g. if your authentication system does not provide additional profile info (name, email, etc.) you could add a lookup from an LDAP directory or database to the code.

Decide on a variant of the login script. We'll be assuming login-autocreate since it's more practically useful for a new install (which is what we're describing here). Create a directory in your EPrints directory and copy your choice of login script there, naming it login. (Again, non-RPM based installs will possibly need to adjust user, paths and file ownership):

su - eprints
cd /usr/share/eprints/
mkdir shibboleth
cp /path/to/login-autocreate shibboleth/login
chown eprints:eprints shibboleth/login
chmod +x shibboleth/login

Tell httpd about the login script

Next include the content of the file eprints-httpd-auth.conf inside your SSL vhost webserver configuration in /etc/httpd/conf.d/ssl.conf, adapting file system paths as necessary:

<VirtualHost _default_:443>
ServerName https://your.hostname.example.org:443
[...]
Alias /shibboleth /usr/share/eprints/shibboleth
<Directory "/usr/share/eprints/shibboleth">
   SetHandler perl-script
   PerlHandler ModPerl::Registry
   PerlSendHeader Off
   Options ExecCGI FollowSymLinks
   
   AuthType shibboleth
   ShibRequestSetting requireSession 1
   require valid-user
</Directory>
[...]
</VirtualHost>

or copy eprints-httpd-auth.conf to /etc/httpd/conf/ (for example) and only reference it inside your SSL vhost with an include directive. Then you only have two includes in your otherwise unmodified (as far as EPrints is concerned) SSL config:

Include /etc/httpd/conf/eprints-httpd-auth.conf
Include /usr/share/eprints/cfg/apache_ssl.conf

Either way, there's an Alias directive that makes the directory created before available in URL space as /shibboleth, then a content handler is defined for everything in this <Directory>, and finally authentication and authorization are enforced, using Shibboleth only as an example.

With Shibboleth you could restrict logins to specific groups of people, e.g. only faculty and staff from your institution but not students. Or, if you're part of an Identity Federation, you may need to limit logins to members of specific institutions. You'd then replace require valid-user (which means anyone authenticated is also authorized to access the resource) with something like:

require affiliation ~ ^(faculty|staff)@example\.edu$

There are examples of the syntax in the Shibboleth documentation.
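To get a feel for what such a rule would admit, the same regular expression can be tried against sample values outside the webserver. This is only an illustration using grep -E, not how Shibboleth itself evaluates the rule. The function name and the sample values are made up.

```shell
# Succeed if the value matches the regular expression from the example rule.
affiliation_allowed() {
    printf '%s\n' "$1" | grep -Eq '^(faculty|staff)@example\.edu$'
}

# Try a few made-up affiliation values against the pattern.
for value in faculty@example.edu staff@example.edu student@example.edu staff@example.org; do
    if affiliation_allowed "$value"; then
        echo "$value: allowed"
    else
        echo "$value: denied"
    fi
done
```

The loop reports the first two values as allowed and the last two as denied: the anchors ^ and $ require the whole value to match, and the escaped dot only matches a literal dot.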

For other WebSSO or authentication systems that don't provide any data about the subject to the webserver other than REMOTE_USER (e.g. CoSign or systems using Kerberos) you can combine some of them with authorization in the webserver, e.g. via mod_authnz_ldap.

Finally, instead of also including this config inside the non-SSL vhost we simply redirect all plain HTTP requests matching our resource to the SSL vhost, where authentication then kicks in. To do that we modify the auto-generated webserver config for EPrints in /usr/share/eprints/cfg/apache/[repo_id].conf and add three Rewrite directives (assuming mod_rewrite is available and has been loaded by the webserver, which usually is the default):

[...]
ServerAdmin you@example.org

RewriteEngine On
RewriteCond %{REQUEST_URI} ^/shibboleth/
RewriteRule ^(.+)$ https://your.hostname.example.org$1 [R]

<Location "">
[...]

Alternatively, to avoid any unintentional changes to the auto-generated config file, which could be overwritten by careless use of epadmin, you could assemble a config file for the non-SSL vhost yourself, based on the series of Includes starting with /etc/httpd/conf.d/eprints.conf and include that within your httpd config instead.
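Whichever way the redirect is configured, it can be checked from the command line by looking at the Location header of a plain HTTP response. This is a minimal sketch, assuming curl is installed. The function name is made up and the hostname is the placeholder used above.

```shell
# Print the Location header of the response to a HEAD request, if there is one.
redirect_target() {
    curl -s -I "$1" | tr -d '\r' | awk 'tolower($1) == "location:" { print $2 }'
}
```

With the rewrite rule in place, redirect_target http://your.hostname.example.org/shibboleth/login should print the same URL with the https scheme. No output means the request was not redirected.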

Activate the new authentication method

To finally activate the switch to webserver-based authentication, copy auth.pl to archives/[repo_id]/cfg/cfg.d/ and reload httpd. Conversely, to deactivate webserver-based authentication and restore EPrints' default authentication method, either remove the file or wrap the whole file's content in a while(0){ and } block (Perl's approximation of a block comment), then reload httpd.
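Since you may need to switch back and forth (for example to promote an account to repository administrator, as described in the overview), the copy/remove and reload steps can be wrapped in two small helpers. This is a sketch under assumptions: the function names are made up, EPRINTS_ROOT defaults to the RPM layout used throughout this page, and the reload needs sufficient privileges.

```shell
# Assumed layout of an RPM install; override EPRINTS_ROOT for other installs.
EPRINTS_ROOT=${EPRINTS_ROOT:-/usr/share/eprints}

# Usage: enable_webserver_auth <repo_id> <path/to/auth.pl>
enable_webserver_auth() {
    cp "$2" "$EPRINTS_ROOT/archives/$1/cfg/cfg.d/auth.pl" && service httpd reload
}

# Usage: disable_webserver_auth <repo_id>
# Removes the copy in cfg.d only; the original in the unpacked package stays.
disable_webserver_auth() {
    rm -f "$EPRINTS_ROOT/archives/$1/cfg/cfg.d/auth.pl" && service httpd reload
}
```

For example, disable_webserver_auth [repo_id] brings back the default login prompt, and enable_webserver_auth [repo_id] /tmp/webserver-auth/auth.pl switches back, assuming the package was unpacked in /tmp as in the Prerequisites section.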