Https3

From EPrints Documentation

Introduction

EPrints 3.0.5 and 3.1 introduce a new mechanism for setting up HTTPS that greatly simplifies the process; see HTTPS.


Setting up EPrints 3 to work with https is a little tricky, and there appear to be a few bugs that need to be worked around. This How To considers the following scenario:

  • Two repositories, repos1 and repos2, being served by virtual hosts repos1.FQDN:80 and repos2.FQDN:80
  • A single https domain, at eprints.FQDN:443 (so that only one certificate is needed). Secure pages for repos1 and repos2 will be accessed at eprints.FQDN:443/repos1 and eprints.FQDN:443/repos2 respectively.

This How To should work with EPrints 3.0 or 3.0.1. It was developed on Ubuntu Server 6.06, but should work on other systems without significant changes. The instructions can be adapted for an arbitrary number of repositories.

It is assumed that EPrints is installed in /opt/eprints3/.

Getting started

Install EPrints 3.x following the appropriate instructions.

Run bin/epadmin create twice to create repos1 and repos2.

Edit /opt/eprints3/archives/repos1/cfg/cfg.d/10_core.pl to read:

 $c->{host} = 'repos1.FQDN';
 $c->{port} = 80;
 $c->{aliases} = [];
 $c->{securehost} = 'eprints.FQDN';
 $c->{securepath} = '/repos1';
 $c->{secureport} = 443;


Make secure versions of the templates:

 cp /opt/eprints3/archives/repos1/cfg/lang/en/templates/default.xml /opt/eprints3/archives/repos1/cfg/lang/en/templates/secure.xml

Repeat these steps for repos2.
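With more than two repositories, the template-copying step can be scripted. The following is a minimal Perl sketch that assumes the /opt/eprints3 layout used in this How To. make_secure_templates is a hypothetical helper name, not part of EPrints. It skips any secure.xml that already exists, so a template you have already edited is not overwritten.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Spec;

# Hypothetical helper: for each archive ID, copy default.xml to secure.xml
# under archives/ID/cfg/lang/en/templates/. Existing secure.xml files are
# left alone. Returns the list of files written.
sub make_secure_templates {
    my ( $eprints_root, @archive_ids ) = @_;
    my @written;
    foreach my $id (@archive_ids) {
        my $dir = File::Spec->catdir( $eprints_root, 'archives', $id,
            qw(cfg lang en templates) );
        my $src = File::Spec->catfile( $dir, 'default.xml' );
        my $dst = File::Spec->catfile( $dir, 'secure.xml' );
        next if -e $dst;    # don't overwrite an edited secure template
        copy( $src, $dst ) or die "Can't copy $src to $dst: $!";
        push @written, $dst;
    }
    return @written;
}

# Usage with the layout assumed in this How To:
# make_secure_templates( '/opt/eprints3', 'repos1', 'repos2' );
```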

Generate the Apache configuration:

 /opt/eprints3/bin/generate_apacheconf

Add 'Include /opt/eprints3/cfg/apache.conf' to the Apache configuration (on Ubuntu/Debian, you can replace everything in /etc/apache2/sites-available/default with 'Include /opt/eprints3/cfg/apache.conf'). Apache should now be correctly configured to serve the non-secure pages.

Secure Apache Configuration

Next, we want to configure Apache to serve the secure pages. However, generate_apacheconf hasn't created a secure.conf file in /opt/eprints3/cfg/ so this needs to be done manually. Some configuration has been generated for us in /opt/eprints3/archives/repos1/var/auto-secure.conf and /opt/eprints3/archives/repos2/var/auto-secure.conf, but there are some problems with this:

  • Some sections of the configuration overlap;
  • The PerlSetVar variables EPrints_ArchiveID and EPrints_Secure have not been set.

We'll therefore create our own configuration. Create a new file called /opt/eprints3/cfg/secure.conf:
 #cfg/secure.conf:
 NameVirtualHost *:443
 <VirtualHost *:443>
   ServerAdmin itsupport@FQDN
   ServerName  eprints.FQDN
   
   SSLEngine On
   SSLCertificateFile /etc/apache2/ssl/apache.pem
   
   ErrorLog /var/log/apache2/error.log
   
   # Possible values include: debug, info, notice, warn, error, crit,
   # alert, emerg.
   LogLevel warn
   
   CustomLog /var/log/apache2/access.log combined
   ServerSignature On
   
   DocumentRoot "/var/www/eprints"
   
   <Directory "/opt/eprints3/cgi/users">
     AuthName "User Area"
     AuthType "Basic"
     PerlAuthenHandler EPrints::Apache::Auth::authen
     PerlAuthzHandler EPrints::Apache::Auth::authz
     require valid-user
    
     SetHandler perl-script
     PerlHandler ModPerl::Registry
     PerlSendHeader Off
     Options ExecCGI FollowSymLinks
   </Directory>
   <Directory "/opt/eprints3/cgi/users/awstats">
     PerlSendHeader On
   </Directory>
   
   <Directory "/opt/eprints3/cgi">
     SetHandler perl-script
     PerlHandler ModPerl::Registry
     PerlSendHeader Off
     Options ExecCGI FollowSymLinks
   </Directory>
   
   PerlTransHandler EPrints::Apache::Rewrite
   
   Include /opt/eprints3/archives/repos1/var/manual-secure.conf 
   Include /opt/eprints3/archives/repos2/var/manual-secure.conf
 </VirtualHost>

Note the line 'DocumentRoot "/var/www/eprints"'. Create an index.html file in /var/www/eprints/ with a welcome message and links to the home pages of the repositories. Also note that we need to create a manual-secure.conf file for each repository. For repos1 it contains the following (for repos2, use the same file with repos1 replaced by repos2):

 #/opt/eprints3/archives/repos1/var/manual-secure.conf
 
 <Location "/repos1">
   PerlSetVar EPrints_ArchiveID repos1
   PerlSetVar EPrints_Secure yes
   PerlSetVar EPrints_Dir_SecuredCGI /opt/eprints3/cgi/users
   PerlSetVar EPrints_Dir_Documents /opt/eprints3/archives/repos1/documents
   PerlLogHandler EPrints::Apache::LogHandler
 </Location>
 
 Alias /repos1/cgi/accounts/confirm /opt/eprints3/cgi/confirm
 Alias /repos1/cgi/accounts/register /opt/eprints3/cgi/register
 Alias /repos1/cgi/accounts/reset_password /opt/eprints3/cgi/reset_password
 Alias /repos1/cgi/accounts/set_password /opt/eprints3/cgi/set_password
 Alias /repos1/cgi/users/ /opt/eprints3/cgi/users/
 Alias /repos1/ /opt/eprints3/archives/repos1/html/en/
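Since every repository needs its own copy of this file, a small generator helps when there are many repositories. This is a sketch only: manual_secure_conf is a hypothetical helper, and it assumes the html directory is archives/ARCHIVEID/html/en/ (the per-language directory mentioned under Troubleshooting), so adjust the final Alias if your installation differs.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text of manual-secure.conf for one
# archive, following the repos1 example. $root defaults to the
# /opt/eprints3 install location assumed in this How To.
sub manual_secure_conf {
    my ( $id, $root ) = @_;
    $root = '/opt/eprints3' unless defined $root;
    return <<"END";
<Location "/$id">
  PerlSetVar EPrints_ArchiveID $id
  PerlSetVar EPrints_Secure yes
  PerlSetVar EPrints_Dir_SecuredCGI $root/cgi/users
  PerlSetVar EPrints_Dir_Documents $root/archives/$id/documents
  PerlLogHandler EPrints::Apache::LogHandler
</Location>

Alias /$id/cgi/accounts/confirm $root/cgi/confirm
Alias /$id/cgi/accounts/register $root/cgi/register
Alias /$id/cgi/accounts/reset_password $root/cgi/reset_password
Alias /$id/cgi/accounts/set_password $root/cgi/set_password
Alias /$id/cgi/users/ $root/cgi/users/
Alias /$id/ $root/archives/$id/html/en/
END
}

# Usage: write the file for each repository, e.g.
# for my $id (qw(repos1 repos2)) {
#     open my $fh, '>', "/opt/eprints3/archives/$id/var/manual-secure.conf" or die $!;
#     print $fh manual_secure_conf($id);
#     close $fh;
# }
```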

For completeness, we also want to serve the welcome page over http. Add the following lines to /opt/eprints3/cfg/apache.conf:

 <VirtualHost *:80>
   ServerName eprints.FQDN
   ServerAdmin itsupport@FQDN
   DocumentRoot "/var/www/eprints"
 </VirtualHost>

Add 'Include /opt/eprints3/cfg/secure.conf' to the Apache configuration.

One thing remains. In /opt/eprints3/archives/repos1/cfg/cfg.d/misc.pl and /opt/eprints3/archives/repos2/cfg/cfg.d/misc.pl change the line

 $c->{cookie_domain} = $c->{host}; 

to read

 $c->{cookie_domain} = $c->{securehost}; 

Restart Apache. At this point it should be possible to access the repositories at http://repos1.FQDN and http://repos2.FQDN and log in to the secure area.

Debian / Ubuntu specific SSL instructions

Create a file called ssl in /etc/apache2/sites-available/ containing the line 'Include /opt/eprints3/cfg/secure.conf'. Then run the commands:

 a2ensite ssl
 a2enmod ssl
 apache2-ssl-certificate
 echo "Listen 443" >> /etc/apache2/ports.conf


Bugs

Broken Actions

Links that call Perl CGI scripts are broken. For example, under Manage Deposits, click New Item, select an Item Type and then click Next: you will be returned to the Manage Deposits page rather than to the next step in the workflow. This appears to be because the form action points to http://repos1.FQDN/cgi/users/home#t rather than /repos1/cgi/users/home on the secure host. As far as I can see, this is a bug rather than a configuration mistake, though I'm happy to be advised otherwise.

The workaround I have for this bug is to install patch 252, which can be downloaded from http://files.eprints.org/252/

The patch seeks to resolve the problem by introducing a configuration variable, users_url, in 20_baseurls.pl. All uses of perl_url in links to scripts in cgi/users have been replaced with users_url. For insecure use, users_url can be set to perl_url; when https is required, it can be adjusted appropriately.

Apply the patch to the EPrints 3.x source (patch -d eprints-3.0/ -p0 < users-url.patch) and re-run configure and install.pl. Add the following lines to /opt/eprints3/archives/repos1/cfg/cfg.d/20_baseurls.pl :

 $c->{secure_urlpath} = $c->{securepath};
 $c->{secure_url} = "https://".$c->{securehost}.($c->{secureport}!=443?":".$c->{secureport}:"").$c->{secure_urlpath};
 # Mod_perl scripts for users scripts
 # If not using https, make this the same as perl_url
 # Otherwise make it  $c->{secure_url}."/cgi"
 #$c->{users_url} = $c->{perl_url};
 $c->{users_url} = $c->{secure_url}."/cgi";

Similarly for /opt/eprints3/archives/repos2/cfg/cfg.d/20_baseurls.pl. Restart Apache. It should now be possible to upload documents, step through workflows etc.
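To check what the secure_url expression in 20_baseurls.pl produces, here is the same logic as a stand-alone Perl sketch. secure_url and users_url are hypothetical helper names that take a plain hash instead of $c. Note that the port is only appended when it is not 443.

```perl
use strict;
use warnings;

# Same logic as the secure_url line in 20_baseurls.pl, using a plain hash
# in place of $c. The port is only added when it isn't 443.
sub secure_url {
    my (%c) = @_;
    my $port = $c{secureport} != 443 ? ":$c{secureport}" : "";
    return "https://$c{securehost}$port$c{securepath}";
}

# users_url as described in the 20_baseurls.pl comments:
# perl_url without https, secure_url . "/cgi" with it.
sub users_url {
    my ( $use_https, %c ) = @_;
    return $use_https ? secure_url(%c) . "/cgi" : $c{perl_url};
}
```

With securehost eprints.FQDN, secureport 443 and securepath /repos1, secure_url gives https://eprints.FQDN/repos1; with secureport 8443 it gives https://eprints.FQDN:8443/repos1.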

However, some bugs with image urls remain.

Internet Explorer 'Secure and non Secure items'

We're not done yet! Internet Explorer complains if non-secure (http) and secure (https) items are displayed on the same page. This happens when a full URL beginning with http:// is embedded in a page served over https. If the securepath and urlpath variables were the same (e.g. http://repos1.FQDN/repos1 and https://eprints.FQDN/repos1), we could simply make all URLs relative. Because they differ (e.g. http://repos1.FQDN/ and https://eprints.FQDN/repos1), we must handle the secure and non-secure cases separately.
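Before editing templates, it can help to find out which embedded resources on a page would trigger the warning. The sketch below is a rough regex scan of saved page source, not an HTML parser; insecure_refs is a hypothetical name. It checks src attributes, <link> hrefs (such as the favicon) and CSS url() references, and ignores ordinary <a href> links, which are navigation rather than embedded content.

```perl
use strict;
use warnings;

# Rough check for http:// resources embedded in a page that will be served
# over https. Looks at src attributes, <link href>, and CSS url(...).
sub insecure_refs {
    my ($html) = @_;
    my @found;
    while ( $html =~ /\bsrc\s*=\s*["'](http:\/\/[^"']+)["']/gi ) {
        push @found, $1;
    }
    while ( $html =~ /<link\b[^>]*\bhref\s*=\s*["'](http:\/\/[^"']+)["']/gi ) {
        push @found, $1;
    }
    while ( $html =~ /url\(\s*["']?(http:\/\/[^"')\s]+)/gi ) {
        push @found, $1;
    }
    return @found;
}

# Usage: save the page source from the https site, then
# my $page = do { local $/; <STDIN> };
# print "$_\n" for insecure_refs($page);
```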

Edit /opt/eprints3/archives/repos1/cfg/lang/en/templates/secure.xml to use full https:// urls for included javascript and css and for the logo and .ico files. The head section should look something like this:

 <head>
   <title><epc:pin ref="title" textonly="yes"/> - <epc:phrase ref="archive_name"/></title>
   <script src="{$config{secure_url}}/javascript/auto.js" type="text/javascript"></script>
   <style type="text/css" media="screen">@import url(<epc:print expr="$config{secure_url}"/>/style/secure_auto.css);</style>
   <style type="text/css" media="print">@import url(<epc:print expr="$config{secure_url}"/>/style/print.css);</style>
   <link rel="icon" href="{$config{secure_url}}/favicon.ico" type="image/x-icon"/>
   <link rel="shortcut icon" href="{$config{secure_url}}/favicon.ico" type="image/x-icon"/>
   <link rel="Top" href="{$config{frontpage}}"/>
   <link rel="Search" href="{$config{perl_url}}/search"/>
   <epc:pin ref="head"/>
 </head>
 <body bgcolor="#ffffff" text="#000000">
 <div class="ep_noprint">
 <noscript><style type='text/css'>@import url(<epc:print expr="$config{secure_url}"/>/style/nojs.css);</style></noscript>
 </div>
 <epc:pin ref="pagetop"/>
 <div class="ep_tm_header ep_noprint">
 <div class="ep_tm_logo"><a href="{$config{frontpage}}"><img alt="Logo" src="{$config{secure_url}}{$config{site_logo}}" /></a></div>

Also change the link to the EPrints logo in the footer:

 <a href="http://eprints.org/software/"><img src="{$config{secure_url}}/images/eprintslogo.gif" border="0"/></a>

We now need to modify 'generate_static' to create the secure_auto.css file as well as auto.css. The relevant section should be modified like this:

  # do the magic auto.js and auto.css
  # ($session, $map, $base_target_dir and $wrote_files come from the
  # surrounding generate_static code)
  my $js = "";
  my $css = "";
  my $secure_css = "";
  my $fn;
  my $base_url = $session->get_repository->get_conf( "base_url" );
  my $secure_url = $session->get_repository->get_conf( "secure_url" );
  foreach my $target ( sort keys %{$map} )
  {
      if( $target =~ m/(\/style\/auto\/.*\.css$)/ )
      {
          # one @import line per stylesheet, for http and for https
          $css .= "\@import url($base_url$1);\n";
          $secure_css .= "\@import url($secure_url$1);\n";
      }
      if( $target =~ m/(\/javascript\/auto\/.*\.js$)/ )
      {
          $fn = $map->{$target};
          open( JS, $fn ) || EPrints::abort( "Can't read $fn: $!" );
          $js .= "\n\n\n/* From: $fn */\n\n";
          $js .= join( "", <JS> );
          close JS;
      }
  }
 
  # auto.css, used by the non-secure template
  $fn = "$base_target_dir/style/auto.css";
  open( CSS, ">$fn" ) || EPrints::abort( "Can't write $fn: $!" );
  $wrote_files->{$fn} = 1;
  print CSS $css;
  close CSS;
 
  # secure_auto.css, imported by secure.xml
  $fn = "$base_target_dir/style/secure_auto.css";
  open( CSS, ">$fn" ) || EPrints::abort( "Can't write $fn: $!" );
  $wrote_files->{$fn} = 1;
  print CSS $secure_css;
  close CSS;
 

Re-run /opt/eprints3/bin/generate_static repos1 (and likewise for repos2).
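The effect of the modified loop can be checked in isolation. This sketch (auto_css is a hypothetical name) takes a plain list of target paths instead of generate_static's $map and returns the text that would be written to auto.css and secure_auto.css.

```perl
use strict;
use warnings;

# Builds the auto.css and secure_auto.css contents from a list of target
# paths, the same way as the modified generate_static loop.
sub auto_css {
    my ( $base_url, $secure_url, @targets ) = @_;
    my ( $css, $secure_css ) = ( "", "" );
    foreach my $target ( sort @targets ) {
        next unless $target =~ m/(\/style\/auto\/.*\.css$)/;
        $css        .= "\@import url($base_url$1);\n";
        $secure_css .= "\@import url($secure_url$1);\n";
    }
    return ( $css, $secure_css );
}
```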

Missing Images

A number of image links will be broken because EPrints looks for the images under urlpath rather than securepath. In some cases this can be fixed by replacing urlpath with securepath in the code, but some phrases may appear on both secure and insecure pages, so we also need a solution that can be evaluated within a phrase or template. The cleanest way I could find of doing this is as follows:

Add two methods to EPrints::Session:


 ######################################################################
 =pod
 
 =item $foo = $session->get_baseurl()
 
 Returns the base URL for static files such as images: securepath if the
 request is secure (i.e. EPrints_Secure is yes), otherwise base_url.
 Added by CSH 26 June 2007
 
 =cut
 ######################################################################
 sub get_baseurl
 {
 	my( $self ) = @_;
 	
 	my $esec = $self->get_request->dir_config( "EPrints_Secure" );
 	if( defined $esec && $esec eq "yes" )
 	{
 		return $self->get_repository->get_conf( "securepath" );
 	}
 	else
 	{
 		return $self->get_repository->get_conf( "base_url" );
 	}
 }
 
 ######################################################################
 =pod
  
 =item $foo = $session->get_imageurl()
 
 Returns the URL of the style/images directory, based on get_baseurl
 (i.e. on whether EPrints_Secure is yes).
 Added by CSH 26 June 2007
 
 =cut
 ######################################################################
 sub get_imageurl
 {
 	my( $self ) = @_;
 	
 	return $self->get_baseurl."/style/images";
 }  
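The decision made by get_baseurl can be tried without Apache by passing in the value that dir_config("EPrints_Secure") would return. In this hypothetical stand-alone version (choose_baseurl, with a plain hash standing in for the repository configuration), only requests handled inside the secure Location blocks, which set EPrints_Secure to yes, get securepath. Because securepath is a path such as /repos1 rather than a full URL, the browser resolves it against the https host that served the page.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of get_baseurl's decision.
# $esec is what dir_config("EPrints_Secure") would return (undef if unset);
# %conf stands in for the repository configuration.
sub choose_baseurl {
    my ( $esec, %conf ) = @_;
    if ( defined $esec && $esec eq "yes" ) {
        return $conf{securepath};    # e.g. /repos1, resolved against the https host
    }
    return $conf{base_url};          # e.g. http://repos1.FQDN
}
```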

Search through the EPrints code for every occurrence of the string /style/images and replace it with a call to get_imageurl.

For example, in /opt/eprints3/perl_lib/EPrints/Plugin/Screen/Items.pm, change the line

 my $imagesurl = $self->{session}->get_repository->get_conf( "urlpath" )."/style/images";

to

 my $imagesurl = $self->{session}->get_imageurl;
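To find the lines to change, a recursive search is enough. This Perl sketch uses the core File::Find module; find_occurrences is a hypothetical name, and it only looks in .pm and .pl files. Searching for /style/images without a trailing slash also catches lines like the Items.pm one.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: returns "file:line: text" for every line of a
# .pm or .pl file under $dir that contains $needle (plain substring match).
sub find_occurrences {
    my ( $dir, $needle ) = @_;
    my @hits;
    find(
        {
            no_chdir => 1,    # $_ holds the full path
            wanted   => sub {
                return unless -f $_ && /\.p[ml]$/;
                open my $fh, '<', $_ or return;
                while ( my $line = <$fh> ) {
                    push @hits, sprintf( "%s:%d: %s", $File::Find::name, $., $line )
                      if index( $line, $needle ) >= 0;
                }
                close $fh;
            },
        },
        $dir
    );
    return sort @hits;
}

# Usage:
# print find_occurrences( '/opt/eprints3/perl_lib', '/style/images' );
```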


We will add a variable called baseurl to the variables available in a script:

In the execute method of EPrints::Script, add:

 $state->{baseurl} = [$state->{session}->get_baseurl, "STRING" ];

before the run command is called on the compiler. Phrases can now obtain the base URL for images through {$baseurl}.

In /opt/eprints3/lib/lang/en/phrases/system.xml, change the links to images in the following phrases to use $baseurl:

  • sys:ep_form_required
  • Plugin/InputForm/Surround/Default:show_help
  • Plugin/InputForm/Surround/Default:hide_help
  • lib/session:show_help
  • lib/session:hide_help


For example, the sys:ep_form_required phrase becomes:

 <epp:phrase id="sys:ep_form_required"><img src="{$baseurl}/style/images/required.png" border="0" class="ep_required" alt="Required"/> <epc:pin name="label"/></epp:phrase>

If some repositories aren't using https, you may prefer to override these phrases on a per repository basis.

Troubleshooting

Some common problems and solutions:

File does not exist: /htdocs

In the log file you may see messages like:

File does not exist: /htdocs

This may be generated by https pages with invalid links to images (e.g. /images instead of /repos1/images) if no DocumentRoot is set for the https virtual host. Note that if we had only one repository per https address, we could point DocumentRoot at /opt/eprints3/archives/ARCHIVEID/html/en/ (as suggested by Peter Schober), but this won't work on a per-repository basis because DocumentRoot can't be placed in a Location block.

EPrint 1 has no directory set

If you get the message “EPrint 1 has no directory set. This is very dangerous as EPrints has no idea where to write files for this eprint. This may imply a buggy import tool or some other cause of corrupt data.”, it is most probably caused by the web server user having failed to create a target subdirectory for the new eprint under archives/archivename/documents/disk0/... due to insufficient file system permissions. Perhaps you forgot to set Apache to run as eprints:eprints (or alternatively, add the Apache user to the eprints group).
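To confirm a permissions problem, check the documents directory as the user Apache runs as (for example with sudo -u www-data on Ubuntu). A minimal sketch, with check_writable_dir as a hypothetical helper name; the -w and -x file tests are evaluated against the effective user ID.

```perl
use strict;
use warnings;

# Hypothetical helper: reports whether the current effective user can
# create subdirectories in $dir (needs write and search permission).
sub check_writable_dir {
    my ($dir) = @_;
    return "$dir does not exist" unless -d $dir;
    unless ( -w $dir && -x $dir ) {
        my $owner = getpwuid( ( stat $dir )[4] );
        $owner = 'unknown' unless defined $owner;
        return "$dir (owner $owner) is not writable by uid $>";
    }
    return "ok";
}

# Usage, run as the Apache user:
# print check_writable_dir('/opt/eprints3/archives/repos1/documents/disk0'), "\n";
```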

Other

Remember to check that the EPrints_Dir_SecuredCGI and EPrints_Dir_Documents variables are set, and that cookie_domain is set to securehost rather than host.
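These checks can be automated. The sketch below (check_secure_setup is a hypothetical name) takes the text of a manual-secure.conf file and of misc.pl and returns a list of problems. It only recognises the directive and assignment forms used in this How To, so a different but valid way of writing them would be reported as missing.

```perl
use strict;
use warnings;

# Hypothetical checker: takes the text of a manual-secure.conf and of
# misc.pl and returns a list of problems (empty if none found).
sub check_secure_setup {
    my ( $secure_conf, $misc_pl ) = @_;
    my @problems;
    for my $var (qw(EPrints_ArchiveID EPrints_Secure
                    EPrints_Dir_SecuredCGI EPrints_Dir_Documents)) {
        push @problems, "$var is not set"
          unless $secure_conf =~ /^\s*PerlSetVar\s+\Q$var\E\s+\S/m;
    }
    push @problems, "cookie_domain is not set to securehost"
      unless $misc_pl =~ /^\s*\$c->\{cookie_domain\}\s*=\s*\$c->\{securehost\}/m;
    return @problems;
}
```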

Useful info

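You can set $imagesurl to be secure or insecure at runtime with code like this (adapted from MetaField.pm):

 # default: the non-secure image directory
 my $imagesurl = $session->get_repository->get_conf( "urlpath" )."/style/images";
 my $esec = $session->get_request->dir_config( "EPrints_Secure" );
 if( defined $esec && $esec eq "yes" )
 {
   # secure request: use the path under the https host instead
   $imagesurl = $session->get_repository->get_conf( "securepath" )."/style/images";
 }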