Simplified HTTPS Configuration

From EPrints Documentation
Jump to: navigation, search


* * * YOU MUST USE EPRINTS 3.4.1++ FOR THE CONFIGURATION BELOW TO BE GUARANTEED TO WORK * * *

Trying to configure EPrints for HTTPS can be difficult and the way the code was previously written, even if you configured HTTPS correctly you could still have issues with mixed content pages, amongst other problems. In EPrints 3.4.1 the underlying code has been improved so that you can configure, host, port, securehost, and secureport in your archive's cfg/cfg.d/10_core.pl in three different ways to get the behaviour. Other configuration options in this file should not need to be changed.

Configurations

Make sure you remove or disable your archive's cfg/cfg.d/https.pl if it exists as it may override the configuration below. Once you have updated your configuration you must run generate_apacheconf to regenerate configuration for Apache before restarting the web server.

HTTP Only

It is advised you avoid using this configuration unless you are developing a repository on a non-publicly accessible web host.

$c->{host} = 'example.eprints.org';
$c->{port} = 80;
$c->{securehost} = undef;
$c->{secureport} = undef;

HTTPS When You Login

This is the current default for EPrints. All publicly accessible pages will use HTTP by default (but still be accessible over HTTPS if you modify the URL) and the login page and all login restricted pages will use HTTPS or be redirected from HTTP.

$c->{host} = 'example.eprints.org';
$c->{port} = 80;
$c->{securehost} = $c->{host};
$c->{secureport} = 443;

HTTPS Only

This ensures that no page (image, CSS, JavaScript file, etc.) will be returned over HTTP and if requested it will be redirected to HTTPS.

You may also want to edit the archive's ssl/securevhost.conf to add the HSTS header.

$c->{host} = undef;
$c->{port} = 80;
$c->{securehost} = 'example.eprints.org';
$c->{secureport} = 443;

Issues and Troubleshooting

Inevitably you may still encounter issues even if you use one of the configurations above, so it is advised you test this on a development or pre-production instance of your repository to check you get the behaviour you expect.

EPrint URI Change

When an EPrint made live it will acquire a URI in the form

http://example.eprints.org/eprint/id/1234

If you switch over to HTTPS Only the above URI will be updated (if you refresh abstracts) to

https://example.eprints.org/eprint/id/1234

For most repositories this will not be an issue but if your repository is harvested by a third party application, it may rely on the URI as a unique identifier and if this changes it may that all the EPrints are new as none of the URIs are the same as before.

For third party applications that integrate through the Bazaar (EThoS, PIRUS, Symplectic Repository Tools, etc.) no problems relating to this have been identified. However, if your repository has a bespoke third party application this may be affected and is something you should test beforehand if possible but as soon as you go live with the new configuration otherwise.

If you need to ensure your EPrint URIs do not change you can add the uri_url configuration option at the end of your archive's 10_core.pl configuration as follows:

$c->{uri_url} = "http://" . $c->{securehost};

OAI-PMH (e.g. http://example.eprints.org/cgi/oai2 and https://example.eprints.org/cgi/oai2) provide different relations (http or https) for a publication but the OAI identifier is protocol independent and therefore stays the same. Therefore, third party applications that make use of OAI-PMH should not be affected if they harvest as the protocol specifies.

Search Engine Indexing

It has been observed in the past that some items may briefly disappear from the Google search index when switching to HTTPS Only. There is no way to guarantee this will not happen. One way to try to mitigate and keep on top of this is to setup a Google Webmaster account and register your repository's hostname. After a couple of days this should get populated with all the pages indexed for your repositories, if there are any missing you can submit these to Google to be re-added.

IRStats2 Blip in Downloads

It has also been observed that repositories see a brief drop in downloads (and views) when switching to HTTPS Only. This may be partially due to search engine indexing but is most likely affected by the fact that bots and crawlers (including GoogleBot) will not follow redirects (i.e. from the HTTP URL they already had to the new HTTPS version) and therefore this will not count as a download. IRStats2 has multifarious ways of detecting bots but it is likely a large percentage of downloads will still be due to bots. Therefore, in some ways the blip may actually give a more accurate picture of the amount of downloads from your repository. However, looking a raw statistic is generally a bad idea, IRStats2 is intended to show usage trends and differences more than absolute downloads or views.