Simplified HTTPS Configuration

From EPrints Documentation

* * * YOU MUST USE EPRINTS 3.4.1 OR GREATER FOR THE CONFIGURATION BELOW TO BE GUARANTEED TO WORK * * *

Configuring EPrints for HTTPS can be difficult. With the way the code was previously written, even if you configured HTTPS correctly you could still have issues with mixed content pages, amongst other problems. In EPrints 3.4.1+ the underlying code has been improved so that you can configure host, port, securehost and secureport in your archive's cfg/cfg.d/10_core.pl in three different ways to get the behaviour you want. Other configuration options in this file should not need to be changed.

Make sure you remove or disable your archive's cfg/cfg.d/https.pl if it exists, as it may override the configurations below. Once you have updated your configuration you must run generate_apacheconf to regenerate the configuration for Apache before restarting the web server.

HTTP Only

It is advised that you avoid using this configuration unless you are developing a repository on a non-publicly accessible web host.

$c->{host} = 'example.eprints.org';
$c->{port} = 80;
$c->{securehost} = undef;
$c->{secureport} = undef;

HTTPS When You Login

This is the current default for EPrints. All publicly accessible pages will use HTTP by default (but will still be accessible over HTTPS if you modify the URL). The login page and all login-restricted pages will use HTTPS, or be redirected to it from HTTP.

$c->{host} = 'example.eprints.org';
$c->{port} = 80;
$c->{securehost} = $c->{host};
$c->{secureport} = 443;

HTTPS Only

This ensures that no page or resource (image, CSS, JavaScript file, etc.) will be returned over HTTP. If one is requested over HTTP, the request will be redirected to HTTPS.

You may also want to edit the archive's ssl/securevhost.conf to add the HSTS header.

$c->{host} = undef;
$c->{port} = 80;
$c->{securehost} = 'example.eprints.org';
$c->{secureport} = 443;
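
The three configurations above differ only in which of host and securehost is defined. As a rough illustration, the following self-contained Perl sketch classifies a set of values accordingly. The https_mode function is a hypothetical helper written for this page, not part of EPrints, and the hash here is a plain stand-in for the $c configuration object.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT an EPrints function): report which of the three
# configurations described above a set of values corresponds to.
sub https_mode {
    my ($c) = @_;
    my $has_http  = defined $c->{host};
    my $has_https = defined $c->{securehost};

    return 'HTTPS Only'           if $has_https && !$has_http;
    return 'HTTPS When You Login' if $has_https && $has_http;
    return 'HTTP Only'            if $has_http;
    return 'invalid: neither host nor securehost is set';
}

# Stand-in for the values set in cfg/cfg.d/10_core.pl (HTTPS Only example).
my $c = {
    host       => undef,
    port       => 80,
    securehost => 'example.eprints.org',
    secureport => 443,
};

my $mode = https_mode($c);
print "$mode\n";
```

A check like this can be useful on a development instance to confirm that the values you intended are the ones actually being set.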

Troubleshooting

You may still encounter issues even if you use one of the configurations above, so it is advised that you test this on a development or pre-production instance of your repository to check that you get the behaviour you expect.

EPrint URI Change

When an EPrint is made live it will acquire a URI in the form:

http://example.eprints.org/eprint/id/1234

If you switch over to HTTPS Only, the above URI will be updated (if you refresh abstracts) to:

https://example.eprints.org/eprint/id/1234

For most repositories this will not be an issue. However, if your repository is harvested by a third-party application, that application may rely on the URI as a unique identifier. If the URI changes, it may think all the EPrints are new, as none of the URIs are the same as before.

For third-party applications that integrate through the Bazaar (EThoS, PIRUS, Symplectic Repository Tools, etc.) no problems relating to this have been identified. However, if your repository has a bespoke third-party application, this may be affected. You should test this beforehand if possible, or otherwise as soon as you go live with the new configuration.
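
To illustrate the identifier problem described above, here is a minimal Perl sketch of how a bespoke harvester might compare URIs without regard to the scheme. The scheme_independent_key function is a hypothetical helper for the harvesting side, not something EPrints provides, and whether this approach suits a given application is an assumption you would need to check.

```perl
use strict;
use warnings;

# Hypothetical harvester-side helper (NOT an EPrints function): strip the
# scheme so the HTTP and HTTPS forms of a URI map to the same key.
sub scheme_independent_key {
    my ($uri) = @_;
    (my $key = $uri) =~ s{^https?://}{}i;
    return $key;
}

my $old = 'http://example.eprints.org/eprint/id/1234';
my $new = 'https://example.eprints.org/eprint/id/1234';

# A naive string comparison treats these as two different records.
my $naive   = ($old eq $new) ? 'same' : 'different';

# Comparing scheme-independent keys treats them as one record.
my $by_key  = (scheme_independent_key($old) eq scheme_independent_key($new))
    ? 'same'
    : 'different';

print "naive comparison: $naive\n";
print "scheme-independent comparison: $by_key\n";
```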

If you need to ensure your EPrint URIs do not change you can add the uri_url configuration option at the end of your archive's 10_core.pl configuration as follows:

$c->{uri_url} = "http://" . $c->{securehost};

The OAI-PMH endpoints (e.g. http://example.eprints.org/cgi/oai2 and https://example.eprints.org/cgi/oai2) provide different relations (http or https) for a publication, but the OAI identifier is protocol independent and therefore stays the same. Third-party applications that make use of OAI-PMH should therefore not be affected, provided they harvest as the protocol specifies.

Search Engine Indexing

It has been observed in the past that some items may briefly disappear from the Google search index when switching to HTTPS Only. There is no way to guarantee this will not happen. One way to try to mitigate and keep on top of this is to set up a Google Webmaster account and register your repository's hostname. After a couple of days this should be populated with all the pages indexed for your repository. If any are missing, you can submit them to Google to be re-added.

IRStats2 Blip in Downloads

It has also been observed that repositories see a brief drop in downloads (and views) when switching to HTTPS Only. This may be partially due to search engine indexing, but is most likely caused by the fact that bots and crawlers (including GoogleBot) will not follow redirects (i.e. from the HTTP URL they already had to the new HTTPS version), so their requests will not count as downloads. IRStats2 has various ways of detecting bots, but it is likely that a large percentage of downloads will still be due to bots. In some ways, therefore, the blip may actually give a more accurate picture of the number of downloads from your repository. In any case, looking at a raw statistic is generally a bad idea. IRStats2 is intended to show usage trends and differences more than absolute downloads or views.