Difference between revisions of "Simplified HTTPS Configuration"
Revision as of 13:55, 6 August 2019
* * * YOU MUST USE EPRINTS 3.4.1++ FOR THE CONFIGURATION BELOW TO BE GUARANTEED TO WORK * * *
Configuring EPrints for HTTPS can be difficult. Because of the way the code was previously written, even a correct HTTPS configuration could still leave you with mixed content pages, amongst other problems. In EPrints 3.4.1 the underlying code has been improved so that you can configure host, port, securehost and secureport in your archive's cfg/cfg.d/10_core.pl in three different ways to get the behaviour you want. Other configuration options in this file should not need to be changed.
Configurations
Make sure you remove or disable your archive's cfg/cfg.d/https.pl if it exists, as it may override the configuration below. Once you have updated your configuration you must run generate_apacheconf to regenerate the Apache configuration before restarting the web server.
HTTP Only
It is advised that you avoid using this configuration unless you are developing a repository on a non-publicly accessible web host.
    $c->{host} = 'example.eprints.org';
    $c->{port} = 80;
    $c->{securehost} = undef;
    $c->{secureport} = undef;
HTTPS When You Login
This is the current default for EPrints. All publicly accessible pages will use HTTP by default (but are still accessible over HTTPS if you modify the URL). The login page and all login-restricted pages will use HTTPS, and requests for them over HTTP will be redirected.
    $c->{host} = 'example.eprints.org';
    $c->{port} = 80;
    $c->{securehost} = $c->{host};
    $c->{secureport} = 443;
HTTPS Only
This ensures that no resource (page, image, CSS, JavaScript file, etc.) will be returned over HTTP; any request made over HTTP will be redirected to HTTPS.
You may also want to edit the archive's ssl/securevhost.conf to add the HSTS header.
    $c->{host} = undef;
    $c->{port} = 80;
    $c->{securehost} = 'example.eprints.org';
    $c->{secureport} = 443;
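The three configurations above differ only in which of host and securehost is left undefined. As a rough illustration, the following sketch classifies a set of values on that basis. The function https_mode is hypothetical and not part of EPrints; it reflects only the three examples on this page and makes no claim about how EPrints itself decides.

```perl
use strict;
use warnings;

# Hypothetical helper (not an EPrints API): name the configuration that a
# set of host/securehost values corresponds to, based on the examples above.
sub https_mode {
    my ($c) = @_;
    my $http  = defined $c->{host};
    my $https = defined $c->{securehost};
    return 'HTTPS Only'           if $https && !$http;
    return 'HTTPS When You Login' if $https && $http;
    return 'HTTP Only'            if $http;
    return 'invalid: neither host nor securehost is set';
}

# Values copied from the "HTTPS Only" example.
my $c = {
    host       => undef,
    port       => 80,
    securehost => 'example.eprints.org',
    secureport => 443,
};
print https_mode($c), "\n";
```

A check of this kind can be handy on a development instance to confirm which behaviour to expect before regenerating the Apache configuration.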
Issues and Troubleshooting
You may still encounter issues even if you use one of the configurations above, so it is advised that you test this on a development or pre-production instance of your repository to check you get the behaviour you expect.
EPrint URI Change
When an EPrint is made live it will acquire a URI in the form
http://example.eprints.org/eprint/id/1234
If you switch over to HTTPS Only the above URI will be updated (if you refresh abstracts) to
https://example.eprints.org/eprint/id/1234
For most repositories this will not be an issue. However, if your repository is harvested by a third party application, that application may rely on the URI as a unique identifier. If the URI changes, it may assume that all the EPrints are new, as none of the URIs are the same as before.
For third party applications that integrate through the Bazaar (http://bazaar.eprints.org), such as EThoS, PIRUS and Symplectic Repository Tools, no problems relating to this have been identified. However, a bespoke third party application used with your repository may be affected. You should test this beforehand if possible, or otherwise as soon as you go live with the new configuration.
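To illustrate why a harvester might treat every record as new, here is a minimal sketch of a hypothetical third party application that keys records on their URI. Nothing here is EPrints code; the %seen store and the strip_scheme helper are assumptions made purely for the example.

```perl
use strict;
use warnings;

# URIs a hypothetical harvester recorded before the switch to HTTPS Only.
my %seen = map { $_ => 1 } (
    'http://example.eprints.org/eprint/id/1234',
);

# The same record as exposed after the switch.
my $uri = 'https://example.eprints.org/eprint/id/1234';

# Exact string comparison: the scheme differs, so the record looks new.
my $known_exact = exists $seen{$uri} ? 1 : 0;

# Comparing with the scheme stripped recognises the record as already seen.
sub strip_scheme {
    my ($u) = @_;
    $u =~ s{^https?://}{};
    return $u;
}
my %seen_no_scheme = map { strip_scheme($_) => 1 } keys %seen;
my $known_normalised = exists $seen_no_scheme{ strip_scheme($uri) } ? 1 : 0;

print "exact match: $known_exact, scheme-insensitive match: $known_normalised\n";
```

An application that compares URIs exactly behaves like the first check, which is the case to test for before going live.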
If you need to ensure your EPrint URIs do not change you can add the uri_url configuration option at the end of your archive's 10_core.pl configuration as follows:
    $c->{uri_url} = "http://" . $c->{securehost};
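For context, this is how the end of 10_core.pl might look when the HTTPS Only settings from this page are combined with the uri_url option. It is a configuration fragment rather than a standalone script, since $c is supplied by EPrints when the file is loaded, and the hostname is the placeholder used throughout this page.

```perl
# HTTPS Only settings, as in the configuration section of this page.
$c->{host} = undef;
$c->{port} = 80;
$c->{securehost} = 'example.eprints.org';
$c->{secureport} = 443;

# Must come after securehost is set, because it reads that value.
# With the placeholder hostname this is "http://example.eprints.org".
$c->{uri_url} = "http://" . $c->{securehost};
```

Placing the option last matters because the string is built when the line runs, so securehost has to be defined by then.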
The OAI-PMH interface (e.g. http://example.eprints.org/cgi/oai2 and https://example.eprints.org/cgi/oai2) provides different relations (http or https) for a publication, but the OAI identifier is protocol independent and therefore stays the same. Third party applications that make use of OAI-PMH should therefore not be affected if they harvest as the protocol specifies.
Search Engine Indexing
It has been observed in the past that some items may briefly disappear from the Google search index when switching to HTTPS Only. There is no way to guarantee this will not happen. One way to try to mitigate and keep on top of this is to set up a Google Webmaster account and register your repository's hostname. After a couple of days this should be populated with all the pages indexed for your repository. If any are missing you can submit them to Google to be re-added.
IRStats2 Blip in Downloads
It has also been observed that repositories see a brief drop in downloads (and views) when switching to HTTPS Only. This may be partially due to search engine indexing, but is most likely caused by the fact that bots and crawlers (including GoogleBot) will not follow redirects (i.e. from the HTTP URL they already had to the new HTTPS version), so these requests will not count as downloads. IRStats2 (https://eprints.github.io/irstats2/) has various ways of detecting bots, but it is likely that a large percentage of downloads will still be due to bots. Therefore, in some ways the blip may actually give a more accurate picture of the number of downloads from your repository. However, looking at a raw statistic is generally a bad idea. IRStats2 is intended to show usage trends and differences more than absolute downloads or views.