Anatomy of a request

From EPrints Documentation
Revision as of 16:12, 23 October 2014 by Libjlrs (talk | contribs) (Working through Rewrite module)

THIS PAGE IS UNDER CONSTRUCTION! JLRS 2014-10-23

This is a description of how EPrints and Apache handle an incoming request. Understanding this flow helps you understand how an Access Control layer can be added to the system.

I will assume that you know how to locate a Perl module file from the module name (e.g. EPrints::Apache::Rewrite will probably be ~/perl_lib/EPrints/Apache/Rewrite.pm, although this is not always the case!).
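As a reminder of how that mapping works, here is a minimal sketch in plain Perl. The helper names module_to_path and loaded_from are made up for this illustration and are not part of EPrints. Perl replaces each '::' with '/' and appends '.pm', then searches the directories in @INC; for a module that is already loaded, %INC records which file was actually used.

```perl
use strict;
use warnings;

# Convert a module name to the relative path Perl looks for under each @INC directory.
# (Hypothetical helper, for illustration only.)
sub module_to_path
{
	my( $module ) = @_;
	( my $path = $module ) =~ s{::}{/}g;
	return "$path.pm";
}

# For a module that is already loaded, %INC holds the file it was loaded from.
# Returns undef if the module has not been loaded.
sub loaded_from
{
	my( $module ) = @_;
	return $INC{ module_to_path( $module ) };
}

print module_to_path( "EPrints::Apache::Rewrite" ), "\n";   # prints EPrints/Apache/Rewrite.pm
```

Calling loaded_from( "EPrints::Apache::Rewrite" ) from inside a running EPrints process should show which copy of the file is in use, which helps when the module does not live in ~/perl_lib.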

Flow of a request

Below are the relevant parts of the config files and Perl modules that are used when processing a request. The request will generally be dealt with by the EPrints::Apache::Rewrite module, and farmed out from there. How the request reaches this module is also explained below.

Apache core config ~/cfg/apache.conf

PerlSwitches -I/home/eprints/eprints-3.3.12/perl_lib
PerlModule EPrints
PerlPostConfigHandler +EPrints::post_config_handler

The post_config_handler does some sanity checks on the EPrints setup (e.g. is Apache listening on the ports that the repositories are configured to work under?). For more information about the post_config_handler, see: http://perl.apache.org/docs/2.0/user/handlers/server.html#C_PerlPostConfigHandler_

Apache repository config ~/cfg/apache/ARCHIVEID.conf

<VirtualHost *:80>
...
  PerlTransHandler +EPrints::Apache::Rewrite

</VirtualHost>

This leads us to the backbone of EPrints - the Rewrite module - where URL_REWRITE_* triggers are called, content negotiation can happen, and many other wondrous things take place!

EPrints::Apache::Rewrite module

This explanation and its line numbers are taken from a specific version of the file: https://github.com/eprints/eprints/blob/88a36fcf1f17c7a04e60455d374b617709f7461d/perl_lib/EPrints/Apache/Rewrite.pm. This file will change over time, so it's worth comparing it with the version you are using, and possibly with other versions on GitHub, e.g. https://github.com/eprints/eprints/blob/3.3/perl_lib/EPrints/Apache/Rewrite.pm

It's worth taking a few minutes to look at this file - specifically the sub handler.

EP_TRIGGER_URL_REWRITE

Line 123 calls the 'EP_TRIGGER_URL_REWRITE' trigger:

$repository->run_trigger( EPrints::Const::EP_TRIGGER_URL_REWRITE,
	request     => $r,
	lang        => $lang,    # en
	args        => $args,    # "" or "?foo=bar"
	urlpath     => $urlpath, # "" or "/subdir"
	cgipath     => $cgipath, # /cgi or /subdir/cgi
	uri         => $uri,     # /foo/bar
	secure      => $secure,  # boolean
	return_code => \$rc,     # set to trigger a return
);

If any of the EP_TRIGGER_URL_REWRITE triggers sets return_code, the handler returns that value. See also: Information on triggers
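The important detail is that return_code is passed as a reference, so a trigger callback can write a value back to the caller. The following is a self-contained sketch of that pattern only. It is a simplified stand-in, not the EPrints implementation: add_trigger, run_trigger and dispatch here are local toy functions, and the '/private' rule and the 404 value are invented for the example.

```perl
use strict;
use warnings;

# Toy trigger registry (an assumption for illustration, NOT the EPrints code).
my @triggers;

sub add_trigger
{
	my( $callback ) = @_;
	push @triggers, $callback;
	return;
}

# Call every registered callback with the same named parameters.
sub run_trigger
{
	my( %params ) = @_;
	$_->( %params ) for @triggers;
	return;
}

# Example callback: request a 404 for any URI under /private (made-up rule).
add_trigger( sub {
	my( %args ) = @_;
	${ $args{return_code} } = 404 if $args{uri} =~ m! ^/private\b !x;
	return;
} );

# Mirrors the shape of the call above: $rc stays undef unless a callback sets it.
sub dispatch
{
	my( $uri ) = @_;
	my $rc;
	run_trigger( uri => $uri, return_code => \$rc );
	return $rc;
}
```

With this sketch, dispatch( "/private/report" ) yields 404, while dispatch( "/foo/bar" ) yields undef, which corresponds to the handler carrying on with its normal processing.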

CGI scripts

Line 157 deals with CGI scripts, redirecting to HTTPS if necessary. It looks for the CGI script in three locations (lines 195-199):

  • ~/archives/ARCHIVEID/cgi/
  • ~/site_lib/cgi/
  • ~/cgi/
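The lookup can be pictured as a simple "first file that exists wins" search. This is a minimal sketch under assumptions: find_cgi_script is a hypothetical helper, not a function in Rewrite.pm, and it assumes the directories are tried in the order listed above, so a repository-specific script would override a site-wide or core one.

```perl
use strict;
use warnings;

# Return the first candidate path that exists, or undef if none does.
# $base stands for the EPrints root (the '~' in the list above).
# Assumption: directories are tried in the order shown in the list.
sub find_cgi_script
{
	my( $base, $archiveid, $script ) = @_;

	my @dirs = (
		"$base/archives/$archiveid/cgi",   # repository-specific
		"$base/site_lib/cgi",              # site-wide
		"$base/cgi",                       # core
	);

	foreach my $dir ( @dirs )
	{
		my $path = "$dir/$script";
		return $path if -e $path;
	}

	return;
}
```

For example, find_cgi_script( "/home/eprints/eprints-3.3.12", "ARCHIVEID", "search" ) would return the repository's own copy of the script if one exists, and fall back to the other two directories otherwise.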

If the CGI script is a 'user' script, it also defines a PerlAccessHandler (lines 214-220):

if( $uri =~ m! ^/users\b !x )
{
	$r->push_handlers(PerlAccessHandler => [
		\&EPrints::Apache::Auth::authen,
		\&EPrints::Apache::Auth::authz
	] );
}

SWORD servicedocument

Lines 233-258 deal with the SWORD service document, via the CRUD interface.

REST interface

Lines 281-290 handle the REST interface, via EPrints::Apache::REST.

EPrints URIs

Lines 292-374: EPrints URIs are normally of the form http://repository.blah/id/.... There are three main if blocks in this section that use regexes to match the URI:

  • Line 293 $uri =~ m! ^$urlpath/id/(repository|dump)$ !x matches two cases.
  • Line 318 $uri =~ m! ^$urlpath/id/([^\/]+)/(ext-.*)$ !x matches URIs such as 'event/ext-foo' under /id/. What these are for is not yet documented here (possibly something RDF-related).
  • Lines 345-347 (shown on one line here) $uri =~ s! ^$urlpath/id/(?: contents | ([^/]+)(?:/([^/]+)(?:/([^/]+))?)? )$ !!x matches a '/'-separated dataset, dataobjid and field, or 'contents'. The request is passed to the CRUD handler, and uses its authen/authz:
$r->push_handlers(PerlAccessHandler => [
	sub { $crud->authen },
	sub { $crud->authz },
] );
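To see what the third regex captures, it can be tried outside Apache. This is a small sketch, not code from Rewrite.pm: parse_id_uri is a made-up helper, and it uses a match (m!!x) rather than the substitution (s!!!x) in the original so that the input string is left unchanged. The pattern itself is copied from above.

```perl
use strict;
use warnings;

# Split an /id/ URI into (dataset, dataobjid, field).
# Parts that are not present come back as undef; a URI that does not
# match at all gives an empty list. (Hypothetical helper for illustration.)
sub parse_id_uri
{
	my( $uri, $urlpath ) = @_;

	if( $uri =~ m! ^$urlpath/id/(?: contents | ([^/]+)(?:/([^/]+)(?:/([^/]+))?)? )$ !x )
	{
		return( $1, $2, $3 );
	}

	return;
}

my( $dataset, $id, $field ) = parse_id_uri( "/id/eprint/23/title", "" );
print "$dataset $id $field\n";   # prints eprint 23 title
```

Note that '/id/contents' matches through the first alternative, so all three captures are undef in that case, and that a path with more than three components after /id/ does not match at all.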

EPrint IDs, Documents

Under construction!

Lines 377-493: this block of code looks for a request such as http://repository.blah/123, where '123' is EPrint ID 123.

$uri =~ s! ^$urlpath/(0*)([1-9][0-9]*)\b !!x
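Because this is a substitution, a successful match also strips the ID from the front of $uri. The first capture holds any leading zeros and the second holds the ID itself. The sketch below applies the same regex to a few sample URIs; split_eprint_uri and the sample paths are assumptions made up for illustration, not part of Rewrite.pm.

```perl
use strict;
use warnings;

# Apply the EPrint ID regex to a copy of the URI.
# Returns (eprintid, remaining path) on a match, or an empty list otherwise.
# (Hypothetical helper for illustration.)
sub split_eprint_uri
{
	my( $uri, $urlpath ) = @_;

	if( $uri =~ s! ^$urlpath/(0*)([1-9][0-9]*)\b !!x )
	{
		# $1 holds the leading zeros (possibly empty), $2 the ID proper.
		return( $2, $uri );
	}

	return;
}

my( $eprintid, $rest ) = split_eprint_uri( "/00123/1/paper.pdf", "" );
print "$eprintid $rest\n";   # prints 123 /1/paper.pdf
```

A URI such as '/view/year' or '/0' does not match, and the trailing \b means that '/123abc' does not match either. Whatever is left in $uri after the substitution is presumably what the rest of the block goes on to examine.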

TODO (some should be separate pages)

  • Explain flow of Rewrite
    • triggers
    • cgi
    • content negotiation
    • CRUD
    • ???
  • permit on DataObj
    • can_request_view / can_user_view
  • summary pages (content neg/ URL rewrite)
  • DOI - 5 metadata elements
  • 40x handling
Access Control Layer