Anatomy of a request
THIS PAGE IS UNDER CONSTRUCTION! JLRS 2014-10-23
This is a description of how EPrints and Apache handle an incoming request. Understanding this flow helps you understand how an Access Control layer can be added to the system.
I will assume that you know how to locate a Perl module file from the module name (e.g. EPrints::Apache::Rewrite will probably be ~/perl_lib/EPrints/Apache/Rewrite.pm, although this is not always the case!).
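As a rough illustration of the name-to-path convention, here is a minimal sketch. The helper names module_to_relpath and locate_module are made up for this page and are not part of EPrints; the library directories you pass in are whatever your installation uses.

```perl
use strict;
use warnings;

# Convert a module name to the relative path Perl looks for under each @INC dir.
# (Hypothetical helper, for illustration only.)
sub module_to_relpath {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;    # EPrints::Apache -> EPrints/Apache
    return "$path.pm";
}

# Return the first existing file for $module under the given library
# directories, or undef if none of them contains it.
sub locate_module {
    my ($module, @lib_dirs) = @_;
    my $rel = module_to_relpath($module);
    for my $dir (@lib_dirs) {
        my $candidate = "$dir/$rel";
        return $candidate if -f $candidate;
    }
    return undef;
}

print module_to_relpath('EPrints::Apache::Rewrite'), "\n";
```

For a module that is already loaded, Perl records the file it actually used in %INC under the same relative path, which helps in the "not always the case" situations.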
Flow of a request
Below are relevant parts of config files and Perl modules that are used when processing a request.
The request will generally be dealt with by the EPrints::Apache::Rewrite module, and farmed out from there.
How the request reaches this module is also explained below.
Apache core config ~/cfg/apache.conf
PerlSwitches -I/home/eprints/eprints-3.3.12/perl_lib
PerlModule EPrints
PerlPostConfigHandler +EPrints::post_config_handler
The post_config_handler does some sanity checks on the EPrints setup (e.g. is Apache listening on the ports that the repositories are configured to work under). See: http://perl.apache.org/docs/2.0/user/handlers/server.html#C_PerlPostConfigHandler_ for more info about the post_config_handler.
Apache repository config ~/cfg/apache/ARCHIVEID.conf
<VirtualHost *:80>
...
PerlTransHandler +EPrints::Apache::Rewrite
</VirtualHost>
This leads us to the backbone of EPrints - the Rewrite module - where URL_REWRITE_* triggers are called, content negotiation can happen, and many other wondrous things take place!
EPrints::Apache::Rewrite module
This explanation and its line numbers are taken from a specific version of the file: https://github.com/eprints/eprints/blob/88a36fcf1f17c7a04e60455d374b617709f7461d/perl_lib/EPrints/Apache/Rewrite.pm This file will change over time, so it's worth comparing it with the version you are using, and possibly other versions on GitHub, e.g. https://github.com/eprints/eprints/blob/3.3/perl_lib/EPrints/Apache/Rewrite.pm
It's worth taking a few minutes to look at this file - specifically the sub handler.
EP_TRIGGER_URL_REWRITE
Line 123 calls the 'EP_TRIGGER_URL_REWRITE' trigger:
$repository->run_trigger( EPrints::Const::EP_TRIGGER_URL_REWRITE,
    request => $r,
    lang => $lang, # en
    args => $args, # "" or "?foo=bar"
    urlpath => $urlpath, # "" or "/subdir"
    cgipath => $cgipath, # /cgi or /subdir/cgi
    uri => $uri, # /foo/bar
    secure => $secure, # boolean
    return_code => \$rc, # set to trigger a return
);
If any of the EP_TRIGGER_URL_REWRITE triggers sets return_code, that value is returned by the handler. See: Information on triggers
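The return_code mechanism relies on passing a reference to a scalar, which a trigger can write through. The following is a self-contained mock of that idea, not EPrints code: the trigger list, the mock_handler sub, the '/blocked' path and the 403 value are all assumptions made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the repository's registered triggers.
my @url_rewrite_triggers = (
    # A trigger that does nothing with the request.
    sub { my (%args) = @_; return; },
    # A trigger that claims URIs under /blocked by setting return_code.
    sub {
        my (%args) = @_;
        if ( $args{uri} =~ m{^/blocked\b} ) {
            ${ $args{return_code} } = 403;    # written through the reference
        }
        return;
    },
);

# Mock of the calling side: hand each trigger a reference to $rc,
# then return early if any trigger set it.
sub mock_handler {
    my ($uri) = @_;
    my $rc;
    $_->( uri => $uri, return_code => \$rc ) for @url_rewrite_triggers;
    return $rc if defined $rc;
    return 'continue';    # placeholder for the rest of the handler
}

print mock_handler('/blocked/page'), "\n";
print mock_handler('/foo/bar'), "\n";
```

Because the caller only checks whether $rc is defined, a trigger that leaves return_code alone has no effect on the flow.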
CGI scripts
Line 157 deals with CGI scripts, redirecting to HTTPS if necessary. It looks for the CGI scripts in three locations (lines 195-199):
~/archives/ARCHIVEID/cgi/
~/site_lib/cgi/
~/cgi/
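A lookup across several directories can be sketched as below. This assumes the locations are tried in the order listed and that the first existing file wins; check the Rewrite module for the actual behaviour. The sub find_cgi_script, the script name 'search' and the temporary directory tree are made up for this example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Return the first existing file named $script under the candidate
# directories, or undef if none of them has it. (Hypothetical helper.)
sub find_cgi_script {
    my ($script, @dirs) = @_;
    for my $dir (@dirs) {
        my $path = "$dir/$script";
        return $path if -f $path;
    }
    return undef;
}

# Build a throwaway tree mirroring the three locations
# (ARCHIVEID is kept as a literal placeholder).
my $base = tempdir( CLEANUP => 1 );
my @dirs = map { "$base/$_" } ( 'archives/ARCHIVEID/cgi', 'site_lib/cgi', 'cgi' );
make_path(@dirs);

# Create the same script in the site-wide and core directories only.
for my $dir ( @dirs[ 1, 2 ] ) {
    open my $fh, '>', "$dir/search" or die "cannot create $dir/search: $!";
    close $fh;
}

my $found = find_cgi_script( 'search', @dirs );
print defined $found ? "found: $found\n" : "not found\n";
```

Under that assumption, a repository-specific copy of a script would override the site-wide and core versions.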
If the CGI script is a 'user' script, it also defines a PerlAccessHandler (lines 214-220):
if( $uri =~ m! ^/users\b !x )
{
    $r->push_handlers(PerlAccessHandler => [
        \&EPrints::Apache::Auth::authen,
        \&EPrints::Apache::Auth::authz
    ] );
}
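To see what the 'user' script test accepts, the regex can be tried in isolation. This sketch assumes $uri holds the path relative to the CGI root at this point; the sub is_user_script and the sample paths are hypothetical.

```perl
use strict;
use warnings;

# Same pattern as in the handler: with /x the spaces are ignored, and \b
# requires a word boundary straight after 'users'.
sub is_user_script {
    my ($uri) = @_;
    return $uri =~ m! ^/users\b !x ? 1 : 0;
}

for my $uri ( '/users/home', '/users', '/usersfoo', '/users-old/x', '/other/users' ) {
    printf "%-14s %s\n", $uri, is_user_script($uri) ? 'user script' : 'not a user script';
}
```

Note that \b also matches before a hyphen, so a path such as /users-old/x would be treated as a 'user' script by this test.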
SWORD servicedocument
Lines 233-258 deal with the SWORD service document, via the CRUD interface.
REST interface
Lines 281-290 handle the REST interface, via EPrints::Apache::REST.
EPrints URIs
Lines 292-374: EPrint URIs are normally of the form http://repository.blah/id/... . There are three main if blocks in this section that use regexes to match the URI:
- Line 293
  $uri =~ m! ^$urlpath/id/(repository|dump)$ !x
  matches two cases.
- Line 318
  $uri =~ m! ^$urlpath/id/([^\/]+)/(ext-.*)$ !x
  matches ??? Some RDF-type stuff!? e.g. 'event/ext-foo'..?
- Lines 345-347 (shown on one line here)
  $uri =~ s! ^$urlpath/id/(?: contents | ([^/]+)(?:/([^/]+)(?:/([^/]+))?)? )$ !!x
  matches a '/'-separated dataset, dataobjid and field - or 'contents'. The request is passed to the CRUD handler, and uses its authen/authz:
$r->push_handlers(PerlAccessHandler => [
    sub { $crud->authen },
    sub { $crud->authz },
] );
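The third regex is easier to follow when run against a few sample URIs. This is a minimal sketch that reuses the pattern quoted above; the sub parse_id_uri, the hash keys and the sample URIs are assumptions for illustration, and $urlpath is taken to be empty.

```perl
use strict;
use warnings;

# Apply the /id/ pattern and report what each capture group held.
# Returns undef when the URI does not match. (Hypothetical helper.)
sub parse_id_uri {
    my ($uri, $urlpath) = @_;
    return undef
        unless $uri =~ s! ^$urlpath/id/(?: contents | ([^/]+)(?:/([^/]+)(?:/([^/]+))?)? )$ !!x;
    # Groups stay undef for path parts that were not present,
    # and all three are undef for '/id/contents'.
    return { dataset => $1, dataobjid => $2, field => $3 };
}

for my $uri ( '/id/eprint/123/title', '/id/eprint/123', '/id/eprint', '/id/contents', '/view/year' ) {
    my $m = parse_id_uri( $uri, '' );
    if ( !$m ) { print "$uri: no match\n"; next; }
    print "$uri: ", join( ', ', map { "$_=" . ( $m->{$_} // '(none)' ) } qw(dataset dataobjid field) ), "\n";
}
```

The nested optional groups are what let a single pattern cover one, two or three path components after /id/.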
EPrint IDs, Documents
Under construction!
Lines 377-493: This block of code looks for requests starting with e.g. http://repository.blah/123 - where '123' is an EPrintID.
There are some redirects in this block to account for older URLs that may be requested, which had EPrintIDs and/or document positions zero-padded, e.g. http://repository.blah/00000123 or http://repository.blah/00000123/01/Document.txt.
Each subsequent match on the $uri consumes part of it - e.g.
$uri =~ s! ^$urlpath/(0*)([1-9][0-9]*)\b !!x
will remove the EPrintID from the start of $uri.
$uri =~ s! ^/(0*)([1-9][0-9]*)\b !!x
will match elements after the EPrintID in the original URL - matching '45' in http://repository.blah/123/45/Document.txt or http://repository.blah/123/45.hassmallThumbnailVersion/Document.txt (the second example shows the use of a 'relationship' that is processed using an EP_TRIGGER_DOC_URL_REWRITE trigger).
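The way each substitution eats the front of $uri can be traced with a small stand-alone script. This is a sketch using the two patterns quoted above; the sub consume_ids, the hash keys and the sample paths are made up for illustration, and $urlpath is assumed to be empty.

```perl
use strict;
use warnings;

# Strip the EPrintID, then the document number, from the front of the URI,
# recording any zero padding that was present. (Hypothetical helper.)
sub consume_ids {
    my ($uri, $urlpath) = @_;
    my %out;
    if ( $uri =~ s! ^$urlpath/(0*)([1-9][0-9]*)\b !!x ) {
        @out{qw(eprint_padding eprintid)} = ( $1, $2 );
        # Only look for a document number once an EPrintID was found.
        if ( $uri =~ s! ^/(0*)([1-9][0-9]*)\b !!x ) {
            @out{qw(doc_padding doc)} = ( $1, $2 );
        }
    }
    $out{rest} = $uri;    # whatever has not been consumed yet
    return \%out;
}

for my $path ( '/00000123/01/Document.txt', '/123/45.hassmallThumbnailVersion/Document.txt' ) {
    my $r = consume_ids( $path, '' );
    print "$path -> eprintid=$r->{eprintid} doc=$r->{doc} rest=$r->{rest}\n";
}
```

Capturing the leading zeros separately is what makes it possible to tell a padded, older-style URL from a current one.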
Lines 418-419 may be a bit confusing at first glance.
$uri =~ s! ^([^/]*)/ !!x;
my @relations = grep { length($_) } split /\./, $1;
They deal with document relationships, which are of the form .../DocID.relationship1.relationship2.relationshipN/... .
For documents, thumbnails are presented as related documents; the relationship is e.g. 'hassmallThumbnailVersion'.
The first line captures anything from the start of $uri (the DocID already having been removed by line 398) up to the next '/', and removes it from $uri.
The second line (possibly the least readable line of code in EPrints?):
- takes the captured match ($1): .relationship1.relationship2.relationshipN
- splits on '.'s: "","relationship1","relationship2","relationshipN"
- for each of the split values (referenced as $_), greps on the length of the value. This effectively strips out empty elements (when the length is 0, grep doesn't return the value).
- @relations = "relationship1","relationship2","relationshipN"
TODO (some should be separate pages)
- Explain flow of Rewrite
- triggers
- cgi
- content negotiation
- CRUD
- ???
- permit on DataObj
- can_request_view / can_user_view
- summary pages (content neg/ URL rewrite)
- DOI - 5 metadata elements
- 40x handling