Anatomy of a request
Latest revision as of 21:11, 14 June 2016
Access Control Layer
This is a description of how EPrints and Apache handle an incoming request. Understanding this flow helps show where an Access Control layer can be added to the system.
I will assume that you know how to locate a Perl module file from the module name (e.g. EPrints::Apache::Rewrite will probably be ~/perl_lib/EPrints/Apache/Rewrite.pm, although this is not always the case!).
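The naming convention mentioned above can be sketched in a few lines of plain Perl. This is only an illustration: module_to_relpath is a hypothetical helper, not part of EPrints, and the resulting path is relative to whichever library directory Perl searches (in EPrints that is usually ~/perl_lib, but as noted, not always).

```perl
use strict;
use warnings;

# Hypothetical helper: turn a module name into the relative path Perl looks for.
sub module_to_relpath {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;    # EPrints::Apache::Rewrite -> EPrints/Apache/Rewrite
    return "$path.pm";
}

my $relpath = module_to_relpath('EPrints::Apache::Rewrite');
print "$relpath\n";    # EPrints/Apache/Rewrite.pm

# Once a module has been loaded, %INC records the file it really came from,
# which is the reliable answer when the convention does not hold.
require File::Spec;
print $INC{ module_to_relpath('File::Spec') }, "\n";
```

On a running system, checking %INC for the module in question is a quick way to confirm which copy of a file is actually in use.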
Flow of a request
Below are the relevant parts of the config files and Perl modules that are used when processing a request.
The request will generally be dealt with by the EPrints::Apache::Rewrite module, and farmed out from there.
How the request reaches this module is also explained below.
Apache core config ~/cfg/apache.conf
PerlSwitches -I/home/eprints/eprints-3.3.12/perl_lib
PerlModule EPrints
PerlPostConfigHandler +EPrints::post_config_handler
The post_config_handler does some sanity checks on the EPrints setup (e.g. is Apache listening on the ports that the repositories are configured to work under?). See http://perl.apache.org/docs/2.0/user/handlers/server.html#C_PerlPostConfigHandler_ for more info about the post_config_handler.
Apache repository config ~/cfg/apache/ARCHIVEID.conf
<VirtualHost *:80>
...
PerlTransHandler +EPrints::Apache::Rewrite
</VirtualHost>
This leads us to the backbone of EPrints, the Rewrite module, where URL_REWRITE_* triggers are called, content negotiation can happen, and many other wondrous things take place!
EPrints::Apache::Rewrite module
This explanation and its line numbers are taken from a specific version of the file: https://github.com/eprints/eprints/blob/88a36fcf1f17c7a04e60455d374b617709f7461d/perl_lib/EPrints/Apache/Rewrite.pm This file will change over time, so it's worth comparing it with the version you are using, and possibly other versions on GitHub, e.g. https://github.com/eprints/eprints/blob/3.3/perl_lib/EPrints/Apache/Rewrite.pm
It's worth taking a few minutes to look at this file, specifically the sub handler.
EP_TRIGGER_URL_REWRITE
Line 123 calls the 'EP_TRIGGER_URL_REWRITE' trigger:
$repository->run_trigger( EPrints::Const::EP_TRIGGER_URL_REWRITE,
request => $r,
lang => $lang, # en
args => $args, # "" or "?foo=bar"
urlpath => $urlpath, # "" or "/subdir"
cgipath => $cgipath, # /cgi or /subdir/cgi
uri => $uri, # /foo/bar
secure => $secure, # boolean
return_code => \$rc, # set to trigger a return
);
If any of the EP_TRIGGER_URL_REWRITE triggers sets return_code, that value is returned. See also: Information on triggers (API:EPrints/Const#:trigger).
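The "return_code => \$rc" parameter is the interesting part of that call: the trigger receives a reference, so it can hand a value back to the caller. The following is a self-contained sketch of that pattern only. The registry (add_trigger, run_triggers, dispatch) is a hypothetical stand-in written for this page, not the EPrints implementation, and stopping after the first trigger that sets a code is an assumption of the sketch.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a trigger registry; NOT the EPrints implementation.
my @url_rewrite_triggers;

sub add_trigger { push @url_rewrite_triggers, $_[0]; return }

# Call each trigger with the named parameters; stop once one sets the return code.
sub run_triggers {
    my (%params) = @_;
    for my $trigger (@url_rewrite_triggers) {
        $trigger->(%params);
        last if defined ${ $params{return_code} };
    }
    return;
}

# Example trigger: claim any URI under /private (a made-up path) with a 403-style code.
add_trigger( sub {
    my (%args) = @_;
    if ( $args{uri} =~ m{^/private\b} ) {
        ${ $args{return_code} } = 403;    # write through the reference
    }
    return;
} );

# Mimics the caller: pass a reference to $rc, then act on whatever was set.
sub dispatch {
    my ($uri) = @_;
    my $rc;
    run_triggers( uri => $uri, return_code => \$rc );
    return defined $rc ? $rc : 'continue';
}

print dispatch('/private/report'), "\n";    # 403
print dispatch('/123/'), "\n";              # continue
```

Passing a reference rather than relying on the trigger's own return value lets a trigger stay silent (leave $rc undefined) when the request is none of its business.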
CGI scripts
Line 157 deals with CGI scripts, redirecting to HTTPS if necessary. It looks for the CGI scripts in three locations (Lines 195-199):
~/archives/ARCHIVEID/cgi/
~/site_lib/cgi/
~/cgi/
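A lookup across several candidate directories can be sketched as below. This is an illustration under assumptions: find_cgi_script and the script name "latest" are hypothetical, the directories are created in a temporary location, and the precedence (first directory in the list wins) is assumed from the order in which the locations are listed above rather than confirmed from the source.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);
use File::Spec;

# Return the first existing file named $script among the candidate directories.
sub find_cgi_script {
    my ( $script, @dirs ) = @_;
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile( $dir, $script );
        return $candidate if -f $candidate;
    }
    return;
}

# Build a throwaway tree mirroring the three locations.
my $base = tempdir( CLEANUP => 1 );
my @dirs = map { "$base/$_" } ( 'archives/ARCHIVEID/cgi', 'site_lib/cgi', 'cgi' );
make_path(@dirs);

# Put a hypothetical script "latest" in the last two locations only.
for my $dir ( @dirs[ 1, 2 ] ) {
    open my $fh, '>', File::Spec->catfile( $dir, 'latest' ) or die $!;
    close $fh;
}

# The copy under site_lib/cgi is found before the one under cgi.
my $found = find_cgi_script( 'latest', @dirs );
print "$found\n";
```

The practical consequence of such a search order is that a repository-level or site-level script can override a core one of the same name without editing the core file.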
If the CGI script is a 'user' script, it also defines a PerlAccessHandler (Lines 214-220):
if( $uri =~ m! ^/users\b !x )
{
$r->push_handlers(PerlAccessHandler => [
\&EPrints::Apache::Auth::authen,
\&EPrints::Apache::Auth::authz
] );
}
SWORD servicedocument
Lines 233-258 deal with the 'Sword' service document, via the CRUD interface.
REST interface
Lines 281-290 handle the REST interface, via EPrints::Apache::REST
EPrints URIs
Lines 292-374: EPrint URIs are normally of the form http://repository.blah/id/... There are three main if blocks in this section that use regexes to match the URI:
- Line 293
  $uri =~ m! ^$urlpath/id/(repository|dump)$ !x
  matches two cases.
- Line 318
  $uri =~ m! ^$urlpath/id/([^\/]+)/(ext-.*)$ !x
  matches ??? (unclear: possibly some RDF-related URIs, e.g. 'event/ext-foo'?)
- Lines 345-347 (shown on one line here)
  $uri =~ s! ^$urlpath/id/(?: contents | ([^/]+)(?:/([^/]+)(?:/([^/]+))?)? )$ !!x
  matches '/'-separated dataset, dataobjid and field, or 'contents'. The request is passed to the CRUD handler, and uses its authen/authz:
$r->push_handlers(PerlAccessHandler => [
sub { $crud->authen },
sub { $crud->authz },
] );
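To see what the third pattern captures, it can be run on its own outside Apache. The sketch below reuses the regex quoted above; parse_id_uri and the hash keys are hypothetical names chosen to follow the description (dataset, dataobjid, field), and $urlpath is assumed to be the empty string.

```perl
use strict;
use warnings;

# Split an /id/ URI into its parts using the pattern quoted above.
# Returns a hash ref on a match (all values undef for /id/contents), else undef.
sub parse_id_uri {
    my ( $uri, $urlpath ) = @_;
    if ( $uri =~ s! ^$urlpath/id/(?: contents | ([^/]+)(?:/([^/]+)(?:/([^/]+))?)? )$ !!x ) {
        return { dataset => $1, dataobjid => $2, field => $3 };
    }
    return;
}

for my $uri ( '/id/eprint/123/title', '/id/eprint/123', '/id/contents', '/view/year/' ) {
    my $parts = parse_id_uri( $uri, '' );
    if ($parts) {
        # Show unset captures as '-' so nothing undefined is printed.
        printf "%-22s dataset=%s dataobjid=%s field=%s\n", $uri,
            map { $_ // '-' } @{$parts}{qw(dataset dataobjid field)};
    }
    else {
        printf "%-22s no match\n", $uri;
    }
}
```

Because the dataobjid and field groups are nested optionals, a field can only be present when a dataobjid is, and anything deeper than three path segments does not match at all.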
EPrint IDs, Documents and EP_TRIGGER_DOC_URL_REWRITE
Lines 377-493: This block of code looks for requests starting with e.g. http://repository.blah/123, where '123' is an EPrintID.
There are some redirects in this block to account for older URLs that may be requested, which had EPrintIDs and/or document positions zero-padded, e.g. http://repository.blah/00000123 or http://repository.blah/00000123/01/Document.txt.
Each subsequent match on the $uri consumes part of it, e.g.
$uri =~ s! ^$urlpath/(0*)([1-9][0-9]*)\b !!x
will remove the EPrintID from the start of $uri.
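The "consuming" behaviour is easy to check in isolation. In this sketch the substitution is the one quoted above, while take_eprintid is a hypothetical wrapper and $urlpath is assumed to be empty. The first capture group holds any leading zeros, which is the kind of information a redirect for old zero-padded URLs would need.

```perl
use strict;
use warnings;

# Consume a leading EPrintID from $uri, as the substitution above does.
# Returns (id, was_zero_padded, remaining_uri), or an empty list if no ID leads the URI.
sub take_eprintid {
    my ( $uri, $urlpath ) = @_;
    if ( $uri =~ s! ^$urlpath/(0*)([1-9][0-9]*)\b !!x ) {
        return ( $2, length($1) > 0 ? 1 : 0, $uri );
    }
    return;
}

my ( $id, $padded, $rest ) = take_eprintid( '/00000123/01/Document.txt', '' );
print "id=$id padded=$padded rest=$rest\n";    # id=123 padded=1 rest=/01/Document.txt
```

Note that what remains starts with a '/', which is why the next substitution in the handler can anchor on ^/ again.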
Lines 398-467 deal with document requests.
$uri =~ s! ^/(0*)([1-9][0-9]*)\b !!x
will match elements after the EPrintID in the original URL, matching '45' in http://repository.blah/123/45/Document.txt or http://repository.blah/123/45.hassmallThumbnailVersion/Document.txt (the second example shows the use of a 'relationship' that is processed using an EP_TRIGGER_DOC_URL_REWRITE trigger).
Lines 418-419 may be a bit confusing at first glance.
$uri =~ s! ^([^/]*)/ !!x;
my @relations = grep { length($_) } split /\./, $1;
They deal with document relationships, which take the form .../DocID.relationship1.relationship2.relationshipN/...
For documents, thumbnails are presented as related documents; the relationship is e.g. 'hassmallThumbnailVersion'.
The first line gets anything from the start of $uri (the DocID already having been removed by line 398), to the next '/'.
The second line (possibly the least readable line of code in EPrints?):
- takes the captured match ($1): .relationship1.relationship2.relationshipN
- splits on '.'s: "","relationship1","relationship2","relationshipN"
- for each of the split values (referenced as $_): grep for the length of the value. This effectively strips out empty elements (length = 0, so grep doesn't return the value).
- @relations = "relationship1","relationship2","relationshipN"
Line 421 then assigns whatever is left of the URL as the filename.
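Putting the document-related steps together, the whole walk over a document URL can be reproduced outside Apache. The substitutions are the ones quoted above (with $urlpath assumed empty); parse_doc_uri and the hash keys are hypothetical names used only for this sketch, and no redirects or database lookups are attempted.

```perl
use strict;
use warnings;

# Walk a document URL step by step: each substitution consumes the part it
# matched, leaving the rest in $uri for the next step.
sub parse_doc_uri {
    my ($uri) = @_;
    $uri =~ s! ^/(0*)([1-9][0-9]*)\b !!x or return;    # EPrintID
    my $eprintid = $2;
    $uri =~ s! ^/(0*)([1-9][0-9]*)\b !!x or return;    # document position
    my $pos = $2;
    $uri =~ s! ^([^/]*)/ !!x or return;                # ".rel1.rel2" up to the next "/"
    my @relations = grep { length($_) } split /\./, $1;
    # Whatever is left is treated as the filename.
    return { eprintid => $eprintid, pos => $pos, relations => \@relations, filename => $uri };
}

my $doc = parse_doc_uri('/123/45.hassmallThumbnailVersion/Document.txt');
print join( ' ', $doc->{eprintid}, $doc->{pos}, "@{ $doc->{relations} }", $doc->{filename} ), "\n";
# 123 45 hassmallThumbnailVersion Document.txt
```

For a URL without relationships, such as /123/45/Document.txt, the third capture is the empty string, split returns nothing, and the relations list is simply empty.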
Lines 423-426 use 'pnotes' (http://perl.apache.org/docs/2.0/api/Apache2/RequestUtil.html#C_pnotes_) to pass objects to the request handler.
Lines 433-436 register the document-specific authen and authz handlers. These deal with the 'security' flag on the document, and are a part of core EPrints that needs to be addressed by the ACL.
$r->push_handlers(PerlAccessHandler => [
\&EPrints::Apache::Auth::authen_doc,
\&EPrints::Apache::Auth::authz_doc
] );
Line 440 adds the document-download log handler to the request cleanup phase:
$r->pool->cleanup_register(\&EPrints::Apache::LogHandler::document, $r);
Lines 443-458 call the EP_TRIGGER_DOC_URL_REWRITE triggers.
There is one default trigger for this, in ~/lib/cfg.d/doc_rewrite.pl, that handles the relationship-based document requests.
Lines 470-490: If the URL has an EPrint, but doesn't have a document, we reach this block, which updates the abstract page if necessary and uses the EPrints templating system to render the page. It also logs the page view using EPrints::Apache::LogHandler::eprint.
Views
Lines 504-518 handle 'view' requests, including regenerating view pages.
auto javascript and css
Lines 519-543 deal with generating the 'auto' javascript and css files
Other static pages
Lines 545-560 deal with anything else.
Other headers
Lines 562-582 set a few headers for specific cases, and also register EPrints::Apache::Template as the handler for any .html files.
Notes
- There are two 'content negotiation' functions: one in EPrints::Apache::Rewrite, and another in EPrints::Apache::CRUD.
- API:EPrints/Const#:trigger learn about triggers ;o)
TODO (some should be separate pages)
- triggers
- CRUD
- ???
- permit on DataObj
- can_request_view / can_user_view
- summary pages (content neg/ URL rewrite)
- DOI - 5 metadata elements
- 40x handling