Difference between revisions of "Anatomy of a request"
Revision as of 16:12, 23 October 2014
THIS PAGE IS UNDER CONSTRUCTION! JLRS 2014-10-23
This is a description of how EPrints and Apache handle an incoming request. Understanding this flow helps you understand how an Access Control layer can be added to the system.
I will assume that you know how to locate a Perl module file from the module name (e.g. EPrints::Apache::Rewrite will probably be ~/perl_lib/EPrints/Apache/Rewrite.pm, although this is not always the case!).
Flow of a request
Below are the relevant parts of the config files and Perl modules that are used when processing a request.
The request will generally be dealt with by the EPrints::Apache::Rewrite module, and farmed out from there.
How the request reaches this module is also explained below.
Apache core config ~/cfg/apache.conf
PerlSwitches -I/home/eprints/eprints-3.3.12/perl_lib
PerlModule EPrints
PerlPostConfigHandler +EPrints::post_config_handler
The post_config_handler does some sanity checks on the EPrints setup (e.g. whether Apache is listening on the ports that the repositories are configured to use). See http://perl.apache.org/docs/2.0/user/handlers/server.html#C_PerlPostConfigHandler_ for more information about the post_config_handler.
Apache repository config ~/cfg/apache/ARCHIVEID.conf
<VirtualHost *:80>
...
PerlTransHandler +EPrints::Apache::Rewrite
</VirtualHost>
This leads us to the backbone of EPrints - the Rewrite module - where the URL_REWRITE_* triggers are called and content negotiation can happen, as well as many other wondrous things!
EPrints::Apache::Rewrite module
This explanation and its line numbers are taken from a specific version of the file: https://github.com/eprints/eprints/blob/88a36fcf1f17c7a04e60455d374b617709f7461d/perl_lib/EPrints/Apache/Rewrite.pm. This file will change over time, so it's worth comparing it with the version you are using, and possibly with other versions on GitHub, e.g. https://github.com/eprints/eprints/blob/3.3/perl_lib/EPrints/Apache/Rewrite.pm
It's worth taking a few minutes to look at this file - specifically the sub handler.
EP_TRIGGER_URL_REWRITE
Line 123 calls the 'EP_TRIGGER_URL_REWRITE' trigger:
$repository->run_trigger( EPrints::Const::EP_TRIGGER_URL_REWRITE,
	request => $r,
	lang => $lang,          # en
	args => $args,          # "" or "?foo=bar"
	urlpath => $urlpath,    # "" or "/subdir"
	cgipath => $cgipath,    # /cgi or /subdir/cgi
	uri => $uri,            # /foo/bar
	secure => $secure,      # boolean
	return_code => \$rc,    # set to trigger a return
);
If any of the EP_TRIGGER_URL_REWRITE triggers sets return_code, the handler returns that value. See API:EPrints/Const for information on triggers.
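To make the return_code mechanism concrete, here is a minimal standalone sketch of a trigger setting a return code through the reference it is passed. It does not use EPrints at all: run_rewrite_triggers, rewrite_result, the /private path and the 403 code are all hypothetical names and values chosen for illustration, and the real run_trigger has more to it (for example, trigger priorities).

```perl
use strict;
use warnings;

# Hypothetical stand-in for $repository->run_trigger: calls each handler
# with the same named parameters. This is NOT the EPrints implementation.
sub run_rewrite_triggers
{
	my( $handlers, %params ) = @_;

	foreach my $handler ( @{$handlers} )
	{
		$handler->( %params );
	}
}

# Hypothetical trigger: refuse anything under /private by writing to the
# scalar that return_code points at.
my $block_private = sub {
	my( %args ) = @_;

	if( $args{uri} =~ m! ^/private\b !x )
	{
		${ $args{return_code} } = 403;	# HTTP Forbidden
	}
};

# Mimics the caller: gives back the code set by a trigger, or undef if
# no trigger set one and processing should carry on.
sub rewrite_result
{
	my( $uri ) = @_;

	my $rc;
	run_rewrite_triggers( [ $block_private ], uri => $uri, return_code => \$rc );

	return $rc;
}

print( defined( rewrite_result( "/private/report" ) ) ? "stopped\n" : "continues\n" );
```

The point of passing \$rc rather than $rc is that the trigger can change a variable owned by the caller, which is how a trigger can cut the rest of the request handling short.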
CGI scripts
Line 157 deals with CGI scripts - redirecting to HTTPS if necessary. It looks for the CGI scripts in three locations (lines 195-199):
~/archives/ARCHIVEID/cgi/
~/site_lib/cgi/
~/cgi/
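The lookup can be sketched as a search through a list of directories. This is a standalone illustration rather than the code from Rewrite.pm: find_cgi_script, the /opt/eprints3 root and the archive id 'myarchive' are hypothetical, and it assumes the directories are tried in the order listed above with the first match winning.

```perl
use strict;
use warnings;
use File::Spec;

# Returns the path of the first existing file called $script in @dirs,
# or undef if none of the directories contains it.
sub find_cgi_script
{
	my( $script, @dirs ) = @_;

	foreach my $dir ( @dirs )
	{
		my $path = File::Spec->catfile( $dir, $script );
		return $path if -f $path;
	}

	return undef;
}

# Hypothetical install root and archive id, for illustration only.
my $base = "/opt/eprints3";
my @cgi_dirs = (
	"$base/archives/myarchive/cgi",	# repository-specific scripts
	"$base/site_lib/cgi",		# site-wide scripts
	"$base/cgi",			# scripts shipped with EPrints
);

my $found = find_cgi_script( "users/home", @cgi_dirs );
print( defined( $found ) ? "found $found\n" : "no such script\n" );
```

Under that assumption, a script placed in the repository's own cgi directory would take precedence over one with the same name shipped with EPrints.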
If the CGI script is a 'user' script, it also adds a PerlAccessHandler (lines 214-220):
if( $uri =~ m! ^/users\b !x )
{
	$r->push_handlers(PerlAccessHandler => [
		\&EPrints::Apache::Auth::authen,
		\&EPrints::Apache::Auth::authz
	] );
}
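It can be useful to check which paths that pattern actually catches. The sketch below reuses the regex unchanged; is_user_script is a hypothetical helper, and it assumes that $uri at this point is the path relative to the CGI root (which is what the ^/users anchor suggests).

```perl
use strict;
use warnings;

# Returns 1 if a CGI path (relative to the CGI root) is a 'user' script
# according to the pattern used in Rewrite.pm, otherwise 0.
sub is_user_script
{
	my( $uri ) = @_;

	return $uri =~ m! ^/users\b !x ? 1 : 0;
}

foreach my $uri ( "/users/home", "/users", "/usersguide", "/search" )
{
	printf( "%-12s %s\n", $uri, is_user_script( $uri ) ? "protected" : "open" );
}
```

The \b matters: it stops the pattern matching a script whose name merely begins with 'users', such as the made-up /usersguide above.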
SWORD servicedocument
Lines 233-258 deal with the 'Sword' service document, via the CRUD interface.
REST interface
Lines 281-290 handle the REST interface, via EPrints::Apache::REST.
EPrints URIs
Lines 292-374: EPrints URIs are normally of the form http://repository.blah/id/... There are three main if blocks in this section that use regexes to match the URI:
- Line 293: $uri =~ m! ^$urlpath/id/(repository|dump)$ !x matches two cases: /id/repository and /id/dump.
- Line 318: $uri =~ m! ^$urlpath/id/([^\/]+)/(ext-.*)$ !x matches a single path segment followed by a segment beginning with 'ext-', e.g. 'event/ext-foo'. What these URIs are used for still needs confirming (possibly something RDF-related).
- Lines 345-347 (shown on one line here): $uri =~ s! ^$urlpath/id/(?: contents | ([^/]+)(?:/([^/]+)(?:/([^/]+))?)? )$ !!x matches a '/'-separated dataset, dataobjid and field - or 'contents'. The request is passed to the CRUD handler, and uses its authen/authz:
$r->push_handlers(PerlAccessHandler => [
	sub { $crud->authen },
	sub { $crud->authz },
] );
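To see how the three patterns divide up the /id/ URI space, here is a standalone sketch that applies them in the same order. The regexes are copied from above; classify_id_uri and the keys of the hash it returns (type, first, ext, dataset, dataobjid, field) are hypothetical names used only for this illustration, not anything defined by EPrints.

```perl
use strict;
use warnings;

# Classifies an /id/ URI using the three patterns, tried in the same
# order as in Rewrite.pm. Returns a hash reference describing the match,
# or undef if none of the patterns match.
sub classify_id_uri
{
	my( $urlpath, $uri ) = @_;

	if( $uri =~ m! ^$urlpath/id/(repository|dump)$ !x )
	{
		return { type => $1 };
	}
	if( $uri =~ m! ^$urlpath/id/([^\/]+)/(ext-.*)$ !x )
	{
		return { type => "ext", first => $1, ext => $2 };
	}
	if( $uri =~ s! ^$urlpath/id/(?: contents | ([^/]+)(?:/([^/]+)(?:/([^/]+))?)? )$ !!x )
	{
		# $1 is only undefined when the 'contents' alternative matched
		return { type => "contents" } if !defined $1;
		return { type => "crud", dataset => $1, dataobjid => $2, field => $3 };
	}

	return undef;
}

foreach my $uri ( "/id/repository", "/id/event/ext-foo", "/id/eprint/123/title", "/id/contents" )
{
	my $match = classify_id_uri( "", $uri );
	printf( "%-22s %s\n", $uri, defined( $match ) ? $match->{type} : "no match" );
}
```

Order matters here: 'event/ext-foo' would also satisfy the third pattern, so the 'ext-' check has to come first for it to be treated differently.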
EPrint IDs, Documents
Under construction!
Lines 377-493: This block of code looks for a request such as http://repository.blah/123, where '123' is EPrint ID 123.
$uri =~ s! ^$urlpath/(0*)([1-9][0-9]*)\b !!x
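As a rough illustration of what that substitution captures, the standalone sketch below applies it to a few made-up URIs. split_eprint_uri is a hypothetical helper, not part of EPrints; it simply returns the two capture groups plus whatever is left of the URI once the match has been stripped.

```perl
use strict;
use warnings;

# Splits a URI into (leading zeros, eprint id, remaining path) using the
# pattern from Rewrite.pm, or returns an empty list if the URI does not
# start with an eprint id.
sub split_eprint_uri
{
	my( $urlpath, $uri ) = @_;

	if( $uri =~ s! ^$urlpath/(0*)([1-9][0-9]*)\b !!x )
	{
		return( $1, $2, $uri );
	}

	return;
}

foreach my $uri ( "/123/", "/00123/1/paper.pdf", "/123abc", "/view/" )
{
	my( $zeros, $eprintid, $rest ) = split_eprint_uri( "", $uri );
	if( defined $eprintid )
	{
		printf( "%s: eprint %s, rest '%s'\n", $uri, $eprintid, $rest );
	}
	else
	{
		printf( "%s: not an eprint URI\n", $uri );
	}
}
```

Leading zeros are captured separately from the id itself, and the \b means a path like /123abc is not mistaken for EPrint 123.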
TODO (some should be separate pages)
- Explain flow of Rewrite
- triggers
- cgi
- content negotiation
- CRUD
- ???
- permit on DataObj
- can_request_view / can_user_view
- summary pages (content neg/ URL rewrite)
- DOI - 5 metadata elements
- 40x handling
Access Control Layer