Difference between revisions of "Anatomy of a request"

From EPrints Documentation
(Working through Rewrite module)

Revision as of 16:12, 23 October 2014

THIS PAGE IS UNDER CONSTRUCTION! JLRS 2014-10-23

This is a description of how EPrints and Apache handle an incoming request. Understanding this flow helps you understand how an Access Control layer can be added to the system.

I will assume that you know how to locate a Perl module file from the module name (e.g. EPrints::Apache::Rewrite will probably be ~/perl_lib/EPrints/Apache/Rewrite.pm, although this is not always the case!).

Flow of a request

Below are the relevant parts of the config files and Perl modules that are used when processing a request. The request will generally be dealt with by the EPrints::Apache::Rewrite module, and farmed out from there. How the request reaches this module is also explained below.

Apache core config ~/cfg/apache.conf

PerlSwitches -I/home/eprints/eprints-3.3.12/perl_lib
PerlModule EPrints
PerlPostConfigHandler +EPrints::post_config_handler

The post_config_handler does some sanity checks on the EPrints setup (e.g. is Apache listening on the ports that the repositories are configured to use?). See http://perl.apache.org/docs/2.0/user/handlers/server.html#C_PerlPostConfigHandler_ for more information about the post_config_handler.

Apache repository config ~/cfg/apache/ARCHIVEID.conf

<VirtualHost *:80>
...
  PerlTransHandler +EPrints::Apache::Rewrite
 
</VirtualHost>

This leads us to the backbone of EPrints - the Rewrite module - where URL_REWRITE_* triggers are called, content negotiation can happen, and many other wondrous things take place!

EPrints::Apache::Rewrite module

This explanation and its line numbers are taken from a specific version of the file: https://github.com/eprints/eprints/blob/88a36fcf1f17c7a04e60455d374b617709f7461d/perl_lib/EPrints/Apache/Rewrite.pm. This file will change over time, so it's worth comparing it with the version you are using, and possibly with other versions on GitHub, e.g. https://github.com/eprints/eprints/blob/3.3/perl_lib/EPrints/Apache/Rewrite.pm

It's worth taking a few minutes to look at this file - specifically the sub handler.

EP_TRIGGER_URL_REWRITE

Line 123 calls the 'EP_TRIGGER_URL_REWRITE' trigger:

$repository->run_trigger( EPrints::Const::EP_TRIGGER_URL_REWRITE,
	request => $r,
	   lang => $lang,    # en
	   args => $args,    # "" or "?foo=bar"
	urlpath => $urlpath, # "" or "/subdir"
	cgipath => $cgipath, # /cgi or /subdir/cgi
	    uri => $uri,     # /foo/bar
	 secure => $secure,  # boolean
    return_code => \$rc,     # set to trigger a return
);

If any of the EP_TRIGGER_URL_REWRITE triggers sets return_code, the handler returns that value. For more on triggers, see "Information on triggers" (API:EPrints/Const).
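
The reason return_code is passed as a reference (\$rc) is that it lets a trigger write a value back into the caller's variable. The following is a minimal, self-contained sketch of that idea only. It is not the EPrints implementation: run_triggers_sketch, handle_sketch and the example trigger are hypothetical names, and the 404 value is chosen purely for illustration.

```perl
use strict;
use warnings;

# Stand-in for the trigger mechanism; NOT the EPrints implementation.
# Each trigger is a code ref called with the same named parameters.
sub run_triggers_sketch
{
	my( $triggers, %params ) = @_;
	foreach my $trigger ( @{$triggers} )
	{
		$trigger->( %params );
	}
	return;
}

# Hypothetical trigger: sets a return code for one particular URI.
my $block_old_page = sub {
	my( %args ) = @_;
	if( $args{uri} eq "/old-page" )
	{
		# Writes through the reference into the caller's $rc.
		${ $args{return_code} } = 404;
	}
};

# Hypothetical caller with the same shape as the call shown above: if a
# trigger set the return code, hand it straight back, otherwise carry on.
sub handle_sketch
{
	my( $uri ) = @_;
	my $rc;
	run_triggers_sketch( [ $block_old_page ], uri => $uri, return_code => \$rc );
	return defined $rc ? $rc : "continue";
}
```

Here handle_sketch( "/old-page" ) gives back the code set by the trigger, while any other URI falls through to normal processing.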

CGI scripts

Line 157 deals with CGI scripts, redirecting to HTTPS if necessary. It looks for the CGI scripts in three locations (lines 195-199):

  • ~/archives/ARCHIVEID/cgi/
  • ~/site_lib/cgi/
  • ~/cgi/
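
A lookup across several candidate directories can be sketched as below. This is an illustration rather than the code from Rewrite.pm: find_cgi_script is a hypothetical helper, and it assumes the directories are tried in the order listed above, with the first existing file winning.

```perl
use strict;
use warnings;

# Hypothetical helper: return the path of $script in the first candidate
# directory that contains it, or nothing if no directory has it.
# The search order is whatever order the caller passes in (an assumption).
sub find_cgi_script
{
	my( $script, @dirs ) = @_;
	foreach my $dir ( @dirs )
	{
		my $path = "$dir/$script";
		return $path if -f $path;
	}
	return;
}
```

With this shape, a script placed under the repository's own cgi directory would shadow one of the same name in the shared locations, provided the repository directory is passed first.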

If the CGI script is a 'user' script, it also pushes a PerlAccessHandler (lines 214-220):

if( $uri =~ m! ^/users\b !x )
{
	$r->push_handlers(PerlAccessHandler => [
		\&EPrints::Apache::Auth::authen,
		\&EPrints::Apache::Auth::authz
	] );
}
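
To see which paths the check above selects, the pattern can be tried on its own. This is a small sketch: is_user_script is a hypothetical wrapper, and it assumes $uri is the path with the CGI prefix already removed. The /x flag makes the spaces inside the pattern insignificant, and \b requires 'users' to end at a word boundary.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the pattern quoted above.
# Returns 1 if the URI starts with "/users" as a whole word, else 0.
sub is_user_script
{
	my( $uri ) = @_;
	return $uri =~ m! ^/users\b !x ? 1 : 0;
}
```

So "/users/home" and "/users" are treated as user scripts, while "/usersearch" (no word boundary) and "/search/users" (not at the start) are not.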

SWORD servicedocument

Lines 233-258 deal with the SWORD service document, via the CRUD interface.

REST interface

Lines 281-290 handle the REST interface, via EPrints::Apache::REST.

EPrints URIs

Lines 292-374: EPrints URIs are normally of the form http://repository.blah/id/.... There are three main if blocks in this section that use regexes to match the URI:

  • Line 293 $uri =~ m! ^$urlpath/id/(repository|dump)$ !x matches two cases: 'repository' and 'dump'.
  • Line 318 $uri =~ m! ^$urlpath/id/([^\/]+)/(ext-.*)$ !x matches a single path segment followed by a segment starting with 'ext-', e.g. 'event/ext-foo'. What this is used for is not yet documented here (possibly RDF-related).
  • Lines 345-347 (shown on one line here) $uri =~ s! ^$urlpath/id/(?: contents | ([^/]+)(?:/([^/]+)(?:/([^/]+))?)? )$ !!x matches a '/'-separated dataset, dataobjid and field - or 'contents'. The request is passed to the CRUD handler, and uses its authen/authz:
$r->push_handlers(PerlAccessHandler => [
	sub { $crud->authen },
	sub { $crud->authz },
] );
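
The three patterns above can be exercised outside Apache to see which branch a given URI would fall into and what is captured. This is only a sketch: classify_id_uri and the branch labels are hypothetical, the patterns are copied from the text above, and the third is applied as a match rather than a substitution so that $uri is left untouched.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of EPrints): reports which of the three
# /id/ patterns a URI matches, tested in the same order as listed above.
sub classify_id_uri
{
	my( $uri, $urlpath ) = @_;

	if( $uri =~ m! ^$urlpath/id/(repository|dump)$ !x )
	{
		return { branch => "repository_or_dump", what => $1 };
	}
	if( $uri =~ m! ^$urlpath/id/([^\/]+)/(ext-.*)$ !x )
	{
		return { branch => "ext", first => $1, rest => $2 };
	}
	if( $uri =~ m! ^$urlpath/id/(?: contents | ([^/]+)(?:/([^/]+)(?:/([^/]+))?)? )$ !x )
	{
		# For /id/contents all three captures are undefined.
		return { branch => "crud", dataset => $1, dataobjid => $2, field => $3 };
	}
	return;
}
```

For example, classify_id_uri( "/id/eprint/123/title", "" ) lands in the third branch with dataset 'eprint', dataobjid '123' and field 'title', while "/id/event/ext-foo" is caught by the second branch before the third is tried.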

EPrint IDs, Documents

Under construction!

Lines 377-493: this block of code looks for a request for http://repository.blah/123, where '123' is EPrint ID 123. It matches the URI with:

$uri =~ s! ^$urlpath/(0*)([1-9][0-9]*)\b !!x
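
Since this section is still under construction, the following only illustrates what the substitution itself does, not what Rewrite.pm goes on to do with the result. split_eprint_uri is a hypothetical helper: it applies the pattern above to a copy of the URI and returns the leading zeros, the numeric ID, and whatever is left of the path once the match has been removed.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the substitution quoted above and report
# the captures plus the remainder of the URI. Returns nothing if the
# URI does not start with a non-zero number.
sub split_eprint_uri
{
	my( $uri, $urlpath ) = @_;
	if( $uri =~ s! ^$urlpath/(0*)([1-9][0-9]*)\b !!x )
	{
		return { zeros => $1, eprintid => $2, rest => $uri };
	}
	return;
}
```

With an empty $urlpath, "/123" yields eprintid '123' with nothing left over, and "/00123/1/paper.pdf" yields zeros '00', eprintid '123' and a remainder of "/1/paper.pdf". Neither "/0" nor "/123abc" matches: the first has no non-zero digit, and in the second the \b word-boundary check fails.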

TODO (some should be separate pages)

  • Explain flow of Rewrite
    • triggers
    • cgi
    • content negotiation
    • CRUD
    • ???
  • permit on DataObj
    • can_request_view / can_user_view
  • summary pages (content neg/ URL rewrite)
  • DOI - 5 metadata elements
  • 40x handling
Access Control Layer