Difference between revisions of "Anatomy of a request"


Revision as of 10:03, 24 October 2014

THIS PAGE IS UNDER CONSTRUCTION! JLRS 2014-10-23

This is a description of how EPrints and Apache handle an incoming request. Understanding this flow makes it easier to see how an Access Control layer can be added to the system.

I will assume that you know how to locate a Perl module file from the module name (e.g. EPrints::Apache::Rewrite will probably be ~/perl_lib/EPrints/Apache/Rewrite.pm, although this is not always the case!).

Flow of a request

Below are the relevant parts of the config files and Perl modules that are used when processing a request. The request will generally be dealt with by the EPrints::Apache::Rewrite module, and farmed out from there. How the request reaches this module is also explained below.

Apache core config ~/cfg/apache.conf

PerlSwitches -I/home/eprints/eprints-3.3.12/perl_lib
PerlModule EPrints
PerlPostConfigHandler +EPrints::post_config_handler

The post_config_handler does some sanity checks on the EPrints setup (e.g. is Apache listening on the ports that the repositories are configured to work under?). For more information about the post_config_handler, see: http://perl.apache.org/docs/2.0/user/handlers/server.html#C_PerlPostConfigHandler_

Apache repository config ~/cfg/apache/ARCHIVEID.conf

<VirtualHost *:80>
...
  PerlTransHandler +EPrints::Apache::Rewrite

</VirtualHost>

This leads us to the backbone of EPrints - the Rewrite module - where URL_REWRITE_* triggers are called, content negotiation can happen, and many other wondrous things take place!

EPrints::Apache::Rewrite module

This explanation and its line numbers are taken from a specific version of the file: https://github.com/eprints/eprints/blob/88a36fcf1f17c7a04e60455d374b617709f7461d/perl_lib/EPrints/Apache/Rewrite.pm This file will change over time, so it's worth comparing it with the version you are using, and possibly with other versions on GitHub, e.g. https://github.com/eprints/eprints/blob/3.3/perl_lib/EPrints/Apache/Rewrite.pm

It's worth taking a few minutes to look at this file - specifically the sub handler.

EP_TRIGGER_URL_REWRITE

Line 123 calls the 'EP_TRIGGER_URL_REWRITE' trigger:

$repository->run_trigger( EPrints::Const::EP_TRIGGER_URL_REWRITE,
	request => $r,
	   lang => $lang,    # en
	   args => $args,    # "" or "?foo=bar"
	urlpath => $urlpath, # "" or "/subdir"
	cgipath => $cgipath, # /cgi or /subdir/cgi
	    uri => $uri,     # /foo/bar
	 secure => $secure,  # boolean
    return_code => \$rc,     # set to trigger a return
);

If any of the EP_TRIGGER_URL_REWRITE triggers sets return_code, that value is returned by the handler. See also: Information on triggers
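The return_code parameter is passed as a reference, which is what lets a trigger hand a value back to the caller. The following is a minimal, self-contained sketch of that mechanism only. It does not use EPrints: the @url_rewrite_triggers array, the run_url_rewrite_triggers sub, the '/private/' path and the 403 status are all hypothetical stand-ins chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the repository's trigger list; in EPrints the
# callbacks are registered through configuration, not a plain array.
my @url_rewrite_triggers = (
    sub {
        my (%params) = @_;
        # Leaves return_code untouched, so processing carries on.
        return;
    },
    sub {
        my (%params) = @_;
        # Claims the request by writing through the return_code reference.
        if ( $params{uri} =~ m!^/private/! ) {
            ${ $params{return_code} } = 403;    # illustrative status code
        }
        return;
    },
);

# Simplified dispatcher: run every trigger, then report what (if anything)
# was written to the shared return code.
sub run_url_rewrite_triggers {
    my (%params) = @_;
    my $rc;
    $_->( %params, return_code => \$rc ) for @url_rewrite_triggers;
    return $rc;    # undef means no trigger asked for an early return
}

my $rc = run_url_rewrite_triggers( uri => "/private/report.pdf" );
print defined $rc ? "return early with $rc\n" : "carry on\n";
```

Because each callback receives the same reference, the caller only has to check whether its own $rc is defined after the triggers have run.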

CGI scripts

Line 157 deals with CGI scripts - redirecting to HTTPS if necessary. It looks for the CGI script in three locations (lines 195-199):

  • ~/archives/ARCHIVEID/cgi/
  • ~/site_lib/cgi/
  • ~/cgi/
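A lookup across several directories can be sketched as below. This is not the code from Rewrite.pm: it assumes the locations are tried in the order listed above and that the first directory containing the script wins. The find_cgi_script sub, the temporary directory tree and the 'search' script name are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Return the path in the first candidate directory that holds the script.
sub find_cgi_script {
    my ( $script, @dirs ) = @_;
    for my $dir (@dirs) {
        my $path = "$dir/$script";
        return $path if -e $path;
    }
    return;    # not found in any location
}

# Throwaway directory tree standing in for an EPrints install.
my $base = tempdir( CLEANUP => 1 );
my @dirs = map { "$base/$_" } ( "archives/ARCHIVEID/cgi", "site_lib/cgi", "cgi" );
make_path(@dirs);

# Put the same (hypothetical) script in two of the three locations.
for my $dir ( @dirs[ 1, 2 ] ) {
    open my $fh, '>', "$dir/search" or die "cannot create $dir/search: $!";
    close $fh;
}

my $found = find_cgi_script( "search", @dirs );
print "serving $found\n";    # the site_lib copy is found before the core one
```

Under the first-match assumption, a script placed in a more specific location shadows the copy shipped in ~/cgi/ without that copy being edited.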

If the CGI script is a 'user' script, it also defines a PerlAccessHandler (lines 214-220):

if( $uri =~ m! ^/users\b !x )
{
	$r->push_handlers(PerlAccessHandler => [
		\&EPrints::Apache::Auth::authen,
		\&EPrints::Apache::Auth::authz
	] );
}

SWORD servicedocument

Lines 233-258 deal with the SWORD service document, via the CRUD interface.

REST interface

Lines 281-290 handle the REST interface, via EPrints::Apache::REST.

EPrints URIs

Lines 292-374: EPrints URIs are normally of the form http://repository.blah/id/.... There are three main if blocks in this section that use regexes to match the URI:

  • Line 293 $uri =~ m! ^$urlpath/id/(repository|dump)$ !x matches two cases: '/id/repository' and '/id/dump'.
  • Line 318 $uri =~ m! ^$urlpath/id/([^\/]+)/(ext-.*)$ !x matches one path segment followed by a second segment beginning 'ext-' (e.g. 'event/ext-foo'). What these requests are used for is not yet documented here; possibly something RDF-related.
  • Lines 345-347 (shown on one line here) $uri =~ s! ^$urlpath/id/(?: contents | ([^/]+)(?:/([^/]+)(?:/([^/]+))?)? )$ !!x matches a '/'-separated dataset, dataobjid and field - or 'contents'. The request is passed to the CRUD handler, and uses its authen/authz:
$r->push_handlers(PerlAccessHandler => [
	sub { $crud->authen },
	sub { $crud->authz },
] );
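To see what the third pattern captures, here is a small standalone sketch. It applies the same regex with m// rather than s/// so the URI is left unchanged. The parse_id_uri sub, the hash keys and the sample URIs are hypothetical, and $urlpath is assumed to be empty (repository served from the site root).

```perl
use strict;
use warnings;

my $urlpath = "";    # assumption: repository served from the site root

# Split an /id/ URI into its optional parts using the pattern from
# lines 345-347; returns nothing when the URI does not match.
sub parse_id_uri {
    my ($uri) = @_;
    return
        unless $uri =~ m! ^$urlpath/id/(?: contents | ([^/]+)(?:/([^/]+)(?:/([^/]+))?)? )$ !x;
    return { dataset => $1, dataobjid => $2, field => $3 };
}

for my $uri ( "/id/eprint/123/title", "/id/eprint/123", "/id/contents", "/view/year" ) {
    my $parts = parse_id_uri($uri);
    if ( !$parts ) {
        print "$uri: no match\n";
        next;
    }
    # Groups that did not take part in the match are undef.
    printf "%s: dataset=%s dataobjid=%s field=%s\n", $uri,
        map { defined $_ ? $_ : "(none)" } @{$parts}{qw(dataset dataobjid field)};
}
```

Note that '/id/contents' matches through the first alternative, so none of the three capture groups is set for it.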

EPrint IDs, Documents

Under construction!

Lines 377-493: This block of code looks for requests starting with e.g. http://repository.blah/123 - where '123' is an EPrintID. There are some redirects in this block to account for older URLs that may still be requested, in which EPrintIDs and/or document positions were zero-padded, e.g. http://repository.blah/00000123 or http://repository.blah/00000123/01/Document.txt.

Each subsequent match on the $uri consumes part of it - e.g.

Line 377

$uri =~ s! ^$urlpath/(0*)([1-9][0-9]*)\b !!x

will remove the EPrintID from the start of $uri.
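The effect of the line 377 substitution can be checked in isolation. In this sketch the strip_eprintid sub and the sample URIs are hypothetical, and $urlpath is assumed to be empty. The first capture group holds any leading zeroes, which is presumably what the redirect logic for zero-padded URLs inspects (an assumption, not something checked against the source here).

```perl
use strict;
use warnings;

my $urlpath = "";    # assumption: repository served from the site root

# Apply the line 377 substitution to a copy of the URI and return the
# zero padding, the EPrintID and whatever is left of the URI.
sub strip_eprintid {
    my ($uri) = @_;
    return unless $uri =~ s! ^$urlpath/(0*)([1-9][0-9]*)\b !!x;
    return ( $1, $2, $uri );
}

for my $original ( "/123/45/Document.txt", "/00000123/01/Document.txt", "/view/year/" ) {
    my ( $padding, $eprintid, $rest ) = strip_eprintid($original);
    if ( !defined $eprintid ) {
        print "$original: not an EPrintID request\n";
        next;
    }
    printf "%s: eprintid=%s padded=%s rest=%s\n",
        $original, $eprintid, ( length $padding ? "yes" : "no" ), $rest;
}
```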

Line 398

$uri =~ s! ^/(0*)([1-9][0-9]*)\b !!x

will match elements after the EPrintID in the original URL - matching '45' in

  • http://repository.blah/123/45/Document.txt or
  • http://repository.blah/123/45.hassmallThumbnailVersion/Document.txt

(the second example shows the use of a 'relationship' that is processed using an EP_TRIGGER_DOC_URL_REWRITE trigger).
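The second substitution works on whatever the first one left behind. A minimal sketch, assuming the EPrintID has already been stripped from the URI; the strip_position sub is a hypothetical helper.

```perl
use strict;
use warnings;

# Apply the line 398 substitution to a URI whose EPrintID has already been
# removed; returns the document position and the remaining URI.
sub strip_position {
    my ($uri) = @_;
    return unless $uri =~ s! ^/(0*)([1-9][0-9]*)\b !!x;
    return ( $2, $uri );
}

for my $remaining (
    "/45/Document.txt",
    "/45.hassmallThumbnailVersion/Document.txt",
    "/Document.txt",
    )
{
    my ( $pos, $rest ) = strip_position($remaining);
    if ( defined $pos ) {
        print "$remaining: position=$pos rest=$rest\n";
    }
    else {
        print "$remaining: no document position\n";
    }
}
```

The \b in the pattern stops the match at the '.' as well as at the '/', so any relationship suffix is left at the front of $uri for the next step.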

Lines 418-419 may be a bit confusing at first glance.

$uri =~ s! ^([^/]*)/ !!x;
my @relations = grep { length($_) } split /\./, $1;

They deal with document relationships, which are of the form .../DocID.relationship1.relationship2.relationshipN/.... For documents, thumbnails are presented as related documents; the relationship is e.g. 'hassmallThumbnailVersion'.

The first line captures everything from the start of $uri (the DocID having already been removed by line 398) up to the next '/', and removes it, together with that '/', from $uri.

The second line takes the captured match ($1), e.g. .relationship1.relationship2.relationshipN, splits it on '.', and discards empty strings (such as the one before the leading '.'), leaving the relationship names in @relations.
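The two lines can be tried on their own. In this sketch they are wrapped in a hypothetical split_relations sub, and the input is assumed to be a URI from which the EPrintID and document position have already been removed.

```perl
use strict;
use warnings;

# Run lines 418-419 on a URI that starts with the (possibly empty)
# relationship suffix; returns the relations and the remaining URI.
sub split_relations {
    my ($uri) = @_;
    return unless $uri =~ s! ^([^/]*)/ !!x;
    my @relations = grep { length($_) } split /\./, $1;
    return ( \@relations, $uri );
}

for my $remaining (
    ".hassmallThumbnailVersion/Document.txt",
    ".relationship1.relationship2/Document.txt",
    "/Document.txt",
    )
{
    my ( $relations, $rest ) = split_relations($remaining);
    printf "%s: %d relation(s) [%s] rest=%s\n",
        $remaining, scalar @$relations, join( ", ", @$relations ), $rest;
}
```

When there is no relationship suffix, $1 is the empty string and @relations ends up empty, so the same two lines cover plain document URLs too.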

TODO (some should be separate pages)

  • Explain flow of Rewrite
    • triggers
    • cgi
    • content negotiation
    • CRUD
    • ???
  • permit on DataObj
    • can_request_view / can_user_view
  • summary pages (content neg/ URL rewrite)
  • DOI - 5 metadata elements
  • 40x handling
Access Control Layer