This is IRStats 1 documentation. IRStats 1 is now out of support. You may have been looking for IRStats2.
    sub initialise
    {
        my ($self) = @_;
        # We only need the count, so 'COUNT' is the only entry in the columns array.
        $self->{'sql_params'} = { columns => [ 'COUNT' ] };
        $self->{'visualisation'} = IRStats::Visualisation::HTML->new();
    }
new
The new function shouldn't ever need to be any different from this:
    sub new
    {
        my( $class, $params, $database ) = @_;
        my $self = $class->SUPER::new($params, $database);
        $self->initialise();
        return $self;
    }
populate
Almost every populate function should start by checking the cache.
    sub populate
    {
        my ($self) = @_;
        my $cache = IRStats::Cache->new($self->{'params'}->get('id'));
        if ($cache->exists)
        {
            $self->{'visualisation'} = $cache->read();
            return;
        }
Next, we have to retrieve the stats from the database:
        my $query = $self->{'database'}->get_stats(
            $self->{'params'},
            $self->{'sql_params'}
        );
Now we process them. In this case, we don't even need a loop as we know there's only going to be one row. We'll stick the result straight into some html, and save it. Don't forget that if there isn't any data, you still have to output something.
        my @row = $query->fetchrow_array();
        my $html = '<span class="irstats_view_fulltextcounthtml">' . ($row[1] ? $row[1] : '0') . "</span>";
A little housekeeping:
        $query->finish();
Pop the data into the visualisation:
        $self->{'visualisation'}->set('html',$html);
Finally, we should write to the cache so we don't have to query the database next time.
        $cache->write($self->{'visualisation'});
    }
And that's a really simple view.
Using Periods
If we wanted to break our date range into periods, we'd need to do something like this:
    my $periods = IRStats::Periods->new(
        $self->{'params'}->{'start_date'},
        $self->{'params'}->{'end_date'}
    );
    foreach my $period ( @{$periods->calandar_months()} )
    {
        # Temporarily narrow the date range to this period.
        $self->{'params'}->mask($period);
        my $query = $self->{'database'}->get_stats(
            $self->{'params'},
            $self->{'sql_params'}
        );
        $self->{'params'}->unmask();
        # process and put into variables
    }
Visualisation
An HTML table that is rendered in several columns.
Configuration Constants
- $default_number_of_rows - An int representing the maximum number of rows the table should have. This prevents sending huge tables to browsers that may not be able to handle them.
Overridden Functions
Visualisation::Graph
The graph objects all use Chart Director to generate graphs. The Graph object initialises the colours that the graph may use.
Every graph must be created with at least the filename:
- new({filename => string}) - The filename comes from the ID of the params object.
Configuration Constants
- $url_relative - This will have the filename added to the end and be put in the img html tag.
Sub Classes
Note that in the Visualisation/Graph/ directory, there is 'GraphLegend.pm'. This is used to create the html for the graph legends.
Visualisation::Graph::Bar.pm
A Bar Graph. It can have one or more bars in each division of the x axis.
To implement:
- set('title', string) - The title that will be in the graph image.
- set('x_title', string) - The title of the x axis.
- set('y_title', string) - The title of the y axis.
- set('x_labels', array_ref) - An array containing the labels for the x axis.
- set('data_series', array_ref) - An array of arrayrefs, referencing data for each set of bars.
Visualisation::Graph::Line.pm
A Line Graph. There can be many lines on it.
To implement:
- set('title', string) - The title that will be in the graph image.
- set('x_title', string) - The title of the x axis.
- set('y_title', string) - The title of the y axis.
- set('x_labels', array_ref) - An array containing the labels for the x axis.
- set('data_series', array_ref) - An array of arrayrefs, referencing data for each line.
Visualisation::Graph::Pie.pm
A Pie Graph.
To implement:
- set('title', string) - The title that will be in the graph image.
- set('data_series', array_ref) - An array of hashrefs, {data => int, citation => string}, one for each slice.
Latest revision as of 16:43, 8 August 2019
Directory Structure
/opt/irstats/bin
Contains the scripts needed to update the table.
- daily_update.sh - Runs all the scripts in the right order.
- extract_metadata_from_archive.pl - Extracts eprint, author and group metadata from the repository by iterating over every eprint.
- update_table.pl - Filters and processes new entries in the accesslog to update the irstats_true_accesses_table. Uses 'SearchParser.pm' and 'repeatscache'.
- convert_ip_to_host.pl - Attempts to convert IP addresses of the new entries in irstats_true_accesses_table to hostnames. Uses 'host_updated' to keep track of where it got to last time.
Note that most of these scripts probably need to be tidied up. They were written in a hurry and were never polished.
/opt/irstats/cache
Contains cache files. These should probably be deleted whenever the database is updated.
/opt/irstats/cgi
Contains two scripts, 'get_view' and 'stats'.
- get_view returns the output of an IRStats::View (see below), which is currently a chunk of HTML or CSV, but could be almost anything.
- stats is a handy CGI form that passes arguments to get_view.
/opt/irstats/img
Conceptually, where any images would be kept (e.g. national flags). At the moment, only the img/graphs directory is used. This is where generated graphs are stored.
/opt/irstats/cfg
Where the configuration file and the text files containing repository data are held.
The Configuration File
irstats.cfg contains a number of configuration strings. Here are some of the more important ones, with the default in brackets:
- configuration_path (/opt/irstats/cfg/) - The path of the configuration directory.
- view_path (/opt/irstats/perl_lib/IRStats/View/) - The directory containing the Views.
- cache_path (/opt/irstats/cache/) - The directory in which to store cache files.
- graph_path (/opt/irstats/img/graphs/) - The directory in which to store graph images.
- graph_relative_url_path (/img/graphs/) - The URL of the graph directory as seen by the web browser.
- update_lock_filename (/opt/irstats/bin/.lock) - The name of the file that is created to prevent the update process from running twice concurrently.
- The names of the files used to store set information:
  - set_member_full_citations_file (/opt/irstats/cfg/irstats_set_member_full_citations.txt)
  - set_member_short_citations_file (/opt/irstats/cfg/irstats_set_member_short_citations.txt)
  - set_membership_file (/opt/irstats/cfg/irstats_set_membership.txt)
  - set_member_codes_file (/opt/irstats/cfg/irstats_set_member_codes.txt)
  - set_member_urls_file (/opt/irstats/cfg/irstats_set_member_urls.txt)
- Referrer Scope Labels (note: if you change these, you should also change them in the database):
  - referrer_scope_1 (Internal)
  - referrer_scope_2 (ECS)
  - referrer_scope_3 (Search)
  - referrer_scope_4 (External)
  - referrer_scope_no_referrer (None)
- awstats_search_engines (/usr/local/awstats/wwwroot/cgi-bin/lib/search_engines.pm) - The path to the awstats search engine module
- repeats_filter_file (/opt/irstats/bin/repeatscache) - The file to maintain state between updates
- repeats_filter_timeout (86400) - repeat timeout in seconds (the amount of time there needs to be between two hits for them both to be recorded, initially set to 60*60*24)
- repository_url (http://eprints.ecs.soton.ac.uk) - The URL of the repository.
- Database configuration:
  - database_driver (mysql)
  - database_server (localhost)
  - database_name
  - database_user
  - database_password
- database_id_columns ([ requester_organisation, requester_host, referrer_scope, search_engine, search_terms, referring_entity_id ]) - The columns in the database that hold a UID rather than data. These need separate tables in which to store the data.
- Various table names and parts of names:
  - database_eprints_access_log_table (accesslog) - Perhaps remove after the update rewrite.
  - database_main_stats_table (irstats_true_accesses_table)
  - database_column_table_prefix (irstats_column_)
  - database_set_table_prefix (irstats_set_)
  - database_set_table_code_suffix (_code)
  - database_set_table_citation_suffix (_citation)
- id_parameters ([ start_date, end_date, eprints, view ]) - the parameters that are used to uniquely identify a view
- host_lookup_temp_dir (/opt/irstats/bin/convert_hosts_temp_files/) - The directory in which to store temp files for host lookups
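The repeats_filter_timeout setting can be illustrated with a small sketch. This is not the code in update_table.pl. The helper name should_record, the "ip|eprint" key format and the choice to measure the window from the last recorded hit are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: decides whether a hit should be recorded.
# $last_seen maps a key (assumed here to be "ip|eprint") to the
# epoch time of the last hit that was recorded for that key.
sub should_record
{
    my( $last_seen, $key, $time, $timeout ) = @_;
    my $previous = $last_seen->{$key};
    if( defined $previous && $time - $previous < $timeout )
    {
        return 0; # repeat inside the timeout window, so skip it
    }
    $last_seen->{$key} = $time;
    return 1;
}

my %last_seen;
my $timeout = 60 * 60 * 24; # the repeats_filter_timeout default (86400)

print should_record( \%last_seen, '10.0.0.1|12614', 1000, $timeout ), "\n";            # first hit
print should_record( \%last_seen, '10.0.0.1|12614', 5000, $timeout ), "\n";            # repeat
print should_record( \%last_seen, '10.0.0.1|12614', 1000 + $timeout, $timeout ), "\n"; # a day later
```

In a real update run the %last_seen state would have to survive between runs, which is presumably the role of the repeats_filter_file.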
/opt/irstats/perl_lib
Contains all the irstats classes.
IRStats Classes
Note that the leading IRStats:: has been left out for brevity.
Configuration
This object acts as an interface to the configuration file.
Configuration Constants
- $configuration_file - The path to the configuration file.
Functions
- new - Parses the configuration file and returns a new object.
- get_value(config_id) - Returns a value.
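As a rough illustration of the new/get_value interface, here is a minimal sketch. The format of irstats.cfg is not described here, so the "key = value" line format and '#' comments are assumptions. The package name Sketch::Configuration is hypothetical. Unlike the real class, which reads the path from $configuration_file, this sketch takes an open filehandle so that it can run without a file on disk.

```perl
package Sketch::Configuration;
use strict;
use warnings;

# Assumption: one "key = value" pair per line; blank lines and
# lines starting with '#' are ignored.
sub new
{
    my( $class, $fh ) = @_;
    my %values;
    while( my $line = <$fh> )
    {
        chomp $line;
        next if $line =~ /^\s*(?:#.*)?$/;
        if( $line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/ )
        {
            $values{$1} = $2;
        }
    }
    return bless { values => \%values }, $class;
}

# Returns the value for a configuration id, or undef if it is not set.
sub get_value
{
    my( $self, $config_id ) = @_;
    return $self->{values}{$config_id};
}

package main;

my $text = "# sample settings\ncache_path = /opt/irstats/cache/\nrepeats_filter_timeout = 86400\n";
open( my $fh, '<', \$text ) or die "cannot open in-memory file: $!";
my $conf = Sketch::Configuration->new( $fh );
close $fh;

print $conf->get_value( 'cache_path' ), "\n";
```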
Params
This object holds the parameters that are used to generate the statistics. This is passed around the system.
Configuration Constants
- $defaults - Any default parameters you wish to set.
Functions
- new(Configuration, [ CGI_object | params_hash ]) - returns new object
- mask(params_hash) - used when you want to temporarily overwrite parameter(s). Overwrites values with contents of params_hash. Overwritten values get pushed onto a stack.
- unmask - Sets parameters back to how they were before the last mask.
- get(param_name) - returns the value of a single parameter.
- create_id - Uses MD5 to create a unique ID from the id_parameters (see The Configuration File above). This is called whenever get('id') is called.
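The mask/unmask stack and the MD5-based ID can be sketched as follows. This is an illustration, not the IRStats::Params source: the package name Sketch::Params is hypothetical, parameters are plain strings rather than Date objects, the constructor takes a plain list instead of a Configuration and CGI object, and joining the id parameters with '|' before hashing is an assumption.

```perl
package Sketch::Params;
use strict;
use warnings;
use Digest::MD5 qw( md5_hex );

# Mirrors the id_parameters configuration default.
my @id_parameters = qw( start_date end_date eprints view );

sub new
{
    my( $class, %params ) = @_;
    return bless { params => { %params }, stack => [] }, $class;
}

sub get
{
    my( $self, $name ) = @_;
    return $self->create_id() if $name eq 'id';
    return $self->{params}{$name};
}

# Overwrite some parameters, remembering the old values on a stack.
sub mask
{
    my( $self, $overrides ) = @_;
    my %saved;
    foreach my $name ( keys %$overrides )
    {
        $saved{$name} = $self->{params}{$name};
        $self->{params}{$name} = $overrides->{$name};
    }
    push @{ $self->{stack} }, \%saved;
    return;
}

# Restore the values that the most recent mask replaced.
sub unmask
{
    my( $self ) = @_;
    my $saved = pop @{ $self->{stack} } or return;
    $self->{params}{$_} = $saved->{$_} foreach keys %$saved;
    return;
}

sub create_id
{
    my( $self ) = @_;
    my @values = map { defined $self->{params}{$_} ? $self->{params}{$_} : '' } @id_parameters;
    return md5_hex( join '|', @values );
}

package main;

my $params = Sketch::Params->new(
    start_date => '20060101', end_date => '20061231',
    eprints    => 'eprint_12614', view => 'DownloadCountHTML',
);
my $full_id = $params->get( 'id' );
$params->mask( { start_date => '20060301', end_date => '20060331' } );
my $march_id = $params->get( 'id' );
$params->unmask();

print "ids differ while masked\n"  if $march_id ne $full_id;
print "id restored after unmask\n" if $params->get( 'id' ) eq $full_id;
```

Because the ID is recomputed on every get('id'), a masked params object naturally maps to a different cache entry than the unmasked one.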
DatabaseInterface
This object does what it says on the tin. Any access to the database is done through it.
Functions
- new(Configuration) - returns object.
- retreive_set_names() - returns a list of eprint sets. This can be used to verify cgi input.
- get_membership(eprint_id, set_name) - For a given eprint ID, returns the members of the named set that it belongs to. For example, we can find out which authors eprint 12614 has with get_membership(12614, 'author').
- get_citation(id, set, length) - Returns a citation. Every set member (eprint, author, group) has two citations, short and full. The short citation is only returned if length == 'short'. So, to get the short citation of group 3: get_citation(3,'group','short').
- get_stats(params_object, query_params_hash) - Returns a DBI object containing the stats we are interested in, i.e. those within the params_object's date range and eprint sets, and only the columns named in the query params hash. The query params hash can contain the following key/value pairs:
  - columns => column_name_array - Which columns are we interested in?
  - order => order_hash - A hash containing a column name and a direction (ASC or DESC).
  - limit => int - How many results to return.
  - group_by => column_name - If we need to group by a column.
  - where => where_hash_array - If additional logic needs to be applied, this array contains hashes containing a column name, an operator and a value. These are ANDed together.
- check_tables() - If any IRStats tables are missing, this function will create them.
- insert_main_table_row(column_array) - inserts the values in the array into the main table (taking into account any tables that contain only IDs).
- do_sql(sql_query_string) - takes a string and performs a query, returning the dbi object containing the results. This is the only point where sql is sent to the database.
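To make the shape of the query params hash more concrete, the sketch below turns one into a SELECT statement. It is not the DatabaseInterface implementation: build_stats_sql is a hypothetical helper, the hash key names inside the order and where entries (column, operator, value, direction) are assumptions, the 'eprint' column name is invented for the example, and the date range and set filtering that get_stats derives from the params object is left out. Values are returned as bind parameters rather than embedded in the string, which is a design choice of this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: builds SQL and bind values from a query params hash.
sub build_stats_sql
{
    my( $table, $query ) = @_;

    my $sql = 'SELECT ' . join( ', ', @{ $query->{columns} } ) . " FROM $table";

    # Each where entry becomes "column operator ?"; entries are ANDed together.
    my( @conditions, @binds );
    foreach my $clause ( @{ $query->{where} || [] } )
    {
        push @conditions, "$clause->{column} $clause->{operator} ?";
        push @binds, $clause->{value};
    }
    $sql .= ' WHERE ' . join( ' AND ', @conditions ) if @conditions;

    $sql .= " GROUP BY $query->{group_by}" if $query->{group_by};
    if( my $order = $query->{order} )
    {
        $sql .= " ORDER BY $order->{column} $order->{direction}";
    }
    $sql .= ' LIMIT ' . int( $query->{limit} ) if $query->{limit};

    return( $sql, @binds );
}

my( $sql, @binds ) = build_stats_sql( 'irstats_true_accesses_table', {
    columns  => [ 'referrer_scope', 'COUNT(*)' ],
    where    => [ { column => 'eprint', operator => '=', value => 12614 } ],
    group_by => 'referrer_scope',
    order    => { column => 'referrer_scope', direction => 'ASC' },
    limit    => 10,
} );

print "$sql\n";
print "bind values: @binds\n";
```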
Date
A date object was implemented because there were some specific things that needed to be done with dates.
Functions
- new(date_hash) - Creates a new date object when passed a hash with the keys 'day', 'month' and 'year'.
- validate() - If the date is not valid, it will be modified to a sensible value, e.g. Feb 30th will be modified to Feb 29th or 28th, depending on whether it's a leap year.
- set(part_name, int) - Sets part of the date ('year','month' or 'day') to a specific value.
- decrement(period) - Decrements the date by a period ('day', 'week', 'month', 'quarter', 'year'). Calls the mod_date function, which does the muscle work.
- increment(period) - Increments the date by a period by calling mod_date.
- part(part_name, style) - Returns the day, month or year. For month, if style=='text', returns a three letter string, otherwise returns an integer. For year, if style=='short', returns the last two digits, otherwise returns all four.
- difference(date_object) - returns the difference in days between itself and another date.
- less_than(date_object) - compares itself to another date object. Returns 1 if it's less than it, otherwise returns 0.
- greater_than(date_object) - compares itself to another date object. Returns 1 if it's greater than it, otherwise returns 0.
- month_name() - returns the three letter string of the month.
- render(format_string) - returns a date string. Format can be:
- 'short' - Calls render_abbreviated - returns a date like this: 05-Jul-77
- 'numerical' (default) - Calls render_numerical - returns a date like this: 19770705
- clone - returns a new, identical date object.
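The clamping behaviour described for validate() can be sketched like this. The code works on a plain hash with 'day', 'month' and 'year' keys rather than an IRStats::Date object, the helper names are hypothetical, and clamping out-of-range months as well as days is an assumption (the text above only gives the Feb 30th example).

```perl
use strict;
use warnings;

sub is_leap_year
{
    my( $year ) = @_;
    return 1 if $year % 400 == 0;
    return 0 if $year % 100 == 0;
    return $year % 4 == 0 ? 1 : 0;
}

sub days_in_month
{
    my( $year, $month ) = @_;
    my @days = ( 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 );
    return 29 if $month == 2 && is_leap_year( $year );
    return $days[ $month - 1 ];
}

# Clamp month and day into their valid ranges, modifying the hash in place.
sub validate
{
    my( $date ) = @_;
    $date->{month} = 1  if $date->{month} < 1;
    $date->{month} = 12 if $date->{month} > 12;
    my $last = days_in_month( $date->{year}, $date->{month} );
    $date->{day} = 1     if $date->{day} < 1;
    $date->{day} = $last if $date->{day} > $last;
    return $date;
}

my $date = validate( { year => 2008, month => 2, day => 30 } );
printf "%04d%02d%02d\n", @{$date}{ qw( year month day ) }; # prints 20080229
```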
Cache
The interface to the cache.
Functions
- new(id) - takes the ID of the params object we're using at the moment.
- exists() - returns true if there's a cached file, false if there isn't one.
- write(visualisation_object) - writes the data to the cache file.
- read() - returns the data from the cache.
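A file-backed cache with this interface could look roughly like the sketch below. It is not the IRStats::Cache source: the package name Sketch::Cache is hypothetical, using Storable for serialisation is an assumption, and the constructor takes the cache directory as an extra argument (the real class presumably finds it through the cache_path setting). A plain hash stands in for the visualisation object.

```perl
package Sketch::Cache;
use strict;
use warnings;
use Storable qw( store retrieve );
use File::Spec;

# One cache file per params ID inside the cache directory.
sub new
{
    my( $class, $cache_path, $id ) = @_;
    return bless { file => File::Spec->catfile( $cache_path, $id ) }, $class;
}

sub exists
{
    my( $self ) = @_;
    return -e $self->{file} ? 1 : 0;
}

sub write
{
    my( $self, $visualisation ) = @_;
    store( $visualisation, $self->{file} );
    return;
}

sub read
{
    my( $self ) = @_;
    return retrieve( $self->{file} );
}

package main;
use File::Temp qw( tempdir );

my $dir   = tempdir( CLEANUP => 1 );
my $cache = Sketch::Cache->new( $dir, 'd41d8cd98f00b204e9800998ecf8427e' );

print "cached before write: ", $cache->exists(), "\n";
$cache->write( { html => '<span>42</span>' } );
print "cached after write: ", $cache->exists(), "\n";
print $cache->read()->{html}, "\n";
```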
Periods
The Periods object is used when you want to break a daterange down into sub-ranges. Used with the params->mask() function, stats can be retrieved for periods inside a date range.
Functions
- new(start_date_obj, end_date_obj) - doesn't do anything, just returns the object.
The following functions all return an array of hashes. Each hash has the keys 'start_date' and 'end_date', and the values are both IRStats::Date objects.
- calandar_months - Returns full months (each element starts on the 1st, and ends on the last day).
- months - Returns month periods (if the start_date is the 15th, then each period starts on the 15th and ends on the 14th of the next month, except the last period, which ends on the end date and is therefore usually shorter).
- weeks - Returns 7-day periods (except the last, which ends on the end date and is therefore usually shorter than 7 days).
- days - returns single days (for each period, the start_date and end_date are the same).
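As a rough illustration of the returned structure, here is a standalone sketch of a weeks-style split. It is not the IRStats::Periods code: the function name weeks_sketch is hypothetical, dates are plain YYYYMMDD strings instead of IRStats::Date objects, and truncating the final period at the end date is an assumption based on the description above.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);
use POSIX qw(strftime);

my $DAY = 24 * 60 * 60;    # seconds in a day (UTC, so no DST shifts)

# Convert YYYYMMDD to epoch seconds at midnight UTC.
sub to_epoch {
    my ($ymd) = @_;
    my ($y, $m, $d) = $ymd =~ /^(\d{4})(\d{2})(\d{2})$/
        or die "expected YYYYMMDD, got '$ymd'\n";
    return timegm(0, 0, 0, $d, $m - 1, $y);
}

# Convert epoch seconds back to YYYYMMDD.
sub to_ymd {
    my ($epoch) = @_;
    return strftime('%Y%m%d', gmtime($epoch));
}

# Returns an arrayref of hashes with 'start_date' and 'end_date' keys.
# Every period covers 7 days except possibly the last, which stops at $end.
sub weeks_sketch {
    my ($start, $end) = @_;
    my ($from, $last) = (to_epoch($start), to_epoch($end));
    my @periods;
    while ($from <= $last) {
        my $to = $from + 6 * $DAY;
        $to = $last if $to > $last;
        push @periods, { start_date => to_ymd($from), end_date => to_ymd($to) };
        $from = $to + $DAY;
    }
    return \@periods;
}

foreach my $period (@{ weeks_sketch('20070101', '20070117') }) {
    print "$period->{start_date} to $period->{end_date}\n";
}
```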
UserInterface::Controls
This is used to generate the drop boxes in the stats cgi script.
Functions
- new(params_obj, database_interface_object) - returns the object.
- start_date_control() - returns the html for the three drop-boxes for selecting the year, month and day of the start date.
- end_date_control() - returns the html for the three drop-boxes for selecting the year, month and day of the end date.
- eprint_control() - returns the html for the eprints text box.
- drop_box(id, contents_array) - returns the html for a drop box containing what is in the array (each array element is a hash containing 'value' and 'display').
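The contents_array structure taken by drop_box can be illustrated with a small standalone sketch. The function names drop_box_sketch and escape_html are hypothetical, and the exact markup the real module emits may differ.

```perl
use strict;
use warnings;

# Escape the characters that are special in HTML text and attribute values.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build a <select> element from an arrayref of { value, display } hashes.
sub drop_box_sketch {
    my ($id, $contents) = @_;
    my $safe_id = escape_html($id);
    my $html = qq{<select name="$safe_id" id="$safe_id">\n};
    foreach my $item (@{$contents}) {
        $html .= sprintf(
            qq{  <option value="%s">%s</option>\n},
            escape_html($item->{value}),
            escape_html($item->{display}),
        );
    }
    return $html . "</select>\n";
}

print drop_box_sketch('start_month', [
    { value => 1, display => 'Jan' },
    { value => 2, display => 'Feb' },
]);
```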
View
A view processes the stats data filtered by the parameters and creates a visualisation. It is intended that savvy users create their own views.
Functions
All views inherit:
- new(params_obj, database_interface_object) - returns the object.
- render - calls populate, then returns whatever the visualisation renders
All views must implement:
- new - passes arguments to superclass, then calls 'initialise'.
- initialise - the Configuration Constants are set here.
- populate - retrieves the stats from the database, processes them and fills the visualisation. This is the engine that powers IRStats.
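The division of labour between the inherited and implemented functions can be sketched as follows. This is a self-contained illustration, not the IRStats source: the Sketch:: package names and the stub visualisation are hypothetical, and no database is involved.

```perl
use strict;
use warnings;

# Hypothetical base class: stores its arguments and drives the render cycle.
package Sketch::View;

sub new {
    my ($class, $params, $database) = @_;
    return bless { params => $params, database => $database }, $class;
}

# render calls populate, then returns whatever the visualisation renders.
sub render {
    my ($self) = @_;
    $self->populate();
    return $self->{visualisation}->render();
}

# Stub standing in for a real visualisation object.
package Sketch::StubVisualisation;

sub new    { return bless {}, shift }
sub render { return $_[0]{html} }

# A concrete view supplies new, initialise and populate.
package Sketch::View::Hello;

our @ISA = qw/ Sketch::View /;

sub new {
    my ($class, $params, $database) = @_;
    my $self = $class->SUPER::new($params, $database);
    $self->initialise();
    return $self;
}

sub initialise {
    my ($self) = @_;
    $self->{visualisation} = Sketch::StubVisualisation->new();
    return;
}

sub populate {
    my ($self) = @_;
    # A real view would query the database here.
    $self->{visualisation}{html} = 'hello';
    return;
}

package main;

my $view = Sketch::View::Hello->new({}, undef);
print $view->render(), "\n";
```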
View::DownloadCountHTML
The DownloadCountHTML is an extremely simple view. It retrieves one row from the database and does no processing, making it ideal for a quick walkthrough:
Housekeeping
At the top of the file, we need:
package IRStats::View::DownloadCountHTML;
use strict;
use warnings;
Next, the modules we will use. I've included perlchartdir, the graph-making package, even though we're not using it.
use IRStats::DatabaseInterface;
use IRStats::Cache;
use IRStats::Visualisation::HTML;
use IRStats::View;
use perlchartdir;
And link to superclass.
our @ISA = qw/ IRStats::View /;
Configuration Constants
We aren't actually interested in any columns, just in the count, but we put that in the columns array anyway. We also create our visualisation here.
sub initialise {
    my ($self) = @_;
    $self->{'sql_params'} = {columns => [ 'COUNT' ]};
    $self->{'visualisation'} = IRStats::Visualisation::HTML->new();
}
new
The new function shouldn't ever need to be any different from this:
sub new {
    my( $class, $params, $database ) = @_;
    my $self = $class->SUPER::new($params, $database);
    $self->initialise();
    return $self;
}
populate
Almost every populate function should start by checking the cache.
sub populate {
    my ($self) = @_;
    my $cache = IRStats::Cache->new($self->{'params'}->get('id'));
    if ($cache->exists) {
        $self->{'visualisation'} = $cache->read();
        return;
    }
Next, we have to retrieve the data from the database:
my $query = $self->{'database'}->get_stats( $self->{'params'}, $self->{'sql_params'} );
Now we process them. In this case, we don't even need a loop as we know there's only going to be one row. We'll stick the result straight into some html, and save it. Don't forget that if there isn't any data, you still have to output something.
my @row = $query->fetchrow_array();
my $html = '' . ($row[1] ? $row[1] : '0') . "";
A little housekeeping:
$query->finish();
Pop the data into the visualisation:
$self->{'visualisation'}->set('html',$html);
Finally, we should write to the cache so we don't have to query the database next time.
    $cache->write($self->{'visualisation'});
}
And that's a really simple view.
Using Periods
If we wanted to break our date range into periods, we'd need to do something like this:
my $periods = IRStats::Periods->new(
    $self->{'params'}->{'start_date'},
    $self->{'params'}->{'end_date'}
);
foreach my $period ( @{$periods->calandar_months()} ) {
    $self->{'params'}->mask($period);
    my $query = $self->{'database'}->get_stats( $self->{'params'}, $self->{'sql_params'} );
    $self->{'params'}->unmask();
    # process and put into variables
}
Visualisation
Currently Visualisations are Graph, Table or HTML. These are what the user will look at in the browser or download (in the case of CSV).
Functions
All visualisations inherit:
- new(data_hash) - a hash can optionally be passed containing the values that would otherwise be set using the 'set' function.
- set(param_name, value) - sets the named parameter to the given value. The parameter names each visualisation accepts are listed under its subclass.
All visualisations must implement:
- render() - returns what will be passed to the script.
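The inherited and implemented parts fit together roughly as in the following sketch. It is a minimal illustration with hypothetical Sketch:: package names, not the IRStats::Visualisation source.

```perl
use strict;
use warnings;

# Hypothetical base class providing new and set.
package Sketch::Visualisation;

sub new {
    my ($class, $data) = @_;
    my $self = bless {}, $class;
    # An optional hash of initial values is equivalent to calling set() per key.
    if ($data) {
        $self->set($_, $data->{$_}) for keys %{$data};
    }
    return $self;
}

sub set {
    my ($self, $name, $value) = @_;
    $self->{$name} = $value;
    return;
}

# Subclasses are expected to override render.
sub render {
    my ($self) = @_;
    die ref($self) . " must implement render()\n";
}

# Simplest possible subclass: returns the stored html string.
package Sketch::Visualisation::HTML;

our @ISA = qw/ Sketch::Visualisation /;

sub render {
    my ($self) = @_;
    return defined $self->{html} ? $self->{html} : '';
}

package main;

my $vis = Sketch::Visualisation::HTML->new({ html => '<p>42 downloads</p>' });
print $vis->render(), "\n";
```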
Visualisation::HTML
The simplest visualisation. Just a chunk of html.
To Populate:
- set('html', html_string) - takes the html as a string.
Visualisation::Table
The Visualisation::Table currently just passes the buck to its superclass.
There are currently three table Visualisations:
Visualisation::Table::CSV
Returns a CSV table.
To Populate:
- set('headings', headings_arrayref) - pass an array containing headings.
- set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.
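To show how the 'headings' and 'rows' structures map onto CSV output, here is a standalone sketch. The function names csv_field and render_csv_sketch are hypothetical, and the quoting rules (double quotes around fields containing commas, quotes or line breaks) are an assumption about reasonable CSV output rather than a description of the real module.

```perl
use strict;
use warnings;

# Quote a single field if it contains a comma, a double quote or a line break.
sub csv_field {
    my ($value) = @_;
    $value = '' unless defined $value;
    if ($value =~ /[",\r\n]/) {
        $value =~ s/"/""/g;
        $value = qq{"$value"};
    }
    return $value;
}

# $headings is an arrayref; $rows is an arrayref of arrayrefs, one per row.
sub render_csv_sketch {
    my ($headings, $rows) = @_;
    my $csv = '';
    foreach my $line ($headings, @{$rows}) {
        my @fields = map { csv_field($_) } @{$line};
        $csv .= join(',', @fields) . "\n";
    }
    return $csv;
}

print render_csv_sketch(
    [ 'Title', 'Downloads' ],
    [ [ 'A, B', 3 ], [ 'He said "hi"', 5 ] ],
);
```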
Visualisation::Table::HTML
A basic HTML table.
To Populate:
- set('columns', headings_arrayref) - pass an array containing column headings.
- set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.
And then optionally
- set('totals', totals_arrayref) - an array of totals to put at the bottom of the table.
Visualisation::Table::HTML_Columned
An HTML table that is rendered in several columns.
Configuration Constants
$default_number_of_rows - an int representing the maximum number of rows the table should have. This prevents sending huge tables to browsers that may not be able to handle them.
Overridden Functions
- new(data_hash, number_of_rows) - Both data_hash and number_of_rows are optional. Both can be set with 'set'.
To Populate:
- set('columns', headings_arrayref) - pass an array containing column headings.
- set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.
- set('number_of_rows', int) - set the maximum number of rows the table should have.
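One way to read the interaction between 'rows' and 'number_of_rows' is that the rows are cut into side-by-side groups of at most number_of_rows each. The sketch below shows that chunking step only. The function name split_into_columns is hypothetical, and whether the real module lays the groups out this way is an assumption.

```perl
use strict;
use warnings;

# Split an arrayref of rows into groups of at most $max_rows rows each.
# Returns an arrayref of arrayrefs, one per rendered column.
sub split_into_columns {
    my ($rows, $max_rows) = @_;
    die "max_rows must be a positive integer\n" unless $max_rows > 0;
    my @remaining = @{$rows};    # copy, so the caller's array is untouched
    my @columns;
    while (@remaining) {
        push @columns, [ splice(@remaining, 0, $max_rows) ];
    }
    return \@columns;
}

my @rows = map { [ "eprint $_", $_ * 10 ] } 1 .. 7;
my $columns = split_into_columns(\@rows, 3);
printf "%d columns, last has %d row(s)\n",
    scalar @{$columns}, scalar @{ $columns->[-1] };
```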
Visualisation::Graph
The graph objects all use Chart Director to generate graphs. The Graph object initialises the colours that the graph may use.
Every graph must be created with at least the filename:
- new({filename => string}) - the filename comes from the ID of the param object.
Configuration Constants
These are set in the 'new' function.
- $graph_dir - the path to the directory where the image file will be saved.
- $url_relative - the relative URL prefix. The filename is appended to it and the result is put in the img html tag.
Sub Classes
Note that in the Visualisation/Graph/ directory, there is 'GraphLegend.pm'. This is used to create the html for the graph legends.
Visualisation::Graph::Bar.pm
A Bar Graph. It can have one or more bars in each division of the x axis.
To implement:
- set('title',string) - The title that will be in the graph image.
- set('x_title',string) - The title of the x axis.
- set('y_title',string) - The title of the y axis.
- set('x_labels',array_ref) - an array containing the labels for the x axis
- set('data_series', array_ref) - an array of arrayrefs, referencing data for each set of bars
Visualisation::Graph::Line.pm
A Line Graph. It can have many lines on it.
To implement:
- set('title',string) - The title that will be in the graph image.
- set('x_title',string) - The title of the x axis.
- set('y_title',string) - The title of the y axis.
- set('x_labels',array_ref) - an array containing the labels for the x axis
- set('data_series', array_ref) - an array of arrayrefs, referencing data for each line
Visualisation::Graph::Pie.pm
A Pie Graph
To implement:
- set('title',string) - The title that will be in the graph image.
- set('data_series', array_ref) - an array of hashrefs, {data => int, citation => string}, one for each slice