Difference between revisions of "IRStats"

From EPrints Documentation
IRStats is a flexible statistics package which allows easy processing of accesses to fulltext and abstract pages of eprints.
+
[[Category:Obsolete]]
 +
<div style="border: 2px solid red; background-color: yellow;padding:10px">This is IRStats 1 documentation. IRStats 1 is now out of support. You may have been looking for [[IRStats2]]</div>
  
== Technical Overview ==
+
IRStats is a flexible statistics package which allows easy processing of accesses to fulltext documents of eprints. It can be downloaded from the [http://files.eprints.org/722/ Eprints File repository]. For more detailed information, please see the [[IRStats Technical Documentation]], though it is now somewhat out of date.
  
The following is a quick tour of IRStats.
+
== The front end ==
  
=== Parameters ===
+
===The Query Form===
  
IRStats output depends on four parameters, which need to be passed as CGI parameters if called through a web browser, or in a hash if called through the Perl API. These are:
+
The main interface to IRStats is found at the following URL (given a repository base URL of myrepository.ac.uk):
  
==== Start Date and End Date ====
+
<pre>
 +
myrepository.ac.uk/cgi/irstats.cgi
 +
</pre>
  
Date parameters are implemented as separate day, month and year parameters, so these two parameters are actually six (start_day, start_month, start_year, end_day, end_month, end_year).  Any statistics outside this date range are ignored.
+
You will be presented with a form allowing you to select the parameters with which to generate a report.
  
==== An Eprint Set ====
+
===Advanced Report Generation (get_view2 params)===
  
As well as defining a date range, we also have to tell IRStats which publications we are interested in.  Any publication not in the set will be ignored.  A set of eprints can either be a single eprint or any set of eprints the system administrator wishes to define in the config files.
+
The following will help if you wish to create queries by setting the CGI parameters by hand.
  
==== View ====
+
There are three fundamental parameters that IRStats uses.  These are:
 +
* A Date Range (actually 6 parameters for day, month and year for both start and end dates)
 +
* A Set of EPrints
 +
* A View
  
The final parameter tells IRStats how we want to process and display the statistics. This is done by selecting a View.
+
However, in order to add functionality, the get_view2 page will convert a larger number of parameters into these three. The following table shows all parameters and values, with square brackets denoting variables.
  
=== Views ===
 
  
Views are Perl modules which plug in to IRStats. They have been designed to be user configurable, though some knowledge of Perl is probably required. When a query is made to IRStats, a View is created. It generates some parameters for the DatabaseInterface object, which queries the database and passes back the results of the query. The View then iterates over the database rows and processes the stats in any way programmatically possible. These processed results are then passed to a Visualisation.
+
{| border="1"
 +
! Parameter
 +
! Possible Values
 +
! Notes
 +
|-
 +
| IRS_datechoice || period, range || Controls whether the 6 date range parameters or the single period parameter is used.
 +
|-
 +
| period || -[X]m, Q[Z][YYYY] || Used when IRS_datechoice=period.<br/>Where m and Q are literal characters, X is a positive integer, Z is an integer in the range 1 to 4 and YYYY is a four digit year.<br/>Examples: <dl><dt>-4m<dd>Go back exactly four months from today's date<dt>Q32004<dd>Quarter 3, 2004</dl>
 +
|-
 +
| start_day, start_month, start_year, end_day, end_month, end_year || integers (1-31, 1-12, four digit respectively) || Used when IRS_datechoice=range.<br/> Note that if a day value is higher than the highest day in the chosen month, it will be treated as the highest day -- e.g. start_day=31&start_month=02 is seen as valid and equivalent to February 28th. Note that start_day=99 is also valid!
 +
|-
 +
| IRS_epchoice || All, EPrint, [set_id] || Controls whether stats will be generated on all eprints, a single eprint, or a set of eprints.  The 'All' option is the only one that does not require extra parameters. Note that 'set_id' is the id of a valid set as defined in the IRStats configuration.
 +
|-
 +
| eprint || [eprintid] || Used when IRS_epchoice=EPrint.<br/>Any valid eprint ID (integer).
 +
|-
 +
| [set_id]s || [set_id]_[set_member_code] || Used when IRS_epchoice=[set_id].<br/> Best described through example: <dl><dt>IRS_epchoice=divisions&divisionss=divisions_art<dd>Will generate a report on the art department, given a standard EPrints repository and IRStats config, where the subject id 'art' exists in the divisions tree in EPrints.</dl>
 +
|-
 +
| view || [view classname] || The classname of the IRStats::View perl module.
 +
|}
  
=== Visualisations ===
+
===The Dashboard Form===
  
A Visualisation takes a set of processed statistics and outputs them. For example, Visualisation::Graph::Pie creates a pie chart.
+
A dashboard is a collection of reports on a single item or set of items (e.g. all items by John Smith). To access the form to generate a report, go to the URL:
  
=== The Database Interface ===
+
<pre>
 +
myrepository.ac.uk/cgi/irstats.cgi?page=db
 +
</pre>
  
The Database Interface object handles all queries to the database.  Most requests for statistics can be completed with a single call to the get_stats($params) method.
+
== The configuration file ==
  
=== Data Flow Diagram ===
+
Documentation to follow.
[[Image:irstats_overview.png]]
 
 
 
== Required Data ==
 
 
 
In order for IRStats to run, it requires two things:
 
 
 
* a database table containing all hits to the repository
 
* text files describing the contents of the repository
 
 
 
=== The Hits Table ===
 
 
 
Awaiting a redevelopment.
 
 
 
=== The Text Files ===
 
 
 
In order for IRStats to build up a picture of a repository, a number of text files need to be created and stored in the cfg/ directory:
 
 
 
* epstats_set_membership.txt
 
* epstats_set_member_codes.txt
 
* epstats_set_member_full_citations.txt
 
* epstats_set_member_short_citations.txt
 
* epstats_set_member_urls.txt
 
 
 
==== Explanation by Example ====
 
 
 
Imagine a very small repository.  Here are its contents:
 
 
 
* eprints
 
** (1) The Smells of Cheese
 
** (2) The Tastes of Wines
 
** (3) The Sounds of Oboes
 
* Authors
 
** (1) John Smith
 
** (2) Harriet Jones
 
 
 
If we then imagine that the following are also true:
 
 
 
* John Smith is credited with being an author of eprints (1) and (2)
 
* Harriet Jones is credited with being an author of eprints (2) and (3)
 
* All three eprints are the output of a research group named "Senses"
 
 
 
===== Creating sets =====
 
 
 
Sets are groups of eprints, and every eprint is a member of at least one set (the set containing only that eprint).  From the information above, we have three kinds of set: the eprint sets, the author sets and the research group set.  We need to add the following to epstats_set_membership.txt (the format is <id><tab><csv list of eprint ids>):
 
 
 
author_1        1,2
 
author_2        2,3
 
group_1        1,2,3
 
eprint_1        1
 
eprint_2        2
 
eprint_3        3
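As a rough illustration of the format described above, the following Python sketch reads membership lines of the form <id><tab><csv list of eprint ids> into a dictionary. It is not part of IRStats (which is written in Perl); the function names and the in-memory sample string are assumptions made for this example only.

```python
def parse_set_membership(text):
    """Map each set id to the list of eprint ids it contains.

    Assumes one entry per line, in the form: <id><tab><csv list of eprint ids>.
    """
    membership = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        set_id, csv_ids = line.split("\t", 1)
        membership[set_id] = [int(x) for x in csv_ids.split(",") if x.strip()]
    return membership


def sets_containing(membership, eprint_id):
    """Return the ids of every set that includes the given eprint."""
    return sorted(s for s, ids in membership.items() if eprint_id in ids)


# Sample data mirroring the small repository in this example.
sample = (
    "author_1\t1,2\n"
    "author_2\t2,3\n"
    "group_1\t1,2,3\n"
    "eprint_1\t1\n"
    "eprint_2\t2\n"
    "eprint_3\t3\n"
)

membership = parse_set_membership(sample)
print(membership["author_1"])
print(sets_containing(membership, 2))
```

Looking up eprint 2 this way returns both authors, the research group and the eprint's own single-member set, which matches the statement that every eprint belongs to at least one set.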
 
 
 
===== Giving Sets IDs =====
 
 
 
So, we now have some sets, but we need to give them unique IDs so that we can retrieve stats for these sets.  To do this, we add the following to epstats_set_member_codes.txt:
 
 
 
author_1        js
 
author_2        hj
 
group_1        senses
 
eprint_1        1
 
eprint_2        2
 
eprint_3        3
 
 
 
IRStats now assigns the following unique IDs to each set: author_js, author_hj, group_senses, eprint_1, eprint_2, eprint_3.  Note that the IDs should probably be kept alphanumeric, and must be unique within a class of sets (but you can have author_hj, group_hj and eprint_hj).
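The mapping from member codes to unique IDs can be sketched in Python as follows. The rule used here (take the part of the member name before the final underscore as the set class, then append the code) is inferred from the example IDs above rather than taken from the IRStats source, and the function names are hypothetical.

```python
def parse_member_codes(text):
    """Read lines of the form <member><tab><code> into a dictionary."""
    codes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        member, code = line.split("\t", 1)
        codes[member] = code.strip()
    return codes


def unique_ids(codes):
    """Build <class>_<code> IDs, rejecting duplicates within a class.

    Inferred rule: the class is everything before the final underscore
    of the member name (author_1 -> author).
    """
    ids = {}
    seen = set()
    for member, code in codes.items():
        set_class = member.rsplit("_", 1)[0]
        unique_id = f"{set_class}_{code}"
        if unique_id in seen:
            raise ValueError(f"duplicate ID within class: {unique_id}")
        seen.add(unique_id)
        ids[member] = unique_id
    return ids


sample = (
    "author_1\tjs\n"
    "author_2\thj\n"
    "group_1\tsenses\n"
    "eprint_1\t1\n"
    "eprint_2\t2\n"
    "eprint_3\t3\n"
)

ids = unique_ids(parse_member_codes(sample))
print(ids["author_1"], ids["group_1"])
```

Because the class prefix is part of the ID, the same code can be reused across classes (author_hj and group_hj do not collide), while a repeated code inside one class is reported as an error.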
 
 
 
===== Citations =====
 
 
 
IRStats uses two citations for each set member, one short and one long.  Which you use depends on how you would like your visualisation to look.  However, we do need to add these to the citations files:
 
 
 
epstats_set_member_short_citations.txt
 
author_1        Smith
 
 
 
epstats_set_member_full_citations.txt
 
author_1        Dr John Smith, PhD
 
 
 
Note that the above examples are only for author_1.  It would be exactly the same for any set member.
 
 
 
===== URLs =====
 
 
 
Although URLs are not currently implemented, it is probably a good idea to include this information (in epstats_set_member_urls.txt) for future functionality.
 
 
 
author_1        http://homepage.john.smith.com/
 
 
 
== Installing IRStats ==
 
 
 
 
 
 
 
=== Dependencies ===
 
 
 
==== Logfile::EPrints ====
 
 
 
The Logfile::EPrints modules are used to assist in filtering the raw access log.  They can be installed from CPAN.
 
 
 
==== AWStats ====
 
 
 
AWStats data is used to filter out webspiders and classify search engines.  The irstats.cfg must have an entry showing where the correct perl modules are.
 
 
 
==== Geo::IP ====
 
 
 
Geo::IP is used to fill in country and organisation information.  The country database is free, but if you want organisation information, you will have to purchase a subscription for their database.  The location of the database should also be inserted into irstats.cfg.
 
 
 
Note: The pure perl version of Geo::IP does not support organisations.
 
 
 
=== Installing ===
 
 
 
=== Customising ===
 
 
 
It will almost always be necessary to perform some customisation on IRStats because every repository is different.
 
 
 
==== Updating the Table ====
 

Latest revision as of 16:58, 8 August 2019

This is IRStats 1 documentation. IRStats 1 is now out of support. You may have been looking for IRStats2

IRStats is a flexible statistics package which allows easy processing of accesses to fulltext documents of eprints. It can be downloaded from the Eprints File repository. For more detailed information, please see the IRStats Technical Documentation, though it is now somewhat out of date.

The front end

The Query Form

The main interface to IRStats is found at the following URL (given a repository base URL of myrepository.ac.uk):

myrepository.ac.uk/cgi/irstats.cgi

You will be presented with a form allowing you to select the parameters with which to generate a report.

Advanced Report Generation (get_view2 params)

The following will help if you wish to create queries by setting the CGI parameters by hand.

There are three fundamental parameters that IRStats uses. These are:

  • A Date Range (actually 6 parameters for day, month and year for both start and end dates)
  • A Set of EPrints
  • A View

However, in order to add functionality, the get_view2 page will convert a larger number of parameters into these three. The following table shows all parameters and values, with square brackets denoting variables.


{| border="1"
! Parameter
! Possible Values
! Notes
|-
| IRS_datechoice || period, range || Controls whether the 6 date range parameters or the single period parameter is used.
|-
| period || -[X]m, Q[Z][YYYY] || Used when IRS_datechoice=period.<br/>Where m and Q are literal characters, X is a positive integer, Z is an integer in the range 1 to 4 and YYYY is a four digit year.<br/>Examples: <dl><dt>-4m<dd>Go back exactly four months from today's date<dt>Q32004<dd>Quarter 3, 2004</dl>
|-
| start_day, start_month, start_year, end_day, end_month, end_year || integers (1-31, 1-12, four digit respectively) || Used when IRS_datechoice=range.<br/>Note that if a day value is higher than the highest day in the chosen month, it will be treated as the highest day -- e.g. start_day=31&start_month=02 is seen as valid and equivalent to February 28th. Note that start_day=99 is also valid!
|-
| IRS_epchoice || All, EPrint, [set_id] || Controls whether stats will be generated on all eprints, a single eprint, or a set of eprints. The 'All' option is the only one that does not require extra parameters. Note that 'set_id' is the id of a valid set as defined in the IRStats configuration.
|-
| eprint || [eprintid] || Used when IRS_epchoice=EPrint.<br/>Any valid eprint ID (integer).
|-
| [set_id]s || [set_id]_[set_member_code] || Used when IRS_epchoice=[set_id].<br/>Best described through example: <dl><dt>IRS_epchoice=divisions&divisionss=divisions_art<dd>Will generate a report on the art department, given a standard EPrints repository and IRStats config, where the subject id 'art' exists in the divisions tree in EPrints.</dl>
|-
| view || [view classname] || The classname of the IRStats::View Perl module.
|}
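The period syntax and the day-clamping rule described in the table can be illustrated with a short Python sketch. This is not IRStats code: the function names and the tuple return values are assumptions for illustration, and the clamp relies on Python's calendar module, so in a leap year it would yield February 29th rather than the 28th given in the example above.

```python
import calendar
import re


def clamp_day(day, month, year):
    """Treat a day beyond the end of the month as the month's last day."""
    last_day = calendar.monthrange(year, month)[1]
    return min(day, last_day)


def parse_period(period):
    """Recognise the two period forms: -[X]m and Q[Z][YYYY].

    Returns ("months_back", X) or ("quarter", Z, YYYY); these tuples
    are a hypothetical representation chosen for this sketch.
    """
    match = re.fullmatch(r"-(\d+)m", period)
    if match and int(match.group(1)) > 0:
        return ("months_back", int(match.group(1)))
    match = re.fullmatch(r"Q([1-4])(\d{4})", period)
    if match:
        return ("quarter", int(match.group(1)), int(match.group(2)))
    raise ValueError(f"unrecognised period: {period!r}")


print(clamp_day(31, 2, 2007))
print(clamp_day(99, 1, 2007))
print(parse_period("-4m"))
print(parse_period("Q32004"))
```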

The Dashboard Form

A dashboard is a collection of reports on a single item or set of items (e.g. all items by John Smith). To access the form to generate a report, go to the URL:

myrepository.ac.uk/cgi/irstats.cgi?page=db
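URLs like the one above, and the hand-set CGI parameters from the Advanced Report Generation section, can be assembled with a standard URL-encoding helper. The Python sketch below makes several assumptions: the http:// scheme, the helper name irstats_url, and the view classname ExampleView are all hypothetical. This page does not say which page value, if any, must accompany the report parameters, so only the query string is built for that case.

```python
from urllib.parse import urlencode

# Base URL from this page; the http:// scheme is an assumption.
BASE = "http://myrepository.ac.uk/cgi/irstats.cgi"


def irstats_url(params):
    """Append URL-encoded CGI parameters to the IRStats base URL."""
    return f"{BASE}?{urlencode(params)}"


# The dashboard form URL given above.
dashboard_url = irstats_url({"page": "db"})

# Query string for a report on the art division over the last four months.
report_query = urlencode({
    "IRS_datechoice": "period",
    "period": "-4m",
    "IRS_epchoice": "divisions",
    "divisionss": "divisions_art",
    "view": "ExampleView",  # hypothetical view classname
})

print(dashboard_url)
print(report_query)
```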

The configuration file

Documentation to follow.