EPrints Documentation - User contributions [en-gb]

https://wiki.eprints.org/w/api.php?action=feedcontributions&feedformat=atom&user=Gobfrey EPrints Documentation - User contributions [en-gb] 2026-06-13T16:41:52Z User contributions MediaWiki 1.31.8 https://wiki.eprints.org/w/index.php?title=Adding_new_views&diff=6144 Adding new views 2008-06-24T13:39:09Z

<p>Gobfrey: /* tags */</p> <hr /> <div>{{development}}<br /> <br /> Browse views provide a way for visitors to your site to discover relevant content without a specific item in mind (for example, browsing all the content associated with a particular topic). Visitors arriving directly at the page for a specific item in the repository (for example, via a search engine) also use views you have defined to discover related content. <br /> <br /> There are two default views in EPrints - '''By Year''' and '''By Subject'''. This guide describes how to add additional views to your repository.<br /> <br /> __TOC__<br /> <br /> ===The basics===<br /> <br /> The views for your repository are defined in the views configuration file:<br /> <br /> /opt/eprints3/archives/ARCHIVEID/cfg/cfg.d/views.pl<br /> <br /> Open this file and find the browse_views configuration setting:<br /> <br /> $c->{browse_views} = [<br /> {<br /> id => "year",<br /> fields => "-date;res=year",<br /> ...<br /> },<br /> {<br /> id => "subjects",<br /> fields => "subjects",<br /> ...<br /> },<br /> ];<br /> <br /> The views are defined using a special (Perl) syntax: the view definition consists of a pair of curly braces (''note the comma after each closing brace'') enclosing a list of property/value pairs (note ''the comma'' after each line).<br /> <br /> The key part of the view definition is the '''fields''' property. This names the metadata field (or fields) that EPrints will use to construct the view. For example, for the '''By Year''' view, EPrints groups the records in the repository according to their '''date''' (note that the '''res=year''' suffix tells EPrints to only consider the year part), and constructs a Web page for each date listing the records. 
Similarly, the '''Browse by Subject''' view groups the records according to the ''subjects'' they have been assigned to (a record may appear in more than one group!).<br /> <br /> Both the '''Browse by Year''' and '''Browse by Subject''' views are constructed using the values of a single field ('''date''' and '''subjects''' respectively).<br /> <br /> It is also possible to construct a view using the ''combined'' values of two or more fields (e.g. group records by author '''and''' editor), or even using a sequence of two or more fields (e.g. group records by journal title '''and then''' by volume number).<br /> <br /> ===Worked example: browse by organisational structure===<br /> <br /> By default, EPrints has a ''divisions'' metadata field which allows authors to associate their deposits with the divisions (units, faculties, schools, departments, institutes, centres...) that were involved in producing their item (for example, the author's department, and the departments of any co-authors). This worked example allows visitors to browse the repository content by division.<br /> <br /> Open the views configuration file:<br /> <br /> /opt/eprints3/archives/ARCHIVEID/cfg/cfg.d/views.pl<br /> <br /> Add the following definition to the browse_views setting:<br /> <br /> $c->{browse_views} = [<br /> {<br /> id => "year",<br /> ...<br /> },<br /> {<br /> id => "subjects",<br /> ...<br /> },<br /> {<br /> id => "divisions",<br /> fields => "divisions",<br /> order => "-date/title",<br /> hideempty => 1,<br /> },<br /> ];<br /> <br /> Save the file and generate the new view pages (this will also re-generate any existing views defined in the views configuration file):<br /> <br /> bin/generate_views ARCHIVEID --verbose<br /> <br /> Open the view page in a Web browser:<br /> <br /> http://your.repository.url/view/<br /> <br /> The view page lists all the available views. 
You should see your new views on the list:<br /> <br /> [[Image:View_page2.png|frame|none|The view page lists available views]]<br /> <br /> '''Fixing the undefined phrase warning''' The new view may appear on the views page with an ''undefined phrase'' warning (you may also notice a similar warning message when running generate_views):<br /> <br /> ["viewname_eprint_divisions" not defined]<br /> <br /> Each view you create needs to be assigned a ''human-readable'' name, which EPrints will use on the view Web pages.<br /> <br /> Edit the language-specific phrases file for view names:<br /> <br /> /opt/eprints3/archives/ARCHIVEID/cfg/lang/en/phrases/views.xml<br /> <br /> Add an appropriate phrase which describes the new view, for example:<br /> <br /> <epp:phrase id="viewname_eprint_divisions">Division</epp:phrase><br /> <br /> Save the phrases file and regenerate the view pages:<br /> <br /> bin/generate_views ARCHIVEID --verbose<br /> <br /> ===Example view definitions===<br /> <br /> ====Browse by type====<br /> <br /> Every deposit in EPrints has a type (article, book, thesis...). 
To allow visitors to browse your repository content by type, add the following definition to the browse_views setting:<br /> <br /> {<br /> id => "types",<br /> fields => "type",<br /> order => "-date/title",<br /> hideempty => 1,<br /> },<br /> <br /> ====Browse by author====<br /> <br /> ===Example views (combined fields)===<br /> <br /> ====Browse by author and editor====<br /> <br /> ===Example views (multiple fields)===<br /> <br /> ====Browse by journal title, then by volume====<br /> <br /> This example lets visitors browse the journals in which items in your repository have been published, and then the volumes within each journal.<br /> <br /> {<br /> id => "journal_volume",<br /> fields => "publication,volume",<br /> order => "-date/title",<br /> hideempty => 1,<br /> },<br /> <br /> [[Image:Browse_by_journal.png|border]]<br /> <br /> [[Image:Browse_by_journal_volume.png|border]]<br /> <br /> <br /> <br /> ===Linking in your view===<br /> <br /> You now need to add a link to your repository pages which takes visitors directly to your new view, or to the views page from where they can access all available views.<br /> <br /> [[Image:Browse_by_navbar.png]]<br /> <br /> ====Generating CVs etc====<br /> <br /> ===Linking items back to views===<br /> ===Views as collections===<br /> <br /> <br /> ===New options in EPrints 3.1===<br /> <br /> ====subfield no longer supported====<br /> <br /> The subfield option is no longer supported in EPrints 3.1.<br /> <br /> ====new_column_at====<br /> <br /> This is an array of integers representing the number of items in a view list before another column is added. 
For example:<br /> <br /> [ 10 ]<br /> <br /> This gives one column for up to ten values, and two columns for eleven or more.<br /> <br /> [ 10, 10 ]<br /> <br /> This gives one column if there are ten or fewer values, two columns if there are between eleven and twenty (ten + ten) values, and three columns in all other cases.<br /> <br /> [ 0, 0 ]<br /> <br /> This would always have three columns.<br /> <br /> Add one to the number of integers in the array and you get the maximum number of columns. The value of each integer defines the point at which that column becomes full, and more values cause an 'overflow' into the next column.<br /> <br /> ====variations====<br /> <br /> This controls the various ways in which a browse view can be subheaded. It consists of a list of strings. Each string is the name of a non-compound metadata field (or the keyword DEFAULT, for an unsubheaded list), optionally followed by a semi-colon and a comma-separated list of options. For example:<br /> <br /> variations => [<br /> "creators_name;first_letter",<br /> "type",<br /> "DEFAULT"<br /> ],<br /> <br /> The following options are available:<br /> <br /> =====reverse=====<br /> <br /> Reverses the order in which the groupings are shown. The default order is the ordervalue for that field (usually alphanumeric). Useful for dates, as you may want the highest values first.<br /> <br /> =====filename=====<br /> <br /> Changes the filename of the view variation. The default is the name of the metadata field used, so this option is needed if two variations use the same metadata field with different options.<br /> <br /> filename=different_filename<br /> <br /> =====first_value=====<br /> <br /> If a field is multiple, only use the first value. Otherwise each item will appear once for each value.<br /> <br /> =====first_initial=====<br /> <br /> If using a name, truncate the given name to the first initial. This will make items like "Les Carr" and "Leslie Carr" appear together. 
Note that it will also make "John Smith" and "Jake Smith" appear together, showing that you really never can win.<br /> <br /> =====first_letter=====<br /> <br /> The same as 'truncate=1'.<br /> <br /> =====truncate=====<br /> <br /> Use the first X characters of a value to group by. truncate=4 may be useful for dates as it will group by the first four digits (the year) only.<br /> <br /> truncate=4<br /> <br /> =====tags=====<br /> <br /> Useful for fields like keywords where values may be separated by commas or semi-colons. The value is split on these two characters ( , and ; ) and a heading is created for each resulting value.<br /> <br /> =====cloud=====<br /> <br /> Creates a tag cloud. Sets jump to 'plain', cloudmax to 200, cloudmin to 80 and no_separator, then resizes the jump-to links according to frequency of use.<br /> <br /> =====cloudmax=====<br /> <br /> The % size of the largest tag in a tag cloud.<br /> <br /> =====cloudmin=====<br /> <br /> The % size of the smallest tag in a tag cloud.<br /> <br /> =====jump=====<br /> <br /> jump=plain<br /> <br /> Turns off the 'jump to' text before the list of subheading navigation links.<br /> <br /> =====no_seperator (sic)=====<br /> <br /> Turns off the separator between each subheading navigation link (by default '|').<br /> <br /> =====string=====<br /> <br /> Uses values 'as is'. No ordervalues, no phrases.</div>
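
To tie the EPrints 3.1 options together, here is a minimal sketch of a single view definition that uses both new_column_at and variations. It is an illustration only: the placement of new_column_at and variations alongside id and fields in the view definition, and the use of date as a variation field, are assumptions that are not confirmed by the text above, so check them against the views.pl shipped with your EPrints version.

```perl
# Hypothetical entry for the browse_views list in views.pl.
# Assumption: new_column_at and variations are set at the same level
# as id/fields/order in the view definition.
{
    id            => "divisions",
    fields        => "divisions",
    order         => "-date/title",
    hideempty     => 1,

    # One column for up to 10 values, two for 11-20, three otherwise.
    new_column_at => [ 10, 10 ],

    variations    => [
        # Subheadings by the first letter of the creator name.
        "creators_name;first_letter",
        # Subheadings by year (first four characters), newest first.
        "date;truncate=4,reverse",
        # Plain list with no subheadings.
        "DEFAULT",
    ],
},
```

After changing the definition, regenerate the view pages with bin/generate_views ARCHIVEID --verbose, as in the worked example.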


Gobfrey https://wiki.eprints.org/w/index.php?title=Adding_new_views&diff=6141 Adding new views 2008-06-24T11:53:20Z

<p>Gobfrey: /* New options in EPrints 3.1 */</p> <hr /> <div>{{development}}<br /> <br /> Browse views provide a way for visitors to your site to discover relevant content without a specific item in mind (for example, browsing all the content associated with a particular topic). Visitors arriving directly at the page for a specific item in the repository (for example, via a search engine) also use views you have defined to discover related content. <br /> <br /> There are two default views in EPrints - '''By Year''' and '''By Subject'''. This guide describes how to add additional views to your repository.<br /> <br /> __TOC__<br /> <br /> ===The basics===<br /> <br /> The views for your repository are defined in the views configuration file:<br /> <br /> /opt/eprints3/archives/ARCHIVEID/cfg/cfg.d/views.pl<br /> <br /> Open this file and find the browse_views configuration setting:<br /> <br /> $c->{browse_views} = [<br /> {<br /> id => "year",<br /> fields => "-date;res=year",<br /> ...<br /> },<br /> {<br /> id => "subjects",<br /> fields => "subjects",<br /> ...<br /> },<br /> ];<br /> <br /> The views are defined using a special (Perl) syntax: the view definition consists of a pair of curly braces (''note the comma after each closing brace'') enclosing a list of property/value pairs (note ''the comma'' after each line).<br /> <br /> The key part of the view definition is the '''fields''' property. This names the metadata field (or fields) that EPrints will use to construct the view. For example, for the '''By Year''' view, EPrints groups the records in the repository according to their '''date''' (note that the '''res=year''' suffix tells EPrints to only consider the year part), and constructs a Web page for each date listing the records. 
Similarly, the '''Browse by Subject''' view groups the records according to the ''subjects'' they have been assigned to (a record may appear in more than one group!).<br /> <br /> Both the '''Browse by Year''' and '''Browse by Subject''' views are constructed using the values of a single field ('''date''' and '''subjects''' respectively).<br /> <br /> It is also possible to construct a view using the ''combined'' values of two or more fields (e.g. group records by author '''and''' editor), or even using a sequence of two or more fields (e.g. group records by journal title '''and then''' by volume number).<br /> <br /> ===Worked example: browse by organisational structure===<br /> <br /> By default, EPrints has a ''divisions'' metadata field which allows authors to associate their deposits with the divisions (units, faculties, schools, departments, institutes, centres...) that were involved in producing their item (for example, the author's department, and the departments of any co-authors). This worked example allows visitors to browse the repository content by division.<br /> <br /> Open the views configuration file:<br /> <br /> /opt/eprints3/archives/ARCHIVEID/cfg/cfg.d/views.pl<br /> <br /> Add the following definition to the browse_views setting:<br /> <br /> $c->{browse_views} = [<br /> {<br /> id => "year",<br /> ...<br /> },<br /> {<br /> id => "subjects",<br /> ...<br /> },<br /> {<br /> id => "divisions",<br /> fields => "divisions",<br /> order => "-date/title",<br /> hideempty => 1,<br /> },<br /> ];<br /> <br /> Save the file and generate the new view pages (this will also re-generate any existing views defined in the views configuration file):<br /> <br /> bin/generate_views ARCHIVEID --verbose<br /> <br /> Open the view page in a Web browser:<br /> <br /> http://your.repository.url/view/<br /> <br /> The view page lists all the available views. 
You should see your new views on the list:<br /> <br /> [[Image:View_page2.png|frame|none|The view page lists available views]]<br /> <br /> '''Fixing the undefined phrase warning''' The new view may appear on the views page with an ''undefined phrase'' warning (you may also notice a similar warning message when running generate_views):<br /> <br /> ["viewname_eprint_divisions" not defined]<br /> <br /> Each view you create needs to be assigned a ''human-readable'' name, which EPrints will use on the view Web pages.<br /> <br /> Edit the language-specific phrases file for view names:<br /> <br /> /opt/eprints3/archives/ARCHIVEID/cfg/lang/en/phrases/views.xml<br /> <br /> Add an appropriate phrase which describes the new view, for example:<br /> <br /> <epp:phrase id="viewname_eprint_divisions">Division</epp:phrase><br /> <br /> Save the phrases file and regenerate the view pages:<br /> <br /> bin/generate_views ARCHIVEID --verbose<br /> <br /> ===Example view definitions===<br /> <br /> ====Browse by type====<br /> <br /> Every deposit in EPrints has a type (article, book, thesis...). 
To allow visitors to browse your repository content by type, add the following definition to the browse_views setting:<br /> <br /> {<br /> id => "types",<br /> fields => "type",<br /> order => "-date/title",<br /> hideempty => 1,<br /> },<br /> <br /> ====Browse by author====<br /> <br /> ===Example views (combined fields)===<br /> <br /> ====Browse by author and editor====<br /> <br /> ===Example views (multiple fields)===<br /> <br /> ====Browse by journal title, then by volume====<br /> <br /> This example lets visitors browse by the journals in which items in your repository have been published, and then by the volumes within each journal.<br /> <br /> {<br /> id => "journal_volume",<br /> fields => "publication,volume",<br /> order => "-date/title",<br /> hideempty => 1,<br /> },<br /> <br /> [[Image:Browse_by_journal.png|border]]<br /> <br /> [[Image:Browse_by_journal_volume.png|border]]<br /> <br /> <br /> <br /> ===Linking in your view===<br /> <br /> You now need to add a link to your repository pages which takes visitors directly to your new view, or to the views page from where they can access all available views.<br /> <br /> [[Image:Browse_by_navbar.png]]<br /> <br /> ====Generating CVs etc====<br /> <br /> ===Linking items back to views===<br /> ===Views as collections===<br /> <br /> <br /> ===New options in EPrints 3.1===<br /> <br /> ====subfield no longer supported====<br /> <br /> The subfield option is no longer supported in EPrints 3.1.<br /> <br /> ====new_column_at====<br /> <br /> This is an array of integers representing the number of items in a view list before another column is added. 
For example:<br /> <br /> [ 10 ]<br /> <br /> This gives one column of values until there are eleven values, at which point there are two columns.<br /> <br /> [ 10, 10 ]<br /> <br /> This gives one column if there are ten or fewer values, two columns if there are between eleven and twenty (ten + ten) values, and three columns if there are more than twenty.<br /> <br /> [ 0, 0 ]<br /> <br /> This always gives three columns.<br /> <br /> The maximum number of columns is one more than the number of integers in the array. The value of each integer defines the point at which that column becomes full, and more values cause an 'overflow' into the next column.</div>
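The overflow rule described for new_column_at can be sketched in a few lines of Perl. This is only an illustration of the rule as worded above, not the code EPrints itself uses; the function name column_count is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of EPrints): work out how many columns a
# new_column_at array implies for a list of $n values, following the
# "each integer is the point at which that column becomes full" rule.
sub column_count {
    my ($new_column_at, $n) = @_;
    my $columns  = 1;
    my $capacity = 0;
    for my $size (@$new_column_at) {
        $capacity += $size;          # values that fit in the columns so far
        last if $n <= $capacity;     # everything fits, no further overflow
        $columns++;                  # overflow into the next column
    }
    return $columns;
}

# Show the column count either side of each threshold for [ 10, 10 ].
printf "%d values -> %d column(s)\n", $_, column_count([ 10, 10 ], $_)
    for (10, 11, 20, 21);
```

With this reading, [ 0, 0 ] yields three columns for any non-empty list, because every column is "full" immediately.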

Gobfrey https://wiki.eprints.org/w/index.php?title=Adding_new_views&diff=6140 Adding new views 2008-06-24T11:45:06Z

<p>Gobfrey: </p> <hr /> <div>{{development}}<br /> <br /> Browse views provide a way for visitors to your site to discover relevant content without a specific item in mind (for example, browsing all the content associated with a particular topic). Visitors arriving directly at the page for a specific item in the repository (for example, via a search engine) also use views you have defined to discover related content. <br /> <br /> There are two default views in EPrints - '''By Year''' and '''By Subject'''. This guide describes how to add additional views to your repository.<br /> <br /> __TOC__<br /> <br /> ===The basics===<br /> <br /> The views for your repository are defined in the views configuration file:<br /> <br /> /opt/eprints3/archives/ARCHIVEID/cfg/cfg.d/views.pl<br /> <br /> Open this file and find the browse_views configuration setting:<br /> <br /> $c->{browse_views} = [<br /> {<br /> id => "year",<br /> fields => "-date;res=year",<br /> ...<br /> },<br /> {<br /> id => "subjects",<br /> fields => "subjects",<br /> ...<br /> },<br /> ];<br /> <br /> The views are defined using a special (Perl) syntax: the view definition consists of a pair of curly braces (''note the comma after each closing brace'') enclosing a list of property/value pairs (note ''the comma'' after each line).<br /> <br /> The key part of the view definition is the '''fields''' property. This names the metadata field (or fields) that EPrints will use to construct the view. For example, for the '''By Year''' view, EPrints groups the records in the repository according to their '''date''' (note that the '''res=year''' suffix tells EPrints to only consider the year part), and constructs a Web page for each date listing the records. 
Similarly, the '''Browse by Subject''' view groups the records according to the ''subjects'' they have been assigned to (a record may appear in more than one group).<br /> <br /> Both the '''Browse by Year''' and '''Browse by Subject''' views are constructed using the values of a single field ('''date''' and '''subjects''' respectively).<br /> <br /> It is also possible to construct a view using the ''combined'' values of two or more fields (e.g. group records by author '''and''' editor), or even using a sequence of two or more fields (e.g. group records by journal title '''and then''' by volume number).<br /> <br /> ===Worked example: browse by organisational structure===<br /> <br /> By default, EPrints has a ''divisions'' metadata field which allows authors to associate their deposits with the divisions (units, faculties, schools, departments, institutes, centres...) that were involved in producing their item (for example, the author's department, and the departments of any co-authors). This worked example allows visitors to browse the repository content by division.<br /> <br /> Open the views configuration file:<br /> <br /> /opt/eprints3/archives/ARCHIVEID/cfg/cfg.d/views.pl<br /> <br /> Add the following definition to the browse_views setting:<br /> <br /> $c->{browse_views} = [<br /> {<br /> id => "year",<br /> ...<br /> },<br /> {<br /> id => "subjects",<br /> ...<br /> },<br /> {<br /> id => "divisions",<br /> fields => "divisions",<br /> order => "-date/title",<br /> hideempty => 1,<br /> },<br /> ];<br /> <br /> Save the file and generate the new view pages (this will also re-generate any existing views defined in the views configuration file):<br /> <br /> bin/generate_views ARCHIVEID --verbose<br /> <br /> Open the view page in a Web browser:<br /> <br /> http://your.repository.url/view/<br /> <br /> The view page lists all the available views. 
You should see your new views on the list:<br /> <br /> [[Image:View_page2.png|frame|none|The view page lists available views]]<br /> <br /> '''Fixing the undefined phrase warning''' The new view may appear on the views page with an ''undefined phrase'' warning (you may also notice a similar warning message when running generate_views):<br /> <br /> ["viewname_eprint_divisions" not defined]<br /> <br /> Each view you create needs to be assigned a ''human-readable'' name, which EPrints will use on the view Web pages.<br /> <br /> Edit the language-specific phrases file for view names:<br /> <br /> /opt/eprints3/archives/ARCHIVEID/cfg/lang/en/phrases/views.xml<br /> <br /> Add an appropriate phrase which describes the new view, for example:<br /> <br /> <epp:phrase id="viewname_eprint_divisions">Division</epp:phrase><br /> <br /> Save the phrases file and regenerate the view pages:<br /> <br /> bin/generate_views ARCHIVEID --verbose<br /> <br /> ===Example view definitions===<br /> <br /> ====Browse by type====<br /> <br /> Every deposit in EPrints has a type (article, book, thesis...). 
To allow visitors to browse your repository content by type, add the following definition to the browse_views setting:<br /> <br /> {<br /> id => "types",<br /> fields => "type",<br /> order => "-date/title",<br /> hideempty => 1,<br /> },<br /> <br /> ====Browse by author====<br /> <br /> ===Example views (combined fields)===<br /> <br /> ====Browse by author and editor====<br /> <br /> ===Example views (multiple fields)===<br /> <br /> ====Browse by journal title, then by volume====<br /> <br /> This example lets visitors browse by the journals in which items in your repository have been published, and then by the volumes within each journal.<br /> <br /> {<br /> id => "journal_volume",<br /> fields => "publication,volume",<br /> order => "-date/title",<br /> hideempty => 1,<br /> },<br /> <br /> [[Image:Browse_by_journal.png|border]]<br /> <br /> [[Image:Browse_by_journal_volume.png|border]]<br /> <br /> <br /> <br /> ===Linking in your view===<br /> <br /> You now need to add a link to your repository pages which takes visitors directly to your new view, or to the views page from where they can access all available views.<br /> <br /> [[Image:Browse_by_navbar.png]]<br /> <br /> ====Generating CVs etc====<br /> <br /> ===Linking items back to views===<br /> ===Views as collections===<br /> <br /> <br /> ===New options in EPrints 3.1===</div>

Gobfrey https://wiki.eprints.org/w/index.php?title=Migration&diff=6131 Migration 2008-06-13T18:28:27Z

<p>Gobfrey: /* export3data.pl */</p> <hr /> <div>This page covers how to migrate from EPrints 2 to EPrints 3.<br /> <br /> == Migration Toolkit ==<br /> <br /> The migration toolkit, available from http://files.eprints.org/ does quite a bit of the heavy lifting. It is intended to help configure an EP3 archive to have the same files, eprint types etc. as an EPrint 2 repository and then copy the data over.<br /> <br /> Release 1.0-beta-1 should be a big improvement over 0.2 but it still doesn't do everything. <br /> <br /> === Installation ===<br /> <br /> ==== Backup ====<br /> <br /> First of all make sure your EPrints 2 repository is backed up, just in case things don't go to plan. You already back it up daily anyway, right...?<br /> <br /> ==== Mtoolkit ====<br /> <br /> Un-tar the package on the same machine as your EPrints 2 repository.<br /> <br /> If your EPrints 2 was not installed in /opt/eprints2 then you'll need to modify the first line of the two .pl scripts in the toolkit.<br /> <br /> ==== EPrints 3 ====<br /> <br /> Minimum version required: 3.0.2 (This version introduces some very small options and bugfixes aimed at migration).<br /> <br /> Also, get an EPrints 3 server set up. This can be either on the same machine (you'll need a separate instance of apache as ep2 and ep3 can't run under the same server at the same time, put it on port 8080 for now - see http://httpd.apache.org/docs/2.0/install.html for instructions - put it in another directory using the --PREFIX option!), or on a different machine. Get a repository created (probably with the same ID as your ep2 repo, although that's not essential). The database will need to be a different name or you'll get in an utter mess.<br /> <br /> === mkconfig.pl ===<br /> <br /> This tool takes the id of an EPrints 2 repository and generates a number of EPrints 3 config. files. Copy these files into the cfg dir of your EPrints 3 repository. 
It also creates a file called migration_notes.txt with some helpful comments on anything it has changed.<br /> <br /> Get your (empty) EP3 repository up and running using these configuration files. <br /> <br /> === export3data.pl ===<br /> <br /> This script exports the data from your EPrints 2 repository in a format which can be imported by EPrints 3.<br /> <br /> There have been some problems with exporting non-Latin characters (e.g. letters with accents). If you have any problems, these can probably be solved by editing the export3data script and adding the following line (put it just under the first line).<br /> <br /> use encoding 'utf8';<br /> <br /> To export the data do the following:<br /> <br /> export3data.pl ARCHIVEID eprints > eprints.xml<br /> export3data.pl ARCHIVEID users > users.xml<br /> export3data.pl ARCHIVEID subjects > subjects.xml<br /> <br /> eprints.xml references the full paths of the files in EPrints 2. If your EPrints 3 is on a different machine you'll need to either make sure the paths are the same on the new machine or do a big search-and-replace on eprints.xml!<br /> <br /> If the script has any problems, run with the 'skiplog' argument:<br /> <br /> export3data.pl --skiplog errors.txt ARCHIVEID eprints > eprints.xml<br /> <br /> Any items with problems will be ignored, but their ids will be recorded in the 'errors.txt' file. 
Export these by hand if they are important.<br /> <br /> === Importing ===<br /> <br /> EPrints 3.0.2 no longer needs the hacks which were required for mtoolkit 0.2<br /> <br /> === Empty out any test data ===<br /> <br /> To erase the current data in your EP3 repository use:<br /> <br /> bin/epadmin erase_data ARCHIVEID<br /> <br /> === Import the data ===<br /> <br /> To import the subjects and users do:<br /> /opt/eprints3/bin/import_subjects --verbose --force --xml ARCHIVEID subjects.xml<br /> /opt/eprints3/bin/import --verbose --migration ARCHIVEID user XML users.xml<br /> If something goes wrong with subjects or users, use epadmin erase_data to empty the database and start again.<br /> <br /> To import the EPrints do:<br /> /opt/eprints3/bin/import --verbose --migration ARCHIVEID eprint XML eprints.xml<br /> If something goes wrong with importing the eprints, use epadmin erase_eprints, to just erase the eprints data so you don't need to redo subjects and users.<br /> <br /> the --migration option tells the importer to:<br /> * skip are-you-sure? messages.<br /> * use the eprintid and userid from the XML rather than assigning them.<br /> * use the "datestamp" from the XML rather than assign it.<br /> * load files from the local file system (normally this would be a security hole)<br /> <br /> You may encounter some issues with badly formed XML. This is due to non correctly encoded data creeping into your database. It should all be utf-8 but earlier versions of EPrints didn't always check... 
If your EPrints 2 server is running perl 5.8 you can install the Perl module Encode which will clean up your data, but on our system our EPrints 2 was running on a machine with an older version of Perl and we didn't want to risk upgrading.<br /> <br /> == Finishing up after using mtoolkit ==<br /> <br /> You will probably still want to tweak some of the following things by hand, depending how much you customised EPrints 2:<br /> <br /> Some of these we can't easily add to the mtoolkit (those involving perl code). The XML files we could add in theory, but we've made a decision to release 1.0 with the current features, rather than delay it months but make it perfect.<br /> <br /> * the template<br /> * the workflow (EPrints 3 offers some nice features, look at the lib/defaultcfg/workflows/ for an idea of what you can do)<br /> * the static pages (.xpage)<br /> * the citation files<br /> * the /view/ browsing configuration<br /> * the search configuration<br /> * any custom render routines<br /> * the render eprint method (eprint_render.pl)<br /> * any custom document security options<br /> * any custom validation options<br /> * etc.<br /> <br /> Feel free to add tips on the wiki, linked from this section.<br /> <br /> <br /> == Known bugs in version 1.0 of toolkit / importing into EPrints 3.0.2 ==<br /> <br /> === Documents with subdirectories fail to import ===<br /> <br /> FIX: do them by hand at the end.<br /> <br /> === Warning messages about "hideemail" ===<br /> <br /> hideemail was introduced in a version of EPrints 2 (I forget which). Earlier repositories may not have this field. 
Some of the EPrints 3 default config files assume it exists (user_fields_default.pl and user_render.pl).<br /> <br /> FIX 1: Don't worry about it.<br /> <br /> FIX 2: Before importing users.xml, add the hideemail field back into user_fields.pl<br /> {<br /> 'name' => 'hideemail',<br /> 'input_style' => 'radio',<br /> 'type' => 'boolean',<br /> },<br /> <br /> === Error missing field: X ===<br /> <br /> The default EPrints 3 config. may reference a field not imported. If so you can almost always just remove the offending section of configuration. Examples: searches, citations, views.<br /> <br /> === Problems with bad characters in eprints.xml ===<br /> <br /> This is not tested, but I think this should clean it up...<br /> iconv -c eprints.xml --output=eprints_cleaned.xml -f utf-8 -t utf-8<br /> <br /> === Warning about Pagerange ===<br /> <br /> Argument "" isn't numeric in addition (+) at<br /> /opt/eprints3/perl_lib/EPrints/MetaField/Pagerange.pm line 182.<br /> <br /> This is a warning that is caused by having non-numeric data in the pagerange field. eg. "iii-xi".<br /> <br /> FIX: Don't worry about it.<br /> <br /> === Can't import files which contain "/" ===<br /> <br /> eg if your document had index.html and images/dia.jpg<br /> <br /> FIX: Make a note of the offenders, and just add those documents by hand. <br /> <br /> FIX2: Bug chris to add this to fix this in the final release of 3.0.2 (it's not in beta-1)</div>
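The "big search-and-replace on eprints.xml" mentioned in the export3data.pl section could be sketched roughly as follows. This is a hedged illustration only: the function name rewrite_path_prefix and both directory prefixes are invented for the example, so check what the paths in your own eprints.xml look like before adapting it.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the migration toolkit): replace every
# occurrence of the old EPrints 2 path prefix with the new location.
sub rewrite_path_prefix {
    my ($text, $old_prefix, $new_prefix) = @_;
    # \Q...\E quotes the old prefix so that dots and other regex
    # metacharacters in the path are matched literally.
    $text =~ s/\Q$old_prefix\E/$new_prefix/g;
    return $text;
}

# Made-up sample path, standing in for one line of eprints.xml.
my $sample = "/opt/eprints2/archives/ARCHIVEID/documents/paper.pdf";
print rewrite_path_prefix(
    $sample,
    "/opt/eprints2/archives/ARCHIVEID/documents",   # assumed old location
    "/mnt/ep2copy/documents",                       # assumed new location
), "\n";
```

To process the real file, the same substitution would be applied line by line while copying eprints.xml to a new file, keeping the original as a backup.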

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats&diff=4361 IRStats 2007-05-30T19:55:58Z

<p>Gobfrey: </p> <hr /> <div>IRStats is a flexible statistics package which allows easy processing of accesses to fulltext and abstract pages of eprints. For more detailed information, please see the [[IRStats Technical Documentation]].<br /> <br /> == Technical Overview ==<br /> <br /> The following is a quick tour of IRStats.<br /> <br /> === Parameters ===<br /> <br /> IRStats output depends on four parameters, which need to be passed as cgi parameters if called through a web browser, or in a hash if called through the Perl API. These are:<br /> <br /> ==== Start Date and End Date ====<br /> <br /> Date parameters are implemented as separate day, month and year parameters, so these two parameters are actually six (start_day, start_month, start_year, end_day, end_month, end_year). Any statistics outside this daterange are ignored.<br /> <br /> ==== An Eprint Set ====<br /> <br /> As well as defining a daterange, we also have to inform IRStats of which publications we are interested in. Any publication not in the set will be ignored. A set of eprints can either be a single eprint or any set of eprints the system administrator wishes to define in the config files.<br /> <br /> ==== View ====<br /> <br /> The final parameter tells IRStats how we want to process and display the statistics. This is done by selecting a View.<br /> <br /> === Views ===<br /> <br /> Views are perl modules which plug in to IRStats. They have been designed to be user configurable, though some knowledge of perl is probably required. When a query is made to IRStats, a View is created. It generates some parameters for the DatabaseInterface object, which queries the database and passes back the results of the query. The View then iterates over the database rows and processes the stats in any way programmatically possible. 
These processed results are then passed to a Visualisation.<br /> <br /> === Visualisations ===<br /> <br /> A Visualisation takes a set of processed statistics and outputs them. For example, Visualisation::Graph::Pie creates a pie chart.<br /> <br /> === The Database Interface ===<br /> <br /> The Database Interface object handles all queries to the database. Most requests for statistics can be completed with a single call to the get_stats($params) method.<br /> <br /> === Data Flow Diagram ===<br /> [[Image:irstats_overview.png]]<br /> <br /> == Required Data ==<br /> <br /> In order for IRStats to run, it requires two things:<br /> <br /> * a database table containing all hits to the repository<br /> * text files describing the contents of the repository<br /> <br /> === The Hits Table ===<br /> <br /> Awaiting a redevelopment.<br /> <br /> === The Text Files ===<br /> <br /> In order for IRStats to build up a picture of a repository, a number of text files need to be created and stored in the cfg/ directory:<br /> <br /> * epstats_set_membership.txt<br /> * epstats_set_member_codes.txt<br /> * epstats_set_member_full_citations.txt<br /> * epstats_set_member_short_citations.txt<br /> * epstats_set_member_urls.txt<br /> <br /> ==== Explanation by Example ====<br /> <br /> Imagine a very small repository. 
Here are its contents:<br /> <br /> * eprints<br /> ** (1) The Smells of Cheese<br /> ** (2) The Tastes of Wines<br /> ** (3) The Sounds of Oboes<br /> * Authors<br /> ** (1) John Smith<br /> ** (2) Harriet Jones<br /> <br /> If we then imagine that the following are also true:<br /> <br /> * John Smith is credited with being an author of eprints (1) and (2)<br /> * Harriet Jones is credited with being an author of eprints (2) and (3)<br /> * All three eprints are the output of a research group named "Senses"<br /> <br /> ===== Creating sets =====<br /> <br /> Sets are groups of eprints, and every eprint is a member of at least one set (the set containing only that eprint). From the information above, we have three classes of set: eprint, author and research group. We need to add the following to epstats_set_membership.txt (the format is <id><tab><csv list of eprint ids>):<br /> <br /> author_1 1,2<br /> author_2 2,3<br /> group_1 1,2,3<br /> eprint_1 1<br /> eprint_2 2<br /> eprint_3 3<br /> <br /> ===== Giving Sets IDs =====<br /> <br /> So, we now have some sets, but we need to give them unique IDs so that we can retrieve stats for these sets. To do this, we add the following to epstats_set_member_codes.txt:<br /> <br /> author_1 js<br /> author_2 hj<br /> group_1 senses<br /> eprint_1 1<br /> eprint_2 2<br /> eprint_3 3<br /> <br /> IRStats now assigns the following unique IDs to each set: author_js, author_hj, group_senses, eprint_1, eprint_2, eprint_3. Note that the IDs should probably be kept alphanumeric, and must be unique within a class of sets (but you can have author_hj, group_hj and eprint_hj).<br /> <br /> ===== Citations =====<br /> <br /> IRStats uses two citations for each set member, one short and one long. Which you use depends on how you would like your visualisation to look. 
However, we do need to add these to the citations files:<br /> <br /> epstats_set_member_short_citations.txt<br /> author_1 Smith<br /> <br /> epstats_set_member_full_citations.txt<br /> author_1 Dr John Smith, PhD<br /> <br /> Note that the above examples are only for author_1. It would be exactly the same for any set member.<br /> <br /> ===== URLs =====<br /> <br /> Although URLs are not currently implemented, it is probably a good idea to include this information (in epstats_set_member_urls.txt) for future functionality.<br /> <br /> author_1 http://homepage.john.smith.com/<br /> <br /> == Installing IRStats ==<br /> <br /> <br /> <br /> === Dependencies ===<br /> <br /> ==== Logfile::EPrints ====<br /> <br /> The Logfile::Eprints modules are used to assist in filtering the raw access log. They can be installed from CPAN.<br /> <br /> ==== AWStats ====<br /> <br /> AWStats data is used to filter out webspiders and classify search engines. The irstats.cfg must have an entry showing where the correct perl modules are.<br /> <br /> ==== Geo::IP ====<br /> <br /> Geo::IP is used to fill in country and organisation information. The country database is free, but if you want organisation information, you will have to purchase a subscription for their database. The location of the database should also be inserted into irstats.cfg.<br /> <br /> Note: The pure perl version of Geo::IP does not support organisations.<br /> <br /> === Installing ===<br /> <br /> === Customising ===<br /> <br /> It will almost always be necessary to perform some customisation on IRStats because every repository is different.<br /> <br /> ==== Updating the Table ====</div>
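To make the set membership format from the "Creating sets" example concrete (an id, a tab, then a comma-separated list of eprint ids), here is a minimal Perl sketch that reads such lines into a hash. It is not IRStats' own loader; the function name parse_set_membership is an assumption made for this example.

```perl
use strict;
use warnings;

# Hypothetical parser (not IRStats code) for lines of the form
# <id><tab><comma-separated eprint ids>.
sub parse_set_membership {
    my ($text) = @_;
    my %members;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*$/;                    # skip blank lines
        my ($set_id, $csv) = split /\t/, $line, 2;   # id, then the id list
        next unless defined $csv;                    # ignore malformed lines
        $members{$set_id} = [ split /\s*,\s*/, $csv ];
    }
    return \%members;
}

# The sample repository from the example above.
my $example = join "\n",
    "author_1\t1,2",
    "author_2\t2,3",
    "group_1\t1,2,3",
    "eprint_1\t1";

my $sets = parse_set_membership($example);
printf "%s: %s\n", $_, join(",", @{ $sets->{$_} }) for sort keys %$sets;
```

The other epstats_set_member_*.txt files shown above also appear to be id-plus-value pairs, so a similar loop without the comma split would probably suit them, assuming they are tab-separated in the same way.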

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats_Technical_Documentation&diff=4356 IRStats Technical Documentation 2007-05-30T19:41:22Z

<p>Gobfrey: </p> <hr /> <div>= Directory Structure =<br /> <br /> == /opt/irstats/bin ==<br /> Contains the scripts needed to update the table.<br /> <br /> *daily_update.sh - Runs all the scripts in the right order.<br /> *extract_metadata_from_archive.pl - Extracts eprint, author and group metadata from the repository by iterating over every eprint.<br /> *update_table.pl - Filters and processes new entries in the accesslog to update the irstats_true_accesses_table. Uses 'SearchParser.pm' and 'repeatscache'.<br /> *convert_ip_to_host.pl - Attempts to convert IP addresses of the new entries in irstats_true_accesses_table to hostnames. Uses 'host_updated' to keep track of where it got to last time.<br /> <br /> Note that most of these scripts probably need to be tidied up. They were written in a hurry and were never polished.<br /> <br /> == /opt/irstats/cache ==<br /> Contains cache files. These should probably be deleted whenever the database is updated.<br /> <br /> == /opt/irstats/cgi ==<br /> <br /> Contains two scripts, 'get_view' and 'stats'.<br /> <br /> *get_view returns the output of an IRStats::View (see below), which is currently a chunk of html or csv, but could be almost anything.<br /> *stats is a handy cgi form that passes arguments to get_view<br /> <br /> == /opt/irstats/img ==<br /> <br /> Conceptually, where any images would be kept (e.g. national flags). At the moment, only the img/graphs directory is used. This is where generated graphs are stored.<br /> <br /> == /opt/irstats/cfg ==<br /> <br /> Where the configuration file and the text files containing repository data are held.<br /> <br /> === The Configuration File ===<br /> <br /> irstats.cfg contains a number of configuration strings. 
Here are some of the more important ones, with the default in brackets:<br /> <br /> *configuration_path (/opt/irstats/cfg/) - The path of the configuration directory.<br /> *view_path (/opt/irstats/perl_lib/IRStats/View/) - The directory containing the Views.<br /> *cache_path (/opt/irstats/cache/) - The directory in which to store cache files.<br /> *graph_path (/opt/irstats/img/graphs/) - The directory in which to store graph images.<br /> *graph_relative_url_path (/img/graphs/) - The url of the directory in which the graph file is from the point of view of the web browser.<br /> *update_lock_filename (/opt/irstats/bin/.lock) - The name of the file that is created to prevent the update process running twice concurrently<br /> *The names of the files used to store set information<br /> **set_member_full_citations_file (/opt/irstats/cfg/irstats_set_member_full_citations.txt)<br /> **set_member_short_citations_file (/opt/irstats/cfg/irstats_set_member_short_citations.txt)<br /> **set_membership_file (/opt/irstats/cfg/irstats_set_membership.txt)<br /> **set_member_codes_file (/opt/irstats/cfg/irstats_set_member_codes.txt)<br /> **set_member_urls_file (/opt/irstats/cfg/irstats_set_member_urls.txt)<br /> *Referrer Scope Labels (note, if you change these, you should also change them in the database)<br /> **referrer_scope_1 (Internal)<br /> **referrer_scope_2 (ECS)<br /> **referrer_scope_3 (Search)<br /> **referrer_scope_4 (External)<br /> **referrer_scope_no_referrer (None)<br /> *awstats_search_engines (/usr/local/awstats/wwwroot/cgi-bin/lib/search_engines.pm) - The path to the awstats search engine module<br /> *repeats_filter_file (/opt/irstats/bin/repeatscache) - The file to maintain state between updates<br /> *repeats_filter_timeout (86400) - repeat timeout in seconds (the amount of time there needs to be between two hits for them both to be recorded, initially set to 60*60*24)<br /> <br /> *repository_url = http://eprints.ecs.soton.ac.uk - the path to the 
repository<br /> <br /> *database configuration<br /> **database_driver (mysql)<br /> **database_server (localhost)<br /> **database_name<br /> **database_user<br /> **database_password<br /> <br /> *database_id_columns ([ requester_organisation, requester_host, referrer_scope, search_engine, search_terms, referring_entity_id ]) - The columns in the database that have a UID rather than data. These need separate tables in which to store the data.<br /> <br /> *Various table names and parts of names<br /> **database_eprints_access_log_table (accesslog) ##Perhaps remove after update rewrite.<br /> **database_main_stats_table (irstats_true_accesses_table)<br /> **database_column_table_prefix (irstats_column_)<br /> **database_set_table_prefix (irstats_set_)<br /> **database_set_table_code_suffix (_code)<br /> **database_set_table_citation_suffix (_citation)<br /> <br /> *id_parameters ([ start_date, end_date, eprints, view ]) - the parameters that are used to uniquely identify a view<br /> *host_lookup_temp_dir (/opt/irstats/bin/convert_hosts_temp_files/) - The directory in which to store temp files for host lookups<br /> <br /> <br /> == /opt/irstats/perl_lib ==<br /> <br /> Contains all the irstats classes.<br /> <br /> = IRStats Classes =<br /> <br /> Note that the leading IRStats:: has been left out for brevity.<br /> <br /> == Configuration ==<br /> This object acts as an interface to the configuration file.<br /> === Configuration Constants ===<br /> *$configuration_file - The path to the configuration file.<br /> === Functions ===<br /> *new - Parses the configuration file and returns a new object.<br /> *get_value(config_id) - Returns a value.<br /> <br /> == Params ==<br /> This object holds the parameters that are used to generate the statistics. 
This is passed around the system.<br /> === Configuration Constants ===<br /> *$defaults - Any default parameters you wish to set.<br /> <br /> === Functions ===<br /> *new(Configuration, [ CGI_object | params_hash ]) - returns new object<br /> *mask(params_hash) - used when you want to temporarily overwrite parameter(s). Overwrites values with contents of params_hash. Overwritten values get pushed onto a stack.<br /> *unmask - Sets parameters back to how they were before the last mask.<br /> *get(param_name) - returns the value of a single parameter.<br /> *create_id - Uses MD5 to create a unique ID from the id_params (see Constants above). This is called whenever get('id') is called.<br /> <br /> == DatabaseInterface ==<br /> This object does what it says on the tin. Any access to the database is done through it.<br /> <br /> === Functions ===<br /> *new(Configuration) - returns object.<br /> *retreive_set_names() - returns a list of eprint sets. This can be used to verify cgi input.<br /> *get_membership(eprint_id, set_name) - For a given eprint ID, returns the members of the named set that it belongs to. For example, we can find out which authors eprint 12614 has by get_membership(12614, 'author').<br /> *get_citation(id, set, length) - returns a citation. Every set member (eprint, author, group) has two citations, short and full. The short citation is returned only if length == 'short'. So, to get the short citation of group 3: get_citation(3,'group','short').<br /> *get_stats(params_object, query_params_hash) - returns a dbi object containing the stats we are interested in, i.e. those within the params_object's date range and eprint sets, restricted to the columns in the query params hash. 
The query params hash can contain the following key/value pairs<br /> **columns => column_name_array - Which columns are we interested in?<br /> **order => column_name - A hash containing a column name and a direction (ASC or DESC)<br /> **limit => int - How many results to return<br /> **group_by => column_name - if we need to group by a column.<br /> **where => where_hash_array - if additional logic needs to be applied, this array contains hashes containing a column name, an operator and a value. These are ANDed together.<br /> *check_tables() - If any IRStats tables are missing, this function will create them.<br /> *insert_main_table_row(column_array) - inserts the values in the array into the main table (taking into account any tables that contain only IDs).<br /> *do_sql(sql_query_string) - takes a string and performs a query, returning the dbi object containing the results. This is the only point where sql is sent to the database.<br /> <br /> == Date ==<br /> A date object was implemented because there were some specific things that needed to be done with dates.<br /> <br /> ===Functions===<br /> *new(date_hash) - Creates a new date object when passed a hash with the keys 'day', 'month' and 'year'.<br /> *validate() - If the date is not valid, it will be modified to a sensible value. E.g. if it's Feb 30th, it will be modified to Feb 29th or 28th, depending on whether it's a leap year.<br /> *set(part_name, int) - Sets part of the date ('year','month' or 'day') to a specific value.<br /> *decrement(period) - decrements the date by a period ('day', 'week', 'month', 'quarter', 'year'). Calls the mod_date function, which does the muscle work.<br /> *increment(period) - increments the date by a period by calling mod_date.<br /> *part(part_name, style) - Returns the day, month or year. For month, if style=='text', returns a three letter string, otherwise returns an integer. 
For year, if style=='short', returns the last two digits, otherwise returns all four.<br /> *difference(date_object) - returns the difference in days between itself and another date.<br /> *less_than(date_object) - compares itself to another date object. Returns 1 if it is less than the other date, otherwise returns 0.<br /> *greater_than(date_object) - compares itself to another date object. Returns 1 if it is greater than the other date, otherwise returns 0.<br /> *month_name() - returns the three letter string of the month.<br /> *render(format_string) - returns a date string. Format can be:<br /> **'short' - Calls render_abbreviated - returns a date like this: 05-Jul-77<br /> **'numerical' (default) - Calls render_numerical - returns a date like this: 19770705<br /> *clone - returns a new, identical date object.<br /> <br /> == Cache ==<br /> The interface to the cache.<br /> <br /> === Functions ===<br /> *new(id) - takes the ID of the params object we're using at the moment.<br /> *exists() - returns true if there's a cached file, false if there isn't one.<br /> *write(visualisation_object) - writes the data to the cache file.<br /> *read() - returns the data from the cache.<br /> <br /> == Periods ==<br /> The Periods object is used when you want to break a date range down into sub-ranges. Used with the params->mask() function, stats can be retrieved for periods inside a date range.<br /> ===Functions===<br /> *new(start_date_obj, end_date_obj) - doesn't do anything, just returns the object.<br /> <br /> The following functions all return an array of hashes. 
Each hash has the keys 'start_date' and 'end_date', and the values are both IRStats::Date objects.<br /> <br /> *calandar_months - Returns full months (each element starts on the 1st, and ends on the last day).<br /> *months - Returns month periods (if the start_date is the 15th, then each period starts on the 15th and ends on the 14th of the next month - except the last period, which only has about a 1/30 chance of doing so).<br /> *weeks - returns 7-day periods (except the last, which has a 1/7 chance of being 7 days long).<br /> *days - returns single days (for each period, the start_date and end_date are the same).<br /> <br /> == UserInterface::Controls ==<br /> This is used to generate the drop boxes in the stats cgi script.<br /> ===Functions===<br /> *new(params_obj, database_interface_object) - returns the object.<br /> *start_date_control() - returns the html for the three drop-boxes for selecting the year, month and day of the start date.<br /> *end_date_control() - returns the html for the three drop-boxes for selecting the year, month and day of the end date.<br /> *eprint_control() - returns the html for the eprints text box.<br /> *drop_box(id, contents_array) - returns the html for a drop box containing what is in the array (each array element is a hash containing 'value' and 'display').<br /> <br /> == View ==<br /> A view processes the stats data filtered by the parameters and creates a visualisation. 
It is intended that savvy users create their own views.<br /> <br /> === Functions ===<br /> All views inherit:<br /> *new(params_obj, database_interface_object) - returns the object.<br /> *render - calls populate, then returns whatever the visualisation renders<br /> All views must implement:<br /> *new - passes arguments to superclass, then calls 'initialise'.<br /> *initialise - the Configuration Constants are set here.<br /> *populate - queries the database and fills the visualisation; this is the engine that powers IRStats.<br /> <br /> === View::DownloadCountHTML ===<br /> The DownloadCountHTML is an extremely simple view. It retrieves one row from the database and does no processing, making it ideal for a quick walkthrough:<br /> <br /> ==== Housekeeping ====<br /> At the top of the file, we need:<br /> package IRStats::View::DownloadCountHTML;<br /> use strict;<br /> use warnings;<br /> Next, the modules we will use. I've included perlchartdir, the graph making package, even though we're not using it.<br /> use IRStats::DatabaseInterface;<br /> use IRStats::Cache;<br /> use IRStats::Visualisation::HTML;<br /> use IRStats::View;<br /> use perlchartdir;<br /> And link to superclass. 
<br /> our @ISA = qw/ IRStats::View /;<br /> <br /> ==== Configuration Constants ====<br /> We aren't actually interested in any columns, just in the count, but we put that in the columns array anyway.<br /> We also create our visualisation here.<br /> sub initialise<br /> {<br /> my ($self) = @_;<br /> $self->{'sql_params'} = {columns => [ 'COUNT' ]};<br /> $self->{'visualisation'} = IRStats::Visualisation::HTML->new();<br /> }<br /> <br /> ==== new ====<br /> The new function shouldn't ever need to be any different from this:<br /> sub new<br /> {<br /> my( $class, $params, $database ) = @_;<br /> my $self = $class->SUPER::new($params, $database);<br /> $self->initialise();<br /> return $self;<br /> }<br /> <br /> ==== populate ====<br /> Almost every populate function should start by checking the cache.<br /> <br /> sub populate<br /> {<br /> my ($self) = @_;<br /> my $cache = IRStats::Cache->new($self->{'params'}->get('id'));<br /> if ($cache->exists)<br /> {<br /> $self->{'visualisation'} = $cache->read();<br /> return;<br /> }<br /> Next, we have to retrieve the data from the database:<br /> <br /> my $query = $self->{'database'}->get_stats(<br /> $self->{'params'},<br /> $self->{'sql_params'}<br /> );<br /> Now we process the results. In this case, we don't even need a loop as we know there's only going to be one row. We'll stick the result straight into some html, and save it. Don't forget that if there isn't any data, you still have to output something.<br /> my @row = $query->fetchrow_array();<br /> my $html = '<span class="irstats_view_fulltextcounthtml">' . ($row[1] ? $row[1] : '0') . 
"</span>";<br /> A little housekeeping:<br /> $query->finish();<br /> Pop the data into the visualisation:<br /> $self->{'visualisation'}->set('html',$html);<br /> Finally, we should write to the cache so we don't have to query the database next time.<br /> $cache->write($self->{'visualisation'});<br /> }<br /> <br /> And that's a really simple view.<br /> <br /> === Using Periods ===<br /> <br /> If we wanted to break our date range into periods, we'd need to do something like this:<br /> <br /> my $periods = IRStats::Periods->new($self->{'params'}->{'start_date'},$self->{'params'}->{'end_date'});<br /> foreach my $period ( @{$periods->calandar_months()} )<br /> {<br /> $self->{'params'}->mask($period);<br /> my $query = $self->{'database'}->get_stats(<br /> $self->{'params'},<br /> $self->{'sql_params'}<br /> );<br /> $self->{'params'}->unmask();<br /> #process and put into variables<br /> }<br /> <br /> == Visualisation ==<br /> Currently Visualisations are Graph, Table or HTML. These are what the user will look at in the browser or download (in the case of CSV).<br /> <br /> === Functions ===<br /> All visualisations inherit:<br /> *new(data_hash) - a hash can optionally be passed containing the values that would otherwise be set using the 'set' function.<br /> *set(param_name, value) - sets the named parameter to the given value - see subclasses<br /> <br /> All visualisations must implement:<br /> *render() - returns what will be passed to the script.<br /> <br /> <br /> == Visualisation::HTML ==<br /> The simplest visualisation. 
Just a chunk of html.<br /> <br /> To Populate:<br /> *set('html', html_string) - takes the html as a string.<br /> <br /> == Visualisation::Table ==<br /> The Visualisation::Table currently just passes the buck to its superclass.<br /> <br /> There are currently three table Visualisations:<br /> <br /> === Visualisation::Table::CSV ===<br /> Returns a CSV table.<br /> <br /> To Populate:<br /> *set('headings', headings_arrayref) - pass an array containing headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> <br /> === Visualisation::Table::HTML ===<br /> A basic HTML table.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> And then optionally<br /> *set('totals', totals_arrayref) - an array of totals to put at the bottom of the table.<br /> <br /> === Visualisation::Table::HTML_Columned ===<br /> An HTML table that is rendered in several columns.<br /> ==== Configuration Constants ====<br /> $default_number_of_rows - an int representing the maximum number of rows the table should have. This is to prevent sending huge tables to browsers which may not be able to handle it.<br /> <br /> ==== Overridden Functions ====<br /> *new(data_hash, number_of_rows) - Both data_hash and number_of_rows are optional. Both can be set with 'set'.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> *set('number_of_rows', int) - set the maximum number of rows the table should have.<br /> <br /> == Visualisation::Graph ==<br /> The graph objects all use Chart Director to generate graphs. 
The Graph object initialises the colours that the graph may be using.<br /> <br /> Every graph must be created with at least the filename:<br /> *new({filename => string}) - the filename comes from the ID of the param object.<br /> <br /> === Configuration Constants ===<br /> These are set in the 'new' function.<br /> *$graph_dir - the path to the directory where the image file will be saved.<br /> *$url_relative - this will have the filename added to the end and put in the img html tag.<br /> <br /> === Sub Classes ===<br /> Note that in the Visualisation/Graph/ directory, there is 'GraphLegend.pm'. This is used to create the html for the graph legends.<br /> <br /> ====Visualisation::Graph::Bar.pm====<br /> A Bar Graph. It can have one or more bars in each division of the x axis.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('x_title',string) - The title of the x axis.<br /> *set('y_title',string) - The title of the y axis.<br /> *set('x_labels',array_ref) - an array containing the labels for the x axis<br /> *set('data_series', array_ref) - an array of arrayrefs, referencing data for each set of bars<br /> <br /> ====Visualisation::Graph::Line.pm====<br /> A Line Graph. There can be many lines on it.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('x_title',string) - The title of the x axis.<br /> *set('y_title',string) - The title of the y axis.<br /> *set('x_labels',array_ref) - an array containing the labels for the x axis<br /> *set('data_series', array_ref) - an array of arrayrefs, referencing data for each line<br /> <br /> ====Visualisation::Graph::Pie.pm====<br /> A Pie Graph<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('data_series', array_ref) - an array of hashrefs, {data => int, citation => string}, one for each slice</div>
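The mask/unmask behaviour described for the Params object (overwritten values are pushed onto a stack, and unmask restores the values from before the last mask) can be sketched roughly as below. This is only an illustration: Sketch::Params is a hypothetical stand-in and not the IRStats::Params source, and the parameter values are made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for IRStats::Params; only the mask/unmask idea is shown.
package Sketch::Params;

sub new {
    my ($class, %values) = @_;
    return bless { values => {%values}, stack => [] }, $class;
}

sub get {
    my ($self, $name) = @_;
    return $self->{values}{$name};
}

# Overwrite some parameters, remembering the values they replace.
sub mask {
    my ($self, $overrides) = @_;
    my %saved;
    for my $name (keys %$overrides) {
        $saved{$name} = $self->{values}{$name};
        $self->{values}{$name} = $overrides->{$name};
    }
    push @{ $self->{stack} }, \%saved;
    return;
}

# Restore the values saved by the most recent mask (no-op if nothing is masked).
sub unmask {
    my ($self) = @_;
    my $saved = pop @{ $self->{stack} } or return;
    $self->{values}{$_} = $saved->{$_} for keys %$saved;
    return;
}

package main;

my $params = Sketch::Params->new(start_date => '20070101', end_date => '20071231');
$params->mask({ start_date => '20070301', end_date => '20070331' });
my $masked = $params->get('start_date');    # value while the mask is active
$params->unmask();
my $restored = $params->get('start_date');  # original value again
print "$masked $restored\n";
```

Because the saved values live on a stack, masks can be nested, which is what lets a view narrow the date range to one period, query, and then return to the full range.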

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats_Technical_Documentation&diff=4355 IRStats Technical Documentation 2007-05-30T19:17:08Z

<p>Gobfrey: </p> <hr /> <div>= Directory Structure =<br /> <br /> == /opt/irstats/bin ==<br /> Contains the scripts needed to update the table.<br /> <br /> *daily_update.sh - Runs all the scripts in the right order.<br /> *extract_metadata_from_archive.pl - Extracts eprint, author and group metadata from the repository by iterating over every eprint.<br /> *update_table.pl - Filters and processes new entries in the accesslog to update the irstats_true_accesses_table. Uses 'SearchParser.pm' and 'repeatscache'.<br /> *convert_ip_to_host.pl - Attempts to convert IP addresses of the new entries in irstats_true_accesses_table to hostnames. Uses 'host_updated' to keep track of where it got to last time.<br /> <br /> Note that most of these scripts probably need to be tidied up. They were written in a hurry and were never polished.<br /> <br /> == /opt/irstats/cache ==<br /> Contains cache files. These should probably be deleted whenever the database is updated.<br /> <br /> == /opt/irstats/cgi ==<br /> <br /> Contains two scripts, 'get_view' and 'stats'.<br /> <br /> *get_view returns the output of an IRStats::View (see below), which is currently a chunk of html or csv, but could be almost anything.<br /> *stats is a handy cgi form that passes arguments to get_view<br /> <br /> == /opt/irstats/img ==<br /> <br /> Conceptually, where any images would be kept (e.g. national flags). At the moment, only the img/graphs directory is used. This is where generated graphs are stored.<br /> <br /> == /opt/irstats/cfg ==<br /> <br /> Where the configuration file and the text files containing repository data are held.<br /> <br /> === The Configuration File ===<br /> <br /> irstats.cfg contains a number of configuration strings. 
Here are some of the more important ones, with the default in brackets:<br /> <br /> *configuration_path (/opt/irstats/cfg/) - The path of the configuration directory.<br /> *view_path (/opt/irstats/perl_lib/IRStats/View/) - The directory containing the Views.<br /> *cache_path (/opt/irstats/cache/) - The directory in which to store cache files.<br /> *graph_path (/opt/irstats/img/graphs/) - The directory in which to store graph images.<br /> *graph_relative_url_path (/img/graphs/) - The url of the directory in which the graph file is from the point of view of the web browser.<br /> *update_lock_filename (/opt/irstats/bin/.lock) - The name of the file that is created to prevent the update process running twice concurrently<br /> *The names of the files used to store set information<br /> **set_member_full_citations_file (/opt/irstats/cfg/irstats_set_member_full_citations.txt)<br /> **set_member_short_citations_file (/opt/irstats/cfg/irstats_set_member_short_citations.txt)<br /> **set_membership_file (/opt/irstats/cfg/irstats_set_membership.txt)<br /> **set_member_codes_file (/opt/irstats/cfg/irstats_set_member_codes.txt)<br /> **set_member_urls_file (/opt/irstats/cfg/irstats_set_member_urls.txt)<br /> *Referrer Scope Labels (note, if you change these, you should also change them in the database)<br /> **referrer_scope_1 (Internal)<br /> **referrer_scope_2 (ECS)<br /> **referrer_scope_3 (Search)<br /> **referrer_scope_4 (External)<br /> **referrer_scope_no_referrer (None)<br /> *awstats_search_engines (/usr/local/awstats/wwwroot/cgi-bin/lib/search_engines.pm) - The path to the awstats search engine module<br /> *repeats_filter_file (/opt/irstats/bin/repeatscache) - The file to maintain state between updates<br /> *repeats_filter_timeout (86400) - repeat timeout in seconds (the amount of time there needs to be between two hits for them both to be recorded, initially set to 60*60*24)<br /> <br /> *repository_url = http://eprints.ecs.soton.ac.uk - the path to the 
repository<br /> <br /> *database configuration<br /> **database_driver (mysql)<br /> **database_server (localhost)<br /> **database_name<br /> **database_user<br /> **database_password<br /> <br /> *database_id_columns ([ requester_organisation, requester_host, referrer_scope, search_engine, search_terms, referring_entity_id ]) - The columns in the database that have a UID rather than data. These need separate tables in which to store the data.<br /> <br /> *Various table names and parts of names<br /> **database_eprints_access_log_table (accesslog) ##Perhaps remove after update rewrite.<br /> **database_main_stats_table (irstats_true_accesses_table)<br /> **database_column_table_prefix (irstats_column_)<br /> **database_set_table_prefix (irstats_set_)<br /> **database_set_table_code_suffix (_code)<br /> **database_set_table_citation_suffix (_citation)<br /> <br /> *id_parameters ([ start_date, end_date, eprints, view ]) - the parameters that are used to uniquely identify a view<br /> *host_lookup_temp_dir (/opt/irstats/bin/convert_hosts_temp_files/) - The directory in which to store temp files for host lookups<br /> <br /> <br /> == /opt/irstats/perl_lib ==<br /> <br /> Contains all the irstats classes.<br /> <br /> = IRStats Classes =<br /> <br /> Note that the leading IRStats:: has been left out for brevity.<br /> <br /> == Configuration ==<br /> This object acts as an interface to the configuration file.<br /> === Configuration Constants ===<br /> *$configuration_file - The path to the configuration file.<br /> === Functions ===<br /> *new - Parses the configuration file and returns a new object.<br /> *get_value(config_id) - Returns a value.<br /> <br /> == Params ==<br /> This object holds the parameters that are used to generate the statistics. 
This is passed around the system.<br /> === Configuration Constants ===<br /> *$defaults - Any default parameters you wish to set.<br /> <br /> === Functions ===<br /> *new(Configuration, [ CGI_object | params_hash ]) - returns a new object.<br /> *mask(params_hash) - used when you want to temporarily overwrite parameter(s). Overwrites values with contents of params_hash. Overwritten values get pushed onto a stack.<br /> *unmask - Sets parameters back to how they were before the last mask.<br /> *get(param_name) - returns the value of a single parameter.<br /> *create_id - Uses MD5 to create a unique ID from the id_params (see Constants above). This is called whenever get('id') is called.<br /> <br /> == DatabaseInterface ==<br /> This object does what it says on the tin. Any access to the database is done through it.<br /> <br /> === Functions ===<br /> *new(Configuration) - returns object.<br /> *retreive_set_names() - returns a list of eprint sets. This can be used to verify cgi input.<br /> *get_membership(eprint_id, set_name) - For a given eprint ID, returns the members of the named set that it belongs to. For example, we can find out which authors eprint 12614 has by get_membership(12614, 'author').<br /> *get_citation(id, set, length) - returns a citation. Every set member (eprint, author, group) has two citations: short and full. We only return a short citation if length == 'short'. So, to get the short citation of group 3: get_citation(3,'group','short').<br /> *get_stats(params_object, query_params_hash) - returns a dbi object containing the stats we are interested in, i.e. those for the params_object's date range and eprint sets, and only the columns in the query params hash. 
The query params hash can contain the following key/value pairs:<br /> **columns => column_name_array - Which columns are we interested in?<br /> **order => column_name - A hash containing a column name and direction (ASC or DESC)<br /> **limit => int - How many results to return<br /> **group_by => column_name - if we need to group by a column.<br /> **where => where_hash_array - if additional logic needs to be applied, this array contains hashes containing a column name, an operator and a value. These are ANDed together.<br /> *check_tables() - If any IRStats tables are missing, this function will create them.<br /> *insert_main_table_row(column_array) - inserts the values in the array into the main table (taking into account any tables that contain only IDs).<br /> *do_sql(sql_query_string) - takes a string and performs a query, returning the dbi object containing the results. This is the only point where sql is sent to the database.<br /> <br /> == Date ==<br /> A date object was implemented because there were some specific things that needed to be done with dates.<br /> <br /> ===Functions===<br /> *new(date_hash) - Creates a new date object when passed a hash with the keys 'day', 'month' and 'year'.<br /> *validate() - If the date is not valid, it will be modified to a sensible value. For example, if it's Feb 30th, it will be modified to Feb 29th or 28th, depending on whether it's a leap year.<br /> *set(part_name, int) - Sets part of the date ('year','month' or 'day') to a specific value.<br /> *decrement(period) - decrements the date by a period ('day', 'week', 'month', 'quarter', 'year'). Calls the mod_date function, which does the muscle work.<br /> *increment(period) - increments the date by a period, also by calling mod_date.<br /> *part(part_name, style) - Returns the day, month or year. For month, if style=='text', returns a three letter string, otherwise returns an integer. 
For year, if style=='short', returns the last two digits, otherwise returns all four.<br /> *difference(date_object) - returns the difference in days between itself and another date.<br /> *less_than(date_object) - compares itself to another date object. Returns 1 if it is less than the other date, otherwise returns 0.<br /> *greater_than(date_object) - compares itself to another date object. Returns 1 if it is greater than the other date, otherwise returns 0.<br /> *month_name() - returns the three letter string of the month.<br /> *render(format_string) - returns a date string. Format can be:<br /> **'short' - Calls render_abbreviated - returns a date like this: 05-Jul-77<br /> **'long' - Calls render_full (not implemented).<br /> **'numerical' (default) - Calls render_numerical - returns a date like this: 19770705<br /> *clone - returns a new, identical date object.<br /> <br /> <br /> == Cache ==<br /> The interface to the cache.<br /> <br /> === Configuration Constants ===<br /> *$cache_directory - a string containing a path to the directory in which the cache files are located.<br /> <br /> === Functions ===<br /> *new(id) - takes the ID of the params object we're using at the moment.<br /> *exists() - returns true if there's a cached file, false if there isn't one.<br /> *write(visualisation_object) - writes the data to the cache file.<br /> *read() - returns the data from the cache.<br /> <br /> == Periods ==<br /> The Periods object is used when you want to break a date range down into sub-ranges. Used with the params->mask() function, stats can be retrieved for periods inside a date range.<br /> ===Functions===<br /> *new(start_date_obj, end_date_obj) - doesn't do anything, just returns the object.<br /> <br /> The following functions all return an array of hashes. 
Each hash has the keys 'start_date' and 'end_date', and the values are both IRStats::Date objects.<br /> <br /> *calandar_months - Returns full months (each element starts on the 1st, and ends on the last day).<br /> *months - Returns month periods (if the start_date is the 15th, then each period starts on the 15th and ends on the 14th of the next month - except the last period, which only has about a 1/30 chance of doing so).<br /> *weeks - returns 7-day periods (except the last, which has a 1/7 chance of being 7 days long).<br /> *days - returns single days (for each period, the start_date and end_date are the same).<br /> <br /> == UserInterface::Controls ==<br /> This is used to generate the drop boxes in the stats cgi script. If I had more time I'd document it fully, but my daughter's going to be born in less than 12 hours.<br /> <br /> == Page (deprecated) ==<br /> Harks back to the day when a page object contained views.<br /> <br /> == View ==<br /> A view processes the stats data filtered by the parameters and creates a visualisation. It is intended that savvy users create their own views.<br /> <br /> === Functions ===<br /> All views inherit:<br /> *new(params_obj, database_interface_object) - returns the object.<br /> *render - calls populate, then returns whatever the visualisation renders<br /> All views must implement:<br /> *new - passes arguments to superclass, then calls 'initialise'.<br /> *initialise - the Configuration Constants are set here.<br /> *populate - queries the database and fills the visualisation; this is the engine that powers IRStats.<br /> <br /> === View::FullTextCountHTML ===<br /> The FullTextCountHTML is an extremely simple view. It retrieves one row from the database and does no processing, making it ideal for a quick walkthrough:<br /> <br /> ==== Housekeeping ====<br /> At the top of the file, we need:<br /> package IRStats::View::FullTextCountHTML;<br /> use strict;<br /> use warnings;<br /> Next, the modules we will use. 
I've included perlchartdir, the graph making package, even though we're not using it.<br /> use IRStats::DatabaseInterface;<br /> use IRStats::Cache;<br /> use IRStats::Visualisation::HTML;<br /> use IRStats::View;<br /> use perlchartdir;<br /> And link to superclass. <br /> our @ISA = qw/ IRStats::View /;<br /> <br /> ==== Configuration Constants ====<br /> We are interested in retrieving the fulltxt column and a count, as we will be aggregating. The sql_params are set so that we can filter on fulltext downloads, and we need to group because we are counting. We also create our visualisation here.<br /> sub initialise<br /> {<br /> my ($self) = @_;<br /> $self->{'sql_columns'} = [ 'fulltxt', 'COUNT(fulltxt)' ];<br /> $self->{'sql_params'} = {where => "fulltxt = 'F'", group_by => 'fulltxt'};<br /> $self->{'visualisation'} = IRStats::Visualisation::HTML->new();<br /> }<br /> <br /> ==== new ====<br /> The new function shouldn't ever need to be any different from this:<br /> sub new<br /> {<br /> my( $class, $params, $database ) = @_;<br /> my $self = $class->SUPER::new($params, $database);<br /> $self->initialise();<br /> return $self;<br /> }<br /> <br /> ==== populate ====<br /> Every populate function should start by checking the cache.<br /> <br /> sub populate<br /> {<br /> my ($self) = @_;<br /> my $cache = IRStats::Cache->new($self->{'params'}->get('id'));<br /> if ($cache->exists)<br /> {<br /> $self->{'visualisation'} = $cache->read();<br /> return;<br /> }<br /> Next, we have to retrieve the data from the database:<br /> <br /> my $query = $self->{'database'}->get_stats(<br /> $self->{'params'},<br /> $self->{'sql_columns'},<br /> $self->{'sql_params'}<br /> );<br /> Now we process the results. In this case, we don't even need a loop as we know there's only going to be one row. We'll stick the result straight into some html, and save it. 
Don't forget that if there isn't any data, you still have to output something.<br /> my @row = $query->fetchrow_array();<br /> my $html = '<span class="irstats_view_fulltextcounthtml">' . ($row[1] ? $row[1] : '0') . "</span>";<br /> A little housekeeping:<br /> $query->finish();<br /> Pop the data into the visualisation:<br /> $self->{'visualisation'}->set('html',$html);<br /> Finally, we should write to the cache so we don't have to query the database next time.<br /> $cache->write($self->{'visualisation'});<br /> }<br /> <br /> And that's a really simple view.<br /> <br /> === Using Periods ===<br /> <br /> If we wanted to break our date range into periods, we'd need to do something like this:<br /> <br /> my $periods = IRStats::Periods->new($self->{'params'}->{'start_date'},$self->{'params'}->{'end_date'});<br /> foreach my $period ( @{$periods->calandar_months()} )<br /> {<br /> $self->{'params'}->mask($period);<br /> my $query = $self->{'database'}->get_stats(<br /> $self->{'params'},<br /> $self->{'sql_columns'},<br /> $self->{'sql_params'}<br /> );<br /> $self->{'params'}->unmask();<br /> #process and put into variables<br /> }<br /> <br /> == Visualisation ==<br /> Currently Visualisations are Graph, Table or HTML. These are what the user will look at in the browser or download (in the case of CSV).<br /> <br /> === Functions ===<br /> All visualisations inherit:<br /> *new(data_hash) - a hash can optionally be passed containing the values that would otherwise be set using the 'set' function.<br /> *set(param_name, value) - sets the named parameter to the given value - see subclasses<br /> <br /> All visualisations must implement:<br /> *render() - returns what will be passed to the script.<br /> <br /> <br /> == Visualisation::HTML ==<br /> The simplest visualisation. 
Just a chunk of html.<br /> <br /> To Populate:<br /> *set('html', html_string) - takes the html as a string.<br /> <br /> == Visualisation::Table ==<br /> The Visualisation::Table currently just passes the buck to its superclass.<br /> <br /> There are currently three table Visualisations:<br /> <br /> === Visualisation::Table::CSV ===<br /> Returns a CSV table.<br /> <br /> To Populate:<br /> *set('headings', headings_arrayref) - pass an array containing headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> <br /> === Visualisation::Table::HTML ===<br /> A basic HTML table.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> And then optionally<br /> *set('totals', totals_arrayref) - an array of totals to put at the bottom of the table.<br /> <br /> === Visualisation::Table::HTML_Columned ===<br /> An HTML table that is rendered in several columns.<br /> ==== Configuration Constants ====<br /> $default_number_of_rows - an int representing the maximum number of rows the table should have.<br /> <br /> ==== Overridden Functions ====<br /> *new(data_hash, number_of_rows) - Both data_hash and number_of_rows are optional. Both can be set with 'set'.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> *set('number_of_rows', int) - set the maximum number of rows the table should have.<br /> <br /> == Visualisation::Graph ==<br /> The graph objects all use Chart Director to generate graphs. 
The Graph object initialises the colours that the graph may be using.<br /> <br /> Every graph must be created with at least the filename:<br /> *new({filename => string}) - the filename comes from the ID of the param object.<br /> <br /> === Configuration Constants ===<br /> These are set in the 'new' function.<br /> *$graph_dir - the path to the directory where the image file will be saved.<br /> *$url_relative - this will have the filename added to the end and put in the img html tag.<br /> <br /> === Sub Classes ===<br /> Note that in the Visualisation/Graph/ directory, there is 'GraphLegend.pm'. This is used to create the html for the graph legends.<br /> <br /> ====Visualisation::Graph::Bar.pm====<br /> A Bar Graph. It can have one or more bars in each division of the x axis.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('x_title',string) - The title of the x axis.<br /> *set('y_title',string) - The title of the y axis.<br /> *set('x_labels',array_ref) - an array containing the labels for the x axis<br /> *set('data_series', array_ref) - an array of arrayrefs, referencing data for each set of bars<br /> <br /> ====Visualisation::Graph::Line.pm====<br /> A Line Graph. There can be many lines on it.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('x_title',string) - The title of the x axis.<br /> *set('y_title',string) - The title of the y axis.<br /> *set('x_labels',array_ref) - an array containing the labels for the x axis<br /> *set('data_series', array_ref) - an array of arrayrefs, referencing data for each line<br /> <br /> ====Visualisation::Graph::Pie.pm====<br /> A Pie Graph<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('data_series', array_ref) - an array of hashrefs, {data => int, citation => string}, one for each slice</div>
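The documentation above says that create_id uses MD5 to build a unique ID from the id_parameters (start_date, end_date, eprints, view), and that this ID names the cache entry. A minimal sketch of that idea follows, using the core Digest::MD5 module. The helper name sketch_create_id, the '|' separator and the sample values are assumptions for illustration only; the real IRStats code may combine the parameters differently.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: derive a cache ID from the parameters listed in id_parameters.
sub sketch_create_id {
    my ($values, $id_parameters) = @_;
    # Fixed parameter order plus a separator, so equal inputs always give the same ID.
    my $key = join '|',
        map { defined $values->{$_} ? $values->{$_} : '' } @$id_parameters;
    return md5_hex($key);
}

# Parameter names taken from the id_parameters default; the values are made up.
my @id_parameters = qw(start_date end_date eprints view);
my %values = (
    start_date => '20070101',
    end_date   => '20070131',
    eprints    => 'eprint_12614',
    view       => 'FullTextCountHTML',
);

my $id_a = sketch_create_id(\%values, \@id_parameters);
my $id_b = sketch_create_id({ %values, view => 'Other' }, \@id_parameters);
print "$id_a\n";   # a 32-character hex string
```

Since the view name is part of the key, two views over the same date range and eprints get different IDs and therefore different cache files.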

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats_Technical_Documentation&diff=4354 IRStats Technical Documentation 2007-05-30T18:55:11Z

<p>Gobfrey: </p> <hr /> <div>= Directory Structure =<br /> <br /> == /opt/irstats/bin ==<br /> Contains the scripts needed to update the table.<br /> <br /> *daily_update.sh - Runs all the scripts in the right order.<br /> *extract_metadata_from_archive.pl - Extracts eprint, author and group metadata from the repository by iterating over every eprint.<br /> *update_table.pl - Filters and processes new entries in the accesslog to update the irstats_true_accesses_table. Uses 'SearchParser.pm' and 'repeatscache'.<br /> *convert_ip_to_host.pl - Attempts to convert IP addresses of the new entries in irstats_true_accesses_table to hostnames. Uses 'host_updated' to keep track of where it got to last time.<br /> <br /> Note that most of these scripts probably need to be tidied up. They were written in a hurry and were never polished.<br /> <br /> == /opt/irstats/cache ==<br /> Contains cache files. These should probably be deleted whenever the database is updated.<br /> <br /> == /opt/irstats/cgi ==<br /> <br /> Contains two scripts, 'get_view' and 'stats'.<br /> <br /> *get_view returns the output of an IRStats::View (see below), which is currently a chunk of html or csv, but could be almost anything.<br /> *stats is a handy cgi form that passes arguments to get_view<br /> <br /> == /opt/irstats/img ==<br /> <br /> Conceptually, where any images would be kept (e.g. national flags). At the moment, only the img/graphs directory is used. This is where generated graphs are stored.<br /> <br /> == /opt/irstats/cfg ==<br /> <br /> Where the configuration file and the text files containing repository data are held.<br /> <br /> === The Configuration File ===<br /> <br /> irstats.cfg contains a number of configuration strings. 
Here are some of the more important ones, with the default in brackets:<br /> <br /> *configuration_path (/opt/irstats/cfg/) - The path of the configuration directory.<br /> *view_path (/opt/irstats/perl_lib/IRStats/View/) - The directory containing the Views.<br /> *cache_path (/opt/irstats/cache/) - The directory in which to store cache files.<br /> *graph_path (/opt/irstats/img/graphs/) - The directory in which to store graph images.<br /> *graph_relative_url_path (/img/graphs/) - The URL of the graph directory as seen by the web browser.<br /> *update_lock_filename (/opt/irstats/bin/.lock) - The name of the file that is created to prevent the update process running twice concurrently.<br /> *The names of the files used to store set information<br /> **set_member_full_citations_file (/opt/irstats/cfg/irstats_set_member_full_citations.txt)<br /> **set_member_short_citations_file (/opt/irstats/cfg/irstats_set_member_short_citations.txt)<br /> **set_membership_file (/opt/irstats/cfg/irstats_set_membership.txt)<br /> **set_member_codes_file (/opt/irstats/cfg/irstats_set_member_codes.txt)<br /> **set_member_urls_file (/opt/irstats/cfg/irstats_set_member_urls.txt)<br /> *Referrer Scope Labels (note, if you change these, you should also change them in the database)<br /> **referrer_scope_1 (Internal)<br /> **referrer_scope_2 (ECS)<br /> **referrer_scope_3 (Search)<br /> **referrer_scope_4 (External)<br /> **referrer_scope_no_referrer (None)<br /> *awstats_search_engines (/usr/local/awstats/wwwroot/cgi-bin/lib/search_engines.pm) - The path to the awstats search engine module.<br /> *repeats_filter_file (/opt/irstats/bin/repeatscache) - The file used to maintain state between updates.<br /> *repeats_filter_timeout (86400) - The repeat timeout in seconds (the minimum time between two hits for both to be recorded, initially set to 60*60*24).<br /> <br /> *repository_url (http://eprints.ecs.soton.ac.uk) - the path to the 
repository<br /> <br /> *database configuration<br /> **database_driver (mysql)<br /> **database_server (localhost)<br /> **database_name<br /> **database_user<br /> **database_password<br /> <br /> *database_id_columns ([ requester_organisation, requester_host, referrer_scope, search_engine, search_terms, referring_entity_id ]) - The columns in the database that hold a UID rather than data. These need separate tables in which to store the data.<br /> <br /> *Various table names and parts of names<br /> **database_eprints_access_log_table (accesslog) ##Perhaps remove after update rewrite.<br /> **database_main_stats_table (irstats_true_accesses_table)<br /> **database_column_table_prefix (irstats_column_)<br /> **database_set_table_prefix (irstats_set_)<br /> **database_set_table_code_suffix (_code)<br /> **database_set_table_citation_suffix (_citation)<br /> <br /> *id_parameters ([ start_date, end_date, eprints, view ]) - the parameters that are used to uniquely identify a view<br /> *host_lookup_temp_dir (/opt/irstats/bin/convert_hosts_temp_files/) - The directory in which to store temp files for host lookups<br /> <br /> <br /> == /opt/irstats/perl_lib ==<br /> <br /> Contains all the irstats classes.<br /> <br /> = IRStats Classes =<br /> <br /> Note that the leading IRStats:: has been left out for brevity.<br /> <br /> == Params ==<br /> This object holds the parameters that are used to generate the statistics. It is passed around the system.<br /> === Configuration Constants ===<br /> *$defaults - Any default parameters you wish to set.<br /> <br /> === Functions ===<br /> *new(CGI_object) - returns new object<br /> *mask(params_hash) - used when you want to temporarily overwrite parameter(s). Overwrites values with contents of params_hash. 
Overwritten values get pushed onto a stack.<br /> *unmask - Sets parameters back to how they were before the last mask.<br /> *generate_cgi - returns a string containing the name of the cgi script, and all parameters, to enable the creation of links. (currently unused)<br /> *get(param_name) - returns the value of a single parameter.<br /> *create_id - Uses MD5 to create a unique ID from the id_params (see Constants above). This is called whenever get('id') is called.<br /> <br /> == DatabaseInterface ==<br /> This object does what it says on the tin. Any access to the database is done through it.<br /> <br /> <strong>IMPORTANT</strong> - the SQL generated was developed on a machine running MySQL 5. Installing on the EPrints server has broken this (as it's running MySQL 4). I placed a quick and dirty hack into the do_sql function, and modified the create_top_table function. I have no idea if this works well. <strong>IT NEEDS TO BE CHECKED</strong>.<br /> <br /> === Configuration Constants ===<br /> Constants are contained in the new function.<br /> <br /> *DBI Configuration Constants - $driver, $server, $database, $user, $password are all used to create the connection to the database.<br /> *source_table - The table in which the stats are stored.<br /> <br /> === Functions ===<br /> *new() - returns object.<br /> *retreive_set_names() - returns a list of eprint sets. Currently 'group' and 'author' are implemented. This is used to verify cgi input.<br /> *get_membership(eprint_id, set_name) - For a given eprint ID, returns the members of the named set that the eprint belongs to. For example, we can find out which authors eprint 12614 has by get_membership(12614, 'author').<br /> *get_citation(id, set, length) - returns a citation. Every set member (eprint, author, group) has two citations: short and full. We only return a short citation if length == 'short'. 
So, to get the short citation of group 3: get_citation(3,'group','short').<br /> *get_code($id,$set) - UNWRITTEN - Set members have codes. This is how they are identified by the user. For example, author_lac is the member of the author set whose code is lac. To get the code for group 3: get_code(3,'group').<br /> <br /> When retrieving statistics, IRStats filters by inner joining the irstats_true_accesses_table to other tables containing eprint IDs. Sometimes it has to create these tables.<br /> <br /> *create_top_table(param_object) - This creates a table containing the eprint IDs of the top X by fulltext download between two dates.<br /> *create_list_table(table_name, eprint_ids) - Takes two strings, one the name of the table, the other a space separated list of eprint IDs. Creates a temporary table.<br /> <br /> The following are the only two functions that actually make calls to the database. <br /> <br /> *do_sql(sql_query_string) - takes a string and performs a query, returning the dbi object containing the results.<br /> *insert_values(table_name, values) - inserts a row of data into a table.<br /> <br /> And finally, the meat and potatoes. The functions that return the statistics we're interested in.<br /> <br /> *get_stats(params_object, column_name_list, options_hash) - returns a dbi object containing the stats we are interested in, i.e. those matching the params_object's date range and eprints sets, and only the columns in column_name_list. The options hash can contain the following key/value pairs:<br /> **order => column_name - the column on which to order it. 
Append '-' or ' DESC' to order it descending.<br /> **limit => int - How many results to return<br /> **group_by => column_name - if we need to group by a column.<br /> get_stats works by examining the 'eprints' parameter and calling one of the following functions:<br /> **get_list_stats<br /> **get_top_stats<br /> **get_set_stats<br /> **get_all_stats<br /> These functions generate slightly different mysql queries, and pass them to the do_sql function.<br /> <br /> == Date ==<br /> I implemented a date object because there were some specific things I needed to do with dates.<br /> <br /> ===Functions===<br /> *new(date_hash) - Creates a new date object when passed a hash with the keys 'day', 'month' and 'year'.<br /> *validate() - If the date is not valid, it will be modified to a sensible value, e.g. if it's Feb 30th, it will be modified to Feb 29th or 28th, depending on whether it's a leap year.<br /> *set(part_name, int) - Sets part of the date ('year','month' or 'day') to a specific value.<br /> *decrement(period) - decrements the date by a period ('day', 'week', 'month', 'quarter', 'year'). Calls the mod_date function, which does the muscle work.<br /> *increment(period) - increments by calling mod_date.<br /> *part(part_name, style) - Returns the day, month or year. For month, if style=='text', returns a three letter string, otherwise returns an integer. For year, if style=='short', returns the last two digits, otherwise returns all four.<br /> *less_than(date_object) - compares itself to another date object. Returns 1 if it's less than it, otherwise returns 0.<br /> *greater_than(date_object) - compares itself to another date object. Returns 1 if it's greater than it, otherwise returns 0.<br /> *month_name() - returns the three letter string of the month.<br /> *render(format_string) - returns a date string. 
Format can be:<br /> **'short' - Calls render_abbreviated - returns a date like this: 05-Jul-77<br /> **'long' - Calls render_full (not implemented).<br /> **'numerical' (default) - Calls render_numerical - returns a date like this: 19770705<br /> *clone - returns a new, identical date object.<br /> <br /> <br /> == Cache ==<br /> The interface to the cache.<br /> <br /> === Configuration Constants ===<br /> *$cache_directory - a string containing a path to the directory in which the cache files are located.<br /> <br /> === Functions ===<br /> *new(id) - takes the ID of the params object we're using at the moment.<br /> *exists() - returns true if there's a cached file, false if there isn't one.<br /> *write(visualisation_object) - writes the data to the cache file.<br /> *read() - returns the data from the cache.<br /> <br /> == Periods ==<br /> The Periods object is used when you want to break a date range down into sub-ranges. Used with the params->mask() function, stats can be retrieved for periods inside a date range.<br /> ===Functions===<br /> *new(start_date_obj, end_date_obj) - doesn't do anything, just returns the object.<br /> <br /> The following functions all return an array of hashes. Each hash has the keys 'start_date' and 'end_date', and the values are both IRStats::Date objects.<br /> <br /> *calandar_months - Returns full months (each element starts on the 1st, and ends on the last day).<br /> *months - Returns month periods (if the start_date is the 15th, then each period starts on the 15th and ends on the 14th of the next month, except the last period, which may be shorter).<br /> *weeks - returns 7-day periods (except the last, which may be shorter than 7 days).<br /> *days - returns single days (for each period, the start_date and end_date are the same).<br /> <br /> == UserInterface::Controls ==<br /> This is used to generate the drop boxes in the stats cgi script. 
If I had more time I'd document it fully, but my daughter's going to be born in less than 12 hours.<br /> <br /> == Page (deprecated) ==<br /> Harks back to the day when a page object contained views.<br /> <br /> == View ==<br /> A view processes the stats data filtered by the parameters and creates a visualisation. It is intended that savvy users create their own views.<br /> <br /> === Functions ===<br /> All views inherit:<br /> *new(params_obj, database_interface_object) - returns the object.<br /> *render - calls populate, then returns whatever the visualisation renders<br /> All views must implement:<br /> *new - passes arguments to superclass, then calls 'initialise'.<br /> *initialise - the Configuration Constants are set here.<br /> *populate - The engine that powers IRStats.<br /> <br /> === View::FullTextCountHTML ===<br /> The FullTextCountHTML is an extremely simple view. It retrieves one row from the database and does no processing, making it ideal for a quick walkthrough:<br /> <br /> ==== Housekeeping ====<br /> At the top of the file, we need:<br /> package IRStats::View::FullTextCountHTML;<br /> use strict;<br /> use warnings;<br /> Now, the modules we will use. I've included perlchartdir, the graph making package, even though we're not using it.<br /> use IRStats::DatabaseInterface;<br /> use IRStats::Cache;<br /> use IRStats::Visualisation::HTML;<br /> use IRStats::View;<br /> use perlchartdir;<br /> And link to superclass. <br /> our @ISA = qw/ IRStats::View /;<br /> <br /> ==== Configuration Constants ====<br /> We are interested in retrieving the fulltxt column, and a count as we will be aggregating. The sql_params are set, so that we can filter on fulltext downloads, and we need to group as we are counting. 
We also create our visualisation here.<br /> sub initialise<br /> {<br /> my ($self) = @_;<br /> $self->{'sql_columns'} = [ 'fulltxt', 'COUNT(fulltxt)' ];<br /> $self->{'sql_params'} = {where => "fulltxt = 'F'", group_by => 'fulltxt'};<br /> $self->{'visualisation'} = IRStats::Visualisation::HTML->new();<br /> }<br /> <br /> ==== new ====<br /> The new function shouldn't ever need to be any different from this:<br /> sub new<br /> {<br /> my( $class, $params, $database ) = @_;<br /> my $self = $class->SUPER::new($params, $database);<br /> $self->initialise();<br /> return $self;<br /> }<br /> <br /> ==== populate ====<br /> Every populate function should start by checking the cache.<br /> <br /> sub populate<br /> {<br /> my ($self) = @_;<br /> my $cache = IRStats::Cache->new($self->{'params'}->get('id'));<br /> if ($cache->exists)<br /> {<br /> $self->{'visualisation'} = $cache->read();<br /> return;<br /> }<br /> Next, we have to retrieve from the database:<br /> <br /> my $query = $self->{'database'}->get_stats(<br /> $self->{'params'},<br /> $self->{'sql_columns'},<br /> $self->{'sql_params'}<br /> );<br /> Now we process them. In this case, we don't even need a loop as we know there's only going to be one row. We'll stick the result straight into some html, and save it. Don't forget that if there isn't any data, you still have to output something.<br /> my @row = $query->fetchrow_array();<br /> my $html = '<span class="irstats_view_fulltextcounthtml">' . ($row[1] ? $row[1] : '0') . 
"</span>";<br /> A little housekeeping:<br /> $query->finish();<br /> Pop the data into the visualisation:<br /> $self->{'visualisation'}->set('html',$html);<br /> Finally, we should write to the cache so we don't have to query the database next time.<br /> $cache->write($self->{'visualisation'});<br /> }<br /> <br /> And that's a really simple view.<br /> <br /> === Using Periods ===<br /> <br /> If we wanted to break our date range into periods, we'd need to do something like this:<br /> <br /> my $periods = IRStats::Periods->new($self->{'params'}->{'start_date'},$self->{'params'}->{'end_date'});<br /> foreach my $period ( @{$periods->calandar_months()} )<br /> {<br /> $self->{'params'}->mask($period);<br /> my $query = $self->{'database'}->get_stats(<br /> $self->{'params'},<br /> $self->{'sql_columns'},<br /> $self->{'sql_params'}<br /> );<br /> $self->{'params'}->unmask();<br /> #process and put into variables<br /> }<br /> <br /> == Visualisation ==<br /> Currently Visualisations are Graph, Table or HTML. These are what the user will look at in the browser or download (in the case of CSV).<br /> <br /> === Functions ===<br /> All visualisations inherit:<br /> *new(data_hash) - a hash can optionally be passed containing the values that would otherwise be set using the 'set' function.<br /> *set(param_name, value) - sets a named value; see the subclasses for the names each one accepts.<br /> <br /> All visualisations must implement:<br /> *render() - returns what will be passed to the script.<br /> <br /> <br /> == Visualisation::HTML ==<br /> The simplest visualisation. 
Just a chunk of html.<br /> <br /> To Populate:<br /> *set('html', html_string) - takes the html as a string.<br /> <br /> == Visualisation::Table ==<br /> The Visualisation::Table currently just passes the buck to its superclass.<br /> <br /> There are currently three table Visualisations:<br /> <br /> === Visualisation::Table::CSV ===<br /> Returns a CSV table.<br /> <br /> To Populate:<br /> *set('headings', headings_arrayref) - pass an array containing headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> <br /> === Visualisation::Table::HTML ===<br /> A basic HTML table.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> And then optionally<br /> *set('totals', totals_arrayref) - an array of totals to put at the bottom of the table.<br /> <br /> === Visualisation::Table::HTML_Columned ===<br /> An HTML table that is rendered in several columns.<br /> ==== Configuration Constants ====<br /> $default_number_of_rows - an int representing the maximum number of rows the table should have.<br /> <br /> ==== Overridden Functions ====<br /> *new(data_hash, number_of_rows) - Both data_hash and number_of_rows are optional. Both can be set with 'set'.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> *set('number_of_rows', int) - set the maximum number of rows the table should have.<br /> <br /> == Visualisation::Graph ==<br /> The graph objects all use Chart Director to generate graphs. 
The Graph object initialises the colours that the graph may use.<br /> <br /> Every graph must be created with at least the filename:<br /> *new({filename => string}) - the filename comes from the ID of the param object.<br /> <br /> === Configuration Constants ===<br /> These are set in the 'new' function.<br /> *$graph_dir - the path to the directory where the image file will be saved.<br /> *$url_relative - this will have the filename added to the end and put in the img html tag.<br /> <br /> === Sub Classes ===<br /> Note that in the Visualisation/Graph/ directory, there is 'GraphLegend.pm'. This is used to create the html for the graph legends.<br /> <br /> ====Visualisation::Graph::Bar.pm====<br /> A Bar Graph. It can have one or more bars in each division of the x axis.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('x_title',string) - The title of the x axis.<br /> *set('y_title',string) - The title of the y axis.<br /> *set('x_labels',array_ref) - an array containing the labels for the x axis.<br /> *set('data_series', array_ref) - an array of arrayrefs, referencing data for each set of bars.<br /> <br /> ====Visualisation::Graph::Line.pm====<br /> A Line Graph. There can be many lines on it.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('x_title',string) - The title of the x axis.<br /> *set('y_title',string) - The title of the y axis.<br /> *set('x_labels',array_ref) - an array containing the labels for the x axis.<br /> *set('data_series', array_ref) - an array of arrayrefs, referencing data for each line.<br /> <br /> ====Visualisation::Graph::Pie.pm====<br /> A Pie Graph.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('data_series', array_ref) - an array of hashrefs, {data => int, citation => string}, one for each slice.</div>
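Returning to the Params object described earlier on this page: mask pushes the overwritten values onto a stack and unmask restores them. A minimal sketch of that behaviour follows. Sketch::Params is a hypothetical stand-in written for this example, not the real IRStats::Params; it takes plain key/value pairs instead of a CGI object and uses date strings where the real class would hold IRStats::Date objects.

```perl
use strict;
use warnings;

# Hypothetical stand-in for IRStats::Params: only get, mask and unmask.
package Sketch::Params;

sub new
{
    my ($class, %values) = @_;
    return bless { values => { %values }, stack => [] }, $class;
}

sub get
{
    my ($self, $name) = @_;
    return $self->{values}->{$name};
}

# Overwrite the given parameters, remembering the old values on a stack.
sub mask
{
    my ($self, $overrides) = @_;
    my %saved;
    foreach my $name ( keys %$overrides )
    {
        $saved{$name} = $self->{values}->{$name};
        $self->{values}->{$name} = $overrides->{$name};
    }
    push @{ $self->{stack} }, \%saved;
    return;
}

# Restore the values that the most recent mask replaced.
sub unmask
{
    my ($self) = @_;
    my $saved = pop @{ $self->{stack} };
    return unless $saved;
    $self->{values}->{$_} = $saved->{$_} foreach keys %$saved;
    return;
}

package main;

my $params = Sketch::Params->new( start_date => '20070101', end_date => '20070531' );
$params->mask({ start_date => '20070201', end_date => '20070228' });
print "masked:   ", $params->get('start_date'), " to ", $params->get('end_date'), "\n";
$params->unmask();
print "unmasked: ", $params->get('start_date'), " to ", $params->get('end_date'), "\n";
```

Keeping the saved values on a stack means masks can be nested, which is what lets a view narrow the date range to one period, query, and then step back out again.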

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats_Technical_Documentation&diff=4353 IRStats Technical Documentation 2007-05-30T18:28:58Z

<p>Gobfrey: /* Directory Structure */</p> <hr /> <div><br /> = Directory Structure =<br /> <br /> == /opt/irstats/bin ==<br /> Contains the scripts needed to update the table.<br /> <br /> *daily_update.sh - Runs all the scripts in the right order.<br /> *extract_metadata_from_archive.pl - Extracts eprint, author and group metadata from the repository by iterating over every eprint.<br /> *update_table.pl - Filters and processes new entries in the accesslog to update the irstats_true_accesses_table. Uses 'SearchParser.pm' and 'repeatscache'.<br /> *convert_ip_to_host.pl - Attempts to convert IP addresses of the new entries in irstats_true_accesses_table to hostnames. Uses 'host_updated' to keep track of where it got to last time.<br /> <br /> Note that most of these scripts probably need to be tidied up. They were written in a hurry and were never polished.<br /> <br /> == /opt/irstats/cache ==<br /> Contains cache files. These should probably be deleted whenever the database is updated.<br /> <br /> == /opt/irstats/cgi ==<br /> <br /> Contains two scripts, 'get_view' and 'stats'.<br /> <br /> *get_view returns the output of an IRStats::View (see below), which is currently a chunk of html or csv, but could be almost anything.<br /> *stats is a handy cgi form that passes arguments to get_view.<br /> <br /> == /opt/irstats/img ==<br /> <br /> Conceptually, where any images would be kept (e.g. national flags). At the moment, only the img/graphs directory is used. This is where generated graphs are stored.<br /> <br /> == /opt/irstats/perl_lib ==<br /> <br /> Contains all the irstats classes.<br /> <br /> = IRStats Classes =<br /> <br /> Note that the leading IRStats:: has been left out for brevity.<br /> <br /> == Params ==<br /> This object holds the parameters that are used to generate the statistics. 
The most important of these are a date range and an eprint set.<br /> === Configuration Constants ===<br /> *$cgi_script - the name of the cgi script (currently unused)<br /> *$id_params - When generating an ID, which parameters are important.<br /> *$defaults - Any default parameters you wish to set.<br /> <br /> === Functions ===<br /> *new(CGI_object) - returns new object<br /> *mask(params_hash) - used when you want to temporarily overwrite parameter(s). Overwrites values with contents of params_hash. Overwritten values get pushed onto a stack.<br /> *unmask - Sets parameters back to how they were before the last mask.<br /> *generate_cgi - returns a string containing the name of the cgi script, and all parameters, to enable the creation of links. (currently unused)<br /> *get(param_name) - returns the value of a single parameter.<br /> *create_id - Uses MD5 to create a unique ID from the id_params (see Constants above). This is called whenever get('id') is called.<br /> <br /> == DatabaseInterface ==<br /> This object does what it says on the tin. Any access to the database is done through it.<br /> <br /> <strong>IMPORTANT</strong> - the SQL generated was developed on a machine running MySQL 5. Installing on the EPrints server has broken this (as it's running MySQL 4). I placed a quick and dirty hack into the do_sql function, and modified the create_top_table function. I have no idea if this works well. <strong>IT NEEDS TO BE CHECKED</strong>.<br /> <br /> === Configuration Constants ===<br /> Constants are contained in the new function.<br /> <br /> *DBI Configuration Constants - $driver, $server, $database, $user, $password are all used to create the connection to the database.<br /> *source_table - The table in which the stats are stored.<br /> <br /> === Functions ===<br /> *new() - returns object.<br /> *retreive_set_names() - returns a list of eprint sets. Currently 'group' and 'author' are implemented. 
This is used to verify cgi input.<br /> *get_membership(eprint_id, set_name) - For a given eprint ID, returns the members of the named set that the eprint belongs to. For example, we can find out which authors eprint 12614 has by get_membership(12614, 'author').<br /> *get_citation(id, set, length) - returns a citation. Every set member (eprint, author, group) has two citations: short and full. We only return a short citation if length == 'short'. So, to get the short citation of group 3: get_citation(3,'group','short').<br /> *get_code($id,$set) - UNWRITTEN - Set members have codes. This is how they are identified by the user. For example, author_lac is the member of the author set whose code is lac. To get the code for group 3: get_code(3,'group').<br /> <br /> When retrieving statistics, IRStats filters by inner joining the irstats_true_accesses_table to other tables containing eprint IDs. Sometimes it has to create these tables.<br /> <br /> *create_top_table(param_object) - This creates a table containing the eprint IDs of the top X by fulltext download between two dates.<br /> *create_list_table(table_name, eprint_ids) - Takes two strings, one the name of the table, the other a space separated list of eprint IDs. Creates a temporary table.<br /> <br /> The following are the only two functions that actually make calls to the database. <br /> <br /> *do_sql(sql_query_string) - takes a string and performs a query, returning the dbi object containing the results.<br /> *insert_values(table_name, values) - inserts a row of data into a table.<br /> <br /> And finally, the meat and potatoes. The functions that return the statistics we're interested in.<br /> <br /> *get_stats(params_object, column_name_list, options_hash) - returns a dbi object containing the stats we are interested in, i.e. those matching the params_object's date range and eprints sets, and only the columns in column_name_list. The options hash can contain the following key/value pairs:<br /> **order => column_name - the column on which to order it. 
Append '-' or ' DESC' to order it descending.<br /> **limit => int - How many results to return<br /> **group_by => column_name - if we need to group by a column.<br /> get_stats works by examining the 'eprints' parameter and calling one of the following functions:<br /> **get_list_stats<br /> **get_top_stats<br /> **get_set_stats<br /> **get_all_stats<br /> These functions generate slightly different mysql queries, and pass them to the do_sql function.<br /> <br /> == Date ==<br /> I implemented a date object because there were some specific things I needed to do with dates.<br /> <br /> ===Functions===<br /> *new(date_hash) - Creates a new date object when passed a hash with the keys 'day', 'month' and 'year'.<br /> *validate() - If the date is not valid, it will be modified to a sensible value, e.g. if it's Feb 30th, it will be modified to Feb 29th or 28th, depending on whether it's a leap year.<br /> *set(part_name, int) - Sets part of the date ('year','month' or 'day') to a specific value.<br /> *decrement(period) - decrements the date by a period ('day', 'week', 'month', 'quarter', 'year'). Calls the mod_date function, which does the muscle work.<br /> *increment(period) - increments by calling mod_date.<br /> *part(part_name, style) - Returns the day, month or year. For month, if style=='text', returns a three letter string, otherwise returns an integer. For year, if style=='short', returns the last two digits, otherwise returns all four.<br /> *less_than(date_object) - compares itself to another date object. Returns 1 if it's less than it, otherwise returns 0.<br /> *greater_than(date_object) - compares itself to another date object. Returns 1 if it's greater than it, otherwise returns 0.<br /> *month_name() - returns the three letter string of the month.<br /> *render(format_string) - returns a date string. 
Format can be:<br /> **'short' - Calls render_abbreviated - returns a date like this: 05-Jul-77<br /> **'long' - Calls render_full (not implemented).<br /> **'numerical' (default) - Calls render_numerical - returns a date like this: 19770705<br /> *clone - returns a new, identical date object.<br /> <br /> <br /> == Cache ==<br /> The interface to the cache.<br /> <br /> === Configuration Constants ===<br /> *$cache_directory - a string containing a path to the directory in which the cache files are located.<br /> <br /> === Functions ===<br /> *new(id) - takes the ID of the params object we're using at the moment.<br /> *exists() - returns true if there's a cached file, false if there isn't one.<br /> *write(visualisation_object) - writes the data to the cache file.<br /> *read() - returns the data from the cache.<br /> <br /> == Periods ==<br /> The Periods object is used when you want to break a date range down into sub-ranges. Used with the params->mask() function, stats can be retrieved for periods inside a date range.<br /> ===Functions===<br /> *new(start_date_obj, end_date_obj) - doesn't do anything, just returns the object.<br /> <br /> The following functions all return an array of hashes. Each hash has the keys 'start_date' and 'end_date', and the values are both IRStats::Date objects.<br /> <br /> *calandar_months - Returns full months (each element starts on the 1st, and ends on the last day).<br /> *months - Returns month periods (if the start_date is the 15th, then each period starts on the 15th and ends on the 14th of the next month, except the last period, which may be shorter).<br /> *weeks - returns 7-day periods (except the last, which may be shorter than 7 days).<br /> *days - returns single days (for each period, the start_date and end_date are the same).<br /> <br /> == UserInterface::Controls ==<br /> This is used to generate the drop boxes in the stats cgi script. 
If I had more time I'd document it fully, but my daughter's going to be born in less than 12 hours.<br /> <br /> == Page (deprecated) ==<br /> Harks back to the day when a page object contained views.<br /> <br /> == View ==<br /> A view processes the stats data filtered by the parameters and creates a visualisation. It is intended that savvy users create their own views.<br /> <br /> === Functions ===<br /> All views inherit:<br /> *new(params_obj, database_interface_object) - returns the object.<br /> *render - calls populate, then returns whatever the visualisation renders<br /> All views must implement:<br /> *new - passes arguments to superclass, then calls 'initialise'.<br /> *initialise - the Configuration Constants are set here.<br /> *populate - The engine that powers IRStats.<br /> <br /> === View::FullTextCountHTML ===<br /> The FullTextCountHTML is an extremely simple view. It retrieves one row from the database and does no processing, making it ideal for a quick walkthrough:<br /> <br /> ==== Housekeeping ====<br /> At the top of the file, we need:<br /> package IRStats::View::FullTextCountHTML;<br /> use strict;<br /> use warnings;<br /> Now, the modules we will use. I've included perlchartdir, the graph making package, even though we're not using it.<br /> use IRStats::DatabaseInterface;<br /> use IRStats::Cache;<br /> use IRStats::Visualisation::HTML;<br /> use IRStats::View;<br /> use perlchartdir;<br /> And link to superclass. <br /> our @ISA = qw/ IRStats::View /;<br /> <br /> ==== Configuration Constants ====<br /> We are interested in retrieving the fulltxt column, and a count as we will be aggregating. The sql_params are set, so that we can filter on fulltext downloads, and we need to group as we are counting. 
We also create our visualisation here.<br /> sub initialise<br /> {<br /> my ($self) = @_;<br /> $self->{'sql_columns'} = [ 'fulltxt', 'COUNT(fulltxt)' ];<br /> $self->{'sql_params'} = {where => "fulltxt = 'F'", group_by => 'fulltxt'};<br /> $self->{'visualisation'} = IRStats::Visualisation::HTML->new();<br /> }<br /> <br /> ==== new ====<br /> The new function shouldn't ever need to be any different from this:<br /> sub new<br /> {<br /> my( $class, $params, $database ) = @_;<br /> my $self = $class->SUPER::new($params, $database);<br /> $self->initialise();<br /> return $self;<br /> }<br /> <br /> ==== populate ====<br /> Every populate function should start by checking the cache.<br /> <br /> sub populate<br /> {<br /> my ($self) = @_;<br /> my $cache = IRStats::Cache->new($self->{'params'}->get('id'));<br /> if ($cache->exists)<br /> {<br /> $self->{'visualisation'} = $cache->read();<br /> return;<br /> }<br /> Next, we have to retrieve from the database:<br /> <br /> my $query = $self->{'database'}->get_stats(<br /> $self->{'params'},<br /> $self->{'sql_columns'},<br /> $self->{'sql_params'}<br /> );<br /> Now we process them. In this case, we don't even need a loop as we know there's only going to be one row. We'll stick the result straight into some html, and save it. Don't forget that if there isn't any data, you still have to output something.<br /> my @row = $query->fetchrow_array();<br /> my $html = '<span class="irstats_view_fulltextcounthtml">' . ($row[1] ? $row[1] : '0') . 
"</span>";<br /> A little housekeeping:<br /> $query->finish();<br /> Pop the data into the visualisation:<br /> $self->{'visualisation'}->set('html',$html);<br /> Finally, we should write to the cache so we don't have to query the database next time.<br /> $cache->write($self->{'visualisation'});<br /> }<br /> <br /> And that's a really simple view.<br /> <br /> === Using Periods ===<br /> <br /> If we wanted to break our date range into periods, we'd need to do something like this:<br /> <br /> my $periods = IRStats::Periods->new($self->{'params'}->{'start_date'},$self->{'params'}->{'end_date'});<br /> foreach my $period ( @{$periods->calandar_months()} )<br /> {<br /> $self->{'params'}->mask($period);<br /> my $query = $self->{'database'}->get_stats(<br /> $self->{'params'},<br /> $self->{'sql_columns'},<br /> $self->{'sql_params'}<br /> );<br /> $self->{'params'}->unmask();<br /> #process and put into variables<br /> }<br /> <br /> == Visualisation ==<br /> Currently Visualisations are Graph, Table or HTML. These are what the user will look at in the browser or download (in the case of CSV).<br /> <br /> === Functions ===<br /> All visualisations inherit:<br /> *new(data_hash) - a hash can optionally be passed containing the values that would otherwise be set using the 'set' function.<br /> *set(param_name, value) - sets a named value; see the subclasses for the names each one accepts.<br /> <br /> All visualisations must implement:<br /> *render() - returns what will be passed to the script.<br /> <br /> <br /> == Visualisation::HTML ==<br /> The simplest visualisation. 
Just a chunk of html.<br /> <br /> To Populate:<br /> *set('html', html_string) - takes the html as a string.<br /> <br /> == Visualisation::Table ==<br /> The Visualisation::Table currently just passes the buck to its superclass.<br /> <br /> There are currently three table Visualisations:<br /> <br /> === Visualisation::Table::CSV ===<br /> Returns a CSV table.<br /> <br /> To Populate:<br /> *set('headings', headings_arrayref) - pass an array containing headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> <br /> === Visualisation::Table::HTML ===<br /> A basic HTML table.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> And then optionally<br /> *set('totals', totals_arrayref) - an array of totals to put at the bottom of the table.<br /> <br /> === Visualisation::Table::HTML_Columned ===<br /> An HTML table that is rendered in several columns.<br /> ==== Configuration Constants ====<br /> $default_number_of_rows - an int representing the maximum number of rows the table should have.<br /> <br /> ==== Overridden Functions ====<br /> *new(data_hash, number_of_rows) - Both data_hash and number_of_rows are optional. Both can be set with 'set'.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> *set('number_of_rows', int) - set the maximum number of rows the table should have.<br /> <br /> == Visualisation::Graph ==<br /> The graph objects all use Chart Director to generate graphs. 
The Graph object initialises the colours that the graph may use.<br /> <br /> Every graph must be created with at least the filename:<br /> *new({filename => string}) - the filename comes from the ID of the param object.<br /> <br /> === Configuration Constants ===<br /> These are set in the 'new' function.<br /> *$graph_dir - the path to the directory where the image file will be saved.<br /> *$url_relative - the relative URL to which the filename is appended; the result is put in the img HTML tag.<br /> <br /> === Sub Classes ===<br /> Note that in the Visualisation/Graph/ directory, there is 'GraphLegend.pm'. This is used to create the HTML for the graph legends.<br /> <br /> ====Visualisation::Graph::Bar.pm====<br /> A Bar Graph. It can have one or more bars in each division of the x axis.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('x_title',string) - The title of the x axis.<br /> *set('y_title',string) - The title of the y axis.<br /> *set('x_labels',array_ref) - an array containing the labels for the x axis.<br /> *set('data_series', array_ref) - an array of arrayrefs, referencing data for each set of bars.<br /> <br /> ====Visualisation::Graph::Line.pm====<br /> A Line Graph. There can be many lines on it.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('x_title',string) - The title of the x axis.<br /> *set('y_title',string) - The title of the y axis.<br /> *set('x_labels',array_ref) - an array containing the labels for the x axis.<br /> *set('data_series', array_ref) - an array of arrayrefs, referencing data for each line.<br /> <br /> ====Visualisation::Graph::Pie.pm====<br /> A Pie Graph.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('data_series', array_ref) - an array of hashrefs, {data => int, citation => string}, one for each slice.</div>
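As a rough illustration of the 'x_labels' and 'data_series' shapes listed for the Bar and Line graphs above (one arrayref per series, each aligned with the x axis labels), here is a minimal sketch. The helper name build_data_series, the input hash layout and the sample numbers are all assumptions made for this example; they are not part of IRStats.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of IRStats): given the x axis labels and a
# hash of series_name => { label => count }, return one arrayref per series,
# each aligned with the labels and zero-filled where a count is missing.
sub build_data_series
{
	my ($x_labels, $counts_by_series) = @_;
	my @data_series;
	foreach my $name ( sort keys %{$counts_by_series} )
	{
		my $counts = $counts_by_series->{$name};
		push @data_series, [ map { $counts->{$_} || 0 } @{$x_labels} ];
	}
	return \@data_series;
}

# Made-up sample data for illustration only.
my $x_labels = [ 'Jan', 'Feb', 'Mar' ];
my $data_series = build_data_series($x_labels, {
	abstracts => { Jan => 10, Feb => 12, Mar => 7 },
	fulltext  => { Jan => 4, Mar => 9 },
});

# In a view, these would then be handed to the visualisation:
# $self->{'visualisation'}->set('x_labels', $x_labels);
# $self->{'visualisation'}->set('data_series', $data_series);
print join(',', @{$_}), "\n" foreach @{$data_series};
```

Zero-filling keeps every series the same length as the label list, which is what a chart with shared x axis divisions would need.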

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats_Technical_Documentation&diff=4352 IRStats Technical Documentation 2007-05-30T18:25:13Z

<p>Gobfrey: </p> <hr /> <div><br /> = Directory Structure =<br /> <br /> == /opt/irstats ==<br /> Contains data files for GeoIP. If I had had root access, I would have put them in the correct place. They are linked to from the correct place. These need regular updating, something which hasn't been implemented.<br /> <br /> == /opt/irstats/bin ==<br /> Contains the scripts needed to update the table.<br /> <br /> *daily_update.sh - Runs all the scripts in the right order.<br /> *extract_metadata_from_archive.pl - Extracts eprint, author and group metadata from the repository by iterating over every eprint.<br /> *update_table.pl - Filters and processes new entries in the accesslog to update the irstats_true_acesses_table. Uses 'SearchParser.pm' and 'repeatscache'.<br /> *convert_ip_to_host.pl - Attempts to convert the IP addresses of the new entries in irstats_true_acesses_table to hostnames. Uses 'host_updated' to keep track of where it got to last time.<br /> <br /> Note that most of these scripts probably need to be tidied up. They were written in a hurry and were never polished.<br /> <br /> == /opt/irstats/cache ==<br /> Contains cache files. Feel free to delete these whenever you like. <br /> <br /> == /opt/irstats/cgi ==<br /> <br /> Contains two scripts, 'get_view' and 'stats'.<br /> <br /> *get_view returns the output of an IRStats::View (see below), which is currently a chunk of HTML or CSV, but could be almost anything.<br /> *stats is a handy CGI form that passes arguments to get_view.<br /> <br /> == /opt/irstats/img ==<br /> <br /> Conceptually, where any images would be kept (e.g. national flags). At the moment, only the img/graphs directory is used. 
This is where generated graphs are stored.<br /> <br /> == /opt/irstats/perl_lib ==<br /> <br /> Contains all the IRStats classes.<br /> <br /> = IRStats Classes =<br /> <br /> Note that the leading IRStats:: has been left out for brevity.<br /> <br /> == Params ==<br /> This object holds the parameters that are used to generate the statistics. The most important of these are a date range and an eprint set.<br /> === Configuration Constants ===<br /> *$cgi_script - the name of the CGI script (currently unused).<br /> *$id_params - the parameters that matter when generating an ID.<br /> *$defaults - Any default parameters you wish to set.<br /> <br /> === Functions ===<br /> *new(CGI_object) - returns a new object.<br /> *mask(params_hash) - used when you want to temporarily overwrite parameter(s). Overwrites values with the contents of params_hash. Overwritten values get pushed onto a stack.<br /> *unmask - Sets parameters back to how they were before the last mask.<br /> *generate_cgi - returns a string containing the name of the CGI script, and all parameters, to enable the creation of links (currently unused).<br /> *get(param_name) - returns the value of a single parameter.<br /> *create_id - Uses MD5 to create a unique ID from the id_params (see Constants above). This is called whenever get('id') is called.<br /> <br /> == DatabaseInterface ==<br /> This object does what it says on the tin. Any access to the database is done through it.<br /> <br /> <strong>IMPORTANT</strong> - the generated SQL was developed on a machine running MySQL 5. Installing on the EPrints server has broken this (as it's running MySQL 4). I placed a quick and dirty hack into the do_sql function, and modified the create_top_table function. I have no idea if this works well. 
<strong>IT NEEDS TO BE CHECKED</strong>.<br /> <br /> === Configuration Constants ===<br /> Constants are contained in the new function.<br /> <br /> *DBI Configuration Constants - $driver, $server, $database, $user, $password are all used to create the connection to the database.<br /> *source_table - The table in which the stats are stored.<br /> <br /> === Functions ===<br /> *new() - returns object.<br /> *retreive_set_names() - returns a list of eprint sets. Currently 'group' and 'author' are implemented. This is used to verify CGI input.<br /> *get_membership(eprint_id, set_name) - For a given eprint ID, returns the members of the named set that it belongs to. For example, we can find out which authors eprint 12614 has by get_membership(12614, 'author').<br /> *get_citation(id, set, length) - returns a citation. Every set member (eprint, author, group) has two citations, short and full. We only return a short citation if length == 'short'. So, to get the short citation of group 3: get_citation(3,'group','short').<br /> *get_code($id,$set) - UNWRITTEN - Set members have codes. This is how they are identified by the user. For example, author_lac is the member of the author set whose code is lac. To get the code for group 3: get_code(3,'group').<br /> <br /> When retrieving statistics, IRStats filters by inner joining the irstats_true_accesses_table to other tables containing eprint IDs. Sometimes it has to create these tables.<br /> <br /> *create_top_table(param_object) - This creates a table containing the eprint IDs of the top X by fulltext download between two dates.<br /> *create_list_table(table_name, eprint_ids) - Takes two strings, one the name of the table, the other a space separated list of eprint IDs. Creates a temporary table.<br /> <br /> The following are the only two functions that actually make calls to the database. 
<br /> <br /> *do_sql(sql_query_string) - takes a string and performs a query, returning the DBI object containing the results.<br /> *insert_values(table_name, values) - inserts a row of data into a table.<br /> <br /> And finally, the meat and potatoes: the functions that return the statistics we're interested in.<br /> <br /> *get_stats(params_object, column_name_list, options_hash) - returns a DBI object containing the stats we are interested in, i.e. those matching the params_object's date range and eprint sets, with only the columns in column_name_list. The options hash can contain the following key/value pairs:<br /> **order => column_name - the column on which to order it. Append with '-' or ' DESC' to order it descending.<br /> **limit => int - How many results to return.<br /> **group_by => column_name - if we need to group by a column.<br /> get_stats works by examining the 'eprints' parameter and calling one of the following functions:<br /> **get_list_stats<br /> **get_top_stats<br /> **get_set_stats<br /> **get_all_stats<br /> These functions generate slightly different MySQL queries, and pass them to the do_sql function.<br /> <br /> == Date ==<br /> I implemented a date object because there were some specific things I needed to do with dates.<br /> <br /> ===Functions===<br /> *new(date_hash) - Creates a new date object when passed a hash with the keys 'day', 'month' and 'year'.<br /> *validate() - If the date is not valid, it will be modified to a sensible value. E.g. if it's Feb 30th, it will be modified to Feb 29th or 28th, depending on whether it's a leap year.<br /> *set(part_name, int) - Sets part of the date ('year','month' or 'day') to a specific value.<br /> *decrement(period) - decrements the date by a period ('day', 'week', 'month', 'quarter', 'year'). Calls the mod_date function, which does the muscle work.<br /> *increment(period) - increments the date by a period, also by calling mod_date.<br /> *part(part_name, style) - Returns the day, month or year. 
For month, if style=='text', returns a three letter string, otherwise returns an integer. For year, if style=='short', returns the last two digits, otherwise returns all four.<br /> *less_than(date_object) - compares itself to another date object. Returns 1 if it's less than it, otherwise returns 0.<br /> *greater_than(date_object) - compares itself to another date object. Returns 1 if it's greater than it, otherwise returns 0.<br /> *month_name() - returns the three letter string of the month.<br /> *render(format_string) - returns a date string. Format can be:<br /> **'short' - Calls render_abbreviated - returns a date like this: 05-Jul-77<br /> **'long' - Calls render_full (not implemented).<br /> **'numerical' (default) - Calls render_numerical - returns a date like this: 19770705<br /> *clone - returns a new, identical date object.<br /> <br /> <br /> == Cache ==<br /> The interface to the cache.<br /> <br /> === Configuration Constants ===<br /> *$cache_directory - a string containing a path to the directory in which the cache files are located.<br /> <br /> === Functions ===<br /> *new(id) - takes the ID of the params object we're using at the moment.<br /> *exists() - returns true if there's a cached file, false if there isn't one.<br /> *write(visualisation_object) - writes the data to the cache file.<br /> *read() - returns the data from the cache.<br /> <br /> == Periods ==<br /> The Periods object is used when you want to break a date range down into sub-ranges. Used with the params->mask() function, stats can be retrieved for periods inside a date range.<br /> ===Functions===<br /> *new(start_date_obj, end_date_obj) - does no processing, just returns the object.<br /> <br /> The following functions all return a reference to an array of hashes. 
Each hash has the keys 'start_date' and 'end_date', and the values are both IRStats::Date objects.<br /> <br /> *calandar_months - Returns full months (each element starts on the 1st, and ends on the last day).<br /> *months - Returns month periods (if the start_date is the 15th, then each period starts on the 15th and ends on the 14th of the next month - except the last period, which only has about a 1/30 chance of doing so).<br /> *weeks - returns 7-day periods (except the last, which has a 1/7 chance of being 7 days long).<br /> *days - returns single days (for each period, the start_date and end_date are the same).<br /> <br /> == UserInterface::Controls ==<br /> This is used to generate the drop-down boxes in the stats CGI script. If I had more time I'd document it fully, but my daughter's going to be born in less than 12 hours.<br /> <br /> == Page (deprecated) ==<br /> Harkens back to the day when a page object contained views.<br /> <br /> == View ==<br /> A view processes the stats data filtered by the parameters and creates a visualisation. It is intended that savvy users create their own views.<br /> <br /> === Functions ===<br /> All views inherit:<br /> *new(params_obj, database_interface_object) - returns the object.<br /> *render - calls populate, then returns whatever the visualisation renders.<br /> All views must implement:<br /> *new - passes arguments to superclass, then calls 'initialise'.<br /> *initialise - the Configuration Constants are set here.<br /> *populate - queries the database and fills the visualisation. This is the engine that powers IRStats.<br /> <br /> === View::FullTextCountHTML ===<br /> The FullTextCountHTML is an extremely simple view. It retrieves one row from the database and does no processing, making it ideal for a quick walkthrough:<br /> <br /> ==== Housekeeping ====<br /> At the top of the file, we need:<br /> package IRStats::View::FullTextCountHTML;<br /> use strict;<br /> use warnings;<br /> Next, the modules we will use. 
I've included perlchartdir, the graph making package, even though we're not using it.<br /> use IRStats::DatabaseInterface;<br /> use IRStats::Cache;<br /> use IRStats::Visualisation::HTML;<br /> use IRStats::View;<br /> use perlchartdir;<br /> And link to the superclass. <br /> our @ISA = qw/ IRStats::View /;<br /> <br /> ==== Configuration Constants ====<br /> We want to retrieve the fulltxt column together with a count, since we will be aggregating. The sql_params are set so that we filter on fulltext downloads and group by fulltxt, which the count requires. We also create our visualisation here.<br /> sub initialise<br /> {<br /> my ($self) = @_;<br /> $self->{'sql_columns'} = [ 'fulltxt', 'COUNT(fulltxt)' ];<br /> $self->{'sql_params'} = {where => "fulltxt = 'F'", group_by => 'fulltxt'};<br /> $self->{'visualisation'} = IRStats::Visualisation::HTML->new();<br /> }<br /> <br /> ==== new ====<br /> The new function shouldn't ever need to be any different from this:<br /> sub new<br /> {<br /> my( $class, $params, $database ) = @_;<br /> my $self = $class->SUPER::new($params, $database);<br /> $self->initialise();<br /> return $self;<br /> }<br /> <br /> ==== populate ====<br /> Every populate function should start by checking the cache.<br /> <br /> sub populate<br /> {<br /> my ($self) = @_;<br /> my $cache = IRStats::Cache->new($self->{'params'}->get('id'));<br /> if ($cache->exists)<br /> {<br /> $self->{'visualisation'} = $cache->read();<br /> return;<br /> }<br /> Next, we have to retrieve the data from the database:<br /> <br /> my $query = $self->{'database'}->get_stats(<br /> $self->{'params'},<br /> $self->{'sql_columns'},<br /> $self->{'sql_params'}<br /> );<br /> Now we process the results. In this case, we don't even need a loop as we know there's only going to be one row. We'll stick the result straight into some HTML, and save it. 
Don't forget that if there isn't any data, you still have to output something.<br /> my @row = $query->fetchrow_array();<br /> my $html = '<span class="irstats_view_fulltextcounthtml">' . ($row[1] ? $row[1] : '0') . "</span>";<br /> A little housekeeping:<br /> $query->finish();<br /> Pop the data into the visualisation:<br /> $self->{'visualisation'}->set('html',$html);<br /> Finally, we should write to the cache so we don't have to query the database next time.<br /> $cache->write($self->{'visualisation'});<br /> }<br /> <br /> And that's a really simple view.<br /> <br /> === Using Periods ===<br /> <br /> If we wanted to break our date range into periods, we'd need to do something like this:<br /> <br /> my $periods = IRStats::Periods->new($self->{'params'}->{'start_date'},$self->{'params'}->{'end_date'});<br /> foreach my $period ( @{$periods->calandar_months()} )<br /> {<br /> $self->{'params'}->mask($period);<br /> my $query = $self->{'database'}->get_stats(<br /> $self->{'params'},<br /> $self->{'sql_columns'},<br /> $self->{'sql_params'}<br /> );<br /> $self->{'params'}->unmask();<br /> #process and put into variables<br /> }<br /> <br /> == Visualisation ==<br /> Currently Visualisations are Graph, Table or HTML. These are what the user will look at in the browser or download (in the case of CSV).<br /> <br /> === Functions ===<br /> All visualisations inherit:<br /> *new(data_hash) - a hash can optionally be passed containing the values that would otherwise be set using the 'set' function.<br /> *set(param_name, value) - sets the named parameter to the given value. The accepted names are listed under each subclass.<br /> <br /> All visualisations must implement:<br /> *render() - returns what will be passed to the script.<br /> <br /> <br /> == Visualisation::HTML ==<br /> The simplest visualisation. 
Just a chunk of html.<br /> <br /> To Populate:<br /> *set('html', html_string) - takes the html as a string.<br /> <br /> == Visualisation::Table ==<br /> The Visualisation::Table currently just passes the buck to its superclass.<br /> <br /> There are currently three table Visualisations:<br /> <br /> === Visualisation::Table::CSV ===<br /> Returns a CSV table.<br /> <br /> To Populate:<br /> *set('headings', headings_arrayref) - pass an array containing headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> <br /> === Visualisation::Table::HTML ===<br /> A basic HTML table.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> And then optionally<br /> *set('totals', totals_arrayref) - an array of totals to put at the bottom of the table.<br /> <br /> === Visualisation::Table::HTML_Columned ===<br /> An HTML table that is rendered in several columns.<br /> ==== Configuration Constants ====<br /> $default_number_of_rows - an int representing the maximum number of rows the table should have.<br /> <br /> ==== Overridden Functions ====<br /> *new(data_hash, number_of_rows) - Both data_hash and number_of_rows are optional. Both can be set with 'set'.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> *set('number_of_rows', int) - set the maximum number of rows the table should have.<br /> <br /> == Visualisation::Graph ==<br /> The graph objects all use Chart Director to generate graphs. 
The Graph object initialises the colours that the graph may use.<br /> <br /> Every graph must be created with at least the filename:<br /> *new({filename => string}) - the filename comes from the ID of the param object.<br /> <br /> === Configuration Constants ===<br /> These are set in the 'new' function.<br /> *$graph_dir - the path to the directory where the image file will be saved.<br /> *$url_relative - the relative URL to which the filename is appended; the result is put in the img HTML tag.<br /> <br /> === Sub Classes ===<br /> Note that in the Visualisation/Graph/ directory, there is 'GraphLegend.pm'. This is used to create the HTML for the graph legends.<br /> <br /> ====Visualisation::Graph::Bar.pm====<br /> A Bar Graph. It can have one or more bars in each division of the x axis.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('x_title',string) - The title of the x axis.<br /> *set('y_title',string) - The title of the y axis.<br /> *set('x_labels',array_ref) - an array containing the labels for the x axis.<br /> *set('data_series', array_ref) - an array of arrayrefs, referencing data for each set of bars.<br /> <br /> ====Visualisation::Graph::Line.pm====<br /> A Line Graph. There can be many lines on it.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('x_title',string) - The title of the x axis.<br /> *set('y_title',string) - The title of the y axis.<br /> *set('x_labels',array_ref) - an array containing the labels for the x axis.<br /> *set('data_series', array_ref) - an array of arrayrefs, referencing data for each line.<br /> <br /> ====Visualisation::Graph::Pie.pm====<br /> A Pie Graph.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('data_series', array_ref) - an array of hashrefs, {data => int, citation => string}, one for each slice.</div>
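The 'data_series' shape that the Pie graph expects (an array of hashrefs with 'data' and 'citation' keys) can be sketched roughly as follows. This is only an illustration: the helper name build_pie_series and the sample rows are hypothetical and do not come from IRStats, and the call to set() is shown as a comment because it needs a real visualisation object.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of IRStats): turn rows of
# [ citation, count ] into the array of hashrefs described above.
sub build_pie_series
{
	my ($rows) = @_;
	my @series;
	foreach my $row ( @{$rows} )
	{
		my ($citation, $count) = @{$row};
		# Skip empty slices so the pie only shows real data.
		next unless $count;
		push @series, { data => $count, citation => $citation };
	}
	return \@series;
}

# Made-up sample data for illustration only.
my $series = build_pie_series([
	[ 'Group A', 120 ],
	[ 'Group B', 0 ],
	[ 'Group C', 45 ],
]);

# In a view, this would then be handed to the visualisation:
# $self->{'visualisation'}->set('data_series', $series);
print scalar(@{$series}), " slices\n";
```

Dropping zero-count rows is a design choice of this sketch, not documented IRStats behaviour; keep them if the legend should list every set member.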

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats_Technical_Documentation&diff=4350 IRStats Technical Documentation 2007-05-30T18:22:52Z

<p>Gobfrey: IRS - EPStats Technical Documentation moved to IRStats Technical Documentation</p> <hr /> <div>This document is intended as guidance for the last stage of development of EPStats.<br /> <br /> = Directory Structure =<br /> <br /> == /opt/epstats ==<br /> Contains data files for GeoIP. If I had had root access, I would have put them in the correct place. They are linked to from the correct place. These need regular updating, something which hasn't been implemented.<br /> <br /> == /opt/epstats/bin ==<br /> Contains the scripts needed to update the table.<br /> <br /> *daily_update.sh - Runs all the scripts in the right order.<br /> *extract_metadata_from_archive.pl - Extracts eprint, author and group metadata from the repository by iterating over every eprint.<br /> *update_table.pl - Filters and processes new entries in the accesslog to update the epstats_true_acesses_table. Uses 'SearchParser.pm' and 'repeatscache'.<br /> *convert_ip_to_host.pl - Attempts to convert the IP addresses of the new entries in epstats_true_acesses_table to hostnames. Uses 'host_updated' to keep track of where it got to last time.<br /> <br /> Note that most of these scripts probably need to be tidied up. They were written in a hurry and were never polished.<br /> <br /> == /opt/epstats/cache ==<br /> Contains cache files. Feel free to delete these whenever you like. <br /> <br /> == /opt/epstats/cgi ==<br /> <br /> Contains two scripts, 'get_view' and 'stats'.<br /> <br /> *get_view returns the output of an EPStats::View (see below), which is currently a chunk of HTML or CSV, but could be almost anything.<br /> *stats is a handy CGI form that passes arguments to get_view.<br /> <br /> == /opt/epstats/img ==<br /> <br /> Conceptually, where any images would be kept (e.g. national flags). At the moment, only the img/graphs directory is used. 
This is where generated graphs are stored.<br /> <br /> == /opt/epstats/perl_lib ==<br /> <br /> Contains all the EPStats classes.<br /> <br /> = EPStats Classes =<br /> <br /> Note that the leading EPStats:: has been left out for brevity.<br /> <br /> == Params ==<br /> This object holds the parameters that are used to generate the statistics. The most important of these are a date range and an eprint set.<br /> === Configuration Constants ===<br /> *$cgi_script - the name of the CGI script (currently unused).<br /> *$id_params - the parameters that matter when generating an ID.<br /> *$defaults - Any default parameters you wish to set.<br /> <br /> === Functions ===<br /> *new(CGI_object) - returns a new object.<br /> *mask(params_hash) - used when you want to temporarily overwrite parameter(s). Overwrites values with the contents of params_hash. Overwritten values get pushed onto a stack.<br /> *unmask - Sets parameters back to how they were before the last mask.<br /> *generate_cgi - returns a string containing the name of the CGI script, and all parameters, to enable the creation of links (currently unused).<br /> *get(param_name) - returns the value of a single parameter.<br /> *create_id - Uses MD5 to create a unique ID from the id_params (see Constants above). This is called whenever get('id') is called.<br /> <br /> == DatabaseInterface ==<br /> This object does what it says on the tin. Any access to the database is done through it.<br /> <br /> <strong>IMPORTANT</strong> - the generated SQL was developed on a machine running MySQL 5. Installing on the EPrints server has broken this (as it's running MySQL 4). I placed a quick and dirty hack into the do_sql function, and modified the create_top_table function. I have no idea if this works well. 
<strong>IT NEEDS TO BE CHECKED</strong>.<br /> <br /> === Configuration Constants ===<br /> Constants are contained in the new function.<br /> <br /> *DBI Configuration Constants - $driver, $server, $database, $user, $password are all used to create the connection to the database.<br /> *source_table - The table in which the stats are stored.<br /> <br /> === Functions ===<br /> *new() - returns object.<br /> *retreive_set_names() - returns a list of eprint sets. Currently 'group' and 'author' are implemented. This is used to verify CGI input.<br /> *get_membership(eprint_id, set_name) - For a given eprint ID, returns the members of the named set that it belongs to. For example, we can find out which authors eprint 12614 has by get_membership(12614, 'author').<br /> *get_citation(id, set, length) - returns a citation. Every set member (eprint, author, group) has two citations, short and full. We only return a short citation if length == 'short'. So, to get the short citation of group 3: get_citation(3,'group','short').<br /> *get_code($id,$set) - UNWRITTEN - Set members have codes. This is how they are identified by the user. For example, author_lac is the member of the author set whose code is lac. To get the code for group 3: get_code(3,'group').<br /> <br /> When retrieving statistics, EPStats filters by inner joining the epstats_true_accesses_table to other tables containing eprint IDs. Sometimes it has to create these tables.<br /> <br /> *create_top_table(param_object) - This creates a table containing the eprint IDs of the top X by fulltext download between two dates.<br /> *create_list_table(table_name, eprint_ids) - Takes two strings, one the name of the table, the other a space separated list of eprint IDs. Creates a temporary table.<br /> <br /> The following are the only two functions that actually make calls to the database. 
<br /> <br /> *do_sql(sql_query_string) - takes a string and performs a query, returning the DBI object containing the results.<br /> *insert_values(table_name, values) - inserts a row of data into a table.<br /> <br /> And finally, the meat and potatoes: the functions that return the statistics we're interested in.<br /> <br /> *get_stats(params_object, column_name_list, options_hash) - returns a DBI object containing the stats we are interested in, i.e. those matching the params_object's date range and eprint sets, with only the columns in column_name_list. The options hash can contain the following key/value pairs:<br /> **order => column_name - the column on which to order it. Append with '-' or ' DESC' to order it descending.<br /> **limit => int - How many results to return.<br /> **group_by => column_name - if we need to group by a column.<br /> get_stats works by examining the 'eprints' parameter and calling one of the following functions:<br /> **get_list_stats<br /> **get_top_stats<br /> **get_set_stats<br /> **get_all_stats<br /> These functions generate slightly different MySQL queries, and pass them to the do_sql function.<br /> <br /> == Date ==<br /> I implemented a date object because there were some specific things I needed to do with dates.<br /> <br /> ===Functions===<br /> *new(date_hash) - Creates a new date object when passed a hash with the keys 'day', 'month' and 'year'.<br /> *validate() - If the date is not valid, it will be modified to a sensible value. E.g. if it's Feb 30th, it will be modified to Feb 29th or 28th, depending on whether it's a leap year.<br /> *set(part_name, int) - Sets part of the date ('year','month' or 'day') to a specific value.<br /> *decrement(period) - decrements the date by a period ('day', 'week', 'month', 'quarter', 'year'). Calls the mod_date function, which does the muscle work.<br /> *increment(period) - increments the date by a period, also by calling mod_date.<br /> *part(part_name, style) - Returns the day, month or year. 
For month, if style=='text', returns a three letter string, otherwise returns an integer. For year, if style=='short', returns the last two digits, otherwise returns all four.<br /> *less_than(date_object) - compares itself to another date object. Returns 1 if it's less than it, otherwise returns 0.<br /> *greater_than(date_object) - compares itself to another date object. Returns 1 if it's greater than it, otherwise returns 0.<br /> *month_name() - returns the three letter string of the month.<br /> *render(format_string) - returns a date string. Format can be:<br /> **'short' - Calls render_abbreviated - returns a date like this: 05-Jul-77<br /> **'long' - Calls render_full (not implemented).<br /> **'numerical' (default) - Calls render_numerical - returns a date like this: 19770705<br /> *clone - returns a new, identical date object.<br /> <br /> <br /> == Cache ==<br /> The interface to the cache.<br /> <br /> === Configuration Constants ===<br /> *$cache_directory - a string containing a path to the directory in which the cache files are located.<br /> <br /> === Functions ===<br /> *new(id) - takes the ID of the params object we're using at the moment.<br /> *exists() - returns true if there's a cached file, false if there isn't one.<br /> *write(visualisation_object) - writes the data to the cache file.<br /> *read() - returns the data from the cache.<br /> <br /> == Periods ==<br /> The Periods object is used when you want to break a date range down into sub-ranges. Used with the params->mask() function, stats can be retrieved for periods inside a date range.<br /> ===Functions===<br /> *new(start_date_obj, end_date_obj) - does no processing, just returns the object.<br /> <br /> The following functions all return a reference to an array of hashes. 
Each hash has the keys 'start_date' and 'end_date', and the values are both EPStats::Date objects.<br /> <br /> *calandar_months - Returns full months (each element starts on the 1st, and ends on the last day).<br /> *months - Returns month periods (if the start_date is the 15th, then each period starts on the 15th and ends on the 14th of the next month - except the last period, which only has about a 1/30 chance of doing so).<br /> *weeks - returns 7-day periods (except the last, which has a 1/7 chance of being 7 days long).<br /> *days - returns single days (for each period, the start_date and end_date are the same).<br /> <br /> == UserInterface::Controls ==<br /> This is used to generate the drop-down boxes in the stats CGI script. If I had more time I'd document it fully, but my daughter's going to be born in less than 12 hours.<br /> <br /> == Page (deprecated) ==<br /> Harks back to the days when a page object contained views.<br /> <br /> == View ==<br /> A view processes the stats data filtered by the parameters and creates a visualisation. It is intended that savvy users create their own views.<br /> <br /> === Functions ===<br /> All views inherit:<br /> *new(params_obj, database_interface_object) - returns the object.<br /> *render - calls populate, then returns whatever the visualisation renders.<br /> All views must implement:<br /> *new - passes arguments to superclass, then calls 'initialise'.<br /> *initialise - the Configuration Constants are set here.<br /> *populate - the engine that powers EPStats: it fetches the stats and fills the visualisation.<br /> <br /> === View::FullTextCountHTML ===<br /> The FullTextCountHTML is an extremely simple view. It retrieves one row from the database and does no processing, making it ideal for a quick walkthrough:<br /> <br /> ==== Housekeeping ====<br /> At the top of the file, we need:<br /> package EPStats::View::FullTextCountHTML;<br /> use strict;<br /> use warnings;<br /> Next come the modules we will use. 
I've included perlchartdir, the graph-making package, even though we're not using it.<br /> use EPStats::DatabaseInterface;<br /> use EPStats::Cache;<br /> use EPStats::Visualisation::HTML;<br /> use EPStats::View;<br /> use perlchartdir;<br /> And link to the superclass:<br /> our @ISA = qw/ EPStats::View /;<br /> <br /> ==== Configuration Constants ====<br /> We are interested in retrieving the fulltxt column, and a count as we will be aggregating. The sql_params are set so that we can filter on fulltext downloads, and we need to group as we are counting. We also create our visualisation here.<br /> sub initialise<br /> {<br /> my ($self) = @_;<br /> $self->{'sql_columns'} = [ 'fulltxt', 'COUNT(fulltxt)' ];<br /> $self->{'sql_params'} = {where => "fulltxt = 'F'", group_by => 'fulltxt'};<br /> $self->{'visualisation'} = EPStats::Visualisation::HTML->new();<br /> }<br /> <br /> ==== new ====<br /> The new function shouldn't ever need to be any different from this:<br /> sub new<br /> {<br /> my( $class, $params, $database ) = @_;<br /> my $self = $class->SUPER::new($params, $database);<br /> $self->initialise();<br /> return $self;<br /> }<br /> <br /> ==== populate ====<br /> Every populate function should start by checking the cache.<br /> <br /> sub populate<br /> {<br /> my ($self) = @_;<br /> my $cache = EPStats::Cache->new($self->{'params'}->get('id'));<br /> if ($cache->exists)<br /> {<br /> $self->{'visualisation'} = $cache->read();<br /> return;<br /> }<br /> Next, we have to retrieve the data from the database:<br /> <br /> my $query = $self->{'database'}->get_stats(<br /> $self->{'params'},<br /> $self->{'sql_columns'},<br /> $self->{'sql_params'}<br /> );<br /> Now we process the results. In this case, we don't even need a loop as we know there's only going to be one row. We'll stick the result straight into some HTML, and save it. 
Don't forget that if there isn't any data, you still have to output something.<br /> my @row = $query->fetchrow_array();<br /> my $html = '<span class="epstats_view_fulltextcounthtml">' . ($row[1] ? $row[1] : '0') . "</span>";<br /> A little housekeeping:<br /> $query->finish();<br /> Pop the data into the visualisation:<br /> $self->{'visualisation'}->set('html',$html);<br /> Finally, we should write to the cache so we don't have to query the database next time.<br /> $cache->write($self->{'visualisation'});<br /> }<br /> <br /> And that's a really simple view.<br /> <br /> === Using Periods ===<br /> <br /> If we wanted to break our date range into periods, we'd need to do something like this:<br /> <br /> my $periods = EPStats::Periods->new($self->{'params'}->{'start_date'},$self->{'params'}->{'end_date'});<br /> foreach my $period ( @{$periods->calandar_months()} )<br /> {<br /> $self->{'params'}->mask($period);<br /> my $query = $self->{'database'}->get_stats(<br /> $self->{'params'},<br /> $self->{'sql_columns'},<br /> $self->{'sql_params'}<br /> );<br /> $self->{'params'}->unmask();<br /> #process and put into variables<br /> }<br /> <br /> == Visualisation ==<br /> Currently Visualisations are Graph, Table or HTML. These are what the user will look at in the browser or download (in the case of CSV).<br /> <br /> === Functions ===<br /> All visualisations inherit:<br /> *new(data_hash) - a hash can optionally be passed containing the values that would otherwise be set using the 'set' function.<br /> *set(param_name, value) - sets the named parameter to a value; see the subclasses for the parameters each one accepts.<br /> <br /> All visualisations must implement:<br /> *render() - returns what will be passed to the script.<br /> <br /> <br /> == Visualisation::HTML ==<br /> The simplest visualisation. 
Just a chunk of HTML.<br /> <br /> To Populate:<br /> *set('html', html_string) - takes the HTML as a string.<br /> <br /> == Visualisation::Table ==<br /> The Visualisation::Table currently just passes the buck to its superclass.<br /> <br /> There are currently three table Visualisations:<br /> <br /> === Visualisation::Table::CSV ===<br /> Returns a CSV table.<br /> <br /> To Populate:<br /> *set('headings', headings_arrayref) - a reference to an array of headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> <br /> === Visualisation::Table::HTML ===<br /> A basic HTML table.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - a reference to an array of column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> And then, optionally:<br /> *set('totals', totals_arrayref) - an array of totals to put at the bottom of the table.<br /> <br /> === Visualisation::Table::HTML_Columned ===<br /> An HTML table that is rendered in several columns.<br /> ==== Configuration Constants ====<br /> *$default_number_of_rows - an int representing the maximum number of rows the table should have.<br /> <br /> ==== Overridden Functions ====<br /> *new(data_hash, number_of_rows) - Both data_hash and number_of_rows are optional. Both can be set with 'set'.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - a reference to an array of column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> *set('number_of_rows', int) - set the maximum number of rows the table should have.<br /> <br /> == Visualisation::Graph ==<br /> The graph objects all use Chart Director to generate graphs. 
The Graph object initialises the colours that the graph may use.<br /> <br /> Every graph must be created with at least the filename:<br /> *new({filename => string}) - the filename comes from the ID of the param object.<br /> <br /> === Configuration Constants ===<br /> These are set in the 'new' function.<br /> *$graph_dir - the path to the directory where the image file will be saved.<br /> *$url_relative - this will have the filename added to the end and put in the img html tag.<br /> <br /> === Sub Classes ===<br /> Note that in the Visualisation/Graph/ directory, there is 'GraphLegend.pm'. This is used to create the html for the graph legends.<br /> <br /> ====Visualisation::Graph::Bar.pm====<br /> A Bar Graph. It can have one or more bars in each division of the x axis.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('x_title',string) - The title of the x axis.<br /> *set('y_title',string) - The title of the y axis.<br /> *set('x_labels',array_ref) - an array containing the labels for the x axis.<br /> *set('data_series', array_ref) - an array of arrayrefs, referencing data for each set of bars.<br /> <br /> ====Visualisation::Graph::Line.pm====<br /> A Line Graph. There can be many lines on it.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('x_title',string) - The title of the x axis.<br /> *set('y_title',string) - The title of the y axis.<br /> *set('x_labels',array_ref) - an array containing the labels for the x axis.<br /> *set('data_series', array_ref) - an array of arrayrefs, referencing data for each line.<br /> <br /> ====Visualisation::Graph::Pie.pm====<br /> A Pie Graph.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('data_series', array_ref) - an array of hashrefs, {data => int, citation => string}, one for each slice.</div>
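The behaviour described for the Date object's validate() function (an impossible date such as Feb 30th is pulled back to the last valid day of the month) can be sketched in plain Perl. This is only an illustration under stated assumptions: days_in_month and clamp_date are hypothetical helper names, not part of EPStats::Date, and the real module may well handle out-of-range values differently.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of EPStats::Date): number of days in a month.
sub days_in_month
{
	my( $year, $month ) = @_;
	my @days = ( 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 );
	my $is_leap = ( $year % 4 == 0 && $year % 100 != 0 ) || $year % 400 == 0;
	return 29 if $month == 2 && $is_leap;
	return $days[ $month - 1 ];
}

# Hypothetical stand-in for validate(): returns a corrected copy of a
# date hash with the keys 'day', 'month' and 'year'.
sub clamp_date
{
	my( $date ) = @_;
	my %fixed = %{$date};
	$fixed{month} = 1  if $fixed{month} < 1;
	$fixed{month} = 12 if $fixed{month} > 12;
	my $last = days_in_month( $fixed{year}, $fixed{month} );
	$fixed{day} = 1     if $fixed{day} < 1;
	$fixed{day} = $last if $fixed{day} > $last;
	return \%fixed;
}

my $fixed = clamp_date( { day => 30, month => 2, year => 2007 } );
printf "%02d-%02d-%04d\n", $fixed->{day}, $fixed->{month}, $fixed->{year};    # prints 28-02-2007
```

Returning a corrected copy rather than modifying the hash in place is a choice made for this sketch; the validate() function described above modifies the date object itself.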

Gobfrey https://wiki.eprints.org/w/index.php?title=IRS_-_EPStats_Technical_Documentation&diff=4351 IRS - EPStats Technical Documentation 2007-05-30T18:22:52Z

<p>Gobfrey: IRS - EPStats Technical Documentation moved to IRStats Technical Documentation</p> <hr /> <div>#redirect [[IRStats Technical Documentation]]</div>

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats&diff=4349 IRStats 2007-05-30T18:01:50Z

<p>Gobfrey: /* Installing IRStats */</p> <hr /> <div>IRStats is a flexible statistics package which allows easy processing of accesses to fulltext and abstract pages of eprints.<br /> <br /> == Technical Overview ==<br /> <br /> The following is a quick tour of IRStats.<br /> <br /> === Parameters ===<br /> <br /> IRStats output depends on four parameters, which need to be passed as CGI parameters if called through a web browser, or in a hash if called through the Perl API. These are:<br /> <br /> ==== Start Date and End Date ====<br /> <br /> Date parameters are implemented as separate day, month and year parameters, so these two parameters are actually six (start_day, start_month, start_year, end_day, end_month, end_year). Any statistics outside this date range are ignored.<br /> <br /> ==== An Eprint Set ====<br /> <br /> As well as defining a date range, we also have to tell IRStats which publications we are interested in. Any publication not in the set will be ignored. A set of eprints can either be a single eprint or any set of eprints the system administrator wishes to define in the config files.<br /> <br /> ==== View ====<br /> <br /> The final parameter tells IRStats how we want to process and display the statistics. This is done by selecting a View.<br /> <br /> === Views ===<br /> <br /> Views are Perl modules which plug in to IRStats. They have been designed to be user configurable, though some knowledge of Perl is probably required. When a query is made to IRStats, a View is created. It generates some parameters for the DatabaseInterface object, which queries the database and passes back the results of the query. The View then iterates over the database rows and processes the stats in any way programmatically possible. These processed results are then passed to a Visualisation.<br /> <br /> === Visualisations ===<br /> <br /> A Visualisation takes a set of processed statistics and outputs them. 
For example, Visualisation::Graph::Pie creates a pie chart.<br /> <br /> === The Database Interface ===<br /> <br /> The Database Interface object handles all queries to the database. Most requests for statistics can be completed with a single call to the get_stats($params) method.<br /> <br /> === Data Flow Diagram ===<br /> [[Image:irstats_overview.png]]<br /> <br /> == Required Data ==<br /> <br /> IRStats requires two things in order to run:<br /> <br /> * a database table containing all hits to the repository<br /> * text files describing the contents of the repository<br /> <br /> === The Hits Table ===<br /> <br /> Awaiting a redevelopment.<br /> <br /> === The Text Files ===<br /> <br /> In order for IRStats to build up a picture of a repository, a number of text files need to be created and stored in the cfg/ directory:<br /> <br /> * epstats_set_membership.txt<br /> * epstats_set_member_codes.txt<br /> * epstats_set_member_full_citations.txt<br /> * epstats_set_member_short_citations.txt<br /> * epstats_set_member_urls.txt<br /> <br /> ==== Explanation by Example ====<br /> <br /> Imagine a very small repository. Here are its contents:<br /> <br /> * eprints<br /> ** (1) The Smells of Cheese<br /> ** (2) The Tastes of Wines<br /> ** (3) The Sounds of Oboes<br /> * Authors<br /> ** (1) John Smith<br /> ** (2) Harriet Jones<br /> <br /> If we then imagine that the following are also true:<br /> <br /> * John Smith is credited with being an author of eprints (1) and (2)<br /> * Harriet Jones is credited with being an author of eprints (2) and (3)<br /> * All three eprints are the output of a research group named "Senses"<br /> <br /> ===== Creating sets =====<br /> <br /> Sets are groups of eprints, and every eprint is a member of at least one set (the set containing only that eprint). From the information above, we have three classes of set: the eprint sets, the author sets and the research group set. 
We need to add the following to epstats_set_membership.txt (the format is <id><tab><csv list of eprint ids>):<br /> <br /> author_1 1,2<br /> author_2 2,3<br /> group_1 1,2,3<br /> eprint_1 1<br /> eprint_2 2<br /> eprint_3 3<br /> <br /> ===== Giving Sets IDs =====<br /> <br /> So, we now have some sets, but we need to give them unique IDs so that we can retrieve stats for these sets. To do this, we add the following to epstats_set_member_codes.txt:<br /> <br /> author_1 js<br /> author_2 hj<br /> group_1 senses<br /> eprint_1 1<br /> eprint_2 2<br /> eprint_3 3<br /> <br /> IRStats now assigns the following unique IDs to each set: author_js, author_hj, group_senses, eprint_1, eprint_2, eprint_3. Note that the IDs should probably be kept alphanumeric, and must be unique within a class of sets (but you can have author_hj, group_hj and eprint_hj).<br /> <br /> ===== Citations =====<br /> <br /> IRStats uses two citations for each set member, one short and one long. Which one is used depends on how you would like your visualisation to look, but both need to be added to the citations files:<br /> <br /> epstats_set_member_short_citations.txt<br /> author_1 Smith<br /> <br /> epstats_set_member_full_citations.txt<br /> author_1 Dr John Smith, PhD<br /> <br /> Note that the above examples are only for author_1. It would be exactly the same for any set member.<br /> <br /> ===== URLs =====<br /> <br /> Although URLs are not currently implemented, it is probably a good idea to include this information (in epstats_set_member_urls.txt) for future functionality.<br /> <br /> author_1 http://homepage.john.smith.com/<br /> <br /> == Installing IRStats ==<br /> <br /> <br /> <br /> === Dependencies ===<br /> <br /> ==== Logfile::EPrints ====<br /> <br /> The Logfile::EPrints modules are used to assist in filtering the raw access log. 
They can be installed from CPAN.<br /> <br /> ==== AWStats ====<br /> <br /> AWStats data is used to filter out web spiders and classify search engines. The irstats.cfg file must have an entry showing where the correct Perl modules are.<br /> <br /> ==== Geo::IP ====<br /> <br /> Geo::IP is used to fill in country and organisation information. The country database is free, but if you want organisation information, you will have to purchase a subscription for their database. The location of the database should also be inserted into irstats.cfg.<br /> <br /> Note: The pure Perl version of Geo::IP does not support organisations.<br /> <br /> === Installing ===<br /> <br /> === Customising ===<br /> <br /> It will almost always be necessary to perform some customisation on IRStats because every repository is different.<br /> <br /> ==== Updating the Table ====</div>
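To make the relationship between epstats_set_membership.txt and epstats_set_member_codes.txt from the worked example more concrete, here is a minimal Perl sketch that combines the two into the set IDs listed under "Giving Sets IDs". It is not taken from the IRStats source: parse_tab_lines and build_sets are hypothetical helper names, the file contents are supplied as in-memory strings rather than read from cfg/, and the only format assumed is the tab-separated one described above.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "<id><tab><value>" lines into a hash reference.
sub parse_tab_lines
{
	my( $text ) = @_;
	my %map;
	foreach my $line ( split /\n/, $text )
	{
		next if $line =~ /^\s*$/;    # skip blank lines
		my( $id, $value ) = split /\t/, $line, 2;
		$map{$id} = $value;
	}
	return \%map;
}

# Hypothetical helper: combine membership and codes into
# set ID => reference to a list of eprint ids.
sub build_sets
{
	my( $membership, $codes ) = @_;
	my %sets;
	foreach my $member ( sort keys %{$membership} )
	{
		next unless defined $codes->{$member};    # no code, so no set ID
		my( $class ) = $member =~ /^([^_]+)_/;    # 'author', 'group' or 'eprint'
		next unless defined $class;
		$sets{ $class . '_' . $codes->{$member} } = [ split /,/, $membership->{$member} ];
	}
	return \%sets;
}

# A cut-down copy of the example data from the text.
my $membership = parse_tab_lines( "author_1\t1,2\nauthor_2\t2,3\ngroup_1\t1,2,3\neprint_1\t1\n" );
my $codes      = parse_tab_lines( "author_1\tjs\nauthor_2\thj\ngroup_1\tsenses\neprint_1\t1\n" );
my $sets       = build_sets( $membership, $codes );

foreach my $set_id ( sort keys %{$sets} )
{
	print "$set_id: ", join( ',', @{ $sets->{$set_id} } ), "\n";
}
```

With this data the sketch yields the sets author_js, author_hj, group_senses and eprint_1, each mapped to the eprint ids given in the membership file.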

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats&diff=4348 IRStats 2007-05-30T17:59:11Z

<p>Gobfrey: /* Installing IRStats */</p>

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats&diff=4347 IRStats 2007-05-30T15:13:35Z

<p>Gobfrey: /* Explanation by Example */</p>

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats&diff=4346 IRStats 2007-05-30T13:19:21Z

<p>Gobfrey: /* The Text Files */</p>

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats&diff=4345 IRStats 2007-05-30T11:49:39Z

<p>Gobfrey: /* The Hits Table */</p> <hr /> <div>IRStats is a flexible statistics package which allows easy processing of accesses to fulltext and abstract pages of eprints.<br /> <br /> == Technical Overview ==<br /> <br /> The following is a quick tour of IRStats.<br /> <br /> === Parameters ===<br /> <br /> IRStats output depends on four parameters, which need to be passed as cgi parameters if called through a web browser, or in a hash if called through the Perl API. These are:<br /> <br /> ==== Start Date and End Date ====<br /> <br /> Date parameters are implemented as separate day, month and year parameters, so these two parameters are actually six (start_day, start_month, start_year, end_day, end_month, end_year). Any statistics outside this daterange are ignored.<br /> <br /> ==== An Eprint Set ====<br /> <br /> As well as defining a daterange, we also have to inform IRStats of which publications we are interested in. Any publication not in the set will be ignored. A set of eprints can either be a single eprint or any set of eprints the system administrator wishes to define in the config files.<br /> <br /> ==== View ====<br /> <br /> The final parameter tells IRStats how we want to process and display the statistics. This is done by selecting a View.<br /> <br /> === Views ===<br /> <br /> Views are perl modules which plug in to IRStats. They have been designed to be user configurable, though some knowledge of perl is probably required. When a query is made to IRStats, a View is created. It generates some parameters for the DatabaseInterface object, which queries the database and passes back the results of the query. The View then iterates over the database rows and processes the stats in any way programmatically possible. These processed results are then passed to a Visualisation.<br /> <br /> === Visualisations ===<br /> <br /> A Visualisation takes a set of processed statistics and outputs them. 
For example, Visualisation::Graph::Pie creates a pie chart.<br /> <br /> === The Database Interface ===<br /> <br /> The Database Interface object handles all queries to the database. Most requests for statistics can be completed with a single call to the get_stats($params) method.<br /> <br /> === Data Flow Diagram ===<br /> [[Image:irstats_overview.png]]<br /> <br /> == Required Data ==<br /> <br /> In order for IRStats to run, it requires two things:<br /> <br /> * a database table containing all hits to the repository<br /> * text files describing the contents of the repository<br /> <br /> === The Hits Table ===<br /> <br /> Awaiting a redevelopment.<br /> <br /> === The Text Files ===<br /> <br /> == Installing IRStats ==</div>

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats&diff=4344 IRStats 2007-05-30T10:34:29Z

<p>Gobfrey: /* Required Data */</p> <hr /> <div>IRStats is a flexible statistics package which allows easy processing of accesses to fulltext and abstract pages of eprints.<br /> <br /> == Technical Overview ==<br /> <br /> The following is a quick tour of IRStats.<br /> <br /> === Parameters ===<br /> <br /> IRStats output depends on four parameters, which need to be passed as cgi parameters if called through a web browser, or in a hash if called through the Perl API. These are:<br /> <br /> ==== Start Date and End Date ====<br /> <br /> Date parameters are implemented as separate day, month and year parameters, so these two parameters are actually six (start_day, start_month, start_year, end_day, end_month, end_year). Any statistics outside this daterange are ignored.<br /> <br /> ==== An Eprint Set ====<br /> <br /> As well as defining a daterange, we also have to inform IRStats of which publications we are interested in. Any publication not in the set will be ignored. A set of eprints can either be a single eprint or any set of eprints the system administrator wishes to define in the config files.<br /> <br /> ==== View ====<br /> <br /> The final parameter tells IRStats how we want to process and display the statistics. This is done by selecting a View.<br /> <br /> === Views ===<br /> <br /> Views are perl modules which plug in to IRStats. They have been designed to be user configurable, though some knowledge of perl is probably required. When a query is made to IRStats, a View is created. It generates some parameters for the DatabaseInterface object, which queries the database and passes back the results of the query. The View then iterates over the database rows and processes the stats in any way programmatically possible. These processed results are then passed to a Visualisation.<br /> <br /> === Visualisations ===<br /> <br /> A Visualisation takes a set of processed statistics and outputs them. 
For example, Visualisation::Graph::Pie creates a pie chart.<br /> <br /> === The Database Interface ===<br /> <br /> The Database Interface object handles all queries to the database. Most requests for statistics can be completed with a single call to the get_stats($params) method.<br /> <br /> === Data Flow Diagram ===<br /> [[Image:irstats_overview.png]]<br /> <br /> == Required Data ==<br /> <br /> In order for IRStats to run, it requires two things:<br /> <br /> * a database table containing all hits to the repository<br /> * text files describing the contents of the repository<br /> <br /> === The Hits Table ===<br /> <br /> <br /> === The Text Files ===<br /> <br /> == Installing IRStats ==</div>

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats&diff=4343 IRStats 2007-05-30T10:25:57Z

<p>Gobfrey: </p> <hr /> <div>IRStats is a flexible statistics package which allows easy processing of accesses to fulltext and abstract pages of eprints.<br /> <br /> == Technical Overview ==<br /> <br /> The following is a quick tour of IRStats.<br /> <br /> === Parameters ===<br /> <br /> IRStats output depends on four parameters, which need to be passed as cgi parameters if called through a web browser, or in a hash if called through the Perl API. These are:<br /> <br /> ==== Start Date and End Date ====<br /> <br /> Date parameters are implemented as separate day, month and year parameters, so these two parameters are actually six (start_day, start_month, start_year, end_day, end_month, end_year). Any statistics outside this daterange are ignored.<br /> <br /> ==== An Eprint Set ====<br /> <br /> As well as defining a daterange, we also have to inform IRStats of which publications we are interested in. Any publication not in the set will be ignored. A set of eprints can either be a single eprint or any set of eprints the system administrator wishes to define in the config files.<br /> <br /> ==== View ====<br /> <br /> The final parameter tells IRStats how we want to process and display the statistics. This is done by selecting a View.<br /> <br /> === Views ===<br /> <br /> Views are perl modules which plug in to IRStats. They have been designed to be user configurable, though some knowledge of perl is probably required. When a query is made to IRStats, a View is created. It generates some parameters for the DatabaseInterface object, which queries the database and passes back the results of the query. The View then iterates over the database rows and processes the stats in any way programmatically possible. These processed results are then passed to a Visualisation.<br /> <br /> === Visualisations ===<br /> <br /> A Visualisation takes a set of processed statistics and outputs them. 
For example, Visualisation::Graph::Pie creates a pie chart.<br /> <br /> === The Database Interface ===<br /> <br /> The Database Interface object handles all queries to the database. Most requests for statistics can be completed with a single call to the get_stats($params) method.<br /> <br /> === Data Flow Diagram ===<br /> [[Image:irstats_overview.png]]<br /> <br /> == Required Data ==<br /> <br /> == Installing IRStats ==</div>

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats&diff=4342 IRStats 2007-05-30T10:20:18Z

<p>Gobfrey: </p> <hr /> <div>IRStats is a flexible statistics package which allows easy processing of accesses to fulltext and abstract pages of eprints.<br /> <br /> == Technical Overview ==<br /> <br /> The following is a quick tour of IRStats.<br /> <br /> === Parameters ===<br /> <br /> IRStats output depends on four parameters, which need to be passed as cgi parameters if called through a web browser, or in a hash if called through the Perl API. These are:<br /> <br /> ==== Start Date and End Date ====<br /> <br /> Date parameters are implemented as separate day, month and year parameters, so these two parameters are actually six (start_day, start_month, start_year, end_day, end_month, end_year). Any statistics outside this daterange are ignored.<br /> <br /> ==== An Eprint Set ====<br /> <br /> As well as defining a daterange, we also have to inform IRStats of which publications we are interested in. Any publication not in the set will be ignored. A set of eprints can either be a single eprint or any set of eprints the system administrator wishes to define in the config files.<br /> <br /> ==== View ====<br /> <br /> The final parameter tells IRStats how we want to process and display the statistics. This is done by selecting a View.<br /> <br /> === Views ===<br /> <br /> Views are perl modules which plug in to IRStats. They have been designed to be user configurable, though some knowledge of perl is probably required. When a query is made to IRStats, a View is created. It generates some parameters for the DatabaseInterface object, which queries the database and passes back the results of the query. The View then iterates over the database rows and processes the stats in any way programmatically possible. These processed results are then passed to a Visualisation.<br /> <br /> === Visualisations ===<br /> <br /> A Visualisation takes a set of processed statistics and outputs them. 
For example, Visualisation::Graph::Pie creates a pie chart.<br /> <br /> === The Database Interface ===<br /> <br /> The Database Interface object handles all queries to the database. Most requests for statistics can be completed with a single call to the get_stats($params) method.<br /> <br /> === Data Flow Diagram ===<br /> [[Image:irstats_overview.png]]</div>

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats&diff=4341 IRStats 2007-05-30T10:14:54Z

<p>Gobfrey: </p> <hr /> <div>IRStats is a flexible statistics package which allows easy processing of accesses to fulltext and abstract pages of eprints.<br /> <br /> == Technical Overview ==<br /> <br /> The following is a quick tour of IRStats.<br /> <br /> === Parameters ===<br /> <br /> IRStats output depends on four parameters, which need to be passed as cgi parameters if called through a web browser, or in a hash if called through the Perl API. These are:<br /> <br /> ==== Start Date and End Date ====<br /> <br /> Date parameters are implemented as separate day, month and year parameters, so these two parameters are actually six (start_day, start_month, start_year, end_day, end_month, end_year). Any statistics outside this daterange are ignored.<br /> <br /> ==== An Eprint Set ====<br /> <br /> As well as defining a daterange, we also have to inform IRStats of which publications we are interested in. Any publication not in the set will be ignored. A set of eprints can either be a single eprint or any set of eprints the system administrator wishes to define in the config files.<br /> <br /> ==== View ====<br /> <br /> The final parameter tells IRStats how we want to process and display the statistics. This is done by selecting a View.<br /> <br /> === Views ===<br /> <br /> Views are perl modules which plug in to IRStats. They have been designed to be user configurable, though some knowledge of perl is probably required. When a query is made to IRStats, a View is created. It generates some parameters for the DatabaseInterface object, which queries the database and passes back the results of the query. The View then iterates over the database rows and processes the stats in any way programmatically possible. These processed results are then passed to a Visualisation.<br /> <br /> === Visualisations ===<br /> <br /> === The Database Interface ===<br /> <br /> [[Image:irstats_overview.png]]</div>

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats&diff=4340 IRStats 2007-05-30T10:10:37Z

<p>Gobfrey: </p> <hr /> <div>IRStats is a flexible statistics package which allows easy processing of accesses to fulltext and abstract pages of eprints.<br /> <br /> == Technical Overview ==<br /> <br /> The following is a quick tour of IRStats.<br /> <br /> === Parameters ===<br /> <br /> IRStats output depends on four parameters, which need to be passed as cgi parameters if called through a web browser, or in a hash if called through the Perl API. These are:<br /> <br /> ==== Start Date and End Date ====<br /> <br /> Date parameters are implemented as separate day, month and year parameters, so these two parameters are actually six (start_day, start_month, start_year, end_day, end_month, end_year). Any statistics outside this daterange are ignored.<br /> <br /> ==== An Eprint Set ====<br /> <br /> As well as defining a daterange, we also have to inform IRStats of which publications we are interested in. Any publication not in the set will be ignored. A set of eprints can either be a single eprint or any set of eprints the system administrator wishes to define in the config files.<br /> <br /> ==== View ====<br /> <br /> The final parameter tells IRStats how we want to process and display the statistics. This is done by selecting a View.<br /> <br /> === Views ===<br /> <br /> Views are perl modules which plug in to IRStats.<br /> <br /> === Visualisations ===<br /> <br /> === The Database Interface ===<br /> <br /> [[Image:irstats_overview.png]]</div>

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats&diff=4339 IRStats 2007-05-30T10:04:42Z

<p>Gobfrey: </p> <hr /> <div>IRStats is a flexible statistics package which allows easy processing of accesses to fulltext and abstract pages of eprints.<br /> <br /> == Technical Overview ==<br /> <br /> The following is a quick tour of IRStats.<br /> <br /> === Parameters ===<br /> <br /> IRStats output depends on four parameters, which need to be passed as cgi parameters if called through a web browser, or in a hash if called through the Perl API. These are:<br /> <br /> ==== Start Date and End Date ====<br /> <br /> Date parameters are implemented as separate day, month and year parameters, so these two parameters are actually six (start_day, start_month, start_year, end_day, end_month, end_year). Any statistics outside this daterange are ignored.<br /> <br /> ==== An Eprint Sets ====<br /> <br /> <br /> ==== View ====<br /> <br /> <br /> <br /> === Views ===<br /> <br /> === Visualisations ===<br /> <br /> === The Database Interface ===<br /> <br /> [[Image:irstats_overview.png]]</div>

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats&diff=4338 IRStats 2007-05-30T09:46:34Z

<p>Gobfrey: </p> <hr /> <div>[[Image:irstats_overview.png]]</div>

Gobfrey https://wiki.eprints.org/w/index.php?title=File:Irstats_overview.png&diff=4337 File:Irstats overview.png 2007-05-30T09:46:00Z

<p>Gobfrey: A system overview of IRStats, showing how data flows through the system.</p> <hr /> <div>A system overview of IRStats, showing how data flows through the system.</div>

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats_Technical_Documentation&diff=4297 IRStats Technical Documentation 2007-05-21T16:04:52Z

<p>Gobfrey: Epstats moved to IRS - EPStats Technical Documentation</p> <hr /> <div>This document is intended as guidance to the last stage of development of EPStats.<br /> <br /> = Directory Structure =<br /> <br /> == /opt/epstats ==<br /> Contains data files for GeoIP. If I had had root access, I would have put them in the correct place. They are linked to from the correct place. These need regular updating, something which hasn't been implemented.<br /> <br /> == /opt/epstats/bin ==<br /> Contains the scripts needed to update the table.<br /> <br /> *daily_update.sh - Runs all the scripts in the right order.<br /> *extract_metadata_from_archive.pl - Extracts eprint, author and group metadata from the repository by iterating over every eprint.<br /> *update_table.pl - Filters and processes new entries in the accesslog to update the epstats_true_acesses_table. Uses 'SearchParser.pm' and 'repeatscache'.<br /> * convert_ip_to_host.pl - Attempts to convert the IP addresses of the new entries in epstats_true_acesses_table to hostnames. Uses 'host_updated' to keep track of where it got to last time.<br /> <br /> Note that most of these scripts probably need to be tidied up. They were written in a hurry and were never polished.<br /> <br /> == /opt/epstats/cache ==<br /> Contains cache files. Feel free to delete these whenever you like. <br /> <br /> == /opt/epstats/cgi ==<br /> <br /> Contains two scripts, 'get_view' and 'stats'.<br /> <br /> *get_view returns the output of an EPStats::View (see below), which is currently a chunk of HTML or CSV, but could be almost anything.<br /> *stats is a handy CGI form that passes arguments to get_view<br /> <br /> == /opt/epstats/img ==<br /> <br /> Conceptually, where any images would be kept (e.g. national flags). At the moment, only the img/graphs directory is used. 
This is where generated graphs are stored.<br /> <br /> == /opt/epstats/perl_lib ==<br /> <br /> Contains all the EPStats classes.<br /> <br /> = EPStats Classes =<br /> <br /> Note that the leading EPStats:: has been left out for brevity.<br /> <br /> == Params ==<br /> This object holds the parameters that are used to generate the statistics. The most important of these are a date range and an eprint set.<br /> === Configuration Constants ===<br /> *$cgi_script - the name of the CGI script (currently unused)<br /> *$id_params - the parameters that are significant when generating an ID.<br /> *$defaults - Any default parameters you wish to set.<br /> <br /> === Functions ===<br /> *new(CGI_object) - returns new object<br /> *mask(params_hash) - used when you want to temporarily overwrite parameter(s). Overwrites values with contents of params_hash. Overwritten values get pushed onto a stack.<br /> *unmask - Sets parameters back to how they were before the last mask.<br /> *generate_cgi - returns a string containing the name of the CGI script, and all parameters, to enable the creation of links. (currently unused)<br /> *get(param_name) - returns the value of a single parameter.<br /> *create_id - Uses MD5 to create a unique ID from the id_params (see Constants above). This is called whenever get('id') is called.<br /> <br /> == DatabaseInterface ==<br /> This object does what it says on the tin. Any access to the database is done through it.<br /> <br /> <strong>IMPORTANT</strong> - the generated SQL was developed on a machine running MySQL 5. Installing on the EPrints server has broken this (as it's running MySQL 4). I placed a quick and dirty hack into the do_sql function, and modified the create_top_table function. I have no idea if this works well. 
<strong>IT NEEDS TO BE CHECKED</strong>.<br /> <br /> === Configuration Constants ===<br /> Constants are contained in the new function.<br /> <br /> *DBI Configuration Constants - $driver, $server, $database, $user, $password are all used to create the connection to the database.<br /> *source_table - The table in which the stats are stored.<br /> <br /> === Functions ===<br /> *new() - returns object.<br /> *retreive_set_names() - returns a list of eprint sets. Currently 'group' and 'author' are implemented. This is used to verify CGI input.<br /> *get_membership(eprint_id, set_name) - For a given eprint ID, returns the members of the named set to which it belongs. For example, we can find out which authors eprint 12614 has by get_membership(12614, 'author').<br /> *get_citation(id, set, length) - returns a citation. Every set member (eprint, author, group) has two citations, short and full. We only return a short citation if length == 'short'. So, to get the short citation of group 3: get_citation(3,'group','short').<br /> *get_code($id,$set) - UNWRITTEN - Set members have codes. This is how they are identified by the user. For example, author_lac is the member of the author set whose code is lac. To get the code for group 3: get_code(3,'group').<br /> <br /> When retrieving statistics, EPStats filters by inner joining the epstats_true_accesses_table to other tables containing eprint IDs. Sometimes it has to create these tables.<br /> <br /> *create_top_table(param_object) - This creates a table containing the eprint IDs of the top X by fulltext download between two dates.<br /> *create_list_table(table_name, eprint_ids) - Takes two strings, one the name of the table, the other a space-separated list of eprint IDs. Creates a temporary table.<br /> <br /> The following are the only two functions that actually make calls to the database. 
<br /> <br /> *do_sql(sql_query_string) - takes a string and performs a query, returning the DBI object containing the results.<br /> *insert_values(table_name, values) - inserts a row of data into a table.<br /> <br /> And finally, the meat and potatoes: the functions that return the statistics we're interested in.<br /> <br /> *get_stats(params_object, column_name_list, options_hash) - returns a DBI object containing the stats we are interested in, i.e. those within the params_object's date range and eprint sets, restricted to the columns in column_name_list. The options hash can contain the following key/value pairs:<br /> **order => column_name - the column on which to order it. Append '-' or ' DESC' to order it descending.<br /> **limit => int - How many results to return<br /> **group_by => column_name - if we need to group by a column.<br /> get_stats works by examining the 'eprints' parameter and calling one of the following functions:<br /> **get_list_stats<br /> **get_top_stats<br /> **get_set_stats<br /> **get_all_stats<br /> These functions generate slightly different MySQL queries, and pass them to the do_sql function.<br /> <br /> == Date ==<br /> I implemented a date object because there were some specific things I needed to do with dates.<br /> <br /> ===Functions===<br /> *new(date_hash) - Creates a new date object when passed a hash with the keys 'day', 'month' and 'year'.<br /> *validate() - If the date is not valid, it will be modified to a sensible value. E.g. if it's Feb 30th, it will be modified to Feb 29th or 28th, depending on whether it's a leap year.<br /> *set(part_name, int) - Sets part of the date ('year','month' or 'day') to a specific value.<br /> *decrement(period) - decrements the date by a period ('day', 'week', 'month', 'quarter', 'year'). Calls the mod_date function, which does the muscle work.<br /> *increment(period) - increments the date by a period, also by calling mod_date.<br /> *part(part_name, style) - Returns the day, month or year. 
For month, if style=='text', returns a three-letter string, otherwise returns an integer. For year, if style=='short', returns the last two digits, otherwise returns all four.<br /> *less_than(date_object) - compares itself to another date object. Returns 1 if it's less than it, otherwise returns 0.<br /> *greater_than(date_object) - compares itself to another date object. Returns 1 if it's greater than it, otherwise returns 0.<br /> *month_name() - returns the three-letter string of the month.<br /> *render(format_string) - returns a date string. Format can be:<br /> **'short' - Calls render_abbreviated - returns a date like this: 05-Jul-77<br /> **'long' - Calls render_full (not implemented).<br /> **'numerical' (default) - Calls render_numerical - returns a date like this: 19770705<br /> *clone - returns a new, identical date object.<br /> <br /> <br /> == Cache ==<br /> The interface to the cache.<br /> <br /> === Configuration Constants ===<br /> *$cache_directory - a string containing a path to the directory in which the cache files are located.<br /> <br /> === Functions ===<br /> *new(id) - takes the ID of the params object we're using at the moment.<br /> *exists() - returns true if there's a cached file, false if there isn't one.<br /> *write(visualisation_object) - writes the data to the cache file.<br /> *read() - returns the data from the cache.<br /> <br /> == Periods ==<br /> The Periods object is used when you want to break a date range down into sub-ranges. Used with the params->mask() function, stats can be retrieved for periods inside a date range.<br /> ===Functions===<br /> *new(start_date_obj, end_date_obj) - doesn't do anything, just returns the object.<br /> <br /> The following functions all return an array of hashes. 
Each hash has the keys 'start_date' and 'end_date', and the values are both EPStats::Date objects.<br /> <br /> *calandar_months - Returns full months (each element starts on the 1st, and ends on the last day).<br /> *months - Returns month periods (if the start_date is the 15th, then each period starts on the 15th and ends on the 14th of the next month - except the last period, which only has about a 1/30 chance of doing so).<br /> *weeks - returns 7-day periods (except the last, which has a 1/7 chance of being 7 days long).<br /> *days - returns single days (for each period, the start_date and end_date are the same).<br /> <br /> == UserInterface::Controls ==<br /> This is used to generate the drop boxes in the stats CGI script. If I had more time I'd document it fully, but my daughter's going to be born in less than 12 hours.<br /> <br /> == Page (deprecated) ==<br /> Harkens back to the day when a page object contained views.<br /> <br /> == View ==<br /> A view processes the stats data filtered by the parameters and creates a visualisation. It is intended that savvy users create their own views.<br /> <br /> === Functions ===<br /> All views inherit:<br /> *new(params_obj, database_interface_object) - returns the object.<br /> *render - calls populate, then returns whatever the visualisation renders<br /> All views must implement:<br /> *new - passes arguments to superclass, then calls 'initialise'.<br /> *initialise - the Configuration Constants are set here.<br /> *populate - The engine that powers EPStats.<br /> <br /> === View::FullTextCountHTML ===<br /> The FullTextCountHTML is an extremely simple view. It retrieves one row from the database and does no processing, making it ideal for a quick walkthrough:<br /> <br /> ==== Housekeeping ====<br /> At the top of the file, we need:<br /> package EPStats::View::FullTextCountHTML;<br /> use strict;<br /> use warnings;<br /> Next, the modules we will use. 
I've included perlchartdir, the graph-making package, even though we're not using it.<br /> use EPStats::DatabaseInterface;<br /> use EPStats::Cache;<br /> use EPStats::Visualisation::HTML;<br /> use EPStats::View;<br /> use perlchartdir;<br /> And link to the superclass. <br /> our @ISA = qw/ EPStats::View /;<br /> <br /> ==== Configuration Constants ====<br /> We are interested in retrieving the fulltxt column, and a count as we will be aggregating. The sql_params are set so that we can filter on fulltext downloads, and we need to group as we are counting. We also create our visualisation here.<br /> sub initialise<br /> {<br /> my ($self) = @_;<br /> $self->{'sql_columns'} = [ 'fulltxt', 'COUNT(fulltxt)' ];<br /> $self->{'sql_params'} = {where => "fulltxt = 'F'", group_by => 'fulltxt'};<br /> $self->{'visualisation'} = EPStats::Visualisation::HTML->new();<br /> }<br /> <br /> ==== new ====<br /> The new function shouldn't ever need to be any different from this:<br /> sub new<br /> {<br /> my( $class, $params, $database ) = @_;<br /> my $self = $class->SUPER::new($params, $database);<br /> $self->initialise();<br /> return $self;<br /> }<br /> <br /> ==== populate ====<br /> Every populate function should start by checking the cache.<br /> <br /> sub populate<br /> {<br /> my ($self) = @_;<br /> my $cache = EPStats::Cache->new($self->{'params'}->get('id'));<br /> if ($cache->exists)<br /> {<br /> $self->{'visualisation'} = $cache->read();<br /> return;<br /> }<br /> Next, we have to retrieve the stats from the database:<br /> <br /> my $query = $self->{'database'}->get_stats(<br /> $self->{'params'},<br /> $self->{'sql_columns'},<br /> $self->{'sql_params'}<br /> );<br /> Now we process them. In this case, we don't even need a loop as we know there's only going to be one row. We'll stick the result straight into some HTML, and save it. 
Don't forget that if there isn't any data, you still have to output something.<br /> my @row = $query->fetchrow_array();<br /> my $html = '<span class="epstats_view_fulltextcounthtml">' . ($row[1] ? $row[1] : '0') . "</span>";<br /> A little housekeeping:<br /> $query->finish();<br /> Pop the data into the visualisation:<br /> $self->{'visualisation'}->set('html',$html);<br /> Finally, we should write to the cache so we don't have to query the database next time.<br /> $cache->write($self->{'visualisation'});<br /> }<br /> <br /> And that's a really simple view.<br /> <br /> === Using Periods ===<br /> <br /> If we wanted to break our date range into periods, we'd need to do something like this:<br /> <br /> my $periods = EPStats::Periods->new($self->{'params'}->{'start_date'},$self->{'params'}->{'end_date'});<br /> foreach my $period ( @{$periods->calandar_months()} )<br /> {<br /> $self->{'params'}->mask($period);<br /> my $query = $self->{'database'}->get_stats(<br /> $self->{'params'},<br /> $self->{'sql_columns'},<br /> $self->{'sql_params'}<br /> );<br /> $self->{'params'}->unmask();<br /> #process and put into variables<br /> }<br /> <br /> == Visualisation ==<br /> Currently Visualisations are Graph, Table or HTML. These are what the user will look at in the browser or download (in the case of CSV).<br /> <br /> === Functions ===<br /> All visualisations inherit:<br /> *new(data_hash) - a hash can optionally be passed containing the values that would otherwise be set using the 'set' function.<br /> *set(param_name, value) - sets the named value; the accepted names depend on the subclass (see below)<br /> <br /> All visualisations must implement:<br /> *render() - returns what will be passed to the script.<br /> <br /> <br /> == Visualisation::HTML ==<br /> The simplest visualisation. 
Just a chunk of html.<br /> <br /> To Populate:<br /> *set('html', html_string) - takes the html as a string.<br /> <br /> == Visualisation::Table ==<br /> The Visualisation::Table currently just passes the buck to its superclass.<br /> <br /> There are currently three table Visualisations:<br /> <br /> === Visualisation::Table::CSV ===<br /> Returns a CSV table.<br /> <br /> To Populate:<br /> *set('headings', headings_arrayref) - pass an array containing headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> <br /> === Visualisation::Table::HTML ===<br /> A basic HTML table.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> And then optionally<br /> *set('totals', totals_arrayref) - an array of totals to put at the bottom of the table.<br /> <br /> === Visualisation::Table::HTML_Columned ===<br /> An HTML table that is rendered in several columns.<br /> ==== Configuration Constants ====<br /> $default_number_of_rows - an int representing the maximum number of rows the table should have.<br /> <br /> ==== Overridden Functions ====<br /> *new(data_hash, number_of_rows) - Both data_hash and number_of_rows are optional. Both can be set with 'set'.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> *set('number_of_rows', int) - set the maximum number of rows the table should have.<br /> <br /> == Visualisation::Graph ==<br /> The graph objects all use Chart Director to generate graphs. 
The Graph object initialises the colours that the graph may use.<br /> <br /> Every graph must be created with at least the filename:<br /> *new({filename => string}) - the filename comes from the ID of the param object.<br /> <br /> === Configuration Constants ===<br /> These are set in the 'new' function.<br /> *$graph_dir - the path to the directory where the image file will be saved.<br /> *$url_relative - this will have the filename added to the end and put in the img html tag.<br /> <br /> === Sub Classes ===<br /> Note that in the Visualisation/Graph/ directory, there is 'GraphLegend.pm'. This is used to create the html for the graph legends.<br /> <br /> ====Visualisation::Graph::Bar.pm====<br /> A Bar Graph. It can have one or more bars in each division of the x axis.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('x_title',string) - The title of the x axis.<br /> *set('y_title',string) - The title of the y axis.<br /> *set('x_labels',array_ref) - an array containing the labels for the x axis<br /> *set('data_series', array_ref) - an array of arrayrefs, referencing data for each set of bars<br /> <br /> ====Visualisation::Graph::Line.pm====<br /> A Line Graph. There can be many lines on it.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('x_title',string) - The title of the x axis.<br /> *set('y_title',string) - The title of the y axis.<br /> *set('x_labels',array_ref) - an array containing the labels for the x axis<br /> *set('data_series', array_ref) - an array of arrayrefs, referencing data for each line<br /> <br /> ====Visualisation::Graph::Pie.pm====<br /> A Pie Graph.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('data_series', array_ref) - an array of hashrefs, {data => int, citation => string}, one for each slice</div>
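The mask/unmask pattern used in the "Using Periods" loop above can be sketched as a small stack of saved values. This is only an illustration of the behaviour described for Params (overwritten values are pushed onto a stack and restored by unmask); the package name Sketch::Params and its internals are hypothetical and are not the EPStats implementation.

```perl
use strict;
use warnings;

# Hypothetical stand-in for EPStats::Params; only new/get/mask/unmask are sketched.
package Sketch::Params;

sub new {
    my ($class, %values) = @_;
    return bless { values => {%values}, stack => [] }, $class;
}

sub get {
    my ($self, $name) = @_;
    return $self->{values}{$name};
}

# Overwrite the given parameters, remembering the previous values on a stack.
sub mask {
    my ($self, $overrides) = @_;
    my %saved;
    for my $name (keys %$overrides) {
        $saved{$name} = {
            existed => exists $self->{values}{$name},
            value   => $self->{values}{$name},
        };
        $self->{values}{$name} = $overrides->{$name};
    }
    push @{ $self->{stack} }, \%saved;
    return;
}

# Restore whatever the most recent mask() overwrote.
sub unmask {
    my ($self) = @_;
    my $saved = pop @{ $self->{stack} } or return;
    for my $name (keys %$saved) {
        if ($saved->{$name}{existed}) {
            $self->{values}{$name} = $saved->{$name}{value};
        }
        else {
            delete $self->{values}{$name};
        }
    }
    return;
}

package main;

# Dates are plain strings here purely for brevity.
my $params = Sketch::Params->new(start_date => '20070101', end_date => '20070331');
$params->mask({ start_date => '20070201', end_date => '20070228' });
print $params->get('start_date'), "\n";    # the masked value
$params->unmask();
print $params->get('start_date'), "\n";    # the original value again
```

Because each mask() pushes one frame and each unmask() pops one, nested masks unwind in reverse order, which is why the loop above always pairs the two calls around a single get_stats call.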

Gobfrey https://wiki.eprints.org/w/index.php?title=Epstats&diff=4298 Epstats 2007-05-21T16:04:52Z

<p>Gobfrey: Epstats moved to IRS - EPStats Technical Documentation</p> <hr /> <div>#redirect [[IRS - EPStats Technical Documentation]]</div>

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats_Technical_Documentation&diff=4212 IRStats Technical Documentation 2007-03-29T21:49:37Z

<p>Gobfrey: /* View */</p> <hr /> <div>This document is intended as guidance for the last stage of development of EPstats.<br /> <br /> = Directory Structure =<br /> <br /> == /opt/epstats ==<br /> Contains data files for GeoIP. If I had had root access, I would have put them in the correct place. They are linked to from the correct place. These need regular updating, something which hasn't been implemented.<br /> <br /> == /opt/epstats/bin ==<br /> Contains the scripts needed to update the table.<br /> <br /> *daily_update.sh - Runs all the scripts in the right order.<br /> *extract_metadata_from_archive.pl - Extracts eprint, author and group metadata from the repository by iterating over every eprint.<br /> *update_table.pl - Filters and processes new entries in the accesslog to update the epstats_true_acesses_table. Uses 'SearchParser.pm' and 'repeatscache'.<br /> * convert_ip_to_host.pl - Attempts to convert IP addresses of the new entries in epstats_true_acesses_table to hostnames. Uses 'host_updated' to keep track of where it got to last time.<br /> <br /> Note that most of these scripts probably need to be tidied up. They were written in a hurry and were never polished.<br /> <br /> == /opt/epstats/cache ==<br /> Contains cache files. Feel free to delete these whenever you like. <br /> <br /> == /opt/epstats/cgi ==<br /> <br /> Contains two scripts, 'get_view' and 'stats'.<br /> <br /> *get_view returns the output of an EPStats::View (see below), which is currently a chunk of html or csv, but could be almost anything.<br /> *stats is a handy cgi form that passes arguments to get_view<br /> <br /> == /opt/epstats/img ==<br /> <br /> Conceptually, where any images would be kept (e.g. national flags). At the moment, only the img/graphs directory is used. 
This is where generated graphs are stored.<br /> <br /> == /opt/epstats/perl_lib ==<br /> <br /> Contains all the epstats classes.<br /> <br /> = EPStats Classes =<br /> <br /> Note that the leading EPStats:: has been left out for brevity.<br /> <br /> == Params ==<br /> This object holds the parameters that are used to generate the statistics. The most important of these are a date range and an eprint set.<br /> === Configuration Constants ===<br /> *$cgi_script - the name of the cgi script (currently unused)<br /> *$id_params - When generating an ID, which parameters are important.<br /> *$defaults - Any default parameters you wish to set.<br /> <br /> === Functions ===<br /> *new(CGI_object) - returns new object<br /> *mask(params_hash) - used when you want to temporarily overwrite parameter(s). Overwrites values with contents of params_hash. Overwritten values get pushed onto a stack.<br /> *unmask - Sets parameters back to how they were before the last mask.<br /> *generate_cgi - returns a string containing the name of the cgi script, and all parameters, to enable the creation of links. (currently unused)<br /> *get(param_name) - returns the value of a single parameter.<br /> *create_id - Uses MD5 to create a unique ID from the id_params (see Constants above). This is called whenever get('id') is called.<br /> <br /> == DatabaseInterface ==<br /> This object does what it says on the tin. Any access to the database is done through it.<br /> <br /> <strong>IMPORTANT</strong> - the generated SQL was developed on a machine running MySQL 5. Installing on the EPrints server has broken this (as it's running MySQL 4). I placed a quick and dirty hack into the do_sql function, and modified the create_top_table function. I have no idea if this works well. 
<strong>IT NEEDS TO BE CHECKED</strong>.<br /> <br /> === Configuration Constants ===<br /> Constants are contained in the new function.<br /> <br /> *DBI Configuration Constants - $driver, $server, $database, $user, $password are all used to create the connection to the database.<br /> *source_table - The table in which the stats are stored.<br /> <br /> === Functions ===<br /> *new() - returns object.<br /> *retreive_set_names() - returns a list of eprint sets. Currently 'group' and 'author' are implemented. This is used to verify cgi input.<br /> *get_membership(eprint_id, set_name) - For a given eprint ID, returns the members of the named set that it belongs to. For example, we can find out which authors eprint 12614 has by get_membership(12614, 'author').<br /> *get_citation(id, set, length) - returns a citation. Every set member (eprint, author, group) has two citations: short and full. We only return a short citation if length == 'short'. So, to get the short citation of group 3: get_citation(3,'group','short').<br /> *get_code($id,$set) - UNWRITTEN - Set members have codes. This is how they are identified by the user. For example author_lac is the member of the author set whose code is lac. To get the code for group 3: get_code(3,'group').<br /> <br /> When retrieving statistics, EPStats filters by inner joining the epstats_true_accesses_table to other tables containing eprint IDs. Sometimes it has to create these tables.<br /> <br /> *create_top_table(param_object) - This creates a table containing the eprint IDs of the top X by fulltext download between two dates.<br /> *create_list_table(table_name, eprint_ids) - Takes two strings, one the name of the table, the other a space separated list of eprint IDs. Creates a temporary table.<br /> <br /> The following are the only two functions that actually make calls to the database. 
<br /> <br /> *do_sql(sql_query_string) - takes a string and performs a query, returning the DBI object containing the results.<br /> *insert_values(table_name, values) - inserts a row of data into a table.<br /> <br /> And finally, the meat and potatoes. The functions that return the statistics we're interested in.<br /> <br /> *get_stats(params_object, column_name_list, options_hash) - returns a DBI object containing the stats we are interested in, i.e. those matching the params_object's date range and eprint sets, restricted to the columns in column_name_list. The options hash can contain the following key/value pairs:<br /> **order => column_name - the column on which to order it. Append with '-' or ' DESC' to order it descending.<br /> **limit => int - How many results to return<br /> **group_by => column_name - if we need to group by a column.<br /> get_stats works by examining the 'eprints' parameter and calling one of the following functions:<br /> **get_list_stats<br /> **get_top_stats<br /> **get_set_stats<br /> **get_all_stats<br /> These functions generate slightly different MySQL queries, and pass them to the do_sql function.<br /> <br /> == Date ==<br /> I implemented a date object because there were some specific things I needed to do with dates.<br /> <br /> ===Functions===<br /> *new(date_hash) - Creates a new date object when passed a hash with the keys 'day', 'month' and 'year'.<br /> *validate() - If the date is not valid, it will be modified to a sensible value, e.g. if it's Feb 30th, it will be modified to Feb 29th or 28th, depending on whether it's a leap year.<br /> *set(part_name, int) - Sets part of the date ('year','month' or 'day') to a specific value.<br /> *decrement(period) - decrements the date by a period ('day', 'week', 'month', 'quarter', 'year'). Calls the mod_date function, which does the muscle work.<br /> *increment(period) - increments the date by a period, also by calling mod_date.<br /> *part(part_name, style) - Returns the day, month or year. 
For month, if style=='text', returns a three letter string, otherwise returns an integer. For year, if style=='short', returns the last two digits, otherwise returns all four.<br /> *less_than(date_object) - compares itself to another date object. Returns 1 if it's less than it, otherwise returns 0.<br /> *greater_than(date_object) - compares itself to another date object. Returns 1 if it's greater than it, otherwise returns 0.<br /> *month_name() - returns the three letter string of the month.<br /> *render(format_string) - returns a date string. Format can be:<br /> **'short' - Calls render_abbreviated - returns a date like this: 05-Jul-77<br /> **'long' - Calls render_full (not implemented).<br /> **'numerical' (default) - Calls render_numerical - returns a date like this: 19770705<br /> *clone - returns a new, identical date object.<br /> <br /> <br /> == Cache ==<br /> The interface to the cache.<br /> <br /> === Configuration Constants ===<br /> *$cache_directory - a string containing a path to the directory in which the cache files are located.<br /> <br /> === Functions ===<br /> *new(id) - takes the ID of the params object we're using at the moment.<br /> *exists() - returns true if there's a cached file, false if there isn't one.<br /> *write(visualisation_object) - writes the data to the cache file.<br /> *read() - returns the data from the cache.<br /> <br /> == Periods ==<br /> The Periods object is used when you want to break a date range down into sub-ranges. Used with the params->mask() function, stats can be retrieved for periods inside a date range.<br /> ===Functions===<br /> *new(start_date_obj, end_date_obj) - doesn't do anything, just returns the object.<br /> <br /> The following functions all return an array of hashes. 
Each hash has the keys 'start_date' and 'end_date', and the values are both EPStats::Date objects.<br /> <br /> *calandar_months - Returns full months (each element starts on the 1st, and ends on the last day).<br /> *months - Returns month periods (if the start_date is the 15th, then each period starts on the 15th and ends on the 14th of the next month - except the last period, which only has about a 1/30 chance of doing so).<br /> *weeks - returns 7-day periods (except the last, which has a 1/7 chance of being 7 days long).<br /> *days - returns single days (for each period, the start_date and end_date are the same).<br /> <br /> == UserInterface::Controls ==<br /> This is used to generate the drop boxes in the stats cgi script. If I had more time I'd document it fully, but my daughter's going to be born in less than 12 hours.<br /> <br /> == Page (deprecated) ==<br /> Harks back to the days when a page object contained views.<br /> <br /> == View ==<br /> A view processes the stats data filtered by the parameters and creates a visualisation. It is intended that savvy users create their own views.<br /> <br /> === Functions ===<br /> All views inherit:<br /> *new(params_obj, database_interface_object) - returns the object.<br /> *render - calls populate, then returns whatever the visualisation renders<br /> All views must implement:<br /> *new - passes arguments to superclass, then calls 'initialise'.<br /> *initialise - the Configuration Constants are set here.<br /> *populate - The engine that powers EPStats.<br /> <br /> === View::FullTextCountHTML ===<br /> The FullTextCountHTML is an extremely simple view. It retrieves one row from the database and does no processing, making it ideal for a quick walkthrough:<br /> <br /> ==== Housekeeping ====<br /> At the top of the file, we need:<br /> package EPStats::View::FullTextCountHTML;<br /> use strict;<br /> use warnings;<br /> Now, which modules will we use? 
I've included perlchartdir, the graph making package, even though we're not using it.<br /> use EPStats::DatabaseInterface;<br /> use EPStats::Cache;<br /> use EPStats::Visualisation::HTML;<br /> use EPStats::View;<br /> use perlchartdir;<br /> And link to superclass. <br /> our @ISA = qw/ EPStats::View /;<br /> <br /> ==== Configuration Constants ====<br /> We are interested in retrieving the fulltxt column, and a count as we will be aggregating. The sql_params are set, so that we can filter on fulltext downloads, and we need to group as we are counting. We also create our visualisation here.<br /> sub initialise<br /> {<br /> my ($self) = @_;<br /> $self->{'sql_columns'} = [ 'fulltxt', 'COUNT(fulltxt)' ];<br /> $self->{'sql_params'} = {where => "fulltxt = 'F'", group_by => 'fulltxt'};<br /> $self->{'visualisation'} = EPStats::Visualisation::HTML->new();<br /> }<br /> <br /> ==== new ====<br /> The new function shouldn't ever need to be any different from this:<br /> sub new<br /> {<br /> my( $class, $params, $database ) = @_;<br /> my $self = $class->SUPER::new($params, $database);<br /> $self->initialise();<br /> return $self;<br /> }<br /> <br /> ==== populate ====<br /> Every populate function should start by checking the cache.<br /> <br /> sub populate<br /> {<br /> my ($self) = @_;<br /> my $cache = EPStats::Cache->new($self->{'params'}->get('id'));<br /> if ($cache->exists)<br /> {<br /> $self->{'visualisation'} = $cache->read();<br /> return;<br /> }<br /> Next, we have to retrieve from the database:<br /> <br /> my $query = $self->{'database'}->get_stats(<br /> $self->{'params'},<br /> $self->{'sql_columns'},<br /> $self->{'sql_params'}<br /> );<br /> Now we process them. In this case, we don't even need a loop as we know there's only going to be one row. We'll stick the result straight into some html, and save it. 
Don't forget that if there isn't any data, you still have to output something.<br /> my @row = $query->fetchrow_array();<br /> my $html = '<span class="epstats_view_fulltextcounthtml">' . ($row[1] ? $row[1] : '0') . "</span>";<br /> A little housekeeping:<br /> $query->finish();<br /> Pop the data into the visualisation:<br /> $self->{'visualisation'}->set('html',$html);<br /> Finally, we should write to the cache so we don't have to query the database next time.<br /> $cache->write($self->{'visualisation'});<br /> }<br /> <br /> And that's a really simple view.<br /> <br /> === Using Periods ===<br /> <br /> If we wanted to break our date range into periods, we'd need to do something like this:<br /> <br /> my $periods = EPStats::Periods->new($self->{'params'}->{'start_date'},$self->{'params'}->{'end_date'});<br /> foreach my $period ( @{$periods->calandar_months()} )<br /> {<br /> $self->{'params'}->mask($period);<br /> my $query = $self->{'database'}->get_stats(<br /> $self->{'params'},<br /> $self->{'sql_columns'},<br /> $self->{'sql_params'}<br /> );<br /> $self->{'params'}->unmask();<br /> #process and put into variables<br /> }<br /> <br /> == Visualisation ==<br /> Currently, visualisations are Graph, Table or HTML. These are what the user will look at in the browser or download (in the case of CSV).<br /> <br /> === Functions ===<br /> All visualisations inherit:<br /> *new(data_hash) - a hash can optionally be passed containing the values that would otherwise be set using the 'set' function.<br /> *set(param_name, value) - sets the named parameter to the given value; the accepted parameter names depend on the subclass.<br /> <br /> All visualisations must implement:<br /> *render() - returns what will be passed to the script.<br /> <br /> <br /> == Visualisation::HTML ==<br /> The simplest visualisation. 
Just a chunk of html.<br /> <br /> To Populate:<br /> *set('html', html_string) - takes the html as a string.<br /> <br /> == Visualisation::Table ==<br /> The Visualisation::Table currently just passes the buck to its superclass.<br /> <br /> There are currently three table Visualisations:<br /> <br /> === Visualisation::Table::CSV ===<br /> Returns a CSV table.<br /> <br /> To Populate:<br /> *set('headings', headings_arrayref) - pass an array containing headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> <br /> === Visualisation::Table::HTML ===<br /> A basic HTML table.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> And then optionally<br /> *set('totals', totals_arrayref) - an array of totals to put at the bottom of the table.<br /> <br /> === Visualisation::Table::HTML_Columned ===<br /> An HTML table that is rendered in several columns.<br /> ==== Configuration Constants ====<br /> $default_number_of_rows - an int representing the maximum number of rows the table should have.<br /> <br /> ==== Overridden Functions ====<br /> *new(data_hash, number_of_rows) - Both data_hash and number_of_rows are optional. Both can be set with 'set'.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> *set('number_of_rows', int) - set the maximum number of rows the table should have.<br /> <br /> == Visualisation::Graph ==<br /> The graph objects all use Chart Director to generate graphs. 
The Graph object initialises the colours that the graph may use.<br /> <br /> Every graph must be created with at least the filename:<br /> *new({filename => string}) - the filename comes from the ID of the param object.<br /> <br /> === Configuration Constants ===<br /> These are set in the 'new' function.<br /> *$graph_dir - the path to the directory where the image file will be saved.<br /> *$url_relative - this will have the filename added to the end and put in the img html tag.<br /> <br /> === Sub Classes ===<br /> Note that in the Visualisation/Graph/ directory, there is 'GraphLegend.pm'. This is used to create the html for the graph legends.<br /> <br /> ====Visualisation::Graph::Bar.pm====<br /> A Bar Graph. It can have one or more bars in each division of the x axis.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('x_title',string) - The title of the x axis.<br /> *set('y_title',string) - The title of the y axis.<br /> *set('x_labels',array_ref) - an array containing the labels for the x axis<br /> *set('data_series', array_ref) - an array of arrayrefs, referencing data for each set of bars<br /> <br /> ====Visualisation::Graph::Line.pm====<br /> A Line Graph. There can be many lines on it.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('x_title',string) - The title of the x axis.<br /> *set('y_title',string) - The title of the y axis.<br /> *set('x_labels',array_ref) - an array containing the labels for the x axis<br /> *set('data_series', array_ref) - an array of arrayrefs, referencing data for each line<br /> <br /> ====Visualisation::Graph::Pie.pm====<br /> A Pie Graph.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('data_series', array_ref) - an array of hashrefs, {data => int, citation => string}, one for each slice</div>
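The Periods section above says that calandar_months returns full months, each as a hash with 'start_date' and 'end_date'. A minimal sketch of that idea follows, assuming plain YYYYMMDD strings instead of EPStats::Date objects; the function names days_in_month and calendar_month_periods are hypothetical and not part of EPStats.

```perl
use strict;
use warnings;

# Days in a month, using the Gregorian leap-year rule for February.
sub days_in_month {
    my ($year, $month) = @_;
    my @days = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
    my $leap = ($year % 4 == 0 && $year % 100 != 0) || $year % 400 == 0;
    return ($month == 2 && $leap) ? 29 : $days[$month - 1];
}

# Return one { start_date, end_date } hash per calendar month touched by the
# range. Each period runs from the 1st to the last day of its month, so the
# first and last periods may extend beyond the requested range (an assumption
# about how "full months" should be read).
sub calendar_month_periods {
    my ($start, $end) = @_;
    my ($year, $month)         = $start =~ /^(\d{4})(\d{2})/;
    my ($end_year, $end_month) = $end   =~ /^(\d{4})(\d{2})/;
    my @periods;
    while ($year < $end_year || ($year == $end_year && $month <= $end_month)) {
        push @periods, {
            start_date => sprintf('%04d%02d01', $year, $month),
            end_date   => sprintf('%04d%02d%02d', $year, $month, days_in_month($year, $month)),
        };
        # Step to the next month, rolling over the year after December.
        ($year, $month) = $month == 12 ? ($year + 1, 1) : ($year, $month + 1);
    }
    return \@periods;
}

for my $period (@{ calendar_month_periods('20070115', '20070310') }) {
    print "$period->{start_date} to $period->{end_date}\n";
}
```

Each returned hash has the same two keys that the loop in "Using Periods" hands to mask(), so a structure like this could be passed to a mask-style call one period at a time.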

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats_Technical_Documentation&diff=4211 IRStats Technical Documentation 2007-03-29T21:48:39Z

<p>Gobfrey: /* View::FullTextCountHTML */</p> <hr /> <div>This document is intended as guidance for the last stage of development of EPstats.<br /> <br /> = Directory Structure =<br /> <br /> == /opt/epstats ==<br /> Contains data files for GeoIP. If I had had root access, I would have put them in the correct place. They are linked to from the correct place. These need regular updating, something which hasn't been implemented.<br /> <br /> == /opt/epstats/bin ==<br /> Contains the scripts needed to update the table.<br /> <br /> *daily_update.sh - Runs all the scripts in the right order.<br /> *extract_metadata_from_archive.pl - Extracts eprint, author and group metadata from the repository by iterating over every eprint.<br /> *update_table.pl - Filters and processes new entries in the accesslog to update the epstats_true_acesses_table. Uses 'SearchParser.pm' and 'repeatscache'.<br /> * convert_ip_to_host.pl - Attempts to convert IP addresses of the new entries in epstats_true_acesses_table to hostnames. Uses 'host_updated' to keep track of where it got to last time.<br /> <br /> Note that most of these scripts probably need to be tidied up. They were written in a hurry and were never polished.<br /> <br /> == /opt/epstats/cache ==<br /> Contains cache files. Feel free to delete these whenever you like. <br /> <br /> == /opt/epstats/cgi ==<br /> <br /> Contains two scripts, 'get_view' and 'stats'.<br /> <br /> *get_view returns the output of an EPStats::View (see below), which is currently a chunk of html or csv, but could be almost anything.<br /> *stats is a handy cgi form that passes arguments to get_view<br /> <br /> == /opt/epstats/img ==<br /> <br /> Conceptually, where any images would be kept (e.g. national flags). At the moment, only the img/graphs directory is used. 
This is where generated graphs are stored.<br /> <br /> == /opt/epstats/perl_lib ==<br /> <br /> Contains all the epstats classes.<br /> <br /> = EPStats Classes =<br /> <br /> Note that the leading EPStats:: has been left out for brevity.<br /> <br /> == Params ==<br /> This object holds the parameters that are used to generate the statistics. The most important of these are a date range and an eprint set.<br /> === Configuration Constants ===<br /> *$cgi_script - the name of the cgi script (currently unused)<br /> *$id_params - When generating an ID, which parameters are important.<br /> *$defaults - Any default parameters you wish to set.<br /> <br /> === Functions ===<br /> *new(CGI_object) - returns new object<br /> *mask(params_hash) - used when you want to temporarily overwrite parameter(s). Overwrites values with contents of params_hash. Overwritten values get pushed onto a stack.<br /> *unmask - Sets parameters back to how they were before the last mask.<br /> *generate_cgi - returns a string containing the name of the cgi script, and all parameters, to enable the creation of links. (currently unused)<br /> *get(param_name) - returns the value of a single parameter.<br /> *create_id - Uses MD5 to create a unique ID from the id_params (see Constants above). This is called whenever get('id') is called.<br /> <br /> == DatabaseInterface ==<br /> This object does what it says on the tin. Any access to the database is done through it.<br /> <br /> <strong>IMPORTANT</strong> - the generated SQL was developed on a machine running MySQL 5. Installing on the EPrints server has broken this (as it's running MySQL 4). I placed a quick and dirty hack into the do_sql function, and modified the create_top_table function. I have no idea if this works well. 
<strong>IT NEEDS TO BE CHECKED</strong>.<br /> <br /> === Configuration Constants ===<br /> Constants are contained in the new function.<br /> <br /> *DBI Configuration Constants - $driver, $server, $database, $user, $password are all used to create the connection to the database.<br /> *source_table - The table in which the stats are stored.<br /> <br /> === Functions ===<br /> *new() - returns object.<br /> *retreive_set_names() - returns a list of eprint sets. Currently 'group' and 'author' are implemented. This is used to verify cgi input.<br /> *get_membership(eprint_id, set_name) - For a given eprint ID, returns the members of the named set that it belongs to. For example, we can find out which authors eprint 12614 has by get_membership(12614, 'author').<br /> *get_citation(id, set, length) - returns a citation. Every set member (eprint, author, group) has two citations: short and full. We only return a short citation if length == 'short'. So, to get the short citation of group 3: get_citation(3,'group','short').<br /> *get_code($id,$set) - UNWRITTEN - Set members have codes. This is how they are identified by the user. For example author_lac is the member of the author set whose code is lac. To get the code for group 3: get_code(3,'group').<br /> <br /> When retrieving statistics, EPStats filters by inner joining the epstats_true_accesses_table to other tables containing eprint IDs. Sometimes it has to create these tables.<br /> <br /> *create_top_table(param_object) - This creates a table containing the eprint IDs of the top X by fulltext download between two dates.<br /> *create_list_table(table_name, eprint_ids) - Takes two strings, one the name of the table, the other a space separated list of eprint IDs. Creates a temporary table.<br /> <br /> The following are the only two functions that actually make calls to the database. 
<br /> <br /> *do_sql(sql_query_string) - takes a string and performs a query, returning the DBI object containing the results.<br /> *insert_values(table_name, values) - inserts a row of data into a table.<br /> <br /> And finally, the meat and potatoes. The functions that return the statistics we're interested in.<br /> <br /> *get_stats(params_object, column_name_list, options_hash) - returns a DBI object containing the stats we are interested in, i.e. those matching the params_object's date range and eprint sets, restricted to the columns in column_name_list. The options hash can contain the following key/value pairs:<br /> **order => column_name - the column on which to order it. Append with '-' or ' DESC' to order it descending.<br /> **limit => int - How many results to return<br /> **group_by => column_name - if we need to group by a column.<br /> get_stats works by examining the 'eprints' parameter and calling one of the following functions:<br /> **get_list_stats<br /> **get_top_stats<br /> **get_set_stats<br /> **get_all_stats<br /> These functions generate slightly different MySQL queries, and pass them to the do_sql function.<br /> <br /> == Date ==<br /> I implemented a date object because there were some specific things I needed to do with dates.<br /> <br /> ===Functions===<br /> *new(date_hash) - Creates a new date object when passed a hash with the keys 'day', 'month' and 'year'.<br /> *validate() - If the date is not valid, it will be modified to a sensible value, e.g. if it's Feb 30th, it will be modified to Feb 29th or 28th, depending on whether it's a leap year.<br /> *set(part_name, int) - Sets part of the date ('year','month' or 'day') to a specific value.<br /> *decrement(period) - decrements the date by a period ('day', 'week', 'month', 'quarter', 'year'). Calls the mod_date function, which does the muscle work.<br /> *increment(period) - increments the date by a period, also by calling mod_date.<br /> *part(part_name, style) - Returns the day, month or year. 
For month, if style=='text', returns a three letter string, otherwise returns an integer. For year, if style=='short', returns the last two digits, otherwise returns all four.<br /> *less_than(date_object) - compares itself to another date object. Returns 1 if it's less than it, otherwise returns 0.<br /> *greater_than(date_object) - compares itself to another date object. Returns 1 if it's greater than it, otherwise returns 0.<br /> *month_name() - returns the three letter string of the month.<br /> *render(format_string) - returns a date string. Format can be:<br /> **'short' - Calls render_abbreviated - returns a date like this: 05-Jul-77<br /> **'long' - Calls render_full (not implemented).<br /> **'numerical' (default) - Calls render_numerical - returns a date like this: 19770705<br /> *clone - returns a new, identical date object.<br /> <br /> <br /> == Cache ==<br /> The interface to the cache.<br /> <br /> === Configuration Constants ===<br /> *$cache_directory - a string containing a path to the directory in which the cache files are located.<br /> <br /> === Functions ===<br /> *new(id) - takes the ID of the params object we're using at the moment.<br /> *exists() - returns true if there's a cached file, false if there isn't one.<br /> *write(visualisation_object) - writes the data to the cache file.<br /> *read() - returns the data from the cache.<br /> <br /> == Periods ==<br /> The Periods object is used when you want to break a date range down into sub-ranges. Used with the params->mask() function, stats can be retrieved for periods inside a date range.<br /> ===Functions===<br /> *new(start_date_obj, end_date_obj) - doesn't do anything, just returns the object.<br /> <br /> The following functions all return an array of hashes. 
Each hash has the keys 'start_date' and 'end_date', and the values are both EPStats::Date objects.<br /> <br /> *calandar_months - Returns full months (each element starts on the 1st, and ends on the last day).<br /> *months - Returns month periods (if the start_date is the 15th, then each period starts on the 15th and ends on the 14th of the next month - except the last period, which only has about a 1/30 chance of doing so).<br /> *weeks - returns 7-day periods (except the last, which has a 1/7 chance of being 7 days long).<br /> *days - returns single days (for each period, the start_date and end_date are the same).<br /> <br /> == UserInterface::Controls ==<br /> This is used to generate the drop boxes in the stats cgi script. If I had more time I'd document it fully, but my daughter's going to be born in less than 12 hours.<br /> <br /> == Page (deprecated) ==<br /> Harks back to the days when a page object contained views.<br /> <br /> == View ==<br /> A view processes the stats data filtered by the parameters and creates a visualisation.<br /> <br /> === Functions ===<br /> All views inherit:<br /> *new(params_obj, database_interface_object) - returns the object.<br /> *render - calls populate, then returns whatever the visualisation renders<br /> All views must implement:<br /> *new - passes arguments to superclass, then calls 'initialise'.<br /> *initialise - the Configuration Constants are set here.<br /> *populate - The engine that powers EPStats.<br /> <br /> === View::FullTextCountHTML ===<br /> The FullTextCountHTML is an extremely simple view. It retrieves one row from the database and does no processing, making it ideal for a quick walkthrough:<br /> <br /> ==== Housekeeping ====<br /> At the top of the file, we need:<br /> package EPStats::View::FullTextCountHTML;<br /> use strict;<br /> use warnings;<br /> Now, which modules will we use? 
I've included perlchartdir, the graph-making package, even though we're not using it.<br /> use EPStats::DatabaseInterface;<br /> use EPStats::Cache;<br /> use EPStats::Visualisation::HTML;<br /> use EPStats::View;<br /> use perlchartdir;<br /> And link to the superclass. <br /> our @ISA = qw/ EPStats::View /;<br /> <br /> ==== Configuration Constants ====<br /> We are interested in retrieving the fulltxt column, and a count as we will be aggregating. The sql_params are set so that we can filter on fulltext downloads, and we need to group because we are counting. We also create our visualisation here.<br /> sub initialise<br /> {<br /> my ($self) = @_;<br /> $self->{'sql_columns'} = [ 'fulltxt', 'COUNT(fulltxt)' ];<br /> $self->{'sql_params'} = {where => "fulltxt = 'F'", group_by => 'fulltxt'};<br /> $self->{'visualisation'} = EPStats::Visualisation::HTML->new();<br /> }<br /> <br /> ==== new ====<br /> The new function shouldn't ever need to be any different from this:<br /> sub new<br /> {<br /> my( $class, $params, $database ) = @_;<br /> my $self = $class->SUPER::new($params, $database);<br /> $self->initialise();<br /> return $self;<br /> }<br /> <br /> ==== populate ====<br /> Every populate function should start by checking the cache.<br /> <br /> sub populate<br /> {<br /> my ($self) = @_;<br /> my $cache = EPStats::Cache->new($self->{'params'}->get('id'));<br /> if ($cache->exists)<br /> {<br /> $self->{'visualisation'} = $cache->read();<br /> return;<br /> }<br /> Next, we have to retrieve the data from the database:<br /> <br /> my $query = $self->{'database'}->get_stats(<br /> $self->{'params'},<br /> $self->{'sql_columns'},<br /> $self->{'sql_params'}<br /> );<br /> Now we process the results. In this case, we don't even need a loop as we know there's only going to be one row. We'll stick the result straight into some HTML, and save it. 
Don't forget that if there isn't any data, you still have to output something.<br /> my @row = $query->fetchrow_array();<br /> my $html = '<span class="epstats_view_fulltextcounthtml">' . ($row[1] ? $row[1] : '0') . "</span>";<br /> A little housekeeping:<br /> $query->finish();<br /> Pop the data into the visualisation:<br /> $self->{'visualisation'}->set('html',$html);<br /> Finally, we should write to the cache so we don't have to query the database next time.<br /> $cache->write($self->{'visualisation'});<br /> }<br /> <br /> And that's a really simple view.<br /> <br /> === Using Periods ===<br /> <br /> If we wanted to break our date range into periods, we'd need to do something like this:<br /> <br /> my $periods = EPStats::Periods->new($self->{'params'}->{'start_date'},$self->{'params'}->{'end_date'});<br /> foreach my $period ( @{$periods->calandar_months()} )<br /> {<br /> $self->{'params'}->mask($period);<br /> my $query = $self->{'database'}->get_stats(<br /> $self->{'params'},<br /> $self->{'sql_columns'},<br /> $self->{'sql_params'}<br /> );<br /> $self->{'params'}->unmask();<br /> # process the results and put them into variables<br /> }<br /> <br /> == Visualisation ==<br /> Currently Visualisations are Graph, Table or HTML. These are what the user will look at in the browser or download (in the case of CSV).<br /> <br /> === Functions ===<br /> All visualisations inherit:<br /> *new(data_hash) - a hash can optionally be passed containing the values that would otherwise be set using the 'set' function.<br /> *set(param_name, value) - sets the named parameter to the given value; see the subclasses for the parameters each one accepts.<br /> <br /> All visualisations must implement:<br /> *render() - returns what will be passed to the script.<br /> <br /> <br /> == Visualisation::HTML ==<br /> The simplest visualisation. 
Just a chunk of html.<br /> <br /> To Populate:<br /> *set('html', html_string) - takes the html as a string.<br /> <br /> == Visualisation::Table ==<br /> The Visualisation::Table currently just passes the buck to its superclass.<br /> <br /> There are currently three table Visualisations:<br /> <br /> === Visualisation::Table::CSV ===<br /> Returns a CSV table.<br /> <br /> To Populate:<br /> *set('headings', headings_arrayref) - pass an array containing headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> <br /> === Visualisation::Table::HTML ===<br /> A basic HTML table.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> And then optionally<br /> *set('totals', totals_arrayref) - an array of totals to put at the bottom of the table.<br /> <br /> === Visualisation::Table::HTML_Columned ===<br /> An HTML table that is rendered in several columns.<br /> ==== Configuration Constants ====<br /> $default_number_of_rows - an int representing the maximum number of rows the table should have.<br /> <br /> ==== Overridden Functions ====<br /> *new(data_hash, number_of_rows) - Both data_hash and number_of_rows are optional. Both can be set with 'set'.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> *set('number_of_rows', int) - set the maximum number of rows the table should have.<br /> <br /> == Visualisation::Graph ==<br /> The graph objects all use Chart Director to generate graphs. 
The Graph object initialises the colours that the graph may use.<br /> <br /> Every graph must be created with at least the filename:<br /> *new({filename => string}) - the filename comes from the ID of the param object.<br /> <br /> === Configuration Constants ===<br /> These are set in the 'new' function.<br /> *$graph_dir - the path to the directory where the image file will be saved.<br /> *$url_relative - the filename is added to the end of this, and the result is put in the img HTML tag.<br /> <br /> === Sub Classes ===<br /> Note that in the Visualisation/Graph/ directory, there is 'GraphLegend.pm'. This is used to create the HTML for the graph legends.<br /> <br /> ====Visualisation::Graph::Bar.pm====<br /> A Bar Graph. It can have one or more bars in each division of the x axis.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('x_title',string) - The title of the x axis.<br /> *set('y_title',string) - The title of the y axis.<br /> *set('x_labels',array_ref) - an array containing the labels for the x axis<br /> *set('data_series', array_ref) - an array of arrayrefs, referencing data for each set of bars<br /> <br /> ====Visualisation::Graph::Line.pm====<br /> A Line Graph. There can be many lines on it.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('x_title',string) - The title of the x axis.<br /> *set('y_title',string) - The title of the y axis.<br /> *set('x_labels',array_ref) - an array containing the labels for the x axis<br /> *set('data_series', array_ref) - an array of arrayrefs, referencing data for each line<br /> <br /> ====Visualisation::Graph::Pie.pm====<br /> A Pie Graph<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('data_series', array_ref) - an array of hashrefs, {data => int, citation => string}, one for each slice</div>
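The mask and unmask behaviour described for the Params object (overwritten values are pushed onto a stack and restored by unmask) can be sketched in a few lines of Perl. This is only an illustration: the package name Sketch::Params and its internals are assumptions made for the example, not the actual EPStats::Params code.

```perl
package Sketch::Params;   # hypothetical stand-in, not EPStats::Params itself
use strict;
use warnings;

# Holds a hash of parameters plus a stack of the values displaced by mask().
sub new {
    my ($class, %params) = @_;
    return bless { params => {%params}, stack => [] }, $class;
}

sub get {
    my ($self, $name) = @_;
    return $self->{params}{$name};
}

# Temporarily overwrite parameters, remembering what was there before.
sub mask {
    my ($self, $overrides) = @_;
    my %saved;
    for my $name (keys %$overrides) {
        $saved{$name} = {
            existed => exists $self->{params}{$name},
            value   => $self->{params}{$name},
        };
        $self->{params}{$name} = $overrides->{$name};
    }
    push @{ $self->{stack} }, \%saved;
    return;
}

# Undo the most recent mask(); does nothing if no mask is in place.
sub unmask {
    my ($self) = @_;
    my $saved = pop @{ $self->{stack} } or return;
    for my $name (keys %$saved) {
        if ($saved->{$name}{existed}) {
            $self->{params}{$name} = $saved->{$name}{value};
        }
        else {
            delete $self->{params}{$name};
        }
    }
    return;
}

package main;

my $params = Sketch::Params->new(start_date => '20070101', end_date => '20070331');
$params->mask({ start_date => '20070201', end_date => '20070228' });
print $params->get('start_date'), "\n";   # the masked value
$params->unmask();
print $params->get('start_date'), "\n";   # the original value again
```

Keeping the displaced values on a stack means masks can be nested, which is what lets a view mask the date range once per period inside a loop and always return to the parameters it started with.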

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats_Technical_Documentation&diff=4210 IRStats Technical Documentation 2007-03-29T21:46:16Z

<p>Gobfrey: /* View::FullTextCountHTML */</p> <hr /> <div>This document is intended as guidance to the last stage of development of EPStats.<br /> <br /> = Directory Structure =<br /> <br /> == /opt/epstats ==<br /> Contains data files for GeoIP. If I had had root access, I would have put them in the correct place. They are linked to from the correct place. These need regular updating, something which hasn't been implemented.<br /> <br /> == /opt/epstats/bin ==<br /> Contains the scripts needed to update the table.<br /> <br /> *daily_update.sh - Runs all the scripts in the right order.<br /> *extract_metadata_from_archive.pl - Extracts eprint, author and group metadata from the repository by iterating over every eprint.<br /> *update_table.pl - Filters and processes new entries in the accesslog to update the epstats_true_acesses_table. Uses 'SearchParser.pm' and 'repeatscache'.<br /> *convert_ip_to_host.pl - Attempts to convert IP addresses of the new entries in epstats_true_acesses_table to hostnames. Uses 'host_updated' to keep track of where it got to last time.<br /> <br /> Note that most of these scripts probably need to be tidied up. They were written in a hurry and were never polished.<br /> <br /> == /opt/epstats/cache ==<br /> Contains cache files. Feel free to delete these whenever you like. <br /> <br /> == /opt/epstats/cgi ==<br /> <br /> Contains two scripts, 'get_view' and 'stats'.<br /> <br /> *get_view returns the output of an EPStats::View (see below), which is currently a chunk of HTML or CSV, but could be almost anything.<br /> *stats is a handy CGI form that passes arguments to get_view<br /> <br /> == /opt/epstats/img ==<br /> <br /> Conceptually, where any images would be kept (e.g. national flags). At the moment, only the img/graphs directory is used. 
This is where generated graphs are stored.<br /> <br /> == /opt/epstats/perl_lib ==<br /> <br /> Contains all the EPStats classes.<br /> <br /> = EPStats Classes =<br /> <br /> Note that the leading EPStats:: has been left out for brevity.<br /> <br /> == Params ==<br /> This object holds the parameters that are used to generate the statistics. The most important of these are a date range and an eprint set.<br /> === Configuration Constants ===<br /> *$cgi_script - the name of the cgi script (currently unused)<br /> *$id_params - the parameters that matter when generating an ID.<br /> *$defaults - Any default parameters you wish to set.<br /> <br /> === Functions ===<br /> *new(CGI_object) - returns new object<br /> *mask(params_hash) - used when you want to temporarily overwrite parameter(s). Overwrites values with contents of params_hash. Overwritten values get pushed onto a stack.<br /> *unmask - Sets parameters back to how they were before the last mask.<br /> *generate_cgi - returns a string containing the name of the cgi script, and all parameters, to enable the creation of links. (currently unused)<br /> *get(param_name) - returns the value of a single parameter.<br /> *create_id - Uses MD5 to create a unique ID from the id_params (see Constants above). This is called whenever get('id') is called.<br /> <br /> == DatabaseInterface ==<br /> This object does what it says on the tin. Any access to the database is done through it.<br /> <br /> <strong>IMPORTANT</strong> - the generated SQL was developed on a machine running MySQL 5. Installing on the EPrints server has broken this (as it's running MySQL 4). I placed a quick and dirty hack into the do_sql function, and modified the create_top_table function. I have no idea if this works well. 
<strong>IT NEEDS TO BE CHECKED</strong>.<br /> <br /> === Configuration Constants ===<br /> Constants are contained in the new function.<br /> <br /> *DBI Configuration Constants - $driver, $server, $database, $user, $password are all used to create the connection to the database.<br /> *source_table - The table in which the stats are stored.<br /> <br /> === Functions ===<br /> *new() - returns object.<br /> *retreive_set_names() - returns a list of eprint sets. Currently 'group' and 'author' are implemented. This is used to verify cgi input.<br /> *get_membership(eprint_id, set_name) - returns the members of the named set that a given eprint belongs to. For example, we can find out which authors eprint 12614 has by get_membership(12614, 'author').<br /> *get_citation(id, set, length) - returns a citation. Every set member (eprint, author, group) has two citations: short and full. We only return a short citation if length == 'short'. So, to get the short citation of group 3: get_citation(3,'group','short').<br /> *get_code($id,$set) - UNWRITTEN - Set members have codes. This is how they are identified by the user. For example, author_lac is the member of the author set whose code is lac. To get the code for group 3: get_code(3,'group').<br /> <br /> When retrieving statistics, EPStats filters by inner joining the epstats_true_accesses_table to other tables containing eprint IDs. Sometimes it has to create these tables.<br /> <br /> *create_top_table(param_object) - This creates a table containing the eprint IDs of the top X by fulltext download between two dates.<br /> *create_list_table(table_name, eprint_ids) - Takes two strings, one the name of the table, the other a space-separated list of eprint IDs. Creates a temporary table.<br /> <br /> The following are the only two functions that actually make calls to the database. 
<br /> <br /> *do_sql(sql_query_string) - takes a string and performs a query, returning the dbi object containing the results.<br /> *insert_values(table_name, values) - inserts a row of data into a table.<br /> <br /> And finally, the meat and potatoes: the functions that return the statistics we're interested in.<br /> <br /> *get_stats(params_object, column_name_list, options_hash) - returns a dbi object containing the stats we are interested in, i.e. those matching the params_object's date range and eprint sets, and only the columns in column_name_list. The options hash can contain the following key/value pairs:<br /> **order => column_name - the column on which to order it. Append with '-' or ' DESC' to order it descending.<br /> **limit => int - How many results to return<br /> **group_by => column_name - if we need to group by a column.<br /> get_stats works by examining the 'eprints' parameter and calling one of the following functions:<br /> **get_list_stats<br /> **get_top_stats<br /> **get_set_stats<br /> **get_all_stats<br /> These functions generate slightly different MySQL queries, and pass them to the do_sql function.<br /> <br /> == Date ==<br /> I implemented a date object because there were some specific things I needed to do with dates.<br /> <br /> ===Functions===<br /> *new(date_hash) - Creates a new date object when passed a hash with the keys 'day', 'month' and 'year'.<br /> *validate() - If the date is not valid, it will be modified to a sensible value. E.g. if it's Feb 30th, it will be modified to Feb 29th or 28th, depending on whether it's a leap year.<br /> *set(part_name, int) - Sets part of the date ('year','month' or 'day') to a specific value.<br /> *decrement(period) - decrements the date by a period ('day', 'week', 'month', 'quarter', 'year'). Calls the mod_date function, which does the muscle work.<br /> *increment(period) - increments by calling mod_date.<br /> *part(part_name, style) - Returns the day, month or year. 
For month, if style=='text', returns a three-letter string, otherwise returns an integer. For year, if style=='short', returns the last two digits, otherwise returns all four.<br /> *less_than(date_object) - compares itself to another date object. Returns 1 if it's less than it, otherwise returns 0.<br /> *greater_than(date_object) - compares itself to another date object. Returns 1 if it's greater than it, otherwise returns 0.<br /> *month_name() - returns the three-letter string of the month.<br /> *render(format_string) - returns a date string. Format can be:<br /> **'short' - Calls render_abbreviated - returns a date like this: 05-Jul-77<br /> **'long' - Calls render_full (not implemented).<br /> **'numerical' (default) - Calls render_numerical - returns a date like this: 19770705<br /> *clone - returns a new, identical date object.<br /> <br /> <br /> == Cache ==<br /> The interface to the cache.<br /> <br /> === Configuration Constants ===<br /> *$cache_directory - a string containing a path to the directory in which the cache files are located.<br /> <br /> === Functions ===<br /> *new(id) - takes the ID of the params object we're using at the moment.<br /> *exists() - returns true if there's a cached file, false if there isn't one.<br /> *write(visualisation_object) - writes the data to the cache file.<br /> *read() - returns the data from the cache.<br /> <br /> == Periods ==<br /> The Periods object is used when you want to break a date range down into sub-ranges. Used with the params->mask() function, stats can be retrieved for periods inside a date range.<br /> ===Functions===<br /> *new(start_date_obj, end_date_obj) - doesn't do anything, just returns the object.<br /> <br /> The following functions all return an array of hashes. 
Each hash has the keys 'start_date' and 'end_date', and the values are both EPStats::Date objects.<br /> <br /> *calandar_months - Returns full months (each element starts on the 1st, and ends on the last day).<br /> *months - Returns month periods (if the start_date is the 15th, then each period starts on the 15th and ends on the 14th of the next month - except the last period, which is cut off by the end of the range and so has only about a 1/30 chance of doing so).<br /> *weeks - returns 7-day periods (except the last, which is cut off by the end of the range and so has only a 1/7 chance of being 7 days long).<br /> *days - returns single days (for each period, the start_date and end_date are the same).<br /> <br /> == UserInterface::Controls ==<br /> This is used to generate the drop-down boxes in the stats cgi script. If I had more time I'd document it fully, but my daughter's going to be born in less than 12 hours.<br /> <br /> == Page (deprecated) ==<br /> Harkens back to the day when a page object contained views.<br /> <br /> == View ==<br /> A view processes the stats data filtered by the parameters and creates a visualisation.<br /> <br /> === Functions ===<br /> All views inherit:<br /> *new(params_obj, database_interface_object) - returns the object.<br /> *render - calls populate, then returns whatever the visualisation renders.<br /> All views must implement:<br /> *new - passes arguments to superclass, then calls 'initialise'.<br /> *initialise - the Configuration Constants are set here.<br /> *populate - the engine that powers EPStats: it retrieves the stats and fills in the visualisation.<br /> <br /> === View::FullTextCountHTML ===<br /> The FullTextCountHTML is an extremely simple view. It retrieves one row from the database and does no processing.<br /> <br /> ==== Housekeeping ====<br /> At the top of the file, we need:<br /> package EPStats::View::FullTextCountHTML;<br /> use strict;<br /> use warnings;<br /> Now, which modules will we use? 
I've included perlchartdir, the graph-making package, even though we're not using it.<br /> use EPStats::DatabaseInterface;<br /> use EPStats::Cache;<br /> use EPStats::Visualisation::HTML;<br /> use EPStats::View;<br /> use perlchartdir;<br /> And link to the superclass. <br /> our @ISA = qw/ EPStats::View /;<br /> <br /> ==== Configuration Constants ====<br /> We are interested in retrieving the fulltxt column, and a count as we will be aggregating. The sql_params are set so that we can filter on fulltext downloads, and we need to group because we are counting. We also create our visualisation here.<br /> sub initialise<br /> {<br /> my ($self) = @_;<br /> $self->{'sql_columns'} = [ 'fulltxt', 'COUNT(fulltxt)' ];<br /> $self->{'sql_params'} = {where => "fulltxt = 'F'", group_by => 'fulltxt'};<br /> $self->{'visualisation'} = EPStats::Visualisation::HTML->new();<br /> }<br /> <br /> ==== new ====<br /> The new function shouldn't ever need to be any different from this:<br /> sub new<br /> {<br /> my( $class, $params, $database ) = @_;<br /> my $self = $class->SUPER::new($params, $database);<br /> $self->initialise();<br /> return $self;<br /> }<br /> <br /> ==== populate ====<br /> Every populate function should start by checking the cache.<br /> <br /> sub populate<br /> {<br /> my ($self) = @_;<br /> my $cache = EPStats::Cache->new($self->{'params'}->get('id'));<br /> if ($cache->exists)<br /> {<br /> $self->{'visualisation'} = $cache->read();<br /> return;<br /> }<br /> Next, we have to retrieve the data from the database:<br /> <br /> my $query = $self->{'database'}->get_stats(<br /> $self->{'params'},<br /> $self->{'sql_columns'},<br /> $self->{'sql_params'}<br /> );<br /> Now we process the results. In this case, we don't even need a loop as we know there's only going to be one row. We'll stick the result straight into some HTML, and save it. 
Don't forget that if there isn't any data, you still have to output something.<br /> my @row = $query->fetchrow_array();<br /> my $html = '<span class="epstats_view_fulltextcounthtml">' . ($row[1] ? $row[1] : '0') . "</span>";<br /> A little housekeeping:<br /> $query->finish();<br /> Pop the data into the visualisation:<br /> $self->{'visualisation'}->set('html',$html);<br /> Finally, we should write to the cache so we don't have to query the database next time.<br /> $cache->write($self->{'visualisation'});<br /> }<br /> <br /> And that's a really simple view.<br /> <br /> === Using Periods ===<br /> <br /> If we wanted to break our date range into periods, we'd need to do something like this:<br /> <br /> my $periods = EPStats::Periods->new($self->{'params'}->{'start_date'},$self->{'params'}->{'end_date'});<br /> foreach my $period ( @{$periods->calandar_months()} )<br /> {<br /> $self->{'params'}->mask($period);<br /> my $query = $self->{'database'}->get_stats(<br /> $self->{'params'},<br /> $self->{'sql_columns'},<br /> $self->{'sql_params'}<br /> );<br /> $self->{'params'}->unmask();<br /> # process the results and put them into variables<br /> }<br /> <br /> == Visualisation ==<br /> Currently Visualisations are Graph, Table or HTML. These are what the user will look at in the browser or download (in the case of CSV).<br /> <br /> === Functions ===<br /> All visualisations inherit:<br /> *new(data_hash) - a hash can optionally be passed containing the values that would otherwise be set using the 'set' function.<br /> *set(param_name, value) - sets the named parameter to the given value; see the subclasses for the parameters each one accepts.<br /> <br /> All visualisations must implement:<br /> *render() - returns what will be passed to the script.<br /> <br /> <br /> == Visualisation::HTML ==<br /> The simplest visualisation. 
Just a chunk of html.<br /> <br /> To Populate:<br /> *set('html', html_string) - takes the html as a string.<br /> <br /> == Visualisation::Table ==<br /> The Visualisation::Table currently just passes the buck to its superclass.<br /> <br /> There are currently three table Visualisations:<br /> <br /> === Visualisation::Table::CSV ===<br /> Returns a CSV table.<br /> <br /> To Populate:<br /> *set('headings', headings_arrayref) - pass an array containing headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> <br /> === Visualisation::Table::HTML ===<br /> A basic HTML table.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> And then optionally<br /> *set('totals', totals_arrayref) - an array of totals to put at the bottom of the table.<br /> <br /> === Visualisation::Table::HTML_Columned ===<br /> An HTML table that is rendered in several columns.<br /> ==== Configuration Constants ====<br /> $default_number_of_rows - an int representing the maximum number of rows the table should have.<br /> <br /> ==== Overridden Functions ====<br /> *new(data_hash, number_of_rows) - Both data_hash and number_of_rows are optional. Both can be set with 'set'.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> *set('number_of_rows', int) - set the maximum number of rows the table should have.<br /> <br /> == Visualisation::Graph ==<br /> The graph objects all use Chart Director to generate graphs. 
The Graph object initialises the colours that the graph may use.<br /> <br /> Every graph must be created with at least the filename:<br /> *new({filename => string}) - the filename comes from the ID of the param object.<br /> <br /> === Configuration Constants ===<br /> These are set in the 'new' function.<br /> *$graph_dir - the path to the directory where the image file will be saved.<br /> *$url_relative - the filename is added to the end of this, and the result is put in the img HTML tag.<br /> <br /> === Sub Classes ===<br /> Note that in the Visualisation/Graph/ directory, there is 'GraphLegend.pm'. This is used to create the HTML for the graph legends.<br /> <br /> ====Visualisation::Graph::Bar.pm====<br /> A Bar Graph. It can have one or more bars in each division of the x axis.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('x_title',string) - The title of the x axis.<br /> *set('y_title',string) - The title of the y axis.<br /> *set('x_labels',array_ref) - an array containing the labels for the x axis<br /> *set('data_series', array_ref) - an array of arrayrefs, referencing data for each set of bars<br /> <br /> ====Visualisation::Graph::Line.pm====<br /> A Line Graph. There can be many lines on it.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('x_title',string) - The title of the x axis.<br /> *set('y_title',string) - The title of the y axis.<br /> *set('x_labels',array_ref) - an array containing the labels for the x axis<br /> *set('data_series', array_ref) - an array of arrayrefs, referencing data for each line<br /> <br /> ====Visualisation::Graph::Pie.pm====<br /> A Pie Graph<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('data_series', array_ref) - an array of hashrefs, {data => int, citation => string}, one for each slice</div>
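The Date validate() behaviour described on this page (Feb 30th becomes Feb 29th or 28th depending on the leap year) can be illustrated with a small standalone sketch. The function names below (is_leap_year, days_in_month, clamp_date) are hypothetical, and clamping the month as well as the day is an assumption; the real EPStats::Date code may behave differently.

```perl
use strict;
use warnings;

# Leap years: divisible by 4, except centuries that are not divisible by 400.
sub is_leap_year {
    my ($year) = @_;
    my $leap = ($year % 4 == 0 && $year % 100 != 0) || $year % 400 == 0;
    return $leap ? 1 : 0;
}

sub days_in_month {
    my ($year, $month) = @_;
    my @days = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
    return 29 if $month == 2 && is_leap_year($year);
    return $days[$month - 1];
}

# Hypothetical equivalent of validate(): takes a hash with the keys
# 'day', 'month' and 'year' and returns a corrected copy.
sub clamp_date {
    my ($date) = @_;
    my %d = %$date;
    $d{month} = 1  if $d{month} < 1;
    $d{month} = 12 if $d{month} > 12;
    my $max = days_in_month($d{year}, $d{month});
    $d{day} = 1    if $d{day} < 1;
    $d{day} = $max if $d{day} > $max;
    return \%d;
}

my $fixed = clamp_date({ day => 30, month => 2, year => 2007 });
printf "%04d%02d%02d\n", @{$fixed}{qw(year month day)};   # prints 20070228
```

The output uses the same layout as the 'numerical' render format, so the corrected date can be compared directly with the examples given for render.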

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats_Technical_Documentation&diff=4209 IRStats Technical Documentation 2007-03-29T21:40:47Z

<p>Gobfrey: /* View::FullTextCountHTML */</p> <hr /> <div>This document is intended as guidance to the last stage of development of EPStats.<br /> <br /> = Directory Structure =<br /> <br /> == /opt/epstats ==<br /> Contains data files for GeoIP. If I had had root access, I would have put them in the correct place. They are linked to from the correct place. These need regular updating, something which hasn't been implemented.<br /> <br /> == /opt/epstats/bin ==<br /> Contains the scripts needed to update the table.<br /> <br /> *daily_update.sh - Runs all the scripts in the right order.<br /> *extract_metadata_from_archive.pl - Extracts eprint, author and group metadata from the repository by iterating over every eprint.<br /> *update_table.pl - Filters and processes new entries in the accesslog to update the epstats_true_acesses_table. Uses 'SearchParser.pm' and 'repeatscache'.<br /> *convert_ip_to_host.pl - Attempts to convert IP addresses of the new entries in epstats_true_acesses_table to hostnames. Uses 'host_updated' to keep track of where it got to last time.<br /> <br /> Note that most of these scripts probably need to be tidied up. They were written in a hurry and were never polished.<br /> <br /> == /opt/epstats/cache ==<br /> Contains cache files. Feel free to delete these whenever you like. <br /> <br /> == /opt/epstats/cgi ==<br /> <br /> Contains two scripts, 'get_view' and 'stats'.<br /> <br /> *get_view returns the output of an EPStats::View (see below), which is currently a chunk of HTML or CSV, but could be almost anything.<br /> *stats is a handy CGI form that passes arguments to get_view<br /> <br /> == /opt/epstats/img ==<br /> <br /> Conceptually, where any images would be kept (e.g. national flags). At the moment, only the img/graphs directory is used. 
This is where generated graphs are stored.<br /> <br /> == /opt/epstats/perl_lib ==<br /> <br /> Contains all the EPStats classes.<br /> <br /> = EPStats Classes =<br /> <br /> Note that the leading EPStats:: has been left out for brevity.<br /> <br /> == Params ==<br /> This object holds the parameters that are used to generate the statistics. The most important of these are a date range and an eprint set.<br /> === Configuration Constants ===<br /> *$cgi_script - the name of the cgi script (currently unused)<br /> *$id_params - the parameters that matter when generating an ID.<br /> *$defaults - Any default parameters you wish to set.<br /> <br /> === Functions ===<br /> *new(CGI_object) - returns new object<br /> *mask(params_hash) - used when you want to temporarily overwrite parameter(s). Overwrites values with contents of params_hash. Overwritten values get pushed onto a stack.<br /> *unmask - Sets parameters back to how they were before the last mask.<br /> *generate_cgi - returns a string containing the name of the cgi script, and all parameters, to enable the creation of links. (currently unused)<br /> *get(param_name) - returns the value of a single parameter.<br /> *create_id - Uses MD5 to create a unique ID from the id_params (see Constants above). This is called whenever get('id') is called.<br /> <br /> == DatabaseInterface ==<br /> This object does what it says on the tin. Any access to the database is done through it.<br /> <br /> <strong>IMPORTANT</strong> - the generated SQL was developed on a machine running MySQL 5. Installing on the EPrints server has broken this (as it's running MySQL 4). I placed a quick and dirty hack into the do_sql function, and modified the create_top_table function. I have no idea if this works well. 
<strong>IT NEEDS TO BE CHECKED</strong>.<br /> <br /> === Configuration Constants ===<br /> Constants are contained in the new function.<br /> <br /> *DBI Configuration Constants - $driver, $server, $database, $user, $password are all used to create the connection to the database.<br /> *source_table - The table in which the stats are stored.<br /> <br /> === Functions ===<br /> *new() - returns object.<br /> *retreive_set_names() - returns a list of eprint sets. Currently 'group' and 'author' are implemented. This is used to verify cgi input.<br /> *get_membership(eprint_id, set_name) - returns the members of the named set that a given eprint belongs to. For example, we can find out which authors eprint 12614 has by get_membership(12614, 'author').<br /> *get_citation(id, set, length) - returns a citation. Every set member (eprint, author, group) has two citations: short and full. We only return a short citation if length == 'short'. So, to get the short citation of group 3: get_citation(3,'group','short').<br /> *get_code($id,$set) - UNWRITTEN - Set members have codes. This is how they are identified by the user. For example, author_lac is the member of the author set whose code is lac. To get the code for group 3: get_code(3,'group').<br /> <br /> When retrieving statistics, EPStats filters by inner joining the epstats_true_accesses_table to other tables containing eprint IDs. Sometimes it has to create these tables.<br /> <br /> *create_top_table(param_object) - This creates a table containing the eprint IDs of the top X by fulltext download between two dates.<br /> *create_list_table(table_name, eprint_ids) - Takes two strings, one the name of the table, the other a space-separated list of eprint IDs. Creates a temporary table.<br /> <br /> The following are the only two functions that actually make calls to the database. 
<br /> <br /> *do_sql(sql_query_string) - takes a string and performs a query, returning the dbi object containing the results.<br /> *insert_values(table_name, values) - inserts a row of data into a table.<br /> <br /> And finally, the meat and potatoes: the functions that return the statistics we're interested in.<br /> <br /> *get_stats(params_object, column_name_list, options_hash) - returns a dbi object containing the stats we are interested in, i.e. those matching the params_object's date range and eprint sets, and only the columns in column_name_list. The options hash can contain the following key/value pairs:<br /> **order => column_name - the column on which to order it. Append with '-' or ' DESC' to order it descending.<br /> **limit => int - How many results to return<br /> **group_by => column_name - if we need to group by a column.<br /> get_stats works by examining the 'eprints' parameter and calling one of the following functions:<br /> **get_list_stats<br /> **get_top_stats<br /> **get_set_stats<br /> **get_all_stats<br /> These functions generate slightly different MySQL queries, and pass them to the do_sql function.<br /> <br /> == Date ==<br /> I implemented a date object because there were some specific things I needed to do with dates.<br /> <br /> ===Functions===<br /> *new(date_hash) - Creates a new date object when passed a hash with the keys 'day', 'month' and 'year'.<br /> *validate() - If the date is not valid, it will be modified to a sensible value. E.g. if it's Feb 30th, it will be modified to Feb 29th or 28th, depending on whether it's a leap year.<br /> *set(part_name, int) - Sets part of the date ('year','month' or 'day') to a specific value.<br /> *decrement(period) - decrements the date by a period ('day', 'week', 'month', 'quarter', 'year'). Calls the mod_date function, which does the muscle work.<br /> *increment(period) - increments by calling mod_date.<br /> *part(part_name, style) - Returns the day, month or year. 
For month, if style=='text', returns a three-letter string, otherwise returns an integer. For year, if style=='short', returns the last two digits, otherwise returns all four.<br /> *less_than(date_object) - compares itself to another date object. Returns 1 if it's less than it, otherwise returns 0.<br /> *greater_than(date_object) - compares itself to another date object. Returns 1 if it's greater than it, otherwise returns 0.<br /> *month_name() - returns the three-letter string of the month.<br /> *render(format_string) - returns a date string. Format can be:<br /> **'short' - Calls render_abbreviated - returns a date like this: 05-Jul-77<br /> **'long' - Calls render_full (not implemented).<br /> **'numerical' (default) - Calls render_numerical - returns a date like this: 19770705<br /> *clone - returns a new, identical date object.<br /> <br /> <br /> == Cache ==<br /> The interface to the cache.<br /> <br /> === Configuration Constants ===<br /> *$cache_directory - a string containing a path to the directory in which the cache files are located.<br /> <br /> === Functions ===<br /> *new(id) - takes the ID of the params object we're using at the moment.<br /> *exists() - returns true if there's a cached file, false if there isn't one.<br /> *write(visualisation_object) - writes the data to the cache file.<br /> *read() - returns the data from the cache.<br /> <br /> == Periods ==<br /> The Periods object is used when you want to break a date range down into sub-ranges. Used with the params->mask() function, stats can be retrieved for periods inside a date range.<br /> ===Functions===<br /> *new(start_date_obj, end_date_obj) - doesn't do anything, just returns the object.<br /> <br /> The following functions all return an array of hashes. 
Each hash has the keys 'start_date' and 'end_date', and the values are both EPStats::Date objects.<br /> <br /> *calandar_months - Returns full months (each element starts on the 1st, and ends on the last day of the month).<br /> *months - Returns month periods (if the start_date is the 15th, then each period starts on the 15th and ends on the 14th of the next month - except the last period, which only has about a 1/30 chance of doing so).<br /> *weeks - returns 7-day periods (except the last, which has a 1/7 chance of being 7 days long).<br /> *days - returns single days (for each period, the start_date and end_date are the same).<br /> <br /> == UserInterface::Controls ==<br /> This is used to generate the drop-down boxes in the stats cgi script. If I had more time I'd document it fully, but my daughter's going to be born in less than 12 hours.<br /> <br /> == Page (deprecated) ==<br /> Harkens back to the day when a page object contained views.<br /> <br /> == View ==<br /> A view processes the stats data filtered by the parameters and creates a visualisation.<br /> <br /> === Functions ===<br /> All views inherit:<br /> *new(params_obj, database_interface_object) - returns the object.<br /> *render - calls populate, then returns whatever the visualisation renders.<br /> All views must implement:<br /> *new - passes arguments to superclass, then calls 'initialise'.<br /> *initialise - the Configuration Constants are set here.<br /> *populate - The engine that powers EPStats.<br /> <br /> === View::FullTextCountHTML ===<br /> The FullTextCountHTML is an extremely simple view. It retrieves one row from the database and does no processing.<br /> <br /> ==== Housekeeping ====<br /> At the top of the file, we need:<br /> package EPStats::View::FullTextCountHTML;<br /> use strict;<br /> use warnings;<br /> Next, the modules we will use. 
I've included perlchartdir, the graph-making package, even though we're not using it.<br /> use EPStats::DatabaseInterface;<br /> use EPStats::Cache;<br /> use EPStats::Visualisation::HTML;<br /> use EPStats::View;<br /> use perlchartdir;<br /> And link to the superclass. <br /> our @ISA = qw/ EPStats::View /;<br /> <br /> ==== Configuration Constants ====<br /> We are interested in retrieving the fulltxt column, and a count as we will be aggregating. The sql_params are set so that we can filter on fulltext downloads, and we need to group as we are counting. We also create our visualisation here.<br /> sub initialise<br /> {<br /> my ($self) = @_;<br /> $self->{'sql_columns'} = [ 'fulltxt', 'COUNT(fulltxt)' ];<br /> $self->{'sql_params'} = {where => "fulltxt = 'F'", group_by => 'fulltxt'};<br /> $self->{'visualisation'} = EPStats::Visualisation::HTML->new();<br /> }<br /> <br /> ==== new ====<br /> The new function shouldn't ever need to be any different from this:<br /> sub new<br /> {<br /> my( $class, $params, $database ) = @_;<br /> my $self = $class->SUPER::new($params, $database);<br /> $self->initialise();<br /> return $self;<br /> }<br /> <br /> ==== populate ====<br /> Every populate function should start by checking the cache.<br /> <br /> sub populate<br /> {<br /> my ($self) = @_;<br /> my $cache = EPStats::Cache->new($self->{'params'}->get('id'));<br /> if ($cache->exists)<br /> {<br /> $self->{'visualisation'} = $cache->read();<br /> return;<br /> }<br /> Next, we have to retrieve the stats from the database:<br /> <br /> my $query = $self->{'database'}->get_stats(<br /> $self->{'params'},<br /> $self->{'sql_columns'},<br /> $self->{'sql_params'}<br /> );<br /> Now we process the results. In this case, we don't even need a loop as we know there's only going to be one row. We'll stick the result straight into some html, and save it. 
Don't forget that if there isn't any data, you still have to output something.<br /> my @row = $query->fetchrow_array();<br /> my $html = '<span class="epstats_view_fulltextcounthtml">' . ($row[1] ? $row[1] : '0') . "</span>";<br /> A little housekeeping:<br /> $query->finish();<br /> Pop the data into the visualisation:<br /> $self->{'visualisation'}->set('html',$html);<br /> Finally, we should write to the cache so we don't have to query the database next time.<br /> $cache->write($self->{'visualisation'});<br /> }<br /> And that's a really simple view.<br /> <br /> == Visualisation ==<br /> Currently Visualisations are Graph, Table or HTML. These are what the user will look at in the browser or download (in the case of CSV).<br /> <br /> === Functions ===<br /> All visualisations inherit:<br /> *new(data_hash) - a hash can optionally be passed containing the values that would otherwise be set using the 'set' function.<br /> *set(param_name, value) - sets the named parameter to the given value; see the subclasses for the parameters each one accepts.<br /> <br /> All visualisations must implement:<br /> *render() - returns what will be passed to the script.<br /> <br /> <br /> == Visualisation::HTML ==<br /> The simplest visualisation. 
Just a chunk of html.<br /> <br /> To Populate:<br /> *set('html', html_string) - takes the html as a string.<br /> <br /> == Visualisation::Table ==<br /> The Visualisation::Table currently just passes the buck to its superclass.<br /> <br /> There are currently three table Visualisations:<br /> <br /> === Visualisation::Table::CSV ===<br /> Returns a CSV table.<br /> <br /> To Populate:<br /> *set('headings', headings_arrayref) - pass an array containing headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> <br /> === Visualisation::Table::HTML ===<br /> A basic HTML table.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> And then, optionally:<br /> *set('totals', totals_arrayref) - an array of totals to put at the bottom of the table.<br /> <br /> === Visualisation::Table::HTML_Columned ===<br /> An HTML table that is rendered in several columns.<br /> ==== Configuration Constants ====<br /> *$default_number_of_rows - an int representing the maximum number of rows the table should have.<br /> <br /> ==== Overridden Functions ====<br /> *new(data_hash, number_of_rows) - Both data_hash and number_of_rows are optional. Both can be set with 'set'.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> *set('number_of_rows', int) - set the maximum number of rows the table should have.<br /> <br /> == Visualisation::Graph ==<br /> The graph objects all use ChartDirector to generate graphs. 
The Graph object initialises the colours that the graph may use.<br /> <br /> Every graph must be created with at least the filename:<br /> *new({filename => string}) - the filename comes from the ID of the param object.<br /> <br /> === Configuration Constants ===<br /> These are set in the 'new' function.<br /> *$graph_dir - the path to the directory where the image file will be saved.<br /> *$url_relative - the relative URL to which the filename is appended; the result is put in the img html tag.<br /> <br /> === Sub Classes ===<br /> Note that in the Visualisation/Graph/ directory, there is 'GraphLegend.pm'. This is used to create the html for the graph legends.<br /> <br /> ====Visualisation::Graph::Bar.pm====<br /> A Bar Graph. It can have one or more bars in each division of the x axis.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('x_title',string) - The title of the x axis.<br /> *set('y_title',string) - The title of the y axis.<br /> *set('x_labels',array_ref) - an array containing the labels for the x axis<br /> *set('data_series', array_ref) - an array of arrayrefs, referencing data for each set of bars<br /> <br /> ====Visualisation::Graph::Line.pm====<br /> A Line Graph. There can be many lines on it.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('x_title',string) - The title of the x axis.<br /> *set('y_title',string) - The title of the y axis.<br /> *set('x_labels',array_ref) - an array containing the labels for the x axis<br /> *set('data_series', array_ref) - an array of arrayrefs, referencing data for each line<br /> <br /> ====Visualisation::Graph::Pie.pm====<br /> A Pie Graph.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('data_series', array_ref) - an array of hashrefs, {data => int, citation => string}, one for each slice</div>
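As a rough illustration of the data shapes the graph setters above expect, here is a minimal sketch. Sketch::Visualisation is a hypothetical stand-in that only stores values; it is not the EPStats class, and the titles, labels and numbers are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an EPStats visualisation: it only stores values.
package Sketch::Visualisation;

sub new
{
	my ($class, $data) = @_;
	# The optional hash pre-populates values that 'set' would otherwise supply.
	my $self = { %{ $data || {} } };
	return bless $self, $class;
}

sub set
{
	my ($self, $name, $value) = @_;
	$self->{$name} = $value;
	return $self;
}

package main;

# Bar or line graph: one inner arrayref per series, aligned with x_labels.
my $bar = Sketch::Visualisation->new({ filename => 'example_id' });
$bar->set('title', 'Downloads per month');
$bar->set('x_labels', [ 'Jan', 'Feb', 'Mar' ]);
$bar->set('data_series', [ [ 10, 12, 9 ], [ 4, 7, 5 ] ]);

# Pie graph: one hashref per slice, each with 'data' and 'citation'.
my $pie = Sketch::Visualisation->new({ filename => 'example_pie' });
$pie->set('data_series', [
	{ data => 40, citation => 'Group A' },
	{ data => 60, citation => 'Group B' },
]);

print scalar @{ $bar->{'data_series'} }, " series, ",
	scalar @{ $pie->{'data_series'} }, " slices\n";
```

With the real classes, the same structures would be passed to set() on a Graph::Bar, Graph::Line or Graph::Pie object before render() is called.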

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats_Technical_Documentation&diff=4208 IRStats Technical Documentation 2007-03-29T21:40:19Z

<p>Gobfrey: /* View::FullTextCountHTML */</p> <hr /> <div>This document is intended as guidance for the last stage of development of EPStats.<br /> <br /> = Directory Structure =<br /> <br /> == /opt/epstats ==<br /> Contains data files for GeoIP. If I had had root access, I would have put them in the correct place. They are linked to from the correct place. These need regular updating, something which hasn't been implemented.<br /> <br /> == /opt/epstats/bin ==<br /> Contains the scripts needed to update the table.<br /> <br /> *daily_update.sh - Runs all the scripts in the right order.<br /> *extract_metadata_from_archive.pl - Extracts eprint, author and group metadata from the repository by iterating over every eprint.<br /> *update_table.pl - Filters and processes new entries in the accesslog to update the epstats_true_acesses_table. Uses 'SearchParser.pm' and 'repeatscache'.<br /> *convert_ip_to_host.pl - Attempts to convert the IP addresses of the new entries in epstats_true_acesses_table to hostnames. Uses 'host_updated' to keep track of where it got to last time.<br /> <br /> Note that most of these scripts probably need to be tidied up. They were written in a hurry and were never polished.<br /> <br /> == /opt/epstats/cache ==<br /> Contains cache files. Feel free to delete these whenever you like. <br /> <br /> == /opt/epstats/cgi ==<br /> <br /> Contains two scripts, 'get_view' and 'stats'.<br /> <br /> *get_view returns the output of an EPStats::View (see below), which is currently a chunk of html or csv, but could be almost anything.<br /> *stats is a handy cgi form that passes arguments to get_view<br /> <br /> == /opt/epstats/img ==<br /> <br /> Conceptually, where any images would be kept (e.g. national flags). At the moment, only the img/graphs directory is used. 
This is where generated graphs are stored.<br /> <br /> == /opt/epstats/perl_lib ==<br /> <br /> Contains all the epstats classes.<br /> <br /> = EPStats Classes =<br /> <br /> Note that the leading EPStats:: has been left out for brevity.<br /> <br /> == Params ==<br /> This object holds the parameters that are used to generate the statistics. The most important of these are a date range and an eprint set.<br /> === Configuration Constants ===<br /> *$cgi_script - the name of the cgi script (currently unused)<br /> *$id_params - When generating an ID, which parameters are important.<br /> *$defaults - Any default parameters you wish to set.<br /> <br /> === Functions ===<br /> *new(CGI_object) - returns new object<br /> *mask(params_hash) - used when you want to temporarily overwrite parameter(s). Overwrites values with contents of params_hash. Overwritten values get pushed onto a stack.<br /> *unmask - Sets parameters back to how they were before the last mask.<br /> *generate_cgi - returns a string containing the name of the cgi script, and all parameters, to enable the creation of links. (currently unused)<br /> *get(param_name) - returns the value of a single parameter.<br /> *create_id - Uses MD5 to create a unique ID from the id_params (see Constants above). This is called whenever get('id') is called.<br /> <br /> == DatabaseInterface ==<br /> This object does what it says on the tin. Any access to the database is done through it.<br /> <br /> <strong>IMPORTANT</strong> - the generated MySQL was developed on a machine running MySQL 5. Installing on the EPrints server has broken this (as it's running MySQL 4). I placed a quick and dirty hack into the do_sql function, and modified the create_top_table function. I have no idea if this works well. 
<strong>IT NEEDS TO BE CHECKED</strong>.<br /> <br /> === Configuration Constants ===<br /> Constants are contained in the new function.<br /> <br /> *DBI Configuration Constants - $driver, $server, $database, $user, $password are all used to create the connection to the database.<br /> *source_table - The table in which the stats are stored.<br /> <br /> === Functions ===<br /> *new() - returns object.<br /> *retreive_set_names() - returns a list of eprint sets. Currently 'group' and 'author' are implemented. This is used to verify CGI input.<br /> *get_membership(eprint_id, set_name) - returns the members of a named set that a given eprint belongs to. For example, get_membership(12614, 'author') returns the authors of eprint 12614.<br /> *get_citation(id, set, length) - returns a citation. Every set member (eprint, author, group) has two citations: short and full. A short citation is returned only if length == 'short'. So, to get the short citation of group 3: get_citation(3,'group','short').<br /> *get_code($id,$set) - UNWRITTEN - Set members have codes. This is how they are identified by the user. For example, author_lac is the member of the author set whose code is lac. To get the code for group 3: get_code(3,'group').<br /> <br /> When retrieving statistics, EPStats filters by inner joining the epstats_true_accesses_table to other tables containing eprint IDs. Sometimes it has to create these tables.<br /> <br /> *create_top_table(param_object) - This creates a table containing the eprint IDs of the top X eprints by fulltext download between two dates.<br /> *create_list_table(table_name, eprint_ids) - Takes two strings, one the name of the table, the other a space-separated list of eprint IDs. Creates a temporary table.<br /> <br /> The following are the only two functions that actually make calls to the database. 
<br /> <br /> *do_sql(sql_query_string) - takes a string and performs a query, returning the DBI statement handle containing the results.<br /> *insert_values(table_name, values) - inserts a row of data into a table.<br /> <br /> And finally, the meat and potatoes: the functions that return the statistics we're interested in.<br /> <br /> *get_stats(params_object, column_name_list, options_hash) - returns a DBI statement handle containing the stats we are interested in, i.e. those matching the params_object's date range and eprint sets, and only the columns in column_name_list. The options hash can contain the following key/value pairs:<br /> **order => column_name - the column on which to order the results. Append '-' or ' DESC' to the column name to order descending.<br /> **limit => int - How many results to return.<br /> **group_by => column_name - if we need to group by a column.<br /> get_stats works by examining the 'eprints' parameter and calling one of the following functions:<br /> **get_list_stats<br /> **get_top_stats<br /> **get_set_stats<br /> **get_all_stats<br /> These functions generate slightly different MySQL queries, and pass them to the do_sql function.<br /> <br /> == Date ==<br /> I implemented a date object because there were some specific things I needed to do with dates.<br /> <br /> ===Functions===<br /> *new(date_hash) - Creates a new date object when passed a hash with the keys 'day', 'month' and 'year'.<br /> *validate() - If the date is not valid, it will be modified to a sensible value, e.g. if it's Feb 30th, it will be modified to Feb 29th or 28th, depending on whether it's a leap year.<br /> *set(part_name, int) - Sets part of the date ('year','month' or 'day') to a specific value.<br /> *decrement(period) - decrements the date by a period ('day', 'week', 'month', 'quarter', 'year'). Calls the mod_date function, which does the muscle work.<br /> *increment(period) - increments the date by a period, also by calling mod_date.<br /> *part(part_name, style) - Returns the day, month or year. 
For month, if style=='text', returns a three-letter string, otherwise returns an integer. For year, if style=='short', returns the last two digits, otherwise returns all four.<br /> *less_than(date_object) - compares itself to another date object. Returns 1 if it's less than it, otherwise returns 0.<br /> *greater_than(date_object) - compares itself to another date object. Returns 1 if it's greater than it, otherwise returns 0.<br /> *month_name() - returns the three-letter string of the month.<br /> *render(format_string) - returns a date string. Format can be:<br /> **'short' - Calls render_abbreviated - returns a date like this: 05-Jul-77<br /> **'long' - Calls render_full (not implemented).<br /> **'numerical' (default) - Calls render_numerical - returns a date like this: 19770705<br /> *clone - returns a new, identical date object.<br /> <br /> <br /> == Cache ==<br /> The interface to the cache.<br /> <br /> === Configuration Constants ===<br /> *$cache_directory - a string containing a path to the directory in which the cache files are located.<br /> <br /> === Functions ===<br /> *new(id) - takes the ID of the params object we're using at the moment.<br /> *exists() - returns true if there's a cached file, false if there isn't one.<br /> *write(visualisation_object) - writes the data to the cache file.<br /> *read() - returns the data from the cache.<br /> <br /> == Periods ==<br /> The Periods object is used when you want to break a date range down into sub-ranges. Used with the params->mask() function, stats can be retrieved for periods inside a date range.<br /> ===Functions===<br /> *new(start_date_obj, end_date_obj) - doesn't do anything, just returns the object.<br /> <br /> The following functions all return an array of hashes. 
Each hash has the keys 'start_date' and 'end_date', and the values are both EPStats::Date objects.<br /> <br /> *calandar_months - Returns full months (each element starts on the 1st, and ends on the last day of the month).<br /> *months - Returns month periods (if the start_date is the 15th, then each period starts on the 15th and ends on the 14th of the next month - except the last period, which only has about a 1/30 chance of doing so).<br /> *weeks - returns 7-day periods (except the last, which has a 1/7 chance of being 7 days long).<br /> *days - returns single days (for each period, the start_date and end_date are the same).<br /> <br /> == UserInterface::Controls ==<br /> This is used to generate the drop-down boxes in the stats cgi script. If I had more time I'd document it fully, but my daughter's going to be born in less than 12 hours.<br /> <br /> == Page (deprecated) ==<br /> Harkens back to the day when a page object contained views.<br /> <br /> == View ==<br /> A view processes the stats data filtered by the parameters and creates a visualisation.<br /> <br /> === Functions ===<br /> All views inherit:<br /> *new(params_obj, database_interface_object) - returns the object.<br /> *render - calls populate, then returns whatever the visualisation renders.<br /> All views must implement:<br /> *new - passes arguments to superclass, then calls 'initialise'.<br /> *initialise - the Configuration Constants are set here.<br /> *populate - The engine that powers EPStats.<br /> <br /> === View::FullTextCountHTML ===<br /> The FullTextCountHTML is an extremely simple view. It retrieves one row from the database and does no processing.<br /> <br /> At the top of the file, we need:<br /> package EPStats::View::FullTextCountHTML;<br /> use strict;<br /> use warnings;<br /> Next, the modules we will use. 
I've included perlchartdir, the graph-making package, even though we're not using it.<br /> use EPStats::DatabaseInterface;<br /> use EPStats::Cache;<br /> use EPStats::Visualisation::HTML;<br /> use EPStats::View;<br /> use perlchartdir;<br /> And link to the superclass. <br /> our @ISA = qw/ EPStats::View /;<br /> <br /> ==== Configuration Constants ====<br /> We are interested in retrieving the fulltxt column, and a count as we will be aggregating. The sql_params are set so that we can filter on fulltext downloads, and we need to group as we are counting. We also create our visualisation here.<br /> sub initialise<br /> {<br /> my ($self) = @_;<br /> $self->{'sql_columns'} = [ 'fulltxt', 'COUNT(fulltxt)' ];<br /> $self->{'sql_params'} = {where => "fulltxt = 'F'", group_by => 'fulltxt'};<br /> $self->{'visualisation'} = EPStats::Visualisation::HTML->new();<br /> }<br /> <br /> ==== new ====<br /> The new function shouldn't ever need to be any different from this:<br /> sub new<br /> {<br /> my( $class, $params, $database ) = @_;<br /> my $self = $class->SUPER::new($params, $database);<br /> $self->initialise();<br /> return $self;<br /> }<br /> <br /> ==== populate ====<br /> Every populate function should start by checking the cache.<br /> <br /> sub populate<br /> {<br /> my ($self) = @_;<br /> my $cache = EPStats::Cache->new($self->{'params'}->get('id'));<br /> if ($cache->exists)<br /> {<br /> $self->{'visualisation'} = $cache->read();<br /> return;<br /> }<br /> Next, we have to retrieve the stats from the database:<br /> <br /> my $query = $self->{'database'}->get_stats(<br /> $self->{'params'},<br /> $self->{'sql_columns'},<br /> $self->{'sql_params'}<br /> );<br /> Now we process the results. In this case, we don't even need a loop as we know there's only going to be one row. We'll stick the result straight into some html, and save it. 
Don't forget that if there isn't any data, you still have to output something.<br /> my @row = $query->fetchrow_array();<br /> my $html = '<span class="epstats_view_fulltextcounthtml">' . ($row[1] ? $row[1] : '0') . "</span>";<br /> A little housekeeping:<br /> $query->finish();<br /> Pop the data into the visualisation:<br /> $self->{'visualisation'}->set('html',$html);<br /> Finally, we should write to the cache so we don't have to query the database next time.<br /> $cache->write($self->{'visualisation'});<br /> }<br /> And that's a really simple view.<br /> <br /> == Visualisation ==<br /> Currently Visualisations are Graph, Table or HTML. These are what the user will look at in the browser or download (in the case of CSV).<br /> <br /> === Functions ===<br /> All visualisations inherit:<br /> *new(data_hash) - a hash can optionally be passed containing the values that would otherwise be set using the 'set' function.<br /> *set(param_name, value) - sets the named parameter to the given value; see the subclasses for the parameters each one accepts.<br /> <br /> All visualisations must implement:<br /> *render() - returns what will be passed to the script.<br /> <br /> <br /> == Visualisation::HTML ==<br /> The simplest visualisation. 
Just a chunk of html.<br /> <br /> To Populate:<br /> *set('html', html_string) - takes the html as a string.<br /> <br /> == Visualisation::Table ==<br /> The Visualisation::Table currently just passes the buck to its superclass.<br /> <br /> There are currently three table Visualisations:<br /> <br /> === Visualisation::Table::CSV ===<br /> Returns a CSV table.<br /> <br /> To Populate:<br /> *set('headings', headings_arrayref) - pass an array containing headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> <br /> === Visualisation::Table::HTML ===<br /> A basic HTML table.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> And then, optionally:<br /> *set('totals', totals_arrayref) - an array of totals to put at the bottom of the table.<br /> <br /> === Visualisation::Table::HTML_Columned ===<br /> An HTML table that is rendered in several columns.<br /> ==== Configuration Constants ====<br /> *$default_number_of_rows - an int representing the maximum number of rows the table should have.<br /> <br /> ==== Overridden Functions ====<br /> *new(data_hash, number_of_rows) - Both data_hash and number_of_rows are optional. Both can be set with 'set'.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> *set('number_of_rows', int) - set the maximum number of rows the table should have.<br /> <br /> == Visualisation::Graph ==<br /> The graph objects all use ChartDirector to generate graphs. 
The Graph object initialises the colours that the graph may use.<br /> <br /> Every graph must be created with at least the filename:<br /> *new({filename => string}) - the filename comes from the ID of the param object.<br /> <br /> === Configuration Constants ===<br /> These are set in the 'new' function.<br /> *$graph_dir - the path to the directory where the image file will be saved.<br /> *$url_relative - the relative URL to which the filename is appended; the result is put in the img html tag.<br /> <br /> === Sub Classes ===<br /> Note that in the Visualisation/Graph/ directory, there is 'GraphLegend.pm'. This is used to create the html for the graph legends.<br /> <br /> ====Visualisation::Graph::Bar.pm====<br /> A Bar Graph. It can have one or more bars in each division of the x axis.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('x_title',string) - The title of the x axis.<br /> *set('y_title',string) - The title of the y axis.<br /> *set('x_labels',array_ref) - an array containing the labels for the x axis<br /> *set('data_series', array_ref) - an array of arrayrefs, referencing data for each set of bars<br /> <br /> ====Visualisation::Graph::Line.pm====<br /> A Line Graph. There can be many lines on it.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('x_title',string) - The title of the x axis.<br /> *set('y_title',string) - The title of the y axis.<br /> *set('x_labels',array_ref) - an array containing the labels for the x axis<br /> *set('data_series', array_ref) - an array of arrayrefs, referencing data for each line<br /> <br /> ====Visualisation::Graph::Pie.pm====<br /> A Pie Graph.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('data_series', array_ref) - an array of hashrefs, {data => int, citation => string}, one for each slice</div>
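The Date object's validate() is described earlier in this document as pulling an invalid date back to a sensible value. A minimal sketch of that idea follows, assuming plain hashes with 'day', 'month' and 'year' keys. days_in_month and clamp_date are hypothetical helpers, not EPStats functions, and the month clamping is an added assumption; the source only describes the day being corrected.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of EPStats: number of days in a given month.
sub days_in_month
{
	my ($year, $month) = @_;
	my @days = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
	my $is_leap = ($year % 4 == 0 && $year % 100 != 0) || $year % 400 == 0;
	return ($month == 2 && $is_leap) ? 29 : $days[$month - 1];
}

# Hypothetical helper: returns a corrected copy of the date hash.
sub clamp_date
{
	my ($date) = @_;
	my %fixed = %{$date};
	# Assumption: out-of-range months are clamped too.
	$fixed{'month'} = 1 if $fixed{'month'} < 1;
	$fixed{'month'} = 12 if $fixed{'month'} > 12;
	my $max = days_in_month($fixed{'year'}, $fixed{'month'});
	$fixed{'day'} = 1 if $fixed{'day'} < 1;
	$fixed{'day'} = $max if $fixed{'day'} > $max;
	return \%fixed;
}

# Feb 30th becomes Feb 29th in a leap year and Feb 28th otherwise.
my $leap = clamp_date({ day => 30, month => 2, year => 2008 });
my $plain = clamp_date({ day => 30, month => 2, year => 2007 });
printf "%04d%02d%02d\n", @{$leap}{qw(year month day)};   # prints 20080229
printf "%04d%02d%02d\n", @{$plain}{qw(year month day)};  # prints 20070228
```

The printf format mirrors the 'numerical' render style (19770705) listed for the Date object.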

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats_Technical_Documentation&diff=4207 IRStats Technical Documentation 2007-03-29T21:40:01Z

<p>Gobfrey: /* View */</p> <hr /> <div>This document is intended as guidance for the last stage of development of EPStats.<br /> <br /> = Directory Structure =<br /> <br /> == /opt/epstats ==<br /> Contains data files for GeoIP. If I had had root access, I would have put them in the correct place. They are linked to from the correct place. These need regular updating, something which hasn't been implemented.<br /> <br /> == /opt/epstats/bin ==<br /> Contains the scripts needed to update the table.<br /> <br /> *daily_update.sh - Runs all the scripts in the right order.<br /> *extract_metadata_from_archive.pl - Extracts eprint, author and group metadata from the repository by iterating over every eprint.<br /> *update_table.pl - Filters and processes new entries in the accesslog to update the epstats_true_acesses_table. Uses 'SearchParser.pm' and 'repeatscache'.<br /> *convert_ip_to_host.pl - Attempts to convert the IP addresses of the new entries in epstats_true_acesses_table to hostnames. Uses 'host_updated' to keep track of where it got to last time.<br /> <br /> Note that most of these scripts probably need to be tidied up. They were written in a hurry and were never polished.<br /> <br /> == /opt/epstats/cache ==<br /> Contains cache files. Feel free to delete these whenever you like. <br /> <br /> == /opt/epstats/cgi ==<br /> <br /> Contains two scripts, 'get_view' and 'stats'.<br /> <br /> *get_view returns the output of an EPStats::View (see below), which is currently a chunk of html or csv, but could be almost anything.<br /> *stats is a handy cgi form that passes arguments to get_view<br /> <br /> == /opt/epstats/img ==<br /> <br /> Conceptually, where any images would be kept (e.g. national flags). At the moment, only the img/graphs directory is used. 
This is where generated graphs are stored.<br /> <br /> == /opt/epstats/perl_lib ==<br /> <br /> Contains all the epstats classes.<br /> <br /> = EPStats Classes =<br /> <br /> Note that the leading EPStats:: has been left out for brevity.<br /> <br /> == Params ==<br /> This object holds the parameters that are used to generate the statistics. The most important of these are a date range and an eprint set.<br /> === Configuration Constants ===<br /> *$cgi_script - the name of the cgi script (currently unused)<br /> *$id_params - When generating an ID, which parameters are important.<br /> *$defaults - Any default parameters you wish to set.<br /> <br /> === Functions ===<br /> *new(CGI_object) - returns new object<br /> *mask(params_hash) - used when you want to temporarily overwrite parameter(s). Overwrites values with contents of params_hash. Overwritten values get pushed onto a stack.<br /> *unmask - Sets parameters back to how they were before the last mask.<br /> *generate_cgi - returns a string containing the name of the cgi script, and all parameters, to enable the creation of links. (currently unused)<br /> *get(param_name) - returns the value of a single parameter.<br /> *create_id - Uses MD5 to create a unique ID from the id_params (see Constants above). This is called whenever get('id') is called.<br /> <br /> == DatabaseInterface ==<br /> This object does what it says on the tin. Any access to the database is done through it.<br /> <br /> <strong>IMPORTANT</strong> - the generated MySQL was developed on a machine running MySQL 5. Installing on the EPrints server has broken this (as it's running MySQL 4). I placed a quick and dirty hack into the do_sql function, and modified the create_top_table function. I have no idea if this works well. 
<strong>IT NEEDS TO BE CHECKED</strong>.<br /> <br /> === Configuration Constants ===<br /> Constants are contained in the new function.<br /> <br /> *DBI Configuration Constants - $driver, $server, $database, $user, $password are all used to create the connection to the database.<br /> *source_table - The table in which the stats are stored.<br /> <br /> === Functions ===<br /> *new() - returns object.<br /> *retreive_set_names() - returns a list of eprint sets. Currently 'group' and 'author' are implemented. This is used to verify CGI input.<br /> *get_membership(eprint_id, set_name) - For a given eprint ID, returns the members of the named set that it belongs to. For example, we can find out which authors eprint 12614 has by get_membership(12614, 'author').<br /> *get_citation(id, set, length) - returns a citation. Every set member (eprint, author, group) has two citations, short and full. We only return a short citation if length == 'short'. So, to get the short citation of group 3: get_citation(3,'group','short').<br /> *get_code($id,$set) - UNWRITTEN - Set members have codes. This is how they are identified by the user. For example author_lac is the member of the author set whose code is lac. To get the code for group 3: get_code(3,'group').<br /> <br /> When retrieving statistics, EPStats filters by inner joining the epstats_true_accesses_table to other tables containing eprint IDs. Sometimes it has to create these tables.<br /> <br /> *create_top_table(param_object) - This creates a table containing the eprint IDs of the top X by fulltext download between two dates.<br /> *create_list_table(table_name, eprint_ids) - Takes two strings, one the name of the table, the other a space-separated list of eprint IDs. Creates a temporary table.<br /> <br /> The following are the only two functions that actually make calls to the database. 
<br /> <br /> *do_sql(sql_query_string) - takes a string and performs a query, returning the DBI object containing the results.<br /> *insert_values(table_name, values) - inserts a row of data into a table.<br /> <br /> And finally, the meat and potatoes. The functions that return the statistics we're interested in.<br /> <br /> *get_stats(params_object, column_name_list, options_hash) - returns a DBI object containing the stats we are interested in, i.e. those matching the params_object's date range and eprint sets, and only the columns in column_name_list. The options hash can contain the following key/value pairs:<br /> **order => column_name - the column on which to order it. Append '-' or ' DESC' to order it descending.<br /> **limit => int - How many results to return<br /> **group_by => column_name - if we need to group by a column.<br /> get_stats works by examining the 'eprints' parameter and calling one of the following functions:<br /> **get_list_stats<br /> **get_top_stats<br /> **get_set_stats<br /> **get_all_stats<br /> These functions generate slightly different MySQL queries, and pass them to the do_sql function.<br /> <br /> == Date ==<br /> I implemented a date object because there were some specific things I needed to do with dates.<br /> <br /> ===Functions===<br /> *new(date_hash) - Creates a new date object when passed a hash with the keys 'day', 'month' and 'year'.<br /> *validate() - If the date is not valid, it will be modified to a sensible value. E.g. if it's Feb 30th, it will be modified to Feb 29th or 28th, depending on whether it's a leap year.<br /> *set(part_name, int) - Sets part of the date ('year','month' or 'day') to a specific value.<br /> *decrement(period) - decrements the date by a period ('day', 'week', 'month', 'quarter', 'year'). Calls the mod_date function, which does the muscle work.<br /> *increment(period) - increments the date by a period, again by calling mod_date.<br /> *part(part_name, style) - Returns the day, month or year. 
For month, if style=='text', returns a three-letter string, otherwise returns an integer. For year, if style=='short', returns the last two digits, otherwise returns all four.<br /> *less_than(date_object) - compares itself to another date object. Returns 1 if it's less than the other date, otherwise returns 0.<br /> *greater_than(date_object) - compares itself to another date object. Returns 1 if it's greater than the other date, otherwise returns 0.<br /> *month_name() - returns the three-letter string of the month.<br /> *render(format_string) - returns a date string. Format can be:<br /> **'short' - Calls render_abbreviated - returns a date like this: 05-Jul-77<br /> **'long' - Calls render_full (not implemented).<br /> **'numerical' (default) - Calls render_numerical - returns a date like this: 19770705<br /> *clone - returns a new, identical date object.<br /> <br /> <br /> == Cache ==<br /> The interface to the cache.<br /> <br /> === Configuration Constants ===<br /> *$cache_directory - a string containing a path to the directory in which the cache files are located.<br /> <br /> === Functions ===<br /> *new(id) - takes the ID of the params object we're using at the moment.<br /> *exists() - returns true if there's a cached file, false if there isn't one.<br /> *write(visualisation_object) - writes the data to the cache file.<br /> *read() - returns the data from the cache.<br /> <br /> == Periods ==<br /> The Periods object is used when you want to break a date range down into sub-ranges. Used with the params->mask() function, stats can be retrieved for periods inside a date range.<br /> ===Functions===<br /> *new(start_date_obj, end_date_obj) - doesn't do anything, just returns the object.<br /> <br /> The following functions all return an array of hashes. 
Each hash has the keys 'start_date' and 'end_date', and the values are both EPStats::Date objects.<br /> <br /> *calandar_months - Returns full months (each element starts on the 1st, and ends on the last day).<br /> *months - Returns month periods (if the start_date is the 15th, then each period starts on the 15th and ends on the 14th of the next month - except the last period, which only has about a 1/30 chance of doing so).<br /> *weeks - returns 7-day periods (except the last, which has a 1/7 chance of being 7 days long).<br /> *days - returns single days (for each period, the start_date and end_date are the same).<br /> <br /> == UserInterface::Controls ==<br /> This is used to generate the drop-down boxes in the stats CGI script. If I had more time I'd document it fully, but my daughter's going to be born in less than 12 hours.<br /> <br /> == Page (deprecated) ==<br /> Harkens back to the day when a page object contained views.<br /> <br /> == View ==<br /> A view processes the stats data filtered by the parameters and creates a visualisation.<br /> <br /> === Functions ===<br /> All views inherit:<br /> *new(params_obj, database_interface_object) - returns the object.<br /> *render - calls populate, then returns whatever the visualisation renders<br /> All views must implement:<br /> *new - passes arguments to superclass, then calls 'initialise'.<br /> *initialise - the Configuration Constants are set here.<br /> *populate - The engine that powers EPStats.<br /> <br /> === View::FullTextCountHTML ===<br /> The FullTextCountHTML is an extremely simple view. It retrieves one row from the database and does no processing.<br /> <br /> ==== Housekeeping ====<br /> At the top of the file, we need:<br /> package EPStats::View::FullTextCountHTML;<br /> use strict;<br /> use warnings;<br /> Next, the modules we will use. 
I've included perlchartdir, the graph making package, even though we're not using it.<br /> use EPStats::DatabaseInterface;<br /> use EPStats::Cache;<br /> use EPStats::Visualisation::HTML;<br /> use EPStats::View;<br /> use perlchartdir;<br /> And link to superclass. <br /> our @ISA = qw/ EPStats::View /;<br /> <br /> ==== Configuration Constants ====<br /> We are interested in retrieving the fulltxt column, and a count as we will be aggregating. The sql_params are set so that we can filter on fulltext downloads, and we need to group as we are counting. We also create our visualisation here.<br /> sub initialise<br /> {<br /> my ($self) = @_;<br /> $self->{'sql_columns'} = [ 'fulltxt', 'COUNT(fulltxt)' ];<br /> $self->{'sql_params'} = {where => "fulltxt = 'F'", group_by => 'fulltxt'};<br /> $self->{'visualisation'} = EPStats::Visualisation::HTML->new();<br /> }<br /> <br /> ==== new ====<br /> The new function shouldn't ever need to be any different from this:<br /> sub new<br /> {<br /> my( $class, $params, $database ) = @_;<br /> my $self = $class->SUPER::new($params, $database);<br /> $self->initialise();<br /> return $self;<br /> }<br /> <br /> ==== populate ====<br /> Every populate function should start by checking the cache.<br /> <br /> sub populate<br /> {<br /> my ($self) = @_;<br /> my $cache = EPStats::Cache->new($self->{'params'}->get('id'));<br /> if ($cache->exists)<br /> {<br /> $self->{'visualisation'} = $cache->read();<br /> return;<br /> }<br /> Next, we have to retrieve the stats from the database:<br /> <br /> my $query = $self->{'database'}->get_stats(<br /> $self->{'params'},<br /> $self->{'sql_columns'},<br /> $self->{'sql_params'}<br /> );<br /> Now we process the results. In this case, we don't even need a loop as we know there's only going to be one row. We'll stick the result straight into some HTML, and save it. 
Don't forget that if there isn't any data, you still have to output something.<br /> my @row = $query->fetchrow_array();<br /> my $html = '<span class="epstats_view_fulltextcounthtml">' . ($row[1] ? $row[1] : '0') . "</span>";<br /> A little housekeeping:<br /> $query->finish();<br /> Pop the data into the visualisation:<br /> $self->{'visualisation'}->set('html',$html);<br /> Finally, we should write to the cache so we don't have to query the database next time.<br /> $cache->write($self->{'visualisation'});<br /> }<br /> And that's a really simple view.<br /> <br /> == Visualisation ==<br /> Currently Visualisations are Graph, Table or HTML. These are what the user will look at in the browser or download (in the case of CSV).<br /> <br /> === Functions ===<br /> All visualisations inherit:<br /> *new(data_hash) - a hash can optionally be passed containing the values that would otherwise be set using the 'set' function.<br /> *set(param_name, value) - sets the named value; the names each visualisation accepts are listed under its subclass.<br /> <br /> All visualisations must implement:<br /> *render() - returns what will be passed to the script.<br /> <br /> <br /> == Visualisation::HTML ==<br /> The simplest visualisation. 
Just a chunk of html.<br /> <br /> To Populate:<br /> *set('html', html_string) - takes the html as a string.<br /> <br /> == Visualisation::Table ==<br /> The Visualisation::Table currently just passes the buck to its superclass.<br /> <br /> There are currently three table Visualisations:<br /> <br /> === Visualisation::Table::CSV ===<br /> Returns a CSV table.<br /> <br /> To Populate:<br /> *set('headings', headings_arrayref) - pass an array containing headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> <br /> === Visualisation::Table::HTML ===<br /> A basic HTML table.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> And then optionally<br /> *set('totals', totals_arrayref) - an array of totals to put at the bottom of the table.<br /> <br /> === Visualisation::Table::HTML_Columned ===<br /> An HTML table that is rendered in several columns.<br /> ==== Configuration Constants ====<br /> $default_number_of_rows - an int representing the maximum number of rows the table should have.<br /> <br /> ==== Overridden Functions ====<br /> *new(data_hash, number_of_rows) - Both data_hash and number_of_rows are optional. Both can be set with 'set'.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> *set('number_of_rows', int) - set the maximum number of rows the table should have.<br /> <br /> == Visualisation::Graph ==<br /> The graph objects all use Chart Director to generate graphs. 
The Graph object initialises the colours that the graph may be using.<br /> <br /> Every graph must be created with at least the filename:<br /> *new({filename => string}) - the filename comes from the ID of the param object.<br /> <br /> === Configuration Constants ===<br /> These are set in the 'new' function.<br /> *$graph_dir - the path to the directory where the image file will be saved.<br /> *$url_relative - this will have the filename added to the end and put in the img HTML tag.<br /> <br /> === Sub Classes ===<br /> Note that in the Visualisation/Graph/ directory, there is 'GraphLegend.pm'. This is used to create the HTML for the graph legends.<br /> <br /> ====Visualisation::Graph::Bar.pm====<br /> A Bar Graph. It can have one or more bars in each division of the x axis.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('x_title',string) - The title of the x axis.<br /> *set('y_title',string) - The title of the y axis.<br /> *set('x_labels',array_ref) - an array containing the labels for the x axis<br /> *set('data_series', array_ref) - an array of arrayrefs, referencing data for each set of bars<br /> <br /> ====Visualisation::Graph::Line.pm====<br /> A Line Graph. There can be many lines on it.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('x_title',string) - The title of the x axis.<br /> *set('y_title',string) - The title of the y axis.<br /> *set('x_labels',array_ref) - an array containing the labels for the x axis<br /> *set('data_series', array_ref) - an array of arrayrefs, referencing data for each line<br /> <br /> ====Visualisation::Graph::Pie.pm====<br /> A Pie Graph<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('data_series', array_ref) - an array of hashrefs, {data => int, citation => string}, one for each slice</div>

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats_Technical_Documentation&diff=4206 IRStats Technical Documentation 2007-03-29T21:34:17Z

<p>Gobfrey: /* View */</p> <hr /> <div>This document is intended as guidance to the last stage of development of EPstats.<br /> <br /> = Directory Structure =<br /> <br /> == /opt/epstats ==<br /> Contains data files for GeoIP. If I had had root access, I would have put them in the correct place. They are linked to from the correct place. These need regular updating, something which hasn't been implemented.<br /> <br /> == /opt/epstats/bin ==<br /> Contains the scripts needed to update the table.<br /> <br /> *daily_update.sh - Runs all the scripts in the right order.<br /> *extract_metadata_from_archive.pl - Extracts eprint, author and group metadata from the repository by iterating over every eprint.<br /> *update_table.pl - Filters and processes new entries in the accesslog to update the epstats_true_acesses_table. Uses 'SearchParser.pm' and 'repeatscache'.<br /> * convert_ip_to_host.pl - Attempts to convert ip addresses of the new entries in epstats_true_acesses_table to hostnames. Uses 'host_updated' to keep track of where it got to last time.<br /> <br /> Note that most of these scripts probably need to be tidied up. They were written in a hurry and were never polished.<br /> <br /> == /opt/epstats/cache ==<br /> Contains cache files. Feel free to delete these whenever you like. <br /> <br /> == /opt/epstats/cgi ==<br /> <br /> Contains two scripts, 'get_view' and 'stats'.<br /> <br /> *get_view returns the output of a EPstats::View (see below), which is currently a chunk of html or csv, but could be almost anything.<br /> *stats is a handy cgi form that passes arguements to get_view<br /> <br /> == /opt/epstats/img ==<br /> <br /> Conceptually, where any images would be kept (e.g. national flags). At the moment, only the img/graphs directory is used. 
This is where generated graphs are stored.<br /> <br /> == /opt/epstats/perl_lib ==<br /> <br /> Contains all the epstats classes.<br /> <br /> = EPStats Classes =<br /> <br /> Note that the leading EPStats:: has been left out for brevity.<br /> <br /> == Params ==<br /> This object holds the parameters that are used to generate the statistics. The most imortant of these are a date range and an eprint set.<br /> === Configuration Constants ===<br /> *$cgi_script - the name of the cgi script (currently unused)<br /> *$id_params - When generating an ID, which parameters are important.<br /> *$defaults - Any default parameters you wish to set.<br /> <br /> === Functions ===<br /> *new(CGI_object) - returns new object<br /> *mask(params_hash) - used when you want to temporarily overwrite parameter(s). Overwrites values with contents of params_hash. Overwritten values get pushed onto a stack.<br /> *unmask - Sets parameters back to how they were before the last mask.<br /> *generate_cgi - returns a string containing the name of the cgi script, and all parameters, to enable the creation of links. (currently unused)<br /> *get(param_name) - returns the value of a single parameter.<br /> *create_id - Uses MD5 to create a unique ID from the id_params (see Constants above). This is called whenever get('id') is called.<br /> <br /> == DatabaseInterface ==<br /> This object does what it says on the tin. Any access to the database is done though it.<br /> <br /> <strong>IMPORTANT</strong> - the mysql generated has been developed on a machine running mysql 5. Installing on the EPrints server has broken this (as it's running mysql 4). I placed a quick and dirty hack into the do_sql function, and modified the create_top_table function. I have no idea if this works well. 
<strong>IT NEEDS TO BE CHECKED</strong>.<br /> <br /> === Configuration Constants ===<br /> Constants are contained in the new function.<br /> <br /> *DBI Configuration Constants - $driver, $server, $database, $user, $password are all used to create the connection to the database.<br /> *source_table - The table in which the stats are stored.<br /> <br /> === Functions ===<br /> *new() - returns object.<br /> *retreive_set_names() - returns a list of eprint sets. Currently 'group' and 'author' are implemented. This is used to verify cgi input.<br /> *get_membership(eprint_id, set_name) - For a given eprint ID, which of a named set does it belong to. For example, we can find out which authors eprint 12614 has by get_membership(12614, 'author').<br /> *get_citation(id, set, length) - returns a citation. Every set member (eprint, author, group) has two citations. short and full. We only return a short citation if length == 'short'. So, to get the short citation of a group 3: get_citation(3,'group','short').<br /> *get_code($id,$set) - UNWRITTEN - Set member have codes. This how they are identified by the user. For example author_lac is the member of the author set whose code is lac. To get the code for group 3: get_code(3,'group').<br /> <br /> When retreiving statistics, EPStats filters by inner joining the epstats_true_accesses_table to other tables contining eprint IDs. Sometimes it has to create these tables.<br /> <br /> *create_top_table(param_object) - This creates a table containing the eprint IDs of the top X by fulltext download between two dates.<br /> *create_list_table(table_name, eprint_ids) - Takes two strings, one the name of the table, the other a space seperated list of eprint IDs. Creates a temporary table.<br /> <br /> The following are the only two functions that actually make calls to the database. 
<br /> <br /> *do_sql(sql_query_string) - takes a string and performs a query, returning the DBI object containing the results.<br /> *insert_values(table_name, values) - inserts a row of data into a table.<br /> <br /> And finally, the meat and potatoes. The functions that return the statistics we're interested in.<br /> <br /> *get_stats(params_object, column_name_list, options_hash) - returns a DBI object containing the stats we are interested in, i.e. those matching the params_object's date range and eprint sets, and only the columns in column_name_list. The options hash can contain the following key/value pairs:<br /> **order => column_name - the column on which to order it. Append '-' or ' DESC' to order it descending.<br /> **limit => int - How many results to return<br /> **group_by => column_name - if we need to group by a column.<br /> get_stats works by examining the 'eprints' parameter and calling one of the following functions:<br /> **get_list_stats<br /> **get_top_stats<br /> **get_set_stats<br /> **get_all_stats<br /> These functions generate slightly different MySQL queries, and pass them to the do_sql function.<br /> <br /> == Date ==<br /> I implemented a date object because there were some specific things I needed to do with dates.<br /> <br /> ===Functions===<br /> *new(date_hash) - Creates a new date object when passed a hash with the keys 'day', 'month' and 'year'.<br /> *validate() - If the date is not valid, it will be modified to a sensible value. E.g. if it's Feb 30th, it will be modified to Feb 29th or 28th, depending on whether it's a leap year.<br /> *set(part_name, int) - Sets part of the date ('year','month' or 'day') to a specific value.<br /> *decrement(period) - decrements the date by a period ('day', 'week', 'month', 'quarter', 'year'). Calls the mod_date function, which does the muscle work.<br /> *increment(period) - increments the date by a period, again by calling mod_date.<br /> *part(part_name, style) - Returns the day, month or year. 
For month, if style=='text', returns a three letter string, otherwise returns an integer. For year, if style=='short', returns the last two digits, otherwise returns all four.<br /> *less_than(date_object) - compares itself to another date object. Returns 1 if it's less than it, otherwise returns 0.<br /> *greater_than(date_object) - compares itself to another date object. Returns 1 if it's greater than it, otherwise returns 0.<br /> *month_name() - returns the three letter string of the month.<br /> *render(format_string) - returns a date string. Format can be:<br /> **'short' - Calls render_abbreviated - returns a date like this: 05-Jul-77<br /> **'long' - Calls render_full (not implemented).<br /> **'numerical' (default) - Calls render_numerical - returns a date like this: 19770705<br /> *clone - returns an new, identical date object.<br /> <br /> <br /> == Cache ==<br /> The interface to the cache.<br /> <br /> === Configuration Constants ===<br /> *$cache_directory - a string containing a path to the directory in which the cache files are located.<br /> <br /> === Functions ===<br /> *new(id) - takes the ID of the params object we're using at the moment.<br /> *exists() - returns true if there's a cached file, false if there isn't one.<br /> *write(visualisation_object) - writes the data to the cache file.<br /> *read() - returns the data from the cache.<br /> <br /> == Periods ==<br /> The Periods object is used when you want to break a daterange down into sub-ranges. Used with the params->mask() function, stats can be retrieved for periods inside a date range.<br /> ===Functions===<br /> *new(start_date_obj, end_date_obj) - doesn't do anything, just returns the object.<br /> <br /> The following functions all return an array of hashes. 
Each hash has the keys 'start_date' and 'end_date', and the values are both EPStats::Date objects.<br /> <br /> *calandar_months - Returns full months (each element starts on the 1st, and ends on the last day).<br /> *months - Returns month periods (if the start_date is the 15th, then each period starts on the 15th and ends on the 14th of the next month - except the last period, which only has about a 1/30 chance of doing so).<br /> *weeks - returns 7-day periods (except the last, which has a 1/7 chance of being 7 days long).<br /> *days - returns single days (for each period, the start_date and end_date are the same).<br /> <br /> == UserInterface::Controls ==<br /> This is used to generate the drop boxes in the stats cgi script. If I had more time I'd document it fully, but my daughter's going to be born in less than 12 hours.<br /> <br /> == Page (depricated) ==<br /> Harkens back to the day when a page object contained views.<br /> <br /> == View ==<br /> A view processes the stats data filtered by the parameters and creates a visualisation.<br /> <br /> === Functions ===<br /> All views inherit:<br /> *new(params_obj, database_interface_object) - returns the object.<br /> *render - calls populate, then returns whatever the visualisation renders<br /> All visualisations must implement:<br /> *new - passes arguments to superclass, then calls 'initialise'.<br /> *initialise - the Configuration Constants are set here.<br /> *populate - The engine that powers EPStats.<br /> <br /> === View::FullTextCountHTML ===<br /> The FullTextCountHTML is an extremely simple view. It retrieves one row from the database and does no processing.<br /> <br /> ==== Housekeeping ====<br /> At the top of the file, we need:<br /> package EPStats::View::FullTextCountHTML;<br /> use strict;<br /> use warnings;<br /> Now, which modules will we use. 
I've included perchardir, the graph making package, even though we're not using it.<br /> use EPStats::DatabaseInterface;<br /> use EPStats::Cache;<br /> use EPStats::Visualisation::HTML;<br /> use EPStats::View;<br /> use perlchartdir;<br /> And link to superclass. <br /> our @ISA = qw/ EPStats::View /;<br /> <br /> ==== Configuration Constants ====<br /> We are interested in retreiving the fulltxt column, and a count as we will be aggregating. The sql_params are set, so that we can filter on fulltext downloads, and we need to group as we are counting. We also create our visualisation here.<br /> sub initialise<br /> {<br /> my ($self) = @_;<br /> $self->{'sql_columns'} = [ 'fulltxt', 'COUNT(fulltxt)' ];<br /> $self->{'sql_params'} = {where => "fulltxt = 'F'", group_by => 'fulltxt'};<br /> $self->{'visualisation'} = EPStats::Visualisation::HTML->new();<br /> }<br /> <br /> ==== new ====<br /> The new function shouldn't ever need to be any different from this:<br /> sub new<br /> {<br /> my( $class, $params, $database ) = @_;<br /> my $self = $class->SUPER::new($params, $database);;<br /> $self->initialise();<br /> return $self;<br /> }<br /> <br /> ==== populate ====<br /> Every populate function should start by checking the cache, and finish by writing to the cache.<br /> <br /> sub populate<br /> {<br /> my ($self) = @_;<br /> <br /> my $cache = EPStats::Cache->new($self->{'params'}->get('id'));<br /> if ($cache->exists)<br /> {<br /> $self->{'visualisation'} = $cache->read();<br /> return;<br /> }<br /> .<br /> .<br /> .<br /> $cache->write($self->{'visualisation'});<br /> }<br /> <br /> == Visualisation ==<br /> Currently Visualisations are Graph, Table or HTML. 
These are what the user will look at in the broswer or download (in the case of CSV).<br /> <br /> === Functions ===<br /> All visualisations inherit:<br /> *new(data_hash) - a hash can optionally be passed containing the values that would otherwise be set using the 'set' function.<br /> *set(param_name, value) - sets something to something - see subclasses<br /> <br /> All visualisations must implement:<br /> *render() - returns what will be passed to the script.<br /> <br /> <br /> == Visualisation::HTML ==<br /> The simplest visualisation. Just a chunk of html.<br /> <br /> To Populate:<br /> *set('html', html_string) - takes the html as a string.<br /> <br /> == Visualisation::Table ==<br /> The Visualisation::Table currently just passes the buck to its superclass.<br /> <br /> There are currently three table Visualisations:<br /> <br /> === Visualisation::Table::CSV ===<br /> Returns a CSV table.<br /> <br /> To Populate:<br /> *set('headings', headings_arrayref) - pass an array containing headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> <br /> === Visualisation::Table::HTML ===<br /> A basic HTML table.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> And then optionally<br /> *set('totals', totals_arrayref) - an array of totals to put at the bottom of the table.<br /> <br /> === Visualisation::Table::HTML_Columned ===<br /> An HTML table that is rendered in several columns.<br /> ==== Configuration Constants ====<br /> $default_number_of_rows - an int representing the maximum number of rows the table should have.<br /> <br /> ==== Overridden Functions ====<br /> *new(data_hash, number_of_rows) - Both data_hash and number_of_rows are optional. 
Both can be set with 'set'.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> *set('number_of_rows', int) - set the maximum number of rows the table should have.<br /> <br /> == Visualisation::Graph ==<br /> The graph objects all use Chart Director to generate graphs. The Graph object initialised the colours that the graph may be using.<br /> <br /> Every graph must be created with at least the filename:<br /> *new({filename => string}) - the filename comes from the ID of the param object.<br /> <br /> === Configuration Constants ===<br /> These are set in the 'new' function.<br /> *$graph_dir - the path to the directory where the image file will be saved.<br /> *$url_relative - this will have the filename added to the end and put in the img html tag.<br /> <br /> === Sub Classes ===<br /> Note that in the Visualisation/Graph/ directory, there is 'GraphLegend.pm'. This is used to create the html for the graph legends.<br /> <br /> ====Visualisation::Graph::Bar.pm====<br /> A Bar Graph. It can have one or more bars in each division of the x axis.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('x_title',string) - The title of the x axis.<br /> *set('y_title',string) - The title of the y axis.<br /> *set('x_labels',array_ref) - an array containing the labels for the x axis<br /> *set('data_series, array_ref) - an array of arrayrefs, referencing data for each set of bars<br /> <br /> ====Visualisation::Graph::Line.pm====<br /> A Line Graph. 
There can be many lines on it<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('x_title',string) - The title of the x axis.<br /> *set('y_title',string) - The title of the y axis.<br /> *set('x_labels',array_ref) - an array containing the labels for the x axis<br /> *set('data_series, array_ref) - an array of arrayrefs, referencing data for each line<br /> <br /> ====Visualisation::Graph::Pie.pm====<br /> A Pie Graph<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('data_series, array_ref) - an array of hashrefs, {data => int, citation => string}, one for each slice</div>

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats_Technical_Documentation&diff=4205 IRStats Technical Documentation 2007-03-29T21:12:13Z

<p>Gobfrey: /* Visualisation::Graph */</p> <hr /> <div>This document is intended as guidance to the last stage of development of EPstats.<br /> <br /> = Directory Structure =<br /> <br /> == /opt/epstats ==<br /> Contains data files for GeoIP. If I had had root access, I would have put them in the correct place. They are linked to from the correct place. These need regular updating, something which hasn't been implemented.<br /> <br /> == /opt/epstats/bin ==<br /> Contains the scripts needed to update the table.<br /> <br /> *daily_update.sh - Runs all the scripts in the right order.<br /> *extract_metadata_from_archive.pl - Extracts eprint, author and group metadata from the repository by iterating over every eprint.<br /> *update_table.pl - Filters and processes new entries in the accesslog to update the epstats_true_acesses_table. Uses 'SearchParser.pm' and 'repeatscache'.<br /> * convert_ip_to_host.pl - Attempts to convert ip addresses of the new entries in epstats_true_acesses_table to hostnames. Uses 'host_updated' to keep track of where it got to last time.<br /> <br /> Note that most of these scripts probably need to be tidied up. They were written in a hurry and were never polished.<br /> <br /> == /opt/epstats/cache ==<br /> Contains cache files. Feel free to delete these whenever you like. <br /> <br /> == /opt/epstats/cgi ==<br /> <br /> Contains two scripts, 'get_view' and 'stats'.<br /> <br /> *get_view returns the output of a EPstats::View (see below), which is currently a chunk of html or csv, but could be almost anything.<br /> *stats is a handy cgi form that passes arguements to get_view<br /> <br /> == /opt/epstats/img ==<br /> <br /> Conceptually, where any images would be kept (e.g. national flags). At the moment, only the img/graphs directory is used. 
This is where generated graphs are stored.<br /> <br /> == /opt/epstats/perl_lib ==<br /> <br /> Contains all the epstats classes.<br /> <br /> = EPStats Classes =<br /> <br /> Note that the leading EPStats:: has been left out for brevity.<br /> <br /> == Params ==<br /> This object holds the parameters that are used to generate the statistics. The most important of these are a date range and an eprint set.<br /> === Configuration Constants ===<br /> *$cgi_script - the name of the cgi script (currently unused)<br /> *$id_params - the parameters that matter when generating an ID.<br /> *$defaults - Any default parameters you wish to set.<br /> <br /> === Functions ===<br /> *new(CGI_object) - returns new object<br /> *mask(params_hash) - used when you want to temporarily overwrite parameter(s). Overwrites values with contents of params_hash. Overwritten values get pushed onto a stack.<br /> *unmask - Sets parameters back to how they were before the last mask.<br /> *generate_cgi - returns a string containing the name of the cgi script, and all parameters, to enable the creation of links. (currently unused)<br /> *get(param_name) - returns the value of a single parameter.<br /> *create_id - Uses MD5 to create a unique ID from the id_params (see Constants above). This is called whenever get('id') is called.<br /> <br /> == DatabaseInterface ==<br /> This object does what it says on the tin. Any access to the database is done through it.<br /> <br /> <strong>IMPORTANT</strong> - the generated SQL was developed on a machine running MySQL 5. Installing on the EPrints server has broken this (as it's running MySQL 4). I placed a quick and dirty hack into the do_sql function, and modified the create_top_table function. I have no idea if this works well. 
<strong>IT NEEDS TO BE CHECKED</strong>.<br /> <br /> === Configuration Constants ===<br /> Constants are contained in the 'new' function.<br /> <br /> *DBI Configuration Constants - $driver, $server, $database, $user, $password are all used to create the connection to the database.<br /> *source_table - The table in which the stats are stored.<br /> <br /> === Functions ===<br /> *new() - returns object.<br /> *retreive_set_names() - returns a list of eprint sets. Currently 'group' and 'author' are implemented. This is used to verify cgi input.<br /> *get_membership(eprint_id, set_name) - For a given eprint ID, returns the members of the named set that it belongs to. For example, we can find out which authors eprint 12614 has by get_membership(12614, 'author').<br /> *get_citation(id, set, length) - returns a citation. Every set member (eprint, author, group) has two citations: short and full. We only return a short citation if length == 'short'. So, to get the short citation of group 3: get_citation(3,'group','short').<br /> *get_code($id,$set) - UNWRITTEN - Set members have codes. This is how they are identified by the user. For example author_lac is the member of the author set whose code is lac. To get the code for group 3: get_code(3,'group').<br /> <br /> When retrieving statistics, EPStats filters by inner joining the epstats_true_accesses_table to other tables containing eprint IDs. Sometimes it has to create these tables.<br /> <br /> *create_top_table(param_object) - This creates a table containing the eprint IDs of the top X by fulltext download between two dates.<br /> *create_list_table(table_name, eprint_ids) - Takes two strings, one the name of the table, the other a space separated list of eprint IDs. Creates a temporary table.<br /> <br /> The following are the only two functions that actually make calls to the database. 
<br /> <br /> *do_sql(sql_query_string) - takes a string and performs a query, returning the dbi object containing the results.<br /> *insert_values(table_name, values) - inserts a row of data into a table.<br /> <br /> And finally, the meat and potatoes. The functions that return the statistics we're interested in.<br /> <br /> *get_stats(params_object, column_name_list, options_hash) - returns a dbi object containing the stats we are interested in, i.e. those matching the params_object's date range and eprint sets, with only the columns in column_name_list. The options hash can contain the following key/value pairs:<br /> **order => column_name - the column on which to order it. Append '-' or ' DESC' to order it descending.<br /> **limit => int - How many results to return<br /> **group_by => column_name - if we need to group by a column.<br /> get_stats works by examining the 'eprints' parameter and calling one of the following functions:<br /> **get_list_stats<br /> **get_top_stats<br /> **get_set_stats<br /> **get_all_stats<br /> These functions generate slightly different mysql queries, and pass them to the do_sql function.<br /> <br /> == Date ==<br /> I implemented a date object because there were some specific things I needed to do with dates.<br /> <br /> ===Functions===<br /> *new(date_hash) - Creates a new date object when passed a hash with the keys 'day', 'month' and 'year'.<br /> *validate() - If the date is not valid, it will be modified to a sensible value, e.g. if it's Feb 30th, it will be modified to Feb 29th or 28th, depending on whether it's a leap year.<br /> *set(part_name, int) - Sets part of the date ('year','month' or 'day') to a specific value.<br /> *decrement(period) - decrements the date by a period ('day', 'week', 'month', 'quarter', 'year'). Calls the mod_date function, which does the muscle work.<br /> *increment(period) - increments the date by a period, also by calling mod_date.<br /> *part(part_name, style) - Returns the day, month or year. 
For month, if style=='text', returns a three-letter string, otherwise returns an integer. For year, if style=='short', returns the last two digits, otherwise returns all four.<br /> *less_than(date_object) - compares itself to another date object. Returns 1 if it's less than it, otherwise returns 0.<br /> *greater_than(date_object) - compares itself to another date object. Returns 1 if it's greater than it, otherwise returns 0.<br /> *month_name() - returns the three-letter string of the month.<br /> *render(format_string) - returns a date string. Format can be:<br /> **'short' - Calls render_abbreviated - returns a date like this: 05-Jul-77<br /> **'long' - Calls render_full (not implemented).<br /> **'numerical' (default) - Calls render_numerical - returns a date like this: 19770705<br /> *clone - returns a new, identical date object.<br /> <br /> <br /> == Cache ==<br /> The interface to the cache.<br /> <br /> === Configuration Constants ===<br /> *$cache_directory - a string containing a path to the directory in which the cache files are located.<br /> <br /> === Functions ===<br /> *new(id) - takes the ID of the params object we're using at the moment.<br /> *exists() - returns true if there's a cached file, false if there isn't one.<br /> *write(visualisation_object) - writes the data to the cache file.<br /> *read() - returns the data from the cache.<br /> <br /> == Periods ==<br /> The Periods object is used when you want to break a date range down into sub-ranges. Used with the params->mask() function, stats can be retrieved for periods inside a date range.<br /> ===Functions===<br /> *new(start_date_obj, end_date_obj) - doesn't do anything, just returns the object.<br /> <br /> The following functions all return an array of hashes. 
Each hash has the keys 'start_date' and 'end_date', and the values are both EPStats::Date objects.<br /> <br /> *calandar_months - Returns full months (each element starts on the 1st, and ends on the last day).<br /> *months - Returns month periods (if the start_date is the 15th, then each period starts on the 15th and ends on the 14th of the next month - except the last period, which only has about a 1/30 chance of doing so).<br /> *weeks - returns 7-day periods (except the last, which has a 1/7 chance of being 7 days long).<br /> *days - returns single days (for each period, the start_date and end_date are the same).<br /> <br /> == UserInterface::Controls ==<br /> This is used to generate the drop-down boxes in the stats cgi script. If I had more time I'd document it fully, but my daughter's going to be born in less than 12 hours.<br /> <br /> == Page (deprecated) ==<br /> Harks back to the day when a page object contained views.<br /> <br /> == View ==<br /> <br /> == Visualisation ==<br /> Currently Visualisations are Graph, Table or HTML. These are what the user will look at in the browser or download (in the case of CSV).<br /> <br /> === Functions ===<br /> All visualisations inherit:<br /> *new(data_hash) - a hash can optionally be passed containing the values that would otherwise be set using the 'set' function.<br /> *set(param_name, value) - sets the named parameter to the given value - see subclasses<br /> <br /> All visualisations must implement:<br /> *render() - returns what will be passed to the script.<br /> <br /> <br /> == Visualisation::HTML ==<br /> The simplest visualisation. 
Just a chunk of html.<br /> <br /> To Populate:<br /> *set('html', html_string) - takes the html as a string.<br /> <br /> == Visualisation::Table ==<br /> The Visualisation::Table currently just passes the buck to its superclass.<br /> <br /> There are currently three table Visualisations:<br /> <br /> === Visualisation::Table::CSV ===<br /> Returns a CSV table.<br /> <br /> To Populate:<br /> *set('headings', headings_arrayref) - pass an array containing headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> <br /> === Visualisation::Table::HTML ===<br /> A basic HTML table.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> And then optionally<br /> *set('totals', totals_arrayref) - an array of totals to put at the bottom of the table.<br /> <br /> === Visualisation::Table::HTML_Columned ===<br /> An HTML table that is rendered in several columns.<br /> ==== Configuration Constants ====<br /> $default_number_of_rows - an int representing the maximum number of rows the table should have.<br /> <br /> ==== Overridden Functions ====<br /> *new(data_hash, number_of_rows) - Both data_hash and number_of_rows are optional. Both can be set with 'set'.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> *set('number_of_rows', int) - set the maximum number of rows the table should have.<br /> <br /> == Visualisation::Graph ==<br /> The graph objects all use Chart Director to generate graphs. 
The Graph object initialises the colours that the graph may use.<br /> <br /> Every graph must be created with at least the filename:<br /> *new({filename => string}) - the filename comes from the ID of the param object.<br /> <br /> === Configuration Constants ===<br /> These are set in the 'new' function.<br /> *$graph_dir - the path to the directory where the image file will be saved.<br /> *$url_relative - this will have the filename added to the end and put in the img html tag.<br /> <br /> === Sub Classes ===<br /> Note that in the Visualisation/Graph/ directory, there is 'GraphLegend.pm'. This is used to create the html for the graph legends.<br /> <br /> ====Visualisation::Graph::Bar.pm====<br /> A Bar Graph. It can have one or more bars in each division of the x axis.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('x_title',string) - The title of the x axis.<br /> *set('y_title',string) - The title of the y axis.<br /> *set('x_labels',array_ref) - an array containing the labels for the x axis<br /> *set('data_series', array_ref) - an array of arrayrefs, referencing data for each set of bars<br /> <br /> ====Visualisation::Graph::Line.pm====<br /> A Line Graph. There can be many lines on it.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('x_title',string) - The title of the x axis.<br /> *set('y_title',string) - The title of the y axis.<br /> *set('x_labels',array_ref) - an array containing the labels for the x axis<br /> *set('data_series', array_ref) - an array of arrayrefs, referencing data for each line<br /> <br /> ====Visualisation::Graph::Pie.pm====<br /> A Pie Graph.<br /> <br /> To implement:<br /> *set('title',string) - The title that will be in the graph image.<br /> *set('data_series', array_ref) - an array of hashrefs, {data => int, citation => string}, one for each slice</div>

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats_Technical_Documentation&diff=4204 IRStats Technical Documentation 2007-03-29T20:45:23Z

<p>Gobfrey: /* Configuration Constants */</p> <hr /> <div>This document is intended as guidance to the last stage of development of EPstats.<br /> <br /> = Directory Structure =<br /> <br /> == /opt/epstats ==<br /> Contains data files for GeoIP. If I had had root access, I would have put them in the correct place. They are linked to from the correct place. These need regular updating, something which hasn't been implemented.<br /> <br /> == /opt/epstats/bin ==<br /> Contains the scripts needed to update the table.<br /> <br /> *daily_update.sh - Runs all the scripts in the right order.<br /> *extract_metadata_from_archive.pl - Extracts eprint, author and group metadata from the repository by iterating over every eprint.<br /> *update_table.pl - Filters and processes new entries in the accesslog to update the epstats_true_accesses_table. Uses 'SearchParser.pm' and 'repeatscache'.<br /> *convert_ip_to_host.pl - Attempts to convert IP addresses of the new entries in epstats_true_accesses_table to hostnames. Uses 'host_updated' to keep track of where it got to last time.<br /> <br /> Note that most of these scripts probably need to be tidied up. They were written in a hurry and were never polished.<br /> <br /> == /opt/epstats/cache ==<br /> Contains cache files. Feel free to delete these whenever you like. <br /> <br /> == /opt/epstats/cgi ==<br /> <br /> Contains two scripts, 'get_view' and 'stats'.<br /> <br /> *get_view returns the output of an EPstats::View (see below), which is currently a chunk of HTML or CSV, but could be almost anything.<br /> *stats is a handy cgi form that passes arguments to get_view<br /> <br /> == /opt/epstats/img ==<br /> <br /> Conceptually, where any images would be kept (e.g. national flags). At the moment, only the img/graphs directory is used. 
This is where generated graphs are stored.<br /> <br /> == /opt/epstats/perl_lib ==<br /> <br /> Contains all the epstats classes.<br /> <br /> = EPStats Classes =<br /> <br /> Note that the leading EPStats:: has been left out for brevity.<br /> <br /> == Params ==<br /> This object holds the parameters that are used to generate the statistics. The most important of these are a date range and an eprint set.<br /> === Configuration Constants ===<br /> *$cgi_script - the name of the cgi script (currently unused)<br /> *$id_params - the parameters that matter when generating an ID.<br /> *$defaults - Any default parameters you wish to set.<br /> <br /> === Functions ===<br /> *new(CGI_object) - returns new object<br /> *mask(params_hash) - used when you want to temporarily overwrite parameter(s). Overwrites values with contents of params_hash. Overwritten values get pushed onto a stack.<br /> *unmask - Sets parameters back to how they were before the last mask.<br /> *generate_cgi - returns a string containing the name of the cgi script, and all parameters, to enable the creation of links. (currently unused)<br /> *get(param_name) - returns the value of a single parameter.<br /> *create_id - Uses MD5 to create a unique ID from the id_params (see Constants above). This is called whenever get('id') is called.<br /> <br /> == DatabaseInterface ==<br /> This object does what it says on the tin. Any access to the database is done through it.<br /> <br /> <strong>IMPORTANT</strong> - the generated SQL was developed on a machine running MySQL 5. Installing on the EPrints server has broken this (as it's running MySQL 4). I placed a quick and dirty hack into the do_sql function, and modified the create_top_table function. I have no idea if this works well. 
<strong>IT NEEDS TO BE CHECKED</strong>.<br /> <br /> === Configuration Constants ===<br /> Constants are contained in the 'new' function.<br /> <br /> *DBI Configuration Constants - $driver, $server, $database, $user, $password are all used to create the connection to the database.<br /> *source_table - The table in which the stats are stored.<br /> <br /> === Functions ===<br /> *new() - returns object.<br /> *retreive_set_names() - returns a list of eprint sets. Currently 'group' and 'author' are implemented. This is used to verify cgi input.<br /> *get_membership(eprint_id, set_name) - For a given eprint ID, returns the members of the named set that it belongs to. For example, we can find out which authors eprint 12614 has by get_membership(12614, 'author').<br /> *get_citation(id, set, length) - returns a citation. Every set member (eprint, author, group) has two citations: short and full. We only return a short citation if length == 'short'. So, to get the short citation of group 3: get_citation(3,'group','short').<br /> *get_code($id,$set) - UNWRITTEN - Set members have codes. This is how they are identified by the user. For example author_lac is the member of the author set whose code is lac. To get the code for group 3: get_code(3,'group').<br /> <br /> When retrieving statistics, EPStats filters by inner joining the epstats_true_accesses_table to other tables containing eprint IDs. Sometimes it has to create these tables.<br /> <br /> *create_top_table(param_object) - This creates a table containing the eprint IDs of the top X by fulltext download between two dates.<br /> *create_list_table(table_name, eprint_ids) - Takes two strings, one the name of the table, the other a space separated list of eprint IDs. Creates a temporary table.<br /> <br /> The following are the only two functions that actually make calls to the database. 
<br /> <br /> *do_sql(sql_query_string) - takes a string and performs a query, returning the dbi object containing the results.<br /> *insert_values(table_name, values) - inserts a row of data into a table.<br /> <br /> And finally, the meat and potatoes. The functions that return the statistics we're interested in.<br /> <br /> *get_stats(params_object, column_name_list, options_hash) - returns a dbi object containing the stats we are interested in, i.e. those matching the params_object's date range and eprint sets, with only the columns in column_name_list. The options hash can contain the following key/value pairs:<br /> **order => column_name - the column on which to order it. Append '-' or ' DESC' to order it descending.<br /> **limit => int - How many results to return<br /> **group_by => column_name - if we need to group by a column.<br /> get_stats works by examining the 'eprints' parameter and calling one of the following functions:<br /> **get_list_stats<br /> **get_top_stats<br /> **get_set_stats<br /> **get_all_stats<br /> These functions generate slightly different mysql queries, and pass them to the do_sql function.<br /> <br /> == Date ==<br /> I implemented a date object because there were some specific things I needed to do with dates.<br /> <br /> ===Functions===<br /> *new(date_hash) - Creates a new date object when passed a hash with the keys 'day', 'month' and 'year'.<br /> *validate() - If the date is not valid, it will be modified to a sensible value, e.g. if it's Feb 30th, it will be modified to Feb 29th or 28th, depending on whether it's a leap year.<br /> *set(part_name, int) - Sets part of the date ('year','month' or 'day') to a specific value.<br /> *decrement(period) - decrements the date by a period ('day', 'week', 'month', 'quarter', 'year'). Calls the mod_date function, which does the muscle work.<br /> *increment(period) - increments the date by a period, also by calling mod_date.<br /> *part(part_name, style) - Returns the day, month or year. 
For month, if style=='text', returns a three-letter string, otherwise returns an integer. For year, if style=='short', returns the last two digits, otherwise returns all four.<br /> *less_than(date_object) - compares itself to another date object. Returns 1 if it's less than it, otherwise returns 0.<br /> *greater_than(date_object) - compares itself to another date object. Returns 1 if it's greater than it, otherwise returns 0.<br /> *month_name() - returns the three-letter string of the month.<br /> *render(format_string) - returns a date string. Format can be:<br /> **'short' - Calls render_abbreviated - returns a date like this: 05-Jul-77<br /> **'long' - Calls render_full (not implemented).<br /> **'numerical' (default) - Calls render_numerical - returns a date like this: 19770705<br /> *clone - returns a new, identical date object.<br /> <br /> <br /> == Cache ==<br /> The interface to the cache.<br /> <br /> === Configuration Constants ===<br /> *$cache_directory - a string containing a path to the directory in which the cache files are located.<br /> <br /> === Functions ===<br /> *new(id) - takes the ID of the params object we're using at the moment.<br /> *exists() - returns true if there's a cached file, false if there isn't one.<br /> *write(visualisation_object) - writes the data to the cache file.<br /> *read() - returns the data from the cache.<br /> <br /> == Periods ==<br /> The Periods object is used when you want to break a date range down into sub-ranges. Used with the params->mask() function, stats can be retrieved for periods inside a date range.<br /> ===Functions===<br /> *new(start_date_obj, end_date_obj) - doesn't do anything, just returns the object.<br /> <br /> The following functions all return an array of hashes. 
Each hash has the keys 'start_date' and 'end_date', and the values are both EPStats::Date objects.<br /> <br /> *calandar_months - Returns full months (each element starts on the 1st, and ends on the last day).<br /> *months - Returns month periods (if the start_date is the 15th, then each period starts on the 15th and ends on the 14th of the next month - except the last period, which only has about a 1/30 chance of doing so).<br /> *weeks - returns 7-day periods (except the last, which has a 1/7 chance of being 7 days long).<br /> *days - returns single days (for each period, the start_date and end_date are the same).<br /> <br /> == UserInterface::Controls ==<br /> This is used to generate the drop-down boxes in the stats cgi script. If I had more time I'd document it fully, but my daughter's going to be born in less than 12 hours.<br /> <br /> == Page (deprecated) ==<br /> Harks back to the day when a page object contained views.<br /> <br /> == View ==<br /> <br /> == Visualisation ==<br /> Currently Visualisations are Graph, Table or HTML. These are what the user will look at in the browser or download (in the case of CSV).<br /> <br /> === Functions ===<br /> All visualisations inherit:<br /> *new(data_hash) - a hash can optionally be passed containing the values that would otherwise be set using the 'set' function.<br /> *set(param_name, value) - sets the named parameter to the given value - see subclasses<br /> <br /> All visualisations must implement:<br /> *render() - returns what will be passed to the script.<br /> <br /> <br /> == Visualisation::HTML ==<br /> The simplest visualisation. 
Just a chunk of html.<br /> <br /> To Populate:<br /> *set('html', html_string) - takes the html as a string.<br /> <br /> == Visualisation::Table ==<br /> The Visualisation::Table currently just passes the buck to its superclass.<br /> <br /> There are currently three table Visualisations:<br /> <br /> === Visualisation::Table::CSV ===<br /> Returns a CSV table.<br /> <br /> To Populate:<br /> *set('headings', headings_arrayref) - pass an array containing headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> <br /> === Visualisation::Table::HTML ===<br /> A basic HTML table.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> And then optionally<br /> *set('totals', totals_arrayref) - an array of totals to put at the bottom of the table.<br /> <br /> === Visualisation::Table::HTML_Columned ===<br /> An HTML table that is rendered in several columns.<br /> ==== Configuration Constants ====<br /> $default_number_of_rows - an int representing the maximum number of rows the table should have.<br /> <br /> ==== Overridden Functions ====<br /> *new(data_hash, number_of_rows) - Both data_hash and number_of_rows are optional. 
Both can be set with 'set'.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> *set('number_of_rows', int) - set the maximum number of rows the table should have.<br /> <br /> == Visualisation::Graph ==<br /> The graph objects all use Chart Director to generate graphs.<br /> <br /> === Configuration Constants ===<br /> These are set in the 'new' function.<br /> *$graph_dir - the path to the directory where the image file will be saved.<br /> *$url_relative - this will have the filename added to the end and put in the img html tag.<br /> <br /> To Populate</div>

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats_Technical_Documentation&diff=4203 IRStats Technical Documentation 2007-03-29T20:44:53Z

<p>Gobfrey: /* Visualisation::Graph */</p> <hr /> <div>This document is intended as guidance to the last stage of development of EPstats.<br /> <br /> = Directory Structure =<br /> <br /> == /opt/epstats ==<br /> Contains data files for GeoIP. If I had had root access, I would have put them in the correct place. They are linked to from the correct place. These need regular updating, something which hasn't been implemented.<br /> <br /> == /opt/epstats/bin ==<br /> Contains the scripts needed to update the table.<br /> <br /> *daily_update.sh - Runs all the scripts in the right order.<br /> *extract_metadata_from_archive.pl - Extracts eprint, author and group metadata from the repository by iterating over every eprint.<br /> *update_table.pl - Filters and processes new entries in the accesslog to update the epstats_true_accesses_table. Uses 'SearchParser.pm' and 'repeatscache'.<br /> *convert_ip_to_host.pl - Attempts to convert IP addresses of the new entries in epstats_true_accesses_table to hostnames. Uses 'host_updated' to keep track of where it got to last time.<br /> <br /> Note that most of these scripts probably need to be tidied up. They were written in a hurry and were never polished.<br /> <br /> == /opt/epstats/cache ==<br /> Contains cache files. Feel free to delete these whenever you like. <br /> <br /> == /opt/epstats/cgi ==<br /> <br /> Contains two scripts, 'get_view' and 'stats'.<br /> <br /> *get_view returns the output of an EPstats::View (see below), which is currently a chunk of HTML or CSV, but could be almost anything.<br /> *stats is a handy cgi form that passes arguments to get_view<br /> <br /> == /opt/epstats/img ==<br /> <br /> Conceptually, where any images would be kept (e.g. national flags). At the moment, only the img/graphs directory is used. 
This is where generated graphs are stored.<br /> <br /> == /opt/epstats/perl_lib ==<br /> <br /> Contains all the epstats classes.<br /> <br /> = EPStats Classes =<br /> <br /> Note that the leading EPStats:: has been left out for brevity.<br /> <br /> == Params ==<br /> This object holds the parameters that are used to generate the statistics. The most important of these are a date range and an eprint set.<br /> === Configuration Constants ===<br /> *$cgi_script - the name of the cgi script (currently unused)<br /> *$id_params - the parameters that matter when generating an ID.<br /> *$defaults - Any default parameters you wish to set.<br /> <br /> === Functions ===<br /> *new(CGI_object) - returns new object<br /> *mask(params_hash) - used when you want to temporarily overwrite parameter(s). Overwrites values with contents of params_hash. Overwritten values get pushed onto a stack.<br /> *unmask - Sets parameters back to how they were before the last mask.<br /> *generate_cgi - returns a string containing the name of the cgi script, and all parameters, to enable the creation of links. (currently unused)<br /> *get(param_name) - returns the value of a single parameter.<br /> *create_id - Uses MD5 to create a unique ID from the id_params (see Constants above). This is called whenever get('id') is called.<br /> <br /> == DatabaseInterface ==<br /> This object does what it says on the tin. Any access to the database is done through it.<br /> <br /> <strong>IMPORTANT</strong> - the generated SQL was developed on a machine running MySQL 5. Installing on the EPrints server has broken this (as it's running MySQL 4). I placed a quick and dirty hack into the do_sql function, and modified the create_top_table function. I have no idea if this works well. 
<strong>IT NEEDS TO BE CHECKED</strong>.<br /> <br /> === Configuration Constants ===<br /> Constants are contained in the 'new' function.<br /> <br /> *DBI Configuration Constants - $driver, $server, $database, $user, $password are all used to create the connection to the database.<br /> *source_table - The table in which the stats are stored.<br /> <br /> === Functions ===<br /> *new() - returns object.<br /> *retreive_set_names() - returns a list of eprint sets. Currently 'group' and 'author' are implemented. This is used to verify cgi input.<br /> *get_membership(eprint_id, set_name) - For a given eprint ID, returns the members of the named set that it belongs to. For example, we can find out which authors eprint 12614 has by get_membership(12614, 'author').<br /> *get_citation(id, set, length) - returns a citation. Every set member (eprint, author, group) has two citations: short and full. We only return a short citation if length == 'short'. So, to get the short citation of group 3: get_citation(3,'group','short').<br /> *get_code($id,$set) - UNWRITTEN - Set members have codes. This is how they are identified by the user. For example author_lac is the member of the author set whose code is lac. To get the code for group 3: get_code(3,'group').<br /> <br /> When retrieving statistics, EPStats filters by inner joining the epstats_true_accesses_table to other tables containing eprint IDs. Sometimes it has to create these tables.<br /> <br /> *create_top_table(param_object) - This creates a table containing the eprint IDs of the top X by fulltext download between two dates.<br /> *create_list_table(table_name, eprint_ids) - Takes two strings, one the name of the table, the other a space separated list of eprint IDs. Creates a temporary table.<br /> <br /> The following are the only two functions that actually make calls to the database. 
<br /> <br /> *do_sql(sql_query_string) - takes a string and performs a query, returning the dbi object containing the results.<br /> *insert_values(table_name, values) - inserts a row of data into a table.<br /> <br /> And finally, the meat and potatoes. The functions that return the statistics we're interested in.<br /> <br /> *get_stats(params_object, column_name_list, options_hash) - returns a dbi object containing the stats we are interested in, i.e. those matching the params_object's date range and eprint sets, with only the columns in column_name_list. The options hash can contain the following key/value pairs:<br /> **order => column_name - the column on which to order it. Append '-' or ' DESC' to order it descending.<br /> **limit => int - How many results to return<br /> **group_by => column_name - if we need to group by a column.<br /> get_stats works by examining the 'eprints' parameter and calling one of the following functions:<br /> **get_list_stats<br /> **get_top_stats<br /> **get_set_stats<br /> **get_all_stats<br /> These functions generate slightly different mysql queries, and pass them to the do_sql function.<br /> <br /> == Date ==<br /> I implemented a date object because there were some specific things I needed to do with dates.<br /> <br /> ===Functions===<br /> *new(date_hash) - Creates a new date object when passed a hash with the keys 'day', 'month' and 'year'.<br /> *validate() - If the date is not valid, it will be modified to a sensible value, e.g. if it's Feb 30th, it will be modified to Feb 29th or 28th, depending on whether it's a leap year.<br /> *set(part_name, int) - Sets part of the date ('year','month' or 'day') to a specific value.<br /> *decrement(period) - decrements the date by a period ('day', 'week', 'month', 'quarter', 'year'). Calls the mod_date function, which does the muscle work.<br /> *increment(period) - increments the date by a period, also by calling mod_date.<br /> *part(part_name, style) - Returns the day, month or year. 
For month, if style=='text', returns a three-letter string, otherwise returns an integer. For year, if style=='short', returns the last two digits, otherwise returns all four.<br /> *less_than(date_object) - compares itself to another date object. Returns 1 if it is less than the other date, otherwise returns 0.<br /> *greater_than(date_object) - compares itself to another date object. Returns 1 if it is greater than the other date, otherwise returns 0.<br /> *month_name() - returns the three-letter string of the month.<br /> *render(format_string) - returns a date string. Format can be:<br /> **'short' - Calls render_abbreviated - returns a date like this: 05-Jul-77<br /> **'long' - Calls render_full (not implemented).<br /> **'numerical' (default) - Calls render_numerical - returns a date like this: 19770705<br /> *clone - returns a new, identical date object.<br /> <br /> <br /> == Cache ==<br /> The interface to the cache.<br /> <br /> === Configuration Constants ===<br /> *$cache_directory - a string containing a path to the directory in which the cache files are located.<br /> <br /> === Functions ===<br /> *new(id) - takes the ID of the params object we're using at the moment.<br /> *exists() - returns true if there's a cached file, false if there isn't one.<br /> *write(visualisation_object) - writes the data to the cache file.<br /> *read() - returns the data from the cache.<br /> <br /> == Periods ==<br /> The Periods object is used when you want to break a date range down into sub-ranges. Used with the params->mask() function, it allows stats to be retrieved for periods inside a date range.<br /> ===Functions===<br /> *new(start_date_obj, end_date_obj) - doesn't do anything, just returns the object.<br /> <br /> The following functions all return an array of hashes. 
Each hash has the keys 'start_date' and 'end_date', and both values are EPStats::Date objects.<br /> <br /> *calandar_months - Returns full months (each element starts on the 1st, and ends on the last day).<br /> *months - Returns month periods (if the start_date is the 15th, then each period starts on the 15th and ends on the 14th of the next month - except the last period, which only has about a 1/30 chance of doing so).<br /> *weeks - returns 7-day periods (except the last, which has a 1/7 chance of being 7 days long).<br /> *days - returns single days (for each period, the start_date and end_date are the same).<br /> <br /> == UserInterface::Controls ==<br /> This is used to generate the drop boxes in the stats CGI script. If I had more time I'd document it fully, but my daughter's going to be born in less than 12 hours.<br /> <br /> == Page (deprecated) ==<br /> Harkens back to the day when a page object contained views.<br /> <br /> == View ==<br /> <br /> == Visualisation ==<br /> Currently Visualisations are Graph, Table or HTML. These are what the user will look at in the browser or download (in the case of CSV).<br /> <br /> === Functions ===<br /> All visualisations inherit:<br /> *new(data_hash) - a hash can optionally be passed containing the values that would otherwise be set using the 'set' function.<br /> *set(param_name, value) - sets the named parameter to the given value; see the subclasses for the parameters each accepts.<br /> <br /> All visualisations must implement:<br /> *render() - returns what will be passed to the script.<br /> <br /> <br /> == Visualisation::HTML ==<br /> The simplest visualisation. 
Just a chunk of html.<br /> <br /> To Populate:<br /> *set('html', html_string) - takes the html as a string.<br /> <br /> == Visualisation::Table ==<br /> The Visualisation::Table currently just passes the buck to its superclass.<br /> <br /> There are currently three table Visualisations:<br /> <br /> === Visualisation::Table::CSV ===<br /> Returns a CSV table.<br /> <br /> To Populate:<br /> *set('headings', headings_arrayref) - pass an array containing headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> <br /> === Visualisation::Table::HTML ===<br /> A basic HTML table.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> And then optionally<br /> *set('totals', totals_arrayref) - an array of totals to put at the bottom of the table.<br /> <br /> === Visualisation::Table::HTML_Columned ===<br /> An HTML table that is rendered in several columns.<br /> ==== Configuration Constants ====<br /> $default_number_of_rows - an int representing the maximum number of rows the table should have.<br /> <br /> ==== Overridden Functions ====<br /> *new(data_hash, number_of_rows) - Both data_hash and number_of_rows are optional. 
Both can be set with 'set'.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> *set('number_of_rows', int) - set the maximum number of rows the table should have.<br /> <br /> == Visualisation::Graph ==<br /> The graph objects all use Chart Director to generate graphs.<br /> <br /> === Configuration Constants ===<br /> These are set in the 'new' function.<br /> my $graph_dir - the path to the directory where the image file will be saved.<br /> my $url_relative - this will have the filename added to the end and put in the img html tag.</div>
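The number_of_rows setting of Visualisation::Table::HTML_Columned described above can be illustrated with a small sketch. It is written in Python rather than the Perl of EPStats, purely to show the idea, and it assumes that rows beyond the maximum wrap into a further side-by-side column. The function name split_into_columns is hypothetical and not part of EPStats.

```python
def split_into_columns(rows, number_of_rows):
    """Split rows into consecutive blocks of at most number_of_rows rows.

    Assumption: each block would be rendered as one side-by-side column
    of the table. This is an illustration, not the EPStats code.
    """
    if number_of_rows < 1:
        raise ValueError("number_of_rows must be at least 1")
    return [rows[i:i + number_of_rows]
            for i in range(0, len(rows), number_of_rows)]


if __name__ == "__main__":
    # Five data rows with a maximum of two rows per column block.
    rows = [["eprint %d" % n, n * 10] for n in range(1, 6)]
    for column in split_into_columns(rows, 2):
        print(column)
```

Under that assumption, five rows with number_of_rows set to 2 would be laid out as three column blocks, the last holding a single row.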

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats_Technical_Documentation&diff=4202 IRStats Technical Documentation 2007-03-29T20:34:55Z

<p>Gobfrey: /* isualisation */</p> <hr /> <div>This document is intended as guidance for the last stage of development of EPStats.<br /> <br /> = Directory Structure =<br /> <br /> == /opt/epstats ==<br /> Contains data files for GeoIP. If I had had root access, I would have put them in the correct place. They are linked to from the correct place. These need regular updating, something which hasn't been implemented.<br /> <br /> == /opt/epstats/bin ==<br /> Contains the scripts needed to update the table.<br /> <br /> *daily_update.sh - Runs all the scripts in the right order.<br /> *extract_metadata_from_archive.pl - Extracts eprint, author and group metadata from the repository by iterating over every eprint.<br /> *update_table.pl - Filters and processes new entries in the accesslog to update the epstats_true_acesses_table. Uses 'SearchParser.pm' and 'repeatscache'.<br /> *convert_ip_to_host.pl - Attempts to convert IP addresses of the new entries in epstats_true_acesses_table to hostnames. Uses 'host_updated' to keep track of where it got to last time.<br /> <br /> Note that most of these scripts probably need to be tidied up. They were written in a hurry and were never polished.<br /> <br /> == /opt/epstats/cache ==<br /> Contains cache files. Feel free to delete these whenever you like. <br /> <br /> == /opt/epstats/cgi ==<br /> <br /> Contains two scripts, 'get_view' and 'stats'.<br /> <br /> *get_view returns the output of an EPStats::View (see below), which is currently a chunk of HTML or CSV, but could be almost anything.<br /> *stats is a handy CGI form that passes arguments to get_view.<br /> <br /> == /opt/epstats/img ==<br /> <br /> Conceptually, where any images would be kept (e.g. national flags). At the moment, only the img/graphs directory is used. 
This is where generated graphs are stored.<br /> <br /> == /opt/epstats/perl_lib ==<br /> <br /> Contains all the EPStats classes.<br /> <br /> = EPStats Classes =<br /> <br /> Note that the leading EPStats:: has been left out for brevity.<br /> <br /> == Params ==<br /> This object holds the parameters that are used to generate the statistics. The most important of these are a date range and an eprint set.<br /> === Configuration Constants ===<br /> *$cgi_script - the name of the CGI script (currently unused)<br /> *$id_params - The parameters that matter when generating an ID.<br /> *$defaults - Any default parameters you wish to set.<br /> <br /> === Functions ===<br /> *new(CGI_object) - returns a new object.<br /> *mask(params_hash) - used when you want to temporarily overwrite parameter(s). Overwrites values with the contents of params_hash. Overwritten values get pushed onto a stack.<br /> *unmask - Sets parameters back to how they were before the last mask.<br /> *generate_cgi - returns a string containing the name of the CGI script, and all parameters, to enable the creation of links. (currently unused)<br /> *get(param_name) - returns the value of a single parameter.<br /> *create_id - Uses MD5 to create a unique ID from the id_params (see Constants above). This is called whenever get('id') is called.<br /> <br /> == DatabaseInterface ==<br /> This object does what it says on the tin. Any access to the database is done through it.<br /> <br /> <strong>IMPORTANT</strong> - the generated MySQL was developed on a machine running MySQL 5. Installing on the EPrints server has broken this (as it's running MySQL 4). I placed a quick and dirty hack into the do_sql function, and modified the create_top_table function. I have no idea if this works well. 
<strong>IT NEEDS TO BE CHECKED</strong>.<br /> <br /> === Configuration Constants ===<br /> Constants are contained in the new function.<br /> <br /> *DBI Configuration Constants - $driver, $server, $database, $user, $password are all used to create the connection to the database.<br /> *source_table - The table in which the stats are stored.<br /> <br /> === Functions ===<br /> *new() - returns the object.<br /> *retreive_set_names() - returns a list of eprint sets. Currently 'group' and 'author' are implemented. This is used to verify CGI input.<br /> *get_membership(eprint_id, set_name) - For a given eprint ID, returns the members of the named set that it belongs to. For example, we can find out which authors eprint 12614 has with get_membership(12614, 'author').<br /> *get_citation(id, set, length) - returns a citation. Every set member (eprint, author, group) has two citations: short and full. The short citation is returned only if length == 'short'. So, to get the short citation of group 3: get_citation(3,'group','short').<br /> *get_code($id,$set) - UNWRITTEN - Set members have codes. This is how they are identified by the user. For example, author_lac is the member of the author set whose code is lac. To get the code for group 3: get_code(3,'group').<br /> <br /> When retrieving statistics, EPStats filters by inner joining the epstats_true_accesses_table to other tables containing eprint IDs. Sometimes it has to create these tables.<br /> <br /> *create_top_table(param_object) - Creates a table containing the eprint IDs of the top X eprints by fulltext download between two dates.<br /> *create_list_table(table_name, eprint_ids) - Takes two strings: the name of the table and a space-separated list of eprint IDs. Creates a temporary table.<br /> <br /> The following are the only two functions that actually make calls to the database. 
<br /> <br /> *do_sql(sql_query_string) - takes a string and performs a query, returning the DBI object containing the results.<br /> *insert_values(table_name, values) - inserts a row of data into a table.<br /> <br /> And finally, the meat and potatoes: the functions that return the statistics we're interested in.<br /> <br /> *get_stats(params_object, column_name_list, options_hash) - returns a DBI object containing the stats we are interested in, i.e. those for the params_object's date range and eprint sets, and only the columns in column_name_list. The options hash can contain the following key/value pairs:<br /> **order => column_name - the column on which to order the results. Append '-' or ' DESC' to the column name to order descending.<br /> **limit => int - How many results to return.<br /> **group_by => column_name - if we need to group by a column.<br /> get_stats works by examining the 'eprints' parameter and calling one of the following functions:<br /> **get_list_stats<br /> **get_top_stats<br /> **get_set_stats<br /> **get_all_stats<br /> These functions generate slightly different MySQL queries, and pass them to the do_sql function.<br /> <br /> == Date ==<br /> I implemented a date object because there were some specific things I needed to do with dates.<br /> <br /> ===Functions===<br /> *new(date_hash) - Creates a new date object when passed a hash with the keys 'day', 'month' and 'year'.<br /> *validate() - If the date is not valid, it will be modified to a sensible value. E.g. if it's Feb 30th, it will be modified to Feb 29th or 28th, depending on whether it's a leap year.<br /> *set(part_name, int) - Sets part of the date ('year','month' or 'day') to a specific value.<br /> *decrement(period) - decrements the date by a period ('day', 'week', 'month', 'quarter', 'year'). Calls the mod_date function, which does the muscle work.<br /> *increment(period) - increments the date by a period, also by calling mod_date.<br /> *part(part_name, style) - Returns the day, month or year. 
For month, if style=='text', returns a three-letter string, otherwise returns an integer. For year, if style=='short', returns the last two digits, otherwise returns all four.<br /> *less_than(date_object) - compares itself to another date object. Returns 1 if it is less than the other date, otherwise returns 0.<br /> *greater_than(date_object) - compares itself to another date object. Returns 1 if it is greater than the other date, otherwise returns 0.<br /> *month_name() - returns the three-letter string of the month.<br /> *render(format_string) - returns a date string. Format can be:<br /> **'short' - Calls render_abbreviated - returns a date like this: 05-Jul-77<br /> **'long' - Calls render_full (not implemented).<br /> **'numerical' (default) - Calls render_numerical - returns a date like this: 19770705<br /> *clone - returns a new, identical date object.<br /> <br /> <br /> == Cache ==<br /> The interface to the cache.<br /> <br /> === Configuration Constants ===<br /> *$cache_directory - a string containing a path to the directory in which the cache files are located.<br /> <br /> === Functions ===<br /> *new(id) - takes the ID of the params object we're using at the moment.<br /> *exists() - returns true if there's a cached file, false if there isn't one.<br /> *write(visualisation_object) - writes the data to the cache file.<br /> *read() - returns the data from the cache.<br /> <br /> == Periods ==<br /> The Periods object is used when you want to break a date range down into sub-ranges. Used with the params->mask() function, it allows stats to be retrieved for periods inside a date range.<br /> ===Functions===<br /> *new(start_date_obj, end_date_obj) - doesn't do anything, just returns the object.<br /> <br /> The following functions all return an array of hashes. 
Each hash has the keys 'start_date' and 'end_date', and both values are EPStats::Date objects.<br /> <br /> *calandar_months - Returns full months (each element starts on the 1st, and ends on the last day).<br /> *months - Returns month periods (if the start_date is the 15th, then each period starts on the 15th and ends on the 14th of the next month - except the last period, which only has about a 1/30 chance of doing so).<br /> *weeks - returns 7-day periods (except the last, which has a 1/7 chance of being 7 days long).<br /> *days - returns single days (for each period, the start_date and end_date are the same).<br /> <br /> == UserInterface::Controls ==<br /> This is used to generate the drop boxes in the stats CGI script. If I had more time I'd document it fully, but my daughter's going to be born in less than 12 hours.<br /> <br /> == Page (deprecated) ==<br /> Harkens back to the day when a page object contained views.<br /> <br /> == View ==<br /> <br /> == Visualisation ==<br /> Currently Visualisations are Graph, Table or HTML. These are what the user will look at in the browser or download (in the case of CSV).<br /> <br /> === Functions ===<br /> All visualisations inherit:<br /> *new(data_hash) - a hash can optionally be passed containing the values that would otherwise be set using the 'set' function.<br /> *set(param_name, value) - sets the named parameter to the given value; see the subclasses for the parameters each accepts.<br /> <br /> All visualisations must implement:<br /> *render() - returns what will be passed to the script.<br /> <br /> <br /> == Visualisation::HTML ==<br /> The simplest visualisation. 
Just a chunk of html.<br /> <br /> To Populate:<br /> *set('html', html_string) - takes the html as a string.<br /> <br /> == Visualisation::Table ==<br /> The Visualisation::Table currently just passes the buck to its superclass.<br /> <br /> There are currently three table Visualisations:<br /> <br /> === Visualisation::Table::CSV ===<br /> Returns a CSV table.<br /> <br /> To Populate:<br /> *set('headings', headings_arrayref) - pass an array containing headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> <br /> === Visualisation::Table::HTML ===<br /> A basic HTML table.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> And then optionally<br /> *set('totals', totals_arrayref) - an array of totals to put at the bottom of the table.<br /> <br /> === Visualisation::Table::HTML_Columned ===<br /> An HTML table that is rendered in several columns.<br /> ==== Configuration Constants ====<br /> $default_number_of_rows - an int representing the maximum number of rows the table should have.<br /> <br /> ==== Overridden Functions ====<br /> *new(data_hash, number_of_rows) - Both data_hash and number_of_rows are optional. Both can be set with 'set'.<br /> <br /> To Populate:<br /> *set('columns', headings_arrayref) - pass an array containing column headings.<br /> *set('rows', rows_arrayref) - an array of arrayrefs, each referencing a row of data.<br /> *set('number_of_rows', int) - set the maximum number of rows the table should have.<br /> <br /> == Visualisation::Graph ==</div>
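The behaviour of Date's validate() and render() described above can be illustrated with a short sketch. It is written in Python rather than the Perl of EPStats, and the function names and the clamping rule (pull an out-of-range day or month back to the nearest valid value) are assumptions made for illustration, not the actual implementation. The two output formats follow the examples given in the text.

```python
import calendar

MONTH_NAMES = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def validate(year, month, day):
    """Clamp an out-of-range month or day to the nearest valid value.

    Assumption: 'a sensible value' means the closest valid date part.
    """
    month = min(max(month, 1), 12)
    last_day = calendar.monthrange(year, month)[1]  # days in that month
    day = min(max(day, 1), last_day)
    return year, month, day


def render(year, month, day, fmt="numerical"):
    """Render a date as 'short' (05-Jul-77) or 'numerical' (19770705)."""
    if fmt == "short":
        return "%02d-%s-%02d" % (day, MONTH_NAMES[month - 1], year % 100)
    return "%04d%02d%02d" % (year, month, day)


if __name__ == "__main__":
    print(validate(2007, 2, 30))         # not a leap year
    print(validate(2008, 2, 30))         # leap year
    print(render(1977, 7, 5, "short"))
    print(render(1977, 7, 5))
```

With these assumptions, Feb 30th becomes Feb 28th in 2007 and Feb 29th in 2008, matching the leap-year behaviour described for validate().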

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats_Technical_Documentation&diff=4201 IRStats Technical Documentation 2007-03-29T20:10:14Z

<p>Gobfrey: /* UserInterface::Controls */</p> <hr /> <div>This document is intended as guidance for the last stage of development of EPStats.<br /> <br /> = Directory Structure =<br /> <br /> == /opt/epstats ==<br /> Contains data files for GeoIP. If I had had root access, I would have put them in the correct place. They are linked to from the correct place. These need regular updating, something which hasn't been implemented.<br /> <br /> == /opt/epstats/bin ==<br /> Contains the scripts needed to update the table.<br /> <br /> *daily_update.sh - Runs all the scripts in the right order.<br /> *extract_metadata_from_archive.pl - Extracts eprint, author and group metadata from the repository by iterating over every eprint.<br /> *update_table.pl - Filters and processes new entries in the accesslog to update the epstats_true_acesses_table. Uses 'SearchParser.pm' and 'repeatscache'.<br /> *convert_ip_to_host.pl - Attempts to convert IP addresses of the new entries in epstats_true_acesses_table to hostnames. Uses 'host_updated' to keep track of where it got to last time.<br /> <br /> Note that most of these scripts probably need to be tidied up. They were written in a hurry and were never polished.<br /> <br /> == /opt/epstats/cache ==<br /> Contains cache files. Feel free to delete these whenever you like. <br /> <br /> == /opt/epstats/cgi ==<br /> <br /> Contains two scripts, 'get_view' and 'stats'.<br /> <br /> *get_view returns the output of an EPStats::View (see below), which is currently a chunk of HTML or CSV, but could be almost anything.<br /> *stats is a handy CGI form that passes arguments to get_view.<br /> <br /> == /opt/epstats/img ==<br /> <br /> Conceptually, where any images would be kept (e.g. national flags). At the moment, only the img/graphs directory is used. 
This is where generated graphs are stored.<br /> <br /> == /opt/epstats/perl_lib ==<br /> <br /> Contains all the EPStats classes.<br /> <br /> = EPStats Classes =<br /> <br /> Note that the leading EPStats:: has been left out for brevity.<br /> <br /> == Params ==<br /> This object holds the parameters that are used to generate the statistics. The most important of these are a date range and an eprint set.<br /> === Configuration Constants ===<br /> *$cgi_script - the name of the CGI script (currently unused)<br /> *$id_params - The parameters that matter when generating an ID.<br /> *$defaults - Any default parameters you wish to set.<br /> <br /> === Functions ===<br /> *new(CGI_object) - returns a new object.<br /> *mask(params_hash) - used when you want to temporarily overwrite parameter(s). Overwrites values with the contents of params_hash. Overwritten values get pushed onto a stack.<br /> *unmask - Sets parameters back to how they were before the last mask.<br /> *generate_cgi - returns a string containing the name of the CGI script, and all parameters, to enable the creation of links. (currently unused)<br /> *get(param_name) - returns the value of a single parameter.<br /> *create_id - Uses MD5 to create a unique ID from the id_params (see Constants above). This is called whenever get('id') is called.<br /> <br /> == DatabaseInterface ==<br /> This object does what it says on the tin. Any access to the database is done through it.<br /> <br /> <strong>IMPORTANT</strong> - the generated MySQL was developed on a machine running MySQL 5. Installing on the EPrints server has broken this (as it's running MySQL 4). I placed a quick and dirty hack into the do_sql function, and modified the create_top_table function. I have no idea if this works well. 
<strong>IT NEEDS TO BE CHECKED</strong>.<br /> <br /> === Configuration Constants ===<br /> Constants are contained in the new function.<br /> <br /> *DBI Configuration Constants - $driver, $server, $database, $user, $password are all used to create the connection to the database.<br /> *source_table - The table in which the stats are stored.<br /> <br /> === Functions ===<br /> *new() - returns the object.<br /> *retreive_set_names() - returns a list of eprint sets. Currently 'group' and 'author' are implemented. This is used to verify CGI input.<br /> *get_membership(eprint_id, set_name) - For a given eprint ID, returns the members of the named set that it belongs to. For example, we can find out which authors eprint 12614 has with get_membership(12614, 'author').<br /> *get_citation(id, set, length) - returns a citation. Every set member (eprint, author, group) has two citations: short and full. The short citation is returned only if length == 'short'. So, to get the short citation of group 3: get_citation(3,'group','short').<br /> *get_code($id,$set) - UNWRITTEN - Set members have codes. This is how they are identified by the user. For example, author_lac is the member of the author set whose code is lac. To get the code for group 3: get_code(3,'group').<br /> <br /> When retrieving statistics, EPStats filters by inner joining the epstats_true_accesses_table to other tables containing eprint IDs. Sometimes it has to create these tables.<br /> <br /> *create_top_table(param_object) - Creates a table containing the eprint IDs of the top X eprints by fulltext download between two dates.<br /> *create_list_table(table_name, eprint_ids) - Takes two strings: the name of the table and a space-separated list of eprint IDs. Creates a temporary table.<br /> <br /> The following are the only two functions that actually make calls to the database. 
<br /> <br /> *do_sql(sql_query_string) - takes a string and performs a query, returning the DBI object containing the results.<br /> *insert_values(table_name, values) - inserts a row of data into a table.<br /> <br /> And finally, the meat and potatoes: the functions that return the statistics we're interested in.<br /> <br /> *get_stats(params_object, column_name_list, options_hash) - returns a DBI object containing the stats we are interested in, i.e. those for the params_object's date range and eprint sets, and only the columns in column_name_list. The options hash can contain the following key/value pairs:<br /> **order => column_name - the column on which to order the results. Append '-' or ' DESC' to the column name to order descending.<br /> **limit => int - How many results to return.<br /> **group_by => column_name - if we need to group by a column.<br /> get_stats works by examining the 'eprints' parameter and calling one of the following functions:<br /> **get_list_stats<br /> **get_top_stats<br /> **get_set_stats<br /> **get_all_stats<br /> These functions generate slightly different MySQL queries, and pass them to the do_sql function.<br /> <br /> == Date ==<br /> I implemented a date object because there were some specific things I needed to do with dates.<br /> <br /> ===Functions===<br /> *new(date_hash) - Creates a new date object when passed a hash with the keys 'day', 'month' and 'year'.<br /> *validate() - If the date is not valid, it will be modified to a sensible value. E.g. if it's Feb 30th, it will be modified to Feb 29th or 28th, depending on whether it's a leap year.<br /> *set(part_name, int) - Sets part of the date ('year','month' or 'day') to a specific value.<br /> *decrement(period) - decrements the date by a period ('day', 'week', 'month', 'quarter', 'year'). Calls the mod_date function, which does the muscle work.<br /> *increment(period) - increments the date by a period, also by calling mod_date.<br /> *part(part_name, style) - Returns the day, month or year. 
For month, if style=='text', returns a three-letter string, otherwise returns an integer. For year, if style=='short', returns the last two digits, otherwise returns all four.<br /> *less_than(date_object) - compares itself to another date object. Returns 1 if it is less than the other date, otherwise returns 0.<br /> *greater_than(date_object) - compares itself to another date object. Returns 1 if it is greater than the other date, otherwise returns 0.<br /> *month_name() - returns the three-letter string of the month.<br /> *render(format_string) - returns a date string. Format can be:<br /> **'short' - Calls render_abbreviated - returns a date like this: 05-Jul-77<br /> **'long' - Calls render_full (not implemented).<br /> **'numerical' (default) - Calls render_numerical - returns a date like this: 19770705<br /> *clone - returns a new, identical date object.<br /> <br /> <br /> == Cache ==<br /> The interface to the cache.<br /> <br /> === Configuration Constants ===<br /> *$cache_directory - a string containing a path to the directory in which the cache files are located.<br /> <br /> === Functions ===<br /> *new(id) - takes the ID of the params object we're using at the moment.<br /> *exists() - returns true if there's a cached file, false if there isn't one.<br /> *write(visualisation_object) - writes the data to the cache file.<br /> *read() - returns the data from the cache.<br /> <br /> == Periods ==<br /> The Periods object is used when you want to break a date range down into sub-ranges. Used with the params->mask() function, it allows stats to be retrieved for periods inside a date range.<br /> ===Functions===<br /> *new(start_date_obj, end_date_obj) - doesn't do anything, just returns the object.<br /> <br /> The following functions all return an array of hashes. 
Each hash has the keys 'start_date' and 'end_date', and both values are EPStats::Date objects.<br /> <br /> *calandar_months - Returns full months (each element starts on the 1st, and ends on the last day).<br /> *months - Returns month periods (if the start_date is the 15th, then each period starts on the 15th and ends on the 14th of the next month - except the last period, which only has about a 1/30 chance of doing so).<br /> *weeks - returns 7-day periods (except the last, which has a 1/7 chance of being 7 days long).<br /> *days - returns single days (for each period, the start_date and end_date are the same).<br /> <br /> == UserInterface::Controls ==<br /> This is used to generate the drop boxes in the stats CGI script. If I had more time I'd document it fully, but my daughter's going to be born in less than 12 hours.<br /> <br /> == Page (deprecated) ==<br /> Harkens back to the day when a page object contained views.<br /> <br /> == View ==<br /> <br /> == Visualisation ==</div>
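The way Periods breaks a date range into sub-ranges, as described above, can be sketched as follows. This is a Python illustration rather than the Perl of EPStats. The function names weeks and days mirror the documented ones, but the bodies are an assumed reading of the description (the last week is cut short at the end date), not the actual code.

```python
from datetime import date, timedelta


def weeks(start, end):
    """Return 7-day periods covering start..end; the last may be shorter."""
    periods = []
    current = start
    while current <= end:
        # A full week is the start day plus six more, capped at the end date.
        period_end = min(current + timedelta(days=6), end)
        periods.append({"start_date": current, "end_date": period_end})
        current = period_end + timedelta(days=1)
    return periods


def days(start, end):
    """Return one period per day; start_date and end_date are the same."""
    count = (end - start).days + 1
    return [{"start_date": start + timedelta(days=n),
             "end_date": start + timedelta(days=n)}
            for n in range(max(count, 0))]


if __name__ == "__main__":
    for period in weeks(date(2007, 3, 1), date(2007, 3, 17)):
        print(period["start_date"], period["end_date"])
```

Each returned hash could then be passed to params->mask() in turn to fetch the stats for that sub-range, with unmask restoring the original date range afterwards.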

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats_Technical_Documentation&diff=4200 IRStats Technical Documentation 2007-03-29T20:10:03Z

<p>Gobfrey: /* UserInterface::Controls */</p> <hr /> <div>This document is intended as guidance for the last stage of development of EPStats.<br /> <br /> = Directory Structure =<br /> <br /> == /opt/epstats ==<br /> Contains data files for GeoIP. If I had had root access, I would have put them in the correct place. They are linked to from the correct place. These need regular updating, something which hasn't been implemented.<br /> <br /> == /opt/epstats/bin ==<br /> Contains the scripts needed to update the table.<br /> <br /> *daily_update.sh - Runs all the scripts in the right order.<br /> *extract_metadata_from_archive.pl - Extracts eprint, author and group metadata from the repository by iterating over every eprint.<br /> *update_table.pl - Filters and processes new entries in the accesslog to update the epstats_true_acesses_table. Uses 'SearchParser.pm' and 'repeatscache'.<br /> *convert_ip_to_host.pl - Attempts to convert IP addresses of the new entries in epstats_true_acesses_table to hostnames. Uses 'host_updated' to keep track of where it got to last time.<br /> <br /> Note that most of these scripts probably need to be tidied up. They were written in a hurry and were never polished.<br /> <br /> == /opt/epstats/cache ==<br /> Contains cache files. Feel free to delete these whenever you like. <br /> <br /> == /opt/epstats/cgi ==<br /> <br /> Contains two scripts, 'get_view' and 'stats'.<br /> <br /> *get_view returns the output of an EPStats::View (see below), which is currently a chunk of HTML or CSV, but could be almost anything.<br /> *stats is a handy CGI form that passes arguments to get_view.<br /> <br /> == /opt/epstats/img ==<br /> <br /> Conceptually, where any images would be kept (e.g. national flags). At the moment, only the img/graphs directory is used. 
This is where generated graphs are stored.<br /> <br /> == /opt/epstats/perl_lib ==<br /> <br /> Contains all the EPStats classes.<br /> <br /> = EPStats Classes =<br /> <br /> Note that the leading EPStats:: has been left out for brevity.<br /> <br /> == Params ==<br /> This object holds the parameters that are used to generate the statistics. The most important of these are a date range and an eprint set.<br /> === Configuration Constants ===<br /> *$cgi_script - the name of the CGI script (currently unused)<br /> *$id_params - The parameters that matter when generating an ID.<br /> *$defaults - Any default parameters you wish to set.<br /> <br /> === Functions ===<br /> *new(CGI_object) - returns a new object.<br /> *mask(params_hash) - used when you want to temporarily overwrite parameter(s). Overwrites values with the contents of params_hash. Overwritten values get pushed onto a stack.<br /> *unmask - Sets parameters back to how they were before the last mask.<br /> *generate_cgi - returns a string containing the name of the CGI script, and all parameters, to enable the creation of links. (currently unused)<br /> *get(param_name) - returns the value of a single parameter.<br /> *create_id - Uses MD5 to create a unique ID from the id_params (see Constants above). This is called whenever get('id') is called.<br /> <br /> == DatabaseInterface ==<br /> This object does what it says on the tin. Any access to the database is done through it.<br /> <br /> <strong>IMPORTANT</strong> - the generated MySQL was developed on a machine running MySQL 5. Installing on the EPrints server has broken this (as it's running MySQL 4). I placed a quick and dirty hack into the do_sql function, and modified the create_top_table function. I have no idea if this works well. 
<strong>IT NEEDS TO BE CHECKED</strong>.<br /> <br /> === Configuration Constants ===<br /> Constants are contained in the new function.<br /> <br /> *DBI Configuration Constants - $driver, $server, $database, $user, $password are all used to create the connection to the database.<br /> *source_table - the table in which the stats are stored.<br /> <br /> === Functions ===<br /> *new() - returns the object.<br /> *retreive_set_names() - returns a list of eprint sets. Currently 'group' and 'author' are implemented. This is used to verify CGI input.<br /> *get_membership(eprint_id, set_name) - for a given eprint ID, returns the members of the named set that it belongs to. For example, we can find out which authors eprint 12614 has with get_membership(12614, 'author').<br /> *get_citation(id, set, length) - returns a citation. Every set member (eprint, author, group) has two citations: short and full. We only return the short citation if length == 'short'. So, to get the short citation of group 3: get_citation(3,'group','short').<br /> *get_code($id,$set) - UNWRITTEN - Set members have codes. This is how they are identified by the user. For example, author_lac is the member of the author set whose code is lac. To get the code for group 3: get_code(3,'group').<br /> <br /> When retrieving statistics, EPStats filters by inner joining the epstats_true_accesses_table to other tables containing eprint IDs. Sometimes it has to create these tables.<br /> <br /> *create_top_table(param_object) - creates a table containing the eprint IDs of the top X by fulltext download between two dates.<br /> *create_list_table(table_name, eprint_ids) - takes two strings, one the name of the table, the other a space-separated list of eprint IDs. Creates a temporary table.<br /> <br /> The following are the only two functions that actually make calls to the database. 
<br /> <br /> *do_sql(sql_query_string) - takes a string and performs a query, returning the DBI object containing the results.<br /> *insert_values(table_name, values) - inserts a row of data into a table.<br /> <br /> And finally, the meat and potatoes. The functions that return the statistics we're interested in.<br /> <br /> *get_stats(params_object, column_name_list, options_hash) - returns a DBI object containing the stats we are interested in, i.e. those matching the params_object's date range and eprint sets, restricted to the columns in column_name_list. The options hash can contain the following key/value pairs:<br /> **order => column_name - the column on which to order the results. Append '-' or ' DESC' to order descending.<br /> **limit => int - how many results to return.<br /> **group_by => column_name - if we need to group by a column.<br /> get_stats works by examining the 'eprints' parameter and calling one of the following functions:<br /> **get_list_stats<br /> **get_top_stats<br /> **get_set_stats<br /> **get_all_stats<br /> These functions generate slightly different MySQL queries, and pass them to the do_sql function.<br /> <br /> == Date ==<br /> I implemented a date object because there were some specific things I needed to do with dates.<br /> <br /> ===Functions===<br /> *new(date_hash) - creates a new date object when passed a hash with the keys 'day', 'month' and 'year'.<br /> *validate() - if the date is not valid, it will be modified to a sensible value, e.g. if it's Feb 30th, it will be modified to Feb 29th or 28th, depending on whether it's a leap year.<br /> *set(part_name, int) - sets part of the date ('year', 'month' or 'day') to a specific value.<br /> *decrement(period) - decrements the date by a period ('day', 'week', 'month', 'quarter', 'year'). Calls the mod_date function, which does the muscle work.<br /> *increment(period) - increments the date by a period, also by calling mod_date.<br /> *part(part_name, style) - Returns the day, month or year. 
For month, if style=='text', returns a three-letter string, otherwise returns an integer. For year, if style=='short', returns the last two digits, otherwise returns all four.<br /> *less_than(date_object) - compares itself to another date object. Returns 1 if it's less than it, otherwise returns 0.<br /> *greater_than(date_object) - compares itself to another date object. Returns 1 if it's greater than it, otherwise returns 0.<br /> *month_name() - returns the three-letter string of the month.<br /> *render(format_string) - returns a date string. Format can be:<br /> **'short' - calls render_abbreviated - returns a date like this: 05-Jul-77<br /> **'long' - calls render_full (not implemented).<br /> **'numerical' (default) - calls render_numerical - returns a date like this: 19770705<br /> *clone - returns a new, identical date object.<br /> <br /> <br /> == Cache ==<br /> The interface to the cache.<br /> <br /> === Configuration Constants ===<br /> *$cache_directory - a string containing a path to the directory in which the cache files are located.<br /> <br /> === Functions ===<br /> *new(id) - takes the ID of the params object we're using at the moment.<br /> *exists() - returns true if there's a cached file, false if there isn't one.<br /> *write(visualisation_object) - writes the data to the cache file.<br /> *read() - returns the data from the cache.<br /> <br /> == Periods ==<br /> The Periods object is used when you want to break a date range down into sub-ranges. Used with the params->mask() function, it lets stats be retrieved for periods inside a date range.<br /> ===Functions===<br /> *new(start_date_obj, end_date_obj) - doesn't do anything, just returns the object.<br /> <br /> The following functions all return an array of hashes. 
Each hash has the keys 'start_date' and 'end_date', and the values are both EPStats::Date objects.<br /> <br /> *calandar_months - returns full months (each element starts on the 1st, and ends on the last day).<br /> *months - returns month periods (if the start_date is the 15th, then each period starts on the 15th and ends on the 14th of the next month, except the last period, which only has about a 1/30 chance of doing so).<br /> *weeks - returns 7-day periods (except the last, which has a 1/7 chance of being 7 days long).<br /> *days - returns single days (for each period, the start_date and end_date are the same).<br /> <br /> == UserInterface::Controls ==<br /> This is used to generate the drop-down boxes in the stats CGI script. If I had more time I'd document it fully, but my daughter's going to be born in less than 12 hours.<br /> <br /> == Page (deprecated) ==<br /> Harkens back to the day when a page object contained views.<br /> <br /> == View ==<br /> <br /> == Visualisation ==</div>
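The mask/unmask behaviour described under Params (overwritten values are pushed onto a stack and restored by the next unmask) can be sketched roughly as below. This is only an illustration: the package name SketchParams and its internal layout are hypothetical stand-ins, not the EPStats::Params source, and the constructor here takes a plain hash rather than a CGI object.

```perl
use strict;
use warnings;

# Hypothetical stand-in for EPStats::Params; not the original source.
package SketchParams;

sub new {
    my ($class, %initial) = @_;
    # 'values' holds the live parameters, 'stack' the values saved by mask().
    my $self = { values => {%initial}, stack => [] };
    return bless $self, $class;
}

sub get {
    my ($self, $name) = @_;
    return $self->{values}{$name};
}

sub mask {
    my ($self, %overrides) = @_;
    my %saved;
    for my $name (keys %overrides) {
        # Remember whether the key existed so unmask() can remove new keys.
        $saved{$name} = exists $self->{values}{$name}
            ? { had => 1, value => $self->{values}{$name} }
            : { had => 0 };
        $self->{values}{$name} = $overrides{$name};
    }
    push @{ $self->{stack} }, \%saved;
    return;
}

sub unmask {
    my ($self) = @_;
    # Nothing to do if mask() has not been called.
    my $saved = pop @{ $self->{stack} } or return;
    for my $name (keys %$saved) {
        if ($saved->{$name}{had}) {
            $self->{values}{$name} = $saved->{$name}{value};
        }
        else {
            delete $self->{values}{$name};
        }
    }
    return;
}

package main;

my $params = SketchParams->new(start_date => '20070101', end_date => '20070331');
$params->mask(start_date => '20070201', end_date => '20070228');
print $params->get('start_date'), "\n";   # prints 20070201 (masked)
$params->unmask;
print $params->get('start_date'), "\n";   # prints 20070101 (restored)
```

Used this way, a caller could mask the date range with each period returned by a Periods function, fetch the stats, and unmask before moving on to the next period.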

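The validate() rule described under Date (Feb 30th becomes Feb 29th or 28th depending on the leap year) amounts to clamping the day to the length of the month. A minimal sketch follows, assuming the usual Gregorian leap-year rule. The function names is_leap_year, days_in_month and clamp_date are hypothetical, and clamping out-of-range months is an extra assumption that the text above does not state.

```perl
use strict;
use warnings;

# Gregorian leap-year rule: divisible by 4, except centuries not divisible by 400.
sub is_leap_year {
    my ($year) = @_;
    return ( ( $year % 4 == 0 && $year % 100 != 0 ) || $year % 400 == 0 ) ? 1 : 0;
}

sub days_in_month {
    my ($year, $month) = @_;
    my @days = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
    return 29 if $month == 2 && is_leap_year($year);
    return $days[ $month - 1 ];
}

# Hypothetical equivalent of validate(): pull out-of-range parts back into range.
sub clamp_date {
    my (%date) = @_;
    $date{month} = 1  if $date{month} < 1;     # assumption: months are clamped too
    $date{month} = 12 if $date{month} > 12;
    my $last = days_in_month( $date{year}, $date{month} );
    $date{day} = 1     if $date{day} < 1;
    $date{day} = $last if $date{day} > $last;
    return %date;
}

my %fixed = clamp_date( day => 30, month => 2, year => 2007 );
# Printed in the same layout as the 'numerical' render format.
printf "%04d%02d%02d\n", @fixed{qw(year month day)};   # prints 20070228
```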
Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats_Technical_Documentation&diff=4199 IRStats Technical Documentation 2007-03-29T20:08:57Z


Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats_Technical_Documentation&diff=4198 IRStats Technical Documentation 2007-03-29T20:00:41Z


Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats_Technical_Documentation&diff=4197 IRStats Technical Documentation 2007-03-29T20:00:26Z

<p>Gobfrey: /* Cache */</p> <hr /> <div>This document is intended as guidance to the last stage of development of EPstats.<br /> <br /> = Directory Structure =<br /> <br /> == /opt/epstats ==<br /> Contains data files for GeoIP. If I had had root access, I would have put them in the correct place. They are linked to from the correct place. These need regular updating, something which hasn't been implemented.<br /> <br /> == /opt/epstats/bin ==<br /> Contains the scripts needed to update the table.<br /> <br /> *daily_update.sh - Runs all the scripts in the right order.<br /> *extract_metadata_from_archive.pl - Extracts eprint, author and group metadata from the repository by iterating over every eprint.<br /> *update_table.pl - Filters and processes new entries in the accesslog to update the epstats_true_acesses_table. Uses 'SearchParser.pm' and 'repeatscache'.<br /> * convert_ip_to_host.pl - Attempts to convert ip addresses of the new entries in epstats_true_acesses_table to hostnames. Uses 'host_updated' to keep track of where it got to last time.<br /> <br /> Note that most of these scripts probably need to be tidied up. They were written in a hurry and were never polished.<br /> <br /> == /opt/epstats/cache ==<br /> Contains cache files. Feel free to delete these whenever you like. <br /> <br /> == /opt/epstats/cgi ==<br /> <br /> Contains two scripts, 'get_view' and 'stats'.<br /> <br /> *get_view returns the output of a EPstats::View (see below), which is currently a chunk of html or csv, but could be almost anything.<br /> *stats is a handy cgi form that passes arguements to get_view<br /> <br /> == /opt/epstats/img ==<br /> <br /> Conceptually, where any images would be kept (e.g. national flags). At the moment, only the img/graphs directory is used. 
This is where generated graphs are stored.<br /> <br /> == /opt/epstats/perl_lib ==<br /> <br /> Contains all the epstats classes.<br /> <br /> = EPStats Classes =<br /> <br /> Note that the leading EPStats:: has been left out for brevity.<br /> <br /> == Params ==<br /> This object holds the parameters that are used to generate the statistics. The most imortant of these are a date range and an eprint set.<br /> === Configuration Constants ===<br /> *$cgi_script - the name of the cgi script (currently unused)<br /> *$id_params - When generating an ID, which parameters are important.<br /> *$defaults - Any default parameters you wish to set.<br /> <br /> === Functions ===<br /> *new(CGI_object) - returns new object<br /> *mask(params_hash) - used when you want to temporarily overwrite parameter(s). Overwrites values with contents of params_hash. Overwritten values get pushed onto a stack.<br /> *unmask - Sets parameters back to how they were before the last mask.<br /> *generate_cgi - returns a string containing the name of the cgi script, and all parameters, to enable the creation of links. (currently unused)<br /> *get(param_name) - returns the value of a single parameter.<br /> *create_id - Uses MD5 to create a unique ID from the id_params (see Constants above). This is called whenever get('id') is called.<br /> <br /> == DatabaseInterface ==<br /> This object does what it says on the tin. Any access to the database is done though it.<br /> <br /> <strong>IMPORTANT</strong> - the mysql generated has been developed on a machine running mysql 5. Installing on the EPrints server has broken this (as it's running mysql 4). I placed a quick and dirty hack into the do_sql function, and modified the create_top_table function. I have no idea if this works well. 
<strong>IT NEEDS TO BE CHECKED</strong>.<br /> <br /> === Configuration Constants ===<br /> Constants are contained in the new function.<br /> <br /> *DBI Configuration Constants - $driver, $server, $database, $user, $password are all used to create the connection to the database.<br /> *source_table - The table in which the stats are stored.<br /> <br /> === Functions ===<br /> *new() - returns object.<br /> *retreive_set_names() - returns a list of eprint sets. Currently 'group' and 'author' are implemented. This is used to verify cgi input.<br /> *get_membership(eprint_id, set_name) - For a given eprint ID, which of a named set does it belong to. For example, we can find out which authors eprint 12614 has by get_membership(12614, 'author').<br /> *get_citation(id, set, length) - returns a citation. Every set member (eprint, author, group) has two citations. short and full. We only return a short citation if length == 'short'. So, to get the short citation of a group 3: get_citation(3,'group','short').<br /> *get_code($id,$set) - UNWRITTEN - Set member have codes. This how they are identified by the user. For example author_lac is the member of the author set whose code is lac. To get the code for group 3: get_code(3,'group').<br /> <br /> When retreiving statistics, EPStats filters by inner joining the epstats_true_accesses_table to other tables contining eprint IDs. Sometimes it has to create these tables.<br /> <br /> *create_top_table(param_object) - This creates a table containing the eprint IDs of the top X by fulltext download between two dates.<br /> *create_list_table(table_name, eprint_ids) - Takes two strings, one the name of the table, the other a space seperated list of eprint IDs. Creates a temporary table.<br /> <br /> The following are the only two functions that actually make calls to the database. 
<br /> <br /> *do_sql(sql_query_string) - takes a string and performs a query, returning the dbi object containing the results.<br /> *insert_values(table_name, values) - inserts a row of data into a table.<br /> <br /> And finally, the meat and potatoes. The functions that return the statistics we're interested in.<br /> <br /> *get_stats(params_object, column_name_list, options_hash) - returns a dbi object containing the stats we are interested. i.e. the params_object's date range and eprints sets, and only the columns in column_list. The options hash can contain the following key/value pairs<br /> **order => column_name - the column on which to order it. append with '-' or ' DESC' to order it descending.<br /> **limit => int - How many results to return<br /> **group_by => column_name - if we need to group by a column.<br /> get_stats works by examining the 'eprints' parameter and calling one of the following functions:<br /> **get_list_stats<br /> **get_top_stats<br /> **get_set_stats<br /> **get_all_stats<br /> These functions generate slightly different mysql queries, and pass them to the do_sql function.<br /> <br /> == Date ==<br /> I implemented a date object because there were some specific things I needed to do with dates.<br /> <br /> ===Functions===<br /> *new(date_hash) - Creates a new date object when passed a hash with the keys 'day', 'month' and 'year'.<br /> *validate() - If the date is not valid, it will be modified to a sensible value. E.G. if it's Feb 30th, it will be modified to Feb 29th or 28th, dependant on if it's a leap year.<br /> *set(part_name, int) - Sets part of the date ('year','month' or 'day') to a specific value.<br /> *decrement(period) - increments the date by a period ('day', 'week', 'month', 'quarter', 'year'). Calls the mod_date function, which does the muscle work.<br /> *increment(period) - decrements by calling mod_date.<br /> *part(part_name, style - Returns the day, month or year. 
For month, if style=='text', returns a three-letter string, otherwise returns an integer. For year, if style=='short', returns the last two digits, otherwise returns all four.<br /> *less_than(date_object) - compares itself to another date object. Returns 1 if it's less than it, otherwise returns 0.<br /> *greater_than(date_object) - compares itself to another date object. Returns 1 if it's greater than it, otherwise returns 0.<br /> *month_name() - returns the three-letter string of the month.<br /> *render(format_string) - returns a date string. Format can be:<br /> **'short' - Calls render_abbreviated - returns a date like this: 05-Jul-77<br /> **'long' - Calls render_full (not implemented).<br /> **'numerical' (default) - Calls render_numerical - returns a date like this: 19770705<br /> *clone - returns a new, identical date object.<br /> <br /> <br /> == Cache ==<br /> The interface to the cache.<br /> <br /> === Configuration Constants ===<br /> *$cache_directory - a string containing a path to the directory in which the cache files are located.<br /> <br /> === Functions ===<br /> *new(id) - takes the ID of the params object we're using at the moment.<br /> *exists() - returns true if there's a cached file, false if there isn't one.<br /> *write(visualisation_object) - writes the data to the cache file.<br /> *read() - returns the data from the cache.<br /> <br /> == Periods ==<br /> <br /> == UserInterface::Controls ==<br /> <br /> == Page (deprecated) ==<br /> Harks back to the days when a page object contained views.<br /> <br /> == View ==<br /> <br /> == Visualisation ==</div>
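The behaviour described for validate() and the 'numerical' render style can be sketched as follows. This is only an illustration under stated assumptions, not the EPStats Date source: the helper names days_in_month, validate_date and render_numerical are hypothetical, and the date is a plain hash reference rather than an object.

```perl
use strict;
use warnings;

# Hypothetical helper (not from EPStats): number of days in a given month.
sub days_in_month {
    my ($year, $month) = @_;
    my @days = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
    my $is_leap = ($year % 4 == 0 && $year % 100 != 0) || $year % 400 == 0;
    return 29 if $month == 2 && $is_leap;
    return $days[$month - 1];
}

# Clamp an out-of-range day to a sensible value, in the spirit of validate():
# Feb 30th becomes Feb 29th in a leap year and Feb 28th otherwise.
sub validate_date {
    my ($date) = @_;    # hash ref with the keys 'day', 'month' and 'year'
    my $max = days_in_month($date->{year}, $date->{month});
    $date->{day} = $max if $date->{day} > $max;
    $date->{day} = 1    if $date->{day} < 1;
    return $date;
}

# Render in the 'numerical' style described above, e.g. 19770705.
sub render_numerical {
    my ($date) = @_;
    return sprintf('%04d%02d%02d', $date->{year}, $date->{month}, $date->{day});
}

my $date = validate_date({ day => 30, month => 2, year => 2007 });
print render_numerical($date), "\n";    # prints 20070228
```

Because the numerical form sorts chronologically, comparisons such as less_than could plausibly be built on it, although the original does not say how they are implemented.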

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats_Technical_Documentation&diff=4196 IRStats Technical Documentation 2007-03-29T19:55:12Z

<p>Gobfrey: /* Date */</p> <hr /> <div>This document is intended as guidance for the last stage of development of EPStats.<br /> <br /> = Directory Structure =<br /> <br /> == /opt/epstats ==<br /> Contains data files for GeoIP. If I had had root access, I would have put them in the correct place; instead, they are linked to from the correct place. These need regular updating, which hasn't been implemented.<br /> <br /> == /opt/epstats/bin ==<br /> Contains the scripts needed to update the table.<br /> <br /> *daily_update.sh - Runs all the scripts in the right order.<br /> *extract_metadata_from_archive.pl - Extracts eprint, author and group metadata from the repository by iterating over every eprint.<br /> *update_table.pl - Filters and processes new entries in the accesslog to update the epstats_true_acesses_table. Uses 'SearchParser.pm' and 'repeatscache'.<br /> *convert_ip_to_host.pl - Attempts to convert the IP addresses of the new entries in epstats_true_acesses_table to hostnames. Uses 'host_updated' to keep track of where it got to last time.<br /> <br /> Note that most of these scripts probably need to be tidied up. They were written in a hurry and were never polished.<br /> <br /> == /opt/epstats/cache ==<br /> Contains cache files. Feel free to delete these whenever you like. <br /> <br /> == /opt/epstats/cgi ==<br /> <br /> Contains two scripts, 'get_view' and 'stats'.<br /> <br /> *get_view returns the output of an EPStats::View (see below), which is currently a chunk of HTML or CSV, but could be almost anything.<br /> *stats is a handy CGI form that passes arguments to get_view.<br /> <br /> == /opt/epstats/img ==<br /> <br /> Conceptually, where any images would be kept (e.g. national flags). At the moment, only the img/graphs directory is used. 
This is where generated graphs are stored.<br /> <br /> == /opt/epstats/perl_lib ==<br /> <br /> Contains all the EPStats classes.<br /> <br /> = EPStats Classes =<br /> <br /> Note that the leading EPStats:: has been left out for brevity.<br /> <br /> == Params ==<br /> This object holds the parameters that are used to generate the statistics. The most important of these are a date range and an eprint set.<br /> === Configuration Constants ===<br /> *$cgi_script - the name of the CGI script (currently unused).<br /> *$id_params - the parameters that matter when generating an ID.<br /> *$defaults - any default parameters you wish to set.<br /> <br /> === Functions ===<br /> *new(CGI_object) - returns a new object.<br /> *mask(params_hash) - used when you want to temporarily overwrite one or more parameters. Overwrites values with the contents of params_hash. Overwritten values get pushed onto a stack.<br /> *unmask - sets parameters back to how they were before the last mask.<br /> *generate_cgi - returns a string containing the name of the CGI script and all parameters, to enable the creation of links (currently unused).<br /> *get(param_name) - returns the value of a single parameter.<br /> *create_id - uses MD5 to create a unique ID from the id_params (see Configuration Constants above). This is called whenever get('id') is called.<br /> <br /> == DatabaseInterface ==<br /> This object does what it says on the tin. Any access to the database is done through it.<br /> <br /> <strong>IMPORTANT</strong> - the generated SQL was developed on a machine running MySQL 5. Installing on the EPrints server has broken this (as it's running MySQL 4). I placed a quick and dirty hack into the do_sql function, and modified the create_top_table function. I have no idea if this works well. 
<strong>IT NEEDS TO BE CHECKED</strong>.<br /> <br /> === Configuration Constants ===<br /> Constants are defined in the new() function.<br /> <br /> *DBI Configuration Constants - $driver, $server, $database, $user, $password are all used to create the connection to the database.<br /> *source_table - The table in which the stats are stored.<br /> <br /> === Functions ===<br /> *new() - returns a new object.<br /> *retreive_set_names() - returns a list of eprint sets. Currently 'group' and 'author' are implemented. This is used to verify CGI input.<br /> *get_membership(eprint_id, set_name) - For a given eprint ID, returns the members of the named set that it belongs to. For example, get_membership(12614, 'author') returns the authors of eprint 12614.<br /> *get_citation(id, set, length) - returns a citation. Every set member (eprint, author, group) has two citations: short and full. The short citation is returned only if length == 'short'. So, to get the short citation of group 3: get_citation(3,'group','short').<br /> *get_code($id,$set) - UNWRITTEN - Set members have codes. This is how they are identified by the user. For example, author_lac is the member of the author set whose code is lac. To get the code for group 3: get_code(3,'group').<br /> <br /> When retrieving statistics, EPStats filters by inner joining the epstats_true_accesses_table to other tables containing eprint IDs. Sometimes it has to create these tables.<br /> <br /> *create_top_table(param_object) - Creates a table containing the eprint IDs of the top X eprints by fulltext downloads between two dates.<br /> *create_list_table(table_name, eprint_ids) - Takes two strings: the name of the table and a space-separated list of eprint IDs. Creates a temporary table.<br /> <br /> The following are the only two functions that actually make calls to the database. 
<br /> <br /> *do_sql(sql_query_string) - takes a string and performs a query, returning the DBI object containing the results.<br /> *insert_values(table_name, values) - inserts a row of data into a table.<br /> <br /> And finally, the meat and potatoes: the functions that return the statistics we're interested in.<br /> <br /> *get_stats(params_object, column_name_list, options_hash) - returns a DBI object containing the stats we are interested in, i.e. those matching the params_object's date range and eprint sets, and only the columns in column_name_list. The options hash can contain the following key/value pairs:<br /> **order => column_name - the column on which to order the results. Append '-' or ' DESC' to order descending.<br /> **limit => int - how many results to return.<br /> **group_by => column_name - the column to group by, if grouping is needed.<br /> get_stats works by examining the 'eprints' parameter and calling one of the following functions:<br /> **get_list_stats<br /> **get_top_stats<br /> **get_set_stats<br /> **get_all_stats<br /> These functions generate slightly different MySQL queries, and pass them to the do_sql function.<br /> <br /> == Date ==<br /> I implemented a date object because there were some specific things I needed to do with dates.<br /> <br /> ===Functions===<br /> *new(date_hash) - Creates a new date object when passed a hash with the keys 'day', 'month' and 'year'.<br /> *validate() - If the date is not valid, it will be modified to a sensible value. E.g. if it's Feb 30th, it will be modified to Feb 29th or 28th, depending on whether it's a leap year.<br /> *set(part_name, int) - Sets part of the date ('year','month' or 'day') to a specific value.<br /> *decrement(period) - decrements the date by a period ('day', 'week', 'month', 'quarter', 'year'). Calls the mod_date function, which does the muscle work.<br /> *increment(period) - increments the date by a period, also by calling mod_date.<br /> *part(part_name, style) - Returns the day, month or year. 
For month, if style=='text', returns a three-letter string, otherwise returns an integer. For year, if style=='short', returns the last two digits, otherwise returns all four.<br /> *less_than(date_object) - compares itself to another date object. Returns 1 if it's less than it, otherwise returns 0.<br /> *greater_than(date_object) - compares itself to another date object. Returns 1 if it's greater than it, otherwise returns 0.<br /> *month_name() - returns the three-letter string of the month.<br /> *render(format_string) - returns a date string. Format can be:<br /> **'short' - Calls render_abbreviated - returns a date like this: 05-Jul-77<br /> **'long' - Calls render_full (not implemented).<br /> **'numerical' (default) - Calls render_numerical - returns a date like this: 19770705<br /> *clone - returns a new, identical date object.<br /> <br /> <br /> == Cache ==<br /> <br /> <br /> == Periods ==<br /> <br /> == UserInterface::Controls ==<br /> <br /> == Page (deprecated) ==<br /> Harks back to the days when a page object contained views.<br /> <br /> == View ==<br /> <br /> == Visualisation ==</div>
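The mask/unmask stack and the MD5-based ID described for the Params class can be illustrated with a small sketch. It is written under assumptions rather than taken from the EPStats source: the package name Sketch::Params, the constructor taking a plain hash instead of a CGI object, the parameter names start_date and the value 'group_3', and the way create_id joins its inputs are all hypothetical. Digest::MD5 is a core Perl module.

```perl
use strict;
use warnings;

package Sketch::Params;    # hypothetical stand-in, not EPStats::Params
use Digest::MD5 qw(md5_hex);

sub new {
    my ($class, %params) = @_;
    return bless { params => {%params}, stack => [] }, $class;
}

sub get {
    my ($self, $name) = @_;
    return $self->{params}{$name};
}

# Temporarily overwrite parameters; the old values are pushed onto a stack.
sub mask {
    my ($self, %overrides) = @_;
    my %saved;
    for my $name (keys %overrides) {
        $saved{$name} = $self->{params}{$name};
        $self->{params}{$name} = $overrides{$name};
    }
    push @{ $self->{stack} }, \%saved;
    return;
}

# Restore the values that were in place before the most recent mask.
sub unmask {
    my ($self) = @_;
    my $saved = pop @{ $self->{stack} } or return;
    for my $name (keys %$saved) {
        if (defined $saved->{$name}) {
            $self->{params}{$name} = $saved->{$name};
        }
        else {
            delete $self->{params}{$name};
        }
    }
    return;
}

# Hash only the parameters named as significant (the role of $id_params).
sub create_id {
    my ($self, @id_params) = @_;
    return md5_hex(join '|', map { "$_=" . ($self->{params}{$_} // '') } sort @id_params);
}

package main;

my $params = Sketch::Params->new(eprints => 'author_lac', start_date => '20070101');
$params->mask(eprints => 'group_3');
print $params->get('eprints'), "\n";    # prints group_3
$params->unmask;
print $params->get('eprints'), "\n";    # prints author_lac
print $params->create_id('eprints', 'start_date'), "\n";
```

Hashing only the significant parameters means two requests that differ in an irrelevant parameter would share an ID, which fits the ID's use as a cache key in the Cache class.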

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats_Technical_Documentation&diff=4195 IRStats Technical Documentation 2007-03-29T19:52:33Z

<p>Gobfrey: /* Date */</p> <hr /> <div>This document is intended as guidance for the last stage of development of EPStats.<br /> <br /> = Directory Structure =<br /> <br /> == /opt/epstats ==<br /> Contains data files for GeoIP. If I had had root access, I would have put them in the correct place; instead, they are linked to from the correct place. These need regular updating, which hasn't been implemented.<br /> <br /> == /opt/epstats/bin ==<br /> Contains the scripts needed to update the table.<br /> <br /> *daily_update.sh - Runs all the scripts in the right order.<br /> *extract_metadata_from_archive.pl - Extracts eprint, author and group metadata from the repository by iterating over every eprint.<br /> *update_table.pl - Filters and processes new entries in the accesslog to update the epstats_true_acesses_table. Uses 'SearchParser.pm' and 'repeatscache'.<br /> *convert_ip_to_host.pl - Attempts to convert the IP addresses of the new entries in epstats_true_acesses_table to hostnames. Uses 'host_updated' to keep track of where it got to last time.<br /> <br /> Note that most of these scripts probably need to be tidied up. They were written in a hurry and were never polished.<br /> <br /> == /opt/epstats/cache ==<br /> Contains cache files. Feel free to delete these whenever you like. <br /> <br /> == /opt/epstats/cgi ==<br /> <br /> Contains two scripts, 'get_view' and 'stats'.<br /> <br /> *get_view returns the output of an EPStats::View (see below), which is currently a chunk of HTML or CSV, but could be almost anything.<br /> *stats is a handy CGI form that passes arguments to get_view.<br /> <br /> == /opt/epstats/img ==<br /> <br /> Conceptually, where any images would be kept (e.g. national flags). At the moment, only the img/graphs directory is used. 
This is where generated graphs are stored.<br /> <br /> == /opt/epstats/perl_lib ==<br /> <br /> Contains all the EPStats classes.<br /> <br /> = EPStats Classes =<br /> <br /> Note that the leading EPStats:: has been left out for brevity.<br /> <br /> == Params ==<br /> This object holds the parameters that are used to generate the statistics. The most important of these are a date range and an eprint set.<br /> === Configuration Constants ===<br /> *$cgi_script - the name of the CGI script (currently unused).<br /> *$id_params - the parameters that matter when generating an ID.<br /> *$defaults - any default parameters you wish to set.<br /> <br /> === Functions ===<br /> *new(CGI_object) - returns a new object.<br /> *mask(params_hash) - used when you want to temporarily overwrite one or more parameters. Overwrites values with the contents of params_hash. Overwritten values get pushed onto a stack.<br /> *unmask - sets parameters back to how they were before the last mask.<br /> *generate_cgi - returns a string containing the name of the CGI script and all parameters, to enable the creation of links (currently unused).<br /> *get(param_name) - returns the value of a single parameter.<br /> *create_id - uses MD5 to create a unique ID from the id_params (see Configuration Constants above). This is called whenever get('id') is called.<br /> <br /> == DatabaseInterface ==<br /> This object does what it says on the tin. Any access to the database is done through it.<br /> <br /> <strong>IMPORTANT</strong> - the generated SQL was developed on a machine running MySQL 5. Installing on the EPrints server has broken this (as it's running MySQL 4). I placed a quick and dirty hack into the do_sql function, and modified the create_top_table function. I have no idea if this works well. 
<strong>IT NEEDS TO BE CHECKED</strong>.<br /> <br /> === Configuration Constants ===<br /> Constants are defined in the new() function.<br /> <br /> *DBI Configuration Constants - $driver, $server, $database, $user, $password are all used to create the connection to the database.<br /> *source_table - The table in which the stats are stored.<br /> <br /> === Functions ===<br /> *new() - returns a new object.<br /> *retreive_set_names() - returns a list of eprint sets. Currently 'group' and 'author' are implemented. This is used to verify CGI input.<br /> *get_membership(eprint_id, set_name) - For a given eprint ID, returns the members of the named set that it belongs to. For example, get_membership(12614, 'author') returns the authors of eprint 12614.<br /> *get_citation(id, set, length) - returns a citation. Every set member (eprint, author, group) has two citations: short and full. The short citation is returned only if length == 'short'. So, to get the short citation of group 3: get_citation(3,'group','short').<br /> *get_code($id,$set) - UNWRITTEN - Set members have codes. This is how they are identified by the user. For example, author_lac is the member of the author set whose code is lac. To get the code for group 3: get_code(3,'group').<br /> <br /> When retrieving statistics, EPStats filters by inner joining the epstats_true_accesses_table to other tables containing eprint IDs. Sometimes it has to create these tables.<br /> <br /> *create_top_table(param_object) - Creates a table containing the eprint IDs of the top X eprints by fulltext downloads between two dates.<br /> *create_list_table(table_name, eprint_ids) - Takes two strings: the name of the table and a space-separated list of eprint IDs. Creates a temporary table.<br /> <br /> The following are the only two functions that actually make calls to the database. 
<br /> <br /> *do_sql(sql_query_string) - takes a string and performs a query, returning the DBI object containing the results.<br /> *insert_values(table_name, values) - inserts a row of data into a table.<br /> <br /> And finally, the meat and potatoes: the functions that return the statistics we're interested in.<br /> <br /> *get_stats(params_object, column_name_list, options_hash) - returns a DBI object containing the stats we are interested in, i.e. those matching the params_object's date range and eprint sets, and only the columns in column_name_list. The options hash can contain the following key/value pairs:<br /> **order => column_name - the column on which to order the results. Append '-' or ' DESC' to order descending.<br /> **limit => int - how many results to return.<br /> **group_by => column_name - the column to group by, if grouping is needed.<br /> get_stats works by examining the 'eprints' parameter and calling one of the following functions:<br /> **get_list_stats<br /> **get_top_stats<br /> **get_set_stats<br /> **get_all_stats<br /> These functions generate slightly different MySQL queries, and pass them to the do_sql function.<br /> <br /> == Date ==<br /> I implemented a date object because there were some specific things I needed to do with dates.<br /> <br /> ===Functions===<br /> *new(date_hash) - Creates a new date object when passed a hash with the keys 'day', 'month' and 'year'.<br /> *validate() - If the date is not valid, it will be modified to a sensible value. E.g. if it's Feb 30th, it will be modified to Feb 29th or 28th, depending on whether it's a leap year.<br /> *set(part_name, int) - Sets part of the date ('year','month' or 'day') to a specific value.<br /> *decrement(period) - decrements the date by a period ('day', 'week', 'month', 'quarter', 'year'). Calls the mod_date function, which does the muscle work.<br /> *increment(period) - increments the date by a period, also by calling mod_date.<br /> *part(part_name, style) - Returns the day, month or year. 
For month, if style=='text', returns a three-letter string, otherwise returns an integer. For year, if style=='short', returns the last two digits, otherwise returns all four.<br /> *less_than(date_object) - compares itself to another date object. Returns 1 if it's less than it, otherwise returns 0.<br /> *greater_than(date_object) - compares itself to another date object. Returns 1 if it's greater than it, otherwise returns 0.<br /> *month_name() - returns the three-letter string of the month.<br /> *render(format_string) - returns a date string. Format can be:<br /> **'short' - Calls render_abbreviated - returns a date like this: 05-Jul-77<br /> **'long' - Calls render_full (not implemented).<br /> **'numerical' (default) - Calls render_numerical - returns a date like this: 19770705<br /> *clone - returns a new, identical date object.<br /> <br /> <br /> Cache<br /> <br /> <br /> Periods<br /> <br /> UserInterface::Controls<br /> <br /> Page (deprecated)<br /> <br /> <br /> <br /> View<br /> View.pm<br /> Visualisation<br /> Visualisation.pm</div>
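The options hash accepted by get_stats (order, limit, group_by) maps naturally onto the tail of an SQL query. The sketch below shows one way that mapping could look; it is an assumption about the approach, not the actual get_stats code. The function name build_query_suffix and the column names downloads and eprint_id are hypothetical, and the descending marker is read here as a trailing '-' or ' DESC'.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the get_stats options into GROUP BY / ORDER BY / LIMIT.
# Real code should check column names against a whitelist before interpolating them.
sub build_query_suffix {
    my (%options) = @_;
    my $sql = '';

    $sql .= " GROUP BY $options{group_by}" if defined $options{group_by};

    if (defined $options{order}) {
        my $column = $options{order};
        # A trailing '-' or ' DESC' requests descending order.
        my $direction = ($column =~ s/(?:-|\s+DESC)\z//i) ? 'DESC' : 'ASC';
        $sql .= " ORDER BY $column $direction";
    }

    $sql .= ' LIMIT ' . int($options{limit}) if defined $options{limit};
    return $sql;
}

print build_query_suffix(order => 'downloads-', limit => 10, group_by => 'eprint_id'), "\n";
# prints " GROUP BY eprint_id ORDER BY downloads DESC LIMIT 10"
```

Keeping the suffix in one helper would let get_list_stats, get_top_stats, get_set_stats and get_all_stats differ only in the joins they build, which matches the description of them generating slightly different queries.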

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats_Technical_Documentation&diff=4194 IRStats Technical Documentation 2007-03-29T19:45:00Z

<p>Gobfrey: /* Date */</p> <hr /> <div>This document is intended as guidance for the last stage of development of EPStats.<br /> <br /> = Directory Structure =<br /> <br /> == /opt/epstats ==<br /> Contains data files for GeoIP. If I had had root access, I would have put them in the correct place; instead, they are linked to from the correct place. These need regular updating, which hasn't been implemented.<br /> <br /> == /opt/epstats/bin ==<br /> Contains the scripts needed to update the table.<br /> <br /> *daily_update.sh - Runs all the scripts in the right order.<br /> *extract_metadata_from_archive.pl - Extracts eprint, author and group metadata from the repository by iterating over every eprint.<br /> *update_table.pl - Filters and processes new entries in the accesslog to update the epstats_true_acesses_table. Uses 'SearchParser.pm' and 'repeatscache'.<br /> *convert_ip_to_host.pl - Attempts to convert the IP addresses of the new entries in epstats_true_acesses_table to hostnames. Uses 'host_updated' to keep track of where it got to last time.<br /> <br /> Note that most of these scripts probably need to be tidied up. They were written in a hurry and were never polished.<br /> <br /> == /opt/epstats/cache ==<br /> Contains cache files. Feel free to delete these whenever you like. <br /> <br /> == /opt/epstats/cgi ==<br /> <br /> Contains two scripts, 'get_view' and 'stats'.<br /> <br /> *get_view returns the output of an EPStats::View (see below), which is currently a chunk of HTML or CSV, but could be almost anything.<br /> *stats is a handy CGI form that passes arguments to get_view.<br /> <br /> == /opt/epstats/img ==<br /> <br /> Conceptually, where any images would be kept (e.g. national flags). At the moment, only the img/graphs directory is used. 
This is where generated graphs are stored.<br /> <br /> == /opt/epstats/perl_lib ==<br /> <br /> Contains all the EPStats classes.<br /> <br /> = EPStats Classes =<br /> <br /> Note that the leading EPStats:: has been left out for brevity.<br /> <br /> == Params ==<br /> This object holds the parameters that are used to generate the statistics. The most important of these are a date range and an eprint set.<br /> === Configuration Constants ===<br /> *$cgi_script - the name of the CGI script (currently unused).<br /> *$id_params - the parameters that matter when generating an ID.<br /> *$defaults - any default parameters you wish to set.<br /> <br /> === Functions ===<br /> *new(CGI_object) - returns a new object.<br /> *mask(params_hash) - used when you want to temporarily overwrite one or more parameters. Overwrites values with the contents of params_hash. Overwritten values get pushed onto a stack.<br /> *unmask - sets parameters back to how they were before the last mask.<br /> *generate_cgi - returns a string containing the name of the CGI script and all parameters, to enable the creation of links (currently unused).<br /> *get(param_name) - returns the value of a single parameter.<br /> *create_id - uses MD5 to create a unique ID from the id_params (see Configuration Constants above). This is called whenever get('id') is called.<br /> <br /> == DatabaseInterface ==<br /> This object does what it says on the tin. Any access to the database is done through it.<br /> <br /> <strong>IMPORTANT</strong> - the generated SQL was developed on a machine running MySQL 5. Installing on the EPrints server has broken this (as it's running MySQL 4). I placed a quick and dirty hack into the do_sql function, and modified the create_top_table function. I have no idea if this works well. 
<strong>IT NEEDS TO BE CHECKED</strong>.<br /> <br /> === Configuration Constants ===<br /> Constants are defined in the new() function.<br /> <br /> *DBI Configuration Constants - $driver, $server, $database, $user, $password are all used to create the connection to the database.<br /> *source_table - The table in which the stats are stored.<br /> <br /> === Functions ===<br /> *new() - returns a new object.<br /> *retreive_set_names() - returns a list of eprint sets. Currently 'group' and 'author' are implemented. This is used to verify CGI input.<br /> *get_membership(eprint_id, set_name) - For a given eprint ID, returns the members of the named set that it belongs to. For example, get_membership(12614, 'author') returns the authors of eprint 12614.<br /> *get_citation(id, set, length) - returns a citation. Every set member (eprint, author, group) has two citations: short and full. The short citation is returned only if length == 'short'. So, to get the short citation of group 3: get_citation(3,'group','short').<br /> *get_code($id,$set) - UNWRITTEN - Set members have codes. This is how they are identified by the user. For example, author_lac is the member of the author set whose code is lac. To get the code for group 3: get_code(3,'group').<br /> <br /> When retrieving statistics, EPStats filters by inner joining the epstats_true_accesses_table to other tables containing eprint IDs. Sometimes it has to create these tables.<br /> <br /> *create_top_table(param_object) - Creates a table containing the eprint IDs of the top X eprints by fulltext downloads between two dates.<br /> *create_list_table(table_name, eprint_ids) - Takes two strings: the name of the table and a space-separated list of eprint IDs. Creates a temporary table.<br /> <br /> The following are the only two functions that actually make calls to the database. 
<br /> <br /> *do_sql(sql_query_string) - takes a string and performs a query, returning the DBI object containing the results.<br /> *insert_values(table_name, values) - inserts a row of data into a table.<br /> <br /> And finally, the meat and potatoes: the functions that return the statistics we're interested in.<br /> <br /> *get_stats(params_object, column_name_list, options_hash) - returns a DBI object containing the stats we are interested in, i.e. those matching the params_object's date range and eprint sets, and only the columns in column_name_list. The options hash can contain the following key/value pairs:<br /> **order => column_name - the column on which to order the results. Append '-' or ' DESC' to order descending.<br /> **limit => int - how many results to return.<br /> **group_by => column_name - the column to group by, if grouping is needed.<br /> get_stats works by examining the 'eprints' parameter and calling one of the following functions:<br /> **get_list_stats<br /> **get_top_stats<br /> **get_set_stats<br /> **get_all_stats<br /> These functions generate slightly different MySQL queries, and pass them to the do_sql function.<br /> <br /> == Date ==<br /> I implemented a date object because there were some specific things I needed to do with dates.<br /> <br /> ===Functions===<br /> *new(date_hash) - Creates a new date object when passed a hash with the keys 'day', 'month' and 'year'.<br /> *validate() - If the date is not valid, it will be modified to a sensible value. E.g. if it's Feb 30th, it will be modified to Feb 29th or 28th, depending on whether it's a leap year.<br /> *set(part_name, int) - Sets part of the date ('year','month' or 'day') to a specific value.<br /> *decrement(period) - decrements the date by a period ('day', 'week', 'month', 'quarter', 'year'). Calls the mod_date function, which does the muscle work.<br /> *increment(period) - increments the date by a period, also by calling mod_date.<br /> *part(part_name, style) - Returns the day, month or year. 
For month, if style=='text', returns a three-letter string, otherwise returns an integer. For year, if style=='short', returns the last two digits, otherwise returns all four.<br /> <br /> <br /> <br /> <br /> <br /> Cache<br /> <br /> <br /> Periods<br /> <br /> UserInterface::Controls<br /> <br /> Page (deprecated)<br /> <br /> <br /> <br /> View<br /> View.pm<br /> Visualisation<br /> Visualisation.pm</div>
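The description of create_list_table (a table name plus a space-separated list of eprint IDs, producing a temporary table to join against) can be sketched as a function that only builds the SQL strings. This is a guess at the shape of the statements rather than the EPStats implementation: the function name list_table_sql, the column name eprint_id and the example table name are hypothetical, and the statements are returned instead of being passed to do_sql.

```perl
use strict;
use warnings;

# Hypothetical sketch: build the SQL needed for a temporary table of eprint IDs.
sub list_table_sql {
    my ($table_name, $eprint_ids) = @_;
    die "invalid table name\n" unless $table_name =~ /\A\w+\z/;

    # Keep only numeric IDs and drop repeats, so the primary key cannot clash.
    my %seen;
    my @ids = grep { /\A\d+\z/ && !$seen{$_}++ } split ' ', $eprint_ids;

    my @sql = ("CREATE TEMPORARY TABLE $table_name (eprint_id INT NOT NULL PRIMARY KEY)");
    push @sql, "INSERT INTO $table_name (eprint_id) VALUES " . join(', ', map { "($_)" } @ids)
        if @ids;
    return @sql;
}

print "$_;\n" for list_table_sql('tmp_list', '12614 42 12614 bogus');
```

Here the duplicate 12614 and the non-numeric token are dropped before the INSERT statement is built, so the second statement inserts only 12614 and 42.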

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats_Technical_Documentation&diff=4193 IRStats Technical Documentation 2007-03-29T19:30:26Z

<p>Gobfrey: /* DatabaseInterface */</p> <hr /> <div>This document is intended as guidance for the last stage of development of EPStats.<br /> <br /> = Directory Structure =<br /> <br /> == /opt/epstats ==<br /> Contains data files for GeoIP. If I had had root access, I would have put them in the correct place; instead, they are linked to from the correct place. These need regular updating, which hasn't been implemented.<br /> <br /> == /opt/epstats/bin ==<br /> Contains the scripts needed to update the table.<br /> <br /> *daily_update.sh - Runs all the scripts in the right order.<br /> *extract_metadata_from_archive.pl - Extracts eprint, author and group metadata from the repository by iterating over every eprint.<br /> *update_table.pl - Filters and processes new entries in the accesslog to update the epstats_true_acesses_table. Uses 'SearchParser.pm' and 'repeatscache'.<br /> *convert_ip_to_host.pl - Attempts to convert the IP addresses of the new entries in epstats_true_acesses_table to hostnames. Uses 'host_updated' to keep track of where it got to last time.<br /> <br /> Note that most of these scripts probably need to be tidied up. They were written in a hurry and were never polished.<br /> <br /> == /opt/epstats/cache ==<br /> Contains cache files. Feel free to delete these whenever you like. <br /> <br /> == /opt/epstats/cgi ==<br /> <br /> Contains two scripts, 'get_view' and 'stats'.<br /> <br /> *get_view returns the output of an EPStats::View (see below), which is currently a chunk of HTML or CSV, but could be almost anything.<br /> *stats is a handy CGI form that passes arguments to get_view.<br /> <br /> == /opt/epstats/img ==<br /> <br /> Conceptually, where any images would be kept (e.g. national flags). At the moment, only the img/graphs directory is used. 
This is where generated graphs are stored.<br /> <br /> == /opt/epstats/perl_lib ==<br /> <br /> Contains all the EPStats classes.<br /> <br /> = EPStats Classes =<br /> <br /> Note that the leading EPStats:: has been left out for brevity.<br /> <br /> == Params ==<br /> This object holds the parameters that are used to generate the statistics. The most important of these are a date range and an eprint set.<br /> === Configuration Constants ===<br /> *$cgi_script - the name of the CGI script (currently unused).<br /> *$id_params - the parameters that matter when generating an ID.<br /> *$defaults - any default parameters you wish to set.<br /> <br /> === Functions ===<br /> *new(CGI_object) - returns a new object.<br /> *mask(params_hash) - used when you want to temporarily overwrite one or more parameters. Overwrites values with the contents of params_hash. Overwritten values get pushed onto a stack.<br /> *unmask - sets parameters back to how they were before the last mask.<br /> *generate_cgi - returns a string containing the name of the CGI script and all parameters, to enable the creation of links (currently unused).<br /> *get(param_name) - returns the value of a single parameter.<br /> *create_id - uses MD5 to create a unique ID from the id_params (see Configuration Constants above). This is called whenever get('id') is called.<br /> <br /> == DatabaseInterface ==<br /> This object does what it says on the tin. Any access to the database is done through it.<br /> <br /> <strong>IMPORTANT</strong> - the generated SQL was developed on a machine running MySQL 5. Installing on the EPrints server has broken this (as it's running MySQL 4). I placed a quick and dirty hack into the do_sql function, and modified the create_top_table function. I have no idea if this works well. 
<strong>IT NEEDS TO BE CHECKED</strong>.<br /> <br /> === Configuration Constants ===<br /> Constants are defined in the new() function.<br /> <br /> *DBI Configuration Constants - $driver, $server, $database, $user, $password are all used to create the connection to the database.<br /> *source_table - The table in which the stats are stored.<br /> <br /> === Functions ===<br /> *new() - returns a new object.<br /> *retreive_set_names() - returns a list of eprint sets. Currently 'group' and 'author' are implemented. This is used to verify CGI input.<br /> *get_membership(eprint_id, set_name) - For a given eprint ID, returns the members of the named set that it belongs to. For example, get_membership(12614, 'author') returns the authors of eprint 12614.<br /> *get_citation(id, set, length) - returns a citation. Every set member (eprint, author, group) has two citations: short and full. The short citation is returned only if length == 'short'. So, to get the short citation of group 3: get_citation(3,'group','short').<br /> *get_code($id,$set) - UNWRITTEN - Set members have codes. This is how they are identified by the user. For example, author_lac is the member of the author set whose code is lac. To get the code for group 3: get_code(3,'group').<br /> <br /> When retrieving statistics, EPStats filters by inner joining the epstats_true_accesses_table to other tables containing eprint IDs. Sometimes it has to create these tables.<br /> <br /> *create_top_table(param_object) - Creates a table containing the eprint IDs of the top X eprints by fulltext downloads between two dates.<br /> *create_list_table(table_name, eprint_ids) - Takes two strings: the name of the table and a space-separated list of eprint IDs. Creates a temporary table.<br /> <br /> The following are the only two functions that actually make calls to the database. 
<br /> <br /> *do_sql(sql_query_string) - takes a string and performs a query, returning the dbi object containing the results.<br /> *insert_values(table_name, values) - inserts a row of data into a table.<br /> <br /> And finally, the meat and potatoes: the functions that return the statistics we're interested in.<br /> <br /> *get_stats(params_object, column_name_list, options_hash) - returns a dbi object containing the stats we are interested in, i.e. those for the params_object's date range and eprint sets, restricted to the columns in column_name_list. The options hash can contain the following key/value pairs:<br /> **order => column_name - the column on which to order it. Append '-' or ' DESC' to order it descending.<br /> **limit => int - How many results to return<br /> **group_by => column_name - if we need to group by a column.<br /> get_stats works by examining the 'eprints' parameter and calling one of the following functions:<br /> **get_list_stats<br /> **get_top_stats<br /> **get_set_stats<br /> **get_all_stats<br /> These functions generate slightly different MySQL queries, and pass them to the do_sql function.<br /> <br /> == Date ==<br /> <br /> Cache<br /> <br /> <br /> Periods<br /> <br /> UserInterface::Controls<br /> <br /> Page (deprecated)<br /> <br /> <br /> <br /> View<br /> View.pm<br /> Visualisation<br /> Visualisation.pm</div>

Gobfrey https://wiki.eprints.org/w/index.php?title=IRStats_Technical_Documentation&diff=4192 IRStats Technical Documentation 2007-03-29T19:27:48Z

<p>Gobfrey: /* EPStats Classes */</p> <hr /> <div>This document is intended as guidance to the last stage of development of EPstats.<br /> <br /> = Directory Structure =<br /> <br /> == /opt/epstats ==<br /> Contains data files for GeoIP. If I had had root access, I would have put them in the correct place. They are linked to from the correct place. These need regular updating, something which hasn't been implemented.<br /> <br /> == /opt/epstats/bin ==<br /> Contains the scripts needed to update the table.<br /> <br /> *daily_update.sh - Runs all the scripts in the right order.<br /> *extract_metadata_from_archive.pl - Extracts eprint, author and group metadata from the repository by iterating over every eprint.<br /> *update_table.pl - Filters and processes new entries in the accesslog to update the epstats_true_acesses_table. Uses 'SearchParser.pm' and 'repeatscache'.<br /> * convert_ip_to_host.pl - Attempts to convert IP addresses of the new entries in epstats_true_acesses_table to hostnames. Uses 'host_updated' to keep track of where it got to last time.<br /> <br /> Note that most of these scripts probably need to be tidied up. They were written in a hurry and were never polished.<br /> <br /> == /opt/epstats/cache ==<br /> Contains cache files. Feel free to delete these whenever you like. <br /> <br /> == /opt/epstats/cgi ==<br /> <br /> Contains two scripts, 'get_view' and 'stats'.<br /> <br /> *get_view returns the output of an EPstats::View (see below), which is currently a chunk of html or csv, but could be almost anything.<br /> *stats is a handy cgi form that passes arguments to get_view<br /> <br /> == /opt/epstats/img ==<br /> <br /> Conceptually, where any images would be kept (e.g. national flags). At the moment, only the img/graphs directory is used. 
This is where generated graphs are stored.<br /> <br /> == /opt/epstats/perl_lib ==<br /> <br /> Contains all the epstats classes.<br /> <br /> = EPStats Classes =<br /> <br /> Note that the leading EPStats:: has been left out for brevity.<br /> <br /> == Params ==<br /> This object holds the parameters that are used to generate the statistics. The most important of these are a date range and an eprint set.<br /> === Configuration Constants ===<br /> *$cgi_script - the name of the cgi script (currently unused)<br /> *$id_params - the parameters that are significant when generating an ID.<br /> *$defaults - Any default parameters you wish to set.<br /> <br /> === Functions ===<br /> *new(CGI_object) - returns new object<br /> *mask(params_hash) - used when you want to temporarily overwrite parameter(s). Overwrites values with contents of params_hash. Overwritten values get pushed onto a stack.<br /> *unmask - Sets parameters back to how they were before the last mask.<br /> *generate_cgi - returns a string containing the name of the cgi script, and all parameters, to enable the creation of links. (currently unused)<br /> *get(param_name) - returns the value of a single parameter.<br /> *create_id - Uses MD5 to create a unique ID from the id_params (see Constants above). This is called whenever get('id') is called.<br /> <br /> == DatabaseInterface ==<br /> This object does what it says on the tin. Any access to the database is done through it.<br /> <br /> === Configuration Constants ===<br /> Constants are contained in the new function.<br /> <br /> *DBI Configuration Constants - $driver, $server, $database, $user, $password are all used to create the connection to the database.<br /> *source_table - The table in which the stats are stored.<br /> <br /> === Functions ===<br /> *new() - returns object.<br /> *retreive_set_names() - returns a list of eprint sets. Currently 'group' and 'author' are implemented. 
This is used to verify cgi input.<br /> *get_membership(eprint_id, set_name) - For a given eprint ID, returns the members of the named set that it belongs to. For example, we can find out which authors eprint 12614 has by get_membership(12614, 'author').<br /> *get_citation(id, set, length) - returns a citation. Every set member (eprint, author, group) has two citations: short and full. We only return a short citation if length == 'short'. So, to get the short citation of group 3: get_citation(3,'group','short').<br /> *get_code($id,$set) - UNWRITTEN - Set members have codes. This is how they are identified by the user. For example, author_lac is the member of the author set whose code is lac. To get the code for group 3: get_code(3,'group').<br /> <br /> When retrieving statistics, EPStats filters by inner joining the epstats_true_accesses_table to other tables containing eprint IDs. Sometimes it has to create these tables.<br /> <br /> *create_top_table(param_object) - This creates a table containing the eprint IDs of the top X by fulltext download between two dates.<br /> *create_list_table(table_name, eprint_ids) - Takes two strings, one the name of the table, the other a space-separated list of eprint IDs. Creates a temporary table.<br /> <br /> The following are the only two functions that actually make calls to the database. <br /> <br /> *do_sql(sql_query_string) - takes a string and performs a query, returning the dbi object containing the results.<br /> *insert_values(table_name, values) - inserts a row of data into a table.<br /> <br /> And finally, the meat and potatoes: the functions that return the statistics we're interested in.<br /> <br /> *get_stats(params_object, column_name_list, options_hash) - returns a dbi object containing the stats we are interested in, i.e. those for the params_object's date range and eprint sets, restricted to the columns in column_name_list. The options hash can contain the following key/value pairs:<br /> **order => column_name - the column on which to order it. 
Append '-' or ' DESC' to order it descending.<br /> **limit => int - How many results to return<br /> **group_by => column_name - if we need to group by a column.<br /> get_stats works by examining the 'eprints' parameter and calling one of the following functions:<br /> **get_list_stats<br /> **get_top_stats<br /> **get_set_stats<br /> **get_all_stats<br /> These functions generate slightly different MySQL queries, and pass them to the do_sql function.<br /> <br /> == Date ==<br /> <br /> Cache<br /> <br /> <br /> Periods<br /> <br /> UserInterface::Controls<br /> <br /> Page (deprecated)<br /> <br /> <br /> <br /> View<br /> View.pm<br /> Visualisation<br /> Visualisation.pm</div>

Gobfrey