Difference between revisions of "API:EPrints/Apache/SiteMap"

From EPrints Documentation
 
(2 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
<!-- Pod2Wiki=_preamble_  
 
<!-- Pod2Wiki=_preamble_  
This page has been automatically generated from the EPrints 3.2 source. Any wiki changes made between the 'Pod2Wiki=*' and 'Edit below this comment' comments will be lost.
+
This page has been automatically generated from the EPrints 3.4 source. Any wiki changes made between the 'Pod2Wiki=*' and 'Edit below this comment' comments will be lost.
  -->{{API}}{{Pod2Wiki}}{{API:Source|file=EPrints/Apache/SiteMap.pm|package_name=EPrints::Apache::SiteMap}}[[Category:API|SITEMAP]][[Category:API:EPrints/Apache|SITEMAP]][[Category:API:EPrints/Apache/SiteMap|SITEMAP]]<div><!-- Edit below this comment -->
+
  -->{{API}}{{Pod2Wiki}}{{API:Source|file=EPrints/Apache/SiteMap.pm|package_name=EPrints::Apache::SiteMap}}[[Category:API|SITEMAP]][[Category:API:EPrints/Apache|SITEMAP]]<div><!-- Edit below this comment -->
  
  
 
<!-- Pod2Wiki=_private_ --><!-- Pod2Wiki=head_name -->
 
<!-- Pod2Wiki=_private_ --><!-- Pod2Wiki=head_name -->
 
==NAME==
 
==NAME==
EPrints::Apache::SiteMap
+
'''EPrints::Apache::SiteMap''' - Generates a dynamic sitemap.
  
 
<div style='background-color: #e8e8f; margin: 0.5em 0em 1em 0em; border: solid 1px #cce;  padding: 0em 1em 0em 1em; font-size: 80%; '>
 
<div style='background-color: #e8e8f; margin: 0.5em 0em 1em 0em; border: solid 1px #cce;  padding: 0em 1em 0em 1em; font-size: 80%; '>
Line 17: Line 17:
 
<!-- Pod2Wiki=head_description -->
 
<!-- Pod2Wiki=head_description -->
 
==DESCRIPTION==
 
==DESCRIPTION==
This handler has been heavily modified in order to support a static sitemap.xml file in addition to the semantic web crawling extensions provided by EPrints. The modified handler inserts the semantic web crawling extensions into the existing sitemap.xml if it exists, or creates a new document if it doesn't. The original handler is now in the _insert_semantic_web_extensions below.
+
This has been somewhat superseded by the {{API:PodLink|file=bin|package_name=bin|section=generate_sitemap|text=bin/generate_sitemap}} script, which generates a more useful output for Google indexing.
 +
 
 +
This handler has been heavily modified in order to support a static ''sitemap.xml'' file in addition to the semantic web crawling extensions provided by EPrints. The modified handler inserts the semantic web crawling extensions into the existing ''sitemap.xml'' if it exists, or creates a new document if it doesn't. The original handler is now in the <tt>_insert_semantic_web_extensions</tt> below.
 +
 
 +
If the static sitemap XML is a <tt>sitemapindex</tt>, this handler inserts a new <tt>&lt;sitemap&gt;</tt> element into the index, which directs crawlers to a ''sitemap-sc.xml'' URL that contains the semantic web sitemap generated by <tt>_insert_semantic_web_extensions</tt>. This handler also implements the ''sitemap-sc.xml'' URL.
  
If the static sitemap XML is a sitemapindex, this handler inserts a new &lt;sitemap&gt; element into the index, which directs crawlers to a "sitemap-sc.xml" URL that contains the semantic web sitemap generated by _insert_semantic_web_extensions. This handler also implements the sitemap-sc.xml URL.
 
  
 
<div style='background-color: #e8e8f; margin: 0.5em 0em 1em 0em; border: solid 1px #cce;  padding: 0em 1em 0em 1em; font-size: 80%; '>
 
<div style='background-color: #e8e8f; margin: 0.5em 0em 1em 0em; border: solid 1px #cce;  padding: 0em 1em 0em 1em; font-size: 80%; '>
Line 41: Line 44:
  
 
  $rc = EPrints::Apache::SiteMap::handler( $r )
 
  $rc = EPrints::Apache::SiteMap::handler( $r )
Handler for managing EPrints requests for dynamically generated sitemap.xml or sitemap-sc.xml (or returning static version if that exists). =cut ######################################################################
+
Handler for managing EPrints requests for dynamically generated ''sitemap.xml'' or ''sitemap-sc.xml'' (or returning static version if that exists).
 
 
sub handler {
 
my( $r ) = @_;
 
 
 
  my $repository = $EPrints::HANDLE-&gt;current_repository;
 
  my $xml = $repository-&gt;xml;
 
  my $sitemap;
 
 
 
  if ( $r-&gt;uri =~ m! sitemap-sc\.xml$ !x )
 
  {
 
    # this is a direct request for the semantic web extensions
 
    $sitemap = _new_urlset( $repository, $xml );
 
  }
 
  else
 
  {
 
    # get the static sitemap.xml
 
    my $langid = EPrints::Session::get_session_language( $repository, $r );
 
    my @static_dirs = $repository-&gt;get_static_dirs( $langid );
 
    foreach my $static_dir ( @static_dirs )
 
    {
 
      my $file = "$static_dir/sitemap.xml";
 
      next if( !-e $file );
 
 
 
      $sitemap = $xml-&gt;parse_file($file) || EPrints::abort( "Can't parse $file: $!" );
 
      last;
 
    }
 
 
 
    if( !defined $sitemap )
 
    {
 
      # no static sitemap file - create a new document
 
      $sitemap = _new_urlset( $repository, $xml );
 
    }
 
    elsif( $sitemap-&gt;documentElement-&gt;localname eq "urlset" )
 
    {
 
      # the static sitemap is a &lt;urlset&gt; - append the semantic web extensions to the end
 
      _insert_semantic_web_extensions($repository, $xml, $sitemap-&gt;documentElement);
 
    }
 
    elsif( $sitemap-&gt;documentElement-&gt;localname eq "sitemapindex" )
 
    {
 
      # the static sitemap is a &lt;sitemapindex&gt; - append a semantic web sitemap to the index
 
      my $sw_sitemap = $sitemap-&gt;createElement("sitemap");
 
      $sitemap-&gt;documentElement-&gt;appendChild($sw_sitemap);
 
 
 
      # append the location of the semantic web sitemap
 
      my $sw_loc = $sitemap-&gt;createElement("loc");
 
      $sw_sitemap-&gt;appendChild($sw_loc);
 
      $sw_loc-&gt;appendChild($sitemap-&gt;createTextNode($repository-&gt;config('base_url')."/sitemap-sc.xml"));
 
    }
 
  }
 
 
 
  # adds local sitemap URLs
 
  if( $sitemap-&gt;documentElement-&gt;localname eq "urlset" )
 
  {
 
    $repository-&gt;run_trigger( EPrints::Const::EP_TRIGGER_LOCAL_SITEMAP_URLS,
 
      urlset =&gt; $sitemap-&gt;documentElement,
 
    );
 
  } # TODO: else { call some other trigger, with the sitemapindex element }
 
 
 
  binmode( *STDOUT, ":utf8" );
 
  $repository-&gt;send_http_header( "content_type"=&gt;"text/xml; charset=UTF-8" );
 
  print $xml-&gt;to_string( $sitemap );
 
  return DONE;
 
}
 
 
 
# Creates a new XML document containing a urlset populated by _insert_semantic_web_extensions
 
sub _new_urlset {
 
my( $repository, $xml ) = @_;
 
 
 
  my $document = $xml-&gt;make_document();
 
  my $urlset = $xml-&gt;create_element(
 
      "urlset",
 
      "xmlns" =&gt; "http://www.sitemaps.org/schemas/sitemap/0.9"
 
  );
 
  _insert_semantic_web_extensions( $repository, $xml, $urlset );
 
  $document-&gt;appendChild( $urlset );
 
 
 
  return $document;
 
}
 
 
 
# Insert the semantic web extensions as children of the element given as the third argument to the function.
 
# This function contains the body of the main handler shipped with EPrints 3.2.x
 
sub _insert_semantic_web_extensions {
 
my ( $repository, $xml, $urlset ) = @_;
 
 
 
  $urlset-&gt;setAttribute( "xmlns:sc" , "http://sw.deri.org/2007/07/sitemapextension/scschema.xsd" );
 
 
 
  my $sc_dataset = $xml-&gt;create_element( "sc:dataset" );
 
 
 
  $urlset-&gt;appendChild( $sc_dataset ); 
 
  $sc_dataset-&gt;appendChild( _create_data( $xml,
 
    "sc:linkedDataPrefix",
 
    $repository-&gt;config( 'base_url' )."/id/",
 
    slicing =&gt; "subject-object", ));
 
  $sc_dataset-&gt;appendChild( _create_data( $xml,
 
    "sc:datasetURI",
 
    $repository-&gt;config( 'base_url' )."/id/repository" ));
 
 
 
 
 
  $sc_dataset-&gt;appendChild( _create_data( $xml,
 
    "sc:dataDumpLocation",
 
    $repository-&gt;config( 'base_url' )."/id/repository" ));
 
  $sc_dataset-&gt;appendChild( _create_data( $xml,
 
    "sc:dataDumpLocation",
 
    $repository-&gt;config( 'base_url' )."/id/dump" ));
 
 
 
  my $root_subject = $repository-&gt;dataset("subject")-&gt;dataobj("ROOT");
 
  foreach my $top_subject ( $root_subject-&gt;get_children )
 
  {
 
    $sc_dataset-&gt;appendChild( _create_data( $xml,
 
      "sc:dataDumpLocation",
 
      $top_subject-&gt;uri ) );
 
  }
 
}
 
 
 
sub _create_data {
 
my( $xml, $name, $data, %attr ) = @_;
 
 
 
  my $node = $xml-&gt;create_element( $name, %attr );
 
  $node-&gt;appendChild( $xml-&gt;create_text_node( $data ));
 
 
 
  return $node;
 
}
 
 
 
1;
 
  
 
<div style='background-color: #e8e8f; margin: 0.5em 0em 1em 0em; border: solid 1px #cce;  padding: 0em 1em 0em 1em; font-size: 80%; '>
 
<div style='background-color: #e8e8f; margin: 0.5em 0em 1em 0em; border: solid 1px #cce;  padding: 0em 1em 0em 1em; font-size: 80%; '>
Line 171: Line 55:
 
<!-- Pod2Wiki=head_copyright -->
 
<!-- Pod2Wiki=head_copyright -->
 
==COPYRIGHT==
 
==COPYRIGHT==
 +
{{API:Copyright}}
 
<div style='background-color: #e8e8f; margin: 0.5em 0em 1em 0em; border: solid 1px #cce;  padding: 0em 1em 0em 1em; font-size: 80%; '>
 
<div style='background-color: #e8e8f; margin: 0.5em 0em 1em 0em; border: solid 1px #cce;  padding: 0em 1em 0em 1em; font-size: 80%; '>
 
<span style='display:none'>User Comments</span>
 
<span style='display:none'>User Comments</span>

Latest revision as of 17:11, 15 March 2023



NAME

EPrints::Apache::SiteMap - Generates a dynamic sitemap.



DESCRIPTION

This handler has been partly superseded by the bin/generate_sitemap script, which generates output that is more useful for Google indexing.

This handler has been heavily modified to support a static sitemap.xml file in addition to the semantic web crawling extensions provided by EPrints. If a static sitemap.xml exists, the handler inserts the semantic web crawling extensions into it; otherwise it creates a new document. The body of the original handler now lives in the _insert_semantic_web_extensions function.

If the static sitemap XML is a sitemapindex, this handler inserts a new <sitemap> element into the index, which directs crawlers to a sitemap-sc.xml URL that contains the semantic web sitemap generated by _insert_semantic_web_extensions. This handler also implements the sitemap-sc.xml URL.
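
The sitemapindex case described above can be sketched outside EPrints with XML::LibXML. This is only an illustration: the base URL, the static document, and the variable names are hypothetical, and the sketch uses createElementNS to place the new elements in the sitemap namespace, which is a choice made here rather than something the handler is documented to do.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical inputs: a repository base URL and a static sitemap index.
my $base_url = 'https://repository.example.org';
my $static   = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://repository.example.org/sitemap-1.xml</loc></sitemap>
</sitemapindex>
XML

my $doc  = XML::LibXML->load_xml( string => $static );
my $root = $doc->documentElement;

if( $root->localname eq 'sitemapindex' )
{
    # Create <sitemap><loc>.../sitemap-sc.xml</loc></sitemap> in the
    # same namespace as the index and append it as the last entry.
    my $ns      = $root->namespaceURI;
    my $sitemap = $doc->createElementNS( $ns, 'sitemap' );
    my $loc     = $doc->createElementNS( $ns, 'loc' );
    $loc->appendChild( $doc->createTextNode( "$base_url/sitemap-sc.xml" ) );
    $sitemap->appendChild( $loc );
    $root->appendChild( $sitemap );
}

print $doc->toString( 1 );
```

Crawlers that follow the added entry then request sitemap-sc.xml, which the same handler answers with the semantic web sitemap.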




METHODS



handler

$rc = EPrints::Apache::SiteMap::handler( $r )

Handles requests for sitemap.xml and sitemap-sc.xml. The response is generated dynamically; if a static sitemap.xml exists, it is used as the starting document.
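
As a rough guide to the cases the handler distinguishes, the following sketch maps a request URI and the root element of the static sitemap (if any) to the action taken. It is an assumption-laden summary based on the handler source shown in an earlier revision of this page; sitemap_action and its return strings are hypothetical names used only for illustration and are not part of the EPrints API.

```perl
use strict;
use warnings;

# Returns a description of the action for a request, given the request
# URI and the root element name of the static sitemap.xml
# (undef when no static file exists).
sub sitemap_action
{
    my( $uri, $static_root ) = @_;

    # Direct request for the semantic web sitemap.
    return 'new urlset with semantic web extensions'
        if $uri =~ m! sitemap-sc\.xml$ !x;

    # No static sitemap.xml: build a new document.
    return 'new urlset with semantic web extensions'
        if !defined $static_root;

    # Static <urlset>: extensions are appended to it.
    return 'static urlset plus semantic web extensions'
        if $static_root eq 'urlset';

    # Static <sitemapindex>: an entry pointing at sitemap-sc.xml is appended.
    return 'static index plus sitemap-sc.xml entry'
        if $static_root eq 'sitemapindex';

    # Any other root element is passed through unmodified.
    return 'static document unchanged';
}

print sitemap_action( '/sitemap.xml', 'sitemapindex' ), "\n";
```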



COPYRIGHT

© Copyright 2000-2024 University of Southampton.

EPrints 3.4 is supplied by EPrints Services.

http://www.eprints.org/eprints-3.4/

LICENSE

This file is part of EPrints 3.4 http://www.eprints.org/.

EPrints 3.4 and this file are released under the terms of the GNU Lesser General Public License version 3 as published by the Free Software Foundation unless otherwise stated.

EPrints 3.4 is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with EPrints 3.4. If not, see http://www.gnu.org/licenses/.
