API:EPrints/Plugin/Search/Xapian
Latest revision as of 12:40, 28 January 2025
NAME
EPrints::Plugin::Search::Xapian
DESCRIPTION
Xapian is a highly adaptable toolkit which allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model and also supports a rich set of boolean query operators.
This plugin currently supports only simple searches.
Simple searches are parsed by the Xapian query parser, which supports field prefixes on search terms:
title:(eagle buzzard) abstract:"london wetlands"
The field prefixes are taken from the search configuration and restrict the following term (or bracketed group of terms) to that field. If no prefix is given, the entire Xapian index is searched, i.e. any indexed term can match, not just terms from the search configuration fields. For example, the following simple search configuration:
  search_fields => [
    {
      id => "q",
      meta_fields => [
        "documents",
        "title",
        "abstract",
        "creators_name",
        "date"
      ]
    },
  ],
allows the user to use "documents", "title", "abstract", "creators_name" or "date" as a prefix on a search term. A term without a prefix can match any indexed field, e.g. "publisher".
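To make the link between the configuration and the accepted prefixes concrete, the pure-Perl sketch below reads a search_fields list of the shape shown above and checks whether a token carries one of the configured prefixes. The helper names simple_search_prefixes and has_field_prefix are hypothetical; this is not the plugin's own code and it does not call Xapian.

```perl
use strict;
use warnings;

# Hypothetical helper: return the meta_fields of the search field whose
# id matches, from a search_fields list shaped like the one above.
sub simple_search_prefixes {
    my ( $search_fields, $id ) = @_;
    for my $sf (@$search_fields) {
        return @{ $sf->{meta_fields} } if $sf->{id} eq $id;
    }
    return;
}

# Hypothetical helper: true if a token such as "title:eagle" begins with
# one of the configured field prefixes.
sub has_field_prefix {
    my ( $token, @prefixes ) = @_;
    my ($field) = $token =~ /^(\w+):/ or return 0;
    return scalar grep { $_ eq $field } @prefixes;
}

my $search_fields = [
    {
        id          => "q",
        meta_fields => [ "documents", "title", "abstract", "creators_name", "date" ],
    },
];

my @prefixes = simple_search_prefixes( $search_fields, "q" );
print "prefixes: @prefixes\n";
for my $token ( "title:eagle", "publisher:Wiley", "eagle" ) {
    # "publisher" is not in meta_fields, so it is not a configured prefix here.
    printf "%-16s %s\n", $token,
        has_field_prefix( $token, @prefixes ) ? "field prefix" : "no configured prefix";
}
```

How Xapian's parser treats text that looks like an unconfigured prefix is not covered by this sketch.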
Terms can be negated by prefixing the term with '-':
eagle -buzzard
Phrases can be specified using quotes; for example, the phrase "Southampton University" won't match "University of Southampton".
Terms are stemmed by default (e.g. 'bubbles' and 'bubble' reduce to the same stem, so each matches the other), except when the term appears inside a phrase.
Partial matches are supported by using '*':
ameri* - americans, americas, amerillo etc.
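The syntax rules above can be summarised with a small pure-Perl tokenizer that labels each part of a query as a phrase, a negated term, a wildcard, or a plain term. This is only an illustration of the syntax described on this page, assuming that quotes, a leading '-' and a trailing '*' are the only operators in play (bracketed prefix groups are not handled); the real parsing is done by Xapian's query parser. The function name classify_query is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split a simple-search string into labelled parts.
# Quoted phrases are kept whole; other tokens are split on whitespace.
sub classify_query {
    my ($query) = @_;
    my @parts;
    while ( $query =~ /\G\s*(?:"([^"]*)"|(\S+))/g ) {
        my ( $phrase, $word ) = ( $1, $2 );
        if ( defined $phrase ) {
            push @parts, [ phrase => $phrase ];    # exact phrase, not stemmed
        }
        elsif ( $word =~ /^-(.+)$/ ) {
            push @parts, [ negated => $1 ];        # term must not appear
        }
        elsif ( $word =~ /^(.+)\*$/ ) {
            push @parts, [ wildcard => $1 ];       # any term starting with this
        }
        else {
            push @parts, [ term => $word ];        # ordinary (stemmed) term
        }
    }
    return @parts;
}

for my $part ( classify_query('eagle -buzzard ameri* "Southampton University"') ) {
    print "$part->[0]: $part->[1]\n";
}
```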
Xapian search results are returned in a subclass of EPrints::List that wraps a Xapian Enquire object. Calling count on the list (see EPrints::List) returns an estimate of the total number of matches rather than an exact figure.
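As Xapian has a higher 'qs' score than the Internal search plugin, once enabled it overrides the default EPrints simple search. Internal's default qs is 0, so to hand simple search back to Internal you must set Xapian's qs below 0 (a small positive value such as 0.1 would still rank Xapian first). Do this in cfg.d/plugins.pl: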
  $c->{plugins}{'Search::Xapian'}{params}{qs} = -1;
Or disable completely (including disabling indexing):
  $c->{plugins}{'Search::Xapian'}{params}{disable} = 1;
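The effect of these settings can be pictured with a hypothetical selector that, following the behaviour described above, picks the enabled search plugin with the highest qs. The function pick_search_plugin is an illustration only, not EPrints' actual plugin-selection code; Internal's qs of 0 is as stated above, while the Xapian default of 0.5 used here is an assumption (it only needs to be above 0).

```perl
use strict;
use warnings;

# Hypothetical selector: return the id of the enabled plugin with the
# highest qs score.
sub pick_search_plugin {
    my (%plugins) = @_;    # id => { qs => ..., disable => ... }
    my @enabled = grep { !$plugins{$_}{disable} } keys %plugins;
    my ($best) = sort { $plugins{$b}{qs} <=> $plugins{$a}{qs} } @enabled;
    return $best;
}

# Xapian's default qs is assumed to be 0.5 for this illustration.
print pick_search_plugin( Internal => { qs => 0 }, Xapian => { qs => 0.5 } ), "\n";    # Xapian
print pick_search_plugin( Internal => { qs => 0 }, Xapian => { qs => 0.1 } ), "\n";    # Xapian
print pick_search_plugin( Internal => { qs => 0 }, Xapian => { qs => -1 } ), "\n";     # Internal
print pick_search_plugin( Internal => { qs => 0 }, Xapian => { qs => 0.5, disable => 1 } ), "\n";    # Internal
```

The second line shows why 0.1 is not enough: any value above Internal's 0 still wins.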
USAGE
Install the Search::Xapian Perl extension. Two Perl bindings are available for Xapian: the CPAN version, which is older and based on Perl XS, and xapian-bindings-perl from xapian.org, which is based on SWIG and covers more of the API. Whichever you choose, install the latest stable version of the Xapian library for the best feature support and performance.
Xapian uses its own index, separate from MySQL, stored in archives/[archiveid]/var/xapian. To build the Xapian index you will need to reindex:
./bin/epadmin reindex [archiveid] eprint
(Repeat for any other datasets you expect to use Xapian with.)
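If several datasets need reindexing, a small Perl wrapper can run the same epadmin command once per dataset. This is a sketch: the archive id "myrepo", the dataset list and the RUN_REINDEX environment variable are hypothetical placeholders, and it assumes it is run from the EPrints root so that ./bin/epadmin resolves. By default it only prints the commands.

```perl
use strict;
use warnings;

# Build one "epadmin reindex" command (as an argument list) per dataset.
sub reindex_commands {
    my ( $archiveid, @datasets ) = @_;
    return map { [ './bin/epadmin', 'reindex', $archiveid, $_ ] } @datasets;
}

# Placeholders: substitute your own archive id and datasets.
my @commands = reindex_commands( 'myrepo', 'eprint', 'user' );

for my $cmd (@commands) {
    # Only print the commands unless the hypothetical RUN_REINDEX
    # environment variable is set.
    unless ( $ENV{RUN_REINDEX} ) {
        print "would run: @$cmd\n";
        next;
    }
    # The list form of system() bypasses the shell; stop on the first failure.
    system(@$cmd) == 0
        or die "reindex of dataset '$cmd->[3]' failed (status $?)\n";
}
```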
The var/xapian/ directory should contain something like:
  flintlock  position.baseA  position.DB     postlist.baseB  record.baseA  record.DB       termlist.baseB
  iamchert   position.baseB  postlist.baseA  postlist.DB     record.baseB  termlist.baseA  termlist.DB
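The exact files depend on which Xapian database backend created the index; the listing above includes an iamchert marker file, which identifies a chert-format database. A quick sanity check could look for such marker files. In this sketch the helper name inspect_xapian_dir is hypothetical, and the marker names for the flint and glass backends (iamflint, iamglass) are assumptions by analogy with iamchert. It reports only whether the flintlock file exists, not whether a lock is currently held.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical check: report which iam<backend> marker file is present in
# a Xapian database directory, and whether a "flintlock" file exists.
sub inspect_xapian_dir {
    my ($dir) = @_;
    my %info = ( backend => '', lockfile => 0 );
    for my $backend (qw( flint chert glass )) {
        if ( -e File::Spec->catfile( $dir, "iam$backend" ) ) {
            $info{backend} = $backend;
            last;
        }
    }
    $info{lockfile} = -e File::Spec->catfile( $dir, 'flintlock' ) ? 1 : 0;
    return \%info;
}

# Placeholder path: replace "myrepo" with your archive id, or pass a path.
my $dir  = shift(@ARGV) // 'archives/myrepo/var/xapian';
my $info = inspect_xapian_dir($dir);
printf "%s: backend=%s, flintlock %s\n",
    $dir, $info->{backend} || 'none found', $info->{lockfile} ? 'present' : 'absent';
```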
The Xapian indexing process is defined in lib/cfg.d/search_xapian.pl. To override it, place a file with the same name in your repository's archives/[archiveid]/cfg.d/ directory. If Xapian search does not match what you expect, the indexing process probably needs adjusting (followed by a reindex). Terms can also be weighted at indexing time, e.g. to give names a higher weight than abstract text.
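Field weighting at index time amounts to making terms from some fields count more towards a document's term frequencies than others. The pure-Perl sketch below shows the idea with a hypothetical %FIELD_WEIGHT table (names count three times, abstract words once). It does not reproduce lib/cfg.d/search_xapian.pl, which would do this through Xapian's own indexing API (for example, Xapian's TermGenerator index_text accepts a per-call within-document frequency increment); stemming is also left out here.

```perl
use strict;
use warnings;

# Hypothetical weights: how much each occurrence of a term in a field
# adds to that term's within-document frequency.
my %FIELD_WEIGHT = (
    creators_name => 3,
    title         => 2,
    abstract      => 1,
);

# Lower-case each field, split on non-word characters, and add the
# field's weight to each term's total. Unlisted fields count once.
sub weighted_terms {
    my (%record) = @_;
    my %wdf;
    for my $field ( keys %record ) {
        my $weight = $FIELD_WEIGHT{$field} // 1;
        for my $term ( grep { length } split /\W+/, lc $record{$field} ) {
            $wdf{$term} += $weight;
        }
    }
    return \%wdf;
}

my $wdf = weighted_terms(
    creators_name => 'Smith, John',
    title         => 'Wetland birds of London',
    abstract      => 'A survey of birds in London wetlands by Smith.',
);
print "$_ => $wdf->{$_}\n"
    for sort { $wdf->{$b} <=> $wdf->{$a} || $a cmp $b } keys %$wdf;
```

Here "smith" scores 4 (3 from the name, 1 from the abstract), ahead of terms that appear only in the abstract.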
You will need to restart Apache to enable the Xapian search plugin and its dependencies.
If Xapian search is working correctly, a "by relevance" option will be available for ordering simple search results.
Lock Files
Xapian maintains a lock file in archives/[archiveid]/var/xapian. If indexing fails with errors about being unable to lock the database, make sure you are not running more than one copy of the EPrints indexer. If no other indexing processes are running, you may need to remove the lock file from the var/xapian directory manually. Only one process may modify the Xapian index at a time, but any number of processes may read it concurrently.
PARAMETERS
- lang: Override the default language used for stemming.
- stopwords: An array reference of stop words to use (defaults to English).
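For example, a repository with French content might set both parameters in cfg.d/plugins.pl, following the same pattern as the qs and disable settings above. The value format for lang is an assumption: Xapian's stemmer accepts language names such as "french", but check what your EPrints version expects. The stop word list is a short illustrative sample, not a complete list, and since these settings likely affect the indexed terms, a reindex is probably needed afterwards.

```perl
# cfg.d/plugins.pl (sketch; values are illustrative assumptions)
$c->{plugins}{'Search::Xapian'}{params}{lang} = 'french';
$c->{plugins}{'Search::Xapian'}{params}{stopwords} = [ qw( le la les de des du et ou un une ) ];
```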
METHODS
stemmer
$stemmer = $plugin->stemmer()
Returns a Search::Xapian::Stem for the default language.
stopper
$stopper = $plugin->stopper()
Returns a Search::Xapian::SimpleStopper for the configured stop words.
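A simple stopper is, in essence, a fixed set of words to ignore. As a pure-Perl analogue (not the Search::Xapian class itself and not the plugin's stopper method), the sketch below builds a case-insensitive lookup from a stop word list and drops those words from a query; the helper names make_stopper and strip_stopwords are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical analogue of a simple stopper: a closure over a hash of
# lower-cased stop words, returning true for any stop word.
sub make_stopper {
    my %stop;
    $stop{ lc $_ } = 1 for @_;
    return sub { return exists $stop{ lc $_[0] } };
}

# Remove stop words from a whitespace-separated query.
sub strip_stopwords {
    my ( $is_stop, $query ) = @_;
    return grep { !$is_stop->($_) } split ' ', $query;
}

my $is_stop = make_stopper(qw( a an and of the in ));
print join( ' ', strip_stopwords( $is_stop, 'the birds of London' ) ), "\n";    # birds London
```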
COPYRIGHT
Copyright 2000-2011 University of Southampton.
This file is part of EPrints http://www.eprints.org/.
EPrints is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
EPrints is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with EPrints. If not, see http://www.gnu.org/licenses/.
