API:EPrints/Index/Tokenizer
NAME
EPrints::Index::Tokenizer - text indexing utility methods.
DESCRIPTION
This module provides utility methods for turning free text into indexable terms.
METHODS
split_words
@words = EPrints::Index::Tokenizer::split_words( $session, $utext )
Splits the UTF-8 string $utext into individual words and returns them as a list.
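The general idea of word splitting can be sketched in standalone Perl as follows. This is only an illustration, not the module's implementation: the function name sketch_split_words and the separator rule (any run of characters that are neither Unicode word characters nor apostrophes) are assumptions, and the real method also takes a $session argument. The /u modifier requires Perl 5.14 or later.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical stand-in for illustration only; not the EPrints implementation.
# Assumes that any run of characters other than Unicode word characters and
# apostrophes separates two words.
sub sketch_split_words
{
    my( $utext ) = @_;

    # /u forces Unicode rules so that letters such as "Ü" count as word
    # characters; grep drops the empty leading field that split produces
    # when the text starts with a separator.
    return grep { length } split /[^\w']+/u, $utext;
}

my @words = sketch_split_words( "Über die Grundlagen: it's a test" );
# @words now holds: "Über", "die", "Grundlagen", "it's", "a", "test"
```

Keeping the apostrophe inside the word class means a contraction such as "it's" stays one word; whether EPrints makes the same choice should be checked against the source.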
split_search_value
@terms = EPrints::Index::Tokenizer::split_search_value( $session, $value )
Splits $value into search terms and returns them as a list.
apply_mapping
$utext2 = EPrints::Index::Tokenizer::apply_mapping( $session, $utext )
Replaces certain Unicode characters in $utext with ASCII equivalents and returns the new string.
This is applied before words are indexed so that diacritics such as umlauts are ignored when searching.
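The effect of folding accented characters can be approximated in standalone Perl using Unicode decomposition. This is a sketch under assumptions, not the module's implementation: EPrints applies its own character mapping, which may give different results for some characters, and the function name sketch_strip_diacritics is hypothetical. Unicode::Normalize ships with core Perl.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw( NFD );

# Hypothetical stand-in for illustration only. EPrints::Index::Tokenizer
# uses its own character mapping, which may produce different results.
sub sketch_strip_diacritics
{
    my( $utext ) = @_;

    $utext = "" if !defined $utext;

    # Decompose each accented character into a base letter followed by
    # combining marks, then delete the combining marks.
    my $folded = NFD( $utext );
    $folded =~ s/\p{Mn}+//g;

    return $folded;
}

my $plain = sketch_strip_diacritics( "Müller café" );
# $plain is now "Muller cafe"
```

Characters with no canonical decomposition, such as "ß" or "ø", pass through this sketch unchanged, which is one reason an explicit mapping table can be preferable to normalisation alone.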
COPYRIGHT
© Copyright 2000-2024 University of Southampton.
EPrints 3.4 is supplied by EPrints Services.
http://www.eprints.org/eprints-3.4/
LICENSE
This file is part of EPrints 3.4 http://www.eprints.org/.
EPrints 3.4 and this file are released under the terms of the GNU Lesser General Public License version 3 as published by the Free Software Foundation unless otherwise stated.
EPrints 3.4 is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with EPrints 3.4. If not, see http://www.gnu.org/licenses/.