EPrints::Index::Tokenizer - text indexing utility methods.
This module provides utility methods for turning free text into indexable words and search terms.
@words = EPrints::Index::Tokenizer::split_words( $session, $utext )
Splits the UTF-8 string $utext into individual words and returns them as a list.
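As a rough illustration of word splitting, the sketch below breaks a string on any run of characters that are neither letters nor digits. The function name sketch_split_words and the choice of separator characters are assumptions made for this example; the separators actually used by split_words are defined by the module and may differ.

```perl
use strict;
use warnings;

# Hypothetical stand-in, not the EPrints implementation:
# treat every run of non-letter, non-digit characters as a word boundary.
sub sketch_split_words {
    my ($utext) = @_;
    # grep drops the empty leading field produced when the text
    # starts with a separator.
    return grep { length } split /[^\p{L}\p{N}]+/, $utext;
}

my @words = sketch_split_words("Hello, world: indexing 101!");
print join("|", @words), "\n";
```

Note that this simplified rule also splits hyphenated and apostrophised words into parts, which may or may not match what the module does.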
@terms = EPrints::Index::Tokenizer::split_search_value( $session, $value )
Splits $value into search terms and returns them.
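The documentation does not say how a search value is divided into terms. One plausible reading, shown here purely as an assumption, is that surrounding whitespace is trimmed and each whitespace-separated chunk becomes one term. The name sketch_split_search_value is hypothetical and the code does not call EPrints.

```perl
use strict;
use warnings;

# Hypothetical stand-in, not the EPrints implementation:
# trim the value, then treat each whitespace-separated chunk as a term.
sub sketch_split_search_value {
    my ($value) = @_;
    $value =~ s/^\s+|\s+$//g;    # strip leading and trailing whitespace
    return split /\s+/, $value;
}

my @terms = sketch_split_search_value("  open   access repository ");
print join("|", @terms), "\n";
```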
$utext2 = EPrints::Index::Tokenizer::apply_mapping( $session, $utext )
Replaces certain Unicode characters in $utext with their ASCII equivalents and returns the new string.
This is applied to words before they are indexed so that diacritics such as umlauts do not affect search matching.
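To illustrate the idea of folding accented characters to ASCII, the sketch below decomposes the string with the core Unicode::Normalize module and removes the combining marks. This is a swapped-in technique chosen for the example; apply_mapping is described only as replacing "certain" characters, so its actual mapping is likely a specific table that this sketch does not reproduce. The name sketch_apply_mapping is hypothetical.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical stand-in, not the EPrints mapping table:
# decompose each character (for example U+00FC into "u" plus a combining
# diaeresis), then delete the combining marks.
sub sketch_apply_mapping {
    my ($utext) = @_;
    my $decomposed = NFD($utext);
    $decomposed =~ s/\p{Mn}//g;    # \p{Mn} matches nonspacing marks
    return $decomposed;
}

# "M\x{fc}ller" is "Muller" written with a u-umlaut.
my $folded = sketch_apply_mapping("M\x{fc}ller");
print "$folded\n";
```

Applying the same folding to both the indexed words and the search terms is what lets a query typed without the umlaut match text written with it. Characters that have no decomposition, such as U+00DF (sharp s), pass through this sketch unchanged, which is one reason an explicit mapping table can be preferable.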
© Copyright 2023 University of Southampton.
EPrints 3.4 is supplied by EPrints Services.
This file is part of EPrints 3.4 http://www.eprints.org/.
EPrints 3.4 and this file are released under the terms of the GNU Lesser General Public License version 3 as published by the Free Software Foundation unless otherwise stated.
EPrints 3.4 is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with EPrints 3.4. If not, see http://www.gnu.org/licenses/.