API:EPrints/Index/Tokenizer


NAME

EPrints::Index::Tokenizer - text indexing utility methods.


DESCRIPTION

This module provides utility methods for processing free text into indexable terms.


METHODS


split_words

@words = EPrints::Index::Tokenizer::split_words( $session, $utext )

Splits the UTF-8 string $utext into individual words and returns them as a list.
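
The general idea of word splitting can be sketched without an EPrints installation. The following is a minimal illustration only, not the EPrints implementation: the function name sketch_split_words is hypothetical, and the assumption that every character other than a letter or digit acts as a separator may not match the rules split_words actually applies.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical stand-in for split_words (NOT the EPrints code):
# break a decoded Unicode string on any run of characters that are
# neither letters nor digits. The separator rule is an assumption.
sub sketch_split_words
{
    my( $utext ) = @_;

    # grep drops the empty leading field that split produces when
    # the text starts with a separator.
    return grep { length } split /[^\p{L}\p{N}]+/, $utext;
}

my @words = sketch_split_words( "Open access: repositories & e-prints" );
# @words is ( "Open", "access", "repositories", "e", "prints" )
```

Under this assumed rule a hyphenated word such as "e-prints" becomes two words; a real tokenizer may treat hyphens and apostrophes differently.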


split_search_value

@terms = EPrints::Index::Tokenizer::split_search_value( $session, $value )

Splits $value into search terms and returns them.


apply_mapping

$utext2 = EPrints::Index::Tokenizer::apply_mapping( $session, $utext )

Replaces certain Unicode characters in $utext with ASCII equivalents and returns the new string.

This is applied to words before they are indexed so that diacritics such as umlauts are ignored when searching.
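
The effect of such a mapping can be approximated in a standalone script. The sketch below is an illustration under stated assumptions, not the mapping EPrints uses: the function name sketch_apply_mapping is hypothetical, and it swaps in Unicode canonical decomposition (via the core Unicode::Normalize module) in place of whatever character table apply_mapping consults.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw( NFD );

# Hypothetical stand-in for apply_mapping (NOT the EPrints code):
# strip diacritics by decomposing each character and discarding the
# combining marks. EPrints may map characters differently.
sub sketch_apply_mapping
{
    my( $utext ) = @_;

    # NFD turns e.g. U+00FC into "u" followed by U+0308
    # (combining diaeresis).
    my $decomposed = NFD( $utext );

    # Remove the nonspacing combining marks, keeping the base letters.
    $decomposed =~ s/\p{Mn}+//g;

    return $decomposed;
}

my $mapped = sketch_apply_mapping( "Müller über Gödel" );
# $mapped is "Muller uber Godel"
```

Decomposition only helps for characters that have a base letter plus a combining mark; letters such as "ß" or "ø" pass through unchanged and would need explicit entries in a mapping table.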


COPYRIGHT

© Copyright 2000-2024 University of Southampton.

EPrints 3.4 is supplied by EPrints Services.

http://www.eprints.org/eprints-3.4/

LICENSE

This file is part of EPrints 3.4 http://www.eprints.org/.

EPrints 3.4 and this file are released under the terms of the GNU Lesser General Public License version 3 as published by the Free Software Foundation unless otherwise stated.

EPrints 3.4 is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with EPrints 3.4. If not, see http://www.gnu.org/licenses/.
