Indexing.pl
Latest revision as of 10:21, 30 January 2022
indexing.pl is an EPrints 3 repository configuration file containing the settings for indexing data objects.
In particular, it configures whether indexing is enabled and, if so, the following rules:
$c->{indexing}->{freetext_min_word_size}
- The minimum length a word in a free-text field must have to be indexed. The default is 3.
$c->{indexing}->{freetext_stop_words}
- Words in free-text fields that are not indexed because they are too common (e.g. and, are, the, you).
$c->{indexing}->{freetext_seperator_chars}
- Characters that separate one word from the next in a free-text field (e.g. colon :, equals =, hyphen -, full stop ., space). N.B. "seperator" is a misspelling in the codebase that cannot now be corrected for legacy reasons.
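As a minimal sketch, the three settings above might be written in indexing.pl as follows. It assumes that the stop words and separator characters are stored as hash references whose keys are the words or characters (each mapped to 1), and the word and character lists shown are illustrative rather than the shipped defaults. The `my $c = {}` line only stands in for the configuration object EPrints provides when it loads the file; a real indexing.pl would contain just the assignments.

```perl
use strict;
use warnings;

# Stand-in for the repository configuration that EPrints supplies
# when it loads indexing.pl (not needed in the real file).
my $c = {};

# Only words of at least this many characters are indexed.
$c->{indexing}->{freetext_min_word_size} = 3;

# Common words that are never indexed.
# Illustrative list only; assumed to be a hash ref keyed by word.
$c->{indexing}->{freetext_stop_words} = {
	map { $_ => 1 } qw( and are the you )
};

# Characters treated as word boundaries in free text.
# Illustrative subset; note the legacy spelling "seperator" in the key.
$c->{indexing}->{freetext_seperator_chars} = {
	map { $_ => 1 } ( ':', '=', '-', '.', ' ' )
};
```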
The file also contains the extract_words function, which defines how individual words are extracted from free text. Requirements vary between types of repository, and some repositories have edge cases to handle, so this is deliberately a user-defined function that can be tailored to bespoke needs.
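The following is a simplified, hypothetical extract_words, not the one shipped with EPrints. It assumes the function is stored in `$c->{extract_words}`, is called with the repository object and the text, and returns references to two lists: words to index and words rejected. Treat that calling convention as an assumption to check against your own indexing.pl. The sketch splits the text on the configured separator characters, lowercases each word (also an assumption), and rejects words shorter than freetext_min_word_size or listed in freetext_stop_words. To stay self-contained it reads the settings from a local `$c` rather than through the repository object.

```perl
use strict;
use warnings;

# Stand-in configuration (illustrative values, see the settings above).
my $c = {};
$c->{indexing}->{freetext_min_word_size}   = 3;
$c->{indexing}->{freetext_stop_words}      = { map { $_ => 1 } qw( and are the you ) };
$c->{indexing}->{freetext_seperator_chars} = { map { $_ => 1 } ( ':', '=', '-', '.', ' ' ) };

# Hypothetical, simplified extract_words: sorts the words of a free-text
# value into those worth indexing ("good") and those rejected ("bad").
$c->{extract_words} = sub
{
	my( $repository, $text ) = @_;    # $repository is unused in this sketch

	my $cfg = $c->{indexing};

	# Build a regex character class from the configured separators,
	# escaping each character so that ".", "-" etc. are taken literally.
	my $class = join '', map { quotemeta } keys %{ $cfg->{freetext_seperator_chars} };

	my( %good, %bad );
	foreach my $word ( split /[$class]+/, $text )
	{
		next if $word eq '';          # leading separators yield an empty field
		my $lc = lc $word;            # assumption: matching is case-insensitive
		if( length( $lc ) < $cfg->{freetext_min_word_size}
			|| $cfg->{freetext_stop_words}->{$lc} )
		{
			$bad{$lc} = 1;
		}
		else
		{
			$good{$lc} = 1;
		}
	}

	# Return references to the (sorted, de-duplicated) good and bad words.
	return( [ sort keys %good ], [ sort keys %bad ] );
};
```

For example, `$c->{extract_words}->( undef, "The cat-flap is open: you know." )` returns `cat`, `flap`, `know` and `open` as words to index, and `is`, `the` and `you` as rejected words. Keeping the separator and stop-word lists in configuration means the function itself only needs changing for genuinely repository-specific edge cases.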