Adding multilang fields

From EPrints Documentation
Revision as of 12:19, 7 June 2016 by Mamalos@eng.auth.gr (talk | contribs) (Adding the appropriate phrases)

EPrints' built-in fields are not multilingual: each field exists in a single version, which in practice means it holds content in one language only. This page explains how to add multilingual versions of existing fields in EPrints and how to integrate them with its subsystems.

Multilang fields and EPrints

EPrints supports the multilang field type (see Multilang for more details) which allows a user to insert different content for different languages. There are a few limitations with multilang fields though:

  • When a multilang value is printed, it is shown in every language that has content.
  • If the type of a basic EPrints field (like title or abstract) is changed to multilang, EPrints functionality breaks because it expects a single value.

To address these limitations, this article explains how to replace a basic field with a multilingual one, using the title field as an example.

Warning for already populated repositories

Once you follow this article's procedure, existing titles and abstracts will not be copied into their multilingual counterparts and access to them via the API will be lost!

How to add a custom, multilingual field

Replacing an EPrints basic field, like the title field, involves a few steps. First, a new field needs to be created that will be able to store information for different languages; this field will be of type multilang. Next, the basic field's type needs to be replaced with one that is able to use our newly created field as its storage place. This field type will use a function wrapper for storing and retrieving information from the multilang field, hence the title field will become a calculated field.

So, in order to add multilingual support for the title field, the following actions need to take place:

  • A new field type needs to be created that will help our title field to implement some of EPrints logic. Our field type is called virtualwithvalue.
  • A new multilang field needs to be created that will store our multilingual information. We will call this field ml_title.
  • ml_title field and title field's functionality need to be introduced to the EPrints system via a configuration file located in ~eprints/archives/<reponame>/cfg/cfg.d/.
  • EPrints' database needs to be updated to include the new field.
  • The appropriate phrases need to be added for the ml_title field for each supported language.
  • The title field needs to be replaced with the ml_title field in the workflow.
  • The ml_title field in the workflow needs a custom lookup script.
  • The title field needs to be replaced with the ml_title field in the simple and advanced search scripts.
  • The repository needs to be reloaded.
  • Static files need to be regenerated if our repository already contains data.

The following sections explain each step in detail, using title and ml_title as our example fields. The code snippets are a proof of concept, for demonstration purposes only. If you want to see the final, working implementation, you should look at the source code of the MultiLang_Fields_Bazaar_Package plugin.

Adding a new field type (VirtualWithValue)

In order to create a multiple-language field we have to create an appropriate field type. EPrints' MetaField (see MetaField article for details) is a perfect candidate for this, and we need to extend it and override its get_value and set_value functions for the field to work properly with the rest of EPrints API, as well as its get_property_defaults function to sort out warnings for default values. The following code could be a rough implementation of such a field:

package EPrints::MetaField::Virtualwithvalue;

use strict;
use warnings;

use EPrints::MetaField;

our @ISA = qw( EPrints::MetaField );

sub get_property_defaults
{
    my ( $self ) = @_;
    my %defaults = $self->SUPER::get_property_defaults;
    $defaults{get_value} = $EPrints::MetaField::UNDEF;
    $defaults{set_value} = $EPrints::MetaField::UNDEF;

    return %defaults;
}

sub get_value
{
    my( $self, $object ) = @_;
    if ( defined $self->get_property("get_value") )
    {
        return $self->call_property( "get_value", $object);
    }
    return undef;
}

sub set_value
{
    my( $self, $object, $value ) = @_;
    if ( defined $self->get_property("set_value") )
    {
        return $self->call_property( "set_value", $object, $value);
    }
    return undef;
}

# a Perl module must end with a true value
1;

We could save this file in ~eprints/lib/plugins/EPrints/MetaField/Virtualwithvalue.pm.
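
To see the delegation idea in isolation, the following is a minimal sketch that runs without EPrints. MockVirtualField is a hypothetical stand-in, not an EPrints class; it only mimics the way the field type above forwards get_value and set_value to code references stored as properties.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a field object (NOT an EPrints class): it only
# stores properties and forwards get_value/set_value to the code references
# found there.
package MockVirtualField;

sub new
{
    my( $class, %properties ) = @_;
    return bless { %properties }, $class;
}

sub get_value
{
    my( $self, $object ) = @_;
    my $getter = $self->{get_value};
    return defined $getter ? $getter->( $object ) : undef;
}

sub set_value
{
    my( $self, $object, $value ) = @_;
    my $setter = $self->{set_value};
    return defined $setter ? $setter->( $object, $value ) : undef;
}

package main;

# A plain hash plays the role of the data object that owns the stored value.
my $record = { stored => 'Initial title' };

my $field = MockVirtualField->new(
    get_value => sub { my( $object ) = @_; return $object->{stored}; },
    set_value => sub { my( $object, $value ) = @_; $object->{stored} = $value; return; },
);

$field->set_value( $record, 'Updated title' );
print $field->get_value( $record ), "\n";
```

The field object itself holds no data; everything is read from and written to the object passed in, which is the same division of labour the title and ml_title fields rely on.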

Introducing ml_title field in EPrints and replacing title field's type

To inform EPrints about our new ml_title field and to switch the title field to type virtualwithvalue, we should create a configuration file, e.g. ~eprints/archives/<reponame>/cfg/cfg.d/zz_multilang_field.pl, with content like the following:

#define local fields
my $local_fields = [
{
    name => 'ml_title',
    type => 'multilang',
    multiple => 1,
    fields => [ { sub_name => "text", type => "longtext", input_rows => 3, make_single_value_orderkey => 'EPrints::Extras::english_title_orderkey' } ],
    input_add_boxes => 1,
},

{
    name => 'title',
    type => 'virtualwithvalue',
    virtual => 1,

    get_value => sub
    {
        my ($eprint) = @_;
        if ($eprint->is_set('ml_title'))
        {
            my $lang = $eprint->repository->get_langid;
            my $lang_set = 0;
            my $vals = $eprint->get_value('ml_title');
            my $title = '';
            if (!$lang)
            {
                $lang_set = 1;
            }
            else
            {
                # set the default lang's text as title
                foreach my $v1 (@{$vals})
                {
                    if ($v1->{lang} eq $lang)
                    {
                        $title = $v1->{text};
                    }
                }
            }
            # if the language is not set or no title exists in the
            # user's language, use the first entry's text as the title
            if ($lang_set or $title eq '')
            {
                $title = $vals->[0]->{text};
            }
            return $title;

        }
        return undef;
    },

    set_value => sub
    {
        my ($eprint, $value) = @_;
        my $lang = 'en';
        #only use this on imports, NOT if the value is already set
        if ($eprint->is_set('ml_title'))
        {
            return;
        }
        if ($value)
        {
            $eprint->set_value('ml_title', [{lang=>$lang, text=>$value}]);
        }
    }
},
];

#create lookup hash of local field names
my $local_fieldnames = {};

foreach my $f (@{$local_fields})
{
    $local_fieldnames->{$f->{name}} = 1;
}

#merge in existing field configurations
foreach my $f (@{$c->{fields}->{eprint}})
{
    if (!$local_fieldnames->{$f->{name}})
    {
        push @{$local_fields}, $f;
    }
}

#overwrite original array of configured fields
$c->{fields}->{eprint} = $local_fields;

In this configuration, our new ml_title field is of type multilang and the title field's type has become virtualwithvalue. Moreover, the title field now implements the two aforementioned functions, get_value and set_value. Both functions, whose names describe what they do, are used by the EPrints API, and their existence, as well as their return values, are critical for EPrints to work properly. The last statements of our example code show how a custom field can be added to the list of EPrints fields: local definitions come first, and existing definitions are appended only if no local field has the same name.
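
The merge performed by the last statements can be tried outside EPrints. The sketch below uses a hypothetical helper, merge_fields, and made-up sample data; it only illustrates that a local definition wins over an existing one with the same name.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a new list with all local definitions first,
# followed by every existing definition whose name is not overridden by a
# local one.
sub merge_fields
{
    my( $local, $existing ) = @_;

    my %is_local = map { $_->{name} => 1 } @{$local};

    return [ @{$local}, grep { !$is_local{ $_->{name} } } @{$existing} ];
}

# Made-up sample data, much smaller than a real field configuration.
my $existing = [
    { name => 'title',    type => 'longtext' },
    { name => 'abstract', type => 'longtext' },
];
my $local = [
    { name => 'ml_title', type => 'multilang' },
    { name => 'title',    type => 'virtualwithvalue' },
];

my $merged = merge_fields( $local, $existing );
print join( ', ', map { "$_->{name}:$_->{type}" } @{$merged} ), "\n";
```

Only one title definition survives, the local one, while abstract is carried over unchanged.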

In effect, the title field has become a calculated field that gets or sets its value via its corresponding multilang field (ml_title). So, each new record now has a multilang field, which it can access through the interface provided by the calculated field of type virtualwithvalue, which is none other than EPrints' basic title field.

The reason we didn't set the title field to be of type multilang in the first place is that many EPrints built-in functions expect only a single value from the title (and abstract) field, and multilang fields don't support such functionality. Hence, doing so would cause EPrints to throw errors. By extending MetaField into a calculated field type, we can always produce a single output for our title field (using the data stored in the ml_title field) via its get_value function; its set_value function is used for populating the ml_title field. Our sample code returns a value based on the user's language settings, but programmers can do whatever they wish when overriding these functions, as long as their code returns values that comply with EPrints' API.
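
The language selection with fallback can be isolated in a small function and tested without EPrints. This is a sketch under assumptions: pick_text is a hypothetical name, the sample data is made up, and, unlike the configuration above, it returns the first matching entry rather than the last.

```perl
use strict;
use warnings;

# Hypothetical helper: pick the text for $lang from a list of
# { lang => ..., text => ... } entries, else fall back to the first entry.
sub pick_text
{
    my( $values, $lang ) = @_;

    return undef if !defined $values || !@{$values};

    if( defined $lang )
    {
        foreach my $entry ( @{$values} )
        {
            next if !defined $entry->{lang} || !defined $entry->{text};
            return $entry->{text}
                if $entry->{lang} eq $lang && $entry->{text} ne '';
        }
    }

    # no language given, or no non-empty text in that language
    return $values->[0]->{text};
}

# Made-up sample titles.
my $titles = [
    { lang => 'en', text => 'A brief history of time' },
    { lang => 'de', text => 'Eine kurze Geschichte der Zeit' },
];

print pick_text( $titles, 'de' ), "\n";   # matching language
print pick_text( $titles, 'fr' ), "\n";   # falls back to the first entry
```

Whatever rule is chosen, the important property is that the function yields one scalar value (or undef), since that is what the rest of EPrints expects from the title field.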

Adding the appropriate phrases

EPrints phrases are not aware of our new field (ml_title), so we need to add phrases for it. In our example we add phrases for English and Greek (en and el respectively). We chose to add new files instead of changing the default ones so that EPrints upgrades are easier. So, for English we can add the file ~eprints/archives/reponame/cfg/lang/en/phrases/local.xml containing the following phrases (placed inside the file's root epp:phrases element):

<epp:phrase id="eprint_fieldname_ml_title">Title</epp:phrase>
<epp:phrase id="eprint_fieldname_ml_title_text">Text</epp:phrase>
<epp:phrase id="eprint_fieldname_ml_title_lang">Language</epp:phrase>
<epp:phrase id="eprint_fieldhelp_ml_title">The title of the item. The title should not end with a full stop, but may end with a question mark. There is no way to make italic text, please enter it normally. If you have a subtitle, it should be preceded with a colon [:]. Use capitals only for the first word and for proper nouns.
    <br/>Example: <span class="ep_form_example">A brief history of time</span>
    <br/>Example: <span class="ep_form_example">Life: an unauthorised biography</span>
    <br/>Example: <span class="ep_form_example">Mathematics for engineers and scientists. 5th edition</span>
    <br/>Example: <span class="ep_form_example">Ecosystems of the world. Vol. 26. Estuaries of the world</span>
</epp:phrase>

Greek phrases can be added in file ~eprints/archives/reponame/cfg/lang/el/phrases/local.xml:

<epp:phrase id="eprint_fieldname_ml_title">Τίτλος</epp:phrase>
<epp:phrase id="eprint_fieldname_ml_title_text">Κείμενο</epp:phrase>
<epp:phrase id="eprint_fieldname_ml_title_help">Το help τεξτ</epp:phrase>
<epp:phrase id="eprint_fieldname_ml_title_lang">Γλώσσα</epp:phrase>
<epp:phrase id="eprint_fieldhelp_ml_title">Ο τίτλος του τεκμηρίου. Ο τίτλος δεν πρέπει να τελειώνει με τελεία, αλλά μπορεί να τελειώνει με ερωτηματικό. Δεν υπάρχει τρόπος να γράψετε με πλάγια γράμματα, παρακαλώ χρησιμοποιήστε απλό κείμενο. Εάν έχετε έναν υπότιτλο, θα πρέπει να προηγείται η άνω και κάτω τελεία του υπότιτλου [:]. Χρησιμοποιήστε κεφαλαία γράμματα μόνο στην πρώτη λέξη και στα κύρια ονόματα.
    <br/>Παράδειγμα: <span class="ep_form_example">Μια σύντομη ιστορία</span>
    <br/>Παράδειγμα: <span class="ep_form_example">Καβάφης: η βιογραφία</span>
    <br/>Παράδειγμα: <span class="ep_form_example">Μαθηματικά για μηχανικούς και επιστήμονες. 5η έκδοση</span>
    <br/>Παράδειγμα: <span class="ep_form_example">Οικοσυστήματα του πλανήτη. Τόμ. 26. Εκβολές του πλανήτη.</span>
</epp:phrase>

Replacing the title field with ml_title field in the workflow

In order to use our new ml_title field in the workflow, we need to replace the existing one (title). So, the title field is commented out and the new multilingual one is added. This means that ~eprints/archives/reponame/cfg/workflows/eprint/default.xml is edited as follows:

<!--    
    <component><field ref="title" required="yes" input_lookup_url="{$config{rel_cgipath}}/users/lookup/title_duplicates" input_lookup_params="id={eprintid}&amp;dataset=eprint&amp;field=title" /></component>
    <component><field ref="abstract"/></component>
-->
 
    <component><field ref="ml_title" required="yes" input_lookup_url="{$config{rel_cgipath}}/users/lookup/ml_title_duplicates" input_lookup_params="id={eprintid}&amp;dataset=eprint&amp;field=ml_title"/></component>
    <component><field ref="ml_abstract"/></component>

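As can be seen, the default lookup script is replaced by the plugin's lookup script (ml_title_duplicates), which supports our new ml_title field. The snippet also swaps abstract for ml_abstract, a multilingual abstract field that would have to be set up in the same way as ml_title.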

Updating EPrints database to include the ml_title field

The EPrints database is updated by running the following command as the eprints user, from that user's home directory:

$ ./bin/epadmin update reponame


Adding a custom lookup script for ml_title autocompletion

We copy the default lookup script to a new one (~eprints/cgi/users/lookup/ml_title_duplicates in our example):

$ cp ~eprints/cgi/users/lookup/title_duplicates ~eprints/cgi/users/lookup/ml_title_duplicates

and edit the SQL statement to contain ml_title instead of title (line 70 in EPrints 3.3.14):

my $sql = "SELECT ep.eprintid, ml_title_text FROM eprint AS ep JOIN eprint_ml_title_text AS ml ON ep.eprintid = ml.eprintid WHERE ";
if ($dataset_name eq "eprint") {
    $sql .= " $Q_eprint_status=" .  $db->quote_value( "archive" ) . " AND ";
}
$sql .= "ml.ml_title_text IS NOT NULL" .
    " AND ml.ml_title_text " .
    $db->sql_LIKE() .
    $db->quote_value( EPrints::Database::prep_like_value( $q ) . '%' );
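
The last two lines of the statement build a prefix match: the user's input is prepared for use in LIKE and a trailing '%' is appended. The sketch below shows that idea in plain Perl. like_prefix_pattern is a hypothetical helper, not EPrints code, and the backslash escape character is an assumption that depends on the database in use.

```perl
use strict;
use warnings;

# Hypothetical helper, not EPrints code: escape the LIKE wildcards in the
# user's input (assuming backslash is the escape character) and append '%'
# so that the pattern matches by prefix only.
sub like_prefix_pattern
{
    my( $input ) = @_;

    ( my $escaped = $input ) =~ s/([\\%_])/\\$1/g;

    return $escaped . '%';
}

print like_prefix_pattern( '100%_sure' ), "\n";
```

Without the escaping step, a '%' or '_' typed by the user would act as a wildcard and the autocompletion would return unrelated titles.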

Adding search support for the ml_title field

We should make the search scripts search ml_title instead of title, since title is now a calculated field and holds no data in the EPrints database. To do so, we add two configuration files, one for each search type. We chose not to change EPrints' default search configuration files so that future EPrints upgrades are not affected. So, for the simple search we add the file ~eprints/archives/reponame/cfg/cfg.d/eprint_search_simple.pl with the content:

$c->{search}->{simple} = 
{
    search_fields => [
        {
            id => "q",
            meta_fields => [
                "documents",
                "ml_title",
                "abstract",
                "creators_name",
                "date" 
            ]
        },
    ],
#    preamble_phrase => "cgi/search:preamble",
    title_phrase => "cgi/search:simple_search",
    citation => "result",
    page_size => 20,
    order_methods => {
        "byyear"      => "-date/creators_name/title",
        "byyearoldest"     => "date/creators_name/title",
        "byname"       => "creators_name/-date/title",
        "bytitle"      => "title/creators_name/-date" 
    },
    default_order => "byyear",
    show_zero_results => 1,
};

And for the advanced search we add the file: ~eprints/archives/reponame/cfg/cfg.d/eprint_search_advanced_local.pl that reads:

$c->{search}->{advanced} =
{
    search_fields => [
        { meta_fields => [ "documents" ] },
        { meta_fields => [ "ml_title" ] },
        { meta_fields => [ "creators_name" ] },
        { meta_fields => [ "abstract" ] },
        { meta_fields => [ "date" ] },
        { meta_fields => [ "keywords" ] },
        { meta_fields => [ "subjects" ] },
        { meta_fields => [ "type" ] },
        { meta_fields => [ "department" ] },
        { meta_fields => [ "editors_name" ] },
        { meta_fields => [ "ispublished" ] },
        { meta_fields => [ "refereed" ] },
        { meta_fields => [ "publication" ] },
        { meta_fields => [ "documents.format" ] },
    ],
    preamble_phrase => "cgi/advsearch:preamble",
    title_phrase => "cgi/advsearch:adv_search",
    citation => "result",
    page_size => 20,
    order_methods => {
        "byyear"     => "-date/creators_name/title",
        "byyearoldest"   => "date/creators_name/title",
        "byname"     => "creators_name/-date/title",
        "bytitle"    => "title/creators_name/-date" 
    },
    default_order => "byyear",
    show_zero_results => 1,
};
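
Both files differ from the defaults only in the field name used in meta_fields. As an illustration of that change, the sketch below swaps the name inside a plain data structure. swap_search_field is a hypothetical helper and the hash is a cut-down sample; whether such a helper could be applied to the live configuration depends on the order in which the cfg.d files are loaded, which is not covered here.

```perl
use strict;
use warnings;

# Hypothetical helper: replace one field name with another in every
# meta_fields list of a search configuration hash, in place.
sub swap_search_field
{
    my( $search, $old, $new ) = @_;

    foreach my $search_field ( @{ $search->{search_fields} || [] } )
    {
        # $name is an alias, so assigning to it changes the list element
        foreach my $name ( @{ $search_field->{meta_fields} || [] } )
        {
            $name = $new if $name eq $old;
        }
    }

    return $search;
}

# Cut-down sample resembling the advanced search configuration.
my $advanced = {
    search_fields => [
        { meta_fields => [ "documents" ] },
        { meta_fields => [ "title" ] },
        { meta_fields => [ "creators_name" ] },
    ],
    order_methods => { "bytitle" => "title/creators_name/-date" },
};

swap_search_field( $advanced, "title", "ml_title" );
print join( ', ', map { @{ $_->{meta_fields} } } @{ $advanced->{search_fields} } ), "\n";
```

The order_methods entries are deliberately left untouched, as in the two files above, where ordering still refers to title.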

Reloading our repository

In order for our changes to take effect, we should reload our repository by running:

$ ./bin/epadmin reload reponame

within eprints user's home directory as user eprints.

Regenerating static files and abstracts

If our repository already contained records, we need to recreate static content such as static pages and abstracts. Hence, as eprints user we should run:

$ ./bin/generate_abstracts reponame
$ ./bin/generate_static reponame

within eprints user's home directory.