Translation

From EPrints Documentation
Revision as of 13:46, 28 June 2007 by Csirmaz (talk | contribs) (Language dependent date and time)

Translating EPrints 3 is similar to translating earlier versions, so only some of the differences are pointed out here. There is also a difference between merely translating the site and making it multilingual. If you are making a translation, then with little extra effort you can make your site speak English as well, a courtesy to casual visitors of your repository.

Multilingual sites

Selecting the session's language

In a multilingual site you will probably let visitors choose the language of the session. By default it is determined by the language preferences of the browser, but sometimes users want to change it. The session's language can be set manually with the [port] of the set_language script from earlier EPrints versions, which should go into the cgi directory. A handy place for the set_language URL is the menu at the top of the page. You might want to edit /lang/en/templates/default.xml to include it there:

 <ul class="ep_tm_menu">
  <li><a href="{$config{frontpage}}">Home</a></li>
  <li><a href="{$config{perl_url}}/set_lang">Language</a></li>
  <li><a href="{$config{base_url}}/information.html">About</a></li>
  ... 
 </ul>

The templates for other languages should not contain this Language item, as visitors will not necessarily recognise it. Instead, use a button that switches the language back to English:

<li><a href="{$config{perl_url}}/set_lang?langid=en">In English</a></li>

If you want a fancier layout, consider copying the [flag images] to the directory static/style/images/flags/. If you define a phrase (see the #Phrases section) of the form "cgi/set_lang:lang_XX", the set_lang script will use that phrase to render the link to that language. A typical format could be

 <epp:phrase id="cgi/set_lang:lang_hu"><epc:pin name="link">
     <img src="/style/images/flags/flag_hu.png" alt="[hu]" />
     Hungarian</epc:pin>
 </epp:phrase>
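
The selection order described above (a manual choice first, then the browser preferences, then the repository default) can be sketched as a small stand-alone routine. This is only an illustration and not the code EPrints itself uses: the function name choose_language, its arguments and the sample header values are all assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper, not the EPrints implementation: choose a session
# language from an explicit override (e.g. one set via set_lang), then from
# the browser's Accept-Language header, then from the repository default.
sub choose_language
{
    my( $supported, $default, $override, $accept_language ) = @_;

    my %ok = map { $_ => 1 } @$supported;

    # 1. A manual choice wins if it names a supported language.
    return $override if defined $override && $ok{$override};

    # 2. Otherwise collect the browser preferences with their q-values.
    $accept_language = "" unless defined $accept_language;
    my @prefs;
    foreach my $item ( split /\s*,\s*/, $accept_language )
    {
        my( $tag, @params ) = split /\s*;\s*/, $item;
        next unless defined $tag && length $tag;
        my $q = 1;
        foreach my $p ( @params )
        {
            $q = $1 if $p =~ /^q=(\d(?:\.\d+)?)$/;
        }
        # Only the primary subtag is compared ("en-GB" counts as "en").
        my( $primary ) = split /-/, lc $tag;
        push @prefs, [ $primary, $q ];
    }
    foreach my $pref ( sort { $b->[1] <=> $a->[1] } @prefs )
    {
        return $pref->[0] if $ok{ $pref->[0] };
    }

    # 3. Nothing matched: use the repository default.
    return $default;
}

my @supported = ( "en", "hu" );
print choose_language( \@supported, "en", undef, "hu-HU,hu;q=0.9,en;q=0.8" ), "\n";
print choose_language( \@supported, "en", "en",  "hu-HU,hu;q=0.9" ), "\n";
print choose_language( \@supported, "en", undef, "de,fr;q=0.5" ), "\n";
```

The first call follows the browser preference, the second honours the manual override, and the third falls back to the default because neither German nor French is supported.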

Setting the language of outgoing e-mail messages

By default, all e-mail messages are sent out in the default language of the repository. You might let users choose their preferred language so that they receive messages in that language. The lang (system) field is defined for all users; its value can be set automatically in defaultcfg/cfg.d/user_fields_automatic.pl, for example by

$c->{set_user_automatic_fields} = sub
{
   my( $user ) = @_;
   if( !$user->is_set( "frequency" ) )
   {
        $user->set_value( "frequency", "never" );
   }
   ## NEW: set default language to the session language
   if( !$user->is_set( "lang" ) )
   {
        $user->set_value( "lang", $user->{session}->{lang}->{id} );
   }
};

and also let the user edit their own preference by inserting the lang field in the defaultcfg/workflows/user/default.xml workflow, say in the personal section:

  <component type="Field::Multi">
     <title><epc:phrase ref="user_section_personal" /></title>
     <field ref="name" required="yes" />
     <field ref="lang"/> <!-- NEW: edit preferred language -->
     <field ref="dept"/>
     <field ref="org"/>
     <field ref="address"/>
     <field ref="country"/>
     <field ref="url"/>
   </component>

Finally, you might want the preferred language to appear on the user's profile page. This requires editing the cfg/cfg.d/user_render.pl file and inserting the following at the place of your choice (I chose just next to "country"):

 if( $user->is_set( "lang" ) )
 {
      $p->appendChild($session->make_element( "br" ) );
      $p->appendChild($session->html_phrase("user_preferred_language",
                lang => $user->render_value( "lang" ) ) );
 }

Here the new phrase "user_preferred_language" should be defined in one of the phrases files (see the #Phrases section); in it, the pin "lang" contains the language itself:

 <epp:phrase id="user_preferred_language">Preferred language: 
      <epc:pin name="lang"/>
 </epp:phrase>

WARNING! While e-mail messages are sent out utf-8 encoded, they are not properly formatted. For non-English e-mails you must apply some of the patches from the #Patches section. Also, the text version of the e-mails does not contain the links. This can also be corrected by one of the patches.

Setting document language

Each document file has its own language, and the same document might be submitted in different languages. In the default eprints workflow the reference to the language field is commented out; you only have to enable it. It is in the file defaultcfg/workflows/eprint/default.xml, stage "files":

<stage name="files">
  <component type="XHTML">...</component>
  <component type="Upload">
     <field ref="format" />
     <field ref="formatdesc" />
     <field ref="security" />
     <field ref="license" />
     <field ref="date_embargo" />
     <field ref="language" /> <!-- UNCOMMENT! -->
 </component>
</stage>

The default value is set in cfg.d/document_fields_default.pl to the language of the session. The available values are listed in lib/defaultcfg/namedsets/languages; you might consider revising this set. The "undefined" language is a question mark (?).

You might also want to display the document's language. To do so, edit the citations/document/default.xml file. Show the language only when it is set and the document's MIME type is text/plain, application/pdf, application/postscript, application/msword, or other. (For a full list of available MIME types see lib/defaultcfg/namedsets/document.)

<cite:linkhere>...</cite:linkhere>
<epc:if test="security != 'public'"> ... </epc:if>
<!-- NEW!!! -->
<epc:if test="is_set(language) and format.one_of( 'text/plain' ,
    'application/pdf', 'application/postscript', 'application/msword', 'other')">
   <epc:phrase ref="citation:doc_language">
        <epc:param name="lang">
             <epc:print expr="language"/>
        </epc:param>
   </epc:phrase>
 </epc:if>

and, of course, one has to define the "citation:doc_language" phrase (see below) in all languages. In the phrase definition we have the pin "lang" which contains the document language:

 <epp:phrase id="citation:doc_language">
   (The document's language is <epc:pin name="lang"/>.)
</epp:phrase>

WARNING! The example above works only after you've applied some of the patches in the #Patches section.

Phrases

Other configuration files

Several language-independent configuration files contain hardwired English text. All such text should be replaced by references to phrases, which, in turn, supply the text in every language. The "offending" files are in the citations and the workflows directories.

Subject list

Patches

Apart from translating the language-dependent phrase files, some other modifications are needed for proper functioning. These patches are for EPrints version 3.0.1; please check your current version.

  • proper utf-8 encoding for outgoing e-mails
  • citation phrases with embedded pins
  • phrases where the same link is used twice
  • expiration time for authentication codes (e.g. at registering) (hardcoded in English)
  • generation time for browse pages (hardcoded in English)
  • user name rendering depends on session language

Proper utf-8 encoding for outgoing e-mails

For outgoing e-mails the charset should be "utf-8". The mail headers (From, To, Subject, Reply-To, etc.) may contain only a restricted set of ASCII characters; all other characters must be encoded. This EPrints.pm.diff corrects both problems, and it also adds the URL of each link to the text version and converts wide characters to bytes (otherwise the Net::SMTP Perl module complains).

See the tech list references [5762], [5783] and [7114].
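
The two encoding steps named above can be tried in isolation with Perl's standard Encode module. The following is a minimal sketch and not the content of the patch: the sample subject and body are made up, and the Content-Transfer-Encoding line is an assumption about how such a message might be labelled.

```perl
use strict;
use warnings;
use Encode qw( encode decode );

# Illustrative values only, written with escapes so this file stays ASCII:
# the subject is Hungarian for "notification", the body for "confirmation".
my $subject = "\x{c9}rtes\x{ed}t\x{e9}s";
my $body    = "Meger\x{151}s\x{ed}t\x{e9}s\n";

# Header lines may carry ASCII only, so non-ASCII text is wrapped in
# RFC 2047 encoded-words.
my $subject_header = encode( "MIME-Header", $subject );

# The body is sent as bytes. Encoding it explicitly avoids the
# "Wide character" complaint for characters above U+00FF.
my $body_bytes = encode( "UTF-8", $body );

print "Subject: $subject_header\n";
print "MIME-Version: 1.0\n";
print "Content-Type: text/plain; charset=utf-8\n";
print "Content-Transfer-Encoding: 8bit\n";
print "\n";
print $body_bytes;
```

Decoding the header again with decode( "MIME-Header", ... ) gives back the original subject, which is a convenient way to check the result.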

Citation phrases with embedded pins

You need to apply the second half of the XML/EPC.pm.diff patch if you want to use citations with embedded pins. See tech list [7236].

Phrases using the same link twice

If you use the same link pin twice in a phrase, the second occurrence does not render correctly. E.g. if you want both a picture and a text to refer to the same link, such as

<epp:phrase id="two_links">
  <epc:pin name="link"><img src="img.png" alt="[img]" /></epc:pin>
  <epc:pin name="link">Click here</epc:pin>
</epp:phrase>

For both links to work correctly please apply the first half of the XML/EPC.pm.diff patch. See tech list [6960].

Language dependent date and time

The wording of the grace period (needed when someone registers or changes their e-mail address) is generated in the Eprints/Time.pm script by the human_delay() routine. The creation time at the bottom of browse pages is also produced there, by human_time().

human_delay() always creates English text, while human_time() uses the server's locale settings. In a multilingual environment neither of them does the right thing. Both should produce an XML fragment (rather than a plain string), and should receive the current $session as an argument so that they can decide on the correct format.

Looking at the code, human_delay() is invoked only in the cgi scripts register and reset_password, while human_time() produces visible text only in the admin script generate_views. Its other invocations (index, saved search, epadmin/reload) use it as a timestamp, and there the time format should not depend on the current session.
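
To illustrate the idea of a delay routine that is told which language to use, here is a minimal stand-alone sketch. It is not the EPrints code and not a patch: the function name localized_delay, the phrase table and its wording are assumptions made for this example, and it returns a plain string keyed by a language id instead of building an XML fragment from a $session. Plural forms are ignored.

```perl
use strict;
use warnings;

# Hypothetical phrase table standing in for the EPrints phrase files;
# the keys and the wording are made up for this illustration.
my %PHRASES = (
    en => { hours => "%d hours",      days => "%d days", weeks => "%d weeks" },
    hu => { hours => "%d \x{f3}ra",   days => "%d nap",  weeks => "%d h\x{e9}t" },
);

# Language-aware delay formatter: the language id is passed in explicitly
# instead of the text always being produced in English.
sub localized_delay
{
    my( $langid, $hours ) = @_;

    # Unknown languages fall back to English.
    my $phrases = $PHRASES{$langid} || $PHRASES{en};

    # Pick the largest unit that divides the delay exactly.
    if( $hours >= 24 * 7 && $hours % ( 24 * 7 ) == 0 )
    {
        return sprintf( $phrases->{weeks}, $hours / ( 24 * 7 ) );
    }
    if( $hours >= 24 && $hours % 24 == 0 )
    {
        return sprintf( $phrases->{days}, $hours / 24 );
    }
    return sprintf( $phrases->{hours}, $hours );
}

binmode( STDOUT, ":encoding(UTF-8)" );
print localized_delay( "en", 48 ), "\n";
print localized_delay( "hu", 48 ), "\n";
print localized_delay( "hu", 5 ), "\n";
```

In a real patch the lookup would presumably go through the phrase system of the session, so that the wording lives in the phrase files with all other translated text.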

Rendering user names depending on session language

Every field can have its own rendering routine, given as the value of the render_single_value attribute. In our case, modify the top of the cfg.d/user_fields.pl file, which contains the definition of the user fields, as follows:

 $c->{fields}->{user} = [
     {
         'name' => 'name',
         'type'   => 'name',
         'render_order' => 'gf',
         'render_single_value' => \&my_namefield_rendering,
     },
     ...

and insert the definition of my_namefield_rendering e.g. at the end of the same file:

sub my_namefield_rendering
{
   my( $session, $field, $value, $object ) = @_;
   my $langid = $session->{lang}->{id};
   # Format characters: f - family, g - given, h - honourific, l - lineage;
   # any other character is copied literally as a separator.
   my %formats = (
     'en' => 'hfl,g',
     'hu' => 'hlfg',
     # Further lines should be added for the other languages in use
   );
   # Fall back to the English format for languages not listed above
   my $format = $formats{$langid} || $formats{'en'};
   my $all = "";
   foreach my $fmtchar ( split //, $format ) {
       my $insert = "";
       if( $fmtchar eq "l" ) { $insert = $value->{lineage}; }
       elsif( $fmtchar eq "f" ) { $insert = $value->{family}; }
       elsif( $fmtchar eq "g" ) { $insert = $value->{given}; }
       elsif( $fmtchar eq "h" ) { $insert = $value->{honourific}; }
       elsif( $all ) { $all .= $fmtchar; }   # separator, skipped at the start
       next if( !$insert );
       $all .= $insert;
   }
   my $span = $session->make_element( "span", class => "person_name" );
   $span->appendChild( $session->make_text( $all ) );
   return $span;
}
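
To experiment with format strings without a running repository, the loop can be copied into a stand-alone helper. The sketch below is a re-implementation for testing only: the name format_name and the sample name are assumptions, and it returns a plain string rather than a span element.

```perl
use strict;
use warnings;

# Stand-alone version of the format-string loop, without the EPrints
# session, so that format strings can be tried from the command line.
sub format_name
{
    my( $format, $value ) = @_;

    my %part_for = ( l => "lineage", f => "family", g => "given", h => "honourific" );

    my $all = "";
    foreach my $fmtchar ( split //, $format )
    {
        if( exists $part_for{$fmtchar} )
        {
            # A name part: append it if the name has a value for it.
            my $insert = $value->{ $part_for{$fmtchar} };
            $all .= $insert if defined $insert && length $insert;
        }
        elsif( length $all )
        {
            # Any other character is a literal separator; it is dropped
            # while the output is still empty.
            $all .= $fmtchar;
        }
    }
    return $all;
}

my $name = { family => "Smith", given => "John" };
print format_name( "f, g", $name ), "\n";
print format_name( "g f", $name ), "\n";
print format_name( "fg", $name ), "\n";
```

Separators, including spaces, appear in the output only if they are written into the format string: "f, g" gives "Smith, John", while "fg" runs the two parts together.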

Indexing