Translation

From EPrints Documentation
Revision as of 12:47, 4 July 2007 by Csirmaz (talk | contribs) (Phrases)

Translating EPrints 3 is similar to translating earlier versions; only some of the differences are pointed out here. There is also a difference between merely translating a site and making it multilingual. If you are making a translation, then with little extra effort you can keep your site available in English as well, a courtesy to casual visitors of your repository.

Multilingual sites

To customize EPrints you will need to change several configuration files, so this is a good time to familiarize yourself with the EPrints3 Directory Structure. Certain files are common to all archives; change these only if you really know what you are doing. Other files are repository specific, and any change to them affects that repository only.

Repository-dependent files are below archives; each repository has its own substructure. When a new repository is created, its configuration files are copied from lib/defaultcfg. Thus changes made in lib/defaultcfg or below affect all newly created repositories but have no effect on existing ones. You will probably want to make certain changes before creating repositories, e.g. adding further languages to lib/defaultcfg/lang (and also to lib/lang), changing the default language and the available languages in lib/defaultcfg/cfg.d/languages.pl, correcting files in lib/defaultcfg/citations and lib/defaultcfg/workflows, etc.
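As a rough illustration of the languages.pl change mentioned above, the fragment below enables Hungarian next to English and makes Hungarian the default. The variable names languages and defaultlanguage are an assumption based on the stock file; compare them with the languages.pl shipped with your version before editing.

```perl
# cfg.d/languages.pl (sketch; variable names assumed, check your own copy)
# Languages offered to visitors of the repository
$c->{languages} = [ "en", "hu" ];
# Language used when no other choice has been made
$c->{defaultlanguage} = "hu";
```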

In the description below we assume that you have already set up the repository, and we give directory and file names, wherever appropriate, relative to that repository. It should be straightforward to apply the same changes so that they take effect for all newly created archives.

Selecting the session's language

In a multilingual site you will probably let visitors choose the language of the session. By default it is determined by the browser's language preferences, but sometimes users want to change it. The session language can be set manually with a port of the set_language script from earlier EPrints versions; the script should go into the cgi directory. A handy place for the set_language URL is the menu at the top of the page. As the default page template is repository dependent, you should edit archives/ArchiveID/lang/en/templates/default.xml, the English language template for that repository, as follows (but see the remarks above):

 <ul class="ep_tm_menu">
   <li><a href="{$config{frontpage}}">Home</a></li>
   <li><a href="{$config{perl_url}}/set_lang">Language</a></li>
   <li><a href="{$config{base_url}}/information.html">About</a></li>
   ...
 </ul>

The templates for other languages should not contain this Language item, as visitors will not necessarily recognise the word. Use a link that switches the language back to English instead, i.e. insert the following into archives/ArchiveID/lang/XX/templates/default.xml:

<li><a href="{$config{perl_url}}/set_lang?langid=en">In English</a></li>

If you want a fancier layout, you might use flag images, copied to the directory lib/static/style/images/flags/. If you define a phrase (see the #Phrases section) of the form "cgi/set_lang:lang_XX", then the set_lang script will use that phrase to render the link to that language. A typical definition could be

 <epp:phrase id="cgi/set_lang:lang_hu"><epc:pin name="link">
      <img src="/style/images/flags/flag_hu.png" alt="[hu]" />
      Hungarian</epc:pin>
 </epp:phrase>

Setting the language of outgoing e-mail messages

By default, all e-mail messages are sent out in the default language of the repository. You might let users choose their preferred language so that they receive messages in that language. The lang (system) field is defined for all users; its value can be set automatically in cfg/cfg.d/user_fields_automatic.pl, for example by

$c->{set_user_automatic_fields} = sub
{
   my( $user ) = @_;
   if( !$user->is_set( "frequency" ) )
   {
        $user->set_value( "frequency", "never" );
   }
   ## NEW: set default language to the session language
   if( !$user->is_set( "lang" ) )
   {
        $user->set_value( "lang", $user->{session}->{lang}->{id} );
   }
};

and also let the user edit their own preference by inserting the lang field in the cfg/workflows/user/default.xml workflow, say in the personal section:

  <component type="Field::Multi">
     <title><epc:phrase ref="user_section_personal" /></title>
     <field ref="name" required="yes" />
     <field ref="lang"/> <!-- NEW!!! edit preferred language -->
     <field ref="dept"/>
     <field ref="org"/>
     <field ref="address"/>
     <field ref="country"/>
     <field ref="url"/>
   </component>

Finally, you might want the preferred language to appear on the user's profile page. This requires editing the cfg/cfg.d/user_render.pl file and inserting the following at the place of your choice (I chose the spot just next to "country"):

 if( $user->is_set( "lang" ) )
 {
      $p->appendChild($session->make_element( "br" ) );
      $p->appendChild($session->html_phrase("user_preferred_language",
                lang => $user->render_value( "lang" ) ) );
 }

Here the new phrase "user_preferred_language" should be defined in one of the phrase files (see the #Phrases section); in it the pin "lang" holds the language itself:

 <epp:phrase id="user_preferred_language">Preferred language: 
      <epc:pin name="lang"/>
 </epp:phrase>

WARNING! While e-mail messages are sent out utf-8 encoded, they are not properly formatted. For non-English e-mails you must apply the "Proper utf-8 encoding for outgoing e-mails" patch from the #Patches section. Also, by default, the text version of the e-mail does not contain links. This is also corrected by this patch.

Setting document language

Each document file has its own language, and the same document might be submitted in different languages. In the default eprints workflow the reference to the language field is commented out; you only have to enable it. It is in the file cfg/workflows/eprint/default.xml, stage "files":

<stage name="files">
  <component type="XHTML">...</component>
  <component type="Upload">
     <field ref="format" />
     <field ref="formatdesc" />
     <field ref="security" />
     <field ref="license" />
     <field ref="date_embargo" />
     <field ref="language" /> <!-- UNCOMMENT! -->
 </component>
</stage>

The default value is set in cfg/cfg.d/document_fields_default.pl to the language of the session. The available values are listed in cfg/namedsets/languages; you might consider revising this set. The "undefined" language is a question mark (?).
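For orientation, the default-setting code in document_fields_default.pl might look roughly like the sketch below. The hook name set_document_defaults and its argument list are assumptions, not taken from this page, so check them against the file in your installation; the session language is read in the same way as in the name-rendering example further down.

```perl
# cfg/cfg.d/document_fields_default.pl (sketch; hook name and arguments assumed)
$c->{set_document_defaults} = sub
{
    my( $data, $session, $eprint ) = @_;
    # Default the document language to the language of the current session
    $data->{language} = $session->{lang}->{id};
};
```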

You might also want to show the language in the document citation. To this end, edit the cfg/citations/document/default.xml file. Show the language only when it is set and the document's MIME type is text/plain, application/pdf, application/postscript, application/msword, or other. (For a full list of the available MIME types see cfg/namedsets/document.)

<cite:linkhere>...</cite:linkhere>
<epc:if test="security != 'public'"> ... </epc:if>
<!-- NEW!!! -->
<epc:if test="is_set(language) and format.one_of( 'text/plain' ,
    'application/pdf', 'application/postscript', 'application/msword', 'other')">
   <epc:phrase ref="citation:doc_language">
        <epc:param name="lang">
             <epc:print expr="language"/>
        </epc:param>
   </epc:phrase>
 </epc:if>

and, of course, one has to define the "citation:doc_language" phrase (see below) in all languages. In the phrase definition the pin "lang" holds the name of the document language as written in the current session language (defined by the phrase "languages_typename_XX"):

 <epp:phrase id="citation:doc_language">
   (The document's language is <epc:pin name="lang"/>.)
</epp:phrase>

WARNING! The example above works only after you've applied the "Citation phrases with embedded pins" patch in the #Patches section.

Phrases

Phrase definitions are scattered in several places. Some are repository independent and some are repository dependent. If the same phrase is defined in both places, the repository dependent definition takes precedence.

Phrases are parsed and stored during web server initialization for all repositories. If you change any of the phrase files, you must restart the Apache server to see the effect.

System-wide (i.e. repository independent) phrases are contained in the files in lib/lang/XX/phrases/ where XX is the language code. Repository dependent phrase files are in archives/ARCHIVEID/cfg/lang/XX/phrases/. In both directories all files with the extension ".xml" are scanned and read in alphabetical order. You can split the data into several files instead of having a single huge phrase file.

All phrase files must have proper XML syntax. Use existing files as templates. The very first line tells you which encoding is applied in the file:

<?xml version="1.0" encoding="iso-8859-1" standalone="no" ?>

If no encoding is given, then utf-8 is assumed (and you get error messages if the actual encoding is different). Set the encoding to the one you prefer. Remember that with iso-8859-X you can enter only a limited set of characters.

You can define as many new phrases as you want, and refer to them in other phrases. When a phrase is used, its last existing definition applies. This means that forward references cause no problem.
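The lookup rules described in this section can be modelled with a small standalone script. This is only a toy sketch: the phrase "files" are plain Perl hashes rather than XML, and load_phrases, lookup_phrase and the file names are hypothetical names introduced for the illustration, not EPrints functions.

```perl
use strict;
use warnings;

# Merge phrase "files" in alphabetical order; a later definition replaces an earlier one.
sub load_phrases {
    my ($files) = @_;
    my %phrases;
    for my $name ( sort keys %$files ) {
        my $defs = $files->{$name};
        @phrases{ keys %$defs } = values %$defs;
    }
    return \%phrases;
}

# Repository dependent definitions take precedence over system-wide ones.
sub lookup_phrase {
    my ( $id, $repo, $system ) = @_;
    return $repo->{$id} if exists $repo->{$id};
    return $system->{$id};    # undef when the phrase is not defined anywhere
}

my $system = load_phrases({
    "a_system.xml" => { greeting => "Hello", farewell => "Bye" },
    "z_local.xml"  => { farewell => "Goodbye" },
});
my $repo = load_phrases({
    "custom.xml" => { greeting => "Welcome to our archive" },
});

print lookup_phrase( "greeting", $repo, $system ), "\n";    # repository definition
print lookup_phrase( "farewell", $repo, $system ), "\n";    # last system-wide definition
```

Here "greeting" resolves to the repository's own text, while "farewell" falls back to the system-wide files, where the definition read last replaces the earlier one.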


Other configuration files

Several language-independent configuration files contain hardwired English text. All of it should be replaced by references to phrases, which, in turn, produce the text in every language. The "offending" files are in the citations and the workflows directories.

Subject list

Patches

Apart from translating the language dependent phrase files, some other modifications are needed for everything to work properly. These patches are for EPrints version 3.0.1; please check your current version.

  • proper utf-8 encoding for outgoing e-mails
  • citation phrases with embedded pins
  • phrases where the same link is used twice
  • expiration time for authentication codes (e.g. at registering) (hardcoded in English)
  • generation time for browse pages (hardcoded in English)
  • user name rendering depends on session language

Proper utf-8 encoding for outgoing e-mails

For outgoing e-mails the charset should be "utf-8". The mail headers (the From, To, Subject, Reply-To, etc. lines) may contain only a restricted set of English characters; all other characters must be encoded. The EPrints.pm.diff patch file corrects this. In addition, it copies the URL of each link into the text version of the message, and it converts wide characters to bytes (otherwise the Net::SMTP Perl module complains).
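The kind of encoding involved can be illustrated with Perl's standard Encode module. This is a minimal standalone sketch, not the patch itself; the sample subject and body are made up for the illustration.

```perl
use strict;
use warnings;
use utf8;                      # the string literals below contain non-ASCII characters
use Encode qw(encode);

my $subject = "Új dokumentum érkezett";
my $body    = "Kedves Felhasználó!\n";

# Header lines must be plain ASCII, so the subject becomes a MIME encoded-word.
my $subject_header = "Subject: " . encode( "MIME-Header", $subject );

# The body is sent as bytes, with the charset declared in the Content-Type header.
my $body_bytes   = encode( "UTF-8", $body );
my $content_type = 'Content-Type: text/plain; charset="utf-8"';

print "$subject_header\n$content_type\n";
```

Converting the body to bytes before handing it to the mailer is what avoids wide-character complaints of the kind mentioned above.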

See the tech list references 5762, 5783 and 7114.

Citation phrases with embedded pins

You need to apply the second half of the XML/EPC.pm.diff patch if you want to use citations with embedded pins. See tech list 7236.

Phrases using the same link twice

If you use the same link pin twice in a phrase, the second occurrence does not render correctly. This happens, for example, if you want both a picture and a text to refer to the same link:

<epp:phrase id="two_links">
  <epc:pin name="link"><img src="img.png" alt="[img]" /></epc:pin>
  <epc:pin name="link">Click here</epc:pin>
</epp:phrase>

For both links to work correctly, please apply the first half of the XML/EPC.pm.diff patch. See tech list 6960.

Language dependent date and time

The wording of the grace period (the expiration time of the random pin number sent when someone registers or changes their e-mail address) is generated in the Eprints/Time.pm script by the human_delay() routine. The creation time at the bottom of browse pages is also generated there, by human_time(). Both routines yield English text.

This patch replaces human_delay() with references to phrases for hour, day, week and their plural forms. See also tech list 7243.
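The idea can be sketched as follows. This is not the patch itself: delay_phrase and the phrase ids (delay_hour, delay_hours and so on) are hypothetical names chosen for the illustration, and the thresholds are an assumption about how hours are rounded to days and weeks.

```perl
use strict;
use warnings;

# Map a delay given in hours to a (phrase id, count) pair.
# The phrase ids are hypothetical; the real patch may use different ones.
sub delay_phrase {
    my ($hours) = @_;
    my ( $unit, $count ) = ( "hour", $hours );
    if ( $hours >= 24 * 7 ) {
        ( $unit, $count ) = ( "week", int( $hours / ( 24 * 7 ) ) );
    }
    elsif ( $hours >= 24 ) {
        ( $unit, $count ) = ( "day", int( $hours / 24 ) );
    }
    # Singular and plural forms are separate phrases, so each language
    # can word them independently.
    my $id = "delay_" . $unit . ( $count == 1 ? "" : "s" );
    return ( $id, $count );
}

my ( $id, $n ) = delay_phrase(48);
print "$id $n\n";
```

The caller would then render the returned phrase id in the session language, passing the count as a pin, instead of building an English string directly.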

Rendering user names depending on session language

Every field can have its own rendering routine, given as the value of the render_single_value attribute. In our case, modify the top of the cfg.d/user_fields.pl file, which contains the definition of the user fields, as follows:

 $c->{fields}->{user} = [
     {
         'name' => 'name',
         'type'   => 'name',
         'render_order' => 'gf',
         'render_single_value' => \&my_namefield_rendering,
     },
     ...

and insert the definition of my_namefield_rendering e.g. at the end of the same file:

sub my_namefield_rendering
{
   my( $session, $field, $value, $object ) = @_;
   my $langid = $session->{lang}->{id};
   # Format letters: f - family, g - given, h - honourific, l - lineage.
   # Any other character is copied as a separator once some output exists.
   my %formats = (
     'en' => 'hfl,g',
     'hu' => 'hlfg',
     # Further lines should be added for the other languages in use
   );
   # Fall back to the English format for languages not listed above
   my $format = $formats{$langid} || $formats{'en'};
   my $all = "";
   foreach my $fmtchar ( split //, $format ) {
       my $insert = "";
       if( $fmtchar eq "l" ) { $insert = $value->{lineage}; }
       elsif( $fmtchar eq "f" ) { $insert = $value->{family}; }
       elsif( $fmtchar eq "g" ) { $insert = $value->{given}; }
       elsif( $fmtchar eq "h" ) { $insert = $value->{honourific}; }
       elsif( $all ) { $all .= $fmtchar; }
       next if( !$insert );
       $all .= $insert;
   }
   my $span = $session->make_element( "span", class=>"person_name" );
   $span->appendChild( $session->make_text( $all ) );
   return $span;
}

Indexing