Difference between revisions of "Translation"

From EPrints Documentation

Revision as of 06:43, 29 June 2007

Translating EPrints 3 is similar to translating earlier versions; only some of the differences are pointed out here. There is also a difference between merely translating the site and making it multilingual. If you are making a translation, then with little extra effort you can make your site speak English as well, a courtesy to casual visitors of your repository.

Multilingual sites

Selecting the session's language

In a multilingual site you will probably let visitors choose the language of the session. By default it is determined by the language preferences of the browser, but sometimes users want to change it. The session's language can be set manually with the [port] of the set_language script from earlier EPrints versions, which should go into the cgi directory. A handy place for the set_language URL is the menu at the top of the page. You might want to edit /lang/en/templates/default.xml to include it there:

 <ul class="ep_tm_menu">
  <li><a href="{$config{frontpage}}">Home</a></li>
  <li><a href="{$config{perl_url}}/set_lang">Language</a></li>
  <li><a href="{$config{base_url}}/information.html">About</a></li>
  ... 
 </ul>

The template pages for other languages should not contain this Language item, as people will not necessarily recognise it. Use instead a button which reverts the language to English:

<li><a href="{$config{perl_url}}/set_lang?langid=en">In English</a></li>

For a fancier layout, consider copying the [flag images] to the directory static/style/images/flags/. If you define a phrase (see the #Phrases section) of the form "cgi/set_lang:lang_XX", then the set_lang script will use that phrase to render the link to that language. A typical format could be

 <epp:phrase id="cgi/set_lang:lang_hu"><epc:pin name="link">
     <img src="/style/images/flags/flag_hu.png" alt="[hu]" />
      Hungarian</epc:pin>
 </epp:phrase>

Setting the language of outgoing e-mail messages

By default, all e-mail messages are sent out in the default language of the repository. You might let users choose their preferred language so that they receive messages in that language. The lang (system) field is defined for all users; its value can be set automatically in defaultcfg/cfg.d/user_fields_automatic.pl, for example by

$c->{set_user_automatic_fields} = sub
{
   my( $user ) = @_;
   if( !$user->is_set( "frequency" ) )
   {
        $user->set_value( "frequency", "never" );
   }
   ## NEW: set default language to the session language
   if( !$user->is_set( "lang" ) )
   {
        $user->set_value( "lang", $user->{session}->{lang}->{id} );
   }
};

and also let the user edit their own preference by inserting the lang field in the defaultcfg/workflows/user/default.xml workflow, say in the personal section:

  <component type="Field::Multi">
     <title><epc:phrase ref="user_section_personal" /></title>
     <field ref="name" required="yes" />
     <field ref="lang"/> <!-- NEW!!! edit preferred language -->
     <field ref="dept"/>
     <field ref="org"/>
     <field ref="address"/>
     <field ref="country"/>
     <field ref="url"/>
   </component>

Finally, you might want the preferred language to appear on the user's profile page. This requires editing the cfg/cfg.d/user_render.pl file and inserting the following at the place of your choice (I chose just next to "country"):

 if( $user->is_set( "lang" ) )
 {
      $p->appendChild($session->make_element( "br" ) );
      $p->appendChild($session->html_phrase("user_preferred_language",
                lang => $user->render_value( "lang" ) ) );
 }

Here the new phrase "user_preferred_language" should be defined in one of the phrase files (see the #Phrases section); in it the pin "lang" contains the language itself:

 <epp:phrase id="user_preferred_language">Preferred language: 
      <epc:pin name="lang"/>
 </epp:phrase>

WARNING! While e-mail messages are sent out utf-8 encoded, they are not properly formatted. For non-English e-mails you must apply some patches from the #Patches section. Also, the text version of the e-mails does not contain the links. This can also be corrected by one of the patches.

Setting document language

Each document file has its own language, and the same document might be submitted in different languages. In the default eprints workflow the reference to the language field is commented out; you only have to enable it. It is in the file defaultcfg/workflows/eprint/default.xml, stage "files":

<stage name="files">
  <component type="XHTML">...</component>
  <component type="Upload">
     <field ref="format" />
     <field ref="formatdesc" />
     <field ref="security" />
     <field ref="license" />
     <field ref="date_embargo" />
     <field ref="language" /> <!-- UNCOMMENT! -->
 </component>
</stage>

The default value is set in cfg.d/document_fields_default.pl to the language of the session. The available values are listed in lib/defaultcfg/namedsets/languages; you might consider revising this set. The "undefined" language is a question mark (?).

You might also want to show the language in the document citation. To this end, edit the citations/document/default.xml file. Show the language only when it is set and the document's mime type is text/plain, application/pdf, application/postscript, application/msword, or other. (For a full list of the available mime types see lib/defaultcfg/namedsets/document.)

<cite:linkhere>...</cite:linkhere>
<epc:if test="security != 'public'"> ... </epc:if>
<!-- NEW!!! -->
<epc:if test="is_set(language) and format.one_of( 'text/plain' ,
    'application/pdf', 'application/postscript', 'application/msword', 'other')">
   <epc:phrase ref="citation:doc_language">
        <epc:param name="lang">
             <epc:print expr="language"/>
        </epc:param>
   </epc:phrase>
 </epc:if>

Of course, the "citation:doc_language" phrase has to be defined (see below) in all languages. In the phrase definition the pin "lang" contains the document language:

 <epp:phrase id="citation:doc_language">
   (The document's language is <epc:pin name="lang"/>.)
</epp:phrase>

WARNING! The example above works only after you've applied some of the patches in the #Patches section.

Phrases

Phrase definitions are scattered in several places. There are repository independent ones and repository dependent ones. If the same phrase is defined in both places, the repository dependent definition takes precedence.

Phrases are parsed and stored during web server initialization for all repositories. If you make any change in any of the phrase files, you must restart the Apache server to see the effect.

System-wide (i.e. repository independent) phrases are contained in the files in /opt/eprints3/lib/lang/XX/phrases/ (assuming the standard placement) where XX is the language code. Repository dependent phrase files are in /opt/eprints3/archives/ARCHIVEID/cfg/lang/XX/phrases/. In both directories all files with the extension ".xml" are scanned and read in alphabetical order, so you can split the data into several files instead of keeping one huge file.
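
The precedence rule can be pictured with a small stand-alone Perl sketch. It does not use the EPrints API: the hash layout, the file names and the function name merge_phrase_sources are assumptions made for illustration, and the phrase files are replaced by in-memory data. It also assumes that, within one directory, a definition in a later file replaces one from an earlier file.

```perl
use strict;
use warnings;

# Hypothetical stand-in for parsed phrase files: file name => { phrase id => text }.
my %system_files = (
    'system.xml' => { 'cgi/set_lang:lang_hu' => 'Hungarian', 'greeting' => 'Welcome' },
);
my %archive_files = (
    'a_local.xml' => { 'greeting' => 'Welcome to our repository' },
    'z_local.xml' => { 'user_preferred_language' => 'Preferred language:' },
);

# Merge the sources in the given order. The files of each source are taken
# in alphabetical order, and a later definition replaces an earlier one.
sub merge_phrase_sources
{
    my( @sources ) = @_;
    my %phrases;
    foreach my $files ( @sources )
    {
        foreach my $name ( sort keys %{$files} )
        {
            %phrases = ( %phrases, %{ $files->{$name} } );
        }
    }
    return \%phrases;
}

# System-wide phrases first, repository phrases second, so the latter win.
my $phrases = merge_phrase_sources( \%system_files, \%archive_files );
print "$_ => $phrases->{$_}\n" for sort keys %{$phrases};
```

In this sketch the repository's own "greeting" replaces the system-wide one, while phrases defined in only one place are kept unchanged.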

All phrase files must have proper XML syntax. Use existing files as templates. The very first line tells you which encoding is applied in the file:

<?xml version="1.0" encoding="iso-8859-1" standalone="no" ?>

If no encoding is given, then utf-8 is assumed (and you get error messages if the actual encoding is different). Set the encoding to the one you actually use. Don't forget that with iso-8859-X you can enter only a limited set of characters.
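
To see what the "limited set of characters" means in practice, the following stand-alone sketch checks whether a piece of text can be stored in a given encoding. It uses only the core Encode module; the helper name fits_encoding and the sample word are assumptions for illustration and are not part of EPrints.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Return 1 if every character of $text exists in $encoding, 0 otherwise.
sub fits_encoding
{
    my( $text, $encoding ) = @_;
    my $copy = $text;    # encode() may modify its input when a check is given
    my $ok = eval { encode( $encoding, $copy, Encode::FB_CROAK ); 1 };
    return $ok ? 1 : 0;
}

# A Hungarian sample word containing U+0151 (o with double acute), written
# with escapes so the sketch does not depend on the encoding of this file.
my $word = "El\x{151}z\x{151}";

foreach my $encoding ( 'iso-8859-1', 'iso-8859-2', 'utf-8' )
{
    printf "%-10s %s\n", $encoding,
        fits_encoding( $word, $encoding ) ? 'ok' : 'cannot represent this text';
}
```

If a phrase file has to hold characters from several scripts, utf-8 is the encoding that avoids this kind of limitation.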

You can define as many new phrases as you want, and refer to them in other phrases. When a phrase is used, its last existing definition applies. This means that forward references cause no problem.

Do not attempt to create recursive references: the only result is that your web server crashes after consuming all available memory.
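
A recursive reference can be caught before the server is restarted by checking which phrase refers to which. The sketch below is a plain depth-first search over a hand-written table; the table, the phrase ids and the function name find_cycle are hypothetical, and extracting the references from real phrase files is left out.

```perl
use strict;
use warnings;

# Hypothetical data: phrase id => list of the phrase ids it refers to.
my %refs = (
    'page_title' => [ 'site_name' ],
    'site_name'  => [],
    'loop_a'     => [ 'loop_b' ],
    'loop_b'     => [ 'loop_a' ],
);

# Depth-first search from $id. Returns the chain of ids that leads back to
# an id already on the current path, or an empty list if there is no cycle.
sub find_cycle
{
    my( $refs, $id, $on_path, $path ) = @_;
    $on_path ||= {};
    $path    ||= [];
    return ( @{$path}, $id ) if $on_path->{$id};
    $on_path->{$id} = 1;
    push @{$path}, $id;
    foreach my $target ( @{ $refs->{$id} || [] } )
    {
        my @cycle = find_cycle( $refs, $target, $on_path, $path );
        return @cycle if @cycle;
    }
    pop @{$path};
    delete $on_path->{$id};
    return ();
}

foreach my $id ( sort keys %refs )
{
    my @cycle = find_cycle( \%refs, $id );
    print "$id: ", ( @cycle ? join( ' -> ', @cycle ) : 'no cycle' ), "\n";
}
```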

Other configuration files

Several language-independent configuration files contain hardwired English texts. All of these should be replaced by references to phrases, which in turn supply the text in each language. The "offending" files are in the citations and the workflows directories.

Subject list

Patches

Apart from translating the language dependent phrase files, some other modifications are needed for proper functioning. These patches are for EPrints version 3.0.1; please check your current version.

  • proper utf-8 encoding for outgoing e-mails
  • citation phrases with embedded pins
  • phrases where the same link is used twice
  • expiration time for authentication codes (e.g. at registering) (hardcoded in English)
  • generation time for browse pages (hardcoded in English)
  • user name rendering depends on session language

Proper utf-8 encoding for outgoing e-mails

For outgoing e-mails the charset should be "utf-8". The mail headers (the From, To, Subject, Reply-To, etc. lines) may contain only a restricted set of ASCII characters; all other characters must be encoded. The EPrints.pm.diff patch corrects this. In addition, it puts the URL text of links into the text version, and it converts wide characters to bytes (otherwise the Net::SMTP Perl module complains).

See the tech list references [5762], [5783] and [7114].
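
The two encoding steps mentioned above can be illustrated with the core Encode module. This is a minimal sketch and not the content of EPrints.pm.diff; the sample subject and body texts are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Sample Hungarian texts, written with escapes so the sketch does not depend
# on the encoding of this file.
my $subject = "\x{c9}rtes\x{ed}t\x{e9}s a let\x{e9}tr\x{151}l";
my $body    = "Kedves Felhaszn\x{e1}l\x{f3}!";

# Headers: encode as an RFC 2047 "encoded word", which is pure ASCII.
my $header = encode( 'MIME-Header', $subject );
print "Subject: $header\n";

# Body: turn the character string into utf-8 bytes before it is handed to
# the mailer, and declare the charset in the Content-Type header.
my $bytes = encode( 'UTF-8', $body );
print "Content-Type: text/plain; charset=utf-8\n";
printf "characters: %d, bytes: %d\n", length( $body ), length( $bytes );
```

A mail client reverses the header step when displaying the message; decode( 'MIME-Header', $header ) gives back the original subject.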

Citation phrases with embedded pins

You need to apply the second half of the XML/EPC.pm.diff patch if you want to use citations with embedded pins. See tech list [7236].

Phrases using the same link twice

If you use the same link pin twice in a phrase, the second occurrence does not render correctly. This happens, for example, when you want both a picture and a text to refer to the same link:

<epp:phrase id="two_links">
  <epc:pin ref="link"><img src="img.png" alt="[img]" /></epc:pin>
  <epc:pin ref="link">Click here</epc:pin>
</epp:phrase>

For both links to work correctly please apply the first half of the XML/EPC.pm.diff patch. See tech list [6960].

Language dependent date and time

The wording of the grace period (the expiration time of the random pin number sent when someone registers or changes their e-mail address) is generated in the Eprints/Time.pm script by the human_delay() routine. The creation time at the bottom of browse pages is also produced there, by human_time(). Both routines yield English text.

[This patch] replaces the English text in human_delay() with references to phrases for hour, day, week and their plural forms. See also tech list [7243].
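
The idea of the patch can be sketched without EPrints as follows. This is not the patch itself: the phrase ids ("delay:hour" and so on), the in-memory phrase table and the function names are hypothetical, and the rule for picking the unit (hours below one day, days below one week, weeks otherwise, with the remainder dropped) is an assumption.

```perl
use strict;
use warnings;

# Hypothetical phrase ids and texts; in EPrints they would be defined in the
# phrase files of each language. %d marks where the number goes.
my %phrases = (
    en => {
        'delay:hour' => '%d hour', 'delay:hours' => '%d hours',
        'delay:day'  => '%d day',  'delay:days'  => '%d days',
        'delay:week' => '%d week', 'delay:weeks' => '%d weeks',
    },
    hu => {
        'delay:hour' => "%d \x{f3}ra", 'delay:hours' => "%d \x{f3}ra",
        'delay:day'  => '%d nap',      'delay:days'  => '%d nap',
        'delay:week' => "%d h\x{e9}t", 'delay:weeks' => "%d h\x{e9}t",
    },
);

# Pick the unit and the count, then the singular or plural phrase id.
sub delay_phrase_id
{
    my( $hours ) = @_;
    my( $unit, $count ) = ( 'hour', $hours );
    if( $hours >= 24 * 7 ) { ( $unit, $count ) = ( 'week', int( $hours / ( 24 * 7 ) ) ); }
    elsif( $hours >= 24 )  { ( $unit, $count ) = ( 'day', int( $hours / 24 ) ); }
    my $id = "delay:$unit" . ( $count == 1 ? '' : 's' );
    return ( $id, $count );
}

# Render the delay in the given language by filling in the chosen phrase.
sub human_delay_text
{
    my( $langid, $hours ) = @_;
    my( $id, $count ) = delay_phrase_id( $hours );
    return sprintf( $phrases{$langid}->{$id}, $count );
}

binmode( STDOUT, ':encoding(UTF-8)' );
foreach my $hours ( 1, 48, 24 * 14 )
{
    print "$hours h: ", human_delay_text( 'en', $hours ), " / ",
        human_delay_text( 'hu', $hours ), "\n";
}
```

Keeping separate singular and plural phrase ids lets each language decide for itself whether the two forms differ; in the Hungarian entries above they are the same.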

Rendering user names depending on session language

All fields can have their separate rendering routine. This routine should be given as the value of the render_single_value attribute. In our case modify the top of cfg.d/user_fields.pl file which contains the definition of the user fields as follows:

 $c->{fields}->{user} = [
     {
         'name' => 'name',
         'type'   => 'name',
         'render_order' => 'gf',
         'render_single_value' => \&my_namefield_rendering,
     },
     ...

and insert the definition of my_namefield_rendering e.g. at the end of the same file:

sub my_namefield_rendering
{
   my( $session, $field, $value, $object ) = @_;
   my $langid = $session->{lang}->{id};
   # format letters: f - family, g - given, h - honourific, l - lineage;
   # any other character is copied as a separator
   my $format = {
     'en' => 'hfl,g',
     'hu' => 'hlfg',
     # Further lines should be added for other used languages
   }->{$langid};
   # added safeguard: use the English order for languages not listed above
   $format = 'hfl,g' if !defined $format;
   my $all = "";
   foreach my $fmtchar ( split //, $format ) {
       my $insert = "";
       if( $fmtchar eq "l" ) { $insert = $value->{lineage}; }
       elsif( $fmtchar eq "f" ) { $insert = $value->{family}; }
       elsif( $fmtchar eq "g" ) { $insert = $value->{given}; }
       elsif( $fmtchar eq "h" ) { $insert = $value->{honourific}; }
       elsif( $all ) { $all .= $fmtchar; }   # separator, only after some text
       next if( ! $insert );
       $all .= $insert;
   }
   my $span = $session->make_element( "span", class=>"person_name" );
   $span->appendChild( $session->make_text( $all ) );
   return $span;
}

Indexing