Difference between revisions of "Translation"

From EPrints Documentation

Revision as of 08:48, 29 August 2018


Translating EPrints 3 is similar to translating earlier versions; only some of the differences are pointed out here. There is also a difference between merely translating a site and making it multilingual. If you are making a translation, then with little effort you can make your site speak English as well, a courtesy to casual visitors of your repository.

Multilingual sites

To customize EPrints you need to make several changes to the configuration files, so this is a good time to familiarize yourself with the EPrints3 Directory Structure. Certain files are common to all archives; change these only if you really know what you are doing. Other files are repository specific, and any change to them affects that repository only.

Repository-dependent files are below archives, where each repository has its own substructure. When a new repository is created, its configuration files are copied from lib/defaultcfg. Thus changes made in lib/defaultcfg or below affect all newly created repositories but have no effect on existing ones. You will probably want to make certain changes before creating repositories, e.g. adding further languages to lib/defaultcfg/lang (and also to lib/lang), changing the default language and the available languages in lib/defaultcfg/cfg.d/languages.pl, correcting files in lib/defaultcfg/citations and lib/defaultcfg/workflows, etc.

In the description below we assume that you have already set up the repository, and we give directory and file names, where appropriate, relative to that repository. It should be straightforward to apply the same changes under lib/defaultcfg so that they take effect for all newly created archives.

Selecting the session's language

In a multilingual site you will probably let visitors choose the language of the session. By default it is determined by the browser's language preferences, but sometimes users want to change it. The session's language can be set manually with a port of the set_language script from earlier EPrints versions, which should go into the cgi directory. A handy place for the set_language URL is the menu at the top of the page. As the default page template is repository dependent, you should edit archives/ArchiveID/lang/en/templates/default.xml, the English language template for that repository, as follows (but see the remarks above):

 <ul class="ep_tm_menu">
   <li><a href="{$config{frontpage}}">Home</a></li>
   <li><a href="{$config{perl_url}}/set_lang">Language</a></li>
   <li><a href="{$config{base_url}}/information.html">About</a></li>
   ...
 </ul>

The templates for other languages should not contain this Language item, as people will not necessarily recognise it. Rather, use a button which reverts the language to English, i.e. insert the following into archives/ArchiveID/lang/XX/templates/default.xml for every language XX other than English:

<li><a href="{$config{perl_url}}/set_lang?langid=en">In English</a></li>

If you want a fancier layout, you might consider using flag images, copied to the directory lib/static/style/images/flags/. If you define a phrase (see the Phrases section) of the form "cgi/set_lang:lang_XX" then the set_lang script will use that phrase to render a link to that language. A typical format could be

 <epp:phrase id="cgi/set_lang:lang_hu"><epc:pin name="link">
     <img src="/style/images/flags/flag_hu.png" alt="[hu]" /> 
      Hungarian</epc:pin>
 </epp:phrase>

Setting the language of outgoing e-mail messages

By default, all e-mail messages are sent out in the default language of the repository. You might let users choose their preferred language so that they receive messages in that language. The lang (system) field is defined for all users; its value can be set automatically in cfg/cfg.d/user_fields_automatic.pl, for example by

$c->{set_user_automatic_fields} = sub
{
   my( $user ) = @_;
   if( !$user->is_set( "frequency" ) )
   {
        $user->set_value( "frequency", "never" );
   }
   ## NEW: set default language to the session language
   if( !$user->is_set( "lang" ) )
   {
        $user->set_value( "lang", $user->{session}->{lang}->{id} );
   }
};

and also let the user edit their own preference by inserting the lang field in the cfg/workflows/user/default.xml workflow, say in the personal section:

  <component type="Field::Multi">
     <title><epc:phrase ref="user_section_personal" /></title>
     <field ref="name" required="yes" />
     <field ref="lang"/> <!-- NEW!!! edit preferred language -->
     <field ref="dept"/>
     <field ref="org"/>
     <field ref="address"/>
     <field ref="country"/>
     <field ref="url"/>
   </component>

Finally, you might want this language to appear on the user's profile pages. This requires editing the cfg/cfg.d/user_render.pl file by inserting the following at the place of your choice (I chose just next to "country"):

 if( $user->is_set( "lang" ) )
 {
      $p->appendChild($session->make_element( "br" ) );
      $p->appendChild($session->html_phrase("user_preferred_language",
                lang => $user->render_value( "lang" ) ) );
 }

Here the new phrase "user_preferred_language" should be defined in one of the phrase files (see the Phrases section); in it the pin "lang" contains the language itself:

 <epp:phrase id="user_preferred_language">Preferred language: 
      <epc:pin name="lang"/>
 </epp:phrase>
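
Since the phrase is looked up in the language in use, every language the site offers needs its own definition. As an illustration only, a German version could go into a phrase file below archives/ARCHIVEID/cfg/lang/de/phrases/; the language code de and the wording are assumptions made for this sketch:

```xml
<!-- Sketch: German counterpart of the English phrase; the wording is only a suggestion -->
<epp:phrase id="user_preferred_language">Bevorzugte Sprache:
     <epc:pin name="lang"/>
</epp:phrase>
```

The phrase id and the pin name must stay identical across languages; only the surrounding text changes.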

WARNING! While e-mail messages are sent out utf-8 encoded, they are not properly formatted. For non-English e-mails you must apply the "Proper utf-8 encoding for outgoing e-mails" patch from the Patches section. Also, by default, the text version of the e-mail does not contain links. This is also corrected by this patch.

Setting document language

Each document file has its own language, and the same document might be submitted in different languages. In the default EPrints workflow the reference to the language field is commented out; you only have to enable it. It is in the file cfg/workflows/eprint/default.xml, stage "files":

<stage name="files">
  <component type="XHTML">...</component>
  <component type="Upload">
     <field ref="format" />
     <field ref="formatdesc" />
     <field ref="security" />
     <field ref="license" />
     <field ref="date_embargo" />
     <field ref="language" /> <!-- UNCOMMENT! -->
 </component>
</stage>

The default value is set in cfg/cfg.d/document_fields_default.pl to the language of the session:

$c->{set_document_defaults} = sub
{
       my( $data, $session, $eprint ) = @_;
 ###### HERE:
       $data->{language} = $session->get_langid();
       $data->{security} = "public";
};

The available language values are listed in cfg/namedsets/languages; you might consider revising this set. The "undefined" language is rendered as a question mark (?).

You might want to print the document's language as well. To this end, edit the cfg/citations/document/default.xml file. Show the language only when it is set and the document's mime type is text/plain, application/pdf, application/postscript, application/msword, or other. (For a full list of available mime types see cfg/namedsets/document.)

<cite:linkhere>...</cite:linkhere>
<epc:if test="security != 'public'"> ... </epc:if>
<!-- NEW!!! -->
<epc:if test="is_set(language) and format.one_of( 'text/plain' ,
    'application/pdf', 'application/postscript', 'application/msword', 'other')" >
   <epc:phrase ref="citation:doc_language">
        <epc:param name="lang">
             <epc:print expr="language"/>
        </epc:param>
   </epc:phrase>
 </epc:if>

and, of course, one has to define the "citation:doc_language" phrase (see below) in all languages. In the phrase definition we have the pin "lang" which contains the document language in the actual language (defined as the phrase "languages_typename_XX"):

 <epp:phrase id="citation:doc_language">
   (The document's language is <epc:pin name="lang"/>.)
</epp:phrase>

WARNING! The example above works only after you've applied the "Citation phrases with embedded pins" patch in the Patches section.

Phrases

Phrase definitions are scattered in several places. There are repository-independent ones and repository-dependent ones. If the same phrase is defined in both places, the repository-dependent definition takes precedence.

Phrases are parsed and stored during web server initialization for all repositories. If you make any change in any of the phrase files, you must restart the Apache server to see the effect.

System-wide (i.e. repository-independent) phrases are contained in the files in lib/lang/XX/phrases/ where XX is the language code. Repository-dependent phrase files are in archives/ARCHIVEID/cfg/lang/XX/phrases/. In both directories all files with the extension ".xml" are scanned and read in alphabetical order. You can split the data into several files instead of having a single huge phrase file.

All phrase files must have proper XML syntax. Use existing files as templates. The very first line tells you which encoding is applied in the file:

<?xml version="1.0" encoding="iso-8859-1" standalone="no" ?>

If no encoding is given, then utf-8 is assumed (and you get error messages if the actual encoding differs). Set the encoding to the one you prefer. Remember that with iso-8859-X you can enter only a limited set of characters.

Due to a bug in GDOME, when you use GDOME, all XML files must be utf-8 encoded (see tech list 7295).
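
As a starting point, a new phrase file could look roughly like the sketch below. The DOCTYPE line and the namespace URIs are written as they are remembered from the stock phrase files, so compare them with a file shipped with your EPrints version before relying on them; the phrase id my_example_phrase is hypothetical.

```xml
<?xml version="1.0" encoding="utf-8" standalone="no" ?>
<!DOCTYPE phrases SYSTEM "entities.dtd">
<!-- Assumption: header copied in spirit from a stock phrase file; verify locally -->
<epp:phrases xmlns="http://www.w3.org/1999/xhtml"
             xmlns:epp="http://eprints.org/ep3/phrase"
             xmlns:epc="http://eprints.org/ep3/control">

  <!-- hypothetical phrase, for illustration only -->
  <epp:phrase id="my_example_phrase">Example text</epp:phrase>

</epp:phrases>
```

Declaring utf-8 here keeps the file usable with GDOME as well.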

You can define as many new phrases as you want, and refer to them in other phrases. During initialization all phrases are stored, and expanded only when used. If a phrase has several definitions, the one which has been read in last will be used. This means that you can have forward references; and this mechanism ensures that system-wide phrases are redefined by entries in a repository-dependent phrase file.

CAUTION: do not attempt to create recursive references; your web server will simply crash after consuming all available memory.

Phrases are defined as follows:

 <epp:phrase id="lib/dataset:no_language">Unknown/Other</epp:phrase>

This line defines the "lib/dataset:no_language" phrase to be equal to "Unknown/Other". A phrase must have proper XML syntax (i.e. all tags must be balanced). In the definition you can refer to other phrases:

<epp:phrase id="double_ruler">
   <epc:phrase ref="ruler"/><epc:phrase ref="ruler"/>
</epp:phrase>

Certain phrases use pins. Pins are typically defined by the calling routine, and carry variable content, such as the document's name

<epp:phrase id="doc_title">Document: <epc:pin name="doc" /></epp:phrase>

or a URL (pins typically called "link") where the link text comes between "<epc:pin name=...>" and "</epc:pin>":

<epp:phrase id="show_help"><epc:pin name="link">
   <img src="/images/help.gif" alt="+" /></epc:pin></epp:phrase>

If a phrase requires a pin but none is defined, you get an ugly error message about undefined pins. Thus if you use the "show_help" phrase in another one, you must supply the pin. This is done through "epc:param" tags. The pin in this case is an "<a />" tag which defines the URL of the help text:

<epp:phrase id="use_help">Help using the show_help phrase: 
   <epc:phrase ref="show_help">
        <epc:param name="link"><a href="http://..." /></epc:param>
   </epc:phrase>
</epp:phrase>

You can use other control format elements as well.

Other configuration files

Several language-independent configuration files contain hard-wired English texts. All of them should be replaced by references to phrases, which, in turn, generate the phrases in all languages. First the list of these files and places, and then two examples of how they can be handled.


See also the tech list references 7095, 7157 and 7300.


Example: the citations/saved_search/default.xml file

In the default EPrints distribution the lib/citations/saved_search/default.xml file contains the following lines:

  9  <cite:linkhere><epc:choose>
10   <epc:when test="name"><epc:print expr="name"/></epc:when>
11   <epc:otherwise>Untitled search #<epc:print expr="id"/></epc:otherwise>
12  </epc:choose></cite:linkhere>

Whenever a saved search is formatted using the default citation style and the search has no attached name, the "otherwise" case in line 11 is chosen. It prints Untitled search # followed by the search's internal number. This text should be language dependent and use a phrase. For example the replacement

11   <epc:otherwise><epc:phrase ref="saved_search_no_name" />
                #<epc:print expr="id"/></epc:otherwise>

would do the job. If you want to be more gentle to translators (probably to yourself) then you could rather use a phrase with a pin for the search number:

11 <epc:otherwise><epc:phrase ref="saved_search_no_name_with_pin">
          <epc:param name="n"><epc:print expr="id"/></epc:param>
          </epc:phrase></epc:otherwise>

In this last case the English definition of the phrase would be

<epp:phrase id="saved_search_no_name_with_pin">
    Untitled search #<epc:pin name="n"/>
</epp:phrase>

WARNING! You must apply the "Citation phrases with embedded pins" patch in the Patches section for this to work correctly.
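
The first, pin-less replacement refers to the phrase "saved_search_no_name", which has to be defined as well. A minimal sketch of the English definition, assuming the number sign and the search id stay in the citation file as in that replacement:

```xml
<!-- Sketch: only the fixed text moves into the phrase; "#" and the id remain in the citation -->
<epp:phrase id="saved_search_no_name">Untitled search</epp:phrase>
```

The drawback of this variant is that translators cannot change the position of the number within the sentence, which is what the version with the pin allows.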


Example: the citations/document/default.xml file

This file describes how documents should be rendered. When the document's type is pdf or postscript, then a helper text appears, which informs the reader that a special viewer program is necessary (long lines are wrapped)

<epc:choose>
  <epc:when test="format = 'application/pdf'"> - Requires a PDF viewer such as
    <a href="http://www.cs.wisc.edu/~ghost/gsview/index.htm"> GSview </a>,
    <a href="http://www.foolabs.com/xpdf/download.html"> Xpdf </a> or
    <a href="http://www.adobe.com/products/acrobat/"> Adobe Acrobat Reader </a></epc:when>
  <epc:when test="format = 'application/postscript'"> - Requires a viewer, such as
    <a href="http://www.cs.wisc.edu/~ghost/gsview/index.htm"> GSview </a></epc:when>
</epc:choose>

Use phrases instead:

<epc:choose>
  <epc:when test="format = 'application/pdf'"> <epc:phrase ref="need_pdf_viewer"/> </epc:when>
  <epc:when test="format = 'application/postscript'"> <epc:phrase ref="need_postscript_viewer"/> </epc:when>
</epc:choose>

The appropriate English phrase definitions would run as follows:

<epp:phrase id="need_pdf_viewer"> - Requires a PDF viewer such as ...
</epp:phrase>
<epp:phrase id="need_postscript_viewer"> - Requires a viewer, such ...
</epp:phrase>
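
Phrase bodies may contain XHTML markup, so the links from the original citation file can move into the phrase unchanged. As a sketch, the PostScript phrase written out with the text and URL of the original citation line would read:

```xml
<!-- Sketch: text and link taken from the original citations/document/default.xml line -->
<epp:phrase id="need_postscript_viewer"> - Requires a viewer, such as
  <a href="http://www.cs.wisc.edu/~ghost/gsview/index.htm">GSview</a>
</epp:phrase>
```

Translated versions keep the same id and link and change only the surrounding words.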

You might consider applying some of the patches.

Subject list

Unsurprisingly, each subject has to be translated, and so the GUI (Admin > Config. Tools > Edit subject) is multilingual. Sadly, the simple format can no longer be used for this, but fortunately the XML format isn't that complicated.

To make the switch from one format to the other:

  • start with an initial tree, in any preferred language, specified in simple format
  • export this tree in XML format via ~/bin/export <repository_id> subject XML > subjects.xml
  • from now on, import the extended version via ~/bin/import_subjects <repository_id> --xml --force subjects.xml

A complete import is thus part of subject management.

Patches

Apart from translating the language-dependent phrase files, some other modifications are needed for proper functioning. These patches are for EPrints version 3.0.1; please check your current version.

Proper utf-8 encoding for outgoing e-mails

For outgoing e-mails the charset should be "utf-8". The mail headers (From, To, Subject, Reply-To, etc. lines) may contain only a restricted set of English characters; other characters should be encoded. The EPrints.pm.diff patch file corrects these, plus it copies the URL text from links into the text version and converts wide characters to bytes (otherwise the Net::SMTP perl routine complains).

See the tech list references 5762, 5783 and 7114.

Citation phrases with embedded pins

You need to apply the second half of the XML/EPC.pm.diff patch if you want to use citations with embedded pins.

Phrases using the same link twice

If you use the same link pin twice in a phrase, the second occurrence does not render correctly. This matters, e.g., when you want both a picture and a text to refer to the same link:

<epp:phrase id="two_links">
  <epc:pin name="link"><img src="img.png" alt="[img]" /></epc:pin>
  <epc:pin name="link">Click here</epc:pin>
</epp:phrase>

For both links to work correctly please apply the first half of the XML/EPC.pm.diff patch.

Pluralising depends on language

When a phrase is invoked from a citation, Escript constructs such as <epc:if test="..." /> or <epc:print /> are not available any more. Moving pluralisation from the citation file into the phrase files requires this feature, which is provided by the XML/EPC.pm.diff.2 patch.

Language dependent date and time

The wording of the grace period (the expiration time of the random pin number when someone registers or changes their email address) is generated in the Eprints/Time.pm script by the human_delay() routine. The creation time at the bottom of browse pages is also produced here, by human_time(). The first routine always yields English text; the second one uses the server's locale setting for all languages.

This patch replaces human_delay() with references to phrases for hour, day, week and their plural forms. See also tech list 7243.

Rendering user names depending on session language

All fields can have their separate rendering routine. This routine should be given as the value of the render_single_value attribute. In our case modify the top of cfg/cfg.d/user_fields.pl file which contains the definition of the user fields as follows:

 $c->{fields}->{user} = [
     {
         'name' => 'name',
         'type'   => 'name',
         'render_order' => 'gf',
         'render_single_value' => \&my_namefield_rendering,
     },
     ...

and insert the definition of my_namefield_rendering e.g. at the end of the same file:

sub my_namefield_rendering
{
   my( $session, $field, $value, $object ) = @_;
   my $langid = $session->{lang}->{id};
   # format characters: f - family, g - given, h - honourific, l - lineage;
   # any other character is copied literally as a separator
   my $format = {
     'en' => 'hfl,g',
     'hu' => 'hlfg',
     # Further lines should be added for other used languages
   }->{$langid} || 'hfl,g'; # assumption: fall back to the English order
   my $all = "";
   foreach my $fmtchar ( split //, $format ) {
       my $insert = "";
       if( $fmtchar eq "l" ) { $insert = $value->{lineage}; }
       elsif( $fmtchar eq "f" ) { $insert = $value->{family}; }
       elsif( $fmtchar eq "g" ) { $insert = $value->{given}; }
       elsif( $fmtchar eq "h" ) { $insert = $value->{honourific}; }
       elsif( $all ) { $all .= $fmtchar; }
       next if( !$insert );
       $all .= $insert;
   }
   my $span = $session->make_element( "span", class => "person_name" );
   $span->appendChild( $session->make_text( $all ) );
   return $span;
}

List of files to translate

This was captured from EPrints 3.0.3 - some extra phrase files or .xpage files may appear, but it should give the gist:

These files are part of the main EPrints system, not specific to each repository:

  • lib/lang/en/phrases/system.xml
  • lib/lang/en/static/eprints/index.xpage

The following files are repository-specific. They need to be changed depending on the repository configuration.

The main web pages:

  • lib/defaultcfg/lang/en/static/index.xpage
  • lib/defaultcfg/lang/en/static/vlit.xpage
  • lib/defaultcfg/lang/en/static/information.xpage
  • lib/defaultcfg/lang/en/static/contact.xpage
  • lib/defaultcfg/lang/en/static/error401.xpage

The web page template:

  • lib/defaultcfg/lang/en/templates/default.xml

Phrases used by the system:

  • lib/defaultcfg/lang/en/phrases/eprint_types.xml
  • lib/defaultcfg/lang/en/phrases/mail_email.xml
  • lib/defaultcfg/lang/en/phrases/user_order.xml
  • lib/defaultcfg/lang/en/phrases/mail_bounce_reason.xml
  • lib/defaultcfg/lang/en/phrases/eprint_order.xml
  • lib/defaultcfg/lang/en/phrases/mail_sig.xml
  • lib/defaultcfg/lang/en/phrases/dynamic.xml
  • lib/defaultcfg/lang/en/phrases/document_security.xml
  • lib/defaultcfg/lang/en/phrases/intro.xml
  • lib/defaultcfg/lang/en/phrases/eprint_fields.xml
  • lib/defaultcfg/lang/en/phrases/intro_mini.xml
  • lib/defaultcfg/lang/en/phrases/document_formats.xml
  • lib/defaultcfg/lang/en/phrases/mail_delete_reason.xml
  • lib/defaultcfg/lang/en/phrases/deposit_agreement.xml
  • lib/defaultcfg/lang/en/phrases/views.xml
  • lib/defaultcfg/lang/en/phrases/warnings.xml
  • lib/defaultcfg/lang/en/phrases/validate.xml
  • lib/defaultcfg/lang/en/phrases/mail_password.xml
  • lib/defaultcfg/lang/en/phrases/workflow.xml
  • lib/defaultcfg/lang/en/phrases/user_fields.xml
  • lib/defaultcfg/lang/en/phrases/render.xml