I18n Transformer

Introduction

Developing and maintaining multi-language sites is a common problem for web developers. Using XML and XSL makes this task much easier, especially when content, logic and presentation are kept separate.

Internationalization (i18n) -
Process of developing a product in such a way that it works with data in different languages and can be adapted to various target markets without engineering changes.
Localization (l10n) -
Subsequent process of translating and adapting a product to a given market's cultural conventions.

The approach to internationalization (i18n from here on) of XML documents within Cocoon is based on a transformer, I18nTransformer, which uses XML dictionaries for all the multilingual data. The namespace URI of the i18n transformer is defined as follows:

xmlns:i18n="http://apache.org/cocoon/i18n/2.0"

Other implementation details
  • Default name in sitemap: i18n
  • Class: org.apache.cocoon.transformation.I18nTransformer
  • Cacheable: no.
  • Poolable: yes.

Brief description

The following features are supported by the i18n transformer:

  • Text translation
  • Attribute translation
  • Parameter substitution (with translation if needed)
  • Date, number and currency formatting

A simple example of i18n markup:

<para title="first" name="article"  i18n:attr="title name">
  <i18n:text>This text will be translated.</i18n:text>
</para>

The text inside the <i18n:text> element is used as a key to find the translation in the message catalogue. Attributes listed in the i18n:attr attribute (which can be set on any element outside the i18n namespace) are translated as well, with their values used as dictionary keys.

Note: Although date, time, number and currency formatting is also supported, in some cases XSP or some other dynamic means is needed to achieve more flexibility.

Markup Reference
Summary

Special tags in the i18n namespace are used to mark the parts of an XML document that should be substituted with dictionary messages.

Tags list

Element          Description
i18n:text        Used for simple text translation
i18n:attr        Attribute for any element (not in i18n-namespace). Contains the names of other attributes of that element to be translated
i18n:translate   Translates text with parameter substitution
i18n:param       Used with i18n:translate to provide substitution parameter
i18n:date        Formats the date in localized manner
i18n:time        Formats the time in localized manner
i18n:date-time   Formats the date and time in localized manner
i18n:number      Formats numbers, currencies and percent in localized manner

i18n:text

To translate some simple text we use the <i18n:text> tag:

<i18n:text>Text to be translated</i18n:text>

The text between the <i18n:text>-tags is used as a key to find the translation in the dictionary.

The 'i18n:key' attribute can be used to specify a special key for the dictionary. Normally, the text itself is used as the key to find the translation in the dictionary. If we specify the 'i18n:key' attribute, this key is used to find the translation and the text itself is used as the default value if no translation can be found.

<i18n:text i18n:key="key_text">Default value</i18n:text>

Translation with param substitution

To translate text with parameter substitution, the <i18n:translate> tag must be used. Inside it we can specify <i18n:param> tags that contain the parameters. The values of these parameters are inserted into the translated text, replacing placeholders. Placeholders have the following syntax: \{[0-9]+\}. An example:

    
<i18n:translate>
	<i18n:text>Some {0} was inserted {1}.</i18n:text>
	<i18n:param>text</i18n:param>
	<i18n:param>here</i18n:param>
</i18n:translate>

Now we want to translate this into German. First, the processor looks up the following string in the dictionary we specified:

Some {0} was inserted {1}.

It finds the string and translates it to German:

Etwas {0} wurde {1} eingesetzt.

Now the processor will replace the parameters. {0} will be replaced with "text" and {1} with "here". This results in:

Etwas text wurde here eingesetzt.

As we see, it is sometimes necessary to translate the parameters as well, since "here" is not a German word and "text" should be capitalized. This can be done simply by marking up the parameters with <i18n:text> again:

<i18n:translate>
	<i18n:text>Some {0} was inserted {1}.</i18n:text>
	<i18n:param><i18n:text>text</i18n:text></i18n:param>
	<i18n:param><i18n:text>here</i18n:text></i18n:param>
</i18n:translate>
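
For the markup above to produce correct German, the dictionary needs one entry for the sentence and one for each translated parameter. A minimal sketch of such a catalogue follows, using the catalogue format described under Catalogues (Dictionaries). The file name and the translations "Text" and "hier" are illustrative assumptions, not taken from a shipped dictionary.

```xml
<?xml version="1.0"?>
<!-- Hypothetical German catalogue, e.g. messages_de.xml -->
<catalogue xml:lang="de">
  <!-- The sentence is looked up by its full text, placeholders included -->
  <message key="Some {0} was inserted {1}.">Etwas {0} wurde {1} eingesetzt.</message>
  <!-- Each parameter wrapped in i18n:text is looked up separately -->
  <message key="text">Text</message>
  <message key="here">hier</message>
</catalogue>
```

With these entries the output would read: Etwas Text wurde hier eingesetzt.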

Note: The text used for param substitution does not always need to be translated. E.g., it can come from a database with predefined placeholders for i18n params, in which case there is no need to use <i18n:text> for its translation.

Parameters can be dates, numbers and currencies. Use the type attribute to specify one of the possible types: date | time | date-time | number | currency | currency-no-unit | int-currency | percent. See the section on date, time and number formatting for more on params.

Attributes

Attributes can be translated as well. This is very useful for HTML forms, since button labels are set via an attribute in HTML. To translate attributes of a tag, add an attribute named 'i18n:attr' containing a space-separated list of the attributes that should be translated. An example:

<INPUT type="submit" value="Submit" i18n:attr="value"/>

The attribute that will be translated is 'value'. Parameter replacement is not available for attributes at this time.
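
As with <i18n:text>, the attribute value serves as the dictionary key. A sketch of a matching catalogue entry follows; the German label "Absenden" is an illustrative assumption.

```xml
<?xml version="1.0"?>
<catalogue xml:lang="de">
  <!-- key is the original attribute value; the body is the translated label -->
  <message key="Submit">Absenden</message>
</catalogue>
```

With this entry, the value attribute of the button would be output as "Absenden" for a German locale.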

Note: Some versions of Xerces have a bug in the removeAttribute() method implementation, which results in a NullPointerException when attribute translation is used. The solution is to upgrade to a newer version of Xerces.

Date, time and number formatting

To format dates according to the current locale, use <i18n:date src-pattern="dd/MM/yyyy" pattern="dd:MMM:yyyy" value="01/01/2001" />. The 'src-pattern' attribute is used to parse the 'value'; the date is then formatted according to the current locale using the format specified by the 'pattern' attribute.

To format time for a locale (e.g. de_DE), use <i18n:time src-pattern="dd/MM/yyyy hh:mm" locale="de_DE" value="01/01/2001 12:00" />. The 'src-pattern' and 'pattern' attributes may also contain 'short', 'medium', 'long' or 'full', in which case the value is parsed or formatted in that predefined style.

To format date and time use <i18n:date-time />.

It is also possible to specify a src-locale: <i18n:date src-pattern="short" src-locale="en_US" locale="de_DE"> 12/24/01 </i18n:date> will result in 24.12.2001.

An explicit pattern or src-pattern (that is, anything other than short, medium, long or full) overrides the locale and src-locale.

If no pattern is specified, the date is formatted with the DateFormat.DEFAULT format (both date and time). If no value for the date is specified, the current date is used. E.g.: <i18n:date/> will result in the current date, formatted with the default localized pattern.

To format numbers in a locale-sensitive manner, use <i18n:number pattern="0.##" value="2.0" />. This is also useful for Arabic, Indian, etc. number formatting. Additionally, currency and percent formatting can be used; the known types are currency, currency-no-unit, int-currency, int-currency-no-unit and percent. Another useful attribute is fraction-digits. E.g.:

  • <i18n:number type="currency" value="1703.7434" /> will result in localized presentation of the value for US locale: $1,703.74
  • <i18n:number type="currency" fraction-digits="3" value="1703.7434" /> will result in localized presentation of the value for US locale, so you can print gasoline prices: $1,703.743
  • <i18n:number type="int-currency" value="170374" /> will result in localized presentation of the value for US locale: $1,703.74, and 170374 (with currency unit) for a currency without subunit.
  • <i18n:number type="int-currency-no-unit" value="170374" /> will result in localized presentation of the value for US locale: 1,703.74, and 170374 (without currency unit) for a currency without subunit.
  • <i18n:number type="percent" value="1.2" /> will result in a localized percent value: 120% for most locales.

Date and number formatting can also be used with substitution params. The type attribute must be set on a param to indicate its type (date, number, currency, ...). The default type is string.

<i18n:translate>
  <i18n:text>
    You have to pay {0} for {1} pounds or {2} of your profit. Valid from {3}
  </i18n:text>
  <i18n:param type="currency"
              pattern="$#,##0.00">102.5</i18n:param>
  <i18n:param type="number" value="2.5" />
  <i18n:param type="percent" value="0.10" />
  <i18n:param type="date" pattern="dd-MMM-yy" />
</i18n:translate>

The result will look like this: You have to pay $102.50 for 2.5 pounds or 10% of your profit. Valid from 13-Jun-01

Catalogues (Dictionaries)

Message catalogues contain translations to be used by the i18n transformer.

Catalogues format

A single message catalogue file contains translations for a particular language, e.g.:

<?xml version="1.0"?>
  <!-- message catalogue file for locale ... -->
  <catalogue xml:lang="locale">
         <message key="key">text</message>
         <message key="other_key">Other text</message>         
         ....
  </catalogue>

where the key attribute identifies a particular message for that language.

Usage

Files to be translated contain i18n markup. At runtime, the i18n transformer finds a message catalogue for the user's locale and replaces the text between the <i18n:text> tags, using as the lookup key either the text between the tags or the value of the key attribute, if specified. In the latter case, the body of the tag is used when no translation is found.

If the i18n transformer cannot find an appropriate message catalogue for the user's locale, it recursively tries to locate a parent message catalogue until a valid catalogue is found, i.e.:

  • catalogue_language_country_variant.xml
  • catalogue_language_country.xml
  • catalogue_language.xml
  • catalogue.xml

E.g., assuming a basename of messages and a locale of en_AU (no variant), the following search will occur:

  • messages_en_AU.xml
  • messages_en.xml
  • messages.xml

This allows the developer to write a hierarchy of message catalogues, each level defining messages with an increasing degree of variation.
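
A sketch of such a hierarchy for the basename messages follows. It assumes that a key missing from the more specific catalogue is resolved from its parent, as the hierarchy suggests. The keys and texts are made up for illustration, and the xml:lang values follow the locale notation used on this page.

```xml
<?xml version="1.0"?>
<!-- messages.xml: general catalogue holding every key (hypothetical content) -->
<catalogue xml:lang="en">
  <message key="greeting">Hello</message>
  <message key="colour_label">Color</message>
</catalogue>
```

```xml
<?xml version="1.0"?>
<!-- messages_en_AU.xml: only the entries that differ for en_AU -->
<catalogue xml:lang="en_AU">
  <message key="colour_label">Colour</message>
</catalogue>
```

Under that assumption, a request with locale en_AU would get "Colour" from messages_en_AU.xml and "Hello" from messages.xml.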

Sitemap configuration

<map:transformer name="i18n"
     src="org.apache.cocoon.transformation.I18nTransformer">

     <catalogue-name>messages</catalogue-name>
     <catalogue-location>translations</catalogue-location>
     <untranslated-text>untranslated</untranslated-text>
     <cache-at-startup>true</cache-at-startup>
</map:transformer>

where:

  • catalogue-name: base name of the message catalogue (mandatory).
  • catalogue-location: location of the message catalogues (mandatory).
  • untranslated-text: text used for untranslated keys (default is to output the key name).
  • cache-at-startup: flag whether to cache messages at startup (false by default).

To use the transformer in a pipeline, simply specify it in a transform step and indicate the needed locale, e.g.:

<map:match pattern="file">
    <map:generate src="file.xml"/>
    <map:transform type="i18n">
        <map:parameter name="locale" value="en_AU"/>
    </map:transform>
    <map:serialize/>
</map:match>

Note: Since Cocoon version 2.0.1 you should specify the needed locale as a parameter at pipeline level. This gives more flexibility in locale selection, e.g. URI parts can be used: /en_AU/file. See the LocaleAction documentation for other possibilities.
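
One way to take the locale from the URI can be sketched as follows, assuming the first path segment carries the locale (as in /en_AU/file). The {1} reference is the usual sitemap substitution for the text matched by the first wildcard.

```xml
<map:match pattern="*/file">
  <map:generate src="file.xml"/>
  <map:transform type="i18n">
    <!-- {1} expands to the text matched by "*", e.g. en_AU -->
    <map:parameter name="locale" value="{1}"/>
  </map:transform>
  <map:serialize/>
</map:match>
```

This sketch passes on whatever the first segment contains, so an unsupported value would rely on the catalogue fallback described under Usage.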

Also, catalogue-name, catalogue-location and untranslated-text can all be overridden at the pipeline level by specifying them as parameters to the transform statement.
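
A sketch of such an override follows, with parameter names taken from the configuration elements listed above. The match pattern and the catalogue name, location and placeholder text are hypothetical values.

```xml
<map:match pattern="admin/file">
  <map:generate src="file.xml"/>
  <map:transform type="i18n">
    <map:parameter name="locale" value="en_AU"/>
    <!-- Override the transformer-level settings for this pipeline only -->
    <map:parameter name="catalogue-name" value="admin_messages"/>
    <map:parameter name="catalogue-location" value="translations/admin"/>
    <map:parameter name="untranslated-text" value="[missing]"/>
  </map:transform>
  <map:serialize/>
</map:match>
```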

Samples

The i18n samples shipped with Cocoon demonstrate all the features of the i18n transformer and give some ideas on determining the user's locale.

Usage Pattern for Dictionary Generator Stylesheet

It is sometimes better to maintain a master dictionary that contains all the keys with translations in all the supported languages. For this purpose several helper stylesheets can be used. The stylesheets are found in the Cocoon sources: src/resources/dev/i18n (in version 2.1 and higher).
Below is an example of adding a new language using a master dictionary.

Initial key generation

To generate all the i18n keys from a source file (XML or XSP), use the markup2message.xsl stylesheet. Simply transform your content file using this stylesheet. The result will be an empty message catalogue for the given language.

Key generation from master dictionary

Generate a dictionary with keys and placeholders for Spanish translations using the merge.xsl stylesheet. Optionally, existing translations can be kept for one of the languages. To do this, set the stylesheet params (manually in the stylesheet or on the command line):

  • mode = keys (indicates, that only keys must be in result)
  • new-lang = es (language to be added)
  • keep-lang = en (language to be kept in result, for convenience)

Command line for Xalan (Of course, Xerces and Xalan must be in your classpath):

java org.apache.xalan.xslt.Process -IN simple_dict.xml -XSL merge.xsl \
-OUT simple_dict_es.xml -PARAM mode keys -PARAM new-lang es -PARAM keep-lang en

(Windows users: Do not enter '\' symbol, continue typing on the same line.)

This will create a file simple_dict_es.xml with entries, keys and placeholders.

Translation

Replace placeholders with translation according to the keys or original translations, if they were kept during generation.

Add to the master dictionary

Use the same stylesheet for this purpose with these params:

  • mode = merge
  • new-lang = es
  • new-dict = simple_dict_es.xml

Command line for Xalan:

java org.apache.xalan.xslt.Process -IN simple_dict.xml -XSL merge.xsl \
-OUT simple_dict_new.xml -PARAM mode merge -PARAM new-lang es \
-PARAM new-dict simple_dict_es.xml

(Windows users: Do not enter '\' symbol, continue typing on the same line.)

Finally

To be done

  • Multiple dictionaries per pipeline support
  • Markup support in translations
  • Named parameters support
  • Dictionary caching
  • Different bundle implementations

Contacts

Feel free to send comments and improvement ideas either directly to Konstantin Piroumian or through the Cocoon Mail List.

Copyright © 1999-2002 The Apache Software Foundation. All Rights Reserved.