
I18n Transformer

Introduction

Developing and maintaining multi-language sites is a common problem for web developers. Using XML and XSL makes this task much easier, especially given the separation of content, logic and presentation.

  • Internationalization (i18n): Process of developing a product in such a way that it works with data in different languages and can be adapted to various target markets without engineering changes.
  • Localization (l10n): Subsequent process of translating and adapting a product to a given market's cultural conventions.

The approach to internationalization (hereafter i18n) of XML documents within Cocoon is based on a transformer, I18nTransformer, which uses XML dictionaries for all multilingual data. The namespace URI of the i18n transformer is defined as follows:

xmlns:i18n="http://apache.org/cocoon/i18n/2.1"

Other implementation details

  • Default name in sitemap: i18n
  • Class: org.apache.cocoon.transformation.I18nTransformer
  • Cacheable: no.
  • Poolable: yes.

Brief description

The following features are supported by the i18n transformer:

  • Text translation
  • Attribute translation
  • Parameter substitution (with translation if needed)
  • Date, number and currency formatting

A simple example of i18n markup:

<para title="first" name="article"  i18n:attr="title name">
  <i18n:text>This text will be translated.</i18n:text>
</para>

The text inside the <i18n:text> element is used as a key to find the translation in the message catalogue. Attributes listed in the i18n:attr attribute (on any element outside the i18n namespace) are translated as well, with their values used as dictionary keys.

Note
Although date, time, number and currency formatting is also supported, in some cases XSP or some other dynamic means is needed to achieve more flexibility.

Markup Reference

Summary

Special tags in the i18n namespace are used to mark the parts of an XML document that should be substituted with dictionary messages.

Tags list

Element          Description
i18n:text        Used for simple text translation
i18n:attr        Attribute for any element (not in i18n-namespace). Contains the names of other attributes of that element to be translated
i18n:translate   Translates text with parameter substitution
i18n:param       Used with i18n:translate to provide substitution parameter
i18n:date        Formats the date in localized manner
i18n:time        Formats the time in localized manner
i18n:date-time   Formats the date and time in localized manner
i18n:number      Formats numbers, currencies and percent in localized manner

i18n:text

To translate some simple text we use the <i18n:text> tag:

<i18n:text>Text to be translated</i18n:text>

The text between the <i18n:text>-tags is used as a key to find the translation in the dictionary.

The 'i18n:key' attribute can be used to specify a special key for the dictionary. Normally, the text itself is used as the key to find the translation in the dictionary. If we specify the 'i18n:key' attribute this key is used to find the translation and the text itself is used as the default value, if no translation can be found.

<i18n:text i18n:key="key_text">Default value</i18n:text>

Messages can be taken from multiple dictionaries. The dictionaries are configured in the sitemap (see further on), and each dictionary is assigned an id. There is one dictionary that serves as the default one. To translate a key using a non-default dictionary, mention the id of the dictionary in an i18n:catalogue attribute:

<i18n:text i18n:catalogue="menu">key_text</i18n:text>
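
The lookup rules for <i18n:text> described above can be sketched in Java. This is a minimal illustration under stated assumptions, not the transformer's actual code: the class TextLookupSketch and its methods are hypothetical, catalogues are modelled as plain in-memory maps, and the untranslated-text setting is not modelled.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of Cocoon): models dictionary lookup for <i18n:text>.
final class TextLookupSketch {
    // catalogue id -> (message key -> translation)
    private final Map<String, Map<String, String>> catalogues = new HashMap<>();
    private final String defaultCatalogueId;

    TextLookupSketch(String defaultCatalogueId) {
        this.defaultCatalogueId = defaultCatalogueId;
    }

    void addMessage(String catalogueId, String key, String translation) {
        catalogues.computeIfAbsent(catalogueId, id -> new HashMap<>()).put(key, translation);
    }

    /**
     * @param catalogueId value of i18n:catalogue, or null to use the default catalogue
     * @param key         value of i18n:key, or null to use the element text as the key
     * @param body        text content of the i18n:text element
     */
    String translate(String catalogueId, String key, String body) {
        String id = (catalogueId != null) ? catalogueId : defaultCatalogueId;
        String lookupKey = (key != null) ? key : body;
        Map<String, String> catalogue =
                catalogues.getOrDefault(id, Collections.<String, String>emptyMap());
        String translation = catalogue.get(lookupKey);
        // No translation found: fall back to the element text (the default value).
        return (translation != null) ? translation : body;
    }
}
```

With a default catalogue "messages" and a second catalogue "menu", translate(null, "key_text", "Default value") consults "messages", while translate("menu", null, "key_text") consults "menu" and uses the element text as the key.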

Translation with param substitution

To translate the text with param substitution the <i18n:translate> tag must be used. We can specify some <i18n:param>-tags which contain parameters. The values of these parameters will be inserted into the translated text, replacing placeholders. Placeholders have the following syntax: \{[0-9]+\}. An example:

    
<i18n:translate>
    <i18n:text>Some {0} was inserted {1}.</i18n:text>
    <i18n:param>text</i18n:param>
    <i18n:param>here</i18n:param>
</i18n:translate>

Now we want to translate this into German. First, the processor looks up the following string in the dictionary we specified:

Some {0} was inserted {1}.

It finds the string and translates it to German:

Etwas {0} wurde {1} eingesetzt.

Now the processor will replace the parameters. {0} will be replaced with "text" and {1} with "here". This results in:

Etwas text wurde here eingesetzt.

As we see, it is sometimes necessary to translate the parameters as well, since "here" is not a German word and "text" should be capitalized in German. This can be done simply by marking up the parameters with <i18n:text> again:

<i18n:translate>
    <i18n:text>Some {0} was inserted {1}.</i18n:text>
    <i18n:param><i18n:text>text</i18n:text></i18n:param>
    <i18n:param><i18n:text>here</i18n:text></i18n:param>
</i18n:translate>

Note
Generally, the text used for param substitution does not need to be translated. For example, it can come from a database with predefined placeholders for i18n params, in which case there is no need to use <i18n:text> to translate it.
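
The two-step process described above (translate the pattern, then replace the placeholders) can be sketched with java.text.MessageFormat, whose {0}, {1} placeholders have the same shape. This is only an illustration: whether the transformer behaves exactly like MessageFormat is an assumption, the class name is hypothetical, and "Text" and "hier" are the assumed dictionary translations of the two parameters.

```java
import java.text.MessageFormat;

// Hypothetical sketch: placeholder substitution on an already translated pattern.
final class ParamSubstitutionSketch {
    static String substitute(String translatedPattern, Object... params) {
        // {0} is replaced by params[0], {1} by params[1], and so on.
        return MessageFormat.format(translatedPattern, params);
    }

    public static void main(String[] args) {
        String german = "Etwas {0} wurde {1} eingesetzt.";
        // Untranslated parameters, as in the first example:
        System.out.println(substitute(german, "text", "here"));
        // prints: Etwas text wurde here eingesetzt.
        // Parameters translated first (assumed translations "Text" and "hier"):
        System.out.println(substitute(german, "Text", "hier"));
        // prints: Etwas Text wurde hier eingesetzt.
    }
}
```

Note that MessageFormat gives single quotes a special meaning inside patterns, so this sketch should not be applied unchanged to messages that contain apostrophes.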

Parameters can be dates, numbers and currencies. Use the type attribute to specify one of the possible types: date | time | date-time | number | currency | currency-no-unit | int-currency | percent. See the "Date, time and number formatting" section for more on params.

Attributes

Additionally we can translate attributes. This is very useful for HTML-forms since labels of buttons are set via an attribute in HTML. To translate attributes of a tag, add an additional attribute named 'i18n:attr' containing a list of attributes, which should be translated, separated by spaces. An example:

<INPUT type="submit" value="Submit" i18n:attr="value"/>

Here the attribute to be translated is 'value'. Parameter replacement is not available for attributes at this time.

Note
Some versions of Xerces have a bug in the removeAttribute() method implementation, which results in a NullPointerException when attribute translation is used. The solution is to upgrade to a newer version of Xerces.

Just as with i18n:text, the translations for attributes can come from multiple dictionaries. To use a specific dictionary, add the id of the dictionary before the key, separated by a colon:

<INPUT type="submit" value="form:Submit" i18n:attr="value"/>
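
A small sketch of how such a prefixed attribute value could be split into a dictionary id and a key. The class is hypothetical, and splitting at the first colon is an assumption; the text above does not say how values containing further colons are handled.

```java
// Hypothetical sketch: split "form:Submit" into catalogue id "form" and key "Submit".
final class AttributeKeySketch {
    /** Returns {catalogueId, key}; catalogueId is null when the value has no prefix. */
    static String[] split(String attributeValue) {
        int colon = attributeValue.indexOf(':');
        if (colon < 0) {
            // No prefix: the key is looked up in the default dictionary.
            return new String[] {null, attributeValue};
        }
        return new String[] {
            attributeValue.substring(0, colon),
            attributeValue.substring(colon + 1)
        };
    }
}
```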

Date, time and number formatting

To format dates according to the current locale use <i18n:date src-pattern="dd/MM/yyyy" pattern="dd:MMM:yyyy" value="01/01/2001" />. The 'src-pattern' attribute will be used to parse the 'value', then the date will be formatted according to the current locale using the format specified by 'pattern' attribute.

To format time for a locale (e.g. de_DE) use <i18n:time src-pattern="dd/MM/yyyy hh:mm" locale="de_DE" value="01/01/2001 12:00" />. The 'src-pattern' and 'pattern' attributes may also contain 'short', 'medium', 'long' or 'full'. The value will then be formatted according to that style.

To format date and time use <i18n:date-time />.

It is also possible to specify a src-locale: <i18n:date src-pattern="short" src-locale="en_US" locale="de_DE"> 12/24/01 </i18n:date> will result in 24.12.2001

An explicit pattern or src-pattern (that is, one other than short, medium, long or full) overrides the locale and src-locale.

If no pattern was specified then the date will be formatted with the DateFormat.DEFAULT format (both date and time). If no value for the date is specified then the current date will be used. E.g.: <i18n:date/> will result in the current date, formatted with default localized pattern.
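
The parse-then-format behaviour of 'src-pattern' and 'pattern' can be sketched with java.text.SimpleDateFormat. This is a minimal illustration using the first <i18n:date> example from this section; the class and method names are hypothetical, and that the transformer uses SimpleDateFormat in exactly this way is an assumption.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical sketch: parse with src-pattern, then format with pattern.
final class DateFormatSketch {
    static String reformat(String value, String srcPattern, String pattern, Locale locale) {
        try {
            // src-pattern describes how the incoming value is written.
            Date parsed = new SimpleDateFormat(srcPattern, locale).parse(value);
            // pattern describes the desired output for the target locale.
            return new SimpleDateFormat(pattern, locale).format(parsed);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Value does not match src-pattern: " + value, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(reformat("01/01/2001", "dd/MM/yyyy", "dd:MMM:yyyy", Locale.US));
        // prints: 01:Jan:2001
    }
}
```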

To format numbers in a locale-sensitive manner use <i18n:number pattern="0.##" value="2.0" />. This is also useful for Arabic, Indian, etc. number formatting. Additionally, currency and percent formatting can be used; the known types are currency, currency-no-unit, int-currency, int-currency-no-unit and percent. Another useful attribute is fraction-digits, e.g.:

  • <i18n:number type="currency" value="1703.7434" /> will result in localized presentation of the value for US locale: $1,703.74
  • <i18n:number type="currency" fraction-digits="3" value="1703.7434" /> will result in localized presentation of the value for US locale so you can print gasoline prices: $1,703.743
  • <i18n:number type="int-currency" value="170374" /> will result in localized presentation of the value for US locale: $1,703.74, and 170374 (with currency unit) for a currency without subunit.
  • <i18n:number type="int-currency-no-unit" value="170374" /> will result in localized presentation of the value for US locale: 1,703.74, and 170374 (without currency unit) for a currency without subunit.
  • <i18n:number type="percent" value="1.2" /> will result in localized percent value: 120% for most locales.
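
The number types listed above map naturally onto java.text.NumberFormat. The following is a rough sketch, assuming the transformer's behaviour is comparable; the class and method names are hypothetical, and int-currency is modelled by dividing by ten to the power of the currency's default number of fraction digits.

```java
import java.text.NumberFormat;
import java.util.Currency;
import java.util.Locale;

// Hypothetical sketch of the currency, int-currency and percent types.
final class NumberFormatSketch {
    /** type="currency"; fractionDigits may be null to keep the locale default. */
    static String currency(double value, Locale locale, Integer fractionDigits) {
        NumberFormat format = NumberFormat.getCurrencyInstance(locale);
        if (fractionDigits != null) {
            format.setMinimumFractionDigits(fractionDigits);
            format.setMaximumFractionDigits(fractionDigits);
        }
        return format.format(value);
    }

    /** type="int-currency": the value is given in the currency's smallest subunit. */
    static String intCurrency(long value, Locale locale) {
        // 2 for US dollars, 0 for a currency without subunit.
        int digits = Currency.getInstance(locale).getDefaultFractionDigits();
        return currency(value / Math.pow(10, digits), locale, null);
    }

    /** type="percent": 1.0 corresponds to one hundred percent. */
    static String percent(double value, Locale locale) {
        return NumberFormat.getPercentInstance(locale).format(value);
    }

    public static void main(String[] args) {
        System.out.println(currency(1703.7434, Locale.US, null)); // $1,703.74
        System.out.println(currency(1703.7434, Locale.US, 3));    // $1,703.743
        System.out.println(intCurrency(170374, Locale.US));       // $1,703.74
    }
}
```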

If someone in the US wants to see sales figures from a store in Germany, formatted using the German currency, they would need to use locale="de_DE" to get the currency right, e.g. 100,00 €. The decimal and grouping separators then also come from the de_DE locale. This may cause confusion, because US readers know "," as the thousands separator. Therefore a "currency" attribute is available, so that <i18n:number type="currency" locale="en_US" currency="de_DE">100</i18n:number> results in 100.00 €

Date and number formatting can also be used with substitution params. The type attribute must be used on a param to indicate its type (date, number, currency, ...). The default type is string.

<i18n:translate>
  <i18n:text>
    You have to pay {0} for {1} pounds or {2} of your profit. Valid from {3}
  </i18n:text>
  <i18n:param type="currency"
              pattern="$#,##0.00">102.5</i18n:param>
  <i18n:param type="number" value="2.5" />
  <i18n:param type="percent" value="0.10" />
  <i18n:param type="date" pattern="dd-MMM-yy" />
</i18n:translate>

The result will look like this: You have to pay $102.50 for 2.5 pounds or 10% of your profit. Valid from 13-Jun-01

Catalogues (Dictionaries)

Message catalogues contain translations to be used by the i18n transformer.

Catalogues format

A single message catalogue file contains translations for a particular language, e.g.:

<?xml version="1.0"?>
  <!-- message catalogue file for locale ... -->
  <catalogue xml:lang="locale">
         <message key="key">text</message>
         <message key="other_key">Other text</message>         
         ....
  </catalogue>

where the key attribute identifies a particular message for that language.
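
To make the catalogue format concrete, here is a small sketch that reads such a file into a key-to-text map using the standard Java DOM parser. The class is hypothetical and is not how the transformer loads its dictionaries; it only relies on the <message key="..."> structure shown above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch: read a message catalogue into a map of key -> text.
final class CatalogueReaderSketch {
    static Map<String, String> read(String catalogueXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            catalogueXml.getBytes(StandardCharsets.UTF_8)));
            NodeList messages = doc.getElementsByTagName("message");
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < messages.getLength(); i++) {
                Element message = (Element) messages.item(i);
                // The key attribute is the lookup key, the element text the translation.
                result.put(message.getAttribute("key"), message.getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a readable message catalogue", e);
        }
    }
}
```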

Usage

Files to be translated contain i18n markup. At runtime, the i18n transformer finds a message catalogue for the user's locale and replaces the text between the <i18n:text> tags, using as the lookup key either the text between the tags or the value of the i18n:key attribute, if specified. In the latter case, the text between the tags is used when no translation is found.

If the i18n transformer cannot find an appropriate message catalogue for the user's given locale, it will recursively try to locate a parent message catalogue until a valid catalogue is found, i.e.:

  • catalogue_language_country_variant.xml
  • catalogue_language_country.xml
  • catalogue_language.xml
  • catalogue.xml

For example, assuming a basename of messages and a locale of en_AU (no variant), the following search will occur:

  • messages_en_AU.xml
  • messages_en.xml
  • messages.xml

This allows the developer to write a hierarchy of message catalogues, with each level defining messages of increasing specificity.
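
The search order described above can be sketched as a function that lists candidate file names from most to least specific. The class and method names are hypothetical; only the naming scheme (basename, underscore-separated locale parts, .xml extension) is taken from the lists above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: candidate catalogue files, most specific first.
final class CatalogueFallbackSketch {
    static List<String> candidates(String baseName, String locale) {
        List<String> names = new ArrayList<>();
        String suffix = locale;
        while (!suffix.isEmpty()) {
            names.add(baseName + "_" + suffix + ".xml");
            // Drop the last locale part: en_AU -> en -> (nothing).
            int cut = suffix.lastIndexOf('_');
            suffix = (cut < 0) ? "" : suffix.substring(0, cut);
        }
        // The catalogue without any locale suffix is the last resort.
        names.add(baseName + ".xml");
        return names;
    }

    public static void main(String[] args) {
        System.out.println(candidates("messages", "en_AU"));
        // prints: [messages_en_AU.xml, messages_en.xml, messages.xml]
    }
}
```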

Sitemap configuration

<map:transformer name="i18n"
     src="org.apache.cocoon.transformation.I18nTransformer">

     <catalogues default="messages">
       <catalogue id="messages" name="messages" location="translations"/>
       <catalogue id="menu" name="menu" location="{defaults:skin}/translations"/>
     </catalogues>
     <untranslated-text>untranslated</untranslated-text>
     <cache-at-startup>true</cache-at-startup>
</map:transformer>

where:

  • catalogues: container element in which the catalogues are defined. It must have an attribute 'default' whose value is one of the ids of the catalogue elements. (mandatory).
  • catalogue: specifies a catalogue. It takes 3 required attributes: id (can be whatever you like), name (base name of the catalogue) and location (location of the message catalogue). The name and location attributes can contain references to "input modules" (same syntax as in other places in the sitemap). They are resolved on each usage of the transformer, so they can refer to e.g. request parameters. (at least 1 catalogue element required).
  • untranslated-text: text used for untranslated keys (default is to output the key name). For the i18n:text element, this will only be used if the content of the i18n:text element was empty (thus when the key was specified using the i18n:key attribute). Otherwise the content of the i18n:text element will be output when no translation is found.
  • cache-at-startup: flag whether to cache messages at startup (false by default).

Note
The configuration syntax was changed to add support for configuring multiple catalogues. The old syntax, using the elements 'catalogue-name' and 'catalogue-location', is still supported, but cannot be used concurrently with the new syntax.

To use the transformer in a pipeline, add it as a transform step and specify the required locale, e.g.:

<map:match pattern="file">
  <map:generate src="file.xml"/>
  <map:transform type="i18n">
    <map:parameter name="locale" value="en_AU"/>
  </map:transform>
  <map:serialize/>
</map:match>

Note
Since Cocoon version 2.0.1, you should specify the needed locale as a parameter at pipeline level. This gives more flexibility in locale selection, e.g. URI parts can be used: /en_AU/file. See LocaleAction documentation for other possibilities.

The default catalogue can be changed at the pipeline level by specifying a parameter default-catalogue-id. Likewise, a parameter untranslated-text can be used to override the default untranslated text.

Note
Before multiple catalogues were supported, the catalogue could be defined at the pipeline level by adding parameters catalogue-name and catalogue-location. This is still supported, but cannot be used concurrently with the default-catalogue-id parameter.

Samples

The i18n samples shipped with Cocoon demonstrate all the features of the i18n transformer and give some ideas on how to determine the user's locale.

Usage Pattern for Dictionary Generator Stylesheet

It is sometimes better to maintain a master dictionary that contains all the keys with translations in all the supported languages. For this purpose several helper stylesheets can be used. The stylesheets are found in the Cocoon sources: src/resources/dev/i18n (in version 2.1 and higher).
Below is an example of adding a new language using a master dictionary.

Initial key generation

To generate all the i18n keys from a source file (XML or XSP) use the markup2message.xsl stylesheet. Simply transform your content file using this stylesheet. The result will be an empty message catalogue for the given language.

Key generation from master dictionary

Generate a dictionary with keys and placeholders for Spanish translations using the merge.xsl stylesheet. Optionally, the existing translations for one of the languages can be kept. To do so, set the stylesheet params (manually in the stylesheet or on the command line):

  • mode = keys (indicates, that only keys must be in result)
  • new-lang = es (language to be added)
  • keep-lang = en (language to be kept in result, for convenience)

Command line for Xalan (Of course, Xerces and Xalan must be in your classpath):

java org.apache.xalan.xslt.Process -IN simple_dict.xml -XSL merge.xsl \
-OUT simple_dict_es.xml -PARAM mode keys -PARAM new-lang es -PARAM keep-lang en

(Windows users: Do not enter '\' symbol, continue typing on the same line.)

This will create a file simple_dict_es.xml with entries, keys and placeholders.

Translation

Replace placeholders with translation according to the keys or original translations, if they were kept during generation.

Add to the master dictionary

Use the same stylesheet for this purpose with these params:

  • mode = merge
  • new-lang = es
  • new-dict = simple_dict_es.xml

Command line for Xalan:

java org.apache.xalan.xslt.Process -IN simple_dict.xml -XSL merge.xsl \
-OUT simple_dict_new.xml -PARAM mode merge -PARAM new-lang es \
-PARAM new-dict simple_dict_es.xml

(Windows users: Do not enter '\' symbol, continue typing on the same line.)