Introduction
Developing and maintaining multi-language sites is a common problem for web developers.
Using XML and XSL makes this task much easier, especially when content, logic and
presentation are kept separate.
- Internationalization (i18n): the process of developing a product in such a way that it works with data in different languages and can be adapted to various target markets without engineering changes.
- Localization (l10n): the subsequent process of translating and adapting a product to a given market's cultural conventions.
The approach to internationalization (hereafter i18n) of XML documents within Cocoon
is based on a transformer, I18nTransformer, which uses XML dictionaries for all the
multilingual data. The namespace URI of the i18n transformer is defined as follows:

    xmlns:i18n="http://apache.org/cocoon/i18n/2.0"
Other implementation details
- Default name in sitemap: i18n
- Class: org.apache.cocoon.transformation.I18nTransformer
- Cacheable: no
- Poolable: yes
Brief description
The following features are supported by the i18n transformer:
- Text translation
- Attribute translation
- Parameter substitution (with translation if needed)
- Date, number and currency formatting
A simple example of i18n markup:

    <para title="first" name="article" i18n:attr="title name">
      <i18n:text>This text will be translated.</i18n:text>
    </para>

The text inside <i18n:text> is used as a key to find the translation in the
message catalogue. Attributes listed in the i18n:attr attribute are translated as well, with their values used as dictionary keys; i18n:attr can be placed on any element that is not in the i18n namespace.
Note: Although date, time, number and currency formatting is also supported, some cases require XSP or other dynamic means to achieve more flexibility.
Markup Reference
Summary
Special tags in the i18n namespace are used to mark the parts of an XML document that should be substituted with dictionary messages.
Tags list
| Element | Description |
|---|---|
| i18n:text | Used for simple text translation |
| i18n:attr | Attribute for any element (not in the i18n namespace). Contains the names of other attributes of that element to be translated |
| i18n:translate | Translates text with parameter substitution |
| i18n:param | Used with i18n:translate to provide a substitution parameter |
| i18n:date | Formats the date in a localized manner |
| i18n:time | Formats the time in a localized manner |
| i18n:date-time | Formats the date and time in a localized manner |
| i18n:number | Formats numbers, currencies and percentages in a localized manner |
i18n:text
To translate some simple text we use the <i18n:text> tag:

    <i18n:text>Text to be translated</i18n:text>

The text between the <i18n:text> tags is used as a key to find the translation in the dictionary.
The 'i18n:key' attribute can be used to specify an explicit key for
the dictionary. Normally, the text itself is used as the key to find
the translation in the dictionary. If the 'i18n:key' attribute is specified, that
key is used to find the translation, and the text itself serves as the default value
if no translation can be found.

    <i18n:text i18n:key="key_text">Default value</i18n:text>

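The lookup rule described above can be sketched in a few lines of Python. This is only an illustration of the rule, not Cocoon's Java implementation: the function name `translate_text`, the in-memory `catalogue` dict and the sample translation are assumptions made for this example, and the `untranslated-text` sitemap setting is ignored.

```python
def translate_text(catalogue, body, key=None):
    """Resolve one <i18n:text> element against a catalogue.

    catalogue: dict of message key -> translation (hypothetical stand-in
               for a loaded message catalogue).
    body:      the text between the <i18n:text> tags.
    key:       value of the i18n:key attribute, or None if it is absent.
    """
    # Without i18n:key the element body itself is the lookup key.
    lookup_key = key if key is not None else body
    # If no translation is found, the element body is the default value.
    return catalogue.get(lookup_key, body)


# Hypothetical catalogue content, invented for the example.
catalogue = {"key_text": "Translated value"}

print(translate_text(catalogue, "Default value", key="key_text"))
print(translate_text(catalogue, "Default value", key="unknown_key"))
```

The first call finds `key_text` and prints the translation; the second finds nothing and falls back to the element body.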
Translation with param substitution
To translate text with param substitution, the <i18n:translate> tag must be used.
It can contain <i18n:param> tags, which hold the
parameters. The values of these parameters are inserted into the
translated text, replacing placeholders. Placeholders have the
following syntax: \{[0-9]+\}. An example:

    <i18n:translate>
      <i18n:text>Some {0} was inserted {1}.</i18n:text>
      <i18n:param>text</i18n:param>
      <i18n:param>here</i18n:param>
    </i18n:translate>

Now we want to translate this into German.
First, the processor looks in the specified dictionary for
the string:

    Some {0} was inserted {1}.

It finds the string and translates it to German:

    Etwas {0} wurde {1} eingesetzt.

Then the processor replaces the parameters: {0} is replaced
with "text" and {1} with "here". This results in:

    Etwas text wurde here eingesetzt.

As we can see, it is sometimes necessary to translate the parameters
as well, since "here" is not a German word and "text" should be
capitalized. This can be done by marking up the parameters with
<i18n:text> again:

    <i18n:translate>
      <i18n:text>Some {0} was inserted {1}.</i18n:text>
      <i18n:param><i18n:text>text</i18n:text></i18n:param>
      <i18n:param><i18n:text>here</i18n:text></i18n:param>
    </i18n:translate>

Note: The text used for param substitution does not have to be translated.
For example, it can come from a database with predefined placeholders for i18n params, in which case there is no need to use <i18n:text> to translate it.
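The substitution steps walked through above can be sketched as follows. This is a minimal Python illustration, not the transformer's actual code: the function names, the `translate_params` flag and the dictionary entries for "text" and "here" are assumptions made for the example. Leaving a placeholder untouched when no parameter is supplied for it is also an assumption.

```python
import re

# Placeholder syntax from the text: an opening brace, digits, a closing brace.
PLACEHOLDER = re.compile(r"\{([0-9]+)\}")


def substitute(template, params):
    """Replace {n} placeholders with the n-th parameter."""
    def replace(match):
        index = int(match.group(1))
        # Assumption: a placeholder without a matching parameter stays as-is.
        return params[index] if index < len(params) else match.group(0)
    return PLACEHOLDER.sub(replace, template)


def translate(catalogue, text, params, translate_params=False):
    """Translate the text first, then substitute the parameters."""
    template = catalogue.get(text, text)
    if translate_params:
        # Corresponds to wrapping each parameter in <i18n:text>.
        params = [catalogue.get(p, p) for p in params]
    return substitute(template, params)


# Hypothetical German dictionary; the last two entries are invented here.
german = {
    "Some {0} was inserted {1}.": "Etwas {0} wurde {1} eingesetzt.",
    "text": "Text",
    "here": "hier",
}

print(translate(german, "Some {0} was inserted {1}.", ["text", "here"]))
# Etwas text wurde here eingesetzt.
print(translate(german, "Some {0} was inserted {1}.", ["text", "here"],
                translate_params=True))
# Etwas Text wurde hier eingesetzt.
```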
Parameters can also be dates, numbers and currencies. Use the type attribute to specify one of the possible types: date | time | date-time | number | currency | currency-no-unit | int-currency | percent. See the section on date, time and number formatting for more on typed params.
Attributes
Attributes can be translated as well. This is very useful for
HTML forms, since button labels are set via an attribute in
HTML. To translate attributes of a tag, add an additional attribute
named 'i18n:attr' containing a space-separated list of the attributes
that should be translated. An example:

    <INPUT type="submit" value="Submit" i18n:attr="value"/>

The attribute that will be translated is 'value'.
Parameter replacement is not available for attributes at this time.
Note: Some versions of Xerces have a bug in the removeAttribute() method implementation, which
results in a NullPointerException when attribute translation is used. The solution is to upgrade
to a newer version of Xerces.
Date, time and number formatting
To format dates according to the current locale use <i18n:date src-pattern="dd/MM/yyyy" pattern="dd:MMM:yyyy" value="01/01/2001" />. The 'src-pattern' attribute is used to parse the 'value'; the date is then formatted according to the current locale using the format specified by the 'pattern' attribute.
To format time for a specific locale (e.g. de_DE) use <i18n:time src-pattern="dd/MM/yyyy hh:mm" locale="de_DE" value="01/01/2001 12:00" />. The 'src-pattern' and 'pattern' attributes may also contain 'short', 'medium', 'long' or 'full', in which case the value is parsed or formatted according to that predefined format.
To format date and time together use <i18n:date-time />.
It is also possible to specify a src-locale: <i18n:date src-pattern="short" src-locale="en_US" locale="de_DE">12/24/01</i18n:date> will result in 24.12.2001
An explicit pattern or src-pattern (i.e. not short, medium, long or full) overrides the locale and src-locale.
If no pattern is specified, the date is formatted with the DateFormat.DEFAULT format (both date and time). If no value for the date is specified, the current date is used. E.g.: <i18n:date/> will result in the current date, formatted with the default localized pattern.
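The parse-then-format flow (read the value with 'src-pattern', write it out with 'pattern') can be sketched in Python as below. This is only an illustration under stated assumptions: the helper names are invented, the token table covers just the pattern letters that appear in the examples above, and no locale handling is attempted, so month names come from whatever locale the Python process is running in.

```python
import re
from datetime import datetime

# Hypothetical mapping from the pattern letters used in the examples
# to Python strftime/strptime directives. It is deliberately incomplete.
DIRECTIVES = {
    "yyyy": "%Y", "yy": "%y",
    "MMM": "%b", "MM": "%m",
    "dd": "%d",
    "HH": "%H", "hh": "%I",
    "mm": "%M",
}
# Longer tokens are listed first so that "yyyy" wins over "yy", etc.
TOKEN = re.compile("yyyy|yy|MMM|MM|dd|HH|hh|mm")


def to_strftime(pattern):
    """Translate a pattern such as dd/MM/yyyy into %d/%m/%Y in one pass."""
    return TOKEN.sub(lambda m: DIRECTIVES[m.group(0)], pattern)


def reformat_date(value, src_pattern, pattern):
    """Parse value with src_pattern, then format it with pattern."""
    parsed = datetime.strptime(value, to_strftime(src_pattern))
    return parsed.strftime(to_strftime(pattern))


print(to_strftime("dd:MMM:yyyy"))
print(reformat_date("01/01/2001", "dd/MM/yyyy", "dd:MMM:yyyy"))
print(reformat_date("24/12/2001", "dd/MM/yyyy", "yyyy-MM-dd"))
```

Doing the token replacement in a single regular-expression pass avoids one replacement accidentally rewriting the output of another.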
To format numbers in a locale-sensitive manner use <i18n:number pattern="0.##" value="2.0" />. This is also useful for Arabic, Indian, etc. number formatting. Currency and percent formatting are available as well; the known types are currency, currency-no-unit, int-currency, int-currency-no-unit and percent. Another useful attribute is fraction-digits. E.g.:
- <i18n:number type="currency" value="1703.7434" /> will result in a localized presentation of the value; for the US locale: $1,703.74
- <i18n:number type="currency" fraction-digits="3" value="1703.7434" /> will result in a localized presentation of the value with three fraction digits, so you can print gasoline prices; for the US locale: $1,703.743
- <i18n:number type="int-currency" value="170374" /> will result in a localized presentation of the value; for the US locale: $1,703.74, and 170374 (with currency unit) for a currency without subunit.
- <i18n:number type="int-currency-no-unit" value="170374" /> will result in a localized presentation of the value; for the US locale: 1,703.74, and 170374 (without currency unit) for a currency without subunit.
- <i18n:number type="percent" value="1.2" /> will result in a localized percent value: 120% for most locales.
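The arithmetic behind these types can be sketched in Python for the US-style output shown above. This is an illustration only: the function names are invented, the grouping separator, decimal point and "$" symbol are hard-coded assumptions rather than locale data, and rounding follows Python's float formatting, which may differ from Java's in edge cases.

```python
def format_currency(value, fraction_digits=2, symbol="$"):
    """currency: grouped thousands, fixed fraction digits, unit prefix."""
    return f"{symbol}{value:,.{fraction_digits}f}"


def format_int_currency(minor_units, fraction_digits=2, symbol="$"):
    """int-currency: the value is an integer count of the smallest unit
    (e.g. cents), so it is scaled down before formatting."""
    return format_currency(minor_units / 10 ** fraction_digits,
                           fraction_digits, symbol)


def format_percent(value):
    """percent: the value is a fraction, so 1.2 means 120 percent."""
    return f"{value * 100:.0f}%"


print(format_currency(1703.7434))                      # $1,703.74
print(format_currency(1703.7434, fraction_digits=3))   # $1,703.743
print(format_int_currency(170374))                     # $1,703.74
print(format_int_currency(170374, symbol=""))          # 1,703.74
print(format_percent(1.2))                             # 120%
```

Passing an empty `symbol` stands in for the "no-unit" variants.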
Date and number formatting can also be used with substitution params. The type attribute must be set on a param to indicate its type (date, number, currency, ...). The default type is string.

    <i18n:translate>
      <i18n:text>
        You have to pay {0} for {1} pounds or {2} of your profit. Valid from {3}
      </i18n:text>
      <i18n:param type="currency" pattern="$#,##0.00">102.5</i18n:param>
      <i18n:param type="number" value="2.5" />
      <i18n:param type="percent" value="0.10" />
      <i18n:param type="date" pattern="dd-MMM-yy" />
    </i18n:translate>

The result will look like this: You have to pay $102.50 for 2.5 pounds or 10% of your profit. Valid from 13-Jun-01
Catalogues (Dictionaries)
Message catalogues contain translations to be used by the i18n transformer.
Catalogues format
A single message catalogue file contains translations for a particular language, e.g.:

    <?xml version="1.0"?>
    <!-- message catalogue file for locale ... -->
    <catalogue xml:lang="locale">
      <message key="key">text</message>
      <message key="other_key">Other text</message>
      ....
    </catalogue>

Here the key attribute identifies a particular message, and the element content is its translation for that language.
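To make the structure concrete, here is a minimal Python sketch that reads a catalogue of this shape into a key-to-text mapping using the standard library XML parser. It is not how the transformer loads catalogues; the function name `load_catalogue` and the sample German entry are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Sample catalogue in the format shown above; the content is invented.
CATALOGUE_XML = """<?xml version="1.0"?>
<catalogue xml:lang="de">
  <message key="key">text</message>
  <message key="other_key">Other text</message>
  <message key="Some {0} was inserted {1}.">Etwas {0} wurde {1} eingesetzt.</message>
</catalogue>
"""


def load_catalogue(xml_text):
    """Return a dict mapping each message key to its translation."""
    root = ET.fromstring(xml_text)
    # An empty <message/> element has text None; store it as "".
    return {m.get("key"): (m.text or "") for m in root.iter("message")}


messages = load_catalogue(CATALOGUE_XML)
print(messages["other_key"])
print(messages["Some {0} was inserted {1}."])
```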
Usage
Files to be translated contain i18n markup.
At runtime, the i18n transformer finds a message catalogue for the
user's locale and replaces the text between the
<i18n:text> tags, using either the value between the tags as
the lookup key or the value of the key attribute if specified. In the latter
case the body of the tag is used when no translation is found.
If the i18n transformer cannot find an appropriate message catalogue for
the user's locale, it recursively tries to locate a parent
message catalogue until a valid catalogue is found, in this order:
- catalogue_language_country_variant.xml
- catalogue_language_country.xml
- catalogue_language.xml
- catalogue.xml
For example, assuming a basename of messages and a locale of en_AU
(no variant), the following search will occur:
- messages_en_AU.xml
- messages_en.xml
- messages.xml
This allows the developer to write a hierarchy of message catalogues,
each level defining messages for a more specific variation.
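The search order can be sketched as a small Python function that lists candidate file names from most to least specific. This is an illustration of the naming scheme only; the function name is invented and the sketch does not touch the file system or reproduce Cocoon's actual lookup code.

```python
def catalogue_candidates(basename, locale):
    """List catalogue file names to try, most specific first.

    locale is given as language[_country[_variant]], e.g. "en_AU".
    """
    parts = [p for p in locale.split("_") if p]
    names = []
    # Drop one trailing locale component at a time.
    for end in range(len(parts), 0, -1):
        names.append(f"{basename}_{'_'.join(parts[:end])}.xml")
    # The bare basename is the last resort.
    names.append(f"{basename}.xml")
    return names


print(catalogue_candidates("messages", "en_AU"))
# ['messages_en_AU.xml', 'messages_en.xml', 'messages.xml']
```

A loader would try each name in turn and stop at the first catalogue that exists.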
Sitemap configuration

    <map:transformer name="i18n"
        src="org.apache.cocoon.transformation.I18nTransformer">
      <catalogue-name>messages</catalogue-name>
      <catalogue-location>translations</catalogue-location>
      <untranslated-text>untranslated</untranslated-text>
      <cache-at-startup>true</cache-at-startup>
    </map:transformer>

where:
- catalogue-name: base name of the message catalogue (mandatory).
- catalogue-location: location of the message catalogues (mandatory).
- untranslated-text: text used for untranslated keys (default is to output the key name).
- cache-at-startup: flag whether to cache messages at startup (false by default).
To use the transformer in a pipeline, specify it in a transform step and indicate the needed locale. For example:

    <map:match pattern="file">
      <map:generate src="file.xml"/>
      <map:transform type="i18n">
        <map:parameter name="locale" value="en_AU"/>
      </map:transform>
      <map:serialize/>
    </map:match>

Note: Since Cocoon version 2.0.1 you should specify the needed locale as a parameter at pipeline level. This gives more flexibility in locale selection, e.g. URI parts can be used: /en_AU/file. See the LocaleAction documentation for other possibilities.
Also, catalogue-name, catalogue-location
and untranslated-text can all be overridden at the
pipeline level by specifying them as parameters to the transform statement.
Samples
The i18n samples shipped with Cocoon demonstrate all the features of the i18n transformer and give some ideas on how to determine the user's locale.
Usage Pattern for Dictionary Generator Stylesheet
It is sometimes better to maintain a master dictionary that contains
all the keys with translations in all the supported languages. Several helper stylesheets can be used for this purpose.
The stylesheets are found in the Cocoon sources: src/resources/dev/i18n (in version 2.1 and higher).
Below is an example of adding a new language using a master dictionary.
Initial key generation
To generate all the i18n keys from a source file (XML or XSP), use the markup2message.xsl stylesheet: simply transform your content file with it. The result is an empty message catalogue for the given language.
Key generation from master dictionary
Generate a dictionary with keys and placeholders for Spanish translations using the merge.xsl stylesheet. Optionally, the existing translations for one of the languages can be kept.
To do this, set the stylesheet params (manually in the stylesheet or on the command line):
- mode = keys (indicates that only keys must be in the result)
- new-lang = es (language to be added)
- keep-lang = en (language to be kept in the result, for convenience)
Command line for Xalan (Of course, Xerces and Xalan must be in your classpath):

    java org.apache.xalan.xslt.Process -IN simple_dict.xml -XSL merge.xsl \
        -OUT simple_dict_es.xml -PARAM mode keys -PARAM new-lang es -PARAM keep-lang en

(Windows users: Do not enter '\' symbol, continue typing on the same line.)
This will create a file simple_dict_es.xml with entries, keys and placeholders.
Translation
Replace the placeholders with translations, guided by the keys or by the original
translations if they were kept during generation.
Add to the master dictionary
Use the same stylesheet for this purpose with these params:

    mode = merge
    new-lang = es
    new-dict = simple_dict_es.xml

Command line for Xalan:

    java org.apache.xalan.xslt.Process -IN simple_dict.xml -XSL merge.xsl \
        -OUT simple_dict_new.xml -PARAM mode merge -PARAM new-lang es \
        -PARAM new-dict simple_dict_es.xml

(Windows users: Do not enter '\' symbol, continue typing on the same line.)
Finally
To be done
- Multiple dictionaries per pipeline support
- Markup support in translations
- Named parameters support
- Dictionary caching
- Different bundle implementations
Contacts
Feel free to send comments and improvement ideas either directly to Konstantin Piroumian
or through the Cocoon Mail List.