Creating localizable web applications

An important step of developing a web application or creating web content is making sure that it can be localized. Listed below are good practices and recommendations that should be followed in order to make your content easily localizable.

Most of the code snippets used in the examples below come from an early version of the getpersonas.com website. In some cases, the code snippets were slightly changed to better illustrate the recommendations or for clarity.

Cheatsheet

  • Don't hardcode English text, formats (numbers, dates, addresses, etc.), word order or sentence structure.
  • Don't put text or numbers in images.
  • Don't forget about right-to-left locales.
  • Take advantage of printf() (or equivalents) and use variables in the English strings.
  • Write semantic code (e.g. don't use text and <img/> for decorations; instead, use CSS).
  • Document your code so that localizers know what they're translating (e.g. in gettext use comments and contexts).

App Logic

Detect the locale correctly

Be smart about detecting the user's locale correctly. You can use one or more of the following techniques:

  • HTTP Accept-Language headers,
  • the UA string,
  • IP geolocation.

See examples of the addons.mozilla.org code at /addons/trunk/site/app/config/language.php and /addons/trunk/site/app/config/language.inc.php. The LANGUAGE_CONFIG class expects arrays of valid languages and supported languages.

Always give the user a way to change the locale (e.g. by adding a locale dropdown menu at the bottom of the page) and remember this choice for future visits.
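As a rough illustration of the first technique, the sketch below picks a locale from an Accept-Language header. The function name pick_locale(), the list of supported locales and the default are assumptions made for this example; they are not taken from the addons.mozilla.org code. It assumes PHP 7.1 or newer.

```php
<?php
// Hypothetical helper: choose the best supported locale for an
// Accept-Language header such as "pl,en-US;q=0.7,en;q=0.3".
function pick_locale(string $header, array $supported, string $default): string {
    $candidates = [];
    foreach (explode(',', $header) as $position => $item) {
        $parts = explode(';', trim($item));
        $tag = trim($parts[0]);
        if ($tag === '') {
            continue;
        }
        $q = 1.0; // a missing q parameter means full preference
        foreach (array_slice($parts, 1) as $param) {
            $param = trim($param);
            if (stripos($param, 'q=') === 0) {
                $q = (float) substr($param, 2);
            }
        }
        $candidates[] = [$tag, $q, $position];
    }
    // Highest q first; equal q values keep their order in the header.
    usort($candidates, function ($a, $b) {
        return ($b[1] <=> $a[1]) ?: ($a[2] <=> $b[2]);
    });
    foreach ($candidates as [$tag, $q]) {
        if ($q <= 0) {
            continue; // q=0 means "not acceptable"
        }
        // Prefer an exact match such as "en-US".
        foreach ($supported as $locale) {
            if (strcasecmp($locale, $tag) === 0) {
                return $locale;
            }
        }
        // Otherwise accept any supported locale with the same primary subtag.
        $primary = strtolower(explode('-', $tag)[0]);
        foreach ($supported as $locale) {
            if (strtolower(explode('-', $locale)[0]) === $primary) {
                return $locale;
            }
        }
    }
    return $default;
}

$supported = ['en-US', 'de', 'fr', 'pl']; // assumed list of site locales
$header = $_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? '';
$locale = pick_locale($header, $supported, 'en-US');
```

A locale chosen explicitly by the user (for instance stored in a cookie) should still take precedence over the value computed here.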

Use the locale code in the URLs

Depending on how you detect the user's locale, you may want to provide a way of overriding the autodetection. You can achieve this by setting a cookie when the user changes the locale with the language dropdown, or by looking for the locale code in the URL. The latter involves rewriting the URLs to include the locale code and rewriting Apache's aliases to handle the locale in URLs.

You can put the locale code as the top-most element of the URL's path (e.g. http://example.com/en-US/foo/bar) or at its end (e.g. http://example.com/foo/bar/en-US). Avoid using it in a subdomain (e.g. http://en-us.example.com/foo/bar), as it can cause problems with certificates.
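For the variant where the locale code is the top-most path element, the lookup could be sketched as follows. The function name split_locale_from_path() and the list of supported locales are hypothetical, chosen only for this example.

```php
<?php
// Hypothetical helper: split "/en-US/foo/bar" into the locale code and the
// rest of the path. Returns [null, $path] when the path has no locale prefix.
function split_locale_from_path(string $path, array $supported): array {
    $segments = explode('/', ltrim($path, '/'), 2);
    if (in_array($segments[0], $supported, true)) {
        return [$segments[0], '/' . ($segments[1] ?? '')];
    }
    return [null, $path];
}

$supported = ['en-US', 'de', 'fr', 'pl']; // assumed list of site locales
list($locale, $rest) = split_locale_from_path('/en-US/foo/bar', $supported);
echo $locale, ' ', $rest, "\n";
```

When no locale is found in the URL, the application can fall back to the cookie or to autodetection and redirect to the prefixed URL.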

Simplify localized versions if necessary

Oftentimes, it is better to slightly simplify the localized version of your web application than to serve a mix of localized and English content. For example, if not all the pages of your website are going to be localized, you may consider removing links to the English-only pages from the navigation (headers, footers, sidebars) in the localized versions.
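One possible way to do this is to record, for each navigation entry, the locales in which its target page exists, and to filter the navigation before rendering it. The data structure and the function name nav_for_locale() below are assumptions made for illustration.

```php
<?php
// Hypothetical navigation: each entry lists the locales its page exists in.
$navigation = [
    ['label' => 'Gallery', 'url' => '/gallery', 'locales' => ['en-US', 'de', 'pl']],
    ['label' => 'FAQ',     'url' => '/faq',     'locales' => ['en-US', 'de']],
    ['label' => 'Blog',    'url' => '/blog',    'locales' => ['en-US']],
];

// Keep only the entries whose target page is available in $locale.
function nav_for_locale(array $navigation, string $locale): array {
    return array_values(array_filter($navigation, function ($item) use ($locale) {
        return in_array($locale, $item['locales'], true);
    }));
}

foreach (nav_for_locale($navigation, 'de') as $item) {
    echo $item['label'], "\n";
}
```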

Define the locale and the direction in the HTML

Generate the lang attribute dynamically, depending on the current locale. Use the dir attribute on the <html/> element and consider using an rtl class on <html/> or <body/> as well, in order to easily change CSS rules, as in the example below.

Snippet 1. Bad:

<html lang="en">

Snippet 2. Good:

CSS:

html[dir='rtl'] foo { 
  /* RTL-specific rules for the FOO element */ 
}

body.rtl foo { 
  /* RTL-specific rules for the FOO element */ 
}

HTML/PHP:

<?php
    function isRTL($locale) {
        $RTL_locales = array('ar', 'fa', 'he');
        return in_array($locale, $RTL_locales);
    }
?>
<html lang="<?= $locale?>" dir="<?= isRTL($locale) ? 'rtl' : 'ltr' ?>" >
    <body class="<?= $locale?> <?= isRTL($locale) ? 'rtl' : 'ltr' ?>">
    </body>
</html>

Notice that <body/> is given a class equal to the current locale. This is useful to add minor corrective rules to the CSS that apply only for selected locales. For example, for locales that tend to have longer words than English, you may want to make an element slightly wider.

Snippet 3. Good:

body.de foo, body.fr foo, body.pl foo { 
  /* locale-specific rules for the FOO element */ 
  width: 10em; /* originally 8em */
}

Adapt the interaction to RTL locales

Right-to-left locales not only require good image handling (see Images), but should also be taken into account when designing the interaction on the website. Consider the following example: a film-reel-like slideshow showcasing highlighted features of the product or featured designs. For right-to-left languages, the slideshow should go from right to left as well, making the last element in the HTML the first one to be displayed.

Separate URLs from navigation

Sometimes, when the URLs are well-designed, you may want to use the URL to do something in the code depending on where the user is. Take the URL structure of the getpersonas.com website for example. The URL http://getpersonas.com/nature/popular/2 points to the second page of the listing of the popular Personas in the "Nature" category. You could easily use list($category, $tab, $page) = explode('/', $path); to get this information directly from the URL. After that, it is tempting to use the $category or $tab variables in the interface. However, this is problematic for localization. You probably don't want to localize the URLs, both to keep them uniform across locales and to avoid issues with non-Latin and/or RTL characters. So in order to display a localized label of a category or a tab, you should create a mapping between the non-localizable English names used in the URLs and the localizable English strings used in the interface. Consider the following example:

Snippet 1. Good:

$tab_labels = array( "popular" => _('Popular'),
                     "recent" => _('Recent'),
                     "all" => _('All'),
                     "my" => _('My'),
                     "favorites" => _('Favorites')
                     );
list($category, $tab, $page) = explode('/', $path);
if ($tab == 'popular') {          // $tab is always English
  // ....
  echo $tab_labels[$tab];         // this will display the translation
}

Indicate the language of the pages you link to if it is different from the user's current language. For English, add hreflang="en" to links to resources that are not going to be localized or are external to your web application. Then, use CSS to give a cue to the user that if she follows the link, she will be served English content.

Snippet 1. Bad:

<a href="http://www.mozilla.com/en-US/privacy-policy.html"><?= _('Privacy policy'); ?></a>

Snippet 2. Good:

CSS:

a[hreflang="en"]::after {
  content: " [en]"
}

HTML/PHP:

<a href="http://www.mozilla.com/en-US/privacy-policy.html" hreflang="en"><?= _('Privacy policy'); ?></a>

 

Don't mingle app logic and localizable content when using pure HTML

If you decide not to use gettext on some pages (e.g. because they contain a lot of text and localizing source HTML is easier), make sure to keep the code responsible for application logic separate from the localizable content. The logic of the website should not be exposed directly in the localization files, to avoid any accidental changes by localizers.

Snippet 1. Bad:

require_once('lib/user.php');
$user = new PersonaUser();

Snippet 2. Good:

require_once('templates/footer.php');

If it's not possible to remove the app logic code, you should consider using gettext. Gettext extracts localizable content from the source files, thus making it impossible for localizers to accidentally change the code. You can learn more about the choice of the format for your project at File formats.

Text messages

Don't hardcode English content

Allow localizers to localize English content, such as:

  • text messages,
  • number formats,
  • date formats,
  • word order and sentence structure.

Note that some strings might be hidden in libraries' code (e.g. error messages), or in JavaScript libraries and scripts.

If you are using pure HTML instead of gettext to localize your webapp, consider using an additional gettext-like format such as .lang to streamline localizers' work with repeating content. This is useful for strings occurring in the webapp multiple times, like "return to top", "comments", "click to see larger image" etc. It might also be helpful for headers and footers, if you're not using templates to display them.

In most cases, though, you should use gettext whenever technically possible (i.e. the server's PHP has been built with gettext support).

Localize the date format

Localizing the date format is as easy as localizing any other string. Just let the localizers localize the format specification string.

Snippet 1. Bad:

$persona['date'] = date("n/j/Y", strtotime($persona['approve']));

Snippet 2. Good:

$persona['date'] = date(_("n/j/Y"), strtotime($persona['approve']));

Localize the number format

You can make the number format localizable using the information returned by localeconv() in PHP.

Snippet 1. Bad:

printf(_("%s MB"), $size);

Snippet 2. Good:

function num_format($num, $decimals) {
  $locale_info = localeconv();
  return number_format($num, $decimals, $locale_info['decimal_point'], $locale_info['thousands_sep']);
}

printf(_("%s MB"), num_format($size, 1));

Wrap as few HTML tags as possible

When wrapping the localizable content with the gettext function calls, put all the code that is irrelevant to localization outside the function call.

Snippet 1. Bad:

<?= _("<a href=\"https://addons.mozilla.org/firefox/downloads/latest/10900\" class=\"get-personas\" id=\"download\"><span>Get Personas for Firefox - Free</span>");?><span class="arrow"></span></a>

Snippet 2. Good:

<a href="https://addons.mozilla.org/firefox/downloads/latest/10900" class="get-personas" id="download">
  <span><?= _("Get Personas for Firefox - Free");?></span><span class="arrow"></span>
</a>

 

Snippet 3. Bad:

<p><?= _("<strong class=\"legal\">Design Acceptance:</strong> If a design is accepted, we will send the following message:");?></p>
<p><?= _("<strong class=\"legal\">Design Rejection:</strong> If a design is rejected, we will send the following message:");?></p>

Snippet 4. Good:

<p><strong class="legal"><?= _("Design Acceptance:");?></strong> <?= _("If a design is accepted, we will send the following message:");?></p>
<p><strong class="legal"><?= _("Design Rejection:");?></strong> <?= _("If a design is rejected, we will send the following message:");?></p>

 

Snippet 5. Bad:

<p id="breadcrumbs">
  <?printf(_("<a href=\"%s\">Personas Home</a> : <a href=\"%s\">Sign In</a> : Forgot Your Password?"), 
             $locale_conf->url('/'), 
             $locale_conf->url('/signin'));?>
</p>

Snippet 6. Good:

<p id="breadcrumbs">
  <?printf("<a href=\"%s\">" . _("Personas Home") . "</a> : <a href=\"%s\">" . _("Sign In") . "</a> : " . _("Forgot Your Password?"), 
            $locale_conf->url('/'), 
            $locale_conf->url('/signin'));?>
</p>

 

Snippet 7. Bad:

<p class="description"><?= _("<strong>Description:</strong>");?></p>

Snippet 8. Good:

<p class="description"><strong><?= _("Description:");?></strong></p>

 

Snippet 9. Good:

<h1>
  <?printf("<a href=\"%s\"><img src=\"/static/img/logo.png\" alt=\"" . _("Mozilla Labs Personas") . "\" /></a>", 
           $locale_conf->url('/'));?>
</h1>

Snippet 10. Better:

<h1>
  <a href="<?= $locale_conf->url('/') ?>">
    <img src="/static/img/logo.png" alt="<?= /* L10n: link title attribute */ _("Mozilla Labs Personas"); ?>" />
  </a>
</h1>

...but don't sacrifice flexibility

Don't sacrifice flexibility trying to satisfy the rule above. Make sure the content supports changing the order of the sentence, which may be required by some grammars.

Snippet 1. Bad:

<p class="added"><?= _("<strong>Added:</strong>") . $persona['date']; ?></p>

Snippet 2. Bad:

<p class="added"><strong><?= _("Added:") ?></strong><?= $persona['date']; ?></p>

Snippet 3. Good:

<p class="added"><? printf( /* L10n: %s is a date */ _("<strong>Added:</strong> %s"), $persona['date']);?></p>

The first bad snippet puts the <strong/> HTML element inside the gettext function call and concatenates the $persona['date'] variable to it. Following the rule about wrapping as few HTML elements with the gettext function call as possible, you could try to put the <strong/> HTML tag outside of the PHP code (cf. snippet 2). However, in this snippet, the concatenation of the $persona['date'] variable is still hardcoded and only allows one ordering of the sentence, while some grammars might require, for instance, putting the date in front of the "Added" descriptor. For this reason, it is better to leave the <strong/> HTML tags inside the gettext function call and take advantage of the printf() variable that will be substituted by the date upon interpretation of the code (snippet 3).

Snippet 4. Good:

<h3>
  <?printf( /* L10n: %s is the author's username */ _("created by <a href=\"%s\">%s</a>"), 
           $locale_conf->url('/gallery/Designer/' . $persona['author']), 
           $persona['display_username']);?>
</h3>

In this example the link is in the _() call so that localizers can adjust the position of the author's name, depending on the grammar of their language.

Use printf() for string substitution

Whenever there is content that will change, either upon interpretation of the code or as part of development, don't use concatenation. Instead, use printf() and string formatting. For instance, don't put URIs into msgids. Otherwise, whenever a static URI changes, you'll have to regenerate the *.po files to include the new msgids.

Snippet 1. Bad:

<?= _("View a sample Persona Header <b><a href=\"/static/img/Persona_Header_LABS.jpg\">here</a></b>.");?>

Snippet 2. Good:

<?php printf(_("View a sample Persona Header <b><a href=\"%s\">here</a></b>."), '/static/img/Persona_Header_LABS.jpg'); ?>

 

Snippet 3. Bad:

<p><?=_("If you are interested in supporting the approval process by becoming an approver, please email <a href=\"mailto:personas@mozilla.com\">personas@mozilla.com</a>.")?></p>

Snippet 4. Good:

<p><?php printf(_("If you are interested in supporting the approval process by becoming an approver, please email <a href=\"mailto:%s\">%s</a>."),
                'personas@mozilla.com',
                'personas@mozilla.com'); ?>
</p>

Snippet 5. Also good:

<p><?php printf(_('If you are interested in supporting the approval process by becoming an approver, please email <a href="mailto:%1$s">%1$s</a>.'),
                'personas@mozilla.com'); ?>
</p>

The same goes for variables that are unknown until the code is interpreted. Localizers should be able to adapt the order of the sentence (including the variable part) to the grammar and preferred style used in their language. Consider the following example.

Snippet 6. Bad:

<p class="added"><?= _("<strong>Added:</strong>") . $persona['date']; ?></p>

Snippet 7. Good:

<p class="added"><? printf( /* L10n: %s is a date */ _("<strong>Added:</strong> %s"), $persona['date']);?></p>

In snippet 6 the concatenation fixes the ordering of the sentence, while some grammars might require, for instance, putting the date in front of the "Added" descriptor. You should take advantage of the printf() variable that will be substituted by the date upon interpretation of the code (snippet 7).

Use gettext comments

Use comments in the code to help localizers understand what they are translating. You can explain where the string will appear in the application, or what the variables used in the string will be replaced with. Put comments on the same line as the gettext function call (inline comments, in PHP these are /* ... */), or on the line directly above the gettext function call (block comments, in PHP they start with # ... or // ...). Either way, use a consistent prefix for localization-related comments, e.g. "L10n". When extracting strings with xgettext you will be able to include only comments starting with this prefix using the --add-comments=PREFIX option, for example xgettext --add-comments=L10n.

Snippet 1. Bad:

<h1>
  <a href="<?= $locale_conf->url('/') ?>">
    <img src="/static/img/logo.png" alt="<?= _("Mozilla Labs Personas"); ?>" />
  </a>
</h1>

Snippet 2. Good:

<h1>
  <a href="<?= $locale_conf->url('/') ?>">
    <img src="/static/img/logo.png" alt="<?= /* L10n: link title attribute */ _("Mozilla Labs Personas") ?>" />
  </a>
</h1>

 

Snippet 3. Bad:

<p class="added"><? printf(_("<strong>Added:</strong> %s"), $persona['date']);?></p>

Snippet 4. Good:

<p class="added"><? printf( /* L10n: %s is a date */ _("<strong>Added:</strong> %s"), $persona['date']);?></p>

 

Snippet 5. Bad:

printf(_('%1$s by %2$s'), $persona['name'], $persona['display_username']);

Snippet 6. Good:

// L10n: %1$s is the persona's name, %2$s is the author's username
printf(_('%1$s by %2$s'), $persona['name'], $persona['display_username']);

Use printf variables swapping

Use printf() ordered variables (%1$s, %2$s, etc.) to allow changes to the order of the sentence. Some languages may require this. Remember to use single quotes around the strings containing the formatting symbols. Otherwise, PHP will treat $s as a regular variable, instead of parsing the whole %1$s formatting symbol.

Snippet 1. Bad:

$page_header = $persona['name'] . ' by ' . $persona['display_username'];

Snippet 2. Better:

printf(_("%s by %s"), $persona['name'], $persona['display_username']);

Snippet 3. Good:

// L10n: %1$s is the persona's name, %2$s is the author's username
printf(_('%1$s by %2$s'), $persona['name'], $persona['display_username']);

Note the single quotes around '%1$s by %2$s'.

Don't nest gettext calls

Snippet 1. Bad:

<?printf(_("<a href=\"%s\">" . _("Personas Home") . "</a> : How to Create Personas"), $locale_conf->url('/'));?>

Snippet 2. Good:

<?printf("<a href=\"%s\">" . _("Personas Home") . "</a> : " . _("How to Create Personas"), $locale_conf->url('/'));?>

Don't break long text content into multiple strings

Don't break long text messages into smaller pieces if the text is a coherent whole. Examples include long paragraphs or e-mail bodies. Gettext doesn't specify the order of the strings in the messages.po file, so a localizer may end up seeing the partial strings of your content scattered all over the file. If you really have to use multiple strings, then make sure you're using comments or even contexts to let localizers know which part they're translating (cf. snippet 2 below).

Snippet 1. Bad:

echo _("Long text\n");
echo _("Second part\n");
echo _("Third part\n");

Snippet 2. Still bad (but slightly better than snippet 1):

# L10n: Long text example, part 1.
echo _("Long text\n");
# L10n: Long text example, part 2.
echo _("Second part\n");
# L10n: Long text example, part 3.
echo _("Third part\n");

Snippet 3. Good:

# L10n: No indentation is possible after the first line.
echo _("Long text
Second part
Third part\n");

Snippet 4. Good (even better):

# L10n: You can indent lines to your liking.
echo _("Long text\n"
      . "Second part\n"
      . "Third part\n");

The solution in snippet 3 doesn't let you indent the code for "Second part" and "Third part". If you indent "Second part", the resulting string (interpreted by PHP and Gettext) will end up indented as well. It is thus recommended to use the solution from snippet 4. Consider the following example:

Snippet 5. Bad indentation:

PHP code:

# L10n: This will be wrongly indented.
echo _("Long text
        Second part
        Third part\n");

PHP output:

Long text
        Second part
        Third part

messages.po:

#. L10n: This will be wrongly indented.
msgid ""
"Long text\n"
"        Second part\n"
"        Third part\n"
msgstr ""

In order to indent your code, you must use string concatenation. See snippet 4 above for an example of how to do this.

Use gettext contexts

Depending on the context in which it is used, one English string might require two or more different translations. This is particularly true for short strings, like "File" or "Log in". For instance, "Log in" as a button label might be translated by a localizer as the imperative, but for a dialog title, the localizer may choose to use a different form, such as a gerund (much like "Logging in"). Gettext's context feature allows the developer to distinguish between two identical English strings and disambiguate the translation.
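PHP's gettext extension does not provide a pgettext() function, so contexts are usually emulated with a small helper. The sketch below relies on the way gettext catalogs store a contextual message, namely under the key made of the context, the \x04 (EOT) character and the msgid. The helper name p_() and the context names are hypothetical.

```php
<?php
// Hypothetical helper emulating pgettext(); not part of PHP's gettext API.
function p_(string $context, string $msgid): string {
    // Catalogs store contextual messages under "context\x04msgid".
    $key = $context . "\x04" . $msgid;
    // Without the gettext extension, behave as if no translation existed.
    $translation = function_exists('gettext') ? gettext($key) : $key;
    // An untranslated lookup returns the key unchanged: show the plain msgid.
    return $translation === $key ? $msgid : $translation;
}

echo p_('button label', 'Log in'), "\n"; // may be translated as an imperative
echo p_('dialog title', 'Log in'), "\n"; // may be translated as a gerund
```

For xgettext to extract the context together with the string, the helper has to be declared as a keyword, for example with --keyword=p_:1c,2.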

Use gettext plurals

Whenever you put numbers in your messages, make it possible to use different singular and plural forms.

Snippet 1. Bad:

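print '<p class="numb-users">' . sprintf(_("%s active daily users"), number_format($persona['popularity'])) . '</p>';

Snippet 2. Good:

// ngettext() takes the count as its third argument and uses it to pick the plural form.
// %s is used because number_format() returns a formatted string, not an integer.
print '<p class="numb-users">' . sprintf(ngettext("%s active daily user", "%s active daily users", $persona['popularity']),
                                         number_format($persona['popularity'])) . '</p>';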

One might argue that adding plural support here is not necessary because, for instance, the number of daily users in the example above will always be greater than 1, i.e. will always require the use of the plural form. While this is true for English, it should be noted that some languages require different forms of strings for numbers greater than 1 as well. For example, numbers ending in 2, 3 or 4 (be it 22 or 1022) might require a special plural form.

Read more about plurals in gettext and about plural rules for different languages.
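To make this more concrete, here is a small sketch of the plural rule commonly used for Polish, which has three forms. Gettext evaluates such a rule itself, based on the Plural-Forms header of the .po file; the function polish_plural_index() is a hypothetical stand-in written only to show which form a given number selects.

```php
<?php
// Mirrors the Polish Plural-Forms expression:
// nplurals=3; plural=(n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);
function polish_plural_index(int $n): int {
    if ($n === 1) {
        return 0; // singular
    }
    $lastDigit = $n % 10;
    $lastTwoDigits = $n % 100;
    if ($lastDigit >= 2 && $lastDigit <= 4 && ($lastTwoDigits < 10 || $lastTwoDigits >= 20)) {
        return 1; // numbers ending in 2, 3 or 4, except 12, 13 and 14
    }
    return 2; // everything else
}

foreach ([1, 2, 5, 12, 21, 22, 1022] as $n) {
    echo $n, ' => form ', polish_plural_index($n), "\n";
}
```

Under this rule 22 and 1022 select the same form as 2, while 5, 12 and 21 select the third form, which is why a single English plural string is not enough.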

Don't use text as decoration

 This needs more work.

<?printf("<a href=\"%s\">" . _("Step 3: Testing your Persona Images") . "</a> &raquo;", $locale_conf->url('/demo_create_3'));?>
<?printf("<a href=\"%s\">" . _("Step 2: Creating a Persona Footer Image") . "</a> &raquo;", $locale_conf->url('/demo_create_2'));?>
<?printf("<a href=\"%s\">" . _("Step 4: Submit your Persona!") . "</a> &raquo;", $locale_conf->url('/demo_create_4'));?>

<div class="tut_left"><?printf("<b>&laquo; <a href=\"%s\">" . _("Back to Step 1") . "</a></b>", $locale_conf->url('/demo_create'));?></div>
<div class="tut_right"><?printf("<b><a href=\"%s\">" . _("Continue to Step 3") . "</a> &raquo;</b>", $locale_conf->url('/demo_create_3'));?></div>

Using &laquo; and &raquo; should be OK here for RTL languages (they are flipped correctly if there are no Latin characters next to them, and there aren't any here), so let's leave it as it is. In general though, we should consider implementing such decorations as CSS images (background-image or the content of ::after/::before) and then select them with "html[dir="rtl"] > ...". It is a safer method.

<?php if($showWearThis) { ?>
  $(".try-button").personasButton({
    'hasPersonas':'<span><?= _("wear this");?></span><span>&nbsp;</span>',
    'hasFirefox':'<span><?= _("get personas now!");?></span><span>&nbsp;</span>',
    'noFirefox':'<span><?= _("get personas with firefox");?></span><span>&nbsp;</span>'
  });
<?php } ?>

Images

Don't put text or numbers in the images

Just don't do that. Applies also to numbers.

Image 1. Bad:

personas-btn-get.png

Image 2. Bad:

personas-faq-header.png

If you wish to use a non-standard font (as in the image above), take advantage of CSS web fonts, available via @font-face.

Image 3. Bad:

personas-logo-beta.png

The trouble with the above image is the "for Firefox" part, which should be made localizable. Keep in mind that you should make the whole "for Firefox" part localizable, not only the "for" preposition to which you'd concatenate the "Firefox" part. That's because some languages might require changing the word order, and others might require putting the word Firefox in the correct grammatical case.

Image 4. Bad:

feature-bg-performance.png

Image 5 & Snippet 1. Good:

Image file (/img/tignish/firefox/performance-chart.png):

performance-chart.png

HTML: (in this case, no gettext was used and the localizers worked on pure HTML files)

<div id="performance-chart">
  <h4>Firefox Performance: Fast — Faster — <em>Fastest</em></h4>
  <p>Results of a SunSpider test on a Windows XP machine</p>
  <img src="/img/tignish/firefox/performance-chart.png" alt="Firefox 2, Firefox 3, Firefox 3.5 performance chart" />
  <ul>
    <li>18,148 ms</li>
    <li>3,669 ms</li>
    <li>1,524 ms!</li>
   </ul>
</div>

In the above example, not only does the text above the clock charts require translation, but so do the milliseconds captions below them. Many languages use different number formats than English, like 18 148 or 18.148. Also, the last caption includes an exclamation mark, and for some languages (e.g. French), the orthographic rules might require putting a space between the exclamation mark and the preceding word.

Make icons flippable for RTL

Image 1.

question-64.png

This icon should have a right-to-left equivalent with the "؟" character, which is used in some RTL languages, like Arabic and Persian (note that Hebrew uses "?"). You should then display the appropriate icon depending on the locale. The following example shows how to achieve this with CSS.

Snippet 1. Bad:

<div class="tut_didyouknow">
  <img src="/static/img/question-64.png" class="tut_icon">
  <?printf (_("Did you know you can test a Persona before you submit it?  <b><a href=\"%s\">Find out how!</a>&raquo;</b>"), 
            $locale_conf->url('/demo_create_3#test'));?>
</div>

Snippet 2. Good:

CSS:

div.tut_didyouknow {
  background: url(/static/img/question-64.png) no-repeat 0 0;
  padding-left: 64px;
}
   
html[dir='rtl'] div.tut_didyouknow {
  background-image: url(/static/img/question-64-rtl.png); /* hypothetical file name for the mirrored "؟" icon */
  background-position: 100% 0;
  padding-left: 0;
  padding-right: 64px;
}

HTML/PHP:

<div class="tut_didyouknow">
  <?printf (_("Did you know you can test a Persona before you submit it?  <b><a href=\"%s\">Find out how!</a>&raquo;</b>"), 
            $locale_conf->url('/demo_create_3#test'));?>
</div>

Notice that the icon has been moved to CSS, so that it doesn't sit in an <img/> element. This is generally considered a good practice for decorative graphics.

Don't use images as buttons

Instead, use <button/> and style it with CSS.

Image 1. Bad:

tut_btn_getStarted.gif

Snippet 1. Good:

CSS:

.button {
    font-weight: bold;
    color: #0077a6;
    font-family: Arial, sans-serif;
    border: none;
    background: none;
    cursor: pointer;
    overflow: visible;
    width: auto;
    height: 30px;
    text-decoration: none;
    vertical-align: middle;
}

.button span {
    background: #fff url(../img/main-sprites.png) no-repeat scroll -384px 1px;
    display:inline;
    line-height: 25px;
    padding: 6px 6px 6px 10px;
}

.button .arrow {
    background: transparent url(../img/main-sprites.png) no-repeat scroll -651px 1px;
    padding: 6px 15px;
}

html[dir='rtl'] .button .arrow {
    /* Flip the arrow to point to the left*/
    background: transparent url(../img/main-sprites.png) no-repeat scroll -601px 1px;
}

HTML/PHP:

<button type="submit" class="button"><span><?= _('get started'); ?></span><span class="arrow"></span></button>

Don't put captions in the images

Image 1. Bad:

tut_headerImage.jpg

Document Tags and Contributors

Contributors to this page: stasm, gandalf
Last updated by: stasm,