
Implementing internationalization (language strings) in a PHP application

I want to build a CMS that can fetch locale strings to support internationalization. I plan on storing the strings in a database, and then placing a key/value cache like memcache between the database and the application so that each page does not have to hit the database for its translations.
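To make the idea concrete, here is a minimal sketch of the database-plus-cache lookup described above. It assumes PHP 7.4 or later; every name in it (StringCache, ArrayCache, Translator) is hypothetical, the array-backed cache stands in for memcache, and the closure stands in for a real database query.

```php
<?php
// Sketch only: StringCache, ArrayCache and Translator are hypothetical names.

interface StringCache
{
    public function get(string $key): ?string;
    public function set(string $key, string $value): void;
}

// In-memory stand-in for memcache so the example runs without a server.
final class ArrayCache implements StringCache
{
    private array $items = [];

    public function get(string $key): ?string
    {
        return $this->items[$key] ?? null;
    }

    public function set(string $key, string $value): void
    {
        $this->items[$key] = $value;
    }
}

final class Translator
{
    private StringCache $cache;

    /** @var callable Receives (locale, key) and returns the string or null */
    private $loader;

    public function __construct(StringCache $cache, callable $loader)
    {
        $this->cache = $cache;
        $this->loader = $loader;
    }

    public function translate(string $locale, string $key): string
    {
        $cacheKey = $locale . ':' . $key;

        // 1. Try the cache first
        $cached = $this->cache->get($cacheKey);
        if ($cached !== null) {
            return $cached;
        }

        // 2. Fall through to the (slow) loader, e.g. a database query
        $value = ($this->loader)($locale, $key);
        if ($value === null) {
            // No translation stored: return the key itself and do not cache it
            return $key;
        }

        // 3. Remember the result for the next request
        $this->cache->set($cacheKey, $value);
        return $value;
    }
}

// Stand-in for the translations table
$rows = ['en_US' => ['greeting' => 'Hello'], 'de_DE' => ['greeting' => 'Hallo']];
$queries = 0;
$loader = function (string $locale, string $key) use ($rows, &$queries): ?string {
    $queries++; // count how often the "database" is hit
    return $rows[$locale][$key] ?? null;
};

$translator = new Translator(new ArrayCache(), $loader);
print $translator->translate('de_DE', 'greeting') . "\n"; // loaded from the "database"
print $translator->translate('de_DE', 'greeting') . "\n"; // served from the cache
```

In this sketch missing keys are not cached, so an untranslated string hits the loader on every request; a real system would probably want to cache misses as well, or load a whole locale in one query.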

This is more complex than using PHP files with arrays of strings - but that method is incredibly inefficient when you have 2,000 translation lines.

I thought about using gettext, but I'm not sure that users of the CMS will be comfortable working with the gettext files. If the strings are stored in a database, then a nice administration system can be set up to allow them to make changes whenever they want, and the caching in RAM will ensure that fetching those strings is as fast as, or faster than, gettext. I also don't feel safe using the PHP extension considering that not even the Zend Framework uses it.

Is there anything wrong with this approach?

Update

I thought perhaps I would add more food for thought. One of the problems with string translations is that they don't support dates, money, or conditional statements. However, thanks to intl, PHP now has MessageFormatter, which is what really needs to be used anyway.

// Load string from gettext file
$string = _("{0} resulted in {1,choice,0#no errors|1#single error|1<{1, number} errors}");

// Format using the current locale
msgfmt_format_message(setlocale(LC_ALL, 0), $string, array('Update', 3));

On another note, one of the things I don't like about gettext is that the text is embedded into the application all over the place. That means that the team responsible for the primary translation (usually English) has to have access to the project source code to make changes in all the places the default statements are placed. It's almost as bad as applications that have SQL spaghetti-code all over.

So, it makes sense to use keys like _('error.404_not_found') which then allow the content writers and translators to just worry about the PO/MO files without messing in the code.

However, in the event that a gettext translation doesn't exist for the given key, there is no way to fall back to a default (like you could with a custom handler). This means that you either have the writer mucking around in your code, or have "error.404_not_found" shown to users that don't have a locale translation!
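For what it's worth, a thin wrapper can approximate that custom handler. The sketch below relies on the fact that gettext() returns the msgid unchanged when it finds no translation; the function name translate_key and the $defaults array are made up for illustration, and the defaults could just as well live in a file the content writers own.

```php
<?php
// Hypothetical default strings, kept outside the application code.
$defaults = array(
    'error.404_not_found' => 'The page you requested could not be found.',
);

/**
 * Translate a key, falling back to a default string when gettext has nothing.
 *
 * @param string $key message key used as the gettext msgid
 * @param array $defaults key => default text
 * @return string
 */
function translate_key($key, array $defaults)
{
    // gettext() hands back the msgid untouched when no translation exists
    $translated = function_exists('gettext') ? gettext($key) : $key;

    if ($translated !== $key) {
        return $translated;
    }

    // No translation: use the default text, or the key as a last resort
    return isset($defaults[$key]) ? $defaults[$key] : $key;
}

print translate_key('error.404_not_found', $defaults) . "\n";
```

One limitation of this approach: a translation that is deliberately identical to its key is indistinguishable from a missing one, which is rarely a problem with dotted keys like the one above.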

In addition, I am not aware of any large projects which use PHP's gettext. I would appreciate any links to well-used (and therefore tested), systems which actually rely on the native PHP gettext extension.

Xeoncross asked Oct 19 '11 15:10



2 Answers

Gettext reads translations from compiled binary (.mo) files, which is quite quick. Also, the gettext implementation is usually simpler as it only requires echo _('Text to translate');. It also has existing tools for translators to use, and they're proven to work well.

You can store them in a database but I feel it would be slower and a bit overkill, especially since you'd have to build the system to edit the translations yourself.

If only you could actually cache the lookups in a dedicated memory portion in APC, you'd be golden. Sadly, I don't know how.

Andre answered Oct 11 '22 16:10


For those that are interested, it seems full support for locales and i18n in PHP is finally starting to take place.

// Set the current locale to the one the user agent wants
$locale = Locale::acceptFromHttp(getenv('HTTP_ACCEPT_LANGUAGE'));

// Default Locale
Locale::setDefault($locale);
setlocale(LC_ALL, $locale . '.UTF-8');

// Default timezone of server
date_default_timezone_set('UTC');

// iconv encoding (deprecated as of PHP 5.6, where the default_charset ini setting is used instead)
iconv_set_encoding("internal_encoding", "UTF-8");

// multibyte encoding
mb_internal_encoding('UTF-8');

There are several things that need to be considered. Detecting the timezone/locale and then using it to correctly parse and display input and output is important. There is a PHP I18N library that was just released which contains lookup tables for much of this information.

Processing user input is important to make sure your application has clean, well-formed UTF-8 strings from whatever input the user enters. iconv is great for this.

/**
 * Convert a string from one encoding to another encoding
 * and remove invalid bytes sequences.
 *
 * @param string $string to convert
 * @param string $to encoding you want the string in
 * @param string $from encoding that string is in
 * @return string
 */
function encode($string, $to = 'UTF-8', $from = 'UTF-8')
{
    // ASCII is already valid UTF-8
    if($to == 'UTF-8' AND is_ascii($string))
    {
        return $string;
    }

    // Convert the string
    return @iconv($from, $to . '//TRANSLIT//IGNORE', $string);
}


/**
 * Tests whether a string contains only 7bit ASCII characters.
 *
 * @param string $string to check
 * @return bool
 */
function is_ascii($string)
{
    return ! preg_match('/[^\x00-\x7F]/S', $string);
}

Then just run the input through these functions.

$utf8_string = normalizer_normalize(encode($_POST['text']), Normalizer::FORM_C);

Translations

As Andre said, it seems gettext is the smart default choice for writing applications that can be translated.

  1. Gettext reads translations from compiled binary (.mo) files, which is quite quick.
  2. The gettext implementation is usually simpler as it only requires _('Text to translate')
  3. Existing tools for translators are available, and they're proven to work well.

When you reach Facebook size then you can work on implementing RAM-cached, alternative methods like the one I mentioned in the question. However, nothing beats "simple, fast, and works" for most projects.

However, there are also additional things that gettext cannot handle, such as displaying dates, money, and numbers. For those you need the intl extension.

/**
 * Return an IntlDateFormatter object using the current system locale
 *
 * @param string $locale string
 * @param integer $datetype IntlDateFormatter constant
 * @param integer $timetype IntlDateFormatter constant
 * @param string $timezone Time zone ID, default is system default
 * @return IntlDateFormatter
 */
function __date($locale = NULL, $datetype = IntlDateFormatter::MEDIUM, $timetype = IntlDateFormatter::SHORT, $timezone = NULL)
{
    return new IntlDateFormatter($locale ?: setlocale(LC_ALL, 0), $datetype, $timetype, $timezone);
}

$now = new DateTime();
print __date()->format($now);
$time = __date()->parse($string);
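Money and plain numbers can be handled along the same lines with intl's NumberFormatter. The following is only a sketch: it requires the intl extension, the helper name format_money is made up for this example, and the locales and amounts are arbitrary samples. Exact output (spacing, symbol placement) depends on the ICU version installed, so none is shown.

```php
<?php
/**
 * Format an amount of money for a locale (hypothetical helper).
 *
 * @param float $amount to format
 * @param string $currency ISO 4217 code such as "EUR"
 * @param string $locale locale ID, default is the intl default locale
 * @return string
 */
function format_money($amount, $currency, $locale = NULL)
{
    $formatter = new NumberFormatter($locale ?: Locale::getDefault(), NumberFormatter::CURRENCY);
    return $formatter->formatCurrency($amount, $currency);
}

// Same amount, different locale conventions for separators and symbol placement
print format_money(1234.56, 'EUR', 'de_DE') . "\n";
print format_money(1234.56, 'USD', 'en_US') . "\n";

// Plain decimal numbers use the DECIMAL style
$number = new NumberFormatter('fr_FR', NumberFormatter::DECIMAL);
print $number->format(1234567.891) . "\n";
```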

In addition, you can use strftime to format dates taking the current locale into consideration (note that strftime is deprecated as of PHP 8.1).

Sometimes you need the values for numbers and dates inserted correctly into locale messages:

/**
 * Format the given string using the current system locale
 * Basically, it's sprintf on i18n steroids.
 *
 * @param string $string to parse
 * @param array $params to insert
 * @return string
 */
function __($string, array $params = array())
{
    return msgfmt_format_message(setlocale(LC_ALL, 0), $string, $params);
}

// Multiple choices (can also just use ngettext)
print __(_("{0,choice,0#no errors|1#single error|1<{0, number} errors}"), array(4));

// Show time in the correct way
print __(_("It is now {0,time,medium}"), array(time()));

See the ICU format details for more information.

Database

Make sure your connection to the database is using the correct charset so that nothing gets corrupted on storage.
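As an illustration, and assuming MySQL accessed through PDO (the original does not name a database or driver), the charset can be set in the DSN. The helper name build_mysql_dsn, the host, the database name, and the credentials below are all placeholders.

```php
<?php
/**
 * Build a PDO DSN for MySQL that forces a UTF-8 connection (hypothetical helper).
 *
 * @param string $host database host
 * @param string $database schema name
 * @return string
 */
function build_mysql_dsn($host, $database)
{
    // The charset part makes the client/server connection use utf8mb4,
    // MySQL's full UTF-8 character set
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $database);
}

$dsn = build_mysql_dsn('localhost', 'cms');
print $dsn . "\n"; // mysql:host=localhost;dbname=cms;charset=utf8mb4

// Connecting is left commented out because it needs a live server:
// $pdo = new PDO($dsn, 'user', 'password', array(PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION));
```

The tables and columns need a matching character set as well; a UTF-8 connection to a latin1 column can still mangle data.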

String Functions

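You need to understand the difference between the plain string functions (which count bytes), the mbstring functions (which count code points), and the grapheme functions (which count user-perceived characters).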

// 'LATIN SMALL LETTER A WITH RING ABOVE' (U+00E5) in normalization form "D":
// the letter "a" followed by COMBINING RING ABOVE (U+030A)
$char_a_ring_nfd = "a\xCC\x8A";

var_dump(grapheme_strlen($char_a_ring_nfd)); // int(1): one user-perceived character
var_dump(mb_strlen($char_a_ring_nfd, 'UTF-8')); // int(2): two code points
var_dump(strlen($char_a_ring_nfd)); // int(3): three bytes

// 'LATIN CAPITAL LETTER A WITH RING ABOVE' (U+00C5)
$char_A_ring = "\xC3\x85";

var_dump(grapheme_strlen($char_A_ring)); // int(1)
var_dump(mb_strlen($char_A_ring, 'UTF-8')); // int(1)
var_dump(strlen($char_A_ring)); // int(2)

Domain name TLDs

The IDN functions from the intl library are a big help when processing non-ASCII domain names.
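A short sketch of the two main calls, idn_to_ascii() and idn_to_utf8(), follows. It requires the intl extension; the sample domain is only an example, and it is written with byte escapes so the snippet does not depend on the file's own encoding.

```php
<?php
// "täst.de" as UTF-8 bytes (ä is \xC3\xA4)
$domain = "t\xC3\xA4st.de";

// Convert to the ASCII (Punycode) form that DNS actually uses
$ascii = idn_to_ascii($domain, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
print $ascii . "\n"; // xn--tst-qla.de

// And back to the Unicode form for display
$unicode = idn_to_utf8($ascii, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
print $unicode . "\n"; // täst.de
```

Both functions return false on failure, so real code should check the result before using it.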

Xeoncross answered Oct 11 '22 18:10