
i18n - best practices for internationalization - XLIFF, gettext, INI, ...? [closed]

POEdit isn't really hard to get the hang of. Just create a new .po file, then tell it to import strings from your source files. The program scans your PHP files for any function calls matching _("Text"), gettext("Text"), etc. You can even specify your own functions to look for.

You then enter a translation in the appropriate box. When you save your .po file, a .mo file is automatically generated. That's just a binary version of the translations that gettext can easily parse.

In your PHP script make a call to bindtextdomain() telling it where your .mo file is located. Now any strings passed to gettext (or the underscore function) will be translated.
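For reference, a minimal sketch of that setup in PHP; the "messages" domain name, the locale and the ./locale directory are placeholders of mine, not anything gettext or POEdit mandates:

    <?php
    // Sketch only: pick the locale, then point gettext at the compiled .mo files.
    $locale = 'fr_FR.utf8';
    putenv('LC_ALL=' . $locale);
    setlocale(LC_ALL, $locale);

    // Expects ./locale/fr_FR.utf8/LC_MESSAGES/messages.mo to exist.
    bindtextdomain('messages', __DIR__ . '/locale');
    bind_textdomain_codeset('messages', 'UTF-8');
    textdomain('messages');

    // Both calls below are translated if a matching entry exists in the .mo file.
    echo gettext("Please enter your login and password below.");
    echo _("Please enter your login and password below.");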

It makes it really easy to keep your translation files up to date. POEdit also has some neat features like allowing comments, showing changed and dropped strings and allowing fuzzy matches, which means you don't have to re-translate strings that have been slightly modified.


There is always the Translate Toolkit, which allows converting between (I believe) all of the mentioned formats, with gettext (PO) and XLIFF as the preferred ones.


You can use INI if you want; it's just that INI has no way of declaring that it is in UTF-8, so if someone opens your INI file in an editor, they might corrupt your file.

So the idea only works if you can trust whoever edits the file to save it with UTF-8 encoding.

You can add a BOM at the start of the file; some editors know about it.

What do you want to store: user-generated content or your application resources?
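If you do go the INI route, reading the strings from PHP is trivial. A small sketch, assuming one UTF-8 file per language (the lang/fr.ini path and the key names are my own invention):

    <?php
    // Sketch only: each language gets its own INI file of key = "value" pairs,
    // e.g. lang/fr.ini containing:
    //   please_login = "Veuillez saisir votre identifiant et votre mot de passe."
    $strings = parse_ini_file(__DIR__ . '/lang/fr.ini');
    if ($strings === false) {
        // Fall back to the English file if the requested one is missing or unreadable.
        $strings = parse_ini_file(__DIR__ . '/lang/en.ini');
    }
    echo $strings['please_login'];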


I worked with two of these formats on the localization side: TMX and XLIFF. They are pretty similar. TMX is more popular nowadays, but XLIFF is gaining support quickly. There was at least one free XLIFF editor when I last looked into it, Transolution, but it is no longer being actively developed.


I do the data storage myself using a custom design: all displayed text is stored in the DB.

I have two tables. The first table has an identity value, a 32-character varchar field (indexed) and a 200-character English description of the phrase.

My second table has the identity value from the first table, a language code (EN_UK, EN_US, etc.) and an NVARCHAR column for the text.

I use an nvarchar for the text because it supports other character sets which I don't yet use.

The 32 character varchar in the first table stores something like 'pleaselogin' while the second table actually stores the full "Please enter your login and password below".
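As a sketch, a lookup against that kind of structure from PHP could look like the following; the table and column names (phrase, translation and so on) are invented to match the description rather than taken from the actual schema, and $db is assumed to be an existing PDO connection:

    <?php
    // Sketch only: fetch the translated text for a phrase key and language code,
    // falling back to the English description from the first table.
    function getPhrase(PDO $db, string $key, string $langCode): string
    {
        $sql = 'SELECT t.text
                  FROM phrase p
                  JOIN translation t ON t.phrase_id = p.id
                 WHERE p.code = :code AND t.language = :lang';
        $stmt = $db->prepare($sql);
        $stmt->execute([':code' => $key, ':lang' => $langCode]);
        $text = $stmt->fetchColumn();

        if ($text === false) {
            $stmt = $db->prepare('SELECT description FROM phrase WHERE code = :code');
            $stmt->execute([':code' => $key]);
            $text = $stmt->fetchColumn();
        }

        return $text === false ? $key : $text;
    }

    echo getPhrase($db, 'pleaselogin', 'EN_UK');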

I have created a huge list of dynamic values which I replace at runtime. An example would be "You have {[dynamic:passworddaysremain]} days to change your password." - this lets me work around word-order differences between languages.
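A quick sketch of how that token replacement could work; the {[dynamic:...]} syntax is from the answer, while the function name and the value map are made up:

    <?php
    // Sketch only: replace {[dynamic:...]} tokens with runtime values,
    // leaving any unknown tokens untouched.
    function fillDynamicValues(string $text, array $values): string
    {
        return preg_replace_callback(
            '/\{\[dynamic:([a-z0-9_]+)\]\}/i',
            function ($m) use ($values) {
                return (string) ($values[$m[1]] ?? $m[0]);
            },
            $text
        );
    }

    echo fillDynamicValues(
        'You have {[dynamic:passworddaysremain]} days to change your password.',
        ['passworddaysremain' => 14]
    );
    // -> You have 14 days to change your password.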

I have only had to deal with Arabic numerals so far, but will have to work something out for the first user who requires non-Arabic numerals.

I actually pull this information out of the database every two hours and cache it to disk as one XML file per language, making extensive use of CDATA sections.
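As a sketch of what generating those per-language cache files might look like (element names, the cache path and the $phrases array are placeholders):

    <?php
    // Sketch only: write one cached XML file per language, wrapping each
    // string in a CDATA section. Assumes the cache/ directory exists.
    function writeLanguageCache(string $langCode, array $phrases): void
    {
        $doc = new DOMDocument('1.0', 'UTF-8');
        $root = $doc->createElement('phrases');
        $root->setAttribute('lang', $langCode);
        $doc->appendChild($root);

        foreach ($phrases as $code => $text) {
            $node = $doc->createElement('phrase');
            $node->setAttribute('code', $code);
            $node->appendChild($doc->createCDATASection($text));
            $root->appendChild($node);
        }

        $doc->save(__DIR__ . "/cache/lang_{$langCode}.xml");
    }

    writeLanguageCache('EN_UK', [
        'pleaselogin' => 'Please enter your login and password below.',
    ]);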

There are many options available; for performance you could use HTML templates for each language. My method works well, but it does use the XML DOM heavily at runtime to create the pages.