
What should I know to make my I18N application work in Japanese?

I'm working on an I18N application that will be localized into Japanese. I don't know a word of Japanese, and I'm first wondering whether UTF-8 is enough for that language.

Usually, for European languages, UTF-8 is enough: I set my database charset/collation to utf8_general_ci (in MySQL) and serve my HTML views as UTF-8, and that's all it takes.

But what about Japanese? Is there anything else to do?

By the way, my application will need to handle English, French, and Japanese, but later on it may be necessary to add other languages, say, Russian.

How can I set up my I18N application so that it works widely without having to change much configuration on deployment?

Are there any best practices?

By the way, I'm planning to use gettext. I'm pretty sure it supports such languages without any problems, as it is the de facto standard for almost all GNU software, but is there any feedback?

Boris Guéry asked Jun 01 '11 10:06



2 Answers

A couple of points:

  • UTF-8 is fine for your app-internal data, but if you need to process user-supplied documents (e.g. uploads), those may use other encodings like Shift-JIS or ISO-2022-JP
  • Japanese text does not use whitespace between words. If your app needs to split text into words somewhere, you've got a problem.
  • Date and number formats differ too, not just the text.
  • The generic collation may not lead to a useful sort order for Japanese text - if your app involves large lists that people have to find things in, this can be a problem.
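The first point, about uploads arriving in other encodings, can be sketched roughly as follows. This is a minimal illustration in Python rather than anything taken from the question's stack; the function name `decode_upload` and the order of candidate encodings are assumptions. It relies only on strict decoding, which is a heuristic: some byte sequences are valid in more than one legacy encoding, so a real application may want a proper detector or an explicit user choice.

```python
from typing import Tuple


def decode_upload(raw: bytes) -> Tuple[str, str]:
    """Return (text, encoding_used) for the first strict decode that succeeds.

    Hypothetical helper for illustration; the candidate list is an assumption.
    """
    if b"\x1b" in raw:
        # ISO-2022-JP is a 7-bit encoding that switches character sets with
        # ESC sequences, so its bytes would also pass as ASCII/UTF-8.
        # Try it first whenever an ESC byte is present.
        candidates = ("iso2022_jp", "utf-8", "shift_jis")
    else:
        candidates = ("utf-8", "shift_jis")

    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the input")
```

Once decoded, the text can be stored and processed as UTF-8 like the rest of the application's data.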
Michael Borgwardt answered Sep 30 '22 21:09



Yep, Unicode contains all the code points you need to display English, French, Japanese, Russian, and pretty much any language in the world (including Taiwanese, Cherokee, Esperanto, really anything but Elvish). That's what it's for. Because UTF-8 is a variable-width encoding, though, text outside the ASCII range takes more bytes to store: typically two bytes per character for accented Latin and Cyrillic letters, and three for most Japanese characters.
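To make the storage cost concrete, here is a small Python sketch that counts the UTF-8 bytes of a string. The helper name `utf8_len` is made up for this example, and the sample characters are written as escapes so the exact code points are unambiguous.

```python
def utf8_len(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# One sample character per script (hypothetical sample set).
samples = {
    "ASCII letter 'a'": "a",
    "French e-acute (U+00E9)": "\u00e9",
    "Cyrillic de (U+0434)": "\u0434",
    "Kanji 'day/sun' (U+65E5)": "\u65e5",
    "Emoji (U+1F600)": "\U0001F600",
}

for label, char in samples.items():
    print(f"{label}: {utf8_len(char)} byte(s)")
```

The last sample matters for the MySQL setup mentioned in the question: MySQL's `utf8` charset stores at most three bytes per character, so characters outside the Basic Multilingual Plane (some rarer kanji, emoji) need `utf8mb4` instead.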

Gettext is widely used and your PHP build probably even includes it. See http://php.net/gettext for usage details.
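The same catalog-per-language idea exists outside PHP. As a rough sketch, here it is with Python's standard `gettext` module rather than the PHP extension linked above; the domain name `messages`, the `locale` directory, and the helper `load_translator` are assumptions for illustration. With `fallback=True`, a language that has no compiled catalog yet simply passes the source strings through, so adding a language later means dropping in a new catalog rather than changing code.

```python
import gettext


def load_translator(language: str, domain: str = "messages", localedir: str = "locale"):
    """Return a translate function for the given language code.

    Hypothetical helper: looks for <localedir>/<language>/LC_MESSAGES/<domain>.mo
    and falls back to the untranslated source strings if no catalog is found.
    """
    translation = gettext.translation(
        domain,
        localedir=localedir,
        languages=[language],
        fallback=True,  # return NullTranslations instead of raising
    )
    return translation.gettext


# Usage: pick the language per request or per user, not per deployment.
_ = load_translator("ja")
print(_("Welcome"))
```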

Wander Nauta answered Sep 30 '22 19:09
