Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Any good resources or advice for working with languages with different orientations? (such as Japanese or Chinese)

We have an enterprise web application where every bit of text in the system is localised to the user's browser's culture setting.

So far we have only supported English, American (similar but mis-spelt ;-) and French (for the Canadian Gov't - app in English or French depending on user preference). During development we also had some European languages in mind like Dutch and German that tend to concatenate words into very long ones.

We're currently investigating support for eastern languages: Chinese, Japanese, and so on. I understand that these use phonetic input converted to written characters. How does that work on the web? Do the same events fire while inputs and textareas are being edited (we're quite Ajax heavy).

What conventions do users of these top-down languages expect online?

What effect does their dual-input (phonetic typing + conversion) have on web controls?

With RTL languages like Arabic do users expect the entire interface to be mirrored? For instance should things like OK/Cancel buttons be swapped and on the left?

like image 253
Keith Avatar asked Aug 16 '08 18:08

Keith


3 Answers

As an Arabic speaker, when I do look at Arabic websites, I do expect things like OK/Cancel to be swapped. When reading Arabic, my eyes read from right to left. So, in situations where you'd want to reader to view an affirmative/action button (e.g. OK, Submit, Yes, etc.) before a negative/inaction button (Cancel, Clear, No, etc.), you'd probably want to put the former on the right.

Caveat: As weird as it sounds, the above only applies (to me personally) when the button text is in Arabic. If the button text is in English (in a mixed-language web page), I'd prefer to see the OK button on the left.

Hope that helps.

like image 185
Bassam Avatar answered Oct 07 '22 02:10

Bassam


Read Globalization Step-by-Step by Microsoft.

I can answer the specifics on CJKV, but you probably want a book on this topic. I haven't read it but CJKV Information Processing is from O'Reilly (2nd ed due Dec, 2008).

I understand that these use phonetic input converted to written characters. How does that work on the web?

The input is done by a class of software called an IME (Input Method Editor) on Windows, Mac, and Linux (e.g. SCIM). When an IME is turned on, the input from the keyboard first goes to the IME, and the user gets to pick the correct kanji/hiragana combo. When the user commits by hitting return key, the IME types in the kanji/hiragana into the web browser using the current encoding. Encoding situation was a big mess, but if you are writing a web app, go with an encoding of Unicode. I suggest UTF-8.

Do the same events fire while inputs and textareas are being edited?

A Unicode savvy web browser and OS combo handles multiple languages. For example, one can use English normal version of Firefox to browse and post to a Japanese website. From the browsers point of view, it's just an array of "bla bla bla" in Unicode. In other words, if the event fires up in English, the same event should fire up in CJKV if you use a Unicode variant.

What conventions do users of these top-down languages expect online?

CJKV readers expect left-to-right online. Math and science textbooks are written from left-to-right. Most word processors, including localized version of Word, write left-to-right.

What effect does their dual-input (phonetic typing + conversion) have on web controls?

For the most part you should not have to worry about it, unless you are trapping keyboard events. For example, I hate using Japanese keyboard with bunch of extra keyboard. So, when I have to assign IME on/off command to some key on US keyboard. I personally use right-Alt. Also, spacebar and enter key is used during conversion, but not sure if these events are passed to browser.

like image 6
Eugene Yokota Avatar answered Oct 07 '22 00:10

Eugene Yokota


The directionality question is easy to answer for East Asian languages: websites are left-to-right, top-to-bottom as per usual.

In fact, the general web design layout principles much the same. Have a look at the websites of a newspaper (name top left, navigation bar under with "Home" on the left, headline links below with most important at the top) or a search engine (don't think I need to say which US site you should compare that layout to).

However, just as Arabic/Hebrew/etc right-to-left language users will expect left-to-right progression in some contexts (embedded English fragments and so on), there are situations, even on the web, where top-to-bottom layout is preferred. This is generally done by including an image with the text layout and font desired, or using flash.

Internet Explorer has actually offered tb-rl layout with the CSS writing-mode property since version 5.5 however none of the other browsers have bothered implementing it (or ruby, which is useful for sites aimed at a young audience). IE 5.5 was released in 2000, so that's eight years of support, and there was a W3C candidate recommendation in 2003 but text layout in CSS still being poked around.

As for your worries with text input and IMEs, as long as you're not doing something bogus like trying to manually translate the virtual keys given by keydown events into text strings, you're unlikely to run into problems.

There are some additional issues you've not mentioned however. The minimum comfortably readable font size is larger than for languages written with the Latin script. Bold and italic for emphasis in flow are generally not appropriate. Han unification means to need to be picky about specifying the right fonts for the different CJK languages when working with unicode. You may want to provide both traditional and simplified interfaces for Chinese, depending on what audience you are expecting.

I've been meaning to write up a more comprehensive guide along these lines for a while, if you need more information feel free to kick me.

like image 2
gz. Avatar answered Oct 07 '22 00:10

gz.