
Language codes for simplified Chinese and traditional Chinese?

We are creating multi-language subsites on our website.

I would like to use the 2-letter language codes. Spanish and French are easy. They will get URLs like:

mydomain.com/es
mydomain.com/fr

But I run into a problem with Traditional and Simplified Chinese. Are there standard 2-letter codes to use for these languages?

mydomain.com/zh
mydomain.com/?
jeph perro asked Feb 03 '11 22:02



2 Answers

@dkarp gives an excellent general answer. I will add some additional specifics regarding Chinese:

There are several countries where Chinese is the main written language. The major difference between them is whether they use simplified or traditional characters, but there are also minor regional differences (in vocabulary, etc.). The standard way to distinguish these is with a country code, e.g. zh_CN for mainland China, zh_SG for Singapore, zh_TW for Taiwan, or zh_HK for Hong Kong.

Mainland China and Singapore both use simplified characters, and the others use traditional characters. Since China and Taiwan are the two with the biggest populations, just zh_CN and zh_TW are often used to distinguish the simplified and traditional character versions of a website.

More technically correct, though not commonly used in practice, would be to use the script codes zh-Hans for (generic) simplified Chinese characters and zh-Hant for traditional Chinese characters, adding a region only in the rare cases where it is meaningful to distinguish different countries.
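To make the grouping above concrete, here is a minimal Python sketch that maps the four region-based locales mentioned in this answer to script-based tags. The table name `REGION_TO_SCRIPT` and the function `script_tag` are made up for illustration, and the script tags are written in the hyphenated BCP 47 form (zh-Hans, zh-Hant). Treat it as one possible approach, not a complete locale resolver.

```python
# Hypothetical lookup table: region-based Chinese locales -> script-based tags.
# Grouping as described above: mainland China and Singapore use simplified
# characters; Taiwan and Hong Kong use traditional characters.
REGION_TO_SCRIPT = {
    "zh_CN": "zh-Hans",
    "zh_SG": "zh-Hans",
    "zh_TW": "zh-Hant",
    "zh_HK": "zh-Hant",
}


def script_tag(locale: str) -> str:
    """Return the script-based tag for a region-based Chinese locale.

    Accepts either separator ("zh_TW" or "zh-TW") and any letter case.
    Raises KeyError for locales that are not in the table.
    """
    language, _, region = locale.replace("-", "_").partition("_")
    key = f"{language.lower()}_{region.upper()}"
    return REGION_TO_SCRIPT[key]


if __name__ == "__main__":
    for code in ("zh_CN", "zh-sg", "zh_TW", "zh-HK"):
        print(code, "->", script_tag(code))
```

With a table like this, a site could serve one simplified and one traditional version while still accepting any of the regional codes in incoming requests.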

Todd Owen answered Sep 27 '22 21:09


There is indeed a standard representation for this. People have run into exactly the problem you are seeing (same language, but different dialects or characters), so the two-letter language code gets extended with a two-letter region code. You might have a universal French page at mydomain.com/fr, but localizing for French Canadian readers might leave you with mydomain.com/fr_CA (Canada) alongside mydomain.com/fr_FR (France). Some platforms use a dash instead of an underscore to separate the language and region codes (hence fr-CA and fr-FR).
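Because platforms disagree on the separator and users type codes in any case, it can help to normalize codes before comparing them. The following Python sketch shows one way to do that; `normalize_locale` is a hypothetical helper, and it only handles the two shapes discussed here (a language alone, or a language plus a region).

```python
def normalize_locale(code: str, separator: str = "_") -> str:
    """Normalize a language or language-region code.

    The language part is lowercased and the region part is uppercased.
    Both "-" and "_" are accepted as input separators; `separator`
    controls which one appears in the result.
    """
    parts = code.replace("-", "_").split("_")
    if len(parts) == 1:
        return parts[0].lower()
    if len(parts) == 2:
        language, region = parts
        return f"{language.lower()}{separator}{region.upper()}"
    raise ValueError(f"unsupported locale code: {code!r}")


if __name__ == "__main__":
    print(normalize_locale("fr-ca"))                  # underscore style
    print(normalize_locale("fr_CA", separator="-"))   # dash style
    print(normalize_locale("FR"))                     # language only
```

Codes with more than two parts are rejected rather than guessed at, since this sketch does not attempt to parse full BCP 47 tags.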

The standard locale for simplified Chinese is zh_CN. The standard locale for traditional Chinese is zh_TW.
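Applied to the URL scheme in the question, the first path segment can be matched against the locales the site supports. This is a rough Python sketch under assumptions: the `SUPPORTED_LOCALES` tuple and the `locale_from_path` function are invented for illustration, and a real site would likely do this in its framework's routing layer.

```python
from typing import Optional

# Hypothetical site configuration: the locales from the question,
# plus the two Chinese locales named above.
SUPPORTED_LOCALES = ("es", "fr", "zh_CN", "zh_TW")


def locale_from_path(path: str) -> Optional[str]:
    """Return the supported locale named by the first URL path segment.

    Matching ignores case and treats "-" and "_" as the same separator.
    Returns None when the first segment is not a supported locale.
    """
    first_segment = path.strip("/").split("/", 1)[0]
    wanted = first_segment.replace("-", "_").lower()
    for locale in SUPPORTED_LOCALES:
        if locale.lower() == wanted:
            return locale
    return None


if __name__ == "__main__":
    for path in ("/es/contact", "/zh_CN/about", "/zh-tw", "/de/"):
        print(path, "->", locale_from_path(path))
```

Returning None for unknown prefixes leaves the fallback decision (a default language, a redirect, or a 404) to the caller.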

I hesitate to point you towards the actual BCP 47 standards documents, as they're, uh, a little heavy on the detail and a little light on the readability. Just go with standard locale identifiers, like the ones used by Java, and you'll be fine.

dkarp answered Sep 27 '22 21:09