We are creating multi-language subsites on our website.
I would like to use two-letter language codes. Spanish and French are easy; they will get URLs like:
mydomain.com/es
mydomain.com/fr
But I run into a problem with Traditional and Simplified Chinese. Is there a standard for which two-letter codes to use for these languages?
mydomain.com/zh
mydomain.com/?
From various sources it looks like zh-CN is Simplified Chinese and zh-TW is Traditional Chinese.
zh-TW is an IETF language tag for the Chinese language as used in Taiwan, meaning any of: Taiwanese Mandarin; the use of traditional Chinese characters in writing, as done in Taiwan; or Taiwanese Hokkien, a variety of Min Nan Chinese, which could be indicated more specifically by nan-TW.
@dkarp gives an excellent general answer. I will add some additional specifics regarding Chinese:
There are several countries where Chinese is the main written language. The major difference between them is whether they use simplified or traditional characters, but there are also minor regional differences (in vocabulary, etc.). The standard way to distinguish these is with a country code, e.g. zh_CN for mainland China, zh_SG for Singapore, zh_TW for Taiwan, or zh_HK for Hong Kong.
Mainland China and Singapore both use simplified characters, and Taiwan and Hong Kong use traditional characters. Since mainland China and Taiwan have the largest populations of the four, zh_CN and zh_TW alone are often used to distinguish the simplified and traditional character versions of a website.
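A minimal Python sketch of the rule above, assuming identifiers of the form language_REGION (or language-REGION). The names SCRIPT_BY_REGION and chinese_script are hypothetical, and the table covers only the four regions mentioned in this answer.

```python
# Hypothetical table: which character set each region listed above uses.
SCRIPT_BY_REGION = {
    "CN": "Simplified",   # mainland China
    "SG": "Simplified",   # Singapore
    "TW": "Traditional",  # Taiwan
    "HK": "Traditional",  # Hong Kong
}


def chinese_script(locale_id: str) -> str:
    """Return 'Simplified' or 'Traditional' for ids like 'zh_CN' or 'zh-TW'."""
    # Accept either separator, then split into language and region.
    language, _, region = locale_id.replace("-", "_").partition("_")
    if language.lower() != "zh":
        raise ValueError(f"not a Chinese locale: {locale_id!r}")
    try:
        return SCRIPT_BY_REGION[region.upper()]
    except KeyError:
        raise ValueError(f"unknown or missing region in {locale_id!r}") from None


print(chinese_script("zh_CN"))  # Simplified
print(chinese_script("zh-HK"))  # Traditional
```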
More technically correct, though less common in practice, would be to use zh_Hans for (generic) simplified Chinese characters and zh_Hant for traditional Chinese characters, except in the rare cases where it is meaningful to distinguish between countries.
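Converting from the region-based identifiers to these script-based ones can be sketched as follows. It assumes the same four regions and writes the script subtags in title case (Hans, Hant), which is the conventional form; to_script_tag and its keep_region and sep parameters are hypothetical.

```python
# Hypothetical grouping of the regions above by script (ISO 15924 codes).
HANS_REGIONS = {"CN", "SG"}  # simplified characters
HANT_REGIONS = {"TW", "HK"}  # traditional characters


def to_script_tag(locale_id: str, keep_region: bool = False, sep: str = "_") -> str:
    """Turn a region-based Chinese id (zh_CN, zh-TW, ...) into a script-based one.

    By default the region is dropped (zh_CN -> zh_Hans); pass keep_region=True
    for the rare cases where the country matters (zh_TW -> zh_Hant_TW).
    """
    language, _, region = locale_id.replace("-", "_").partition("_")
    region = region.upper()
    if region in HANS_REGIONS:
        script = "Hans"
    elif region in HANT_REGIONS:
        script = "Hant"
    else:
        raise ValueError(f"no script mapping for {locale_id!r}")
    parts = [language.lower(), script]
    if keep_region:
        parts.append(region)
    return sep.join(parts)


print(to_script_tag("zh_CN"))                                # zh_Hans
print(to_script_tag("zh_TW", keep_region=True, sep="-"))     # zh-Hant-TW
```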
There is indeed a standard representation for this. Because people have run into exactly the problem you are seeing (same language, but different dialects or characters), the two-letter language code has been extended with a two-letter region code. So you might have a universal French page at mydomain.com/fr, but internationalizing for French Canadian readers might leave you with mydomain.com/fr_CA (Canada) and mydomain.com/fr_FR (France). Some platforms, and the BCP 47 language tags themselves, use a dash instead of an underscore to separate the language and region codes (hence fr-CA and fr-FR).
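Since both separators show up in practice, a site that accepts either might normalize identifiers before using them in URLs. A minimal Python sketch, assuming only language[-script][-region] identifiers (no variants or extensions); normalize_locale is a hypothetical helper, and the casing it applies (lowercase language, title-case script, uppercase region) follows the usual convention.

```python
def normalize_locale(tag: str, sep: str = "-") -> str:
    """Normalize 'fr_ca', 'FR-CA', 'zh_HANS' etc. to conventional casing.

    Accepts '-' or '_' as input separator and rejoins the parts with `sep`.
    """
    parts = tag.replace("_", "-").split("-")
    out = [parts[0].lower()]  # language, e.g. fr / zh / nan
    for sub in parts[1:]:
        if len(sub) == 4 and sub.isalpha():
            out.append(sub.title())  # script, e.g. Hans / Hant
        elif (len(sub) == 2 and sub.isalpha()) or (len(sub) == 3 and sub.isdigit()):
            out.append(sub.upper())  # region, e.g. CA / TW, or numeric like 419
        else:
            raise ValueError(f"unsupported subtag {sub!r} in {tag!r}")
    return sep.join(out)


print(normalize_locale("fr_ca"))             # fr-CA
print(normalize_locale("FR-FR", sep="_"))    # fr_FR
print(normalize_locale("zh_HANT_tw"))        # zh-Hant-TW
```

Picking one separator for URLs and redirecting the other form to it keeps each subsite at a single address.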
The standard locale for simplified Chinese is zh_CN. The standard locale for traditional Chinese is zh_TW.
I hesitate to point you towards the actual BCP 47 standards documents, as they're, uh, a little heavy on the detail and a little light on the readability. Just go with standard locale identifiers, like the ones used by Java, and you'll be fine.
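With subsites named after standard locale identifiers, choosing one for a visitor's requested locale can fall back from a region-specific id to the bare language (fr_CA to fr). This is a simplified take on the "lookup" matching scheme in RFC 4647, which is part of BCP 47. The AVAILABLE tuple and pick_subsite are hypothetical; treat this as a sketch, not a complete matcher.

```python
from typing import Iterable, Optional

# Hypothetical subsites; each would live at mydomain.com/<id>.
AVAILABLE = ("es", "fr", "fr_CA", "zh_CN", "zh_TW")


def pick_subsite(requested: str, available: Iterable[str] = AVAILABLE) -> Optional[str]:
    """Return the best subsite id for a requested locale, or None.

    Tries the full id first, then drops trailing subtags one at a time
    (fr_BE -> fr), comparing case-insensitively.
    """
    sites = list(available)
    parts = requested.replace("-", "_").split("_")
    while parts:
        candidate = "_".join(parts).lower()
        for site in sites:
            if site.lower() == candidate:
                return site
        parts.pop()  # drop the last subtag and try again
    return None


print(pick_subsite("fr-BE"))  # fr
print(pick_subsite("zh-tw"))  # zh_TW
```

Note that truncation alone will not send a zh_HK request to the zh_TW subsite: it falls back to zh and, with no zh subsite, returns None. Serving Hong Kong visitors the traditional-character site needs an explicit region-to-script rule like the one described in the previous answer.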