 

Which ISO format should I use to store a user's language code?


Should I use ISO 639-1 (2-letter abbreviation) or ISO 639-2 (3-letter abbreviation) to store a user's language code? Both are official standards, but which is the de facto standard in the development community? I think ISO 639-1 codes would be easier to remember, and are probably more popular for that reason, but that's just a guess.

The site I'm building will have separate versions for the US, Brazil, Russia, China, and the UK.

http://en.wikipedia.org/wiki/ISO_639

John Himmelman asked Mar 24 '10 at 20:03





2 Answers

You should use IETF language tags (defined in BCP 47) because they are already used by HTTP, HTML, XML, and many other technologies. They build on several standards, including the ISO 639 family (yes, selecting a language, region, and culture is not as simple as it seems).
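To illustrate how a tag combines those standards, here is a minimal Python sketch that splits a simple tag into its language (ISO 639), script (ISO 15924), and region (ISO 3166-1 alpha-2, or a UN M.49 numeric code such as 419) subtags. It is a simplification, not a full BCP 47 parser: `split_tag` and `LanguageTag` are hypothetical names, and extlang, variant, extension, and private-use subtags are simply ignored.

```python
from typing import NamedTuple, Optional


class LanguageTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]


def split_tag(tag: str) -> LanguageTag:
    """Split a simple tag like 'zh-Hant-TW' into language/script/region.

    Hypothetical helper: only handles language[-script][-region] and
    ignores every other kind of subtag.
    """
    # Tags are case-insensitive; accept 'en_US'-style locale strings too.
    parts = tag.strip().replace("_", "-").split("-")
    language = parts[0].lower()
    script: Optional[str] = None
    region: Optional[str] = None
    for part in parts[1:]:
        if script is None and region is None and len(part) == 4 and part.isalpha():
            # Script subtags are 4 letters, conventionally title-case.
            script = part.title()
        elif region is None and (
            (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit())
        ):
            # Region subtags are 2 letters or 3 digits, conventionally upper-case.
            region = part.upper()
        # Anything else (extlang, variants, extensions) is skipped here.
    return LanguageTag(language, script, region)
```

For example, `split_tag("zh-hant-tw")` yields language `zh`, script `Hant`, and region `TW`, while a plain `pt` has neither script nor region.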

I wrote a more detailed article about choosing and using language codes. The idea is to use the shortest ISO 639-1 codes and add more detail only for special cases. The article lists codes for the ~30 most-used languages, with reasons why I consider one alternative better than another.

In case you want to skip reading the entire article, here is a short list of language codes (not to be confused with country codes): ar, cs, da, de, el, en, en-gb, es, fr, fi, he, hu, it, ja, ko, nb, nl, pl, pt, pt-pt, ro, ru, sv, tr, uk, zh, zh-hant
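Because these tags are what browsers send in the HTTP `Accept-Language` header, a site can pick one of the short codes directly from that header. Below is a minimal Python sketch, assuming the list above is the set of supported codes and that `en` is an acceptable default; `parse_accept_language` and `best_match` are hypothetical names. It uses lookup-style matching in the spirit of RFC 4647: drop subtags from the right until a supported code is found, which is how "use the short code unless a special case needs more" falls out naturally.

```python
from typing import List, Sequence, Tuple

# Short codes from the list above (assumed to be the supported set).
SUPPORTED = ["ar", "cs", "da", "de", "el", "en", "en-gb", "es", "fr", "fi",
             "he", "hu", "it", "ja", "ko", "nb", "nl", "pl", "pt", "pt-pt",
             "ro", "ru", "sv", "tr", "uk", "zh", "zh-hant"]


def parse_accept_language(header: str) -> List[str]:
    """Return the language ranges in an Accept-Language header, best q first."""
    ranges: List[Tuple[float, int, str]] = []
    for index, item in enumerate(header.split(",")):
        pieces = item.strip().split(";")
        tag = pieces[0].strip().lower()
        if not tag:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in pieces[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            # Sort by descending q, then by original order in the header.
            ranges.append((-q, index, tag))
    ranges.sort()
    return [tag for _, _, tag in ranges]


def best_match(header: str, supported: Sequence[str] = SUPPORTED,
               default: str = "en") -> str:
    """Pick a supported code by truncating each requested tag from the right."""
    available = {code.lower() for code in supported}
    for tag in parse_accept_language(header):
        if tag == "*":
            continue  # wildcard: fall through to the default
        parts = tag.split("-")
        while parts:
            candidate = "-".join(parts)
            if candidate in available:
                return candidate
            parts.pop()  # e.g. pt-br -> pt
    return default
```

With this sketch, `pt-BR,pt;q=0.9` resolves to `pt` and `en-GB` to `en-gb`. Note that plain truncation maps `zh-TW` to `zh` (Simplified Chinese), so regions that use Traditional Chinese need an explicit alias to `zh-hant`.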

The following points may not be obvious but should be borne in mind:

  • en is used for en-us (American English), and en-gb for British English
  • pt is used for pt-br (Brazilian Portuguese), not pt-pt (European Portuguese), which has far fewer speakers
  • zh (Simplified Chinese) is used instead of zh-hans, zh-CN, etc.
  • zh-hant (Traditional Chinese) is used instead of more specific codes like zh-hant-TW or zh-TW
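The points above amount to a small alias table applied before storing a user's code. Here is a minimal Python sketch, assuming the stored value is the lower-case form used in the list; `ALIASES` and `normalize` are hypothetical names, and the table contains only the mappings named above, so it would need extending for other locales. Lower-casing is safe because language tags are case-insensitive.

```python
# Aliases taken from the points above (hypothetical table, not exhaustive).
ALIASES = {
    "en-us": "en",
    "pt-br": "pt",
    "zh-hans": "zh",
    "zh-cn": "zh",
    "zh-hant-tw": "zh-hant",
    "zh-tw": "zh-hant",
}


def normalize(tag: str) -> str:
    """Map a browser/OS locale string to the short code stored for the user."""
    # Accept 'pt_BR'-style locale strings and compare case-insensitively.
    key = tag.strip().replace("_", "-").lower()
    return ALIASES.get(key, key)
```

For instance, `normalize("pt_BR")` gives `pt`, `normalize("zh-TW")` gives `zh-hant`, and `en-GB` stays `en-gb`.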

You can find more explanations inside the article.

sorin answered Oct 11 '22 at 18:10



I would go with a derivative of ISO 639. Specifically, I like to use IETF language tags: http://en.wikipedia.org/wiki/IETF_language_tag

Ben answered Oct 11 '22 at 18:10
