
What is a good definition for language code and locale codes?

  • When should I use en_GB and when en-GB?
  • What is the difference?
  • Is there an ISO name for this combination of ISO 639-1 (language) and ISO 3166 (country)?

    johnlemon asked May 27 '11 12:05



    1 Answer

    There are several systems for locale identifiers. Many of them look similar at first glance, but they differ once you go deeper.

    Some examples (Serbian-Serbia with Latin Script, Japanese-Japan with radical sorting):

    • UTS-35, ICU, Mac OS X, Flash: sr-Latn-RS, ja-JP@collation=radical
    • Newer UTS-35, BCP 47 extension U: sr-Latn-RS, ja-JP-u-co-unihan
    • Win 2000, XP: 0x81a, 0x10411
    • Vista, Win 7: sr-Latn-CS, ja-JP_radical
    • Java: sr_CS, ja_JP
    • Java 7: sr_RS, ja_JP
    • Linux: sr_RS@latin, ja_JP.utf8

    Think of it like different ways to describe colors (RGB, CMYK, HSV, Pantone, etc.).

    So the question of - vs. _ does not make sense unless you specify which environment you are using. Use - and Java will not understand it; use _ and Windows will not understand it. ICU (and systems built on top of it) accepts both - and _, but produces the _ style.

    There is no ISO standard that covers the language-country combination. But there are ISO standards that cover the individual parts (language, country, script). Which version of each standard applies also depends on the system used for locale identifiers.


    In general you should accept both _ and -, and generate only one, as ICU does ("be liberal in what you accept and strict in what you emit").
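The "accept both, emit one" rule can be sketched as follows. This is only an illustration, not how ICU or any other library does it: the function name `normalize_locale_id` and its casing rules are assumptions, and it only handles plain language/script/region identifiers plus lowercase extension subtags. Forms such as `ja-JP@collation=radical` or `ja_JP.utf8` are rejected rather than parsed.

```python
import re

def normalize_locale_id(raw, separator="-"):
    """Accept '-' or '_' between subtags; emit a single, consistent style.

    Hypothetical helper for illustration only.
    """
    parts = re.split(r"[-_]", raw.strip())
    # Reject anything that is not a plain run of ASCII letters/digits per subtag
    # (this also rejects empty input, '@' modifiers and '.encoding' suffixes).
    if not all(re.fullmatch(r"[A-Za-z0-9]+", p) for p in parts):
        raise ValueError(f"unsupported locale identifier: {raw!r}")

    out = [parts[0].lower()]           # language subtag: lowercase, e.g. 'sr'
    in_extension = False
    for part in parts[1:]:
        if len(part) == 1:             # singleton such as 'u' starts an extension
            in_extension = True
        if in_extension:
            out.append(part.lower())   # extension subtags stay lowercase
        elif len(part) == 4 and part.isalpha():
            out.append(part.title())   # script subtag: title case, e.g. 'Latn'
        elif len(part) == 2 and part.isalpha():
            out.append(part.upper())   # region subtag: uppercase, e.g. 'RS'
        else:
            out.append(part.lower())   # anything else is passed through lowercased
    return separator.join(out)
```

With this sketch, `normalize_locale_id("en_gb")` and `normalize_locale_id("en-GB")` both give `en-GB`, and passing `separator="_"` gives `en_GB` for systems that expect that style.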

    If you communicate with systems that use another type of locale identifier, you will have to map to and from your own system, and the target system will dictate whether you use _ or -. Some of the mappings will be lossy (there is no way to specify alternate calendars in Windows or Linux, or alternate sorting or scripts in Java older than 7, etc.) and round-tripping might not be possible (somewhat like RGB-CMYK conversions).
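A minimal sketch of such a lossy mapping, from a simple BCP 47-style tag to a Linux-style name, might look like this. The function `to_posix_locale` and the `SCRIPT_TO_MODIFIER` table are hypothetical and cover only the two scripts listed; a real mapping would need a much larger, system-specific table. The second return value lists whatever could not be represented, which is the information that prevents round-tripping.

```python
# Hypothetical, deliberately tiny table: script subtag -> Linux-style '@modifier'.
SCRIPT_TO_MODIFIER = {"Latn": "latin", "Cyrl": "cyrillic"}

def to_posix_locale(tag, encoding=None):
    """Return (posix_name, dropped_subtags) for a simple BCP 47-style tag."""
    parts = tag.split("-")
    language, script, region, dropped = parts[0].lower(), None, None, []
    for i, part in enumerate(parts[1:], start=1):
        if len(part) == 1:
            # Extension (e.g. 'u-co-unihan'): no equivalent here, so it is lost.
            dropped.append("-".join(parts[i:]))
            break
        if len(part) == 4 and part.isalpha() and script is None:
            script = part.title()
        elif len(part) == 2 and part.isalpha() and region is None:
            region = part.upper()
        else:
            dropped.append(part)

    name = language
    if region:
        name += "_" + region
    if encoding:
        name += "." + encoding
    if script:
        modifier = SCRIPT_TO_MODIFIER.get(script)
        if modifier:
            name += "@" + modifier
        else:
            dropped.append(script)     # script has no known modifier: lost
    return name, dropped
```

For example, `to_posix_locale("sr-Latn-RS")` yields `("sr_RS@latin", [])`, while `to_posix_locale("ja-JP-u-co-unihan", encoding="utf8")` yields `("ja_JP.utf8", ["u-co-unihan"])`: the sorting preference is dropped and cannot be recovered from the result.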

    Addition: things differ not only between systems, they also change over time. For instance, Java 7 added support for sr_RS and for scripts, Windows keeps adding support for more locales, new countries get created (the Sudan split, Russia, Serbia) or disappear (East Germany, U.S.S.R., Yugoslavia), and so on.

    For internal representation you might want to choose the most powerful one, that can represent everything, and that is UTS-35 / BCP 47 (also used by CLDR and ICU).

    Mihai Nita answered Oct 04 '22 00:10