Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a simple way to get the language code from a country code in PHP

Tags:

php

locale

I'm using ISO 3166-1-alpha 2 codes to pass to an application to retrieve a localised feed e.g. /feeds/us for the USA. I have a switch statement which serves a feed based on that country_code.

Is there a way to convert that two digit code to the language code e.g. en_US ? I'm wondering if there is a standard / function / library for doing this in PHP or whether I need to build my own array?

like image 628
codecowboy Avatar asked Apr 16 '12 14:04

codecowboy


5 Answers

As other have pointed out, there is no built-in function as this likely due to the reality of many countries having multiple languages. So unfortunately, I can't point you to a library that does this, but I did go ahead and write a little function which does what you want.

There are two caveats, one being if it isn't provided a language it will just pick the first locale in the list. To get around this, you'd have to put some logic around the function call to provide it with the appropriate language. The other is that it needs to have php5-intl installed.

<?php

/**
/* Returns a locale from a country code that is provided.
/*
/* @param $country_code  ISO 3166-2-alpha 2 country code
/* @param $language_code ISO 639-1-alpha 2 language code
/* @returns  a locale, formatted like en_US, or null if not found
/**/
function country_code_to_locale($country_code, $language_code = '')
{
    // Locale list taken from:
    // http://stackoverflow.com/questions/3191664/
    // list-of-all-locales-and-their-short-codes
    $locales = array('af-ZA',
                    'am-ET',
                    'ar-AE',
                    'ar-BH',
                    'ar-DZ',
                    'ar-EG',
                    'ar-IQ',
                    'ar-JO',
                    'ar-KW',
                    'ar-LB',
                    'ar-LY',
                    'ar-MA',
                    'arn-CL',
                    'ar-OM',
                    'ar-QA',
                    'ar-SA',
                    'ar-SY',
                    'ar-TN',
                    'ar-YE',
                    'as-IN',
                    'az-Cyrl-AZ',
                    'az-Latn-AZ',
                    'ba-RU',
                    'be-BY',
                    'bg-BG',
                    'bn-BD',
                    'bn-IN',
                    'bo-CN',
                    'br-FR',
                    'bs-Cyrl-BA',
                    'bs-Latn-BA',
                    'ca-ES',
                    'co-FR',
                    'cs-CZ',
                    'cy-GB',
                    'da-DK',
                    'de-AT',
                    'de-CH',
                    'de-DE',
                    'de-LI',
                    'de-LU',
                    'dsb-DE',
                    'dv-MV',
                    'el-GR',
                    'en-029',
                    'en-AU',
                    'en-BZ',
                    'en-CA',
                    'en-GB',
                    'en-IE',
                    'en-IN',
                    'en-JM',
                    'en-MY',
                    'en-NZ',
                    'en-PH',
                    'en-SG',
                    'en-TT',
                    'en-US',
                    'en-ZA',
                    'en-ZW',
                    'es-AR',
                    'es-BO',
                    'es-CL',
                    'es-CO',
                    'es-CR',
                    'es-DO',
                    'es-EC',
                    'es-ES',
                    'es-GT',
                    'es-HN',
                    'es-MX',
                    'es-NI',
                    'es-PA',
                    'es-PE',
                    'es-PR',
                    'es-PY',
                    'es-SV',
                    'es-US',
                    'es-UY',
                    'es-VE',
                    'et-EE',
                    'eu-ES',
                    'fa-IR',
                    'fi-FI',
                    'fil-PH',
                    'fo-FO',
                    'fr-BE',
                    'fr-CA',
                    'fr-CH',
                    'fr-FR',
                    'fr-LU',
                    'fr-MC',
                    'fy-NL',
                    'ga-IE',
                    'gd-GB',
                    'gl-ES',
                    'gsw-FR',
                    'gu-IN',
                    'ha-Latn-NG',
                    'he-IL',
                    'hi-IN',
                    'hr-BA',
                    'hr-HR',
                    'hsb-DE',
                    'hu-HU',
                    'hy-AM',
                    'id-ID',
                    'ig-NG',
                    'ii-CN',
                    'is-IS',
                    'it-CH',
                    'it-IT',
                    'iu-Cans-CA',
                    'iu-Latn-CA',
                    'ja-JP',
                    'ka-GE',
                    'kk-KZ',
                    'kl-GL',
                    'km-KH',
                    'kn-IN',
                    'kok-IN',
                    'ko-KR',
                    'ky-KG',
                    'lb-LU',
                    'lo-LA',
                    'lt-LT',
                    'lv-LV',
                    'mi-NZ',
                    'mk-MK',
                    'ml-IN',
                    'mn-MN',
                    'mn-Mong-CN',
                    'moh-CA',
                    'mr-IN',
                    'ms-BN',
                    'ms-MY',
                    'mt-MT',
                    'nb-NO',
                    'ne-NP',
                    'nl-BE',
                    'nl-NL',
                    'nn-NO',
                    'nso-ZA',
                    'oc-FR',
                    'or-IN',
                    'pa-IN',
                    'pl-PL',
                    'prs-AF',
                    'ps-AF',
                    'pt-BR',
                    'pt-PT',
                    'qut-GT',
                    'quz-BO',
                    'quz-EC',
                    'quz-PE',
                    'rm-CH',
                    'ro-RO',
                    'ru-RU',
                    'rw-RW',
                    'sah-RU',
                    'sa-IN',
                    'se-FI',
                    'se-NO',
                    'se-SE',
                    'si-LK',
                    'sk-SK',
                    'sl-SI',
                    'sma-NO',
                    'sma-SE',
                    'smj-NO',
                    'smj-SE',
                    'smn-FI',
                    'sms-FI',
                    'sq-AL',
                    'sr-Cyrl-BA',
                    'sr-Cyrl-CS',
                    'sr-Cyrl-ME',
                    'sr-Cyrl-RS',
                    'sr-Latn-BA',
                    'sr-Latn-CS',
                    'sr-Latn-ME',
                    'sr-Latn-RS',
                    'sv-FI',
                    'sv-SE',
                    'sw-KE',
                    'syr-SY',
                    'ta-IN',
                    'te-IN',
                    'tg-Cyrl-TJ',
                    'th-TH',
                    'tk-TM',
                    'tn-ZA',
                    'tr-TR',
                    'tt-RU',
                    'tzm-Latn-DZ',
                    'ug-CN',
                    'uk-UA',
                    'ur-PK',
                    'uz-Cyrl-UZ',
                    'uz-Latn-UZ',
                    'vi-VN',
                    'wo-SN',
                    'xh-ZA',
                    'yo-NG',
                    'zh-CN',
                    'zh-HK',
                    'zh-MO',
                    'zh-SG',
                    'zh-TW',
                    'zu-ZA',);

    foreach ($locales as $locale)
    {
        $locale_region = locale_get_region($locale);
        $locale_language = locale_get_primary_language($locale);
        $locale_array = array('language' => $locale_language,
                             'region' => $locale_region);

        if (strtoupper($country_code) == $locale_region &&
            $language_code == '')
        {
            return locale_compose($locale_array);
        }
        elseif (strtoupper($country_code) == $locale_region &&
                strtolower($language_code) == $locale_language)
        {
            return locale_compose($locale_array);
        }
    }

    return null;
}
?>
like image 194
TheJF Avatar answered Sep 29 '22 03:09

TheJF


You cannot automatically convert country code to language code because some countries use multiple languages. On the other hand, OS localization system may support multiple variants of a single language for different countries (for example, en_GB vs en_US).

For example, Switzerland (CH) has both German and French commonly used (64% and 20% of the population, according to http://en.wikipedia.org/wiki/Switzerland). If you have to decide a single language for country code CH either of those languages could make sense for some people. Note that some parts of the Switzerland use only German or French as the official language (but not both, see http://en.wikipedia.org/wiki/File:Sprachen_CH_2000_EN.svg for details).

If you MUST select a single language for each country, I'd suggest doing the selection by hand for every country you support. For an half-assed automatic implementation, you could scan through your available localizations and select the first one that has the matching country code after the underscore.

Also note the corollary: languages cannot be represented by national flags because languages and countries do not have 1:1 relation. One to many relations can be found in both directions.

like image 27
Mikko Rantalainen Avatar answered Sep 29 '22 04:09

Mikko Rantalainen


As noted by other answers there is no one to one mapping between countries and languages. However, if you have the PHP Intl extension installed it should be possible to use the Unicode CLDR likely subtags data to get the “default” or “likely” language for a specific country:

function getLanguage(string $country): string {
    $subtags = \ResourceBundle::create('likelySubtags', 'ICUDATA', false);
    $country = \Locale::canonicalize('und_'.$country);
    $locale = $subtags->get($country) ?: $subtags->get('und');
    return \Locale::getPrimaryLanguage($locale);
}

Now when you call the getLanguage() function with a country code you get the according language code back:

getLanguage('US'); // "en"
getLanguage('GB'); // "en"
getLanguage('DE'); // "de"
getLanguage('CH'); // "de"
getLanguage('IN'); // "hi"
getLanguage('NO'); // "nb"
getLanguage('BR'); // "pt"

This also works fine for three letter country codes:

getLanguage('USA'); // "en"
getLanguage('GBR'); // "en"
getLanguage('AUT'); // "de"
getLanguage('FRA'); // "fr"

And even UN M49 codes:

getLanguage('003'); // "en"
getLanguage('013'); // "es"
getLanguage('039'); // "it"
getLanguage('155'); // "de"
like image 21
ausi Avatar answered Sep 29 '22 05:09

ausi


You will want to cross reference these files:

http://www.ethnologue.com/codes/LanguageIndex.tab http://www.ethnologue.com/codes/CountryCodes.tab http://www.ethnologue.com/codes/LanguageCodes.tab

..or get them all in one zip here: http://www.ethnologue.com/codes/Language_Code_Data_20110104.zip

There is no current set PHP function that returns this data that I'm aware of.

like image 36
AO_ Avatar answered Sep 29 '22 03:09

AO_


the answer from TheJF is pretty good, however there are a few (general) issues that I came across:

  • his code will return br-FR if you call country_code_to_locale("FR") - now br (Breton) is not even an offical language according to Wikipedia. Although fr-FR is in the list, br-FR is the first in the array. this happens with many other countries too.

  • many other locale lists are trying to be extremly complete and consider all possible languages

  • it is difficult to draw the line here, good examples where you certainly want to keep multiple languages for a country are: Canada and Switzerland

I went with a simple approach:

  • I kept only 1 language for most countries, and left multiple for some countries like BE, CA, CH, ZA. I kept es-US, but I am not sure about that (Wikipedia says: Official languages: None at federal level)

  • I also kept multiple languages for countries I was too lazy to research or that use both, Latin and Cyrillic

  • I added shuffle($locales); which will randomize the array, such that we get random locales for countries with multiple languages. It made sense for my use case, but you might want to remove that.

  • For my purpose, only languages that have relevant prevalence on the web are of interest. This list is by no means complete or correct, but pragmatic.

So here is my locale list:

$locales = array('af-ZA',
                'am-ET',
                'ar-AE',
                'ar-BH',
                'ar-DZ',
                'ar-EG',
                'ar-IQ',
                'ar-JO',
                'ar-KW',
                'ar-LB',
                'ar-LY',
                'ar-MA',
                'ar-OM',
                'ar-QA',
                'ar-SA',
                'ar-SY',
                'ar-TN',
                'ar-YE',
                'az-Cyrl-AZ',
                'az-Latn-AZ',
                'be-BY',
                'bg-BG',
                'bn-BD',
                'bs-Cyrl-BA',
                'bs-Latn-BA',
                'cs-CZ',
                'da-DK',
                'de-AT',
                'de-CH',
                'de-DE',
                'de-LI',
                'de-LU',
                'dv-MV',
                'el-GR',
                'en-AU',
                'en-BZ',
                'en-CA',
                'en-GB',
                'en-IE',
                'en-JM',
                'en-MY',
                'en-NZ',
                'en-SG',
                'en-TT',
                'en-US',
                'en-ZA',
                'en-ZW',
                'es-AR',
                'es-BO',
                'es-CL',
                'es-CO',
                'es-CR',
                'es-DO',
                'es-EC',
                'es-ES',
                'es-GT',
                'es-HN',
                'es-MX',
                'es-NI',
                'es-PA',
                'es-PE',
                'es-PR',
                'es-PY',
                'es-SV',
                'es-US',
                'es-UY',
                'es-VE',
                'et-EE',
                'fa-IR',
                'fi-FI',
                'fil-PH',
                'fo-FO',
                'fr-BE',
                'fr-CA',
                'fr-CH',
                'fr-FR',
                'fr-LU',
                'fr-MC',
                'he-IL',
                'hi-IN',
                'hr-BA',
                'hr-HR',
                'hu-HU',
                'hy-AM',
                'id-ID',
                'ig-NG',
                'is-IS',
                'it-CH',
                'it-IT',
                'ja-JP',
                'ka-GE',
                'kk-KZ',
                'kl-GL',
                'km-KH',
                'ko-KR',
                'ky-KG',
                'lb-LU',
                'lo-LA',
                'lt-LT',
                'lv-LV',
                'mi-NZ',
                'mk-MK',
                'mn-MN',
                'ms-BN',
                'ms-MY',
                'mt-MT',
                'nb-NO',
                'ne-NP',
                'nl-BE',
                'nl-NL',
                'pl-PL',
                'prs-AF',
                'ps-AF',
                'pt-BR',
                'pt-PT',
                'ro-RO',
                'ru-RU',
                'rw-RW',
                'sv-SE',
                'si-LK',
                'sk-SK',
                'sl-SI',
                'sq-AL',
                'sr-Cyrl-BA',
                'sr-Cyrl-CS',
                'sr-Cyrl-ME',
                'sr-Cyrl-RS',
                'sr-Latn-BA',
                'sr-Latn-CS',
                'sr-Latn-ME',
                'sr-Latn-RS',
                'sw-KE',
                'tg-Cyrl-TJ',
                'th-TH',
                'tk-TM',
                'tr-TR',
                'uk-UA',
                'ur-PK',
                'uz-Cyrl-UZ',
                'uz-Latn-UZ',
                'vi-VN',
                'wo-SN',
                'yo-NG',
                'zh-CN',
                'zh-HK',
                'zh-MO',
                'zh-SG',
                'zh-TW');

and the code:

function country_code_to_locale($country_code)
{
    $locales = ...

    // randomize the array, such that we get random locales
    // for countries with multiple languages (CA, CH)
    shuffle($locales);

    foreach ($locales as $locale) {
        $locale_region = locale_get_region($locale);

        if (strtoupper($country_code) == $locale_region) {
            return $locale;
        }
    }

    return "en-US";
}
like image 35
Eugen Avatar answered Sep 29 '22 04:09

Eugen