 

Match a language code to the countries where that language is official or commonly used

Is there a Python library that returns the list of countries where the language with a given code is an official or commonly used language?

For example, the language code "fr" is associated with 29 countries where French is an official language, plus 8 countries where it is commonly used.

asked Apr 21 '10 05:04 by jack


1 Answer

Despite the accepted answer, as far as I can tell none of the XML files underlying pycountry provides a mapping from languages to countries. They contain lists of languages and their ISO codes, lists of countries and their ISO codes, and other useful data, but not that mapping.

Similarly, the Babel package is great, but after digging around for a while I couldn't find any way to list all languages for a particular country. The best you can do is the 'most likely' language: https://stackoverflow.com/a/22199367/202168

So I had to build the mapping myself from the Unicode CLDR supplemental data:

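import urllib.request

import lxml.etree

def get_territory_languages():
    """Return {territory_code: {language_code: {'percent': float, 'official': bool}}}
    built from the <territoryInfo> section of CLDR's supplementalData.xml."""
    url = "https://raw.githubusercontent.com/unicode-org/cldr/master/common/supplemental/supplementalData.xml"
    # close the HTTP response once the whole document has been read
    with urllib.request.urlopen(url) as langxml:
        langtree = lxml.etree.XML(langxml.read())

    territory_languages = {}
    for t in langtree.find('territoryInfo').findall('territory'):
        langs = {}
        for l in t.findall('languagePopulation'):
            langs[l.get('type')] = {
                # share of the territory's population that uses this language
                'percent': float(l.get('populationPercent')),
                # any officialStatus attribute counts as official; absent means unofficial
                'official': bool(l.get('officialStatus')),
            }
        territory_languages[t.get('type')] = langs
    return territory_languages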

You probably want to store the result in a file rather than fetching it over the web every time you need it.
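A minimal caching sketch using only the standard library, assuming a JSON file is an acceptable cache format. `load_cached` and the file name are hypothetical; `fetch` is any zero-argument function, such as `get_territory_languages`:

```python
import json
import os
from typing import Callable, Dict

def load_cached(path: str, fetch: Callable[[], Dict]) -> Dict:
    """Return data from a JSON cache file, calling fetch() only if the file is missing.

    Hypothetical helper: the cache never expires, so delete the file to refresh it.
    """
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    data = fetch()  # e.g. the network download
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=1)
    return data
```

Usage: `TERRITORY_LANGUAGES = load_cached('territory_languages.json', get_territory_languages)`. The nested dicts of strings, floats and booleans round-trip through JSON unchanged, so the rest of the code works the same with cached data.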

This dataset also contains unofficial languages, which you may not want to include. Here is some more example code that keeps only the official ones:

TERRITORY_LANGUAGES = get_territory_languages()

def get_official_locale_ids(country_code):
    country_code = country_code.upper()
    # dict.items() is a view in Python 3 and has no .sort(); use sorted(),
    # most widely-spoken first
    langs = sorted(
        TERRITORY_LANGUAGES[country_code].items(),
        key=lambda item: item[1]['percent'],
        reverse=True,
    )
    return [
        '{lang}_{terr}'.format(lang=lang, terr=country_code)
        for lang, spec in langs if spec['official']
    ]

>>> get_official_locale_ids('es')
['es_ES', 'ca_ES', 'gl_ES', 'eu_ES', 'ast_ES']
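The code above goes from a country to its languages, while the question asks the reverse. A minimal sketch that inverts the dictionary returned by `get_territory_languages()` for one language code is shown here. `countries_for_language` is a hypothetical helper, the `min_percent` cut-off for "commonly used" is an assumed threshold (CLDR does not define one), and the code must match the CLDR `type` value exactly:

```python
from typing import Dict, List, Tuple

# Assumed shape, as built by get_territory_languages():
# {territory: {lang: {"percent": float, "official": bool}}}
TerritoryLanguages = Dict[str, Dict[str, dict]]

def countries_for_language(
    territory_languages: TerritoryLanguages,
    lang: str,
    min_percent: float = 5.0,
) -> Tuple[List[str], List[str]]:
    """Split territories into (official, commonly_used) for one language code.

    'Commonly used' here means unofficial but spoken by at least min_percent
    of the population; the threshold is a hypothetical choice.
    """
    official: List[str] = []
    common: List[str] = []
    for territory, langs in sorted(territory_languages.items()):
        spec = langs.get(lang)
        if spec is None:
            continue  # language not listed for this territory
        if spec["official"]:
            official.append(territory)
        elif spec["percent"] >= min_percent:
            common.append(territory)
    return official, common
```

Usage: `official, common = countries_for_language(TERRITORY_LANGUAGES, 'fr')`. Both lists are sorted by territory code; lower `min_percent` to include territories with smaller speaker populations.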
answered Oct 05 '22 02:10 by Anentropic