
What languages are supported in ICU collation?

Tags: collation, icu

I was browsing through the ICU source code (http://icu-project.org/), and I couldn't find what languages it supports out of the box for collation. Could someone help me?

Dervin Thunk asked Oct 13 '22 22:10


1 Answer

Edit: Note that this list was written a couple of years ago. Follow the links for updated lists. CLDR no longer advertises which sublocales are claimed to be supported implicitly.


colfiles.mk lists the tailorings and aliases.

Besides root (UCA), COLLATION_SOURCE lists tailorings for: af ar as az be bg bn bs ca cs cy da de el eo es et fa fa_AF fi fil fo fr gu ha haw he hi hr hu hy ig is ja kk kl km kn ko kok lt lv mk ml mr mt nb nn om or pa pl ps ro ru si sk sl sq sr sr_Latn sv ta te th to tr uk ur vi yo zh zh_Hant

However, many languages (such as English, Italian, Swahili, …) are not listed, because the root (UCA, fallback) behavior is already correct for them.

COLLATION_EMPTY_SOURCE has the list of additional locales which are considered to be valid: af_NA af_ZA ar_AE ar_BH ar_DZ ar_EG ar_IQ ar_JO ar_KW ar_LB ar_LY ar_MA ar_OM ar_QA ar_SA ar_SD ar_SY ar_TN ar_YE as_IN az_Latn az_Latn_AZ be_BY bg_BG bn_BD bn_IN bs_BA ca_ES chr chr_US cs_CZ cy_GB da_DK de_AT de_BE de_CH de_DE de_LI de_LU el_CY el_GR en en_AS en_AU en_BE en_BW en_BZ en_CA en_GB en_GU en_HK en_IE en_IN en_JM en_MH en_MP en_MT en_MU en_NA en_NZ en_PH en_PK en_SG en_TT en_UM en_US en_US_POSIX en_VI en_ZA en_ZW es_419 es_AR es_BO es_CL es_CO es_CR es_DO es_EC es_ES es_GQ es_GT es_HN es_MX es_NI es_PA es_PE es_PR es_PY es_SV es_US es_UY es_VE et_EE fa_IR fi_FI fil_PH fo_FO fr_BE fr_BF fr_BI fr_BJ fr_BL fr_CA fr_CD fr_CF fr_CG fr_CH fr_CI fr_CM fr_DJ fr_FR fr_GA fr_GN fr_GP fr_GQ fr_KM fr_LU fr_MC fr_MF fr_MG fr_ML fr_MQ fr_NE fr_RE fr_RW fr_SN fr_TD fr_TG ga ga_IE gu_IN ha_Latn ha_Latn_GH ha_Latn_NE ha_Latn_NG he_IL hi_IN hr_HR hu_HU hy_AM id id_ID ig_NG is_IS it it_CH it_IT ja_JP ka ka_GE kk_KZ kl_GL kn_IN ko_KR kok_IN lt_LT lv_LV mk_MK ml_IN mr_IN ms ms_BN ms_MY mt_MT nb_NO nl nl_BE nl_NL nn_NO om_ET om_KE or_IN pa_Arab pa_Arab_PK pa_Guru pa_Guru_IN pl_PL ps_AF pt pt_BR pt_PT ro_MD ro_RO ru_MD ru_RU ru_UA si_LK sk_SK sl_SI sq_AL sr_Cyrl sr_Cyrl_BA sr_Cyrl_ME sr_Cyrl_RS sr_Latn_BA sr_Latn_ME sr_Latn_RS sv_FI sv_SE sw sw_KE sw_TZ ta_IN ta_LK te_IN th_TH tr_TR uk_UA ur_IN ur_PK vi_VN yo_NG zh_Hans zh_Hans_CN zh_Hans_SG zh_Hant_HK zh_Hant_MO zh_Hant_TW zu zu_ZA
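To make the relationship between the two lists concrete, here is a minimal Python sketch of how a locale could be mapped to a tailoring by plain truncation fallback. It is only an illustration, not ICU's actual lookup: `TAILORED` is an abbreviated copy of the COLLATION_SOURCE list above, `resolve_collation` is a hypothetical helper, and real ICU also honors aliases and other resource-bundle rules that this ignores.

```python
# Abbreviated copy of the COLLATION_SOURCE list above (assumption: a subset
# is enough for illustration; extend it with the full list if needed).
TAILORED = {
    "af", "ar", "da", "de", "es", "fr", "ja",
    "sr", "sr_Latn", "sv", "zh", "zh_Hant",
}


def resolve_collation(locale, tailored=TAILORED):
    """Hypothetical helper: strip trailing subtags until a tailoring matches.

    Returns the name of the matching tailoring, or "root" when none of the
    truncated forms is tailored (i.e. plain UCA ordering applies).
    """
    parts = locale.split("_")
    while parts:
        candidate = "_".join(parts)
        if candidate in tailored:
            return candidate
        parts.pop()  # drop the last subtag: de_AT -> de
    return "root"


if __name__ == "__main__":
    for loc in ("de_AT", "en_US", "it_CH", "zh_Hant_TW", "sr_Latn_RS"):
        print(loc, "->", resolve_collation(loc))
```

Under these assumptions, de_AT resolves to the de tailoring, while en_US and it_CH end up at root, which is why they appear only in the "empty" list.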

Hope this helps.

All of this data comes from Unicode CLDR.

Steven R. Loomis answered Oct 28 '22 22:10