
What is the difference between C.UTF-8 and en_US.UTF-8 locales?

I'm migrating a Python application from an Ubuntu server with locale en_US.UTF-8 to a new Debian server, which comes with C.UTF-8 set by default. I'm trying to understand whether this will have any impact, but I couldn't find good resources explaining the difference between the two.

Marcelo asked Apr 14 '19 09:04





2 Answers

In general, C is for computers, and en_US is for people in the US who speak English (and for anyone else who wants the same behaviour).

"For computers" means that strings are more standardized (but still in English), so the output of one program can be read by another program. With en_US, strings could be improved over time and the alphabetic order could change (for example, to follow a new edition of a style guide such as the Chicago Manual of Style). So it is more user-friendly, but possibly less stable. Note: locales are not just for translating strings, but also for collation (alphabetic order), number formatting (e.g. the thousands separator), currency (I think it is safe to predict that $ and 2 decimal digits will remain), names of months and days of the week, etc.
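As a rough illustration of the collation point, here is a minimal Python sketch. The helper name `sort_with_locale` and the sample words are made up for this example; it assumes only the standard `locale` module, and it returns `None` when the requested locale is not installed.

```python
import locale


def sort_with_locale(words, locale_name):
    """Sort words using the collation rules of locale_name.

    Hypothetical helper: returns None if the locale is not installed.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None  # locale not available on this system
    try:
        # strxfrm turns each string into a key that sorts by locale rules
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # restore


words = ["apple", "Banana", "cherry"]
print(sort_with_locale(words, "C"))            # code point order: uppercase before lowercase
print(sort_with_locale(words, "en_US.UTF-8"))  # dictionary-like order, or None if not installed
```

In the C locale, "Banana" sorts before "apple" because uppercase ASCII letters have lower code values. An application that sorts user-visible text this way is the kind of place where the two locales can differ.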

In your case, the two are simply the UTF-8 versions of those locales.
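Since both locales use UTF-8 as their character encoding, one quick check after the migration is to print the encodings Python actually picked up. This is only a sketch; `describe_encodings` is a name invented for this example, and the expectation that both servers report UTF-8 is an assumption to verify on your own machines.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings Python derived from the environment (hypothetical helper)."""
    return {
        "preferred": locale.getpreferredencoding(False),  # default for open() in text mode
        "filesystem": sys.getfilesystemencoding(),        # used for file names
        "stdout": getattr(sys.stdout, "encoding", None),  # may be None if stdout is replaced
    }


for name, value in describe_encodings().items():
    print(f"{name}: {value}")
```

Running it on the old and the new server and comparing the output shows whether text I/O will behave the same on both.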

In general it should not matter. I usually prefer en_US.UTF-8, but in your case (a server app) it should only change log and error messages (if you use locale.setlocale()). You should handle client locales inside your app. Programs that read the output of other programs should set the C locale before opening the pipe, so that should not be affected either.
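The advice about setting C before opening a pipe could look roughly like this in Python. The helper name `run_in_c_locale` is an assumption for illustration, and the child command here is just a Python one-liner that echoes the variable back.

```python
import os
import subprocess
import sys


def run_in_c_locale(cmd):
    """Run cmd with LC_ALL=C so its messages and formats are predictable (hypothetical helper)."""
    env = dict(os.environ)
    env["LC_ALL"] = "C"  # LC_ALL overrides LANG and the other LC_* variables
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=False)


# Demonstration: the child process sees LC_ALL=C regardless of the parent's locale.
result = run_in_c_locale([sys.executable, "-c", "import os; print(os.environ['LC_ALL'])"])
print(result.stdout.strip())  # prints: C
```

A program that parses another tool's output this way does not depend on whether the server default is C.UTF-8 or en_US.UTF-8.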

As you can see, it probably doesn't matter. You may also use the POSIX locale, which is also defined in Debian. You can list the installed locales with locale -a.

Note: for micro-optimization, the C or C.UTF-8 locale is the better choice: there are no message translation lookups (gettext), and the rules for collation and number formatting are simple. But this should be visible only on the server side.

Giacomo Catenazzi answered Sep 19 '22 12:09



Here are some reasons why I added LC_TIME=C.UTF-8 in /etc/default/locale, in case it helps someone:

It provides a 24-hour clock instead of AM/PM in Firefox for HTML5 input type=time (https://developer.mozilla.org/en-US/docs/Web/HTML/Element/input/time) and uses a datepicker in the format DD/MM/YYYY instead of MM/DD/YYYY for HTML5 input type=date (https://developer.mozilla.org/en-US/docs/Web/HTML/Element/input/date).

It allows the YYYY-MM-DD international date format (ISO 8601) with a 24-hour clock to be used when replying to emails in Thunderbird.

Previously, this was possible with LC_TIME=en_DK.UTF-8 (http://kb.mozillazine.org/Date_display_format), but it stopped working because of a bug (https://bugzilla.mozilla.org/show_bug.cgi?id=1426907#c155).

Edit: Now even the LC_TIME=C.UTF-8 workaround no longer works for Thunderbird: https://bugzilla.mozilla.org/show_bug.cgi?id=1426907#c197
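For programs that format times through the C library, as opposed to Firefox or Thunderbird, which apply their own logic, the effect of LC_TIME can be sketched in Python as follows. The helper name `format_clock` is invented for this example, and the comment about en_US.UTF-8 describes what glibc typically does, not a guarantee.

```python
import datetime
import locale


def format_clock(moment, locale_name):
    """Format the time of day using the LC_TIME rules of locale_name.

    Hypothetical helper: returns None if the locale is not installed.
    """
    previous = locale.setlocale(locale.LC_TIME)  # query the current setting
    try:
        locale.setlocale(locale.LC_TIME, locale_name)
    except locale.Error:
        return None  # locale not available on this system
    try:
        return moment.strftime("%X")  # %X is the locale's time representation
    finally:
        locale.setlocale(locale.LC_TIME, previous)  # restore


moment = datetime.datetime(2019, 4, 14, 15, 4, 5)
print(format_clock(moment, "C"))            # 15:04:05
print(format_clock(moment, "en_US.UTF-8"))  # typically a 12-hour clock with AM/PM on glibc, or None
```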

baptx answered Sep 19 '22 12:09
