 

Why does QCoreApplication call `setlocale(LC_ALL, "")` by default on Unix/Linux?

I think it's safe to say that C locales (the global, per-process locale state of the C standard library) are universally recognized as a bad idea.

Writing an application that parses or writes text-based machine formats (which happens quite often) with C standard library functions becomes nearly impossible if you have to account for the locale being set to anything other than "C". Since the locale is normally per-process (and setlocale is often not thread-safe), if you are writing a library or you have a multithreaded program, it isn't even safe to call setlocale(LC_ALL, "C") and restore the previous locale after doing your work.

For these reasons, the rule is normally "avoid setlocale, period". However, we've been bitten several times in the past by the peculiar behavior of QCoreApplication and its derived classes; the documentation says:

On Unix/Linux Qt is configured to use the system locale settings by default. This can cause a conflict when using POSIX functions, for instance, when converting between data types such as floats and strings, since the notation may differ between locales. To get around this problem, call the POSIX function setlocale(LC_NUMERIC,"C") right after initializing QApplication or QCoreApplication to reset the locale that is used for number formatting to "C"-locale.

This behavior has been described in another question; my question is: what could be the rationale for this apparently foolish behavior? In particular, what's so peculiar about Unix and Linux that prompted such a decision only on these platforms?

(Incidentally, will everything break if I just do setlocale(LC_ALL, "C"); after creating the QApplication? If it's fine, why don't they just remove their setlocale(LC_ALL, "");?)

Matteo Italia asked Sep 04 '14 09:09


3 Answers

From investigations of the Qt source code conducted by @Phil Armstrong and me (see the chat log), it seems that the setlocale call has been there since version 1, for several reasons:

  • XIM, at least in ancient times, didn't correctly "get" the current locale without such a call.
  • On Solaris, it even crashed with the default C locale.
  • On Unix systems, it's used (among other mechanisms, in a complex game of fallbacks) to "sniff" the "system character set" (whatever that means on Unix), and thus to convert between the QString representation and the "local" 8-bit encoding (this is particularly critical for file paths).

It's true that Qt already checks the LC_* environment variables, as it does for QLocale, but I suppose it may be useful to have nl_langinfo decode the current LC_CTYPE in case the application explicitly changed it (but for that to work, the process has to start from the system defaults).

It's interesting that Qt used to call setlocale(LC_NUMERIC, "C") immediately after setlocale(LC_ALL, ""), but this was removed in Qt 4.4. The rationale for this decision seems to lie in task #132859 of the old Qt bug tracker (which moved between Trolltech, Nokia and QtSoftware.com before vanishing without a trace, not even in the Wayback Machine), which is referenced in two bugs on this topic. I think an authoritative answer on the topic was there, but I can't find a way to recover it.

My guess is that it introduced subtle bugs: the environment seemed pristine, but it had in fact been changed by the setlocale call in every category except LC_NUMERIC (the most visible one). They probably removed the call to make the locale setting more obvious and to have application developers act accordingly.

Matteo Italia answered Nov 01 '22 16:11


Qt calls setlocale(LC_ALL, "") because it's the right thing to do: every standard Unix program from cat on up calls setlocale(LC_ALL, ""). The consequence of that call is that the program's locale is set to the one specified by the user. See the setlocale() manpage:

On startup of the main program, the portable "C" locale is selected as default. A program may be made portable to all locales by calling:

setlocale(LC_ALL, "");

after program initialization...

Given that Qt both generates text to be read by the user and parses input generated by the user, it would be very unfriendly to refuse to let the program communicate with the user in their own locale-specific ways. Hence the call to setlocale().

I would hope that being user-friendly would be uncontroversial! The problem, of course, comes when you try to parse data files that were created by your program running under a different locale. Clearly, if you're using an ad hoc text-based format with a parser based on sscanf and friends, rather than a specified data format with a "real" parser, then this is a recipe for data corruption unless you take the locale settings into account. The solution is either a) to use a real serialisation library that handles this for you, or b) to set the locale to something specific ("C", perhaps) when writing and reading data.

If thread safety is an issue, then on modern POSIX implementations (or any Linux system with GNU libc version >= 2.3, which is pretty much "all of them" at this point) you can call uselocale() to set a thread-local locale, which then applies to the locale-dependent functions (including I/O) called from that thread. Alternatively, you can call the _l versions of the usual functions, which take a locale object as an additional argument.

Will everything break if you call setlocale(LC_ALL, "C");? No, but the right thing is to let the user set the locale they prefer and either save your data in a well-specified format or specify at runtime the locale in which your data is to be read and written.

Phil Armstrong answered Nov 01 '22 17:11


What is so peculiar about POSIX systems (which include the Unix/Linux systems you mention) is that the OS interface and the C interface are mixed up. The C setlocale call in particular interferes with the OS.

On Windows, in comparison, the locale is explicitly a per-thread property (SetThreadLocale), but more importantly, functions such as GetNumberFormat accept a locale parameter.

Note that your problem is fairly easily solved: When using Qt, use Qt. So that means reading your text input into a QString, processing it, and then writing it back.

MSalters answered Nov 01 '22 15:11