I think that it's safe to say that C locales are universally recognized as a bad idea.
Writing an application that tries to parse or write text-based machine formats (which happens quite often) with C standard library functions gets near-impossible if you have to account for locale being set to anything different than "C". Since locale is normally per-process (and setlocale is often not thread-safe), if you are writing a library or you have a multithreaded program it's not safe even to do setlocale(LC_ALL, "C") and restore it after doing your stuff.
Now, for these reasons the rule is normally "avoid setlocale, period"; but: we've been bitten several times in the past by the peculiar behavior of QCoreApplication and derived classes; the documentation says:
On Unix/Linux Qt is configured to use the system locale settings by default. This can cause a conflict when using POSIX functions, for instance, when converting between data types such as floats and strings, since the notation may differ between locales. To get around this problem, call the POSIX function setlocale(LC_NUMERIC,"C") right after initializing QApplication or QCoreApplication to reset the locale that is used for number formatting to "C"-locale.
This behavior has been described in another question; my question is: what could be the rationale of this apparently foolish behavior? In particular, what's so peculiar about Unix and Linux that prompted such a decision only on these platforms?
(Incidentally, will everything break if I just do setlocale(LC_ALL, "C"); after creating the QApplication? If it's fine, why don't they just remove their setlocale(LC_ALL, "");?)
From investigations through the Qt source code conducted by @Phil Armstrong and me (see the chat log), it seems that the setlocale call has been there since version 1 for several reasons:
QString representation and the "local" 8 bit encoding (this is particularly critical for file paths).
It's true that it already checks the LC_* environment variables, as it does with QLocale, but I suppose that it may be useful to have nl_langinfo decode the current LC_CTYPE if the application explicitly changed it (but to see if there is an explicit change, it has to start with system defaults).
It's interesting that they did a setlocale(LC_NUMERIC, "C") immediately after the setlocale(LC_ALL, ""), but this was removed in Qt 4.4. The rationale for this decision seems to lie in task #132859 of the old Qt bugtracker (which moved between TrollTech, Nokia and QtSoftware.com before vanishing without leaving any trace, not even in the Wayback Machine), and it's referenced in two bugs regarding this topic. I think that an authoritative answer on the topic was there, but I can't find a way to recover it.
My guess is that it introduced subtle bugs, since the environment seemed pristine, but it was in fact touched by the setlocale call in all but the LC_NUMERIC category (which is the most evident); probably they removed the call to make the locale setting more evident and have application developers act accordingly.
Qt calls setlocale(LC_ALL, ""), because it's the right thing to do: every standard Unix program from cat on up calls setlocale(LC_ALL, ""). The consequence of that call is that the program locale is set to that specified by the user. See the setlocale() manpage:
On startup of the main program, the portable "C" locale is selected as default. A program may be made portable to all locales by calling:
setlocale(LC_ALL, "");
after program initialization...
Given that Qt both generates text to be read by the user and parses input generated by the user, it would be very unfriendly to refuse to let the user communicate with the program in their own locale-specific ways. Hence the call to setlocale().
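As a rough illustration of what that call does (a sketch, not Qt's actual code; the helper names current_locale_name and adopt_user_locale are made up for this example): a process starts in the "C" locale, and passing an empty string to setlocale asks the C library to pick the locale up from the environment.

```cpp
#include <clocale>
#include <string>

// Hypothetical helper: returns the name of the locale currently in effect
// for the given category. Passing nullptr to setlocale only queries it.
std::string current_locale_name(int category = LC_ALL) {
    const char* name = std::setlocale(category, nullptr);
    return name ? name : "";
}

// Hypothetical helper: switches the whole process to the locale selected by
// the user's environment (LANG, LC_* variables). Returns false if the C
// library could not honor that request, in which case nothing changes.
bool adopt_user_locale() {
    return std::setlocale(LC_ALL, "") != nullptr;
}
```

Calling current_locale_name() at program start should give "C"; after adopt_user_locale() it gives whatever the environment selects, and the exact string depends on the system.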
I would hope that being user friendly would be uncontroversial! The problem of course comes when you try to parse data files that were created by your program running under a different locale. Clearly, if you're using an ad-hoc text-based format with a parser based on sscanf and friends, rather than a specified data format with a "real" parser, then this is a recipe for data corruption if done without consideration of the locale settings. The solution is to either a) use a real serialisation library that handles this stuff for you or b) set the locale to something specific ("C" perhaps) when writing and reading data.
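One possible way to do option b) in C++ without touching the process-wide locale is to pin the locale on the stream itself. Note that this swaps sscanf and friends for iostreams imbued with std::locale::classic() (the C++ counterpart of the "C" locale); the function names below are hypothetical, and this is only a minimal sketch of the idea.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: formats a double for a data file using the classic
// ("C") locale, so the decimal separator is always '.'.
std::string format_double_classic(double value) {
    std::ostringstream out;
    out.imbue(std::locale::classic());
    out.precision(17);  // enough digits to round-trip an IEEE 754 double
    out << value;
    return out.str();
}

// Hypothetical helper: parses a double written by format_double_classic.
// Returns false (leaving `value` untouched) if the text is not a number or
// if any characters are left over after the number.
bool parse_double_classic(const std::string& text, double& value) {
    std::istringstream in(text);
    in.imbue(std::locale::classic());
    double parsed = 0.0;
    if (!(in >> parsed) || !in.eof())
        return false;
    value = parsed;
    return true;
}
```

With these, "2.25" parses to 2.25 under any user locale, while "2,25" is rejected instead of being silently read as 2.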
If thread safety is an issue then on modern POSIX implementations (or any Linux system with GNU libc version >= 2.3, which is pretty much "all of them" at this point in time) you can call uselocale() to set a thread-local locale for all I/O. Alternatively you can call the _l versions of the usual functions, which take a locale object as a supplementary argument.
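A minimal sketch of the uselocale() approach, assuming a POSIX.1-2008 system such as Linux with glibc (some other systems also need <xlocale.h>). ScopedCLocale and parse_double_c_locale are names invented for this example, not part of any library.

```cpp
#include <locale.h>   // locale_t, newlocale, uselocale, freelocale (POSIX)
#include <cstdlib>
#include <stdexcept>

// Hypothetical RAII guard: switches the *calling thread* to the "C" locale
// and restores the previous thread locale when it goes out of scope.
// The process-wide locale set via setlocale() is never modified.
class ScopedCLocale {
public:
    ScopedCLocale() : c_locale_(newlocale(LC_ALL_MASK, "C", (locale_t)0)) {
        if (c_locale_ == (locale_t)0)
            throw std::runtime_error("newlocale failed");
        previous_ = uselocale(c_locale_);  // returns the locale in use before
    }
    ~ScopedCLocale() {
        uselocale(previous_);   // restore first, so c_locale_ is unused...
        freelocale(c_locale_);  // ...and can be released safely
    }
    ScopedCLocale(const ScopedCLocale&) = delete;
    ScopedCLocale& operator=(const ScopedCLocale&) = delete;

private:
    locale_t c_locale_;
    locale_t previous_;
};

// Hypothetical helper: parses a number with '.' as the decimal separator,
// whatever locale the rest of the program is running under.
double parse_double_c_locale(const char* text) {
    ScopedCLocale guard;
    return std::strtod(text, nullptr);
}
```

Other threads keep whatever locale they were using, which is what makes this usable from a library or a multithreaded program.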
Will everything break if you call setlocale(LC_ALL, "C");? No, but the right thing is to let the user set the locale they prefer and either save your data in a well specified format or specify the locale in which your data is to be read and written at runtime.
What is so peculiar about POSIX systems (which includes the Unix/Linux systems you mention) is that the OS interface and the C interface are mixed up. The C setlocale call in particular interferes with the OS.
On Windows, in comparison, the locale is explicitly a per-thread property (SetThreadLocale), but more importantly, functions such as GetNumberFormat accept a locale parameter.
Note that your problem is fairly easily solved: When using Qt, use Qt. So that means reading your text input into a QString, processing it, and then writing it back.