I am in the process of internationalizing a large legacy codebase in C++, and I am faced with a difficult decision: should I use boost::locale or the standard C++ locales?
I am committed to using UTF-8. We have to do a reasonably broad range of text processing; it is not the core of what our code does, but it is important. We can expect to do most of what one might need to do: time, date, number, and money formatting, collation, regexp, substring isolation, interaction with boost::filesystem, DB access, etc.
The introduction to boost::locale I get that
I have trouble evaluating the impact of point 1. I guess point 2 is pretty severe if it affects us, and points 3 and 4 won't be a big deal for us.
Is there a consensus in the community that boost::locale is the better alternative? Is there any movement in the standards committee to address the issues with std::locale? Can anyone help me make a more informed decision?
Perhaps most importantly, is it simple to migrate from one to the other? How well do the two play with one another? Is it legitimate to set the global locale with a boost locale, and then use std facilities?
In the end, the Boost documentation does a good job of answering my question, but you have to do some reading, and it helps to understand std::locale better than I did at the time of posting.
Plays nicely with the std
A std::locale is a collection of facets. The standard defines a set of facets that every locale must provide, but beyond that most is left to the implementation, including locale behavior and the names of the locales.
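To make the "collection of facets" idea concrete, here is a minimal standard-library-only sketch. The names CommaDecimalPunct, make_comma_decimal_locale, decimal_point_of, and format_number are made up for illustration; the facet simply mimics German-style punctuation and does not depend on any locale being installed on the platform.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: numeric punctuation with ',' as the decimal point
// and '.' as the thousands separator, grouped in threes.
class CommaDecimalPunct : public std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
    char do_thousands_sep() const override { return '.'; }
    std::string do_grouping() const override { return "\3"; }
};

// Build a locale that is the classic "C" locale with one facet replaced.
// The locale takes ownership of the facet and deletes it when done.
inline std::locale make_comma_decimal_locale() {
    std::locale loc(std::locale::classic(), new CommaDecimalPunct);
    assert(std::has_facet<std::numpunct<char>>(loc));
    return loc;
}

// Look up the numpunct facet stored in a locale and ask it for the decimal point.
inline char decimal_point_of(const std::locale& loc) {
    return std::use_facet<std::numpunct<char>>(loc).decimal_point();
}

// Format a number through a stream imbued with the given locale.
inline std::string format_number(const std::locale& loc, double value) {
    std::ostringstream out;
    out.imbue(loc);
    out << value;
    return out.str();
}
```

With these helpers, format_number(std::locale::classic(), 1234.5) yields "1234.5", while the same call with make_comma_decimal_locale() yields "1.234,5": only the numpunct facet differs between the two locales.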
What boost::locale does is provide a bunch of facets, collected into locales, that behave the same way regardless of platform (at least if you are using the default ICU backend).
So boost::locale provides a standardized set of std::locale objects that behave consistently across platforms, offer full Unicode support for a wide range of cultural norms, and use consistent naming. Switching between a non-Boost std::locale (i.e. an implementation-provided locale) and one generated by boost::locale is trivial since they are the same type: both are collections of std::locale::facet objects, although the implementations differ. Chances are the boost::locale ones do a better job of doing what you want.
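Because both kinds of locale are plain std::locale objects, code written against std::locale does not need to know where a locale came from. A small sketch of that idea, using collation: sorted_with is a hypothetical helper, and it relies on std::locale::operator(), which compares two strings through the locale's collate facet. Only the classic locale is exercised here; passing a Boost-generated locale should in principle work the same way, but that is an assumption this sketch does not test.

```cpp
#include <algorithm>
#include <cassert>
#include <locale>
#include <string>
#include <vector>

// Sort a list of words using whatever collation rules the given locale carries.
// The locale may be implementation-provided or generated elsewhere; this
// function only sees the std::locale interface.
inline std::vector<std::string> sorted_with(const std::locale& loc,
                                            std::vector<std::string> words) {
    // std::locale::operator() needs a collate<char> facet to be present.
    assert(std::has_facet<std::collate<char>>(loc));
    std::sort(words.begin(), words.end(), loc);
    return words;
}
```

With std::locale::classic() the comparison is a plain character-code comparison, so uppercase ASCII letters sort before lowercase ones. A locale with real linguistic collation would be expected to order the same words differently.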
Complete Unicode support, for all encodings, on all platforms
Further, boost::locale provides access to complete Unicode support through ICU, which gives you the benefits of ICU without its poor (not very C++-like) interface.
This is advantageous, since any standard support for Unicode is very likely to come through the locale framework, and any Unicode-aware program is likely going to need to be locale-aware as well (for collation, for example).
Saner behavior regarding numbers
Finally, boost::locale addresses what could legitimately be called a significant flaw in the usual implementations of std::locale: any stream-formatted number will be affected by the locale, regardless of whether this is desirable. See the Boost documentation for a detailed discussion.
So if you are using an fstream to read or write a file, and you have set the global locale to your platform's German locale, you'll have commas as the decimal separator in your floats. If you're reading or writing a CSV file, that might be a problem. If you use a boost::locale as your global locale, this will only happen if you explicitly tell it to use locale conventions for your numeric input/output. Note that many libraries use locale info in the background, including boost::lexical_cast. So does std::to_string, for that matter. So consider the following example:
#include <iostream>
#include <locale>
#include <string>
#include <boost/locale.hpp>

int main()
{
    auto demo = [](const std::string& label)
    {
        std::cout.imbue(std::locale()); // imbue cout with the current global locale
        float f = 1234.567890f;
        std::cout << label << "\n";
        std::cout << "\t streamed: " << f << "\n";
        std::cout << "\t to_string: " << std::to_string(f) << "\n";
    };

    std::locale::global(std::locale("C")); // the default "C" locale
    demo("c locale");

    std::locale::global(std::locale("de_DE")); // platform-provided German locale (the name is platform-specific)
    demo("std de locale");

    boost::locale::generator gen;
    std::locale::global(gen("de_DE.UTF-8")); // German locale generated by Boost.Locale
    demo("boost de locale");
}
Gives the following output:
c locale
streamed: 1234.57
to_string: 1234.567871
std de locale
streamed: 1.234,57
to_string: 1234,567871
boost de locale
streamed: 1234.57
to_string: 1234,567871
In code that implements both human communication (output to a GUI or terminal) and inter-machine communication (CSV files, XML, etc.), this is likely undesirable behavior. When using a Boost locale, you explicitly specify when you want locale formatting, like so:
std::cout << boost::locale::as::currency << 123.45 << "\n";
std::cout << boost::locale::as::number << 12345.666 << "\n";
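For the machine-readable side, a complementary approach that needs only the standard library is to pin the stream to the classic "C" locale, so the output does not change with the global locale. This is a sketch rather than anything taken from the Boost documentation; to_csv_row is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Write numbers as one CSV row, always using '.' as the decimal point
// and no thousands separators, whatever the global locale happens to be.
inline std::string to_csv_row(const std::vector<double>& values) {
    std::ostringstream out;
    out.imbue(std::locale::classic()); // override the locale copied from the global one
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i != 0) out << ',';
        out << values[i];
    }
    assert(out.good()); // writing to a string stream is not expected to fail
    return out.str();
}
```

The same imbue call works on an std::ofstream before writing to a file. It only affects stream formatting; functions such as std::to_string follow the C library locale and are not covered by it.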
Conclusion
It would seem that boost::locale should be preferred over the system-provided locales.
Boost.Locale is based on the std::locale framework but provides many more options, in a more linguistically correct way.
Also, if you want to use UTF-8 on Windows/MSVC, std::locale is a no-go.