
What are the tradeoffs between boost::locale and std::locale?

I am in the process of internationalizing a large legacy codebase in C++, and I am faced with a difficult decision: should I use boost::locale or the standard C++ std::locale?

I am committed to using UTF-8. We have to do a reasonably broad range of text processing; it is not the core of what our code does, but it is important. We can expect to do most of what one might need to do: time, date, number, and money formatting, collation, regexp, substring isolation, interaction with boost::filesystem, DB access, etc.

From the introduction to the boost::locale documentation, I gather that:

  1. Setting the global locale has side effects (csv example). It affects printf and boost::lexical_cast. Some third-party libraries can break.
  2. Number formatting is broken in some locales.
  3. Locale names are not standardized.
  4. Many vendors only provide the C and POSIX locales, so GCC supports localization only under Linux.

I have trouble evaluating the impact of point 1. I guess point 2 is pretty severe if it affects us, and points 3 and 4 won't be a big deal for us.

Is there a consensus in the community that boost::locale is the better alternative? Is there any motion in the standards committee to address the issues with std::locale? Can anyone help me make a more informed decision?

Perhaps most importantly, is it simple to migrate from one to the other? How well do the two play with one another? Is it legitimate to set the global locale with a boost locale, and then use std facilities?

Spacemoose asked Aug 06 '15 15:08


2 Answers

In the end, the boost documentation does a good job of answering my question, but you have to do some reading, and it helps to understand std::locale better than I did at the time of posting.

Plays nicely with the standard library

A std::locale is a collection of facets. The standard defines a set of facets which each locale must provide, but other than that it seems most is left to the implementation. This includes locale behavior, and the names of the locales.

What boost::locale does is provide a bunch of facets, collected into locales, that behave the same way regardless of platform (at least if you are using the default ICU backend).

So boost::locale provides a standardized set of std::locale objects that behave consistently across platforms, offer full Unicode support for a wide range of cultural norms, and use consistent naming. Switching between a non-Boost std::locale (i.e. an implementation-provided locale) and a boost::locale-generated one is trivial since they are the same type: both are collections of facets (classes derived from std::locale::facet), although the facet implementations differ. Chances are the boost::locale ones do a better job of doing what you want.

Complete Unicode support, for all encodings, on all platforms

Further, boost::locale provides a way of accessing complete Unicode support through ICU, which allows you to gain the benefits of ICU without ICU's poor (not C++-ish) interface.

This is advantageous, since any standard support of Unicode is very likely to come through the locale framework, and any Unicode-aware program is likely going to need to be locale-aware as well (for collation, for example).

Saner behavior regarding numbers

Finally, boost::locale addresses what could legitimately be called a significant flaw in the usual implementations of std::locale: any stream-formatted number will be affected by the locale, regardless of whether this is desirable. See the Boost documentation for a detailed discussion.

So if you are using an fstream to read or write a file, and you have set the global locale to your platform's German locale, you'll have commas as the decimal separator in your floats. If you're reading/writing a csv file, that might be a problem. If you used a boost::locale as your global locale, this will only happen if you explicitly tell it to use locale conventions for your numeric input/output. Note that many libraries use locale info in the background, including boost::lexical_cast. So does std::to_string, for that matter. So consider the following example:

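#include <boost/locale.hpp>
#include <iostream>
#include <locale>
#include <string>

int main()
{
    // Print one float via operator<< and via std::to_string
    // under whatever the global locale currently is.
    auto demo = [](const std::string& label)
    {
        std::cout.imbue(std::locale()); // imbue cout with the current global locale
        float f = 1234.567890f;
        std::cout << label << "\n";
        std::cout << "\t streamed:  " << f << "\n";
        std::cout << "\t to_string: " << std::to_string(f) << "\n";
    };

    std::locale::global(std::locale("C")); // the default "C" locale
    demo("c locale");

    std::locale::global(std::locale("de_DE")); // the platform's German locale (name is platform-dependent)
    demo("std de locale");

    boost::locale::generator gen;
    std::locale::global(gen("de_DE.UTF-8")); // a German locale generated by Boost.Locale
    demo("boost de locale");
}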

Gives the following output:

c locale
     streamed:  1234.57
     to_string: 1234.567871
std de locale
     streamed:  1.234,57
     to_string: 1234,567871
boost de locale
     streamed:  1234.57
     to_string: 1234,567871

In code that implements both human communication (output to a GUI or terminal) and inter-machine communication (csv files, XML, etc.), this is likely undesirable behavior. When using a boost locale, you explicitly specify when you want locale formatting, as in:

std::cout << boost::locale::as::currency << 123.45 << "\n";
std::cout << boost::locale::as::number << 12345.666 << "\n";

Conclusion

It would seem that the locales provided by boost::locale should be preferred over the system-provided locales.

Spacemoose answered Oct 11 '22 12:10



Boost.Locale is built on the std::locale framework but provides many more options, in a more linguistically correct way.

Also, if you want to use UTF-8 on Windows/MSVC, std::locale is a no-go.

Artyom answered Oct 11 '22 14:10
