
What's the difference between std::wstring_convert and std::wbuffer_convert?

There are two convenience interfaces declared in the header <locale>: std::wstring_convert and std::wbuffer_convert. However, I could not find any usage examples.

Are there any concise examples to illustrate their usages and differences?

xmllmx asked Feb 11 '13 05:02



1 Answer

std::wstring_convert

Given an std::u32string (a.k.a. std::basic_string<char32_t>) that holds UTF-32 code units in the form of char32_t elements, here's how to convert it to a sequence of UTF-8 code units in the form of bytes:

// Both <locale> and <codecvt> required

std::u32string input = U"Hello, World";

using Codecvt = std::codecvt_utf8<char32_t>;
std::wstring_convert<Codecvt, char32_t> converter;

// throws std::range_error if the conversion fails
std::string result = converter.to_bytes(input);

Take note of a quirk of std::wstring_convert: it always converts between what the Standard calls a wide string (which is in fact any specialization of std::basic_string, including std::string) and a byte string, which is a specialization of the form std::basic_string<char, std::char_traits<char>, Allocator>. to_bytes goes from wide string to byte string, and from_bytes goes the other way.
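To illustrate both directions, here is a minimal sketch that reuses the facet from the snippet above. The names Utf8Utf32, to_utf8, to_utf32 and check_round_trip are made up for this example. Both std::wstring_convert and <codecvt> are deprecated since C++17, so a recent compiler may emit warnings.

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <string>

// One converter type covers both directions for a given facet.
// (Utf8Utf32 is an illustrative alias, not a standard name.)
using Utf8Utf32 = std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t>;

// Wide (UTF-32) string -> byte (UTF-8) string.
// Throws std::range_error if the conversion fails.
std::string to_utf8(const std::u32string& wide) {
    Utf8Utf32 converter;
    return converter.to_bytes(wide);
}

// Byte (UTF-8) string -> wide (UTF-32) string.
// Throws std::range_error if the conversion fails.
std::u32string to_utf32(const std::string& bytes) {
    Utf8Utf32 converter;
    return converter.from_bytes(bytes);
}

// U+00E9 is one UTF-32 code unit but two UTF-8 bytes,
// so "caf" plus that character is 5 bytes long.
void check_round_trip() {
    const std::u32string wide = U"caf\u00e9";
    const std::string bytes = to_utf8(wide);
    assert(bytes.size() == 5);
    assert(to_utf32(bytes) == wide);
}
```

Calling check_round_trip() from main exercises both functions. A fresh converter is created per call here only to keep the helpers stateless.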

What the source and target encodings will be depends on which code conversion facet is used. Here I am using one of the stock facets that come from <codecvt>. Any code conversion facet will do as long as it is Destructible, which is not the case for e.g. std::codecvt<wchar_t, char, std::mbstate_t>, because the locale-managed facets have a protected destructor.
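If you do want to use one of the facets with a protected destructor, a common workaround is to derive from it and give the derived class a public destructor. The sketch below assumes the std::codecvt<char32_t, char, std::mbstate_t> specialization (UTF-32 to UTF-8, itself deprecated since C++20); deletable_facet, Utf32Facet and the two functions are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cwchar>   // std::mbstate_t
#include <locale>
#include <string>
#include <utility>

// Hypothetical helper: forwards constructor arguments to the facet and
// exposes a public destructor so std::wstring_convert can delete it.
template <class Facet>
struct deletable_facet : Facet {
    template <class... Args>
    deletable_facet(Args&&... args) : Facet(std::forward<Args>(args)...) {}
    ~deletable_facet() {}
};

// Assumption: the char32_t/char specialization converts UTF-32 <-> UTF-8.
using Utf32Facet = deletable_facet<std::codecvt<char32_t, char, std::mbstate_t>>;

// Same job as the <codecvt>-based converter, but built on a locale facet.
std::string to_utf8_with_locale_facet(const std::u32string& wide) {
    std::wstring_convert<Utf32Facet, char32_t> converter;
    return converter.to_bytes(wide);
}

// Plain ASCII maps to identical bytes in UTF-8.
void check_locale_facet() {
    assert(to_utf8_with_locale_facet(U"Hello") == "Hello");
}
```

Without the wrapper, instantiating std::wstring_convert with the bare std::codecvt specialization fails to compile, because the converter cannot call the protected destructor.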

std::wbuffer_convert

Here's a hopefully compelling use case: you have an object out, an instance of std::ostream (a.k.a. std::basic_ostream<char>), that expects UTF-8 encoded text. So for instance out << u8"Hello" should work just fine. As it happens, though, you also have a lot of UTF-32 encoded strings (the best candidate for that job would be std::u32string) coming from elsewhere in your program, and you need to pass them to out. You could call std::wstring_convert::to_bytes on each of them, but that gets old quickly.

Here's another way:

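// std::wbuffer_convert is a stream buffer, not a stream, so wrap it in one
std::wbuffer_convert<std::codecvt_utf8<char32_t>, char32_t> buf { out.rdbuf() };
std::basic_ostream<char32_t> wout { &buf };
std::u32string input = U"Hello";
wout << input;
wout.flush(); // pushes the converted bytes through to out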

That is, we can get a view of out that behaves as if it were an instance of std::basic_ostream<char32_t> expecting UTF-32 encoded text, and we didn't have to alter any locales (that last bit being a big reason those convenience interfaces exist in the first place).
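For a self-contained version of the same idea, here is a minimal sketch that writes through the converting buffer directly with sputn and pubsync. That is a deliberate variation: it avoids relying on std::basic_ostream<char32_t>, whose support may be incomplete in some standard libraries. The function names write_utf32 and check_write_utf32 are made up for this example.

```cpp
#include <cassert>
#include <codecvt>
#include <ios>
#include <locale>
#include <ostream>
#include <sstream>
#include <string>

// Writes UTF-32 text to a byte stream that expects UTF-8.
void write_utf32(std::ostream& out, const std::u32string& text) {
    // The converting buffer sits in front of out's own buffer; it does not own it.
    std::wbuffer_convert<std::codecvt_utf8<char32_t>, char32_t> buf { out.rdbuf() };
    // Feed the UTF-32 code units through the stream buffer interface.
    buf.sputn(text.data(), static_cast<std::streamsize>(text.size()));
    // Convert anything still pending and hand it to the underlying buffer.
    buf.pubsync();
}

// The bytes written through the converter land in the same stream as
// ordinary narrow output; U+00E9 becomes the two bytes C3 A9.
void check_write_utf32() {
    std::ostringstream out;
    out << "greeting: ";
    write_utf32(out, U"h\u00e9llo");
    assert(out.str() == "greeting: h\xC3\xA9llo");
}
```

The explicit pubsync matters: the converter buffers wide characters internally, so text may not reach out until the buffer is synchronized.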

I'd like to think that std::wbuffer_convert is complementary to std::wstring_convert rather than a competitor.

As a disclaimer, because I haven't laid my hands on an implementation that supports either of those features or <codecvt>, the code here is completely untested :(.

Luc Danton answered Oct 15 '22 00:10
