There are two convenience interfaces declared in the header <locale>: std::wstring_convert and std::wbuffer_convert. However, usage examples for them are absent.
Are there any concise examples that illustrate how they are used and how they differ?
std::wstring_convert
Given a std::u32string (a.k.a. std::basic_string<char32_t>) that holds UTF-32 code units in the form of char32_t elements, here's how to convert it to a sequence of UTF-8 code units in the form of bytes:
```cpp
#include <codecvt>  // std::codecvt_utf8
#include <locale>   // std::wstring_convert
#include <string>

std::u32string input = U"Hello, World";
using Codecvt = std::codecvt_utf8<char32_t>;
std::wstring_convert<Codecvt, char32_t> converter;
// throws std::range_error if the conversion fails
std::string result = converter.to_bytes(input);
```
Take note of a quirk of std::wstring_convert: it always converts what the Standard calls a wide string (which is in fact any specialization of std::basic_string, including std::string) to or from a byte string, which is a specialization of the form std::basic_string<char, std::char_traits<char>, Allocator>.
Which source and target encodings are involved depends on the code conversion facet used; here I am using one of the stock facets from <codecvt>. Any code conversion facet will do as long as it is Destructible, which is not the case for e.g. std::codecvt<wchar_t, char, std::mbstate_t>: it has a protected destructor.
std::wbuffer_convert
Here's a hopefully compelling use case: you have an object out, an instance of std::ostream (a.k.a. std::basic_ostream<char>), that expects UTF-8 encoded text. So, for instance, out << u8"Hello" should work just fine (up to C++17; since C++20, u8 literals have type const char8_t[] and inserting them into a std::ostream is ill-formed). As it happens, though, you have a lot of UTF-32 encoded wide strings (the best candidate for that job would be std::u32string) coming from somewhere else in your program, and you need to pass them to out. You could use std::wstring_convert repeatedly, but that gets old quickly.
Here's another way:
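```cpp
// out is the UTF-8 std::ostream from above; <locale>, <codecvt> and <string> required
std::wbuffer_convert<std::codecvt_utf8<char32_t>, char32_t> conv { out.rdbuf() };
std::basic_ostream<char32_t> wout { &conv };
std::u32string input = U"Hello";
wout << input;
wout.flush(); // converted bytes may stay buffered in conv until flushed
```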
That is, we get a view of out that behaves as if it were an instance of std::basic_ostream<char32_t> expecting UTF-32 encoded text (std::wbuffer_convert itself is a stream buffer, hence the wrapping stream), and we didn't alter any locales (that last bit being a big reason those convenience interfaces exist in the first place).
I'd like to think that std::wbuffer_convert is complementary to std::wstring_convert rather than a competitor.
As a disclaimer: because I haven't laid my hands on an implementation that supports either of those features or <codecvt>, the code here is completely untested :(.