C++20 introduced char8_t and, with it, std::u8string, std::u8string_view, etc., mainly to support cleaner interfaces and to distinguish UTF-8 from the narrow execution character set.
One downside is that existing code may no longer compile.
Say I have interfaces that work with UTF-8 encoded std::string / std::string_view (from C++17).
If I want to adapt the implementation to C++20 using std::u8string / std::u8string_view but keep std::string in the interfaces for now, the easiest way to convert back and forth between string/string_view and u8string/u8string_view seems to be reinterpret_cast, for example:
#include <iostream>
#include <string>
#include <string_view>
#include <windows.h>
using namespace std;
int main()
{
    // Make the Windows console interpret the output bytes as UTF-8
    SetConsoleOutputCP(CP_UTF8);
    u8string u8s = u8"ä";
    // string s = u8"ä"; OK in C++17, ill-formed in C++20
    string s(reinterpret_cast<const char*>(u8s.c_str()));
    // or string s(u8s.cbegin(), u8s.cend());
    cout << s << endl;
    u8string u8s2(reinterpret_cast<const char8_t*>(s.c_str()));
    // or u8string u8s2(s.begin(), s.end())
    // string_view
    u8string_view u8sv = u8"ö"sv;
    string_view sv(reinterpret_cast<const char*>(u8sv.data()), u8sv.size());
    cout << sv << endl;
}
Do you see any problems with this approach, or do you have a better suggestion?
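char8_t has the same size and alignment as char (its underlying type is unsigned char), and char8_t values convert implicitly to char and back. Pointers to the two types do not convert implicitly, which is why the pointer-based constructors need reinterpret_cast.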
Instead of casts and c_str(), just use the iterator constructor.
u8string u8s = u8"test";
string s(u8s.cbegin(), u8s.cend());