
C++20 converting between string/u8string and string_view/u8string_view

Tags: c++, c++17, c++20
C++20 introduced char8_t and, with it, u8string, u8string_view, etc., mainly to support cleaner interfaces and a clearer distinction between the narrow execution character set and UTF-8.

One downside is that older code may no longer compile.

Say I have interfaces that work with UTF-8 encoded std::string / std::string_view (from C++17).

If I want to adapt the implementation to C++20 using std::u8string / std::u8string_view, but leave the interfaces on std::string for now, the easiest way to convert back and forth between string/string_view and u8string/u8string_view would be reinterpret_cast, for example:

#include <iostream>
#include <string>
#include <string_view>
#include <windows.h>
using namespace std;

int main()
{
    SetConsoleOutputCP(CP_UTF8);

    u8string u8s = u8"ä";
    // string s = u8"ä"; OK in C++17, NOK in C++20
    string s(reinterpret_cast<const char*>(u8s.c_str()));
    // or string s(u8s.cbegin(), u8s.cend());
    cout << s << endl;
    u8string u8s2(reinterpret_cast<const char8_t*>(s.c_str()));
    // or u8string u8s2(s.begin(), s.end())

    // string_view
    u8string_view u8sv = u8"ö"sv;
    string_view sv(reinterpret_cast<const char*>(u8sv.data()), u8sv.size());
    cout << sv << endl;
}

Do you see any problems with this approach, or do you have a better suggestion?

StPiere asked Jun 26 '26 22:06



1 Answer

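char8_t has the same size and alignment as char, and individual char8_t values convert implicitly to and from char. Pointers to the two types do not convert implicitly, which is why the c_str()-based version needs reinterpret_cast.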

Instead of casts and c_str(), just use the iterator constructor.

u8string u8s = u8"test";
string s(u8s.cbegin(), u8s.cend());
Filipp answered Jun 28 '26 21:06
