Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C++20 with u8, char8_t and std::string

C++11 brought us the u8 prefix for UTF-8 literals and I thought that was pretty cool a few years ago and peppered my code with things like this:

std::string myString = u8"●"; 

This is all fine and good, but the issue comes up in C++20 it doesn't seem to compile anymore because u8 creates a char8_t* and this is incompatible with std::string which just uses char.

Should I be creating a new utf8string? What's the consistent and correct way to do this kind of thing in a C++20 world where we have more explicit types that don't really match with the standard std::string?

like image 202
M2tM Avatar asked Jul 01 '19 09:07

M2tM


People also ask

Can you use std::string in C?

The std::string class manages the underlying storage for you, storing your strings in a contiguous manner. You can get access to this underlying buffer using the c_str() member function, which will return a pointer to null-terminated char array. This allows std::string to interoperate with C-string APIs.

What does std::string () do?

std::string class in C++ C++ has in its definition a way to represent a sequence of characters as an object of the class. This class is called std:: string. String class stores the characters as a sequence of bytes with the functionality of allowing access to the single-byte character.

What is the difference between a C++ string and an std::string?

There is no functionality difference between string and std::string because they're the same type.

Is std::string utf8?

Both std::string and std::wstring must use UTF encoding to represent Unicode. On macOS specifically, std::string is UTF-8 (8-bit code units), and std::wstring is UTF-32 (32-bit code units); note that the size of wchar_t is platform-dependent.


1 Answers

In addition to @lubgr's answer, the paper char8_t backward compatibility remediation (P1423) discusses several ways how to make std::string with char8_t character arrays.

Basically the idea is that you can cast the u8 char array into a "normal" char array to get the same behaviour as C++17 and before, you just have to be a bit more explicit. The paper discusses various ways to do this.

The most simple (but not fully zero overhead, unless you add more overloads) method that fits your usecase is probably the last one, i.e. introduce explicit conversion functions:

std::string from_u8string(const std::string &s) {   return s; } std::string from_u8string(std::string &&s) {   return std::move(s); } #if defined(__cpp_lib_char8_t) std::string from_u8string(const std::u8string &s) {   return std::string(s.begin(), s.end()); } #endif 
like image 151
Fabio Fracassi Avatar answered Oct 04 '22 17:10

Fabio Fracassi