 

What encoding does std::string.c_str() use?

Tags: c++, string, utf-8

I am trying to convert a C++ std::string to UTF-8 or std::wstring without losing information (consider a string that contains non-ASCII characters).

According to http://forums.sun.com/thread.jspa?threadID=486770&forumID=31:

If the std::string has non-ASCII characters, you must provide a function that converts from your encoding to UTF-8 [...]

What encoding does std::string.c_str() use? How can I convert it to UTF-8 or std::wstring in a cross-platform fashion?

asked Jun 18 '09 04:06 by Gili


People also ask

What is the encoding of std::string?

std::string has no concept of encodings. It just stores whatever bytes are passed to it.
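A minimal sketch of what "stores whatever is passed to it" means in practice. The helper `raw_bytes` and the two constants are hypothetical names chosen for illustration. The same character (e with grave accent) is stored as two bytes when the caller supplies UTF-8 and as one byte when the caller supplies ISO-8859-1, and std::string records nothing about which one it received.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the raw byte values held by a std::string,
// widened through unsigned char so bytes >= 0x80 are not shown as negative.
std::vector<unsigned> raw_bytes(const std::string& s) {
    std::vector<unsigned> out;
    for (char c : s) {
        out.push_back(static_cast<unsigned char>(c));
    }
    return out;
}

// U+00E8 written as UTF-8 (2 bytes) and as ISO-8859-1 (1 byte).
// std::string keeps exactly the bytes it is given in both cases.
const std::string utf8_e_grave = "\xC3\xA8";
const std::string latin1_e_grave = "\xE8";
```

Both strings are ordinary std::string objects; only the code that created them knows how to interpret their bytes.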

Is std::string utf8?

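Neither std::string nor std::wstring mandates an encoding; to represent Unicode in them, you store text in one of the UTF encodings. On macOS (and Linux), std::string conventionally holds UTF-8 (8-bit code units) and std::wstring holds UTF-32 (32-bit code units), because wchar_t is 32 bits there. On Windows, wchar_t is 16 bits and std::wstring conventionally holds UTF-16. In general, the size of wchar_t is platform-dependent.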
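Because wchar_t differs between platforms, a portable conversion from a UTF-8 std::string to std::wstring has to pick UTF-32 or UTF-16 depending on `sizeof(wchar_t)`. The sketch below assumes the input is already UTF-8 (std::string cannot tell you that) and uses hypothetical helper names `utf8_to_utf32` and `utf32_to_wstring`. It is hand-written rather than built on `std::wstring_convert`, which has been deprecated since C++17.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decodes UTF-8 bytes into UTF-32 code points. Assumes UTF-8 input; throws
// std::runtime_error on malformed sequences, overlong forms, surrogates,
// and values above U+10FFFF.
std::u32string utf8_to_utf32(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (lead < 0x80)                      { cp = lead;        len = 1; }
        else if (lead >= 0xC2 && lead < 0xE0) { cp = lead & 0x1F; len = 2; }
        else if (lead >= 0xE0 && lead < 0xF0) { cp = lead & 0x0F; len = 3; }
        else if (lead >= 0xF0 && lead < 0xF5) { cp = lead & 0x07; len = 4; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (i + len > in.size()) throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont & 0xC0) != 0x80) throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);  // append 6 payload bits
        }
        // Reject overlong 3/4-byte forms, UTF-16 surrogates, out-of-range values.
        if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            throw std::runtime_error("invalid UTF-8 code point");
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Stores code points in std::wstring: one wchar_t per code point where
// wchar_t is 32 bits (UTF-32), surrogate pairs where it is 16 bits (UTF-16).
std::wstring utf32_to_wstring(const std::u32string& in) {
    std::wstring out;
    for (char32_t cp : in) {
        if (sizeof(wchar_t) >= 4 || cp < 0x10000) {
            out.push_back(static_cast<wchar_t>(cp));
        } else {
            char32_t v = cp - 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (v >> 10)));   // high surrogate
            out.push_back(static_cast<wchar_t>(0xDC00 + (v & 0x3FF))); // low surrogate
        }
    }
    return out;
}
```

Usage: `utf32_to_wstring(utf8_to_utf32(s))` yields 4 wchar_t values for "A", U+00E8, U+20AC, U+1F600 on Linux/macOS, and 5 on Windows, where U+1F600 needs a surrogate pair.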

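What is the type of c_str()?

c_str() returns a const char* pointing to a null-terminated array of char with the same contents as the string, i.e. a C-style string. It performs no conversion: the bytes are exactly those stored in the std::string.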
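A short sketch showing that c_str() exposes the string's bytes unchanged plus a terminating '\0', and why embedded null bytes matter for C-style consumers. The helper names `c_str_matches_bytes` and `c_length` are hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// True if c_str() yields the same bytes as the string, followed by '\0'.
// No encoding conversion is involved, so this holds for any content.
bool c_str_matches_bytes(const std::string& s) {
    const char* p = s.c_str();
    return std::memcmp(p, s.data(), s.size()) == 0 && p[s.size()] == '\0';
}

// A C-style consumer such as std::strlen stops at the first '\0', so it
// sees only part of a std::string that contains embedded null bytes.
std::size_t c_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```

For a UTF-8 string such as "caf\xC3\xA9", `c_length` returns 5 (bytes, not characters), while `std::string("ab\0cd", 5)` reports size 5 but a C length of 2.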

Is std::string ascii?

In short, std::string can hold ASCII-encoded text, EBCDIC, or any other encoding. The class itself is agnostic: which encoding applies depends entirely on how you produce and interpret the bytes.


2 Answers

std::string per se uses no encoding -- it will return the bytes you put in it. For example, those bytes might be using ISO-8859-1 encoding... or any other, really: the information about the encoding is just not there -- you have to know where the bytes were coming from!
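Once you do know where the bytes came from, conversion can be simple. As an illustration, assuming the caller knows the string holds ISO-8859-1, each byte equals the Unicode code point of the same value (U+0000 to U+00FF), so the conversion to UTF-8 is a direct bit split. The function name `latin1_to_utf8` is hypothetical.

```cpp
#include <string>

// Hypothetical helper: converts bytes the caller KNOWS are ISO-8859-1 into
// UTF-8. Bytes below 0x80 are copied unchanged; bytes 0x80..0xFF become
// two-byte UTF-8 sequences (110000xx 10xxxxxx).
std::string latin1_to_utf8(const std::string& latin1) {
    std::string out;
    out.reserve(latin1.size() * 2);
    for (char ch : latin1) {
        unsigned char b = static_cast<unsigned char>(ch);
        if (b < 0x80) {
            out.push_back(static_cast<char>(b));
        } else {
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));
            out.push_back(static_cast<char>(0x80 | (b & 0x3F)));
        }
    }
    return out;
}
```

Applied to bytes in any other encoding, this function silently produces wrong text, which is exactly why the source encoding has to be known.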

answered Sep 30 '22 21:09 by Alex Martelli


std::string contains any sequence of bytes, so the encoding is up to you, and you must know how it is encoded. However, if you have no reason to believe it is something else, it is probably plain ASCII, in which case it is already valid UTF-8, since ASCII is a subset of UTF-8.
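If you would rather check than assume, testing for pure ASCII is cheap. This sketch uses a hypothetical `is_ascii` helper: a string whose bytes are all below 0x80 is valid UTF-8 as-is, because UTF-8 encodes U+0000 to U+007F as the same single bytes. A string that fails the check may still be UTF-8, but then you need to know its actual encoding.

```cpp
#include <algorithm>
#include <string>

// True if every byte is 7-bit ASCII (0x00..0x7F); such a string is
// already valid UTF-8 and needs no conversion.
bool is_ascii(const std::string& s) {
    return std::all_of(s.begin(), s.end(), [](char c) {
        return static_cast<unsigned char>(c) < 0x80;
    });
}
```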

answered Sep 30 '22 20:09 by Naaff