I am trying to convert a C++ std::string to UTF-8 or std::wstring without losing information (consider a string that contains non-ASCII characters).
According to http://forums.sun.com/thread.jspa?threadID=486770&forumID=31:
If the std::string has non-ASCII characters, you must provide a function that converts from your encoding to UTF-8 [...]
What encoding does std::string.c_str() use? How can I convert it to UTF-8 or std::wstring in a cross-platform fashion?
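std::string doesn't have the concept of encodings. It just stores whatever bytes are passed to it. The same applies to a literal: what cout << 'è' writes depends on the encoding your compiler and source file used for it.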
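To represent Unicode, both std::string and std::wstring have to hold one of the UTF encodings. On macOS specifically, std::string conventionally holds UTF-8 (8-bit code units) and std::wstring holds UTF-32 (32-bit code units). Note that the size of wchar_t is platform-dependent: it is 32 bits on macOS and Linux but 16 bits on Windows, where std::wstring typically holds UTF-16.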
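Because wchar_t differs between platforms, one portable option is to decode into std::u32string (C++11), whose code units are always 32 bits wide. The following is a minimal sketch, not a full validator: utf8_to_utf32 is a hypothetical helper name, it assumes the input really is UTF-8, and it replaces malformed or truncated sequences with U+FFFD without rejecting overlong forms or surrogates.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes UTF-8 bytes into UTF-32 code points.
// Assumption: the caller knows the input is UTF-8. Malformed or truncated
// sequences become U+FFFD; overlong forms and surrogates are NOT rejected.
std::u32string utf8_to_utf32(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }  // invalid lead byte

        if (i + extra >= in.size()) {  // sequence runs past the end
            out.push_back(0xFFFD);
            break;
        }
        bool ok = true;
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (ok) { out.push_back(cp); i += extra + 1; }
        else    { out.push_back(0xFFFD); ++i; }
    }
    return out;
}
```

On platforms where wchar_t is 32 bits, each resulting code point can be copied into a std::wstring directly; where it is 16 bits, code points above U+FFFF would additionally need to be split into UTF-16 surrogate pairs.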
c_str() does not convert anything: it returns a pointer to a null-terminated array of char holding the same bytes the string stores, i.e. a C-style string.
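To illustrate that c_str() exposes the stored bytes unchanged, here is a small sketch. The function names are made up for this example, and "è" is spelled out as its two UTF-8 bytes so the result does not depend on how the source file is encoded.

```cpp
#include <cstring>
#include <string>

// Illustrative helper: true when the array returned by c_str() holds exactly
// the bytes stored in the string, followed by a terminating '\0'.
bool c_str_matches_bytes(const std::string& s) {
    const char* p = s.c_str();
    return std::memcmp(p, s.data(), s.size()) == 0 && p[s.size()] == '\0';
}

// Illustrative helper: "è" as its two UTF-8 bytes (0xC3 0xA8).
std::string utf8_e_grave() {
    return std::string("\xC3\xA8");
}
```

Note that size() counts bytes, not characters, so the one-character string above reports a size of 2.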
In short, std::string can hold text in ASCII, EBCDIC, or any other encoding. The string itself treats the bytes transparently; what they mean depends on how you use them.
std::string per se uses no encoding: it will return the bytes you put in it. For example, those bytes might be using ISO-8859-1 encoding... or any other, really. The information about the encoding is just not there, so you have to know where the bytes came from!
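If you do know the source encoding, the conversion itself can be simple. As a sketch, assuming the bytes are ISO-8859-1 (where every byte value equals its Unicode code point), the hypothetical helper below re-encodes them as UTF-8. Other encodings need their own mapping tables or a library such as iconv or ICU.

```cpp
#include <string>

// Hypothetical helper. Assumption: `latin1` holds ISO-8859-1 bytes, so each
// byte value is also the Unicode code point (U+0000..U+00FF).
std::string latin1_to_utf8(const std::string& latin1) {
    std::string out;
    out.reserve(latin1.size() * 2);  // worst case: every byte becomes two
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            // ASCII range: identical in UTF-8.
            out.push_back(static_cast<char>(c));
        } else {
            // U+0080..U+00FF: two-byte UTF-8 sequence 110xxxxx 10xxxxxx.
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

For instance, the single ISO-8859-1 byte 0xE8 ("è") becomes the two bytes 0xC3 0xA8.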
std::string can contain any sequence of bytes, so the encoding is up to you. You must know how it is encoded. However, if you have no reason to think it is something else, it is probably just ASCII, in which case it is already valid UTF-8.
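Rather than guessing, the ASCII case can be checked directly. A minimal sketch (is_ascii is a name chosen for this example): if every byte is below 0x80, the string is pure ASCII and can be treated as UTF-8 as is.

```cpp
#include <algorithm>
#include <string>

// Illustrative helper: true when every byte is in the 7-bit ASCII range,
// in which case the string is also valid UTF-8 without any conversion.
bool is_ascii(const std::string& s) {
    return std::all_of(s.begin(), s.end(),
                       [](unsigned char c) { return c < 0x80; });
}
```

A false result only tells you the string is not plain ASCII; it does not tell you which encoding it actually uses.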