
What is the difference between "UTF-16" and "std::wstring"?

Is there any difference between these two string storage formats?

hkBattousai asked Nov 22 '10 15:11



3 Answers

std::wstring is a container of wchar_t. The size of wchar_t is implementation-defined: Windows compilers tend to use a 16-bit type, Unix compilers a 32-bit type.
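To see which width a given toolchain uses, a minimal sketch like the following can help. The helper name `wchar_bits` is made up for illustration and is not a standard function.

```cpp
#include <climits>
#include <cstddef>

// Width of wchar_t in bits on the current compiler/platform.
// Hypothetical helper for illustration; not part of the standard library.
constexpr std::size_t wchar_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}
```

Printing `wchar_bits()` would typically show 16 with Windows compilers and 32 with most Unix compilers, but the result depends on the implementation.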

UTF-16 is a way of encoding sequences of Unicode code points in sequences of 16-bit integers.
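As a rough sketch of what that encoding does, the function below turns one code point into one or two 16-bit units. The name `encode_utf16` is an assumption made for this example, and the function only handles valid Unicode scalar values.

```cpp
#include <cassert>
#include <string>

// Encode one Unicode code point as UTF-16 code units.
// Hypothetical helper for illustration; not part of the standard library.
std::u16string encode_utf16(char32_t cp) {
    // Valid scalar values only: at most U+10FFFF and not in the surrogate range.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    std::u16string out;
    if (cp < 0x10000) {
        // Code points in the BMP fit in a single 16-bit unit.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Code points above the BMP are split into a surrogate pair.
        const char32_t v = cp - 0x10000;  // 20-bit value
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));    // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));  // low surrogate
    }
    return out;
}
```

For instance, `encode_utf16(U'A')` yields the single unit 0x0041, while `encode_utf16(0x1F600)` yields the pair 0xD83D, 0xDE00.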

With Visual Studio, wide character literals (e.g. L"Hello World") that contain no characters outside the BMP end up as UTF-16, but the two concepts are mostly unrelated. If you use characters outside the BMP, std::wstring will not translate surrogate pairs into Unicode code points for you, even if wchar_t is 16 bits.
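Since std::wstring::size() counts wchar_t elements rather than code points, counting code points is left to the caller. One possible sketch, assuming the string holds well-formed UTF-16 when wchar_t is 16 bits and UTF-32 when it is 32 bits (the name `count_code_points` is made up for this example):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count Unicode code points in a wide string.
// Hypothetical helper; assumes well-formed UTF-16 when wchar_t is 16 bits,
// and one code point per element otherwise (e.g. UTF-32).
std::size_t count_code_points(const std::wstring& s) {
    if (sizeof(wchar_t) != 2) {
        return s.size();  // one element per code point
    }
    std::size_t n = 0;
    bool prev_high = false;
    for (wchar_t c : s) {
        const unsigned long u = static_cast<unsigned long>(c);
        const bool is_low = u >= 0xDC00 && u <= 0xDFFF;
        // Well-formed input: a low surrogate always follows a high surrogate.
        assert(!is_low || prev_high);
        // A low surrogate completes a pair that was already counted.
        if (!is_low) ++n;
        prev_high = u >= 0xD800 && u <= 0xDBFF;
    }
    return n;
}
```

With a 16-bit wchar_t, a string holding one character outside the BMP has size() 2, while this function returns 1. With a 32-bit wchar_t both are 1.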

JoeG answered Oct 12 '22 22:10



UTF-16 is a specific Unicode encoding. std::wstring is a string implementation that uses wchar_t as its underlying type for storing each character. (In contrast, regular std::string uses char.)

The encoding used with wchar_t does not have to be UTF-16; it could also be UTF-32, for example.

ThiefMaster answered Oct 12 '22 23:10



UTF-16 represents text as a sequence of 16-bit elements, but a single textual character may consist of more than one element.

std::wstring is just a sequence of wchar_t elements; the class is primarily concerned with storing them.

The element type of a wstring, wchar_t, is typically 16 bits wide but may be 32 bits.

CashCow answered Oct 13 '22 00:10
