Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there an STL string class that properly handles Unicode?

I know all about std::string and std::wstring but they don't seem to fully pay attention to extended character encoding of UTF-8 and UTF-16 (On windows at least). There is also no support for UTF-32.

So does anyone know of cross-platform drop-in replacement classes that provide full UTF-8, UTF-16 and UTF-32 support?

like image 986
Goz Avatar asked Feb 01 '11 11:02

Goz


People also ask

Can std::string hold Unicode?

@MSalters: std::string can hold 100% of all Unicode characters, even if CHAR_BIT is 8. It depends on the encoding of std::string, which may be UTF-8 on the system level (like almost everywhere except for windows) or on your application level.

Does std::string support UTF-8?

std::string doesn't "use" any encoding, neither UTF-8 nor EBCDIC. std::string is just a container for bytes of types char . You can put UTF-8 strings in there, or ASCII strings, or EBCDIC strings, or even binary data.

Does C++ string support Unicode?

C++ provides a wide-character type, wchar_t , which can store Unicode strings. The exact implementation of wchar_t is implementation defined, but it is often UTF-32. The class wstring , defined in <string> , is a sequence of wchar_t s, just like the string class is a sequence of char s.

Should I use Wstring or string?

These are the two classes that you will actually use. std::string is used for standard ascii and utf-8 strings. std::wstring is used for wide-character/unicode (utf-16) strings. There is no built-in class for utf-32 strings (though you should be able to extend your own from basic_string if you need one).


1 Answers

And let's not forget the lightweight, very user-friendly, header-only UTF-8 library UTF8-CPP. Not a drop-in replacement, but can easily be used in conjunction with std::string and has no external dependencies.

like image 173
Jon Purdy Avatar answered Sep 20 '22 17:09

Jon Purdy