Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Confused about C++'s std::wstring, UTF-16, UTF-8 and displaying strings in a windows GUI

I'm working on a english only C++ program for Windows where we were told "always use std::wstring", but it seems like nobody on the team really has much of an understanding beyond that.

I already read the question titled "std::wstring VS std::string. It was very helpful, but I still don't quite understand how to apply all of that information to my problem.

The program I'm working on displays data in a Windows GUI. That data is persisted as XML. We often transform that XML using XSLT into HTML or XSL:FO for reporting purposes.

My feeling based on what I have read is that the HTML should be encoded as UTF-8. I know very little about GUI development, but the little bit I have read indicates that the GUI stuff is all based on UTF-16 encoded strings.

I'm trying to understand where this leaves me. Say we decide that all of our persisted data should be UTF-8 encoded XML. Does this mean that in order to display persisted data in a UI component, I should really be performing some sort of explicit UTF-8 to UTF-16 transcoding process?

I suspect my explanation could use clarification, so I'll try to provide that if you have any questions.

like image 480
Dave Avatar asked Mar 27 '10 00:03

Dave


People also ask

What is the difference between string and Wstring?

String overview std::string is used for standard ascii and utf-8 strings. std::wstring is used for wide-character/unicode (utf-16) strings. There is no built-in class for utf-32 strings (though you should be able to extend your own from basic_string if you need one).

Does std::string support UTF-8?

UTF-8 actually works quite well in std::string . Most operations work out of the box because the UTF-8 encoding is self-synchronizing and backward compatible with ASCII.

What is the encoding of std::string?

std::string doesn't have the concept of encodings. It just stores whatever is passed to it. cout <<'è';

What is string utf8?

UTF-8 is an encoding system for Unicode. It can translate any Unicode character to a matching unique binary string, and can also translate the binary string back to a Unicode character. This is the meaning of “UTF”, or “Unicode Transformation Format.”


1 Answers

Windows from NT4 onwards is based on Unicode encoded strings, yes. Early versions were based on UCS-2, which is the predecessor of UTF-16, and thus does not support all of the characters that UTF-16 does. Later versions are based on UTF-16. Not all OSes are based on UTF-16/UCS-2, though. *nix systems, for instance, are based on UTF-8 instead.

UTF-8 is a very good choice for storing data persistently. It is a universally supported encoding in all Unicode environments, and it is a good balance between data size and loss-less data compatibility.

Yes, you would have to parse the XML, extract the necessary information from it, and decode and transform it into something the UI can use.

like image 55
Remy Lebeau Avatar answered Oct 23 '22 11:10

Remy Lebeau