Confused about C++'s std::wstring, UTF-16, UTF-8 and displaying strings in a Windows GUI

Q: What is a UTF-8 string?

UTF-8 is an encoding for Unicode. It maps every Unicode character to a unique sequence of one to four bytes, and maps that byte sequence back to the same character. "UTF" stands for "Unicode Transformation Format."

Tags: c++, unicode, utf-8, wstring, utf-16

I'm working on an English-only C++ program for Windows where we were told "always use std::wstring", but it seems like nobody on the team really understands much beyond that.

I already read the question titled "std::wstring VS std::string". It was very helpful, but I still don't quite understand how to apply all of that information to my problem.

The program I'm working on displays data in a Windows GUI. That data is persisted as XML. We often transform that XML using XSLT into HTML or XSL-FO for reporting purposes.

My feeling based on what I have read is that the HTML should be encoded as UTF-8. I know very little about GUI development, but the little bit I have read indicates that the GUI stuff is all based on UTF-16 encoded strings.

I'm trying to understand where this leaves me. Say we decide that all of our persisted data should be UTF-8 encoded XML. Does this mean that in order to display persisted data in a UI component, I should really be performing some sort of explicit UTF-8 to UTF-16 transcoding process?

I suspect my explanation could use clarification, so I'll try to provide that if you have any questions.


asked Mar 27 '10 00:03

Dave

1 Answer

Windows from NT4 onwards is based on Unicode-encoded strings, yes. Early versions were based on UCS-2, the predecessor of UTF-16, which cannot represent characters outside the Basic Multilingual Plane the way UTF-16 does. Later versions (Windows 2000 onwards) are based on UTF-16. Not all OSes are based on UTF-16/UCS-2, though. *nix systems, for instance, typically use UTF-8 instead.
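To make the UCS-2 versus UTF-16 difference concrete, here is a minimal sketch. The helper name `encode_utf16` is made up for illustration and is not part of any Windows or standard library API. It shows that a code point up to U+FFFF fits in one 16-bit unit, which is all UCS-2 can express, while anything above that needs a UTF-16 surrogate pair.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not a library function): encode a single Unicode
// code point as one or two UTF-16 code units.
std::u16string encode_utf16(char32_t cp) {
    // Surrogate values and anything above U+10FFFF are not valid characters.
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        throw std::invalid_argument("not a Unicode scalar value");
    }
    std::u16string out;
    if (cp < 0x10000) {
        // Basic Multilingual Plane: one 16-bit unit, same as UCS-2.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary planes: split the 20-bit offset into a surrogate pair.
        const char32_t v = cp - 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
    }
    return out;
}
```

For an English-only program nearly everything stays in the one-unit case, but code that assumes "one wchar_t is one character" will still mishandle something like an emoji pasted into a text field.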

UTF-8 is a very good choice for storing data persistently. It is supported in essentially every Unicode environment, and it strikes a good balance between data size and lossless data compatibility.

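Yes, you would have to parse the XML, extract the necessary information from it, and decode and transform it into something the UI can use. On Windows that means the UTF-8 text ends up converted to UTF-16 (for example in a std::wstring) before it is handed to the UI.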
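As a rough illustration of what that transcoding step involves, here is a portable sketch that decodes UTF-8 bytes and re-encodes them as UTF-16. The function name `utf8_to_utf16` is an assumption for this example, and it returns a `std::u16string` so that it compiles on any platform. In a real Windows program you would more likely call the Win32 function `MultiByteToWideChar` with `CP_UTF8` to fill a `std::wstring`, or let your XML parser hand you wide strings directly.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert UTF-8 encoded bytes to UTF-16 code units.
// Throws std::invalid_argument on malformed input.
std::u16string utf8_to_utf16(const std::string& in) {
    // Smallest code point that may legally use a sequence of each length.
    static const char32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};

    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;

        // The lead byte says how many bytes the sequence has.
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size()) {
            throw std::invalid_argument("truncated UTF-8 sequence");
        }

        // Each continuation byte (10xxxxxx) contributes six more bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) {
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (b & 0x3F);
        }

        // Reject overlong forms, surrogates, and out-of-range values.
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw std::invalid_argument("invalid code point in UTF-8 input");
        }

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            const char32_t v = cp - 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

Calling `utf8_to_utf16("abc")` yields three code units, while the three-byte UTF-8 form of the euro sign yields a single one. Going the other way (UTF-16 to UTF-8) is needed when the UI writes data back to the persisted XML.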

answered Oct 23 '22 11:10

Remy Lebeau
