My C++ project is currently about 16K lines of code, and I admit I completely failed to think about Unicode support at the start. All I did was make a custom typedef of std::string as String and jump into coding. I have never really worked with Unicode in programs I wrote.

How hard is it to switch my project to Unicode now? Is it even a good idea? Can I just switch to std::wchar without any major problems?
Probably the most important part of making an application Unicode-aware is to track the encoding of your strings and to make sure that your public interfaces are well specified and easy to use with the encodings that you wish to use.

Switching to a wider character type (in C++, wchar_t) is not necessarily the correct solution. In fact, I would say it is usually not the simplest solution. Some applications can get away with specifying that all strings and interfaces use UTF-8 and need not change at all: std::string can perfectly well hold UTF-8 encoded strings.

However, if you need to interpret the individual characters in a string, or to interface with non-UTF-8 APIs, you will have to put in more work. Without knowing more about your application, it is impossible to recommend a single best approach.
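As a minimal sketch of that approach (assuming C++11; the String typedef mirrors the one from the question, and to_wide is a hypothetical helper name), UTF-8 can live in std::string throughout the program and be widened only at the boundary of a wide-character API. std::wstring_convert and <codecvt> are deprecated since C++17 but still serve for illustration:

```cpp
// Minimal sketch of the "std::string always holds UTF-8" convention.
#include <codecvt>   // std::codecvt_utf8 (C++11, deprecated in C++17)
#include <locale>    // std::wstring_convert
#include <string>

typedef std::string String;   // project-wide rule: String always contains UTF-8

// Convert at the boundary only, e.g. just before calling a wide-character API.
// Note: codecvt_utf8<wchar_t> yields UCS-2 or UCS-4 depending on the width of
// wchar_t, so characters outside the BMP need more care on Windows.
std::wstring to_wide(const String& utf8)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    return conv.from_bytes(utf8);
}

int main()
{
    String greeting = "gr\xC3\xBC\xC3\x9F dich";  // "grüß dich" as raw UTF-8 bytes
    std::wstring wide = to_wide(greeting);        // widened only at the boundary
    return wide.empty() ? 1 : 0;
}
```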
There are some issues with using std::wstring. If your application stores text in Unicode and runs on different platforms, you may run into trouble. std::wstring relies on wchar_t, which is compiler-dependent. In Microsoft Visual C++ this type is 16 bits wide and therefore only supports UTF-16 encoding. The GNU C++ compiler specifies this type to be 32 bits wide, and therefore only supports UTF-32 encoding. If you then store the text in a file on one system (say Windows/VC++) and read the file on another system (Linux/GCC), you will have to prepare for this (in this case, convert from UTF-16 to UTF-32).
Can I just switch to [std::wchar_t] without any major problems?

No, it's not that simple.
- The encoding of a wchar_t string is platform-dependent: Windows uses UTF-16, Linux usually uses UTF-32. (C++0x will mitigate this difference by introducing the separate char16_t and char32_t types.)
- On Windows you also have to switch to the wide versions of the library and API functions (_wfopen, etc.).
- Do you even need wchar_t? External data is more likely to be in UTF-8 (or another char-based encoding) than in UTF-16/32, so you would have to convert it anyway.
- You can't simply search-and-replace char with wchar_t, because C++ confounds "character" and "byte": you have to determine which chars are characters and which chars are bytes (see the sketch below).
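To illustrate that last point, here is a small sketch of the byte/character confusion: std::string::size() counts bytes, while counting code points means inspecting the UTF-8 encoding itself. count_code_points is a hypothetical helper used only for illustration and assumes well-formed UTF-8:

```cpp
#include <cstdio>
#include <string>

// Count UTF-8 code points by skipping continuation bytes (10xxxxxx);
// every other byte starts a new code point.
std::size_t count_code_points(const std::string& utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80)
            ++count;
    }
    return count;
}

int main()
{
    std::string s = "na\xC3\xAFve";   // "naïve" encoded as UTF-8
    std::printf("bytes: %zu, code points: %zu\n", s.size(), count_code_points(s));
    // prints "bytes: 6, code points: 5"
    return 0;
}
```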