As far as I know Linux chose backward compatibility of UTF-8, whereas Windows added completely new API functions for UTF-16 (ending with "W"). Could these decisions be different? Which one proved better?
UTF-16 is pretty much a loss, the worst of both worlds. It is neither compact (for the typical case of ASCII characters), nor does it map each code unit to a character. This hasn't really bitten anyone too badly yet since characters outside of the Basic Multilingual Plane are still rarely-used, but it sure is ugly.
POSIX (Linux et al) has some w
APIs too, based on the wchar_t
type. On platforms other than Windows this typically corresponds to UTF-32 rather than UTF-16. Which is nice for easy string manipulation, but is incredibly bloated.
But in-memory APIs aren't really that important. What causes much more difficulty is file storage and on-the-wire protocols, where data is exchanged between applications with different charset traditions.
Here, compactness beats ease-of-indexing; UTF-8 is clearly proven the best format for this by far, and Windows's poor support of UTF-8 causes real difficulties. Windows is the last modern operating system to still have locale-specific default encodings; everyone else has moved to UTF-8 by default.
Whilst I seriously hope Microsoft will reconsider this for future versions, as it causes huge and unnecessary problems even within the Windows-only world, it's understandable how it happened.
The thinking back in the old days when WinNT was being designed was that UCS-2 was it for Unicode. There wasn't going to be anything outside the 16-bit character range. Everyone would use UCS-2 in-memory and naturally it would be easiest to save this content directly from memory. This is why Windows called that format “Unicode”, and still to this day calls UTF-16LE just “Unicode” in UI like save boxes, despite it being totally misleading.
UTF-8 wasn't even standardised until Unicode 2.0 (along with the extended character range and the surrogates that made UTF-16 what it is today). By then Microsoft were on to WinNT4, at which point it was far too late to change strategy. In short, Microsoft were unlucky to be designing a new OS from scratch around the time when Unicode was in its infancy.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With