I've found answers to this question for many programming languages, but not for plain C using the Windows API. No C++ answers, please. Consider the following:
#include <windows.h>
#include <string.h>
char *string = "The quick brown fox jumps over the lazy dog";
WCHAR unistring[strlen(string)+1];
What function can I use to fill unistring with the characters from string?
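Standard C's char data type is 1 byte, which is 8 bits on Windows (CHAR_BIT is at least 8), but the standard does not require its contents to be ASCII.
UTF-8 stores text as sequences of such bytes and can represent all 1,112,064 valid Unicode code points (the 1,114,112-value code space minus the 2,048 surrogates). Most C code that processes strings byte by byte still works with UTF-8, because UTF-8 is fully compatible with 7-bit ASCII: every ASCII character is the same single byte, and every byte of a multibyte sequence is 0x80 or above.
Before C23 added char8_t, UTF-8 was the only encoding the C standard supports (via u8 string literals, since C11) that had no distinctly named code unit type; UTF-16 and UTF-32 have char16_t and char32_t.
The standard narrow output functions such as printf simply write char bytes; how they appear depends on the console code page or locale. Note that Unicode is not a language feature with its own syntax in C: it is handled through encodings, wide characters (wchar_t), and library functions.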
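To make the byte-versus-character distinction concrete, here is a minimal sketch that counts code points in a UTF-8 string, assuming the input is valid UTF-8. The function name utf8_codepoints is hypothetical, not a standard API. Since strlen counts bytes, for "h\xC3\xA9llo" ("héllo") strlen returns 6 while this function returns 5.

```c
#include <assert.h>
#include <stddef.h>

/* Count the Unicode code points in a NUL-terminated UTF-8 string.
   Assumes valid UTF-8: every byte that is not a continuation byte
   (bit pattern 10xxxxxx) begins a new code point. */
size_t utf8_codepoints(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p) {
        if ((*p & 0xC0) != 0x80)   /* skip continuation bytes */
            ++count;
    }
    return count;
}
```

Pure ASCII input gives the same result as strlen, which is exactly the compatibility property described above.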
Use MultiByteToWideChar:
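#include <windows.h>
#include <string.h>
const char *string = "The quick brown fox jumps over the lazy dog";
size_t len = strlen(string);
/* VLA: must be at block scope, and MSVC's C compiler does not support VLAs.
   len + 1 WCHARs always suffice: the output never has more WCHARs than the input has bytes. */
WCHAR unistring[len + 1];
/* CP_ACP = the ANSI code page; -1 = string is NUL-terminated, so the terminator is converted too.
   Returns the number of WCHARs written, or 0 on failure (see GetLastError). */
int result = MultiByteToWideChar(CP_ACP, 0, string, -1, unistring, (int)(len + 1));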
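When the input length isn't known in advance, a common pattern is to call MultiByteToWideChar twice: once with a zero output size to get the required buffer length (terminator included), then again to do the conversion. Here is a sketch of that pattern in a hypothetical helper, to_wide, assuming the input is in the ANSI code page (CP_ACP; use CP_UTF8 if your bytes are UTF-8). On non-Windows builds it falls back to standard C's mbstowcs so the sketch compiles anywhere. Returning a malloc'd buffer also sidesteps the VLA problem with MSVC.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Convert a NUL-terminated narrow string to a newly malloc'd wide string.
   Windows: two-call MultiByteToWideChar pattern, assuming CP_ACP input.
   Elsewhere: standard mbstowcs, which follows the current C locale.
   Returns NULL on failure; the caller must free() the result. */
wchar_t *to_wide(const char *s)
{
    assert(s != NULL);
#ifdef _WIN32
    /* First call: -1 = NUL-terminated input; a zero output size asks
       for the required length in WCHARs, terminator included. */
    int needed = MultiByteToWideChar(CP_ACP, 0, s, -1, NULL, 0);
    if (needed == 0)
        return NULL;
    wchar_t *out = malloc((size_t)needed * sizeof *out);
    if (out == NULL)
        return NULL;
    /* Second call: convert into the correctly sized buffer. */
    if (MultiByteToWideChar(CP_ACP, 0, s, -1, out, needed) == 0) {
        free(out);
        return NULL;
    }
    return out;
#else
    /* mbstowcs(NULL, s, 0) returns the length excluding the terminator,
       or (size_t)-1 if s contains an invalid multibyte sequence. */
    size_t needed = mbstowcs(NULL, s, 0);
    if (needed == (size_t)-1)
        return NULL;
    wchar_t *out = malloc((needed + 1) * sizeof *out);
    if (out == NULL)
        return NULL;
    mbstowcs(out, s, needed + 1);   /* room for the terminator as well */
    return out;
#endif
}
```

Usage: wchar_t *unistring = to_wide(string); check for NULL, use it, then free(unistring). On Windows, WCHAR is a typedef for wchar_t, so the result can be passed to the W-suffixed API functions directly.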
If you are really serious about Unicode, you should refer to International Components for Unicode, which is a cross-platform solution for handling Unicode conversions and storage in either C or C++.
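Your WCHAR, for example, is not a full Unicode character type to begin with, because Microsoft somewhat prematurely defined wchar_t to be 16 bits (UCS-2), and got stuck in backward-compatibility hell when Unicode outgrew 16 bits (the code space now runs to U+10FFFF). Windows now treats WCHAR strings as UTF-16. UCS-2 is almost, but not quite, identical to UTF-16; UTF-16 is in fact a variable-length encoding just like UTF-8, since code points above U+FFFF take two 16-bit units (a surrogate pair). On platforms where wchar_t is 32 bits and holds UTF-32, each value is one whole code point, but even then there is no 1:1 relationship between code points and abstract characters (i.e. printable glyphs): "é" can be written as U+0065 followed by the combining accent U+0301.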
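To illustrate the surrogate-pair point, here is a minimal sketch of encoding one code point as UTF-16. The helper name utf16_encode is hypothetical, and it uses uint16_t rather than WCHAR so it compiles on any platform. For example, U+1F600 encodes as the two units 0xD83D 0xDE00, so it occupies two WCHARs on Windows.

```c
#include <assert.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-16 into out[0..1].
   Returns the number of 16-bit code units written (1 or 2).
   Code points above U+FFFF need a surrogate pair, which is why a
   single 16-bit WCHAR cannot always hold a whole character. */
int utf16_encode(uint32_t cp, uint16_t out[2])
{
    /* Valid scalar values: U+0000..U+10FFFF, excluding U+D800..U+DFFF. */
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                    /* BMP: one unit */
        return 1;
    }
    cp -= 0x10000;                                /* now a 20-bit value */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));     /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));   /* low surrogate: bottom 10 bits */
    return 2;
}
```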