Why does the following program
    #include <stdio.h>
    #include <wchar.h>

    int main() {
        wprintf(L"Привет, мир!");
    }
print "Privet, mir!" on Linux? Specifically, why does it transliterate Russian text in Unicode into Latin as opposed to transcoding it into UTF-8 or using replacement characters?
Demonstration of this behavior on Godbolt: https://godbolt.org/z/36zEcG
The non-wide version printf("Привет, мир!")
prints this text as expected ("Привет, мир!").
Because wide characters are converted according to the currently set locale. A C program always starts in the "C" locale, which only supports ASCII characters.
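To see the locale dependence without going through stdio, here is a minimal sketch that asks the C library how many bytes the current locale would need for a wide string. The helper name `encoded_length` is made up for illustration, and calling `wcstombs` with a null destination relies on POSIX behaviour.

```c
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: number of bytes the current locale needs to encode ws,
   or (size_t)-1 if some character cannot be represented in that locale.
   With a NULL destination (POSIX behaviour) nothing is stored; only the
   required length is computed. */
size_t encoded_length(const wchar_t *ws)
{
    return wcstombs(NULL, ws, 0);
}
```

Called on L"Привет" before any setlocale call, this should report a conversion failure on glibc, since the "C" locale there has no encoding for Cyrillic. After switching to a UTF-8 locale, the same call should return 12 (six letters, two bytes each).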
You have to switch to any Russian or UTF-8 locale first:
    setlocale(LC_ALL, "ru_RU.utf8"); // Russian Unicode
    setlocale(LC_ALL, "en_US.utf8"); // English US Unicode
Or to the current system locale (which is likely what you need):
setlocale(LC_ALL, "");
The full program will be:
    #include <stdio.h>
    #include <wchar.h>
    #include <locale.h>

    int main() {
        setlocale(LC_ALL, "ru_RU.utf8");
        wprintf(L"Привет, мир!\n");
    }
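Note that setlocale returns NULL when the requested locale is not installed, in which case the program silently stays in the "C" locale. Here is a rough sketch of a more defensive setup, assuming a POSIX system. The helper name `select_utf8_locale` and the list of candidate locale names are my own assumptions, not something your system is guaranteed to provide.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: try the environment's locale first, then a few common
   UTF-8 locale names. Returns the name of the locale that was selected, or
   NULL if none of them gives a UTF-8 character encoding. */
const char *select_utf8_locale(void)
{
    /* Candidate names are assumptions; `locale -a` lists what is installed. */
    static const char *const candidates[] = {
        "", "C.UTF-8", "en_US.UTF-8", "ru_RU.UTF-8"
    };

    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        const char *name = setlocale(LC_ALL, candidates[i]);
        /* setlocale returns NULL when the locale is unavailable. */
        if (name != NULL && strcmp(nl_langinfo(CODESET), "UTF-8") == 0)
            return name;
    }

    setlocale(LC_ALL, "C"); /* go back to the start-up default */
    return NULL;
}
```

You would call select_utf8_locale() once at the top of main, before the first wprintf, and print a diagnostic if it returns NULL.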
As for your code working as-is on other machines - this is due to how libc operates there. Some implementations (like musl) do not support non-Unicode locales and can therefore unconditionally convert wide characters to a UTF-8 sequence.