
Linux vs. Windows: How does the console render Unicode characters?

This is quite a low-level (low in the sense of "closer to the metal") question.

I was wondering if any of you could point me to documentation, explanations, etc. of how the console looks up the corresponding glyph (and where it looks) upon receiving a Unicode character (or any character code, but I'm particularly interested in the Unicode Standard). I mean both the Windows console, good ol' cmd.exe (using, say, code page 65001), and xterm on Linux started with, say, LC_CTYPE=en_US.UTF-8.

I know this may be harder to find out for Windows, but I can't find much information.

Thank you.

Asked Feb 01 '26 12:02 by Dervin Thunk



1 Answer

As far as I can tell, cmd.exe is bound to whatever legacy (non-Unicode) code page follows from the "Language for non-Unicode programs" setting (the system locale).

To elaborate, if I set that option to Japanese, cmd.exe suddenly shows backslashes as yen signs (as does every other non-Unicode app on the system) and correctly interprets Shift JIS codes, for example. Setting it to Dutch gives me an accented I (I forget which one) for a certain byte value, while another code page shows a half-filled vertical block for that same byte.
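To make the "same byte, different character" behavior concrete, here is a minimal Python sketch that decodes one byte, 0xDE, under three legacy code pages. The byte value and the use of Python's built-in codecs cp437 (US OEM), cp850 (Western European OEM, used for Dutch among others) and cp932 (Windows Japanese, a Shift JIS variant) are assumptions chosen for illustration; the answer does not say which code pages or byte were involved. The sketch also decodes 0x5C under cp932: Python returns an ordinary backslash, which is consistent with the yen sign being a matter of how Japanese fonts draw U+005C rather than the byte being remapped.

```python
import unicodedata

# Assumption: Python's built-in codecs stand in for the console code pages.
CODEPAGES = ("cp437", "cp850", "cp932")
BYTE = bytes([0xDE])

# Decode the same single byte under each code page.
decoded = {cp: BYTE.decode(cp) for cp in CODEPAGES}

for cp, char in decoded.items():
    # Print the code point and its Unicode name (ASCII-only output).
    print(f"{cp}: 0xDE -> U+{ord(char):04X} {unicodedata.name(char)}")

# Byte 0x5C under cp932 still decodes to U+005C REVERSE SOLIDUS (backslash);
# the yen sign on Japanese systems is commonly attributed to the font's glyph.
print("cp932: 0x5C ->", unicodedata.name(bytes([0x5C]).decode("cp932")))
```

With cp437 the byte comes out as a right-half block, with cp850 as Ì, and with cp932 as a halfwidth katakana mark: one byte, three different characters, depending only on the active code page.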

That's not Unicode behavior: with Unicode, all three characters could be displayed at the same time.
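The Unicode point can be sketched in Python as well. The string below (a backslash, a yen sign, Ì and a right-half block) is an assumption chosen for illustration, and first_unencodable is a hypothetical helper written only for this example. The string round-trips losslessly through UTF-8 (code page 65001), while cp437, cp850 and cp932 each fail on at least one of its characters.

```python
import unicodedata

# Illustrative string: backslash, yen sign, I with grave, right half block.
TEXT = "\\\u00a5\u00cc\u2590"


def first_unencodable(text, codec):
    """Hypothetical helper: return the first character of `text` that
    `codec` cannot encode, or None if the whole string fits."""
    try:
        text.encode(codec)
    except UnicodeEncodeError as err:
        return text[err.start]
    return None


# UTF-8 (code page 65001 in cmd.exe terms) round-trips every character.
assert TEXT.encode("utf-8").decode("utf-8") == TEXT

for codec in ("utf-8", "cp437", "cp850", "cp932"):
    missing = first_unencodable(TEXT, codec)
    if missing is None:
        print(f"{codec}: all characters fit")
    else:
        # ASCII-only output: code point plus Unicode name.
        print(f"{codec}: cannot encode U+{ord(missing):04X} "
              f"{unicodedata.name(missing)}")
```

A single-byte code page has room for at most 256 characters, so it has to choose between Ì and the block character; UTF-8 does not have to choose.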

Answered Feb 04 '26 00:02 by Kawa



