Linux vs. Windows: How does the console render Unicode characters?

Question

This is quite a low-level (low in the sense of "closer to the metal") question.

I was wondering if any of you could point me to documentation or explanations of how a console, upon receiving a Unicode character (or any character code, though I'm particularly interested in the Unicode Standard), looks up the corresponding glyph, and where it looks. I mean specifically the Windows console, good ol' cmd.exe (using, say, code page 65001), and xterm on Linux started with, say, LC_CTYPE=en_US.UTF-8.

I know this may be harder to find out for Windows, but I can't really find much information.

Thank you.

Kawa · Accepted Answer

As far as I can tell, cmd.exe is bound to whatever legacy code page you selected under "Language for non-Unicode programs" (the system locale).

To elaborate: if I set that option to Japanese, cmd.exe suddenly shows backslashes as yen signs (as does every other non-Unicode app on the system) and correctly interprets Shift JIS codes. Setting it to Dutch gives me an accented I (I forget which one) for a certain character code, while another code page shows a half-filled block character for that same code.

So that is not Unicode. Unicode would let me show all three at the same time.
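To make the code page point concrete, here is a minimal Python sketch. It only illustrates how one byte value is interpreted under two legacy code pages, and says nothing about how cmd.exe or xterm pick glyphs internally. The byte 0xDE and the code pages cp850 and cp437 are my own picks for illustration (assumptions, not necessarily the ones I ran into above); the `describe` helper is likewise hypothetical.

```python
import unicodedata

# One raw byte; which character it "is" depends entirely on the code page.
# 0xDE is an illustrative choice, not necessarily the byte from the answer.
raw = b"\xde"

as_cp850 = raw.decode("cp850")  # Western European OEM code page
as_cp437 = raw.decode("cp437")  # original IBM PC (US) OEM code page


def describe(ch):
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(describe(as_cp850))  # an accented capital I
print(describe(as_cp437))  # a half block

# With Unicode, the yen sign, the accented I and the half block can all
# sit in the same string; UTF-8 is just one way to serialize it.
mixed = "\u00a5" + as_cp850 + as_cp437
utf8_bytes = mixed.encode("utf-8")
round_tripped = utf8_bytes.decode("utf-8")
```

With a single-byte code page the console has to pick exactly one of those interpretations for 0xDE, whereas the Unicode string carries all three characters unambiguously.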

Tags: linux, windows, encoding, unicode

Dervin Thunk

1 Answer

Kawa
