
How to make the console able to print any of the 65535 Unicode characters

Tags:

c#

unicode

I am experimenting with Unicode characters, taking the Unicode values from the Wikipedia page.

The problem is that my console displays all of the C0 Controls and Basic Latin and Latin-1 Supplement characters, i.e. U+0000 to U+00FF, but for all other categories, like Latin Extended-B, Cyrillic, other languages, etc., the console prints a question mark character (?).

My C# code is

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace DataTypes
{
    class Program
    {
        static void Main(string[] args)
        {
            // U+0181 LATIN CAPITAL LETTER B WITH HOOK (Latin Extended-B)
            char ch = '\u0181';

            Console.WriteLine("The Unicode character is: " + ch);
        }
    }
}

I am working on Windows 7 with Visual Studio 2010. What should I do to improve Unicode support?

Mudassir Hasan asked Oct 04 '12 07:10


1 Answer

There's a lot of history behind that question; I'll noodle about it for a while first. Console mode apps can only operate with an 8-bit text encoding. This goes back to a design decision made 42 years ago by Ken Thompson et al. when they designed Unix. A core feature of Unix was that terminal I/O was done through pipes, and you could chain pipes together to feed the output of one program into the input of another. This feature was also implemented in Windows and is supported by .NET as well, through the ProcessStartInfo.RedirectStandardXxxx properties.
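
To make the .NET side of redirection concrete, here is a minimal sketch (not part of the original answer) that configures a ProcessStartInfo so the child's standard output arrives through a pipe and is decoded as UTF-8. The program name "child.exe" and the RedirectSketch/CreateRedirectedStartInfo names are hypothetical; UseShellExecute has to be false for redirection to be allowed.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class RedirectSketch
{
    // Builds start info for a hypothetical child program whose stdout is
    // read through a pipe and decoded as UTF-8 by the parent.
    public static ProcessStartInfo CreateRedirectedStartInfo(string fileName)
    {
        return new ProcessStartInfo(fileName)
        {
            UseShellExecute = false,               // required for redirection
            RedirectStandardOutput = true,         // child's stdout goes to a pipe
            StandardOutputEncoding = Encoding.UTF8 // how the parent decodes the piped bytes
        };
    }

    public static void Main()
    {
        ProcessStartInfo psi = CreateRedirectedStartInfo("child.exe");
        Console.WriteLine(psi.RedirectStandardOutput);

        // Actually running it would look like this (needs a real child program):
        // using (Process p = Process.Start(psi))
        // {
        //     string text = p.StandardOutput.ReadToEnd();
        //     p.WaitForExit();
        // }
    }
}
```

The parent and child still have to agree on the encoding of the bytes in the pipe; StandardOutputEncoding only controls how the parent interprets them.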

Nice feature, but it became a problem when operating systems started to adopt Unicode. Windows NT was the first one that was fully Unicode at its core. Unicode characters must always be encoded; a common choice back then was UCS-2, which later morphed into UTF-16. Now there's a problem with I/O redirection: a program that spits out 16-bit encoded characters is not going to operate well when it is redirected to a program that still uses 8-bit encoded characters.
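
The question mark in the question is what happens when text goes through a single-byte encoding that has no slot for a character. A minimal sketch, assuming ISO-8859-1 (Latin-1) as a stand-in for the console's real 8-bit code page (on US-English Windows that is typically OEM code page 437, which differs in detail). Latin-1 happens to map its 256 byte values one-to-one onto U+0000 to U+00FF, so é (U+00E9) survives the round trip while Ɓ (U+0181) is replaced by '?'. The EightBitSketch/RoundTrip names are illustrative only.

```csharp
using System;
using System.Text;

public static class EightBitSketch
{
    // ISO-8859-1 stands in for the console's 8-bit code page (an assumption).
    // Explicit replacement fallbacks guarantee unmappable characters become "?".
    public static readonly Encoding Latin1 = Encoding.GetEncoding(
        "iso-8859-1",
        EncoderFallback.ReplacementFallback,
        DecoderFallback.ReplacementFallback);

    // Encodes text to bytes and decodes it back, mimicking a program that
    // writes its output through an 8-bit encoding.
    public static string RoundTrip(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        Console.WriteLine(RoundTrip("\u00E9", Latin1) == "\u00E9"); // True: U+00E9 has a byte value
        Console.WriteLine(RoundTrip("\u0181", Latin1));             // ?: U+0181 has none
    }
}
```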

Credit Ken Thompson as well with finding a solution to this problem: he invented the UTF-8 encoding.
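
A small sketch of why UTF-8 fits the 8-bit pipe model: ASCII characters keep their single-byte values, and every byte of a multi-byte sequence is 0x80 or higher, so it can never be confused with an ASCII byte such as a newline or '|'. The Utf8Sketch/ToHex names are illustrative only.

```csharp
using System;
using System.Text;

public static class Utf8Sketch
{
    // Returns the UTF-8 bytes of a string as space-separated hex, e.g. "C6 81".
    public static string ToHex(string text)
    {
        byte[] bytes = new UTF8Encoding(false).GetBytes(text); // no byte order mark
        return BitConverter.ToString(bytes).Replace("-", " ");
    }

    public static void Main()
    {
        // ASCII letters keep their single-byte value.
        Console.WriteLine(ToHex("A"));          // 41
        // U+0181 becomes two bytes, both >= 0x80.
        Console.WriteLine(ToHex("\u0181"));     // C6 81
        // A character beyond U+FFFF (a surrogate pair in a C# string) becomes four bytes.
        Console.WriteLine(ToHex("\U0001F600")); // F0 9F 98 80
    }
}
```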

That works in Windows as well. It is easy to do in a console mode app; you have to reassign the Console.OutputEncoding property:

using System;
using System.Text;

class Program {
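    static void Main(string[] args) {
        // Write console output as UTF-8 instead of the console's default code page
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine("Ĥėļŀō ŵŏŗłđ");
        Console.ReadLine();   // keep the console window open
    }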
}
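
Applied to the question's own experiment, a sketch along these lines should print U+0181 and then a whole block, here Latin Extended-B (U+0180 to U+024F), once the output encoding is UTF-8. The BlockPrinter/BuildRange names are hypothetical, the helper is only meant for ranges that stay clear of the surrogate area U+D800 to U+DFFF, and the characters still only show up if the console font has glyphs for them.

```csharp
using System;
using System.Text;

public static class BlockPrinter
{
    // Builds a string containing every character from first to last (inclusive).
    // Intended for BMP blocks outside the surrogate range U+D800..U+DFFF.
    public static string BuildRange(char first, char last)
    {
        var sb = new StringBuilder();
        for (int cp = first; cp <= last; cp++)
        {
            sb.Append((char)cp);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8; // same step as in the answer
        Console.WriteLine("The Unicode character is: " + '\u0181');
        Console.WriteLine(BuildRange('\u0180', '\u024F')); // Latin Extended-B
        Console.ReadLine();
    }
}
```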

You'll now, however, encounter another problem: the font selected for the console window is likely unable to render the text. Press Alt+Space to open the system menu, then choose Properties and the Font tab. You'll need to pick a non-raster font. Pickings are very slim; on Vista and up you can choose Consolas. Re-run your program and the accented characters should render properly. Unfortunately, forcing the console font programmatically is a problem, so you'll need to document this configuration step.

In addition, a font like Consolas doesn't have glyphs for every possible Unicode character. You are likely to see rectangles appear for Unicode codepoints for which it has no glyphs. All of this is an unsubtle reminder that creating a GUI program is really your best bet.

Hans Passant answered Sep 24 '22 06:09