Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reading unicode from console

I am trying to read unicode string from a console in C#, for the sake of example, lets uset his one:

c:\SVN\D³ebugger\src\виталик\Program.cs

At first I just tried to Console.ReadLine() which returned me c:\SVN\D3ebugger\src\???????\Program.cs

I've tried to set the Console.InputEncoding to UTF8 like so Console.InputEncoding = Encoding.UTF8 but that returned me c:\SVN\D³ebugger\src\???????\Program.cs, basically mucking up the Cyrillic part of the string.

So randomly stumbling I've tried to set the encoding like that, Console.InputEncoding = Encoding.GetEncoding(1251); which returned c:\SVN\D?ebugger\src\виталик\Program.cs, this time corrupting the ³ character.

At this point it seems that by switching encodings for the InputStream I can only get a single language at a time.

I've also tried going native and doing something like that:

// Code
public static string ReadLine()
{
    const uint nNumberOfCharsToRead = 1024;
    StringBuilder buffer = new StringBuilder();

    uint charsRead = 0;
    bool result = ReadConsoleW(GetStdHandle(STD_INPUT_HANDLE), buffer, nNumberOfCharsToRead, out charsRead, (IntPtr)0);

    // Return the input minus the newline character
    if (result && charsRead > 1) return buffer.ToString(0, (int)charsRead - 1);
    return string.Empty;
}

// Extern definitions

    [DllImport("Kernel32.DLL", ExactSpelling = true)]
    internal static extern IntPtr GetStdHandle(int nStdHandle);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, ExactSpelling = true)]
    static extern bool ReadConsoleW(IntPtr hConsoleInput, [Out] StringBuilder lpBuffer, 
        uint nNumberOfCharsToRead, out uint lpNumberOfCharsRead, IntPtr lpReserved);

That was working fine for non-unicode strings, however, when I tried to make it read my sample string, the application crashed. I've tried to tell Visual Studio to break on ALL exception (including native ones), yet, the application would still crash.

I also found this open bug in Microsoft's Connect that seems to say that it is impossible right now to read Unicode from the console's InputStream.

It is worth noting, even though not strictly related to my question, that Console.WriteLine is able to print this string just fine, if Console.OutputEncoding is set to UTF8.

Thank you!

Update 1

I am looking for a solution for .NET 3.5

Update 2

Updated with the full native code I've used.

like image 366
VitalyB Avatar asked Feb 29 '12 16:02

VitalyB


2 Answers

This seems to work fine when targetting .NET 4 client profile, but unfortunately not when targetting .NET 3.5 client profile. Ensure you change the console font to Lucida Console.
As pointed out by @jcl, even though I have targetted .NET4, this is only because I have .NET 4.5 installed.

class Program
{
    private static void Main(string[] args)
    {
        Console.InputEncoding = Encoding.Unicode;
        Console.OutputEncoding = Encoding.Unicode;

        while (true)
        {
            string s = Console.ReadLine();

            if (!string.IsNullOrEmpty(s))
            {
                Debug.WriteLine(s);

                Console.WriteLine(s);
            }
        }
    }
}

enter image description here

like image 184
Phil Avatar answered Oct 17 '22 08:10

Phil


Here's one fully working version in .NET 3.5 Client:

class Program
{
  [DllImport("kernel32.dll", SetLastError = true)]
  static extern IntPtr GetStdHandle(int nStdHandle);

  [DllImport("kernel32.dll")]
  static extern bool ReadConsoleW(IntPtr hConsoleInput, [Out] byte[]
     lpBuffer, uint nNumberOfCharsToRead, out uint lpNumberOfCharsRead,
     IntPtr lpReserved);

  public static IntPtr GetWin32InputHandle()
  {
    const int STD_INPUT_HANDLE = -10;
    IntPtr inHandle = GetStdHandle(STD_INPUT_HANDLE);
    return inHandle;
  }

  public static string ReadLine()
  {
    const int bufferSize = 1024;
    var buffer = new byte[bufferSize];

    uint charsRead = 0;

    ReadConsoleW(GetWin32InputHandle(), buffer, bufferSize, out charsRead, (IntPtr)0);
    // -2 to remove ending \n\r
    int nc = ((int)charsRead - 2) * 2;
    var b = new byte[nc];
    for (var i = 0; i < nc; i++)
      b[i] = buffer[i];

    var utf8enc = Encoding.UTF8;
    var unicodeenc = Encoding.Unicode;
    return utf8enc.GetString(Encoding.Convert(unicodeenc, utf8enc, b));
  }

  static void Main(string[] args)
  {
    Console.OutputEncoding = Encoding.UTF8;
    Console.Write("Input: ");
    var st = ReadLine();
    Console.WriteLine("Output: {0}", st);
  }
}

enter image description here

like image 26
Jcl Avatar answered Oct 17 '22 10:10

Jcl