C# UTF8 Reading/Outputting

Tags: c#, .net-3.5, utf-8

I'm trying to do something that I think should be fairly simple, but I've already spent far too much time on it and tried several approaches I researched, to no avail.

Basically, I have a huge list of names that contain "special" (non-ASCII) characters, stored as UTF-8.

My end goal is to read in each name and then make an HTTP request with that name in the URL as a GET parameter.
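For that last step, the name has to be percent-encoded before it goes into the query string. A minimal sketch, assuming a hypothetical endpoint `http://example.com/lookup` and a hypothetical query parameter `name`: `Uri.EscapeDataString` converts the string to UTF-8 bytes and percent-encodes each non-ASCII byte, so the URL that is actually sent is plain ASCII. The name is written with `\u` escapes so the example does not depend on the encoding of the source file.

```csharp
using System;

class QueryUrlDemo
{
    // Appends the name as the value of a (hypothetical) "name" query parameter.
    // Uri.EscapeDataString encodes the string as UTF-8 and percent-encodes
    // every byte outside the unreserved character set.
    public static string BuildLookupUrl(string baseUrl, string name)
    {
        return baseUrl + "?name=" + Uri.EscapeDataString(name);
    }

    static void Main()
    {
        // "Öwnägé" with precomposed characters (U+00D6, U+00E4, U+00E9)
        string name = "\u00D6wn\u00E4g\u00E9";

        // Placeholder endpoint; substitute the real service URL.
        Console.WriteLine(BuildLookupUrl("http://example.com/lookup", name));
        // prints http://example.com/lookup?name=%C3%96wn%C3%A4g%C3%A9
    }
}
```

Because the encoded URL contains only ASCII, it displays the same in any console font, which also makes it a convenient thing to log while testing.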

My first step was to read one name from a file and write it to standard output, to confirm that I could read and write UTF-8 properly before building the strings and making all the HTTP requests.

The test1.txt file I created contained only this:

Öwnägé

I then used this C# code to read the file, setting both the StreamReader encoding and Console.OutputEncoding to UTF-8:

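using System;
using System.IO;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        // Write console output as UTF-8
        Console.OutputEncoding = Encoding.UTF8;

        // Read test1.txt as UTF-8, one line at a time
        using (StreamReader reader = new StreamReader("test1.txt", Encoding.UTF8))
        {
            string line;

            while ((line = reader.ReadLine()) != null)
            {
                Console.WriteLine(line);
            }
        }

        // Keep the console window open until Enter is pressed
        Console.ReadLine();
    }
}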

Much to my surprise, I get output like this:

[Screenshot of the console output]

The expected output is exactly the same as the original file contents.

How can I be certain that the strings I build for the HTTP requests will be correct if I can't even do something as simple as reading and writing UTF-8 strings?

asked Mar 6 '12 at 15:03 by user17753



1 Answer

Your program is fine (assuming the input file is actually UTF-8). If you debug your program and use the Watch window to look at the strings (the line variable), you will find that it is correct. That is how you can be certain that you will send correct HTTP requests (or whatever else you do with the strings).
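If the debugger is inconvenient, a small sketch along these lines can do the same check from the command line (the class and method names are hypothetical). It prints the file's raw bytes and each character's code point using only ASCII, so the result cannot be mangled by the console font. Assuming "Öwnägé" was saved as UTF-8 with precomposed characters, the bytes should begin with `C3-96` (the UTF-8 encoding of Ö, U+00D6), preceded by `EF-BB-BF` if the editor wrote a byte order mark, and the decoded line should read `U+00D6 U+0077 U+006E U+00E4 U+0067 U+00E9`.

```csharp
using System;
using System.IO;
using System.Text;

class Utf8Check
{
    // Lists each UTF-16 code unit of the string as U+XXXX.
    // For characters in the Basic Multilingual Plane (such as Ö, ä, é)
    // the code unit equals the Unicode code point.
    public static string DescribeCodeUnits(string s)
    {
        StringBuilder sb = new StringBuilder();
        foreach (char c in s)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.Append("U+").Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }

    // Hex dump of raw bytes, formatted like "C3-96-77".
    public static string HexBytes(byte[] bytes)
    {
        return BitConverter.ToString(bytes);
    }

    static void Main(string[] args)
    {
        string path = args.Length > 0 ? args[0] : "test1.txt";
        if (!File.Exists(path))
        {
            Console.WriteLine("File not found: " + path);
            return;
        }

        // What is actually on disk
        Console.WriteLine(HexBytes(File.ReadAllBytes(path)));

        // What StreamReader decoded, shown as code points
        using (StreamReader reader = new StreamReader(path, Encoding.UTF8))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                Console.WriteLine(DescribeCodeUnits(line));
            }
        }
    }
}
```

If the code points are right, the string is right, whatever the console happens to display.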

What you’re seeing is a bug in the Windows console.

Fortunately, it only affects raster fonts. If you change your console window to use a TrueType font, e.g. Consolas or Lucida Console, the problem goes away.


You can set this for all future windows by using the “Defaults” menu item:

[Screenshot of the console “Defaults” dialog]

answered Nov 4 '22 at 08:11 by Timwi
