How to unescape a Unicode string in C#

Tags: c#, unicode

I have a string read from a text file that contains Unicode escape sequences, and I want to display the real characters.

For example:

\u8ba1\u7b97\u673a\u2022\u7f51\u7edc\u2022\u6280\u672f\u7c7b

When I read this string from the text file using StreamReader.ReadLine(), the \ is escaped to '\\', giving "\\u8ba1", which is not what I want.

The string is displayed exactly as it appears in the text file, escape sequences included. What I want is to display the real characters.

  1. How can I change "\\u8ba1" to "\u8ba1" in the resulting string?
  2. Or should I use a different reader to read the string?
Hyzups asked Dec 19 '11 08:12


2 Answers

If you have a string like

var input1 = "\u8ba1\u7b97\u673a\u2022\u7f51\u7edc\u2022\u6280\u672f\u7c7b";

// input1 == "计算机•网络•技术类"

you don't need to unescape anything. It's just the string literal that contains the escape sequences, not the string itself.
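The distinction between a literal and the string it produces can be checked by comparing lengths. The following is a minimal sketch; the class name `LiteralVersusText` and its fields are hypothetical and only serve the illustration. A verbatim literal is used to stand in for text that arrives from a file with its backslashes intact.

```csharp
using System;

public static class LiteralVersusText
{
    // Regular literal: the compiler turns each \uXXXX into a single char.
    public static readonly string Compiled = "\u8ba1\u7b97";

    // Verbatim literal: the backslashes are kept, like text read from a file.
    public static readonly string Raw = @"\u8ba1\u7b97";

    public static void Main()
    {
        Console.WriteLine(Compiled.Length); // 2
        Console.WriteLine(Raw.Length);      // 12
        Console.WriteLine(Raw[0] == '\\');  // True
    }
}
```

A debugger typically renders the second string as "\\u8ba1\\u7b97" because it shows the value as a C# literal; the string itself holds one backslash per escape sequence.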


If you have a string like

var input2 = @"\u8ba1\u7b97\u673a\u2022\u7f51\u7edc\u2022\u6280\u672f\u7c7b";

you can unescape it using the following regex:

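// Requires: using System.Globalization; using System.Text.RegularExpressions;
var result = Regex.Replace(
    input2,
    @"\\[Uu]([0-9A-Fa-f]{4})",   // a backslash, u or U, then four hex digits
    m => char.ToString(
        (char)ushort.Parse(m.Groups[1].Value, NumberStyles.AllowHexSpecifier)));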

// result == "计算机•网络•技术类"
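
To connect this back to the question, here is a hedged sketch of the same regex applied to a line obtained through `ReadLine`. The class `EscapedFileDemo` and the helper `UnescapeUnicode` are hypothetical names, and a `StringReader` stands in for the `StreamReader` over the text file (both derive from `TextReader`), so the sample runs without a file on disk.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text.RegularExpressions;

public static class EscapedFileDemo
{
    // Hypothetical helper wrapping the regex shown above.
    public static string UnescapeUnicode(string text)
    {
        return Regex.Replace(
            text,
            @"\\[Uu]([0-9A-Fa-f]{4})",
            m => ((char)ushort.Parse(
                m.Groups[1].Value,
                NumberStyles.HexNumber,
                CultureInfo.InvariantCulture)).ToString());
    }

    public static void Main()
    {
        // Assumption: the file content is the escaped text itself,
        // simulated here with a verbatim literal.
        using (TextReader reader = new StringReader(@"\u8ba1\u7b97\u673a"))
        {
            string line = reader.ReadLine();
            Console.WriteLine(line.Length);                  // 18
            Console.WriteLine(UnescapeUnicode(line).Length); // 3
        }
    }
}
```

Under these assumptions no different reader is needed: the reader returns the text as stored, and the unescaping is a separate step applied to each line.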
dtb answered Oct 17 '22 13:10


This question came up as the first result when googling, but I thought there should be a simpler way. This is what I ended up using:

using System.Text.RegularExpressions;

//...

var str = "Ingl\\u00e9s";
var converted = Regex.Unescape(str);
Console.WriteLine($"{converted} {str != converted}"); // Inglés True
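
One point worth checking before relying on this: `Regex.Unescape` reverses every escape it recognizes, not only `\uXXXX`, so sequences such as `\t` or `\\` in the input are converted as well. A small sketch of that behavior, where `RegexUnescapeDemo` and its thin wrapper `Convert` are hypothetical names:

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexUnescapeDemo
{
    // Hypothetical thin wrapper, present only so the behavior is easy to call.
    public static string Convert(string text)
    {
        return Regex.Unescape(text);
    }

    public static void Main()
    {
        // The Unicode escape is converted, as in the answer above.
        Console.WriteLine(Convert(@"Ingl\u00e9s") == "Ingl\u00e9s"); // True

        // The \t in a Windows-style path is also converted, to a tab character.
        Console.WriteLine(Convert(@"C:\temp") == "C:\temp");         // True

        // A doubled backslash collapses to a single one.
        Console.WriteLine(Convert(@"C:\\temp") == @"C:\temp");       // True
    }
}
```

If the input can contain backslashes that are not meant as escapes, a regex restricted to `\uXXXX` may be the safer choice.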
rraallvv answered Oct 17 '22 14:10