
How to properly decode a string containing \uXXXX escapes in C#? [duplicate]

Tags:

c#

.net

We have a text file that contains the following text:

"\u5b89\u5fbd\u5b5f\u5143"

When we read the file content in C# (.NET), the string is displayed as:

"\\u5b89\\u5fbd\\u5b5f\\u5143"

Our decoder method is:

public string Decoder(string value)
{
    Encoding enc = new UTF8Encoding();
    byte[] bytes = enc.GetBytes(value);
    return enc.GetString(bytes);
}

When I pass a hard-coded value,

string Output=Decoder("\u5b89\u5fbd\u5b5f\u5143");

it works, but when we pass the value through a variable it does not.

This is how we pass the string read from the text file:

  value=(text file content)
  string Output=Decoder(value);

It returns the wrong output.

How can I fix this?

asked Nov 22 '22 22:11

PrateekSaluja


2 Answers

Use the code below. Regex.Unescape (in System.Text.RegularExpressions) converts any escaped characters in the input string and returns the result as a new string:

string decoded = Regex.Unescape(value);
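
A minimal sketch of why this helps, assuming the file really holds the literal characters backslash, u, and four hex digits for each escape. The C# compiler decodes \uXXXX only inside regular string literals, so the hard-coded call already receives four decoded characters, while text read from a file still contains the backslashes. The class and method names here are illustrative and not from the question; the verbatim literal stands in for the file content.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnescapeDemo
{
    // Hypothetical helper: decodes \uXXXX (and other regex-style escapes)
    // in text that was obtained at run time.
    public static string Decode(string value) => Regex.Unescape(value);

    public static void Main()
    {
        // Regular literal: the compiler has already turned the escapes into 4 characters.
        string compiled = "\u5b89\u5fbd\u5b5f\u5143";

        // Verbatim literal: stands in for the file content, 24 characters with real backslashes.
        string fromFile = @"\u5b89\u5fbd\u5b5f\u5143";

        Console.WriteLine(compiled.Length);               // 4
        Console.WriteLine(fromFile.Length);               // 24
        Console.WriteLine(Decode(fromFile) == compiled);  // True
    }
}
```

Keep in mind that Regex.Unescape also converts other escapes such as \n and \t, and throws an ArgumentException when the input contains an escape sequence it does not recognize, so it fits best when the input is known to contain only well-formed escapes.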
answered Jan 18 '23 10:01

Sagar


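You could use a regular expression to find each \uXXXX escape in the file content and replace it with the character it denotes:

// Requires: using System.Globalization; using System.Text.RegularExpressions;
// Matches a literal backslash, 'u', and exactly four hexadecimal digits.
private static readonly Regex _regex = new Regex(@"\\u(?<Value>[0-9a-fA-F]{4})", RegexOptions.Compiled);

public string Decoder(string value)
{
    // Replace each \uXXXX escape with the UTF-16 code unit it names.
    return _regex.Replace(
        value,
        m => ((char)int.Parse(m.Groups["Value"].Value, NumberStyles.HexNumber)).ToString()
    );
}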

And then:

string data = Decoder(File.ReadAllText("test.txt"));
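
As a rough, self-contained sketch of the same idea, the program below exercises the replacement on a few sample inputs. The class name EscapeDecoder and the sample strings are made up for illustration, and verbatim literals stand in for text read from a file. This version restricts the character class to hexadecimal digits, so input such as \uzzzz is left alone instead of reaching int.Parse and throwing.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class EscapeDecoder
{
    // Backslash, 'u', then exactly four hexadecimal digits.
    private static readonly Regex UnicodeEscape =
        new Regex(@"\\u([0-9a-fA-F]{4})", RegexOptions.Compiled);

    public static string Decode(string value)
    {
        // Each match becomes the single UTF-16 code unit it names.
        return UnicodeEscape.Replace(
            value,
            m => ((char)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber)).ToString());
    }

    public static void Main()
    {
        // Text around the escapes passes through untouched.
        Console.WriteLine(Decode(@"name: \u5b89\u5fbd") == "name: \u5b89\u5fbd"); // True

        // A surrogate pair written as two escapes decodes to one code point.
        Console.WriteLine(Decode(@"\ud83d\ude00") == "\U0001F600");               // True

        // Not four hex digits, so the text is left unchanged.
        Console.WriteLine(Decode(@"\uzzzz") == @"\uzzzz");                         // True
    }
}
```

Unlike Regex.Unescape, this approach touches only \uXXXX sequences, so other backslashes in the file (Windows paths, for example) are preserved as they are.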
answered Jan 18 '23 12:01

Darin Dimitrov