dotnet core System.Text.Json unescape unicode string

Using JsonSerializer.Serialize(obj) will produce an escaped string, but I want the unescaped version. For example:

using System;
using System.Text.Json;

public class Program
{
    public static void Main()
    {
        var a = new A{Name = "你好"};
        var s = JsonSerializer.Serialize(a);
        Console.WriteLine(s);
    }
}

class A {
    public string Name {get; set;}
}

will produce a string {"Name":"\u4F60\u597D"} but I want {"Name":"你好"}

I created a code snippet at https://dotnetfiddle.net/w73vnO
Please help me.

Joey asked Sep 19 '19 03:09


4 Answers

You need to set the JsonSerializer options not to encode those strings.

JsonSerializerOptions jso = new JsonSerializerOptions();
jso.Encoder = System.Text.Encodings.Web.JavaScriptEncoder.UnsafeRelaxedJsonEscaping;

Then pass these options when you call the Serialize method.

var s = JsonSerializer.Serialize(a, jso);        

Full code:

JsonSerializerOptions jso = new JsonSerializerOptions();
jso.Encoder = System.Text.Encodings.Web.JavaScriptEncoder.UnsafeRelaxedJsonEscaping;

var a = new A { Name = "你好" };
var s = JsonSerializer.Serialize(a, jso);        
Console.WriteLine(s);

Result:

{"Name":"你好"}

If you print the result to the console, you may need to install an additional language pack for the characters to display correctly.
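
If the console still shows question marks, one thing worth trying is switching the console output encoding to UTF-8 before writing. The following is a minimal sketch of that idea and is not part of the answer above; ConsoleEncodingDemo and SerializeRelaxed are illustrative names, and whether the glyphs render also depends on the terminal font.

```csharp
using System;
using System.Text;
using System.Text.Encodings.Web;
using System.Text.Json;

public static class ConsoleEncodingDemo
{
    // Serializes with the relaxed encoder so non-ASCII text is written as-is.
    public static string SerializeRelaxed<T>(T value)
    {
        var options = new JsonSerializerOptions
        {
            Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping
        };
        return JsonSerializer.Serialize(value, options);
    }

    public static void Main()
    {
        // Ask the console to emit UTF-8; some consoles otherwise print "??" for CJK text.
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(SerializeRelaxed(new { Name = "你好" }));
    }
}
```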

rcs answered Nov 12 '22 04:11


To change the escaping behavior of the JsonSerializer, you can pass in a custom JavaScriptEncoder by setting the Encoder property on the JsonSerializerOptions.

https://learn.microsoft.com/en-us/dotnet/api/system.text.json.jsonserializeroptions.encoder?view=netcore-3.0#System_Text_Json_JsonSerializerOptions_Encoder

The default behavior is designed with security in mind and the JsonSerializer over-escapes for defense-in-depth.
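
It may help to see that the escaped form is only a different spelling of the same JSON string: a compliant parser decodes \u4F60\u597D back to 你好. The sketch below round-trips a value through the default encoder to illustrate this; Person and RoundTripDemo are illustrative names, not from the question.

```csharp
using System;
using System.Text.Json;

public class Person
{
    public string Name { get; set; }
}

public static class RoundTripDemo
{
    // Serialize with the default (escaping) encoder, then parse the result back.
    public static string RoundTrip(string name)
    {
        string json = JsonSerializer.Serialize(new Person { Name = name });
        Person parsed = JsonSerializer.Deserialize<Person>(json);
        return parsed.Name;
    }

    public static void Main()
    {
        // The JSON text contains \uXXXX escapes, but the decoded value is unchanged.
        Console.WriteLine(RoundTrip("你好") == "你好"); // True
    }
}
```

So if the consumer is another JSON parser, the default escaping is usually harmless; changing the encoder mainly matters for readability or payload size.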

If all you want is to leave certain "alphanumeric" characters of a specific non-Latin language unescaped, I would recommend that you instead create a JavaScriptEncoder using the Create factory method rather than using the UnsafeRelaxedJsonEscaping encoder.

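using System;
using System.Text.Encodings.Web; // JavaScriptEncoder
using System.Text.Json;
using System.Text.Unicode;       // UnicodeRanges

// Characters in Basic Latin and CJK Unified Ideographs are written as-is
// (HTML-sensitive characters are still escaped).
JsonSerializerOptions options = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.Create(UnicodeRanges.BasicLatin, UnicodeRanges.CjkUnifiedIdeographs)
};

var a = new A { Name = "你好" };
var s = JsonSerializer.Serialize(a, options);
Console.WriteLine(s);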

Doing so keeps certain safeguards; for instance, HTML-sensitive characters will continue to be escaped.

I would caution against using System.Text.Encodings.Web.JavaScriptEncoder.UnsafeRelaxedJsonEscaping flippantly, since it does minimal escaping (which is why it has "unsafe" in the name). If the JSON you are creating is written to a UTF-8 encoded file on disk, or if it's part of a web request which explicitly sets the charset to utf-8 (and is not going to be embedded within an HTML component as is), then it is probably OK to use this.

See the remarks section within the API docs: https://learn.microsoft.com/en-us/dotnet/api/system.text.encodings.web.javascriptencoder.unsaferelaxedjsonescaping?view=netcore-3.0#remarks

You could also consider specifying UnicodeRanges.All if you expect or need all languages to remain unescaped. This still escapes certain ASCII characters that are prone to security vulnerabilities.

JsonSerializerOptions options = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.Create(UnicodeRanges.All)
};
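
To make the difference between these encoders concrete, here is a small comparison sketch. EncoderComparison and its Serialize helper are illustrative names, and the sample string is made up for this purpose. With this input I would expect the default encoder to escape everything non-ASCII plus the angle brackets, Create(UnicodeRanges.All) to keep 你好 but still escape < and >, and UnsafeRelaxedJsonEscaping to leave the whole string as written.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Unicode;

public static class EncoderComparison
{
    // Serializes a one-property object using the given encoder.
    public static string Serialize(string text, JavaScriptEncoder encoder)
    {
        var options = new JsonSerializerOptions { Encoder = encoder };
        return JsonSerializer.Serialize(new { Text = text }, options);
    }

    public static void Main()
    {
        const string sample = "<你好>";

        // Default encoder: escapes non-ASCII and HTML-sensitive characters.
        Console.WriteLine(Serialize(sample, JavaScriptEncoder.Default));

        // All Unicode ranges allowed, but HTML-sensitive characters stay escaped.
        Console.WriteLine(Serialize(sample, JavaScriptEncoder.Create(UnicodeRanges.All)));

        // Minimal escaping: only what the JSON grammar itself requires.
        Console.WriteLine(Serialize(sample, JavaScriptEncoder.UnsafeRelaxedJsonEscaping));
    }
}
```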

For more information and code samples, see: https://learn.microsoft.com/en-us/dotnet/standard/serialization/system-text-json-how-to?view=netcore-3.0#customize-character-encoding

See the Caution Note

ahsonkhan answered Nov 12 '22 05:11


You can use System.Text.RegularExpressions.Regex.Unescape(string) to unescape the Unicode characters. https://learn.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex.unescape

Updating example from original question:

using System;
using System.Text.Json;

public class Program
{
    public static void Main()
    {
        var a = new A{Name = "你好"};
        var s = JsonSerializer.Serialize(a);

        var unescaped = System.Text.RegularExpressions.Regex.Unescape(s);

        Console.WriteLine(s);
        Console.WriteLine(unescaped);
    }
}

class A {
    public string Name {get; set;}
}

Output:

{"Name":"\u4F60\u597D"}
{"Name":"你好"}
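
One caveat with this approach: Regex.Unescape is not JSON-aware, so it also unescapes sequences that JSON needs to keep escaped, such as an escaped double quote. The sketch below (UnescapePitfall and IsValidJson are illustrative names) shows a value containing a quote, where the unescaped text no longer parses as JSON.

```csharp
using System;
using System.Text.Json;
using System.Text.RegularExpressions;

public static class UnescapePitfall
{
    // Returns true if the text parses as a JSON document.
    public static bool IsValidJson(string text)
    {
        try
        {
            using (JsonDocument.Parse(text))
            {
                return true;
            }
        }
        catch (JsonException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // The value contains double quotes, which the serializer must escape.
        string json = JsonSerializer.Serialize(new { Name = "say \"hi\"" });
        string unescaped = Regex.Unescape(json);

        Console.WriteLine(IsValidJson(json));      // True
        Console.WriteLine(IsValidJson(unescaped)); // False: the quotes now end the string early
    }
}
```

If the output has to remain valid JSON, configuring the Encoder on JsonSerializerOptions is the safer route.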

Steven Peirce answered Nov 12 '22 06:11


Use:

JsonSerializerOptions options = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.Create(UnicodeRanges.All)
};

Cyrus answered Nov 12 '22 04:11