
Serializing foreign languages using JSON.Net

Tags:

c#

json.net

I want to serialize a .NET object to JSON that contains foreign-language strings, such as Chinese or Russian. When I do that (using the code below), the resulting JSON contains "?" in place of those characters instead of the expected Unicode characters.

using Newtonsoft.Json;

var serialized = JsonConvert.SerializeObject(myObj, new JsonSerializerSettings { TypeNameHandling = TypeNameHandling.All, Formatting = Newtonsoft.Json.Formatting.Indented });

Is there a way to use the JSON.Net serializer with foreign languages?

E.g.:

אספירין (Hebrew)

एस्पिरि (Hindi)

阿司匹林 (Chinese)

アセチルサリチル酸 (Japanese)

Many Thanks!

Jon S asked Sep 04 '15 20:09


1 Answer

It is not the serializer that is causing this issue; Json.Net handles foreign characters just fine. More likely you are doing one of the following:

  1. Using an inappropriate encoding (or not setting the encoding at all) when writing the JSON to a file or stream. You should probably be using Encoding.UTF8.
  2. Storing the JSON in a varchar column in your database rather than nvarchar. In SQL Server, varchar is generally limited to the code page of the column's collation, so characters outside that code page are replaced with ?.
  3. Viewing the JSON with a viewer that does not support Unicode, uses the wrong encoding, or uses a font that lacks glyphs for the characters. The Windows command prompt window seems to have this issue, for example.
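
The first cause can be reproduced without Json.Net at all. The following is a minimal sketch (the class name EncodingDemo, the RoundTrip helper, and the sample string are illustrative, not taken from the question) showing that an encoding which cannot represent a character substitutes ? when the text is converted to bytes, while UTF-8 round-trips the text intact:

```csharp
using System;
using System.Text;

class EncodingDemo
{
    // Hypothetical helper: encodes text to bytes with the given encoding,
    // then decodes those bytes back into a string.
    public static string RoundTrip(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }

    static void Main()
    {
        string sample = "阿司匹林";

        // ASCII cannot represent these characters, so each one becomes '?'.
        string viaAscii = RoundTrip(sample, Encoding.ASCII);

        // UTF-8 can represent every Unicode character, so the text survives.
        string viaUtf8 = RoundTrip(sample, Encoding.UTF8);

        Console.WriteLine(viaAscii == "????");  // True
        Console.WriteLine(viaUtf8 == sample);   // True
    }
}
```

The damage happens at the moment the string is turned into bytes, which is why the JSON looks correct in memory but wrong on disk.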

To prove that the serializer is not the problem, try compiling and running the following example program. It creates two output files from the same JSON, one using UTF-8 encoding and the other using the default encoding (Encoding.Default). Open each file in Notepad. In the "default" file, foreign characters that the default encoding cannot represent will appear as ? characters; in the UTF-8 file, all of the characters should be intact. (If you still don't see them, try changing the Notepad font to "Arial Unicode MS".) Note that this assumes .NET Framework, where Encoding.Default is the system's ANSI code page; on .NET Core and later, Encoding.Default is UTF-8, so both files will contain the correct characters.

You can also see the foreign characters are correct in the JSON using the Visual Studio debugger; just put a breakpoint after the line where it serializes the JSON and examine the json variable.

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using Newtonsoft.Json;

class Program
{
    static void Main(string[] args)
    {
        List<Foo> foos = new List<Foo>
        {
            new Foo { Language = "Hebrew", Sample = "אספירין" },
            new Foo { Language = "Hindi", Sample = "एस्पिरि" },
            new Foo { Language = "Chinese", Sample = "阿司匹林" },
            new Foo { Language = "Japanese", Sample = "アセチルサリチル酸" },
        };

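        // The serialized string already contains the correct characters at this point.
        var json = JsonConvert.SerializeObject(foos, Formatting.Indented);

        // UTF-8 can represent every character, so this file is intact.
        File.WriteAllText("utf8.json", json, Encoding.UTF8);
        // Encoding.Default is the ANSI code page on .NET Framework (UTF-8 on .NET Core and later);
        // characters it cannot represent are written as '?'.
        File.WriteAllText("default.json", json, Encoding.Default);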
    }
}

class Foo
{
    public string Language { get; set; }
    public string Sample { get; set; }
}
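
If the destination cannot be switched to a Unicode-capable encoding or column type, one possible workaround is to have Json.Net emit the non-ASCII characters as \uXXXX escape sequences, so that the JSON text itself contains only ASCII. The sketch below uses Json.Net's StringEscapeHandling.EscapeNonAscii setting; the class name EscapeDemo and the SerializeAsciiOnly helper are hypothetical names chosen for illustration:

```csharp
using System;
using Newtonsoft.Json;

class EscapeDemo
{
    // Hypothetical helper: serializes a value so that every non-ASCII
    // character in a string is written as a \uXXXX escape sequence.
    public static string SerializeAsciiOnly(object value)
    {
        var settings = new JsonSerializerSettings
        {
            StringEscapeHandling = StringEscapeHandling.EscapeNonAscii
        };
        return JsonConvert.SerializeObject(value, settings);
    }

    static void Main()
    {
        string original = "阿司匹林";

        // The JSON text contains only ASCII characters.
        string json = SerializeAsciiOnly(original);
        Console.WriteLine(json);

        // Any JSON parser turns the escapes back into the original characters.
        string restored = JsonConvert.DeserializeObject<string>(json);
        Console.WriteLine(restored == original);
    }
}
```

The escaped form is still valid JSON and represents the same string, so consumers do not need any changes; the trade-off is that the raw text is larger and harder to read.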

Brian Rogers answered Nov 16 '22 21:11