
Parsing large JSON file in .NET

I have used the "JsonConvert.DeserializeObject(json)" method of Json.NET so far, which worked quite well and, to be honest, I didn't need anything more than this.

I am working on a background (console) application which constantly downloads the JSON content from different URLs, then deserializes the result into a list of .NET objects.

    using (WebClient client = new WebClient())
    {
        string json = client.DownloadString(stringUrl);
        var result = JsonConvert.DeserializeObject<List<Contact>>(json);
    }

The simple code snippet above probably doesn't seem perfect, but it does the job. When the file is large (15,000 contacts, a 48 MB file), JsonConvert.DeserializeObject isn't the solution: the line throws a JsonReaderException.

The downloaded JSON content is an array, and this is what a sample looks like. Contact is a container class for the deserialized JSON object.

    [
      {
        "firstname": "sometext",
        "lastname": "sometext"
      },
      {
        "firstname": "sometext",
        "lastname": "sometext"
      },
      {
        "firstname": "sometext",
        "lastname": "sometext"
      },
      {
        "firstname": "sometext",
        "lastname": "sometext"
      }
    ]

My initial guess is that it runs out of memory. Just out of curiosity, I tried to parse it as a JArray, which caused the same exception too.

I have started to dive into Json.NET documentation and read similar threads. As I haven't managed to produce a working solution yet, I decided to post a question here.

UPDATE: While deserializing line by line, I got the same error: " [. Path '', line 600003, position 1." So I downloaded two of the files and checked them in Notepad++. I noticed that if the array length is more than 12,000, then after the 12,000th element the "[" is closed and another array starts. In other words, the JSON looks exactly like this:

    [
      {
        "firstname": "sometext",
        "lastname": "sometext"
      },
      {
        "firstname": "sometext",
        "lastname": "sometext"
      },
      {
        "firstname": "sometext",
        "lastname": "sometext"
      },
      {
        "firstname": "sometext",
        "lastname": "sometext"
      }
    ]
    [
      {
        "firstname": "sometext",
        "lastname": "sometext"
      },
      {
        "firstname": "sometext",
        "lastname": "sometext"
      },
      {
        "firstname": "sometext",
        "lastname": "sometext"
      },
      {
        "firstname": "sometext",
        "lastname": "sometext"
      }
    ]
Yavar Hasanov asked Aug 26 '15 13:08



2 Answers

As you've correctly diagnosed in your update, the issue is that the JSON has a closing ] followed immediately by an opening [ to start the next set. This format makes the JSON invalid when taken as a whole, and that is why Json.NET throws an error.

Fortunately, this problem seems to come up often enough that Json.NET actually has a special setting to deal with it. If you use a JsonTextReader directly to read the JSON, you can set the SupportMultipleContent flag to true, and then use a loop to deserialize each item individually.

This should allow you to process the non-standard JSON successfully and in a memory-efficient manner, regardless of how many arrays there are or how many items are in each array.

    using (WebClient client = new WebClient())
    using (Stream stream = client.OpenRead(stringUrl))
    using (StreamReader streamReader = new StreamReader(stream))
    using (JsonTextReader reader = new JsonTextReader(streamReader))
    {
        // Allow several top-level JSON values (here: back-to-back arrays) in one stream.
        reader.SupportMultipleContent = true;

        var serializer = new JsonSerializer();
        while (reader.Read())
        {
            // Array start/end tokens are skipped; each object is deserialized on its own.
            if (reader.TokenType == JsonToken.StartObject)
            {
                Contact c = serializer.Deserialize<Contact>(reader);
                Console.WriteLine(c.FirstName + " " + c.LastName);
            }
        }
    }

Full demo here: https://dotnetfiddle.net/2TQa8p
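
If you want to try the approach without hitting the real URL, here is a minimal sketch that runs the same loop over an in-memory string and collects the items into a list. The shape of the Contact class and the ReadContacts helper are assumptions for illustration (the question does not show the class), and the StringReader merely stands in for the HTTP stream. It relies on Json.NET matching property names case-insensitively, so "firstname" should bind to FirstName without extra attributes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// Assumed shape of the container class; the question does not show it.
public class Contact
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class Program
{
    // Hypothetical helper: reads every Contact object from a stream that may
    // contain several top-level arrays placed back to back.
    public static List<Contact> ReadContacts(TextReader textReader)
    {
        var contacts = new List<Contact>();
        var serializer = new JsonSerializer();

        using (var reader = new JsonTextReader(textReader))
        {
            // Without this flag, the second "[" raises a JsonReaderException.
            reader.SupportMultipleContent = true;

            while (reader.Read())
            {
                // Skip array start/end tokens; deserialize one object at a time.
                if (reader.TokenType == JsonToken.StartObject)
                {
                    contacts.Add(serializer.Deserialize<Contact>(reader));
                }
            }
        }

        return contacts;
    }

    public static void Main()
    {
        // Two concatenated arrays, mimicking the malformed download.
        string json =
            "[{\"firstname\":\"a\",\"lastname\":\"b\"}] " +
            "[{\"firstname\":\"c\",\"lastname\":\"d\"}]";

        List<Contact> contacts = ReadContacts(new StringReader(json));
        foreach (Contact c in contacts)
        {
            Console.WriteLine(c.FirstName + " " + c.LastName);
        }
    }
}
```

Collecting everything into a list gives up some of the memory benefit of streaming, so for very large feeds it may be better to process each Contact inside the loop instead.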

Brian Rogers answered Oct 02 '22 13:10



Json.NET supports deserializing directly from a stream. Here is a way to deserialize your JSON using a StreamReader, which reads the JSON one piece at a time instead of loading the entire JSON string into memory.

    using (WebClient client = new WebClient())
    {
        using (StreamReader sr = new StreamReader(client.OpenRead(stringUrl)))
        {
            using (JsonReader reader = new JsonTextReader(sr))
            {
                JsonSerializer serializer = new JsonSerializer();

                // read the json from a stream
                // json size doesn't matter because only a small piece is read at a time from the HTTP request
                IList<Contact> result = serializer.Deserialize<List<Contact>>(reader);
            }
        }
    }

Reference: JSON.NET Performance Tips

Kristian Vukusic answered Oct 02 '22 12:10
