Out of memory exception while loading a large JSON file from disk

Tags: json, c#, json.net

I have a 1.2 GB JSON file which, when deserialized, should give me a list of 15 million objects.

The machine on which I'm trying to deserialize it is a Windows Server 2012 (64-bit) machine with 16 cores and 32 GB of RAM.

The application is built with x64 as the target platform.

In spite of this, when I try to read the JSON document and convert it to a list of objects, I get an OutOfMemoryException. When I look at Task Manager, I see that only 5 GB of memory is in use.

The code I tried is below:

a.

string plays_json = File.ReadAllText("D:\\Hun\\enplays.json");
plays = JsonConvert.DeserializeObject<List<playdata>>(plays_json);

b.

string plays_json = "";
using (var reader = new StreamReader("D:\\Hun\\enplays.json"))
{
    plays_json = reader.ReadToEnd();
    plays = JsonConvert.DeserializeObject<List<playdata>>(plays_json);
}

c.

using (StreamReader sr = File.OpenText("D:\\Hun\\enplays.json"))
{
    StringBuilder sb = new StringBuilder();
    sb.Append(sr.ReadToEnd());
    plays_json = sb.ToString();
    plays = JsonConvert.DeserializeObject<List<playdata>>(plays_json);
}

All help is sincerely appreciated.

Asked by Arnab on Sep 18 '25 at 08:09


1 Answer

The problem is that you are reading your entire huge file into memory and then trying to deserialize it all at once into a huge list. You should use a StreamReader to process your file incrementally. Example (b) in your question doesn't cut it, even though you use a StreamReader there, because you still read the entire file into a single string via ReadToEnd(). Instead, wrap the StreamReader in Json.NET's JsonTextReader and deserialize one object at a time, like this:

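using System.IO;
using Newtonsoft.Json;

// Assumes the file is a top-level JSON array of playdata objects
using (StreamReader sr = new StreamReader("D:\\Hun\\enplays.json"))
using (JsonTextReader reader = new JsonTextReader(sr))
{
    var serializer = new JsonSerializer();

    // Advance through the file one token at a time instead of loading it all
    while (reader.Read())
    {
        // Each StartObject token here begins one element of the array
        if (reader.TokenType == JsonToken.StartObject)
        {
            // Deserialize each object from the stream individually and process it.
            // Deserialize leaves the reader on the object's EndObject token, so
            // objects nested inside it are not picked up again by this loop.
            var play = serializer.Deserialize<playdata>(reader);

            ProcessPlayData(play);
        }
    }
}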

The ProcessPlayData method should process a single playdata object and then ideally write the result to a file or a database rather than an in-memory list (otherwise you may find yourself back in the same situation again). If you must store the results of processing each item into an in-memory list, then you might want to consider using a linked list or a similar structure that does not try to allocate memory in one contiguous block and does not need to reallocate and copy when it needs to expand.
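A minimal sketch of both options, using only the base class library. The playdata class below is a hypothetical stand-in (the question doesn't show its real fields), and PlayDataProcessor, InMemoryCollector, PlayerId and Points are illustrative names, not part of Json.NET or the original code. Writing tab-separated lines is just one possible output format; a database insert would go in the same place.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// HYPOTHETICAL stand-in for the question's playdata class; its real fields are not shown.
public class playdata
{
    public string PlayerId { get; set; }
    public int Points { get; set; }
}

// Option 1 (preferred): handle one object and write the result straight to a TextWriter
// (e.g. a StreamWriter over an output file), so memory use does not grow with the input.
public sealed class PlayDataProcessor : IDisposable
{
    private readonly TextWriter _output;

    public PlayDataProcessor(TextWriter output)
    {
        _output = output ?? throw new ArgumentNullException(nameof(output));
    }

    // Number of objects handled so far.
    public long Processed { get; private set; }

    public void ProcessPlayData(playdata play)
    {
        // Placeholder processing: emit one tab-separated line per object.
        _output.Write(play.PlayerId);
        _output.Write('\t');
        _output.WriteLine(play.Points);
        Processed++;
    }

    public void Dispose()
    {
        _output.Dispose();
    }
}

// Option 2: if results must stay in memory, LinkedList<T> allocates one small node per item
// instead of a single backing array that List<T> grows by reallocating and copying.
public sealed class InMemoryCollector
{
    public LinkedList<playdata> Results { get; } = new LinkedList<playdata>();

    public void ProcessPlayData(playdata play)
    {
        Results.AddLast(play);
    }
}
```

In the streaming loop, you could create a PlayDataProcessor over a StreamWriter for an output file of your choosing and call its ProcessPlayData for each object that serializer.Deserialize returns.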

Answered by Brian Rogers on Sep 20 '25 at 22:09