 

JSON.parse() on a large array of objects is using way more memory than it should

I generate a ~200,000-element array of objects (using object literal notation inside map rather than new Constructor()), and I save a JSON.stringify'd version of it to disk, where it takes up 31 MB, including newlines and one space per indentation level (JSON.stringify(arr, null, 1)).
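
(For context, the array is built roughly like the sketch below. The real code maps over parsed dictionary entries; the field names here are made up purely to produce a file of similar shape.)

var fs = require('fs');

// Stand-in for the ~200,000 parsed dictionary entries the real code maps over.
var entries = [];
for (var i = 0; i < 200000; i++) { entries.push(i); }

var arr = entries.map(function (i) {
  // Plain object literals, no constructor.
  return {id: i, headword: 'word' + i, glosses: ['gloss ' + i]};
});

// One space of indentation per level, as described above.
fs.writeFileSync('JMdict-all.json', JSON.stringify(arr, null, 1));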

Then, in a new node process, I read the entire file into a UTF-8 string and pass it to JSON.parse:

var fs = require('fs');
// Read the whole 31 MB file into one string, then parse it in a single call.
var arr1 = JSON.parse(fs.readFileSync('JMdict-all.json', {encoding : 'utf8'}));

Node memory usage is about 1.05 GB according to Mavericks' Activity Monitor! Even typing into a Terminal feels laggier on my ancient 4 GB RAM machine.
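
(For anyone who wants to check from inside Node rather than Activity Monitor, process.memoryUsage() reports resident set size and V8 heap usage; a minimal sketch, run right after the JSON.parse call above:)

// Sanity-check memory from inside the process.
var mem = process.memoryUsage();
console.log('rss:', Math.round(mem.rss / 1048576), 'MB',
            '| heapUsed:', Math.round(mem.heapUsed / 1048576), 'MB');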

But if, in a new node process, I load the file's contents into a string, chop it up at element boundaries, and JSON.parse each element individually, ostensibly getting the same object array:

var fs = require('fs');
// Strip the outer brackets (and the last element's closing brace), split at the '\n },' boundaries between elements, and JSON.parse each element after restoring its '}'.
var arr2 = fs.readFileSync('JMdict-all.json', {encoding : 'utf8'}).trim().slice(1,-3)
             .split('\n },').map(function(s) {return JSON.parse(s+'}');});

node is using just ~200 MB of memory, with no noticeable system lag. This pattern persists across many restarts of node: JSON.parse-ing the whole array takes a gigabyte of memory, while parsing it element-wise is much more memory-efficient.

Why is there such a huge disparity in memory usage? Is this a problem with JSON.parse preventing efficient hidden class generation in V8? How can I get good memory performance without slicing-and-dicing strings? Must I use a streaming JSON parse 😭?
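
(For concreteness, the streaming approach I'm dreading would look something like this sketch, assuming the JSONStream package from npm — which I haven't benchmarked for this question:)

var fs = require('fs');
var JSONStream = require('JSONStream');  // npm install JSONStream

var arr3 = [];
fs.createReadStream('JMdict-all.json')
  .pipe(JSONStream.parse('*'))  // emits each top-level array element as it is parsed
  .on('data', function (obj) { arr3.push(obj); })
  .on('end', function () { console.log('parsed', arr3.length, 'elements'); });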

For ease of experimentation, I've put the JSON file in question in a Gist, please feel free to clone it.

asked Jun 01 '15 by Ahmed Fasih



1 Answer

I think a comment hinted at the answer to this question, but I'll expand on it a little. The 1 GB of memory being used presumably includes a large number of allocations of data that is actually 'dead' (in that it has become unreachable and is therefore not really being used by the program any more) but has not yet been collected by the Garbage Collector.

Almost any algorithm processing a large data set is likely to produce a very large amount of detritus in this manner when the programming language/technology used is a typical modern one (e.g. Java/JVM, C#/.NET, JavaScript). Eventually the GC removes it.
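
One way to see how much of the figure is uncollected garbage (a sketch, not something measured against the questioner's file) is to start node with --expose-gc, force a collection after parsing, and compare process.memoryUsage() before and after:

// Run with: node --expose-gc check-gc.js   (the file name is just an example)
var fs = require('fs');

var arr = JSON.parse(fs.readFileSync('JMdict-all.json', {encoding: 'utf8'}));
console.log('heapUsed before forced GC:',
            Math.round(process.memoryUsage().heapUsed / 1048576), 'MB');

global.gc();  // collect the parse's dead intermediate allocations
console.log('heapUsed after forced GC: ',
            Math.round(process.memoryUsage().heapUsed / 1048576), 'MB');

console.log('elements:', arr.length);  // keep arr reachable so only garbage is reclaimed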

It is interesting to note that techniques can be used to dramatically reduce the amount of ephemeral memory allocation that certain algorithms incur (for example, by keeping pointers into the middle of the original string rather than copying out substrings), but I think these techniques are hard or impossible to employ in JavaScript.

answered Oct 17 '22 by debater