 

JSON.parse() on a large array of objects is using way more memory than it should

I generate a ~200,000-element array of objects (using object literal notation inside map rather than new Constructor()), and I save a JSON.stringify'd version of it to disk, where it takes up 31 MB, including newlines and one space per indentation level (JSON.stringify(arr, null, 1)).
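
(For context, the array is built roughly like the sketch below. The real code maps over parsed dictionary entries; the field names here are made up purely to produce a file of similar shape.)

var fs = require('fs');

// Stand-in for the ~200,000 parsed dictionary entries the real code maps over.
var entries = [];
for (var i = 0; i < 200000; i++) { entries.push(i); }

var arr = entries.map(function (i) {
  // Plain object literals, no constructor.
  return {id: i, headword: 'word' + i, glosses: ['gloss ' + i]};
});

// One space of indentation per level, as described above.
fs.writeFileSync('JMdict-all.json', JSON.stringify(arr, null, 1));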

Then, in a new node process, I read the entire file into a UTF-8 string and pass it to JSON.parse:

var fs = require('fs');
// Read the whole 31 MB file into one string, then parse it in a single call.
var arr1 = JSON.parse(fs.readFileSync('JMdict-all.json', {encoding : 'utf8'}));

Node memory usage is about 1.05 GB according to Mavericks' Activity Monitor! Even typing into a Terminal feels laggier on my ancient 4 GB RAM machine.
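
(For anyone who wants to check from inside Node rather than Activity Monitor, process.memoryUsage() reports resident set size and V8 heap usage; a minimal sketch, run right after the JSON.parse call above:)

// Sanity-check memory from inside the process.
var mem = process.memoryUsage();
console.log('rss:', Math.round(mem.rss / 1048576), 'MB',
            '| heapUsed:', Math.round(mem.heapUsed / 1048576), 'MB');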

But if, in a new node process, I load the file's contents into a string, chop it up at element boundaries, and JSON.parse each element individually, ostensibly getting the same object array:

var fs = require('fs');
// Strip the outer brackets (and the last element's closing brace), split at the '\n },' boundaries between elements, and JSON.parse each element after restoring its '}'.
var arr2 = fs.readFileSync('JMdict-all.json', {encoding : 'utf8'}).trim().slice(1,-3)
             .split('\n },').map(function(s) {return JSON.parse(s+'}');});

node is using just ~200 MB of memory, with no noticeable system lag. This pattern persists across many restarts of node: JSON.parse-ing the whole array takes a gigabyte of memory, while parsing it element-wise is much more memory-efficient.

Why is there such a huge disparity in memory usage? Is this a problem with JSON.parse preventing efficient hidden class generation in V8? How can I get good memory performance without slicing-and-dicing strings? Must I use a streaming JSON parse 😭?
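
(For concreteness, the streaming approach I'm dreading would look something like this sketch, assuming the JSONStream package from npm — which I haven't benchmarked for this question:)

var fs = require('fs');
var JSONStream = require('JSONStream');  // npm install JSONStream

var arr3 = [];
fs.createReadStream('JMdict-all.json')
  .pipe(JSONStream.parse('*'))  // emits each top-level array element as it is parsed
  .on('data', function (obj) { arr3.push(obj); })
  .on('end', function () { console.log('parsed', arr3.length, 'elements'); });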

For ease of experimentation, I've put the JSON file in question in a Gist, please feel free to clone it.

asked Jun 01 '15 by Ahmed Fasih



1 Answer

I think a comment hinted at the answer to this question, but I'll expand on it a little. The 1 GB of memory being used presumably includes a large number of allocations of data that is actually 'dead' (in that it has become unreachable and is therefore not really being used by the program any more) but has not yet been collected by the Garbage Collector.

Almost any algorithm processing a large data set is likely to produce a very large amount of detritus in this manner when the programming language/technology used is a typical modern one (e.g. Java/JVM, C#/.NET, JavaScript). Eventually the GC removes it.
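
One way to see how much of the figure is uncollected garbage (a sketch, not something measured against the questioner's file) is to start node with --expose-gc, force a collection after parsing, and compare process.memoryUsage() before and after:

// Run with: node --expose-gc check-gc.js   (the file name is just an example)
var fs = require('fs');

var arr = JSON.parse(fs.readFileSync('JMdict-all.json', {encoding: 'utf8'}));
console.log('heapUsed before forced GC:',
            Math.round(process.memoryUsage().heapUsed / 1048576), 'MB');

global.gc();  // collect the parse's dead intermediate allocations
console.log('heapUsed after forced GC: ',
            Math.round(process.memoryUsage().heapUsed / 1048576), 'MB');

console.log('elements:', arr.length);  // keep arr reachable so only garbage is reclaimed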

It is interesting to note that techniques can be used to dramatically reduce the amount of ephemeral memory allocation that certain algorithms incur (for example, by keeping pointers into the middle of the original string rather than copying out substrings), but I think these techniques are hard or impossible to employ in JavaScript.

answered Oct 17 '22 by debater