 

Best way to parse a huge JSON file in Ruby

I'm having a hard time parsing a huge JSON file.

The file is over 1 GB, and I've tried two gems, ruby-stream and yajl, but neither works.

Here's an example of what happens.

fileStr = File.read("hugeJSONfile.json")

^ This part is OK.

But when I try to parse fileStr into a hash (via ruby-stream or yajl), my computer freezes.

Any other ideas on how to do this more efficiently? Thank you.

asked Oct 20 '22 11:10 by hackstar15
1 Answer

Take a look at the json-stream or yajl gems.

Key quotes from the docs:

json-stream:

the document itself is never fully read into memory.

yajl:

The main benefit of this library is in its memory usage. Since it's able to parse the stream in chunks, its memory requirements are very, very low.

You register the events you're interested in, and the parser hands you keys and values as it reads through the JSON, instead of loading it all into a Ruby data structure (and consequently into memory).

answered Oct 22 '22 03:10 by 14 revs, 12 users 16%