 

Parse large JSON file [duplicate]

Tags: json, php, mysql

I'm working on a cron script that hits an API, receives a JSON file (a large array of objects), and stores it locally. Once that is complete, another script needs to parse the downloaded JSON file and insert each object into a MySQL database.

I'm currently using file_get_contents() along with json_decode(). This attempts to read the whole file into memory before processing it. That would be fine except that my JSON files will usually range from 250MB to 1GB+. I know I can increase my PHP memory limit, but that doesn't seem like the best answer to me. I'm aware that I can use fopen() and fgets() to read the file in line by line, but I need to read the file in by each JSON object.

Is there a way to read in the file per object, or is there another similar approach?

asked Mar 12 '13 by Dan Ramos


2 Answers

Try this library: https://github.com/shevron/ext-jsonreader. From its README:

The existing ext/json which is shipped with PHP is very convenient and simple to use, but it is inefficient when working with large amounts of JSON data, as it requires reading the entire JSON data into memory (e.g. using file_get_contents()) and then converting it into a PHP variable at once. For large data sets, this takes up a lot of memory.

JSONReader is designed for memory efficiency - it works on streams and can read JSON data from any PHP stream without loading the entire data into memory. It also allows the developer to extract specific values from a JSON stream without decoding and loading all data into memory.
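A minimal usage sketch of the pull-style API described above. The class, method, and constant names here follow my reading of the project's README and are not verified against the extension, which must be built and enabled separately; the file path is a placeholder.

```php
<?php
// Sketch only: requires the third-party jsonreader extension.
// The JSONReader class, tokenType property, and token constants
// below are assumptions based on the project README.
$available = extension_loaded('jsonreader');

if ($available) {
    $reader = new JSONReader();
    $reader->open('/path/to/large.json'); // placeholder path; any PHP stream works

    while ($reader->read()) {
        // React to tokens as they stream past instead of holding
        // the whole document in memory.
        if ($reader->tokenType === JSONReader::OBJECT_START) {
            // Collect this object's keys/values until OBJECT_END,
            // then insert the assembled row into MySQL.
        }
    }

    $reader->close();
} else {
    echo "jsonreader extension not loaded; skipping\n";
}
```

Because the reader operates on streams, memory use stays roughly constant regardless of file size, which is the point of using it over json_decode() here.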

answered Oct 02 '22 by Pawel Wodzicki


This really depends on what the JSON files contain.

If opening the file into memory in one shot is not an option, your only other option, as you alluded to, is fopen()/fgets().

Reading line by line is possible, and if these JSON objects have a consistent structure, you can easily detect where a JSON object in the file starts and ends.

Once you collect a whole object, you insert it into the database, then move on to the next one.

There isn't much more to it. The algorithm to detect the beginning and end of a JSON object may get complicated depending on your data source, but I have done something like this before with a far more complex structure (XML) and it worked fine.
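As a concrete sketch of the approach above, here is one way to scan a file shaped like a large top-level array of objects: track brace depth character by character (skipping over string literals and escapes), buffer one object at a time, and hand each complete object to a callback. The function and variable names are my own; the database insert is left to the callback.

```php
<?php
// Stream a file shaped like [ {...}, {...}, ... ] and pass each
// decoded top-level object to $onObject, one at a time, so memory
// use stays bounded by the size of a single object.
function streamJsonObjects(string $path, callable $onObject): int
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $depth = 0;        // current {} nesting depth
    $inString = false; // inside a "..." literal?
    $escaped = false;  // was the previous char a backslash?
    $buffer = '';      // accumulates the current object's raw text
    $count = 0;

    while (($chunk = fread($fh, 8192)) !== false && $chunk !== '') {
        $len = strlen($chunk);
        for ($i = 0; $i < $len; $i++) {
            $ch = $chunk[$i];

            if ($depth > 0) {
                $buffer .= $ch;
            }

            // Braces inside string literals are not structural.
            if ($inString) {
                if ($escaped) {
                    $escaped = false;
                } elseif ($ch === '\\') {
                    $escaped = true;
                } elseif ($ch === '"') {
                    $inString = false;
                }
                continue;
            }

            if ($ch === '"') {
                $inString = true;
            } elseif ($ch === '{') {
                if ($depth === 0) {
                    $buffer = '{'; // start of a new top-level object
                }
                $depth++;
            } elseif ($ch === '}') {
                $depth--;
                if ($depth === 0) {
                    // One complete object collected: decode and hand off.
                    $onObject(json_decode($buffer, true));
                    $buffer = '';
                    $count++;
                }
            }
        }
    }

    fclose($fh);
    return $count;
}
```

Inside the callback, each decoded object could then be written with a PDO prepared statement, ideally batched inside transactions so the MySQL inserts keep up with the parse.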

answered Oct 02 '22 by Kovo