
Processing large JSON files in PHP

I am trying to process fairly large JSON files (possibly up to 200 MB). Each file is essentially an array of objects.

So something along the lines of:

    [
      {"property":"value", "property2":"value2"},
      {"prop":"val"},
      ...
      {"foo":"bar"}
    ]

Each object has arbitrary properties and does not necessarily share them with the other objects in the array.

I want to process each object in the array. Since the file is potentially huge, I cannot slurp the whole file into memory, decode the JSON, and iterate over the resulting PHP array.

So ideally I would like to read the file, fetch enough data for one object at a time, and process it. A SAX-type approach would be OK if a similar library were available for JSON.

Any suggestions on how best to deal with this problem?

The Mighty Rubber Duck asked Oct 29 '10 06:10



1 Answer

I've written a streaming JSON pull parser, pcrov/JsonReader, for PHP 7 with an API based on XMLReader.

It differs significantly from event-based parsers in that instead of setting up callbacks and letting the parser do its thing, you call methods on the parser to move along or retrieve data as desired. Found your desired bits and want to stop parsing? Then stop parsing (and call close() because it's the nice thing to do.)

(For a slightly longer overview of pull vs event-based parsers see XML reader models: SAX versus XML pull parser.)
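
To make the difference concrete, here is a toy sketch that does not use JsonReader at all. A plain PHP array stands in for the parsed tokens, and the names `pushTokens` and `pullTokens` are made up for illustration. The push version drives the loop itself and calls back for every token; the pull version hands control to the consumer, who can stop as soon as it has what it wants.

```php
<?php
declare(strict_types=1);

// Toy token list standing in for parsed JSON; not real parser output.
$tokens = ['property', 'value', 'foo', 'bar', 'prop', 'val'];

// Event-based (push) style: the producer drives the loop and calls back.
function pushTokens(array $tokens, callable $onToken): void
{
    foreach ($tokens as $token) {
        $onToken($token); // The callback has no direct way to stop the loop.
    }
}

// Pull style: the consumer asks for the next token when it wants one.
function pullTokens(array $tokens): Generator
{
    foreach ($tokens as $token) {
        yield $token;
    }
}

$pushed = [];
pushTokens($tokens, function (string $token) use (&$pushed): void {
    $pushed[] = $token; // Sees every token, wanted or not.
});

$pulled = [];
foreach (pullTokens($tokens) as $token) {
    $pulled[] = $token;
    if ($token === 'foo') {
        break; // Found the desired bit, so stop pulling.
    }
}

echo count($pushed), " tokens pushed, ", count($pulled), " tokens pulled\n";
// Prints: 6 tokens pushed, 3 tokens pulled
```

A real event-based parser could add a way to abort from inside a callback, but the control flow still belongs to the parser rather than to your code.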


Example 1:

Read each object as a whole from your JSON.

    use pcrov\JsonReader\JsonReader;

    $reader = new JsonReader();
    $reader->open("data.json");

    $reader->read(); // Outer array.
    $depth = $reader->depth(); // Check in a moment to break when the array is done.
    $reader->read(); // Step to the first object.
    do {
        print_r($reader->value()); // Do your thing.
    } while ($reader->next() && $reader->depth() > $depth); // Read each sibling.

    $reader->close();

Output:

    Array
    (
        [property] => value
        [property2] => value2
    )
    Array
    (
        [prop] => val
    )
    Array
    (
        [foo] => bar
    )

Objects get returned as stringly-keyed arrays due (in part) to edge cases where valid JSON would produce property names that are not allowed in PHP objects. Working around these conflicts isn't worthwhile as an anemic stdClass object brings no value over a simple array anyway.
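
One such edge case can be reproduced with PHP's built-in `json_decode()` rather than JsonReader itself. This is a small sketch assuming PHP 7.1 or later, where the `JSON_ERROR_INVALID_PROPERTY_NAME` constant exists: a key that starts with a NUL character is valid JSON, but it cannot become a property of a PHP object, while an array accepts it without complaint.

```php
<?php
declare(strict_types=1);

// A JSON object whose property name starts with a NUL character (valid JSON).
$json = '{"\u0000a": 1}';

// Decoding into stdClass fails: PHP object properties cannot start with "\0".
$asObject = json_decode($json);
$objectError = json_last_error();
var_dump($asObject === null);                                // bool(true)
var_dump($objectError === JSON_ERROR_INVALID_PROPERTY_NAME); // bool(true)

// Decoding into an associative array accepts the same key.
$asArray = json_decode($json, true);
var_dump(array_key_exists("\0a", $asArray));                 // bool(true)
```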


Example 2:

Read each named element individually.

    $reader = new pcrov\JsonReader\JsonReader();
    $reader->open("data.json");

    while ($reader->read()) {
        $name = $reader->name();
        if ($name !== null) {
            echo "$name: {$reader->value()}\n";
        }
    }

    $reader->close();

Output:

    property: value
    property2: value2
    prop: val
    foo: bar

Example 3:

Read each property of a given name. Bonus: read from a string instead of a URI, plus get data from properties with duplicate names in the same object (which is allowed in JSON, how fun.)

    $json = <<<'JSON'
    [
        {"property":"value", "property2":"value2"},
        {"foo":"foo", "foo":"bar"},
        {"prop":"val"},
        {"foo":"baz"},
        {"foo":"quux"}
    ]
    JSON;

    $reader = new pcrov\JsonReader\JsonReader();
    $reader->json($json);

    while ($reader->read("foo")) {
        echo "{$reader->name()}: {$reader->value()}\n";
    }

    $reader->close();

Output:

    foo: foo
    foo: bar
    foo: baz
    foo: quux

How exactly to best read through your JSON depends on its structure and what you want to do with it. These examples should give you a place to start.
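
For comparison, if the file really is just one top-level array of objects and pulling in a library is not an option, the same "one object at a time" idea can be roughed out with a generator. This is a sketch under stated assumptions, not how JsonReader works internally: `iterateJsonObjects` is a hypothetical helper name, the input must be well-formed JSON, top-level scalars are skipped, and `JSON_THROW_ON_ERROR` requires PHP 7.3 or later. It scans the stream byte by byte, tracks string and nesting state, and decodes each element once its closing brace is seen.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of pcrov/JsonReader): yields each object
 * found directly inside a top-level JSON array, reading the stream in chunks.
 * Assumes well-formed JSON; top-level scalars are skipped.
 */
function iterateJsonObjects($stream, int $chunkSize = 8192): Generator
{
    $depth = 0;        // Nesting level of {} and [] outside string literals.
    $inString = false; // True while scanning inside a string literal.
    $escaped = false;  // True for the character right after a backslash.
    $buffer = '';      // Raw text of the element being collected.

    while (($chunk = fread($stream, $chunkSize)) !== false && $chunk !== '') {
        $length = strlen($chunk);
        for ($i = 0; $i < $length; $i++) {
            $char = $chunk[$i];

            if ($inString) {
                if ($depth >= 2) {
                    $buffer .= $char;
                }
                if ($escaped) {
                    $escaped = false;
                } elseif ($char === '\\') {
                    $escaped = true;
                } elseif ($char === '"') {
                    $inString = false;
                }
                continue;
            }

            if ($char === '{' || $char === '[') {
                $depth++;
            }
            if ($depth >= 2) {
                $buffer .= $char; // Only text inside an element is kept.
            }
            if ($char === '"') {
                $inString = true;
            } elseif ($char === '}' || $char === ']') {
                $depth--;
                if ($depth === 1 && $buffer !== '') {
                    // One complete element: decode it and free the buffer.
                    yield json_decode($buffer, true, 512, JSON_THROW_ON_ERROR);
                    $buffer = '';
                }
            }
        }
    }
}

// Demo on an in-memory stream; for a real file use fopen("data.json", "rb").
$stream = fopen('php://memory', 'w+b');
fwrite($stream, '[{"property":"value","property2":"value2"},{"prop":"val"},{"foo":"bar"}]');
rewind($stream);

foreach (iterateJsonObjects($stream, 16) as $object) {
    print_r($object); // Do your thing.
}

fclose($stream);
```

Only one element is held in memory at a time, but scanning character by character in PHP is slow, so for files in the hundreds of megabytes a dedicated streaming parser is likely the better choice.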

user3942918 answered Sep 27 '22 02:09