PHP | json_decode huge json file

Tags:

json

arrays

php

I'm trying to decode a large JSON file (222 MB).

I understand I can't read the whole file with file_get_contents() and pass the entire string to json_decode(), as that consumes a lot of memory and returns nothing (which is what it is doing so far).

So I went looking for libraries. The one I tried most recently is JSONParser, which reads the objects in a JSON array one by one.

But its documentation is lacking, so I want to ask here whether anyone has worked with this library.

This is the example test code from GitHub:

// initialise the parser object
$parser = new JSONParser();

// sets the callbacks
$parser->setArrayHandlers('arrayStart', 'arrayEnd');
$parser->setObjectHandlers('objStart', 'objEnd');
$parser->setPropertyHandler('property');
$parser->setScalarHandler('scalar');
/*
echo "Parsing top level object document...\n";
// parse the document
$parser->parseDocument(__DIR__ . '/data.json');*/

$parser->initialise();

//echo "Parsing top level array document...\n";
// parse the top level array

$parser->parseDocument(__DIR__ . '/array.json');

How can I loop over the objects and save each one in a PHP variable that can easily be decoded to a PHP array for further use?

This would take some time, since it processes every object in the JSON array one by one, but the question stands: how do I loop over the array with this library, or is there no such option?

Or are there any better options or libraries for this sort of job?

Sizzling Code asked Jun 17 '16 18:06


1 Answer

One alternative here is to use the salsify/jsonstreamingparser library.

You need to create your own Listener.

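$testfile = '/path/to/file.json';
$listener = new MyListener();
$stream = fopen($testfile, 'r');
if ($stream === false) {
    // fopen() returns false when the file cannot be opened
    throw new \RuntimeException("Unable to open $testfile");
}
try {
    $parser = new \JsonStreamingParser\Parser($stream, $listener);
    $parser->parse();
} finally {
    // Close the stream whether or not parsing succeeded
    fclose($stream);
}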

To keep things simple, I'm using this JSON as an example:

JSON Input

{
    "objects": [
        {
            "propertyInt": 1,
            "propertyString": "string",
            "propertyObject": { "key": "value" }
        },
        {
            "propertyInt": 2,
            "propertyString": "string2",
            "propertyObject": { "key": "value2" }
        }
    ]
}

You need to implement your own listener. In this case, I just want to get the objects inside the array.

PHP

class MyListener extends \JsonStreamingParser\Listener\InMemoryListener
{
    // Current nesting depth; lets us tell the array's items from nested objects
    protected $level = 0;

    protected function startComplexValue($type)
    {
        // Entering an object or array: one level deeper
        $this->level++;
        parent::startComplexValue($type);
    }

    protected function endComplexValue()
    {
        // Leaving an object or array: one level shallower
        $this->level--;
        $obj = array_pop($this->stack);
        // If the value stack is now empty, we're done parsing the document, so we can
        // move the result into place so that getJson() can return it. Otherwise, we
        // attach the value to its parent.
        if (empty($this->stack)) {
            $this->result = $obj['value'];
        } else {
            // Attach the value to its parent, the way the library's listener does.
            // This runs for arrays as well as objects so nested arrays are not dropped.
            $this->insertValue($obj['value']);
            if ($obj['type'] == 'object') {
                // HERE I call the custom function to do what I want
                $this->insertObj($obj);
            }
        }
    }

    // Custom function to do whatever you need with each completed object
    protected function insertObj($obj)
    {
        // Only the objects that sit directly inside the "objects" array
        if ($this->level <= 2) {
            echo "<pre>";
            var_dump($obj);
            echo "</pre>";
        }
    }
}

Output

array(2) {
  ["type"]=>
  string(6) "object"
  ["value"]=>
  array(3) {
    ["propertyInt"]=>
    int(1)
    ["propertyString"]=>
    string(6) "string"
    ["propertyObject"]=>
    array(1) {
      ["key"]=>
      string(5) "value"
    }
  }
}
array(2) {
  ["type"]=>
  string(6) "object"
  ["value"]=>
  array(3) {
    ["propertyInt"]=>
    int(2)
    ["propertyString"]=>
    string(7) "string2"
    ["propertyObject"]=>
    array(1) {
      ["key"]=>
      string(6) "value2"
    }
  }
}
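
To see why the `$level <= 2` test picks out exactly the two array items, here is a small hand trace. It is only a sketch: the event list is written by hand for the sample JSON and its labels are made up for illustration, so it does not call the library at all.

```php
<?php
// Hand-written start/end events for the sample JSON above, in document order.
// Each entry is [event, type, label]; the labels are illustrative only.
$events = [
    ['start', 'object', 'root'],
    ['start', 'array',  'objects'],
    ['start', 'object', 'item 1'],
    ['start', 'object', 'item 1 propertyObject'],
    ['end',   'object', 'item 1 propertyObject'],
    ['end',   'object', 'item 1'],
    ['start', 'object', 'item 2'],
    ['start', 'object', 'item 2 propertyObject'],
    ['end',   'object', 'item 2 propertyObject'],
    ['end',   'object', 'item 2'],
    ['end',   'array',  'objects'],
    ['end',   'object', 'root'],
];

$level = 0;
$reported = [];
foreach ($events as [$event, $type, $label]) {
    if ($event === 'start') {
        $level++;            // same bookkeeping as startComplexValue()
        continue;
    }
    $level--;                // same bookkeeping as endComplexValue()
    // $level > 0 stands in for "the value stack is not empty";
    // the other two tests mirror the object check and insertObj().
    if ($level > 0 && $type === 'object' && $level <= 2) {
        $reported[] = $label;
    }
}

print_r($reported);
```

Note that with this bookkeeping an object stored directly as a property of the root would end at level 1 and pass the `<= 2` test as well, so the check may need tightening if your document has other top-level properties.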

I tested it against a 166 MB JSON file and it works. You may need to adapt the listener to your needs.
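
If the file is a plain top-level array of objects (as array.json in the question appears to be), a different technique that needs no library is to cut the stream into one element at a time and run json_decode() on each piece. The sketch below is not the API of JSONParser or jsonstreamingparser; `streamJsonArrayItems` and its `$chunkSize` parameter are hypothetical names, and it assumes every element of the array is an object or an array (scalar elements are skipped).

```php
<?php
/**
 * Yield each element of a top-level JSON array, decoded one at a time.
 * Hypothetical helper: assumes the top-level value is an array whose
 * elements are objects or arrays. Scalar elements are ignored.
 */
function streamJsonArrayItems($stream, int $chunkSize = 8192): \Generator
{
    $depth = 0;        // nesting depth of {} and [] outside strings
    $inString = false; // true while inside a JSON string
    $escaped = false;  // true right after a backslash inside a string
    $buffer = '';      // raw text of the element being collected

    while (!feof($stream)) {
        $chunk = fread($stream, $chunkSize);
        if ($chunk === false) {
            break;
        }
        $len = strlen($chunk);
        for ($i = 0; $i < $len; $i++) {
            $c = $chunk[$i];
            if ($inString) {
                // Inside a string, braces and brackets are plain text
                if ($depth >= 2) {
                    $buffer .= $c;
                }
                if ($escaped) {
                    $escaped = false;
                } elseif ($c === '\\') {
                    $escaped = true;
                } elseif ($c === '"') {
                    $inString = false;
                }
                continue;
            }
            if ($c === '"') {
                $inString = true;
                if ($depth >= 2) {
                    $buffer .= $c;
                }
            } elseif ($c === '{' || $c === '[') {
                $depth++;
                if ($depth >= 2) {
                    $buffer .= $c;
                }
            } elseif ($c === '}' || $c === ']') {
                if ($depth >= 2) {
                    $buffer .= $c;
                }
                $depth--;
                // Back at depth 1 means one complete element was collected
                if ($depth === 1 && $buffer !== '') {
                    yield json_decode($buffer, true);
                    $buffer = '';
                }
            } elseif ($depth >= 2) {
                $buffer .= $c;
            }
        }
    }
}

// Usage with a small in-memory document standing in for the real file
$stream = fopen('php://memory', 'w+');
fwrite($stream, '[{"propertyInt":1,"propertyString":"string"},{"propertyInt":2,"propertyString":"string2"}]');
rewind($stream);
foreach (streamJsonArrayItems($stream) as $item) {
    // $item is a normal PHP array for one object of the JSON array
    echo $item['propertyInt'], "\n";
}
fclose($stream);
```

Only one element is held in memory at a time, at the cost of scanning the file byte by byte in PHP, which is slow. For documents shaped like the `{"objects": [...]}` example, the listener approach above is the better fit.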

Felippe Duarte answered Sep 27 '22 17:09