Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Running an intensive batch process in PHP, and avoiding memory exhaustion

I have several thousand records (stored in in a table in a MYSQL table) that I need to "batch process." All of the records contain a large JSON. In some cases, the JSON is over 1MB (yes, my DB is well over 1GB).

I have a function that grabs a record, decodes the JSON, changes some data, re-encodes the PHP array back to a JSON, and saves it back to the db. Pretty simple. FWIW, this is within the context of a CakePHP app.

Given an array of ID's, I'm attempting to do something like this (very simple mock code):

foreach ($ids as $id) {
    $this->Model->id = $id;
    $data = $this->Model->read();
    $newData = processData($data);
    $this->Model->save($newData);
}

The issue is that, very quickly, PHP runs out of memory. When running a foreach like this, it's almost as if PHP moves from one record to the next, without releasing the memory required for the preceding operations.

Is there anyway to run a loop in such a way that memory is freed before moving on to the next iteration of the loop, so that I can actually process the massive amount of data?

Edit: Adding more code. This function takes my JSON, converts it to a PHP array, does some manipulation (namely, reconfiguring data based on what's present in another array), and replacing values in the the original array. The JSON is many layers deep, hence the extremely long foreach loops.

function processData($theData) {
    $toConvert = json_decode($theData['Program']['data'], $assoc = true);
    foreach($toConvert['cycles'] as $cycle => $val) {
        foreach($toConvert['cycles'][$cycle]['days'] as $day => $val) {
            foreach($toConvert['cycles'][$cycle]['days'][$day]['sections'] as $section => $val) {
                foreach($toConvert['cycles'][$cycle]['days'][$day]['sections'] as $section => $val) {
                    foreach($toConvert['cycles'][$cycle]['days'][$day]['sections'][$section]['exercises'] as $exercise => $val) {
                        if (isset($toConvert['cycles'][$cycle]['days'][$day]['sections'][$section]['exercises'][$exercise]['selectedFolder'])) {
                            $folderName = $toConvert['cycles'][$cycle]['days'][$day]['sections'][$section]['exercises'][$exercise]['selectedFolder']['folderName'];
                            if ( isset($newFolderList['Folders'][$folderName]) ) {
                                $toConvert['cycles'][$cycle]['days'][$day]['sections'][$section]['exercises'][$exercise]['selectedFolder'] = $newFolderList['Folders'][$folderName]['id'];
                            }
                        }
                        if (isset($toConvert['cycles'][$cycle]['days'][$day]['sections'][$section]['exercises'][$exercise]['selectedFile'])) {
                            $fileName = basename($toConvert['cycles'][$cycle]['days'][$day]['sections'][$section]['exercises'][$exercise]['selectedFile']['fileURL']);
                            if ( isset($newFolderList['Exercises'][$fileName]) ) {
                                $toConvert['cycles'][$cycle]['days'][$day]['sections'][$section]['exercises'][$exercise]['selectedFile'] = $newFolderList['Exercises'][$fileName]['id'];
                            }
                        }
                    }
                }
            }
        }
    }
    return $toConvert;
}

Model->read() essentially just tells Cake to pull a record from the db, and returns it in an array. There's plenty of stuff that's happening behind the scenes, someone more knowledgable would have to explain that.

like image 782
Benjamin Allison Avatar asked Nov 03 '22 23:11

Benjamin Allison


1 Answers

The first step I would do is make sure everything is passed by reference.

Eg,

foreach ($ids as $id) {
processData($data);
}

function processData(&$d){}

http://php.net/manual/en/language.references.pass.php

like image 141
Jase Whatson Avatar answered Nov 08 '22 10:11

Jase Whatson