Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Memory considerations for long-running php scripts

I want to write a worker for beanstalkd in php, using a Zend Framework 2 controller. It starts via the CLI and will run forever, asking for jobs from beanstalkd like this example.

In simple pseudo-like code:

while (true) {
    $data   = $beanstalk->reserve();

    $class  = $data->class;
    $params = $data->params;

    $job    = new $class($params);
    $job();
}

The $job has here an __invoke() method of course. However, some things in these jobs might be running for a long time. Some might run with a considerable amount of memory. Some might have injected the $beanstalk object, to start new jobs themselves, or have a Zend\Di\Locator instance to pull objects from the DIC.

I am worried about this setup for production environments on the long term, as perhaps circular references might occur and (at this moment) I do not explicitly "do" any garbage collection while this action might run for weeks/months/years *.

*) In beanstalk, reserve is a blocking call and if no job is available, this worker will wait until it gets any response back from beanstalk.

My question: how will php handle this on the long term and should I take any special precaution to keep this from blocking?

This I did consider and might be helpful (but please correct if I am wrong and add more if possible):

  1. Use gc_enable() before starting the loop
  2. Use gc_collect_cycles() in every iteration
  3. Unset $job in every iteration
  4. Explicitly unset references in __destruct() from a $job

(NB: Update from here)

I did run some tests with arbitrary jobs. The jobs I included were: "simple", just set a value; "longarray", create an array of 1,000 values; "producer", let the loop inject $pheanstalk and add three simplejobs to the queue (so there is now a reference from job to beanstalk); "locatoraware", where a Zend\Di\Locator is given and all job types are instantiated (though not invoked). I added 10,000 jobs to the queue, then I reserved all jobs in a queue.

Results for "simplejob" (memory consumption per 1,000 jobs, with memory_get_usage())

0:     56392
1000:  548832
2000:  1074464
3000:  1538656
4000:  2125728
5000:  2598112
6000:  3054112
7000:  3510112
8000:  4228256
9000:  4717024
10000: 5173024

Picking a random job, measuring the same as above. Distribution:

["Producer"] => int(2431)
["LongArray"] => int(2588)
["LocatorAware"] => int(2526)
["Simple"] => int(2456)

Memory:

0:     66164
1000:  810056
2000:  1569452
3000:  2258036
4000:  3083032
5000:  3791256
6000:  4480028
7000:  5163884
8000:  6107812
9000:  6824320
10000: 7518020

The execution code from above is updated to this:

$baseMemory = memory_get_usage();
gc_enable();

for ( $i = 0; $i <= 10000; $i++ ) {
    $data = $bheanstalk->reserve();

    $class = $data->class;
    $params = $data->params;

    $job = new $class($params);
    $job();

    $job = null;
    unset($job);

    if ( $i % 1000 === 0 ) {
        gc_collect_cycles();
        echo sprintf( '%8d: ', $i ), memory_get_usage() - $baseMemory, "<br>";
    }
}

As everybody notices, the memory consumption is in php not leveraged and kept to a minimum, but increases over time.

like image 392
Jurian Sluiman Avatar asked Apr 02 '12 13:04

Jurian Sluiman


People also ask

How much memory does each PHP process consume?

A typical appropriate memory limit for PHP running Drupal is 128MB per process; for sites with a lot of contributed modules or with high-memory pages, 256MB or even 512MB may be more appropriate. It's often the case that the admin section of a site, or a particular page, uses much more memory than other pages.

Can PHP have memory leaks?

Memory leaks can happen in any language, including PHP. These memory leaks may happen in small increments that take time to accumulate, or in larger jumps that manifest quickly.

How does PHP manage memory?

PHP memory management functions are invoked by the MySQL Native Driver through a lightweight wrapper. Among others, the wrapper makes debugging easier. The various MySQL Server and the various client APIs differentiate between buffered and unbuffered result sets.


1 Answers

I've usually restarted the script regularly - though you don't have to do it after every job is run (unless you want to, and it's useful to clear memory). You could for example run for up to 100 jobs or more at a time or till the script had used say 20MB RAM, and then exit the script, to be instantly re-run.

My blogpost at http://www.phpscaling.com/2009/06/23/doing-the-work-elsewhere-sidebar-running-the-worker/ has some example shell scripts of re-running the scripts.

like image 153
Alister Bulman Avatar answered Nov 13 '22 13:11

Alister Bulman