
Is there a way to know which objects, and how many of them, I have in memory?

I have a PHP script that uses Doctrine2 and Zend to calculate some things from a database and send some emails to 30,000 users.

My script is leaking memory, and I want to know which objects are consuming that memory and, if possible, who is keeping a reference to them (thus preventing them from being released).

I'm using PHP 5.3.x, so plain circular references shouldn't be the problem.

I've tried using Xdebug's trace capabilities to get mem_delta, with no success (too much data).

I've tried manually adding memory_get_usage() calls before and after the important functions, but the only conclusion I reached was that I lose around 400 KB per user, and 3000 users times that gives me the 1 GB that I have available.

Are there any other ways to know where and why memory is leaking? Thanks

Hernan Rajchert asked Oct 05 '11 23:10


2 Answers

You could try sending, say, 10 emails and then inserting this:

get_defined_vars();

http://nz.php.net/manual/en/function.get-defined-vars.php

Put it at the end of the script or right after the email is sent (depending on how your code is set up).

This should tell you what is still loaded in the current scope, and what you can unset or turn into a reference.

Also, if there are too many things loaded, you can call this near the start and the end of your code and work out the difference.
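As a rough sketch of that start/end comparison: the snippet below records the name and type of every variable in scope at two points and prints what appeared in between. The helper name summarize_vars() is made up for this illustration, and the two assignments in the middle only stand in for the real work. Note that get_defined_vars() only sees the scope it is called from, so objects held elsewhere (for example inside Doctrine's entity manager) will not show up here.

```php
<?php
// Hypothetical helper (not a PHP built-in): map each variable name
// to its class name (for objects) or its type (for everything else).
function summarize_vars(array $vars)
{
    $summary = array();
    foreach ($vars as $name => $value) {
        $summary[$name] = is_object($value) ? get_class($value) : gettype($value);
    }
    return $summary;
}

// Snapshot near the start of the code under suspicion.
$before = summarize_vars(get_defined_vars());

// Stand-in for the real work (sending a few emails, etc.).
$user = new stdClass();
$message = 'hello';

// Snapshot near the end, then keep only the names that are new.
$after = summarize_vars(get_defined_vars());
$leftover = array_diff_key($after, $before);

print_r($leftover);
```

Anything listed in $leftover that is no longer needed is a candidate for unset().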

P4ul answered Oct 13 '22 23:10


30,000 objects to hydrate is quite a lot. Doctrine 2 is stable, but there are some bugs, so I am not too surprised about your memory leak problems.

With smaller data sets, though, I have had good results using Doctrine's batch processing capabilities and creating an iterable result.

You can use the code from the examples and add a gc_collect_cycles() call after each batch. You have to test it, but for me batch sizes of around 100 worked quite well; that number gave a good balance between performance and memory usage.
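To see what gc_collect_cycles() does in isolation, here is a minimal sketch that has nothing to do with Doctrine. The function make_cycle() is invented for this example: it builds two objects that point at each other, which plain reference counting cannot free, and the explicit collector call then reclaims them.

```php
<?php
// Hypothetical helper: create two objects that reference each other.
// When the function returns, neither refcount drops to zero, so only
// the cycle collector can free them.
function make_cycle()
{
    $a = new stdClass();
    $b = new stdClass();
    $a->other = $b;
    $b->other = $a;
}

gc_enable();
gc_collect_cycles(); // start from a clean state

for ($i = 0; $i < 100; $i++) {
    make_cycle();
}

// Force a collection now instead of waiting for the root buffer to fill.
$collected = gc_collect_cycles();
echo "collected: {$collected}\n";
```

The return value is the number of collected cycles, so logging it after each batch gives a cheap hint about how much garbage the batch left behind.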

It's quite important that the script records which entities were processed, so that it can be restarted without any problems and resume normal operation without sending emails twice.

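$batchSize = 20; // tune this for your data; see the note on batch sizes above
$i = 0;
$q = $em->createQuery('select u from MyProject\Model\User u');
$iterableResult = $q->iterate();
while (($row = $iterableResult->next()) !== false) {
    $entity = $row[0];

    // do stuff with $entity here
    // mark entity as processed

    ++$i;
    if (($i % $batchSize) == 0) {
        $em->flush(); // write the pending changes for this batch
        $em->clear(); // detach all entities so they can be freed

        gc_collect_cycles();
    }
}
// write and release whatever is left from the final, partial batch
$em->flush();
$em->clear();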
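The restart-safety requirement mentioned before the loop can be sketched without Doctrine at all. This is only an illustration under assumptions: the helpers load_processed_ids() and mark_processed(), the checkpoint file location, and the list of user IDs are all made up here. In the real script, a "processed" column on the user table, flushed with each batch, would do the same job.

```php
<?php
// Hypothetical helper: load the IDs that a previous run already handled.
function load_processed_ids($path)
{
    if (!is_file($path)) {
        return array();
    }
    $ids = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    return array_fill_keys($ids, true);
}

// Hypothetical helper: record one ID as done. Appending means earlier
// entries survive if the script dies part-way through.
function mark_processed($path, $id)
{
    file_put_contents($path, $id . "\n", FILE_APPEND | LOCK_EX);
}

// Assumed checkpoint location and stand-in user IDs.
$checkpoint = sys_get_temp_dir() . '/mailer-checkpoint-' . uniqid() . '.log';
$userIds = array(1, 2, 3, 4, 5);
$sent = array();

// First run: pretend the script runs out of memory after user 3.
foreach ($userIds as $id) {
    if ($id > 3) {
        break;
    }
    $sent[] = $id; // stand-in for sending the email
    mark_processed($checkpoint, $id);
}

// Second run: skip everything already recorded, then carry on.
$done = load_processed_ids($checkpoint);
foreach ($userIds as $id) {
    if (isset($done[$id])) {
        continue;
    }
    $sent[] = $id;
    mark_processed($checkpoint, $id);
}

unlink($checkpoint); // clean up the demo file
print_r($sent);
```

Each ID ends up in $sent exactly once, which is the property that prevents duplicate emails after a restart.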

Anyhow, maybe you should rethink the architecture of that script a bit, as an ORM is not well suited to processing large chunks of data. Maybe you can get away with working on the raw SQL rows?

Max answered Oct 13 '22 22:10