Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Memory leak when executing Doctrine query in loop

I'm having trouble in locating the cause for a memory leak in my script. I have a simple repository method which increments a 'count' column in my entity by X amount:

public function incrementCount($id, $amount)
{
    $query = $this
        ->createQueryBuilder('e')
        ->update('MyEntity', 'e')
        ->set('e.count', 'e.count + :amount')
        ->where('e.id = :id')
        ->setParameter('id', $id)
        ->setParameter('amount', $amount)
        ->getQuery();

    $query->execute();
}

Problem is, if I call this in a loop the memory usage balloons on every iteration:

$entityManager = $this->getContainer()->get('doctrine')->getManager();
$myRepository = $entityManager->getRepository(MyEntity::class);
while (true) {
    $myRepository->incrementCount("123", 5);
    $doctrineManager->clear();
    gc_collect_cycles();
}

What am I missing here? I've tried ->clear(), as per Doctrine's advice on batch processing. I even tried gc_collect_cycles(), but still the issue remains.

I'm running Doctrine 2.4.6 on PHP 5.5.

like image 339
Jonathan Avatar asked Oct 28 '14 19:10

Jonathan


7 Answers

I just ran into the same issue, these are the things that fixed it for me:

--no-debug

As the OP mentioned in their answer, setting --no-debug (ex: php bin/console <my_command> --no-debug) is crucial for performance/memory in Symfony console commands. This is especially true when using Doctrine, as without it, Doctrine will go into debug mode which consumes a huge amount of additional memory (that increases on each iteration). See the Symfony docs here and here for more info.

--env=prod

You should also always specify the environment. By default, Symfony uses the dev environment for console commands. The dev environment usually isn't optimized for memory, speed, cpu etc. If you want to iterate over thousands of items, you should probably be using the prod environment (ex: php bin/console <my_command> --env prod). See here and here for more info.

Tip: I created an environment called console that I specifically configured for running console commands. Here is info about how to create additional Symfony environments.

php -d memory_limit=YOUR_LIMIT

If running a big update, you should probably choose how much memory is acceptable for it to consume. This is especially important if you think there might be a leak. You can specify the memory for the Command by using php -d memory_limit=x (ex: php -d memory_limit=256M). Note: you can set the limit to -1 (usually the default for the php cli) to let the command run with no memory limit but this is obviously dangerous.

A Well Formed Console Command For Batch Processing

A well formed console command for running a big update using the above tips would look like:

php -d memory_limit=256M bin/console <acme>:<your_command> --env=prod --no-debug

Use Doctrine's IterableResult

Another huge one when using Doctrine's ORM in a loop, is to use Doctrine's IterableResult (see the Doctrine Batch Processing docs). This won't help in the example provided but usually when doing processing like this it is over results from a query.

Flush Periodically

If part of what you are doing is making changes to the data, you should flush periodically instead of on each iteration. Flushing is expensive and slow. The less often you flush, the faster your command will finish. Keep in mind, however, that Doctrine will hold the unflushed data in memory. So the less often that you flush, the more memory you will need.

You can use something like the following to flush every 100 iterations:

if ($count % 100 === 0) {
    $this->em->flush();
}

Also make sure to flush again at the end of your loop (for flushing the last < 100 entries).

Clear the EntityManager

You may also want to clear after you flush:

$this->em->flush();
$em->clear();  // Detach ALL objects from Doctrine.

Or

$this->em->flush();
$em->clear(MyEntity::class); // Detach all MyEntity from Doctrine.
$em->clear(MyRelatedEntity::class); // Detach all MyRelatedEntity from Doctrine.

Output the memory usage as you go

It can be really helpful to keep track of how much memory your command is consuming while it is running. You can do that by outputting the response returned by PHP's built-in memory_get_usage() function.

$output->writeln(memory_get_usage());

Example

$memUse = round(memory_get_usage() / 1000000, 2).'MB';
$this->output->writeln('Processed '.$i.' of '.$totalCount.' (mem: '.$memUse.')');

Roll Your Own Batches

It may also be helpful to roll your own batches. You can do this by using a start and limit just like you would for pagination. I was able to process 4 millions rows using only 90Mb of RAM doing this.

Here's some example code:


protected function execute(InputInterface $input, OutputInterface $output) {
    /* ... */
    $totalCount = $this->getTotalCount();
    $batchSize = 10000;
    $i = 0;
    while ($i < $totalCount) {
        $i = $this->processBatch($i, $batchSize, $totalCount);
    }
    /* ... */
}

private function processBatch(int $start, int $limit, int $totalCount): int {
    /* @var $q \Doctrine\ORM\Query */
    $q = $this->em->createQueryBuilder()
        ->select('e')
        ->from('AcmeExampleBundle:MyEntity', 'e')
        ->setFirstResult($start)
        ->setMaxResults($limit)
        ->getQuery();

    /* @var $iterableResult \Doctrine\ORM\Internal\Hydration\IterableResult */
    $iterableResult = $q->iterate(null, \Doctrine\ORM\Query::HYDRATE_SIMPLEOBJECT);

    $i = $start;
    foreach ($iterableResult as $row) {
        /* @var $myEntity \App\Entity\MyEntity */
        $myEntity = $row[0];

        $this->processOne($myEntity);

        if (0 === ($i % 1000)) {
            $memUse = round(memory_get_usage() / 1000000, 2).'MB';
            $this->output->writeln('Processed '.$i.' of '.$totalCount.' (mem: '.$memUse.')');
        }
        $this->em->detach($row[0]);
        $i++;
    }

    return $i;
}

private function processOne(MyEntity $myEntity): void {
    // Do entity processing here.
}

private function getTotalCount(): int {
    /* @var $q \Doctrine\ORM\Query */
    $q = $this->em
        ->createQueryBuilder()
        ->select('COUNT(e.id)')
        ->from('AcmeExampleBundle:MyEntity', 'e')
        ->getQuery();

    $count = $q->getSingleScalarResult();

    return $count;
}

Good luck!

like image 173
Collin Krawll Avatar answered Sep 21 '22 20:09

Collin Krawll


I resolved this by adding --no-debug to my command. It turns out that in debug mode, the profiler was storing information about every single query in memory.

like image 34
Jonathan Avatar answered Sep 22 '22 20:09

Jonathan


Doctrine keeps logs of any query you make. If you make lots of queries (normally happens in loops) Doctrine can cause a huge memory leak.

You need to disable the Doctrine SQL Logger to overcome this.

I recommend doing this only for the loop part.

Before loop, get current logger:

$sqlLogger = $em->getConnection()->getConfiguration()->getSQLLogger();

And then disable the SQL Logger:

$em->getConnection()->getConfiguration()->setSQLLogger(null);

Do loop here: foreach() / while() / for()

After loop ends, put back the Logger:

$em->getConnection()->getConfiguration()->setSQLLogger($sqlLogger);

like image 23
mFlorin Avatar answered Sep 24 '22 20:09

mFlorin


For me it was clearing doctrine, or as the documentation says, detaching all entities:

$this->em->clear(); //Here em is the entity manager.

So inside my loop y flush every 1000 iterations and detach all entities (I don't need them anymore):

    foreach ($reader->getRecords() as $position => $value) {
        $this->processValue($value, $position);
        if($position % 1000 === 0){
            $this->em->flush();
            $this->em->clear();
        }
        $this->progress->advance();
    }

Hope this helps.

PS: here's the documentation.

like image 26
amcastror Avatar answered Sep 21 '22 20:09

amcastror


You're wasting memory for each iteration. A much better way would be to prepare the query once and swap arguments many times. For example:

class MyEntity extends EntityRepository{
    private $updateQuery = NULL;

    public function incrementCount($id, $ammount)
    {
        if ( $this->updateQuery == NULL ){
            $this->updateQuery = $this->createQueryBuilder('e')
                ->update('MyEntity', 'e')
                ->set('e.count', 'e.count + :amount')
                ->where('e.id = :id')
                ->getQuery();
        }

        $this->updateQuery->setParameter('id', $id)
                ->setParameter('amount', $amount);
                ->execute();
    }
}

As you mentioned, you can employ batch processing here, but try this out first and see how well (if at all) performs...

like image 23
Jovan Perovic Avatar answered Sep 22 '22 20:09

Jovan Perovic


I had similar issues with a memory leak. I'm running Doctrine in a Symfony 5.2 project. More specific, I built a never-ending Command which is processing entries from one table, retrieves entries from another table, and creates 2 new entries in other tables. (Event Processing)

I solved my leakage problems in two steps.

  1. I use the --no-debug when running the command (as already suggested by Jonathan)
  2. I added at the end of the loop $this->entityManager->clear();

In order to see and identify the leakages, I used the following line to output the current memory usage:

$output->writeln('Memory Usage in MB: ' . memory_get_usage() / 1024 / 1024);

Maybe this helps anyone still fighting with leakages.

like image 44
dj_thossi Avatar answered Sep 22 '22 20:09

dj_thossi


I encountered the same issue and disabling the query cache helped me.

$query = $this
    ->createQueryBuilder('e')
    ->update('MyEntity', 'e')
    ->set('e.count', 'e.count + :amount')
    ->where('e.id = :id')
    ->setParameter('id', $id)
    ->setParameter('amount', $amount)
    ->getQuery()
    ->useQueryCache(false); // <-- this line
like image 34
Milan Miscevic Avatar answered Sep 21 '22 20:09

Milan Miscevic