Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Memory usage goes wild with Doctrine bulk insert

I’m trying to insert a large amount of data (30 000+ lines) in a MySQL database using Doctrine2 and the Symfony2 fixture bundle.

I looked at the right way to do it. I saw lots of questions about memory leaks and Doctrine, but no satisfying answer for me. It often comes the Doctrine clear() function.

So, I did various shapes of this:

while (($data = getData()) {
    $iteration++;

    $obj = new EntityObject();
    $obj->setName('henry');
    // Fill object...

    $manager->persist($obj);

    if ($iteration % 500 == 0) {
        $manager->flush();
        $manager->clear();

        // Also tried some sort of:
        // $manager->clear($obj);   
        // $manager->detach($obj);
        // gc_collect_cycles();
    }
}

PHP memory still goes wild, right after the flush() (I’m sure of that). In fact, every time the entities are flushed, memory goes up for a certain amount depending on batch size and the entities, until it reaches the deadly Allowed Memory size exhausted error. With a very very tiny entity, it works but memory consumption increase too much: several MB whereas it should be KB.

clear(), detach() or calling GC doesn’t seem to have an effect at all. It only clears some KB.

Is my approach flawed? Did I miss something, somewhere? Is it a bug?

More info:

  • Without flush() memory barely moves;
  • Lowering the batch do not change the outcome;
  • Data comes from a CSV that need to be sanitized;

EDIT (partial solution):

@qooplmao brought a solution that significantly decrease memory consumption, disable doctrine sql logger: $manager->getConnection()->getConfiguration()->setSQLLogger(null);

However, it is still abnormally high and increasing.

like image 281
Gui-Don Avatar asked Oct 30 '15 00:10

Gui-Don


Video Answer


2 Answers

I found out that Doctrine logs all SQLs during execute. I recommend to disable it with code below, it can really save memory:

use Doctrine\ORM\EntityManagerInterface;


public function __construct(EntityManagerInterface $entity_manager)
{
    $em_connection = $entity_manager->getConnection();
    $em_connection->getConfiguration()->setSQLLogger(null);
}
like image 62
Karol Gasienica Avatar answered Oct 29 '22 10:10

Karol Gasienica


I resolved my problem using this resource, as @Axalix suggested.

This is how I modified the code:

// IMPORTANT - Disable the Doctrine SQL Logger
$manager->getConnection()->getConfiguration()->setSQLLogger(null);

// SUGGESION - make getData as a generator (using yield) to to save more memory.
while ($data = getData()) {
  $iteration++;

  $obj = new EntityObject();
  $obj->setName('henry');
  // Fill object...

  $manager->persist($obj);

  // IMPORTANT - Temporary store entities (of course, must be defined first outside of the loop)
  $tempObjets[] = $obj;

  if ($iteration % 500 == 0) {
    $manager->flush();

    // IMPORTANT - clean entities
    foreach($tempObjets as $tempObject) {
      $manager->detach($tempObject);
    }

    $tempObjets = null;
    gc_enable();
    gc_collect_cycles();
  }
}

// Do not forget the last flush
$manager->flush();

And, last but not least, as I use this script with Symfony data fixtures, adding the --no-debug parameter in the command is also very important. Then memory consumption is stable.

like image 22
Gui-Don Avatar answered Oct 29 '22 12:10

Gui-Don