
Doctrine: inserting many rows of data

I'm working on an import of 300,000 rows spread across many CSV files.

First I take each CSV and import every row into a table in my database.

Then I want to parse all of the rows and insert the data into the right table, together with its relations.

So I tried this:

    $qb = $this->entityManager->createQueryBuilder();
    $flows = $qb->select('flow')
        ->from('AppBundle:FlowAndata', 'flow')
        ->getQuery()
        ->getResult();

    $countRows = 0;
    foreach ($flows as $row) {
         //some check 
         $entity = new TestTable();
         $entity->setCode($row->getCode());
         //many other fields
         $this->entityManager->persist($entity);
         $this->entityManager->flush();
    }

In this case the whole procedure took about 5 seconds for every row!

Now, if I add setMaxResults like this:

    $qb = $this->entityManager->createQueryBuilder();
    $flows = $qb->select('flow')
        ->from('AppBundle:FlowAndata', 'flow')
        ->setMaxResults(100)
        ->getQuery()
        ->getResult();

It took less than 1 second!

So I thought about fetching all the rows and splitting the work into chunks with setFirstResult and setMaxResults, like this:

    $qb = $this->entityManager->createQueryBuilder();
    $flows = $qb->select('flow')
        ->from('AppBundle:FlowAndata', 'flow')
        ->getQuery()
        ->getResult();

    $countFlows = count($flows);
    $numberOfQueries = ceil($countFlows / 100);

    for ($i = 0; $i < $numberOfQueries; $i++) {
        $this->entityManager->clear();
        $qb = $this->entityManager->createQueryBuilder();
        $flows = $qb->select('flow')
            ->from('AppBundle:FlowAndata', 'flow')
            ->setFirstResult($i * 100)
            ->setMaxResults(100)
            ->getQuery()
            ->getResult();

        // ... process this batch of 100 rows ...
    }

This way I run many queries, each fetching 100 rows. Is this good practice, or is there a better way to parse many rows and insert them?

asked Jul 14 '17 by Alessandro Minoccheri

1 Answer

The efficient way recommended by the official Doctrine documentation is to take advantage of the transactional write-behind behavior of the EntityManager.

Iterating Large Results for Data-Processing

You can use the iterate() method to iterate over a large result when you have no UPDATE or DELETE intention. The IterableResult instance returned from $query->iterate() implements the Iterator interface, so you can process a large result without memory problems using the following approach. (See example)
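
A minimal sketch of that iteration pattern, reusing the AppBundle:FlowAndata entity from the question (read-only here, so each hydrated object is detached as soon as it has been processed):

    $q = $this->_em->createQuery('SELECT f FROM AppBundle:FlowAndata f');
    $iterableResult = $q->iterate();

    foreach ($iterableResult as $row) {
        // the hydrated entity is always at index 0 of the row
        $flow = $row[0];

        // ... read-only processing of $flow goes here ...

        // detach from Doctrine so the object can be garbage-collected immediately
        $this->_em->detach($flow);
    }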

Bulk Inserts

Bulk inserts in Doctrine are best performed in batches, taking advantage of the transactional write-behind behavior of an EntityManager. [...] You may need to experiment with the batch size to find the size that works best for you. Larger batch sizes mean more prepared statement reuse internally but also mean more work during flush. (See example)
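
A minimal sketch of the batch-insert pattern, assuming the TestTable entity from the question and an arbitrary batch size of 100 (tune it for your data and memory limits):

    $batchSize = 100;

    for ($i = 1; $i <= 300000; $i++) {
        $entity = new TestTable();
        $entity->setCode('code_' . $i); // placeholder value, just for illustration
        // ... set the other fields ...
        $this->_em->persist($entity);

        if (($i % $batchSize) === 0) {
            $this->_em->flush(); // execute all queued INSERTs in one go
            $this->_em->clear(); // detach the entities so memory usage stays flat
        }
    }

    $this->_em->flush(); // persist the final, partial batch
    $this->_em->clear();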

A version mixing both techniques (inside an entity repository):

    $q = $this->_em->createQuery('SELECT f FROM AppBundle:FlowAndata f');
    $iterableResult = $q->iterate();

    $i = 0;
    $batchSize = 100;

    foreach ($iterableResult as $row) {
        // do stuff with the data in the row, $row[0] is always the object
        /** @var AppBundle\Entity\FlowAndata $flow */
        $flow = $row[0];

        // some check
        $entity = new TestTable();
        $entity->setCode($flow->getCode()); // read from the hydrated entity, not the result row
        // many other fields

        $this->_em->persist($entity);

        $i++;
        if (($i % $batchSize) === 0) {
            $this->_em->flush();
            // Detaches all objects from Doctrine!
            $this->_em->clear();
        } else {
            // detach from Doctrine, so that it can be garbage-collected immediately
            $this->_em->detach($flow);
        }
    }

    $this->_em->flush(); // Persist objects that did not make up an entire batch
    $this->_em->clear();
answered Nov 18 '22 by yceruto