I'm working on an import of 300,000 rows spread across many CSV files.
First I take each CSV and import every row into a staging table in my database.
Then I want to parse all of those rows and insert the data into the right table, with its relations.
So I have tried this:
$qb = $this->entityManager->createQueryBuilder();
$flows = $qb->select('flow')
    ->from('AppBundle:FlowAndata', 'flow')
    ->getQuery()
    ->getResult();

$countRows = 0;
foreach ($flows as $row) {
    // some check
    $entity = new TestTable();
    $entity->setCode($row->getCode());
    // many other fields
    $this->entityManager->persist($entity);
    $this->entityManager->flush();
}
In this case the whole procedure took about 5 seconds for every row!
Now, if I add setMaxResults like this:
$qb = $this->entityManager->createQueryBuilder();
$flows = $qb->select('flow')
    ->from('AppBundle:FlowAndata', 'flow')
    ->setMaxResults(100)
    ->getQuery()
    ->getResult();
It took less than 1 second!
So I thought of getting all the rows and splitting the work into chunks of 100 with setFirstResult and setMaxResults, like this:
$qb = $this->entityManager->createQueryBuilder();
$flows = $qb->select('flow')
    ->from('AppBundle:FlowAndata', 'flow')
    ->getQuery()
    ->getResult();

$countFlows = count($flows);
$numberOfQuery = $countFlows / 100;
for ($i = 0; $i <= $numberOfQuery; $i++) {
    $this->entityManager->clear();
    $qb = $this->entityManager->createQueryBuilder();
    $flows = $qb->select('flow')
        ->from('AppBundle:FlowAndata', 'flow')
        ->setFirstResult($i * 100)
        ->setMaxResults(100)
        ->getQuery()
        ->getResult();
}
This way I run many queries of 100 rows each. Is this good practice, or is there a better way to parse many rows and insert them?
The efficient way recommended by the official Doctrine documentation is to take advantage of the transactional write-behind behavior of an EntityManager.
Iterating Large Results for Data-Processing
You can use the iterate() method just to iterate over a large result and no UPDATE or DELETE intention. The IterableResult instance returned from $query->iterate() implements the Iterator interface, so you can process a large result without memory problems using the following approach. (See example)
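A minimal sketch of that pattern for the entity from the question (read-only processing; the loop body is a placeholder):

$q = $this->_em->createQuery('SELECT f FROM AppBundle:FlowAndata f');
$iterableResult = $q->iterate();

foreach ($iterableResult as $row) {
    // $row[0] is always the hydrated entity
    /** @var AppBundle\Entity\FlowAndata $flow */
    $flow = $row[0];

    // ... read-only processing of $flow goes here ...

    // detach the entity so it can be garbage-collected immediately
    $this->_em->detach($flow);
}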
Bulk Inserts
Bulk inserts in Doctrine are best performed in batches, taking advantage of the transactional write-behind behavior of an EntityManager. [...] You may need to experiment with the batch size to find the size that works best for you. Larger batch sizes mean more prepared statement reuse internally but also mean more work during flush. (See example)
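And a minimal sketch of batched inserts for the TestTable entity from the question (the row count and the setCode() value are just placeholders):

$batchSize = 100;

for ($i = 1; $i <= 10000; $i++) {
    $entity = new TestTable();
    $entity->setCode('code_' . $i); // placeholder value
    // many other fields

    $this->_em->persist($entity);

    if (($i % $batchSize) === 0) {
        // execute the queued INSERTs and detach everything so memory stays flat
        $this->_em->flush();
        $this->_em->clear();
    }
}

$this->_em->flush(); // flush the remaining entities
$this->_em->clear();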
A version mixing both techniques (inside the entity repository):
$q = $this->_em->createQuery('SELECT f FROM AppBundle:FlowAndata f');
$iterableResult = $q->iterate();

$i = 0;
$batchSize = 100;

foreach ($iterableResult as $row) {
    // do stuff with the data in the row, $row[0] is always the object
    /** @var AppBundle\Entity\FlowAndata $flow */
    $flow = $row[0];

    // some check
    $entity = new TestTable();
    $entity->setCode($flow->getCode());
    // many other fields

    $this->_em->persist($entity);

    $i++;
    if (($i % $batchSize) === 0) {
        $this->_em->flush();
        // Detaches all objects from Doctrine!
        $this->_em->clear();
    } else {
        // Detach from Doctrine, so that it can be garbage-collected immediately
        $this->_em->detach($flow);
    }
}

$this->_em->flush(); // Persist objects that did not make up an entire batch
$this->_em->clear();
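If this lives in a custom repository method (say, a hypothetical importFlows()), you can call it from wherever the import runs, for example a console command:

// importFlows() is an assumed method name wrapping the loop above
$this->entityManager
    ->getRepository('AppBundle:FlowAndata')
    ->importFlows();

Keep in mind that clear() detaches every managed entity, so anything you still need from the unit of work has to be re-fetched afterwards.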