I am trying to insert (in a mySQL database) datas from a "large" CSV file (3Mo / 37000 lines / 7 columns) using doctrine data fixtures.
The process is very slow and at this time I could not succeed (may be I had to wait a little bit more).
I suppose that doctrine data fixtures are not intended to manage such amount of datas ? Maybe the solution should be to import directly my csv into database ?
Any idea of how to proceed ?
Here is the code :
<?php
namespace FBN\GuideBundle\DataFixtures\ORM;
use Doctrine\Common\DataFixtures\AbstractFixture;
use Doctrine\Common\DataFixtures\OrderedFixtureInterface;
use Doctrine\Common\Persistence\ObjectManager;
use FBN\GuideBundle\Entity\CoordinatesFRCity as CoordFRCity;
class CoordinatesFRCity extends AbstractFixture implements OrderedFixtureInterface
{
public function load(ObjectManager $manager)
{
$csv = fopen(dirname(__FILE__).'/Resources/Coordinates/CoordinatesFRCity.csv', 'r');
$i = 0;
while (!feof($csv)) {
$line = fgetcsv($csv);
$coordinatesfrcity[$i] = new CoordFRCity();
$coordinatesfrcity[$i]->setAreaPre2016($line[0]);
$coordinatesfrcity[$i]->setAreaPost2016($line[1]);
$coordinatesfrcity[$i]->setDeptNum($line[2]);
$coordinatesfrcity[$i]->setDeptName($line[3]);
$coordinatesfrcity[$i]->setdistrict($line[4]);
$coordinatesfrcity[$i]->setpostCode($line[5]);
$coordinatesfrcity[$i]->setCity($line[6]);
$manager->persist($coordinatesfrcity[$i]);
$this->addReference('coordinatesfrcity-'.$i, $coordinatesfrcity[$i]);
$i = $i + 1;
}
fclose($csv);
$manager->flush();
}
public function getOrder()
{
return 1;
}
}
Two rules to follow when you create big batch imports like this:
Disable SQL Logging: ($manager->getConnection()->getConfiguration()->setSQLLogger(null);
) to avoid huge memory loss.
Flush and clear frequently instead of only once at the end. I suggest you add if ($i % 25 == 0) { $manager->flush(); $manager->clear() }
inside your loop, to flush every 25 INSERTs.
EDIT: One last thing I forgot: don't keep your entities inside variables when you don't need them anymore. Here, in your loop, you only need the current entity that is being processed, so don't store previous entity in a $coordinatesfrcity
array. This might lead you to memory overflow if you keep doing that.
There is a great example in the Docs: http://doctrine-orm.readthedocs.org/projects/doctrine-orm/en/latest/reference/batch-processing.html
Use a modulo (x % y) expression to implement batch processing, this example will insert 20 at a time. You may be able to optimise this depending on your server.
$batchSize = 20;
for ($i = 1; $i <= 10000; ++$i) {
$user = new CmsUser;
$user->setStatus('user');
$user->setUsername('user' . $i);
$user->setName('Mr.Smith-' . $i);
$em->persist($user);
if (($i % $batchSize) === 0) {
$em->flush();
$em->clear(); // Detaches all objects from Doctrine!
}
}
$em->flush(); //Persist objects that did not make up an entire batch
$em->clear();
For fixtures which need lots of memory but don't depend on each other, I get around this problem by using the append flag to insert one entity (or smaller group of entities) at a time:
bin/console doctrine:fixtures:load --fixtures="memory_hungry_fixture.file" --append
Then I write a Bash script which runs that command as many times as I need.
In your case, you could extend the Fixtures command and have a flag which does batches of entities - the first 1000 rows, then the 2nd 1000, etc.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With