Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

doctrine2 with codeigniter foreign key insert

I've following database schema -

database schema

Now department, year and division tables are already filled with information.

I now need to insert student data. Student data is to be imported from xls file (importing and parsing part is done). As you can see in schema, columns from student_data table refers to year_id, department_di and division_id. So while inserting I need their ID field as xls has respective name values.

So I've to fetch respective ID depending upon column value for each student. So this introduces 3 queries to be fired for inserting one record in student table. Like this -

forloop(...):
     $studentData = new Entities\StudentData();

    $year =  $this->em->getRepository("Entities\Year")->findBy(array('year_name' => $this->year[$i]));
    $department =  $this->em->getRepository("Entities\Department")->findBy(array('department_name' => $this->branch[$i]));
    $division =  $this->em->getRepository("Entities\Division")->findBy(array('division_name'=>$this->division[$i]));

    $studentData->setYear($year[0]);
    $studentData->setDepartment($department[0]);
    $studentData->setDivision($division[0]);

    //other data
    .
    .
    .
    .
    .
    $this->em->persist($studentData);

endforloop();   

$this->em->flush();
$this->em->clear();

As you can see, I've to get ID withing loop for each dept, year and division. Suppose I'm importing 100 student list, so it ultimately runs 300 queries just to fetch those 3 ID fields.

Can I get IDs for year, department and division from their name directly while inserting data ? I'm new to doctrine to I don't know how to go about that.


Update If question is unclear please let me know. I can update it with more details or restructure it.

like image 712
SachinGutte Avatar asked Dec 08 '13 14:12

SachinGutte


2 Answers

Optimize

You could optimize your process without using Doctrine's result caches:

First create a map of years to their ids like so:

$yearsMap = array();

$q = $em->createQuery('SELECT y.id, y.year_name FROM Entities\Year y');

foreach ($q->getScalarResult() as $row) {
    $yearsMap[$row['year_name']] = $row['id'];
}

Also create a map of departments to their ids, and division to their ids. This will result in 3 (light) queries. The best place to put this code is in a (custom) repository.

Next you can run your loop, but "get" the actual entities like this:

$year       = $this->em->getReference('Entities\Year', $yearsMap[$this->year[$i]]);
$department = $this->em->getReference('Entities\Department', $departmentsMap[$this->branch[$i]]);
$division   = $this->em->getReference('Entities\Division', $divisionsMap[$this->division[$i]]);

I say "get", because getReference() actually creates a proxy (unless it was already loaded by the entity-manager, but in this case it probably isn't). That proxy won't be loaded yet, so no queries are performed here.

The rest of your code doesn't need changes.

Now when flush() is called, Doctrine will load each distinct year/department/division only once. This could still result in a few queries, depending on how many different years/departments/divisions are used. So if all 100 students use different years/departments/divisions, you'll end up with 403 queries (3 for the maps, 300 for loading proxies, 100 for inserting students). But if all 100 students use the same year/department/division, you'll end up with only 106 queries (3 for the maps, 3 for loading proxies, 100 for inserting students).

Optimize another way

Another way to go is to use the names you've gathered to fetch all the entities you need:

$q = $em->createQuery('SELECT y FROM Entities\Year y INDEX BY y.year_name WHERE y.year_name IN(:years)');
$q->setParameter('years', $yearNames);

$yearsMap = $q->getResult();

You now have all Year entities you need with only 1 query. You can do the same for departments and divisions.

Also note the INDEX BY in the DQL statement: This will make sure you'll get an array with year_name as key and the entity as value. You can use this straight away in your loop like so:

$year       = $yearsMap[$this->year[$i]];
$department = $departmentsMap[$this->branch[$i]];
$division   = $divisionsMap[$this->division[$i]];

The end result for 100 students will always be 103 queries (3 for the maps, 100 for inserting students).

Cache

When you need to run this loop relatively often and it strains the database, it's wise to use Doctrine's result cache. A couple of things to note though:

getReference() doesn't support result caches (yet), and result caches aren't used automatically. So I suggest you put something like this in a repository:

public function findOneYearByName($name)
{
    $q = $em->createQuery('SELECT y FROM Entities\Year y WHERE y.year_name = :year');
    $q->setParameter('year', $name);
    $q->useResultCache(true);

    return $q->getSingleResult();
}

You'd probably want to configure the result cache, see the docs about that.

Another note is that the result cache will cache the result fetched from the database, before it's hydrated. So even when using result caches, the actual entities are hydrated every time. I therefor still recommend to use maps, but implemented slightly different:

$yearsMap       = array();
$departmentsMap = array();
$divisionsMap   = array();

forloop (...):
    if (!isset($yearsMap[$this->year[$i]])) {
        $yearsMap[$this->year[$i]] = $this->em->getRepository('Entities\Year')->findOneYearByName($this->year[$i]);
    }

    if (!isset($departmentsMap[$this->branch[$i]])) {
        $departmentsMap[$this->branch[$i]] = $this->em->getRepository('Entities\Department')->findOneDepartmentByName($this->branch[$i]);
    }

    if (!isset($divisionsMap[$this->division[$i]])) {
        $divisionsMap[$this->division[$i]] = $this->em->getRepository('Entities\Division')->findOneDivisionByName($this->division[$i]);
    }

    $year       = $yearsMap[$this->year[$i]];
    $department = $departmentsMap[$this->branch[$i]];
    $division   = $divisionsMap[$this->division[$i]];

This will make sure each distinct year/department/division is hydrated only once.

PS: Using a result cache for "Optimize another way" won't work as efficient, because the names of the years/departments/divisions are likely to be different each time you run your loop. With every change of the names the queries change, and cached results can't be used.

DBAL

Can I get IDs for year, department and division from their name directly while inserting data?

You can, but you won't be using the ORM, but only DBAL. You basically do this:

$connection = $em->getConnection();
$statement  = $conn->executeQuery('insert query', array('parameter1', 'etc'));
$statement->execute();

I doubt this will be more efficient, because MySQL (or whatever vendor you're using) will still perform those 3 (sub)queries for every insert, they just don't "go over the wire". And you don't get any help from the ORM, like managing associations, etc.

Still, you can find everything on the subject here.

like image 120
Jasper N. Brouwer Avatar answered Oct 13 '22 01:10

Jasper N. Brouwer


Did you check whether it runs 300 queries? Because it most certainly shouldn't, unless all students have different years, departments and divisions, which seems highly unlikely. And if it were, the 300 queries would at least be required with or without Doctrine, barring other optimizations.

The good thing is, Doctrine is much more than just a fancy way of accessing objects - it's a complete database abstraction layer, providing many more services, such as a full-blown entity cache. The following line:

$year =  $this->em->getRepository("Entities\Year")->findBy(array('year_name' => $this->year[$i]));

This should, for a single given year, execute at most 1 query - after that the result is stored, fully hydrated, in Doctrine's internal caches inside the entity manager. And that is assuming you're using the stock MemoryCache, enabled by default if you didn't specify anything else, which only caches during the single request. If you install APC, Memcache, Memcached or even the FilesystemCache (pick one!) the results are likely cached throughout multiple requests.

So, in a nutshell, you're imagining a problem that isn't there, or easily alleviated with a few simple configuration calls. Unless we're talking about the hypothetical case where all years, departments and divisions are unique - then you are indeed triggering 300 queries. The problem in that case however isn't Doctrine's - it's just doing what you command it do, separately retrieving 300 unique objects. In that case, nobody's stopping you from writing some intelligent code yourself around Doctrine, like:

// Build cache before loop
$years = [];
foreach($this->em->getRepository("Entities\Year")->findAll() as $year)
  $years[$year->getYearName()] = $year;

// Now loop much faster because everything's already indexed
forloop(...) :
  $studentData = new Entities\StudentData();
  $studentData->setYear($years[$this->year[$i]]);
endforloop;

And all of a sudden you have 1 'expensive' query instead of a 100 slightly cheaper ones. Doctrine is a convenience to make a lot of DB-related coding a ton easier and more structured, it doesnt't prohibit smart performance-oriented coding like this. In the end you're still the coder and Doctrine is just one tool in your belt that you can wield as you see fit.

like image 28
Niels Keurentjes Avatar answered Oct 13 '22 01:10

Niels Keurentjes