Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Which is the fastest performing way to avoid n+1 issues and why?

I'm looking to add some utility methods to help avoid a lot of n+1 issues in a legacy application.

The common pattern is this:

select a.* /* over 10 columns */
from [table-A] a
where /* something */

Retrieved into a collection of ClassA record instances

Then the sub-instances are lazy retrieved:

select b.* /* over 10 columns */
from [sub-table-B] b
where b.ParentId = @ClassA_ID

This results in an n+1 selects issue. Mostly this isn't a major problem as only a couple of ClassA instances are being retrieved on an infrequently hit page, but in an increasing number of places this n+1 issue causes the pages to become too slow as the application has scaled.

I'm looking to replace a part of the existing data access code of this application so that the ClassA instances and ClassB instances are retrieved together.

I think there are 3 ways this could be done:

1) Get the ClassA instances as we do now, then get the ClassB instances in one aggregated call:

select b.*
from [sub-table-B] b
where b.ParentId in ( /* list of parent IDs */ )

This is two separate DB calls, and the query plan of the dynamic SQL will not be cacheable (due to the list of IDs).

2) Get the ClassB instances with a sub query:

select b.*
from [sub-table-B] b
    inner join [table-A] a
        on b.ParentId = a.[ID]
where /* something */

This is also two DB calls, and query against [table-A] has to be evaluated twice.

3) Get all together and de-dupicate the ClassA instances:

select a.*, b.*
from [table-A] a
    left outer join [sub-table-B] b 
        on a.[ID] = b.ParentId
where /* something */

This is just one DB call, but now we get the contents of [table-A] repeated - the result set will be larger and the time sending the data from the DB to the client will be more.

So really this is 3 possible compromises:

  1. 2 DB calls, no query caching
  2. 2 DB calls, complex query evaluated twice
  3. 1 DB call, significantly larger result set

I can test these three patterns for any one parent-child pair of tables, but I have loads of them. What I want to know is which pattern is consistently quicker? More importantly why? Is one of these compromises an obvious performance-killer?

What do existing mechanisms like Linq, EF and NHibernate use?

Is there a 4th way that's better than all 3?

like image 427
Keith Avatar asked Nov 13 '22 16:11

Keith


1 Answers

I think EF and L2S use your third approach - there is definitly just one db call.

Normally more db roundtrips take more time than less db roundtrips with bigger resultsets.

Maybe there are some edge cases where you have massive data in table A and the bigger resultset increases transfer time to the clien too much.

But thats mainly a question of latency and bandwith between db server and client.

A 4th way might be to write a stored proc which returns more than one resultset. One for each table you query with just the records you need. That fits to your 1st approach but reduced to one roundtrip. But that would complicate things a bit and is not as flexible as the other approaches.

like image 102
Jan Avatar answered Jan 05 '23 16:01

Jan