I'm looking to add some utility methods to help avoid a lot of n+1 issues in a legacy application.
The common pattern is this:
select a.* /* over 10 columns */
from [table-A] a
where /* something */
Retrieved into a collection of ClassA
record instances
Then the sub-instances are lazy retrieved:
select b.* /* over 10 columns */
from [sub-table-B] b
where b.ParentId = @ClassA_ID
This results in an n+1 selects issue. Mostly this isn't a major problem as only a couple of ClassA
instances are being retrieved on an infrequently hit page, but in an increasing number of places this n+1 issue causes the pages to become too slow as the application has scaled.
I'm looking to replace a part of the existing data access code of this application so that the ClassA
instances and ClassB
instances are retrieved together.
I think there are 3 ways this could be done:
1) Get the ClassA
instances as we do now, then get the ClassB
instances in one aggregated call:
select b.*
from [sub-table-B] b
where b.ParentId in ( /* list of parent IDs */ )
This is two separate DB calls, and the query plan of the dynamic SQL will not be cacheable (due to the list of IDs).
2) Get the ClassB
instances with a sub query:
select b.*
from [sub-table-B] b
inner join [table-A] a
on b.ParentId = a.[ID]
where /* something */
This is also two DB calls, and query against [table-A]
has to be evaluated twice.
3) Get all together and de-dupicate the ClassA
instances:
select a.*, b.*
from [table-A] a
left outer join [sub-table-B] b
on a.[ID] = b.ParentId
where /* something */
This is just one DB call, but now we get the contents of [table-A]
repeated - the result set will be larger and the time sending the data from the DB to the client will be more.
So really this is 3 possible compromises:
I can test these three patterns for any one parent-child pair of tables, but I have loads of them. What I want to know is which pattern is consistently quicker? More importantly why? Is one of these compromises an obvious performance-killer?
What do existing mechanisms like Linq, EF and NHibernate use?
Is there a 4th way that's better than all 3?
I think EF and L2S use your third approach - there is definitly just one db call.
Normally more db roundtrips take more time than less db roundtrips with bigger resultsets.
Maybe there are some edge cases where you have massive data in table A and the bigger resultset increases transfer time to the clien too much.
But thats mainly a question of latency and bandwith between db server and client.
A 4th way might be to write a stored proc which returns more than one resultset. One for each table you query with just the records you need. That fits to your 1st approach but reduced to one roundtrip. But that would complicate things a bit and is not as flexible as the other approaches.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With