ORM Select n + 1 performance; join or no join

Tags:

There are similar questions to this, but I don't think anyone has asked this particular question.

Scenario:

Customer - Order (where Order has a CustomerID) - OrderPart - Part

I want a query that returns a customer with all its orders and each order with its parts.

Now I have two main choices:

Use a nested loop (which produces separate queries)
Use data loading options (which produces a single query join)

The question:

Most advice and examples on ORMs suggest using option 2 and I can see why. However, option 2 will potentially be sending back a huge amount of duplicated data, eg:

Option 1 results (3 queries):

ID  Name       Country
1   Customer1  UK

ID  Name
1   Order1
2   Order2

ID  Name
1   Part1
2   Part2
3   Part3

Option 2 results (1 query):

ID  Name       Country  ID  Name    ID Name
1   Customer1  UK       1   Order1  1  Part1
1   Customer1  UK       1   Order1  2  Part2
1   Customer1  UK       1   Order1  3  Part3
1   Customer1  UK       2   Order2  1  Part1
1   Customer1  UK       2   Order2  2  Part2

Option 1 sends back 13 fields with 3 queries. Option 2 sends back 42 fields in 1 query. Now imagine Customer table has 30 fields and Orders have more complex sub joins, the data duplication can quickly become huge.

What impact on overall performance do the following things have:

Overhead of making a database connection
Time taken to send data (potentially across network if on different server)
Bandwidth

Is option 2 always the best choice, option 1 the best choice or does it depend on the situation? If it depends, what criteria should you use to determine? Are any ORMs clever enough to work it out for themselves?

403

asked Sep 02 '09 17:09

JontyMC

2 Answers

Overhead of making a database connection

Very little if they are on the same subnet, which they usually are. If they're not then this is still not a huge overhead and can be overcome with caching, which most ORMs have (NHibernate has 1st and 2nd level caching).

Time taken to send data (potentially across network if on different server)

For SELECT N+1 this will obviously be longer as it will have to send the select statement each time, which might be up to 1k long. It will also have to grab a new connection from the pool. Chatty versus chunky use to be an argument around 2002-2003 but now it really doesn't make a huge difference unless this is a really big application, in which case you will probably want a more experienced (or better paid) pundit giving his views - i.e. a consultant.

I would favour joins however, as databases will be optimised for this usage over their 10 or more years of development. If performance is really slow a View can sort this out, or Stored Procedure.

By the way, SELECT N+1 is probably the commonest performance problem people experience with NHibernate when they first start using it (including me), and is something that actually takes tweaking to sort out. This is because NHibernate is to ORMs what C++ is to languages.

Bandwidth

An extra SELECT statement for every Customer will eventually build up to however many Customer objects * Orders. So for a large system this might be noticeable - but as I mentioned, ORMs usually have caching mechanisms in place to negate this problem. The amount of SELECT statements also isn't going to be that huge considering:

You're on the same network as the SQL server most of the time
The increased amount of bytes account for about an extra 0.5-50k of extra bandwidth? Think how fast that is on most servers.

108

answered Oct 10 '22 05:10

Chris S

a great deal of this is going to depend on the amount of data you are going through. The join, while returning more fields, is going to run markedly faster (as a rule) than the Option 1 set of queries. From my personal experience, slow-downs are almost always at that level, the actual running of the query, not the sheer amount of data being passed along whatever pipe you have.

answered Oct 10 '22 03:10

moleboy

Related questions
                            
                                Can SQL Server Pivot without knowing the resulting column names?
                            
                                SQLBulkCopy or Bulk Insert
                            
                                Is MySQL naturally slow at this kind of query, or do I have it misconfigured?
                            
                                Is there a quick way to check if ANY column is NULL?
                            
                                Understanding MySQL Cursor Types
                            
                                SQL Server passing identifiers to stored procedures/dynamic SQL
                            
                                Do a dry-run of an Alembic upgrade
                            
                                Mysql: Create inline table within select statement?
                            
                                The database [dbName] is not accessible. (ObjectExplorer)
                            
                                SQL - Query - max(count())
                            
                                Code reuse and modularity in SQL
                            
                                CommandType.Text vs CommandType.StoredProcedure
                            
                                Php & Sql Injection - UTF8 POC
                            
                                Visual Studio Diff Window: How to Edit .sql Files
                            
                                SQL: Most Overdue pair of numbers?
                            
                                How to use time-series with Sqlite, with fast time-range queries?
                            
                                ANT sql task: How to run SQL and PL/SQL and notice execution failure?
                            
                                Return Stream from WCF service, using SqlFileStream
                            
                                What is the best way to store single non-repeating data to a database?
                            
                                Ordering distinct column values by (first value of) other column in aggregate function

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With