Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

ORM Select n + 1 performance; join or no join

There are similar questions to this, but I don't think anyone has asked this particular question.

Scenario:

Customer - Order (where Order has a CustomerID) - OrderPart - Part

I want a query that returns a customer with all its orders and each order with its parts.

Now I have two main choices:

  1. Use a nested loop (which produces separate queries)
  2. Use data loading options (which produces a single query join)

The question:

Most advice and examples on ORMs suggest using option 2 and I can see why. However, option 2 will potentially be sending back a huge amount of duplicated data, eg:

Option 1 results (3 queries):

ID  Name       Country
1   Customer1  UK

ID  Name
1   Order1
2   Order2

ID  Name
1   Part1
2   Part2
3   Part3

Option 2 results (1 query):

ID  Name       Country  ID  Name    ID Name
1   Customer1  UK       1   Order1  1  Part1
1   Customer1  UK       1   Order1  2  Part2
1   Customer1  UK       1   Order1  3  Part3
1   Customer1  UK       2   Order2  1  Part1
1   Customer1  UK       2   Order2  2  Part2

Option 1 sends back 13 fields with 3 queries. Option 2 sends back 42 fields in 1 query. Now imagine Customer table has 30 fields and Orders have more complex sub joins, the data duplication can quickly become huge.

What impact on overall performance do the following things have:

  • Overhead of making a database connection
  • Time taken to send data (potentially across network if on different server)
  • Bandwidth

Is option 2 always the best choice, option 1 the best choice or does it depend on the situation? If it depends, what criteria should you use to determine? Are any ORMs clever enough to work it out for themselves?

like image 403
JontyMC Avatar asked Sep 02 '09 17:09

JontyMC


People also ask

Is join faster than select?

The advantage of a join includes that it executes faster. The retrieval time of the query using joins almost always will be faster than that of a subquery. By using joins, you can maximize the calculation burden on the database i.e., instead of multiple queries using one join query.

Do joins affect performance?

Basically, join order DOES matter because if we can join two tables that will reduce the number of rows needed to be processed by subsequent steps, then our performance will improve.

How can n 1 problems be prevented?

Detecting the Problem If you see a bunch of similar queries, this often indicates an N+1 query issue (or a similar kind of query batching problem). Restructuring code so you can run a single query to fetch all the data at once will always improve the performance of the page.

Are joins faster than separate queries?

I won't leave you in suspense, between Joins and Subqueries, joins tend to execute faster. In fact, query retrieval time using joins will almost always outperform one that employs a subquery. The reason is that joins mitigate the processing burden on the database by replacing multiple queries with one join query.


2 Answers

Overhead of making a database connection

Very little if they are on the same subnet, which they usually are. If they're not then this is still not a huge overhead and can be overcome with caching, which most ORMs have (NHibernate has 1st and 2nd level caching).

Time taken to send data (potentially across network if on different server)

For SELECT N+1 this will obviously be longer as it will have to send the select statement each time, which might be up to 1k long. It will also have to grab a new connection from the pool. Chatty versus chunky use to be an argument around 2002-2003 but now it really doesn't make a huge difference unless this is a really big application, in which case you will probably want a more experienced (or better paid) pundit giving his views - i.e. a consultant.

I would favour joins however, as databases will be optimised for this usage over their 10 or more years of development. If performance is really slow a View can sort this out, or Stored Procedure.

By the way, SELECT N+1 is probably the commonest performance problem people experience with NHibernate when they first start using it (including me), and is something that actually takes tweaking to sort out. This is because NHibernate is to ORMs what C++ is to languages.

Bandwidth

An extra SELECT statement for every Customer will eventually build up to however many Customer objects * Orders. So for a large system this might be noticeable - but as I mentioned, ORMs usually have caching mechanisms in place to negate this problem. The amount of SELECT statements also isn't going to be that huge considering:

  • You're on the same network as the SQL server most of the time
  • The increased amount of bytes account for about an extra 0.5-50k of extra bandwidth? Think how fast that is on most servers.
like image 108
Chris S Avatar answered Oct 10 '22 05:10

Chris S


a great deal of this is going to depend on the amount of data you are going through. The join, while returning more fields, is going to run markedly faster (as a rule) than the Option 1 set of queries. From my personal experience, slow-downs are almost always at that level, the actual running of the query, not the sheer amount of data being passed along whatever pipe you have.

like image 38
moleboy Avatar answered Oct 10 '22 03:10

moleboy