There are similar questions to this, but I don't think anyone has asked this particular question.
Scenario:
Customer - Order (where Order has a CustomerID) - OrderPart - Part
I want a query that returns a customer with all its orders and each order with its parts.
Now I have two main choices:
The question:
Most advice and examples on ORMs suggest using option 2 and I can see why. However, option 2 will potentially be sending back a huge amount of duplicated data, eg:
Option 1 results (3 queries):
ID Name Country 1 Customer1 UK ID Name 1 Order1 2 Order2 ID Name 1 Part1 2 Part2 3 Part3
Option 2 results (1 query):
ID Name Country ID Name ID Name 1 Customer1 UK 1 Order1 1 Part1 1 Customer1 UK 1 Order1 2 Part2 1 Customer1 UK 1 Order1 3 Part3 1 Customer1 UK 2 Order2 1 Part1 1 Customer1 UK 2 Order2 2 Part2
Option 1 sends back 13 fields with 3 queries. Option 2 sends back 42 fields in 1 query. Now imagine Customer table has 30 fields and Orders have more complex sub joins, the data duplication can quickly become huge.
What impact on overall performance do the following things have:
Is option 2 always the best choice, option 1 the best choice or does it depend on the situation? If it depends, what criteria should you use to determine? Are any ORMs clever enough to work it out for themselves?
The advantage of a join includes that it executes faster. The retrieval time of the query using joins almost always will be faster than that of a subquery. By using joins, you can maximize the calculation burden on the database i.e., instead of multiple queries using one join query.
Basically, join order DOES matter because if we can join two tables that will reduce the number of rows needed to be processed by subsequent steps, then our performance will improve.
Detecting the Problem If you see a bunch of similar queries, this often indicates an N+1 query issue (or a similar kind of query batching problem). Restructuring code so you can run a single query to fetch all the data at once will always improve the performance of the page.
I won't leave you in suspense, between Joins and Subqueries, joins tend to execute faster. In fact, query retrieval time using joins will almost always outperform one that employs a subquery. The reason is that joins mitigate the processing burden on the database by replacing multiple queries with one join query.
Very little if they are on the same subnet, which they usually are. If they're not then this is still not a huge overhead and can be overcome with caching, which most ORMs have (NHibernate has 1st and 2nd level caching).
For SELECT N+1
this will obviously be longer as it will have to send the select statement each time, which might be up to 1k long. It will also have to grab a new connection from the pool. Chatty versus chunky use to be an argument around 2002-2003 but now it really doesn't make a huge difference unless this is a really big application, in which case you will probably want a more experienced (or better paid) pundit giving his views - i.e. a consultant.
I would favour joins however, as databases will be optimised for this usage over their 10 or more years of development. If performance is really slow a View can sort this out, or Stored Procedure.
By the way, SELECT N+1
is probably the commonest performance problem people experience with NHibernate when they first start using it (including me), and is something that actually takes tweaking to sort out. This is because NHibernate is to ORMs what C++ is to languages.
An extra SELECT
statement for every Customer
will eventually build up to however many Customer
objects * Orders
. So for a large system this might be noticeable - but as I mentioned, ORMs usually have caching mechanisms in place to negate this problem. The amount of SELECT
statements also isn't going to be that huge considering:
a great deal of this is going to depend on the amount of data you are going through. The join, while returning more fields, is going to run markedly faster (as a rule) than the Option 1 set of queries. From my personal experience, slow-downs are almost always at that level, the actual running of the query, not the sheer amount of data being passed along whatever pipe you have.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With