Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL WHERE on a large table -> Join to small table first or put FK directly in the WHERE clause?

What is better in a WHERE clause?

I have a large table, with a FK to a small table. I can search on the FK directly, or I can join to the FK table and set the WHERE restriction on the joined table. What is better / preferable?

So this:

SELECT lt.* FROM LargeTable lt 
WHERE lt.SomeId in(12,55)

Or this:

SELECT lt.* FROM LargeTable lt 
INNER JOIN SmallTable st ON lt.SomeId=st.ItemId
WHERE st.Id in(12,55)


I tested this with Set statistics time on, but I didn't expect this as a result. Who can explain what happens here?

First test without join:

(946 row(s) affected)
 SQL Server Execution Times:
   CPU time = 1544 ms,  elapsed time = 1580 ms.

Second test with the join

(946 row(s) affected)
 SQL Server Execution Times:
   CPU time = 2636 ms,  elapsed time = 366 ms.

EDIT: When i do SELECT Id instead of SELECT *, then the first query without join has a lower elapsed time, and the query cost in the execution plan is 25% for no join vs 75% for the query with the join.

like image 270
Erik Dekker Avatar asked Oct 10 '22 04:10

Erik Dekker


1 Answers

Based on your execution plans, both queries are essentially scanning every record in the entire large table... the second query is simply finding a small set of records from the small table before scanning the large table, which is why the relative cost is 50% for both.

I'd recommend considering an index on largeTable.SomeId, and then go with the first query:

SELECT lt.* FROM LargeTable lt 
WHERE lt.SomeId in(12,55)

EDIT:

So the big question is why did the query with the join have a shorter duration than the query without the join.

I think Martin Smith gave the answer to this question:

Your second one gets a parallel plan the first one doesn't

You'll notice that your first query had a shorter CPU time, but a longer elapsed time. To roughly summarize, your first query took less effort for the server to complete, but your second query used a parallel plan, and enlisted multiple processors to perform the query, so it took less time to complete, but more overall effort.

like image 174
Michael Fredrickson Avatar answered Oct 15 '22 00:10

Michael Fredrickson