Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL - Filtering large tables with joins - best practices

I have a table with a lot of data and I need to join it with some other large tables.

Only a small portion of my table is actually relevant for me each time.

When is it best to filter my data?

  1. In the where clause of the SQL.

  2. Create a temp table with the specific data and only then join it.

  3. Add the predicate to the first inner join ON clause.

  4. Some other idea.

1.

Select * 
From RealyBigTable
Inner Join AnotherBigTable On …
Inner Join YetAnotherBigTable On …
Where RealyBigTable.Type = ?

2.

Select * 
Into #temp
From RealyBigTable
Where RealyBigTable.Type = ?

Select * 
From #temp
Inner Join AnotherBigTable On …
Inner Join YetAnotherBigTable On …

3.

Select * 
From RealyBigTable
Inner Join AnotherBigTable On RealyBigTable.type = ? And … 
Inner Join YetAnotherBigTable On …

Another question: What happens first? Join or Where?

like image 508
Gil Stal Avatar asked Mar 31 '11 08:03

Gil Stal


2 Answers

Because you are using INNER JOINs the WHERE or JOIN debate only depends on your taste and style. Personally, I like to keep the links between the two tables (e.g. foreign key constraint) in the ON clause, and actual filters against data in the WHERE clause.

SQL Server will parse the query into the same token tree, and will therefore build identical query execution plans.

If you were using [LEFT/RIGHT] OUTER JOINS instead, it makes a world of difference since not only is the performance probably different, but also very likely the results.


To answer your other questions:

When is it best to filter my data?

  1. In the where clause of the SQL.
  2. Create a temp table with the specific data and only then join it.
  3. Add the predicate to the first inner join ON clause.
  4. Some other idea.

In the WHERE or ON clause, both are seen as the same. For 3, the "first inner join" has no relevance. In a multi-table INNER JOIN scenario, it really doesn't matter which goes first (in the query), as the query optimizer will shuffle the order as it sees fit.

Using a temp table is completely unnecessary and won't help, because you are having to extract the relevant portion anyway - which is what a JOIN would do as well. Moreover, if you had a good index on the JOIN conditions/WHERE filter, the index will be used to only visit the relevant data without looking at the rest of the table(s).

like image 86
RichardTheKiwi Avatar answered Oct 04 '22 13:10

RichardTheKiwi


You should put your query in the management studio, tick "include actual execution plan", and run it. That way you will get the exact answer what SQL server did with your query. From then, you can move forward with optimization.

In general:

  • The columns used for join should be indexed
  • Use the most discriminating filter first
like image 43
Endy Tjahjono Avatar answered Oct 04 '22 11:10

Endy Tjahjono