
Joins between two tables generating Cartesian product

I am reading the book "Inside Microsoft SQL Server 2008: T-SQL Querying", which says, by way of an example, that for any join between two tables the Cartesian product of the tables is produced first, then it is filtered by the ON condition, and then the "RIGHT", "LEFT" or "FULL" join type is applied.

Here is an example from that book:

SELECT C.customerid, COUNT(O.orderid) AS numorders
FROM dbo.Customers AS C
LEFT OUTER JOIN dbo.Orders AS O
ON C.customerid = O.customerid
-- GROUP BY is needed because customerid is selected alongside an aggregate
GROUP BY C.customerid

The Customers table has 4 rows and Orders has 7. So first the Cartesian product generates 4*7 = 28 rows, then these are filtered by the "ON" clause, and then LEFT OUTER is applied.
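The logical phases described above can be sketched in Python. This is only an illustration of the logical order: the rows are made-up sample data chosen to match the 4 and 7 row counts (they are not the book's actual rows), and all variable names are hypothetical.

```python
from itertools import product

# Hypothetical sample data: 4 customers and 7 orders as (orderid, customerid).
customers = ["A", "B", "C", "D"]
orders = [(1, "B"), (2, "B"), (3, "C"), (4, "C"), (5, "C"), (6, "D"), (7, None)]

# Logical phase 1: Cartesian product of the two inputs.
cartesian = list(product(customers, orders))

# Logical phase 2: keep only the pairs that satisfy the ON predicate.
# A NULL (None) customerid never matches any customer.
matched = [(c, o) for c, o in cartesian if o[1] is not None and c == o[1]]

# Logical phase 3: LEFT OUTER adds back customers that had no match,
# with NULLs (None) standing in for the order columns.
matched_customers = {c for c, _ in matched}
outer_rows = [(c, (None, None)) for c in customers if c not in matched_customers]
result = matched + outer_rows

# COUNT(O.orderid) per customer ignores the NULL order ids.
numorders = {
    c: sum(1 for cc, o in result if cc == c and o[0] is not None)
    for c in customers
}

print(len(cartesian))  # 28
print(len(matched))    # 6
print(len(result))     # 7
print(numorders)       # {'A': 0, 'B': 2, 'C': 3, 'D': 1}
```

Customer "A" has no orders in this sample, so it survives only because of the outer-row phase and ends up with a count of 0.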

Does that mean that, irrespective of the type of join I use, a Cartesian product is computed between the tables every time? If so, why do we see performance differences between different joins?

Zerotoinfinity asked Nov 05 '13 20:11


2 Answers

SQL Server certainly doesn't calculate the Cartesian product for every join and then filter it. What it does is take your SQL statement, with whatever join type you have specified (left, right, inner, ...), and then the optimizer decides which physical join operator to use, based on the statistics that are present on the tables.

There are 3 physical operators:

  • Nested loops join
  • Merge Join
  • Hash Join

All 3 have scenarios where they perform best (I'm not going to explain them here; there are loads of articles on each of these). Which one is used depends mostly on the cardinality estimate for each table involved in the join, that is, on how many rows the optimizer expects to get back according to the statistics.
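To give a feel for why the physical operator matters, here is a minimal Python sketch of two of these strategies applied to a left outer join. It is a simplification under stated assumptions, not SQL Server's implementation: the sample rows and function names are hypothetical, and a real engine chooses the build side, uses indexes, and handles memory in ways not modelled here.

```python
from collections import defaultdict

# Hypothetical sample data: 4 customers and 7 orders as (orderid, customerid).
customers = ["A", "B", "C", "D"]
orders = [(1, "B"), (2, "B"), (3, "C"), (4, "C"), (5, "C"), (6, "D"), (7, None)]


def nested_loops_left_join(outer, inner):
    """For each outer row, search the inner input for matching rows."""
    rows = []
    for c in outer:
        # This sketch scans the whole inner input; an engine would typically
        # use an index seek here when a suitable index exists.
        matches = [o for o in inner if o[1] is not None and o[1] == c]
        rows.extend((c, o) for o in matches)
        if not matches:
            rows.append((c, (None, None)))  # preserve the unmatched outer row
    return rows


def hash_left_join(probe, build):
    """Hash the build input once, then look up each probe row by key."""
    table = defaultdict(list)
    for o in build:
        if o[1] is not None:  # NULL keys can never match
            table[o[1]].append(o)
    rows = []
    for c in probe:
        matches = table.get(c, [])
        rows.extend((c, o) for o in matches)
        if not matches:
            rows.append((c, (None, None)))  # preserve the unmatched probe row
    return rows


# Both strategies return the same rows; only the amount of work differs.
print(nested_loops_left_join(customers, orders) == hash_left_join(customers, orders))  # True
```

The hash version reads each input once (7 + 4 rows) instead of pairing every customer with every order, and neither version materializes the 28-row Cartesian product.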

Craig Freedman has a great series of blog posts discussing how joins work in SQL Server, which are all here:

Joins - Craig Freedman

I would recommend looking at the bottom 5 articles in that list, which include an introduction to joins, a summary of join properties and then reasonably in depth information on each physical join operator.

steoleary answered Sep 21 '22 06:09


The statement

any joins between two tables first the Cartesian Product happen between them then it is getting filtered with the ON condition then by "RIGHT", "LEFT" or "FULL" join type.

is only a logical description of what is done. The result will be the same as this, but it will be implemented differently depending on what indexes you have and what data is in the tables.

Run SET SHOWPLAN_ALL ON and then submit a query: instead of executing it, SQL Server returns the execution plan, which explains how the data would be looked up. Hopefully the book will explain this as you get further into it.

mmmmmm answered Sep 23 '22 06:09