Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

sql joins as venn diagram

I've had trouble understanding joins in sql and came upon this image which I think might help me. The problem is that I don't fully understand it. For example, the join in the top right corner of the image, which colors the full B circle red and but only the overlap from A. The image makes it seem like circle B is the primary focus of the sql statement, but the sql statement itself, by starting with A (select from A, join B), conveys the opposite impression to me, namely that A would be the focus of the sql statement.

Similarly, the image below that only includes data from the B circle, so why is A included at all in the join statement?

Question: Working clockwise from the top right and finishing in the center, can someone provide more information about the representation of each sql image, explaining

a) why a join would be necessary in each case (for example, especially in situations where no data's taken from A or B i.e. where only A or B but not both is colored)

b) and any other detail that would clarify why the image is a good representation of the sql

sql join diagram

like image 446
BrainLikeADullPencil Avatar asked Dec 21 '12 21:12

BrainLikeADullPencil


People also ask

What are the 4 types of joins in SQL?

Four types of joins: left, right, inner, and outer.

What does (+) mean in SQL joins?

The plus sign is Oracle syntax for an outer join. There isn't a minus operator for joins. An outer join means return all rows from one table. Also return the rows from the outer joined where there's a match on the join key.


2 Answers

I agree with Cade about the limitations of Venn diagrams here. A more apposite visual representation might be this.

Tables

Tables

SELECT A.Colour, B.Colour FROM A CROSS JOIN B SQL Fiddle

The cross join (or cartesian product) produces a result with every combination of the rows from the two tables. Each table has 4 rows so this produces 16 rows in the result.

Cross Join

SELECT A.Colour, B.Colour FROM A INNER JOIN B ON A.Colour = B.Colour SQL Fiddle

The inner join logically returns all rows from the cross join that match the join condition. In this case five do.

Inner Join

SELECT A.Colour, B.Colour FROM A INNER JOIN B ON A.Colour NOT IN ('Green','Blue') SQL Fiddle

The inner join condition need not necessarily be an equality condition and it need not reference columns from both (or even either) of the tables. Evaluating A.Colour NOT IN ('Green','Blue') on each row of the cross join returns.

inner 2

An inner join condition of 1=1 would evaluate to true for every row in the cross join so the two are equivalent (SQL Fiddle).

SELECT A.Colour, B.Colour FROM A LEFT OUTER JOIN B ON A.Colour = B.Colour SQL Fiddle

Outer Joins are logically evaluated in the same way as inner joins except that if a row from the left table (for a left join) does not join with any rows from the right hand table at all it is preserved in the result with NULL values for the right hand columns.

LOJ

SELECT A.Colour, B.Colour FROM A LEFT OUTER JOIN B ON A.Colour = B.Colour WHERE B.Colour IS NULL SQL Fiddle

This simply restricts the previous result to only return the rows where B.Colour IS NULL. In this particular case these will be the rows that were preserved as they had no match in the right hand table and the query returns the single red row not matched in table B. This is known as an anti semi join.

It is important to select a column for the IS NULL test that is either not nullable or for which the join condition ensures that any NULL values will be excluded in order for this pattern to work correctly and avoid just bringing back rows which happen to have a NULL value for that column in addition to the un matched rows.

loj is null

SELECT A.Colour, B.Colour FROM A RIGHT OUTER JOIN B ON A.Colour = B.Colour SQL Fiddle

Right outer joins act similarly to left outer joins except they preserve non matching rows from the right table and null extend the left hand columns.

ROJ

SELECT A.Colour, B.Colour FROM A FULL OUTER JOIN B ON A.Colour = B.Colour SQL Fiddle

Full outer joins combine the behaviour of left and right joins and preserve the non matching rows from both the left and the right tables.

FOJ

like image 51
Martin Smith Avatar answered Sep 21 '22 03:09

Martin Smith


I think your main underlying confusion is that when (for example) only A is highlighted in red, you're taking that to mean "the query only returns data from A", but in fact it means "the query only returns data for those cases where A has a record". The query might still contain data from B. (For cases where B does not have a record, the query will substitute NULL.)

Similarly, the image below that only includes data from the B circle, so why is A included at all in the join statement?

If you mean — the image where A is entirely in white, and there's a red crescent-shape for the part of B that doesn't overlap with A, then: the reason that A appears in the query is, A is how it finds the records in B that need to be excluded. (If A didn't appear in the query, then Venn diagram wouldn't have A, it would only show B, and there'd be no way to distinguish the desired records from the unwanted ones.)

The image makes it seem like circle B is the primary focus of the sql statement, but the sql statement itself, by starting with A (select from A, join B), conveys the opposite impression to me, namely that A would be the focus of the sql statement.

Quite right. For this reason, RIGHT JOINs are relatively uncommon; although a query that uses a LEFT JOIN can nearly always be re-ordered to use a RIGHT JOIN instead (and vice versa), usually people will write their queries with LEFT JOIN and not with RIGHT JOIN.

like image 22
ruakh Avatar answered Sep 18 '22 03:09

ruakh