SELECT FROM WHERE IN compared to SELECT FROM on multiple tables

Tags:

sql

I'm taking a database course at school. The teacher gave us a simple exercise based on the following schema:

Table Book:
    Column title (primary key)
    Column genre (one of: "romance", "polar", ...)

Table Author:
    Column title (foreign key on Book.title)
    Column name
    Primary key on (title, name)
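For anyone who wants to try the queries, the schema could be written as the following SQLite DDL, run here through Python's built-in `sqlite3` module. This is a sketch. The `TEXT` column types and the `create_schema` helper name are assumptions, since the exercise only names the columns and keys.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the Book/Author schema from the exercise (column types are assumed)."""
    # SQLite only enforces foreign keys when this pragma is on for the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE Book (
            title TEXT PRIMARY KEY,
            genre TEXT                  -- e.g. 'romance', 'polar', ...
        );
        CREATE TABLE Author (
            title TEXT REFERENCES Book(title),
            name  TEXT,
            PRIMARY KEY (title, name)   -- one row per (book, author) pair
        );
        """
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.execute("INSERT INTO Book VALUES ('T1', 'romance')")  # hypothetical row
    conn.execute("INSERT INTO Author VALUES ('T1', 'Alice')")  # hypothetical row
    print(conn.execute("SELECT * FROM Author").fetchall())     # [('T1', 'Alice')]
```

With the pragma enabled, inserting an `Author` row whose title has no matching `Book` row fails with `sqlite3.IntegrityError`, as does inserting the same `(title, name)` pair twice.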

Among the questions was the following one:

Write the query that returns the authors who have written romance books.

I proposed this answer:

select distinct name 
from Author where title in (select title from Book where genre = 'romance')

However the teacher said it was wrong, and that the correct answer was:

select distinct name 
from Book, Author 
where Book.title = Author.title 
  and genre = 'romance'

When I asked for an explanation, all I got was "If you had paid more attention in class, you would know why." Brilliant.

So, why is my answer incorrect? What exactly is the difference between these queries? What exactly do they do, on the DB engine level?

user703016 asked May 18 '12 11:05




2 Answers

So, why is my answer incorrect?

Your answer is correct.

My guess is that the teacher marked it as wrong because he/she wanted to practise joins with this question. But if that was the intent, it should have been part of the question.
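Both queries can be checked side by side on a tiny data set. Here is a sketch using Python's built-in `sqlite3` module; the sample rows and the `run_both` helper are made up for illustration. Because `Book.title` is the primary key, each `Author` row can match at most one `Book` row, so the join never duplicates an author row that the `IN` version would keep once. `DISTINCT` is needed in both queries only because one author may have written several romance books.

```python
import sqlite3

SUBSELECT = """
    SELECT DISTINCT name
    FROM Author
    WHERE title IN (SELECT title FROM Book WHERE genre = 'romance')
"""

JOIN = """
    SELECT DISTINCT name
    FROM Book, Author
    WHERE Book.title = Author.title
      AND genre = 'romance'
"""


def run_both():
    """Load hypothetical sample rows and return the sorted results of both queries."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Book (title TEXT PRIMARY KEY, genre TEXT);
        CREATE TABLE Author (title TEXT REFERENCES Book(title), name TEXT,
                             PRIMARY KEY (title, name));
        INSERT INTO Book VALUES ('T1', 'romance'), ('T2', 'romance'), ('T3', 'polar');
        INSERT INTO Author VALUES ('T1', 'Alice'), ('T2', 'Alice'),
                                  ('T2', 'Bob'), ('T3', 'Carol');
        """
    )
    sub = sorted(row[0] for row in conn.execute(SUBSELECT))
    join = sorted(row[0] for row in conn.execute(JOIN))
    return sub, join


if __name__ == "__main__":
    sub, join = run_both()
    print(sub)   # ['Alice', 'Bob']
    print(join)  # ['Alice', 'Bob']
```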

What exactly is the difference between these queries?

Technically they are indeed different. A DBMS with a simple query optimizer will evaluate the subselect differently from the join in your teacher's answer.
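To make the difference concrete, here is a toy, in-memory Python sketch of two strategies a simple engine might use: a nested-loop join for the teacher's query, and a semi-join (evaluate the inner query once into a set, then probe it for each author) for the `IN` version. It is an illustration under simplifying assumptions, not how any particular DBMS is implemented; the function names and sample rows are hypothetical.

```python
from typing import List, Set, Tuple

BookRow = Tuple[str, str]    # (title, genre)
AuthorRow = Tuple[str, str]  # (title, name)


def nested_loop_join(books: List[BookRow], authors: List[AuthorRow]) -> Set[str]:
    """Teacher's query: compare every (book, author) pair and keep the matches."""
    result = set()  # the set plays the role of DISTINCT
    for b_title, genre in books:
        for a_title, name in authors:
            if b_title == a_title and genre == "romance":
                result.add(name)
    return result


def semi_join(books: List[BookRow], authors: List[AuthorRow]) -> Set[str]:
    """Subselect version: evaluate the inner query once, then probe it per author."""
    romance_titles = {title for title, genre in books if genre == "romance"}
    return {name for title, name in authors if title in romance_titles}


if __name__ == "__main__":
    books = [("T1", "romance"), ("T2", "romance"), ("T3", "polar")]
    authors = [("T1", "Alice"), ("T2", "Alice"), ("T2", "Bob"), ("T3", "Carol")]
    print(sorted(nested_loop_join(books, authors)))  # ['Alice', 'Bob']
    print(sorted(semi_join(books, authors)))         # ['Alice', 'Bob']
```

The nested loop performs |Book| × |Author| comparisons, while the semi-join needs roughly |Book| + |Author| hash operations. A real optimizer can also pick a hash join for the join version (the Oracle plan shows a `HASH JOIN`), which is how both queries can end up with the same plan.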

I wouldn't be surprised if a DBMS with a good optimizer came up with the same execution plan for both queries.
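One way to check this on your own engine is to ask it for both plans. Here is a sketch using SQLite through Python's `sqlite3`. `EXPLAIN QUERY PLAN` returns one row per plan step, and the last column holds a human-readable description. The exact text depends on the SQLite version and the available statistics, so no particular output is assumed here. Other systems use `EXPLAIN` (PostgreSQL, MySQL) or `EXPLAIN PLAN FOR` (Oracle). The `query_plans` helper is hypothetical.

```python
import sqlite3

QUERIES = {
    "subselect": """SELECT DISTINCT name FROM Author
                    WHERE title IN (SELECT title FROM Book WHERE genre = 'romance')""",
    "join": """SELECT DISTINCT name FROM Book, Author
               WHERE Book.title = Author.title AND genre = 'romance'""",
}


def query_plans(conn: sqlite3.Connection) -> dict:
    """Return the EXPLAIN QUERY PLAN description lines for each query."""
    plans = {}
    for label, sql in QUERIES.items():
        rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
        plans[label] = [row[-1] for row in rows]  # last column is the 'detail' text
    return plans


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Book (title TEXT PRIMARY KEY, genre TEXT);
        CREATE TABLE Author (title TEXT REFERENCES Book(title), name TEXT,
                             PRIMARY KEY (title, name));
        """
    )
    for label, steps in query_plans(conn).items():
        print(label)
        for step in steps:
            print("   ", step)
```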

Edit

I created some test data with 50000 books, 50000 authors and 7 different genres (smaller numbers don't really make sense, as the optimizers then tend to simply read the whole table). The statement returns 7144 rows.
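The answer does not show how the test data was generated. Below is a hypothetical Python/`sqlite3` sketch of a generator with the same sizes (50000 books, 50000 authors, 7 genres). The genre names beyond "romance" and "polar", the random seed, and the way authors are spread over books are all assumptions, so it will not reproduce the 7144-row result. It only checks that both queries agree on a data set of that size.

```python
import random
import sqlite3

# 7 genres; the names after the first two are made-up placeholders.
GENRES = ["romance", "polar"] + [f"genre{i}" for i in range(3, 8)]


def build_test_db(n_books: int = 50_000, n_authors: int = 50_000,
                  seed: int = 42) -> sqlite3.Connection:
    """Fill an in-memory database with synthetic books and authors."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Book (title TEXT PRIMARY KEY, genre TEXT);
        CREATE TABLE Author (title TEXT REFERENCES Book(title), name TEXT,
                             PRIMARY KEY (title, name));
        """
    )
    books = [(f"book{i}", rng.choice(GENRES)) for i in range(n_books)]
    conn.executemany("INSERT INTO Book VALUES (?, ?)", books)
    # One Author row per author name, each attached to a random book (an assumption).
    authors = [(f"book{rng.randrange(n_books)}", f"author{i}") for i in range(n_authors)]
    conn.executemany("INSERT INTO Author VALUES (?, ?)", authors)
    conn.commit()
    return conn


def both_results(conn: sqlite3.Connection):
    """Run the subselect and the join version and return both sorted results."""
    sub = conn.execute(
        """SELECT DISTINCT name FROM Author
           WHERE title IN (SELECT title FROM Book WHERE genre = 'romance')"""
    ).fetchall()
    join = conn.execute(
        """SELECT DISTINCT name FROM Book, Author
           WHERE Book.title = Author.title AND genre = 'romance'"""
    ).fetchall()
    return sorted(sub), sorted(join)


if __name__ == "__main__":
    sub, join = both_results(build_test_db())
    print(len(sub), len(join), sub == join)
```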

PostgreSQL

The execution plans are nearly identical with some small change in the "join" method.

Here is the plan for the sub-select version: http://explain.depesz.com/s/eov
Here is the plan for the join version: http://explain.depesz.com/s/aTI

Surprisingly, the join version has a slightly higher cost value.

Oracle

Both plans are 100% identical:

--------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |  6815 |   399K|       |   273   (2)| 00:00:04 |
|   1 |  HASH UNIQUE        |        |  6815 |   399K|   464K|   273   (2)| 00:00:04 |
|*  2 |   HASH JOIN         |        |  6815 |   399K|       |   172   (2)| 00:00:03 |
|*  3 |    TABLE ACCESS FULL| BOOK   |  6815 |   166K|       |    69   (2)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| AUTHOR | 50000 |  1708K|       |   103   (1)| 00:00:02 |
--------------------------------------------------------------------------------------

Looking at the statistics when using autotrace, there is no difference whatsoever either. I didn't bother to create a trace file to analyze it, as I don't expect to see a difference there.

Things don't really change if an index on book.genre is added. Oracle sticks with the full table scan (even with 100000 rows), probably because the tables are not very wide and a lot of rows fit on a single page.

PostgreSQL does use the index for both statements but there is still no real difference between the plans.

a_horse_with_no_name answered Nov 01 '22 16:11


Both queries are valid and return the same.

Your teacher uses quite outdated (though still valid) join syntax, while you are using a construct that is less efficient in some databases (MySQL, for instance).

If I were your teacher, I would write the query as this:

SELECT  DISTINCT name
FROM    Book b
JOIN    Author a
ON      a.title = b.title
WHERE   b.genre = 'romance'

but would still accept both your query and your teacher's, unless the course was specifically about MySQL optimization.
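One practical argument for the explicit `JOIN ... ON` form is that the join condition sits right next to the join it belongs to. With the comma syntax, a forgotten `Book.title = Author.title` predicate still runs, but as a Cartesian product (every book paired with every author), and it can silently return wrong rows. Here is a small Python/`sqlite3` sketch on made-up rows; the `compare_forms` helper is hypothetical.

```python
import sqlite3


def compare_forms():
    """Run the explicit JOIN, the comma join, and a comma join missing its predicate."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Book (title TEXT PRIMARY KEY, genre TEXT);
        CREATE TABLE Author (title TEXT REFERENCES Book(title), name TEXT,
                             PRIMARY KEY (title, name));
        INSERT INTO Book VALUES ('T1', 'romance'), ('T2', 'romance'), ('T3', 'polar');
        INSERT INTO Author VALUES ('T1', 'Alice'), ('T2', 'Alice'),
                                  ('T2', 'Bob'), ('T3', 'Carol');
        """
    )
    explicit = conn.execute(
        """SELECT DISTINCT name FROM Book b JOIN Author a ON a.title = b.title
           WHERE b.genre = 'romance' ORDER BY name"""
    ).fetchall()
    comma = conn.execute(
        """SELECT DISTINCT name FROM Book, Author
           WHERE Book.title = Author.title AND genre = 'romance' ORDER BY name"""
    ).fetchall()
    # Same comma join with the join predicate forgotten: every romance book is
    # paired with every author, so Carol (a 'polar' author) shows up too.
    wrong = conn.execute(
        "SELECT DISTINCT name FROM Book, Author WHERE genre = 'romance' ORDER BY name"
    ).fetchall()
    cartesian_rows = conn.execute("SELECT COUNT(*) FROM Book, Author").fetchone()[0]
    return explicit, comma, wrong, cartesian_rows


if __name__ == "__main__":
    explicit, comma, wrong, cartesian_rows = compare_forms()
    print(explicit)        # [('Alice',), ('Bob',)]
    print(comma)           # [('Alice',), ('Bob',)]
    print(wrong)           # [('Alice',), ('Bob',), ('Carol',)]
    print(cartesian_rows)  # 12 (3 books x 4 author rows)
```

Note that some engines (SQLite and MySQL among them) also accept `JOIN` without `ON`, so the explicit syntax makes the mistake easier to spot rather than impossible.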

Could that be what the teacher meant when he/she talked about paying attention?

Update:

At the DB engine level, both queries would be optimized to use the same plan, unless the DB engine is MySQL.

In MySQL, your query would be forced to use Author as the leading table, while for your teacher's query the optimizer can choose which table to make leading depending on the table statistics.

Quassnoi answered Nov 01 '22 16:11