SELECT FROM WHERE IN compared to SELECT FROM on multiple tables

Tags:

sql

I am taking a database course at my school. The teacher gave us a simple exercise. Consider the following simple schema:

Table Book:
    Column title (primary key)
    Column genre (one of: "romance", "polar", ...)

Table Author:
    Column title (foreign key on Book.title)
    Column name
    Primary key on (title, name)

Among the questions was the following one:

Write the query that returns the authors who have written romance books.

I proposed this answer:

select distinct name
from Author
where title in (select title from Book where genre = 'romance')

However the teacher said it was wrong, and that the correct answer was:

select distinct name
from Book, Author
where Book.title = Author.title
  and genre = 'romance'

When I asked for an explanation, all I got was an "if you had paid more attention to the course, you would know why". Brilliant.

So, why is my answer incorrect? What exactly is the difference between these queries? What exactly do they do, on the DB engine level?

asked May 18 '12 11:05

2 Answers

So, why is my answer incorrect?

Your answer is correct.

My guess is that the teacher marked it as wrong because he/she wanted you to practise joins with that question. But if that was the intention, it should have been stated in the question.

What exactly is the difference between these queries

Technically, they are indeed different. A DBMS with a simple query optimizer will evaluate the sub-select differently from the join in your teacher's answer.

I wouldn't be surprised if a DBMS with a good optimizer came up with the same execution plan for both queries.
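To make the "same result" claim concrete, here is a minimal sketch that runs both queries against the schema from the question. It uses Python's built-in sqlite3 module only as a convenient stand-in for a DBMS; the book titles and author names are made up for illustration.

```python
import sqlite3

# In-memory database with the schema from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Book (
        title TEXT PRIMARY KEY,
        genre TEXT NOT NULL
    );
    CREATE TABLE Author (
        title TEXT NOT NULL REFERENCES Book(title),
        name  TEXT NOT NULL,
        PRIMARY KEY (title, name)
    );
    -- Sample rows (hypothetical data, not from the original post).
    INSERT INTO Book VALUES
        ('Love in Paris', 'romance'),
        ('Summer Hearts', 'romance'),
        ('Dark Alley',    'polar');
    INSERT INTO Author VALUES
        ('Love in Paris', 'Alice'),
        ('Summer Hearts', 'Alice'),
        ('Summer Hearts', 'Bob'),
        ('Dark Alley',    'Carol');
""")

# The asker's query (sub-select with IN).
IN_QUERY = """
    SELECT DISTINCT name
    FROM Author
    WHERE title IN (SELECT title FROM Book WHERE genre = 'romance')
"""

# The teacher's query (implicit join).
JOIN_QUERY = """
    SELECT DISTINCT name
    FROM Book, Author
    WHERE Book.title = Author.title
      AND genre = 'romance'
"""

in_names = sorted(row[0] for row in conn.execute(IN_QUERY))
join_names = sorted(row[0] for row in conn.execute(JOIN_QUERY))
print(in_names)
print(join_names)
```

Because Book.title is a primary key, each Author row can match at most one Book row, so the join cannot produce extra duplicates that the IN version would not also produce.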

Edit

I created some test data with 50000 books, 50000 authors and 7 different genres to test this (smaller numbers don't really make sense, as optimizers then tend to simply read the whole table). The statement returns 7144 rows.
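A data set of that shape could be generated roughly as follows. This is only a sketch under assumptions: the original generation script is not shown, so the genre names other than "romance" and "polar", the random assignment of genres, the one-author-per-book layout and the use of SQLite are all hypothetical choices. The exact row count will therefore differ from the 7144 reported above.

```python
import random
import sqlite3

# Only "romance" and "polar" come from the question; the rest are placeholders.
GENRES = ["romance", "polar", "genre3", "genre4", "genre5", "genre6", "genre7"]

rng = random.Random(0)  # fixed seed so the run is repeatable
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Book (title TEXT PRIMARY KEY, genre TEXT NOT NULL);
    CREATE TABLE Author (
        title TEXT NOT NULL REFERENCES Book(title),
        name  TEXT NOT NULL,
        PRIMARY KEY (title, name)
    );
""")

# 50000 books with a random genre each, and one uniquely named author per book.
books = [(f"title {i}", rng.choice(GENRES)) for i in range(50_000)]
authors = [(title, f"author {i}") for i, (title, _) in enumerate(books)]
conn.executemany("INSERT INTO Book VALUES (?, ?)", books)
conn.executemany("INSERT INTO Author VALUES (?, ?)", authors)

# Count the rows returned by the sub-select version.
in_count = conn.execute("""
    SELECT count(*) FROM (
        SELECT DISTINCT name
        FROM Author
        WHERE title IN (SELECT title FROM Book WHERE genre = 'romance')
    )
""").fetchone()[0]

# Count the rows returned by the join version.
join_count = conn.execute("""
    SELECT count(*) FROM (
        SELECT DISTINCT name
        FROM Book, Author
        WHERE Book.title = Author.title
          AND genre = 'romance'
    )
""").fetchone()[0]

print(in_count, join_count)
```

With unique author names, both counts equal the number of romance books, which is roughly one seventh of the table.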

PostgreSQL

The execution plans are nearly identical, with a small difference in the join method.

Here is the plan for the sub-select version: http://explain.depesz.com/s/eov
Here is the plan for the join version: http://explain.depesz.com/s/aTI

Surprisingly, the join version has a slightly higher cost value.

Oracle

Both plans are 100% identical:

--------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |  6815 |   399K|       |   273   (2)| 00:00:04 |
|   1 |  HASH UNIQUE        |        |  6815 |   399K|   464K|   273   (2)| 00:00:04 |
|*  2 |   HASH JOIN         |        |  6815 |   399K|       |   172   (2)| 00:00:03 |
|*  3 |    TABLE ACCESS FULL| BOOK   |  6815 |   166K|       |    69   (2)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| AUTHOR | 50000 |  1708K|       |   103   (1)| 00:00:02 |
--------------------------------------------------------------------------------------

Looking at the statistics when using autotrace there is also no difference whatsoever. I didn't bother to actually create a trace file to analyze it as I don't expect to see a difference there.

Things don't really change if an index on book.genre is added. Oracle sticks with the full table scan (even with 100000 rows). This is probably because the tables are not very wide and many rows fit on a single page.

PostgreSQL does use the index for both statements but there is still no real difference between the plans.
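If you want to repeat this kind of comparison yourself, the general approach is to ask the engine for the plan of each statement and compare the output. The sketch below does this with SQLite's EXPLAIN QUERY PLAN, purely as an easily runnable stand-in: it is not the tool used for the PostgreSQL and Oracle plans above, its output format is different, and the index name book_genre_idx is a hypothetical choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Book (title TEXT PRIMARY KEY, genre TEXT NOT NULL);
    CREATE TABLE Author (
        title TEXT NOT NULL REFERENCES Book(title),
        name  TEXT NOT NULL,
        PRIMARY KEY (title, name)
    );
    -- Hypothetical index name; corresponds to the index on book.genre above.
    CREATE INDEX book_genre_idx ON Book(genre);
""")

IN_QUERY = """
    SELECT DISTINCT name
    FROM Author
    WHERE title IN (SELECT title FROM Book WHERE genre = 'romance')
"""
JOIN_QUERY = """
    SELECT DISTINCT name
    FROM Book, Author
    WHERE Book.title = Author.title
      AND genre = 'romance'
"""

def plan(query):
    """Return the human-readable plan steps SQLite reports for a query."""
    # The fourth column of each EXPLAIN QUERY PLAN row is the description.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]

in_plan = plan(IN_QUERY)
join_plan = plan(JOIN_QUERY)

print("sub-select version:")
for step in in_plan:
    print("  ", step)
print("join version:")
for step in join_plan:
    print("  ", step)
```

The plans printed here depend on the SQLite version and on table statistics, so treat them as an illustration of the method rather than as evidence about PostgreSQL or Oracle.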

answered Nov 01 '22 16:11

a_horse_with_no_name

Both queries are valid and return the same result.

Your teacher uses quite outdated (though still valid) join syntax, and you are using a construct that is less efficient in some databases (MySQL, for instance).

If I were your teacher, I would write the query like this:

SELECT  DISTINCT a.name
FROM    Book b
JOIN    Author a
ON      a.title = b.title
WHERE   b.genre = 'romance'

but I would still accept both your query and your teacher's, unless the course was specifically about MySQL optimization.

Could that be what the teacher meant by paying more attention?

Update:

On the DB engine level both queries would be optimized to use the same plan, except if the DB engine is MySQL.

In MySQL, your query would be forced to use Author as the leading table, while for your teacher's query the optimizer can choose which table to make leading, depending on the table statistics.

answered Nov 01 '22 16:11

Quassnoi
