Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL INNER JOIN automatic optimization in HSQLDB

The execution speed of the following queries is dramatically different. The second one finishes orders of magnitude faster than the first one.

SELECT * FROM A INNER JOIN B ON A.X=B.Y WHERE B.Z=1
SELECT * FROM A INNER JOIN (SELECT * FROM B) ON A.X=B.Y WHERE B.Z=1

It would be great if someone would write why this is. The database is HSQLDB with JDBC.

Additional information: Version of HSQLDB is 2.3.2. And column A.X is indexed but column B.Y is not.

like image 471
user3726374 Avatar asked Nov 11 '22 01:11

user3726374


1 Answers

The answer is: indexing

Imagine that I have a dictionary and someone gives me a task to find 5000 words in it. This task would take me several hours.
But now imagine that this dictionary is unsorted. It would take me years to find all these words in it.
Computer is faster and for first task it needs only miliseconds while second task remains several seconds.

Why is the first query so slow?

This is because there is INNER JOIN and it is done on unindexed column.

Why is the second query so fast?

This is because there is subquery. This subquery is materialized into temporary table and index is created for join column. So you are not joining now with unindexed B table but with indexed temporary table. HSQLDB creates this indexing on temporary table to make it easier to join. Even if you change join condition to more complex (for example: A.X = B.Y + 2*B.Z) this query will still be fast. This means that HSQLDB creates index on expression used in join condition.

like image 58
rtruszk Avatar answered Dec 11 '22 01:12

rtruszk