Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Adding redudant join conditions in Oracle results in a different plan

Tags:

sql

join

oracle

I have a common database joining situation involving three tables. One table, A, is the main table with a primary key named id. Tables B and C contain auxiliary data for entries and A, and each also has a column named id which is a foreign key pointing to A.id. Now, if I want all data from A, B and C in one query, I would write:

SELECT *
FROM A
INNER JOIN B
ON B.id = A.id
INNER JOIN C
ON C.id = A.id

which of course works perfectly.

Recently, our DBA told us that this is inefficient in Oracle, and you need to join conditions between C and B as well, as follows:

SELECT *
FROM A
INNER JOIN B
ON B.id = A.id
INNER JOIN C
ON C.id = A.id AND C.id = B.id

This looked redundant to me, so naturally I didn't believe here. Until I actually ran into a slow query that had a terrible execution plan, and managed to fix it by exactly adding the missing join condition. I ran explain plan on both versions: the one without the "redundant" query condition had a cost of 1 035 while the "improved" one had 389 (and there were huge differences in cardinality and bytes as well). Both queries produced the exact same result.

Can anyone explain why this extra condition makes a difference? To me C and B are not even related. Note also that if you take away the other join condition it is equally bad - they both need to be there.

like image 333
waxwing Avatar asked Jun 14 '09 15:06

waxwing


2 Answers

What you've got is two issues.

Firstly, with the original SQL, the optimizer makes an estimate about the number of rows in A with rows matching the ID in B which also have a matching row in C. The estimate is inaccurate, and the wrong plan is chosen.

Now, you add the redundant condition. Oracle assumes no conditions are truly redundant (as, if they were, an intelligent developer wouldn't include them). It also assumes that each condition is independent of the others. For example, a select where hair = 'bald' may get 10% of a table, a select where gender = 'F' may get 50%. Oracle would assume that a select where hair = 'bald' and gender = 'F' would give 5% (whereas in reality baldness is mostly limited to men).

By adding the 'redundant' predicate, Oracle will over-estimate the numbers or rows to be excluded and will choose the plan accordingly.

If, with the redundant predicate, Oracle is choosing a better plan, it suggests that the estimates for the original query over-estimated the number of rows matching. The redundant predicate is countering that with an under-estimate. And in this case, two wrongs are making a right.

It's not a solution I'd recommend, but if it works.....

PS. I'm assuming data types of all the IDs are consistent. If B.ID and C.ID are date and A.ID was character, or vice versa, then it is possible to have some rows where A.ID = B.ID and A.ID = C.ID but B.ID != C.ID, because implicit conversion may lose timestamps.

like image 94
Gary Myers Avatar answered Nov 02 '22 23:11

Gary Myers


Interesting.

It seems that Oracle can infer this transitive equality only under some circumstances : They call it Transitive Closure and you should be able to benefit from it when query rewriting is enabled.

But to be on the safe side, better spell out the redundant predicate yourself.

like image 22
Thilo Avatar answered Nov 02 '22 23:11

Thilo