I have a common database joining situation involving three tables. One table, A, is the main table with a primary key named id
. Tables B and C contain auxiliary data for entries and A, and each also has a column named id
which is a foreign key pointing to A.id
. Now, if I want all data from A, B and C in one query, I would write:
SELECT *
FROM A
INNER JOIN B
ON B.id = A.id
INNER JOIN C
ON C.id = A.id
which of course works perfectly.
Recently, our DBA told us that this is inefficient in Oracle, and you need to join conditions between C and B as well, as follows:
SELECT *
FROM A
INNER JOIN B
ON B.id = A.id
INNER JOIN C
ON C.id = A.id AND C.id = B.id
This looked redundant to me, so naturally I didn't believe here. Until I actually ran into a slow query that had a terrible execution plan, and managed to fix it by exactly adding the missing join condition. I ran explain plan on both versions: the one without the "redundant" query condition had a cost of 1 035 while the "improved" one had 389 (and there were huge differences in cardinality and bytes as well). Both queries produced the exact same result.
Can anyone explain why this extra condition makes a difference? To me C and B are not even related. Note also that if you take away the other join condition it is equally bad - they both need to be there.
What you've got is two issues.
Firstly, with the original SQL, the optimizer makes an estimate about the number of rows in A with rows matching the ID in B which also have a matching row in C. The estimate is inaccurate, and the wrong plan is chosen.
Now, you add the redundant condition. Oracle assumes no conditions are truly redundant (as, if they were, an intelligent developer wouldn't include them). It also assumes that each condition is independent of the others. For example, a select where hair = 'bald' may get 10% of a table, a select where gender = 'F' may get 50%. Oracle would assume that a select where hair = 'bald' and gender = 'F' would give 5% (whereas in reality baldness is mostly limited to men).
By adding the 'redundant' predicate, Oracle will over-estimate the numbers or rows to be excluded and will choose the plan accordingly.
If, with the redundant predicate, Oracle is choosing a better plan, it suggests that the estimates for the original query over-estimated the number of rows matching. The redundant predicate is countering that with an under-estimate. And in this case, two wrongs are making a right.
It's not a solution I'd recommend, but if it works.....
PS. I'm assuming data types of all the IDs are consistent. If B.ID and C.ID are date and A.ID was character, or vice versa, then it is possible to have some rows where A.ID = B.ID and A.ID = C.ID but B.ID != C.ID, because implicit conversion may lose timestamps.
Interesting.
It seems that Oracle can infer this transitive equality only under some circumstances : They call it Transitive Closure and you should be able to benefit from it when query rewriting is enabled.
But to be on the safe side, better spell out the redundant predicate yourself.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With