Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Where clause affecting join

Dumb question time. Oracle 10g.

Is it possible for a where clause to affect a join?

I've got a query in the form of:

select * from
(select product, product_name from products p
join product_serial ps on product.id = ps.id
join product_data pd on pd.product_value = to_number(p.product_value)) product_result
where product_name like '%prototype%';

Obviously this is a contrived example. No real need to show the table structure as it's all imaginary. Unfortunately, I can't show the real table structure or query. In this case, p.product_value is a VARCHAR2 field which in certain rows have an ID stored inside it rather than text. (Yes, bad design - but something I inherited and am unable to change)

The issue is in the join. If I leave out the where clause, the query works and rows are returned. However, if I add the where clause, I get "invalid number" error on the pd.product_value = to_number(p.product_value) join condition.

Obviously, the "invalid number" error happens when rows are joined which contain non-digits in the p.product_value field. However, my question is how are those rows being selected? If the join succeeds without the outer where clause, shouldn't the outer where clause just select rows from the result of the join? It appears what is happening is the where clause is affecting what rows are joined, despite the join being in an inner query.

Is my question making sense?

like image 635
wadesworld Avatar asked Oct 23 '22 02:10

wadesworld


2 Answers

It affects the plan that's generated.

The actual order that tables are joined (and so filtered) is not dictated by the order you write your query, but by the statistics on the tables.

In one version, the plan generated co-incidentally means that the 'bad' rows never get processed; because the preceding joins filtered the result set down to a point that they're never joined on.

The introduction of the WHERE clause has meant that ORACLE now believes a different order of join is better (because filtering by the product name requires a certain index, or because it narrows the data down a lot, etc).

This new order means that the 'bad' rows get processed before the join that filters them out.


I would endeavour to clean the data before querying it. Possibly by creating a derived column where the value is already cast to a number, or left as NULL if it is not possible to do so.

You can also use EXPLAIN PLAN to see the different plans being gerenated from your queries.

like image 66
MatBailie Avatar answered Nov 15 '22 08:11

MatBailie


Short answer: yes.

Long answer: the query engine is free to rewrite your query however it wants, as long as it returns the same results. All of the query is available to it to use for the purpose of producing the most efficient query it can.

In this case, I'd guess that there is an index that covers what you are wanting, but it doesn't cover product name, when you add that to the where clause, the index isn't used and instead there's a scan where both conditions are tested at the same time, thus your error.

Which is really an error in your join condition, you shouldn't be using to_number unless you are sure it's a number.

like image 26
jmoreno Avatar answered Nov 15 '22 08:11

jmoreno