Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does SQL cost explode with simple "or"?

I have the following statement to find unambiguous names in my data (~1 Million entries):

select Prename, Surname from person p1 
where Prename is not null and Surname is not null 
and not exists (
   select * from person p2 where (p1.Surname = p2.Surname OR p1.Surname = p2.Altname) 
   and p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%') and p2.id <> p1.id
) and inv_date IS NULL

Oracle shows a huge cost of 1477315000 and execution does not end after 5 minutes. Simply splitting the OR into an own exists subclause boosts performance to 0,5 s and costs to 45000:

select Prename, Surname from person p1 
where Prename is not null and Surname is not null 
and not exists (
   select * from person p2 where p1.Surname = p2.Surname and
   p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%') and p2.id <> p1.id
) and not exists (
   select * from person p2 where p1.Surname = p2.Altname and 
   p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%') and p2.id <> p1.id
) and inv_date IS NULL

It's not my question to tweak this to the best, as it is only a seldomly executed query, and I know CONTACT is surpassing any index, but I just wonder where this high cost comes from. Both queries seem semantically equivalent to me.

like image 299
stracktracer Avatar asked May 23 '11 14:05

stracktracer


People also ask

How does EXPLODE work in SQL?

If EXPLODE is applied on an instance of SQL. ARRAY <T>, the resulting rowset contains a single column of type T where each item in the array is placed into its own row. If the array value was empty or null, then the resulting rowset is empty. If EXPLODE is applied on an instance of SQL.

Are views faster than queries mysql?

Contrary to the answers - In my experience, for views with lots of joins, doing a direct query runs faster.

Does number of columns affect performance in mysql?

Does the number of columns on a table have impact on performance? Yes, more number of columns, more metadata is needed. We suggest to use least number of columns needed, ideally < 100 columns on a table if possible.


2 Answers

The answer is in the EXPLAIN PLAN for your queries. They may semantically be equivalent but the execution plan behind the scenes for your queries are vastly different.

EXISTS operates differently from a JOIN and essentially, your OR filter statement is what joins the tables together.

No JOIN occurs in the second query as you are only retrieving records from one table.

like image 194
maple_shaft Avatar answered Oct 03 '22 05:10

maple_shaft


The results of your two queries may be semantically equivalent, but the execution is not operationally equivalent. Your second example never makes use of an OR operator to combine predicates. All of your predicates in the second example are combined using an AND.

The performance is better because, if the first predicate that is combined with an AND does not evaluate to true then the second (or any other predicate) is skipped, (not evaluated). If you used an OR then both (or all) predicates would have to be evaluated frequently thus slowing down your query. (ORed predicates are checked until one evaluates to true.)

like image 22
Paul Sasik Avatar answered Oct 03 '22 05:10

Paul Sasik