Why does SQL cost explode with simple "or"?

Tags: performance, sql, oracle, sqlperformance

I have the following statement to find unambiguous names in my data (~1 million entries):

select Prename, Surname from person p1 
where Prename is not null and Surname is not null 
and not exists (
   select * from person p2 where (p1.Surname = p2.Surname OR p1.Surname = p2.Altname) 
   and p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%') and p2.id <> p1.id
) and inv_date IS NULL

Oracle reports an estimated cost of 1477315000, and execution has still not finished after 5 minutes. Simply splitting the OR into its own NOT EXISTS subquery brings the execution time down to 0.5 s and the cost down to 45000:

select Prename, Surname from person p1 
where Prename is not null and Surname is not null 
and not exists (
   select * from person p2 where p1.Surname = p2.Surname and
   p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%') and p2.id <> p1.id
) and not exists (
   select * from person p2 where p1.Surname = p2.Altname and 
   p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%') and p2.id <> p1.id
) and inv_date IS NULL

I'm not asking how to tune this query optimally, because it is only run rarely, and I know the CONCAT in the LIKE pattern prevents any index from being used. I just wonder where this high cost comes from, since both queries seem semantically equivalent to me.
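The equivalence does hold logically. `EXISTS (... WHERE A OR B)` is true exactly when `EXISTS (... WHERE A)` or `EXISTS (... WHERE B)` is true. By De Morgan's law, `NOT EXISTS (... A OR B)` therefore equals `NOT EXISTS (... A) AND NOT EXISTS (... B)`.

The minimal sketch below checks this on a handful of rows, using Python's built-in `sqlite3` module as a runnable stand-in for Oracle. It rests on several assumptions:

- The `person` table layout and the data are invented.
- `||` replaces the nested CONCAT calls. Oracle accepts both; older SQLite only `||`.
- SQLite's LIKE is case-insensitive for ASCII, while Oracle's is not.

So this checks the logic only. It says nothing about Oracle's plans or costs.

```python
import sqlite3

# Hypothetical rows: (id, Prename, Surname, Altname, inv_date).
ROWS = [
    (1, "Anna", "Meier", None, None),
    (2, "Anna Lena", "Meier", None, None),    # contains 'Anna' -> row 1 is ambiguous
    (3, "Paul", "Schmidt", None, None),
    (4, "Paul", "Becker", "Schmidt", None),   # Altname matches row 3's Surname
    (5, "Eva", "Wolf", None, None),           # no other Wolf -> unambiguous
    (6, "Max", "Huber", None, "2011-01-01"),  # invalidated, removed by inv_date
]

# The question's first statement: one NOT EXISTS with an OR in the correlation.
QUERY_OR = """
select Prename, Surname from person p1
where Prename is not null and Surname is not null
and not exists (
   select * from person p2
   where (p1.Surname = p2.Surname or p1.Surname = p2.Altname)
   and p2.Prename like '%' || p1.Prename || '%' and p2.id <> p1.id
) and inv_date is null
"""

# The question's second statement: the OR split into two NOT EXISTS subqueries.
QUERY_SPLIT = """
select Prename, Surname from person p1
where Prename is not null and Surname is not null
and not exists (
   select * from person p2
   where p1.Surname = p2.Surname
   and p2.Prename like '%' || p1.Prename || '%' and p2.id <> p1.id
) and not exists (
   select * from person p2
   where p1.Surname = p2.Altname
   and p2.Prename like '%' || p1.Prename || '%' and p2.id <> p1.id
) and inv_date is null
"""


def run(query, rows):
    """Load rows into an in-memory table and return the query result, sorted."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table person (id integer primary key, Prename text, "
        "Surname text, Altname text, inv_date text)"
    )
    conn.executemany("insert into person values (?, ?, ?, ?, ?)", rows)
    result = sorted(conn.execute(query).fetchall())
    conn.close()
    return result


if __name__ == "__main__":
    print(run(QUERY_OR, ROWS))
    print(run(QUERY_SPLIT, ROWS))
```

Row 3 is excluded only through its Altname match with row 4. The split form catches it in its second subquery, just as the OR form does.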


asked May 23 '11 14:05

stracktracer

2 Answers

The answer is in the EXPLAIN PLAN for your queries. They may be semantically equivalent, but the execution plans behind the scenes are vastly different.
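In Oracle, the plan for a statement is usually displayed with `EXPLAIN PLAN FOR <statement>` followed by `SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY)`. That requires an Oracle instance, so the sketch below shows the same side-by-side comparison with SQLite's `EXPLAIN QUERY PLAN`, run through Python's `sqlite3` module.

This is only a stand-in. SQLite's planner is not Oracle's, so its output will not reproduce the costs from the question. The table layout and index names are also hypothetical.

```python
import sqlite3

# Hypothetical schema with the columns used in the question.
SCHEMA = ("create table person (id integer primary key, Prename text, "
          "Surname text, Altname text, inv_date text)")

# Reduced versions of the two statements: only the correlation differs.
SUBQUERY = ("select * from person p2 where {cond} "
            "and p2.Prename like '%' || p1.Prename || '%' and p2.id <> p1.id")
OR_QUERY = ("select Prename, Surname from person p1 where not exists ("
            + SUBQUERY.format(cond="(p1.Surname = p2.Surname or p1.Surname = p2.Altname)")
            + ")")
SPLIT_QUERY = ("select Prename, Surname from person p1 where not exists ("
               + SUBQUERY.format(cond="p1.Surname = p2.Surname")
               + ") and not exists ("
               + SUBQUERY.format(cond="p1.Surname = p2.Altname")
               + ")")


def query_plan(conn, sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN rows."""
    # Each plan row has four columns; the human-readable text is the last one.
    return [row[3] for row in conn.execute("explain query plan " + sql)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Indexes on the correlation columns give the planner something to choose.
    conn.execute("create index person_surname on person (Surname)")
    conn.execute("create index person_altname on person (Altname)")
    for name, sql in (("OR", OR_QUERY), ("split", SPLIT_QUERY)):
        print(name)
        for line in query_plan(conn, sql):
            print("   ", line)
```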

EXISTS operates differently from a JOIN; essentially, your OR filter is what joins the two copies of the table (p1 and p2) together.
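A commonly cited explanation is offered here as an assumption, to be confirmed against the real plan:

- With a single equality in the correlation, Oracle can unnest each NOT EXISTS into a hash anti-join and build a hash table on p2 once.
- With an OR in the correlation predicate, it may instead keep a FILTER operation that re-runs the subquery for every p1 row.

In the worst case that FILTER approach means roughly 10^6 × 10^6 = 10^12 row visits for ~1 million rows.

The Python sketch below is a toy model of the two strategies, not Oracle's implementation. It counts how many candidate rows each one examines. `Person`, `filter_plan` and `hash_anti_join_plan` are hypothetical names.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Person(NamedTuple):
    id: int
    prename: str
    surname: str
    altname: Optional[str]


def subquery_matches(p1, p2):
    # Non-equality part of the subquery. A plain substring test stands in for
    # LIKE '%' || p1.Prename || '%' (ignoring '_' and '%' inside names).
    return p1.id != p2.id and p1.prename in p2.prename


def filter_plan(people):
    """Re-run the subquery for each outer row: scan every p2 and test the OR."""
    examined, unambiguous = 0, []
    for p1 in people:
        found = False
        for p2 in people:
            examined += 1
            correlated = p1.surname == p2.surname or p1.surname == p2.altname
            if correlated and subquery_matches(p1, p2):
                found = True  # EXISTS can stop at the first hit
                break
        if not found:
            unambiguous.append((p1.prename, p1.surname))
    return unambiguous, examined


def hash_anti_join_plan(people):
    """Hash p2 once per equality key, then probe only the matching buckets."""
    by_surname, by_altname = defaultdict(list), defaultdict(list)
    for p2 in people:
        by_surname[p2.surname].append(p2)
        if p2.altname is not None:
            by_altname[p2.altname].append(p2)
    examined, unambiguous = len(people), []  # one pass to build the tables
    for p1 in people:
        # Two probes, mirroring the two separate NOT EXISTS subqueries.
        bucket = by_surname.get(p1.surname, []) + by_altname.get(p1.surname, [])
        examined += len(bucket)  # counts every row in the probed buckets
        if not any(subquery_matches(p1, p2) for p2 in bucket):
            unambiguous.append((p1.prename, p1.surname))
    return unambiguous, examined


if __name__ == "__main__":
    # Hypothetical data: 2,000 people spread over 500 surnames.
    people = [Person(i, f"name{i}", f"surname{i % 500}", None) for i in range(2000)]
    slow, slow_cost = filter_plan(people)
    fast, fast_cost = hash_anti_join_plan(people)
    print(slow == fast, slow_cost, fast_cost)
```

Both functions return the same names. The filter version's work grows with the square of the row count, while the hashed version only touches rows that share a key.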

No OR-based join occurs in the second query: each NOT EXISTS subquery is correlated through a single equality.


answered Oct 03 '22 05:10

maple_shaft

The results of your two queries may be semantically equivalent, but the execution is not operationally equivalent. Your second example never makes use of an OR operator to combine predicates. All of your predicates in the second example are combined using an AND.

The performance is better because, if the first predicate combined with AND does not evaluate to true, the second (and any further) predicate is skipped rather than evaluated. With OR, both (or all) predicates frequently have to be evaluated, which slows down your query. (ORed predicates are checked until one evaluates to true.)
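The short-circuit behaviour this answer describes can be illustrated with Python's `and`/`or`, which do short-circuit left to right. Treat this only as an analogy: SQL does not guarantee left-to-right evaluation or early exit, and the optimizer is free to reorder predicates. The helper names `pred` and `evaluate` are hypothetical.

```python
calls = []


def pred(name, value):
    """A predicate that records that it was evaluated, then returns value."""
    calls.append(name)
    return value


def evaluate(expr):
    """Run a boolean expression and report its result plus which predicates ran."""
    calls.clear()
    result = expr()
    return result, list(calls)


# AND stops at the first false operand, so B is never evaluated.
and_result = evaluate(lambda: pred("A", False) and pred("B", True))
# OR only stops at a true operand, so a false A forces B to be evaluated too.
or_result = evaluate(lambda: pred("A", False) or pred("B", True))

print(and_result)  # (False, ['A'])
print(or_result)   # (True, ['A', 'B'])
```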

answered Oct 03 '22 05:10

Paul Sasik
