I have the following statement to find unambiguous names in my data (~1 Million entries):
select Prename, Surname from person p1
where Prename is not null and Surname is not null
and not exists (
select * from person p2 where (p1.Surname = p2.Surname OR p1.Surname = p2.Altname)
and p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%') and p2.id <> p1.id
) and inv_date IS NULL
Oracle shows a huge cost of 1477315000 and execution does not end after 5 minutes. Simply splitting the OR into an own exists subclause boosts performance to 0,5 s and costs to 45000:
select Prename, Surname from person p1
where Prename is not null and Surname is not null
and not exists (
select * from person p2 where p1.Surname = p2.Surname and
p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%') and p2.id <> p1.id
) and not exists (
select * from person p2 where p1.Surname = p2.Altname and
p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%') and p2.id <> p1.id
) and inv_date IS NULL
It's not my question to tweak this to the best, as it is only a seldomly executed query, and I know CONTACT is surpassing any index, but I just wonder where this high cost comes from. Both queries seem semantically equivalent to me.
If EXPLODE is applied on an instance of SQL. ARRAY <T>, the resulting rowset contains a single column of type T where each item in the array is placed into its own row. If the array value was empty or null, then the resulting rowset is empty. If EXPLODE is applied on an instance of SQL.
Contrary to the answers - In my experience, for views with lots of joins, doing a direct query runs faster.
Does the number of columns on a table have impact on performance? Yes, more number of columns, more metadata is needed. We suggest to use least number of columns needed, ideally < 100 columns on a table if possible.
The answer is in the EXPLAIN PLAN for your queries. They may semantically be equivalent but the execution plan behind the scenes for your queries are vastly different.
EXISTS operates differently from a JOIN and essentially, your OR filter statement is what joins the tables together.
No JOIN occurs in the second query as you are only retrieving records from one table.
The results of your two queries may be semantically equivalent, but the execution is not operationally equivalent. Your second example never makes use of an OR operator to combine predicates. All of your predicates in the second example are combined using an AND.
The performance is better because, if the first predicate that is combined with an AND does not evaluate to true then the second (or any other predicate) is skipped, (not evaluated). If you used an OR then both (or all) predicates would have to be evaluated frequently thus slowing down your query. (ORed predicates are checked until one evaluates to true.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With