 

Optimizing Oracle stored procedures

I was recently tasked with optimizing some existing Oracle stored procedures. Each of the stored procedures queries the database and generates an XML file as output. One in particular was taking about 20 minutes to finish execution. Taking a look at it, I found several nested loops and unnecessary queries. For example, rather than doing a

SELECT * FROM Employee e, Department d WHERE e.DEPT_ID = d.ID
-- write data from query to XML

it was more like

DECLARE
   dept_rec Department%ROWTYPE;
BEGIN
   FOR emp_rec IN ( SELECT * FROM Employee )
   LOOP
      SELECT * INTO dept_rec FROM Department WHERE id = emp_rec.DEPT_ID;
      -- write data from query to XML
   END LOOP;
END;

Changing all these cases to look more like the first option sped up the procedures immensely. My question is: why? Why is doing the join in the SELECT query quicker than manually combining the tables in a loop? What are the underlying processes?

asked Jan 18 '23 by bjsample


1 Answer

Let's look at how the original version is likely to be processed.

DECLARE
   dept_rec Department%ROWTYPE;
BEGIN
   FOR emp_rec IN ( SELECT * FROM Employee )
   LOOP
      SELECT * INTO dept_rec FROM Department WHERE id = emp_rec.DEPT_ID;
      -- write data from query to XML
   END LOOP;
END;

The loop query is likely to do a full table scan on employee. Then, for each row returned, it will execute the inner query. Assuming that id is the primary key of department, each execution of the query is likely to do a unique lookup using the primary key index.
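One way to see this (a sketch; the actual plan depends on your schema and statistics) is to ask Oracle for the plan of the inner lookup, with a bind variable standing in for emp_rec.DEPT_ID:

EXPLAIN PLAN FOR
  SELECT * FROM Department WHERE id = :dept_id;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
-- Typically shows TABLE ACCESS BY INDEX ROWID on DEPARTMENT,
-- driven by an INDEX UNIQUE SCAN of the primary key index.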

Sounds great, right? Unique index lookups are usually the fastest way to get a single row (except for explicit lookup by ROWID). But think about what this is doing over multiple iterations of the loop. Presumably, every employee belongs to a department; every department has employees; and most or all departments have multiple employees.

So on multiple iterations of the loop, you're repeating the exact same work for the inner query multiple times. Yes, the data blocks may be cached so you don't have to repeat physical reads, but accessing data in the cache still has some CPU overhead, which can become very significant when the same blocks are accessed over and over again.
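To make that overhead visible, you could, for example, run the lookup under SQL*Plus autotrace and watch the logical reads (a sketch; the literal id value is made up):

SET AUTOTRACE TRACEONLY STATISTICS
SELECT * FROM Department WHERE id = 42;
-- "consistent gets" in the statistics counts logical reads. Even when the
-- blocks are cached, every loop iteration pays these gets again, so with
-- N employees the loop does roughly N times this work.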

Furthermore, you will ultimately want every row in department at least once, and probably more than once. Since every single block in the table will need to be read anyway, you're not really saving work by doing index lookups; you're adding work, because each lookup reads index blocks on top of the table blocks.

When you rewrite the loop as a single query, the optimizer is able to take this into account. One option it has is a nested loop join driven by employee, which would be essentially the same as the explicit loop in PL/SQL (minus the context switching, as pointed out by Mark). However, given the relationship between the two tables and the lack of any filtering predicate, the optimizer can tell that it's more efficient to simply full-scan both tables and do a merge or hash join. This actually results in fewer physical I/Os (assuming a clean cache at the start of each execution) and far fewer logical I/Os.
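As a sketch of the set-based rewrite inside PL/SQL (the selected Department columns are illustrative; pick whatever the XML output needs):

FOR rec IN ( SELECT e.*, d.name AS dept_name   -- dept_name is a hypothetical column
             FROM Employee e
             JOIN Department d ON d.id = e.DEPT_ID )
LOOP
   -- write data from query to XML: one pass, join method chosen by the optimizer
END LOOP;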

answered Feb 01 '23 by Dave Costa