
Strategy to improve Oracle DELETE performance

Tags: oracle, oracle11g

We've got an Oracle 11g installation that is starting to get big. This database is the backend to a parallel optimization system running on a cluster. Input to the process is contained in the database along with output from the optimization steps. The input includes rote configuration data and some binary files (using 11g's SecureFiles). The output includes 1D, 2D, 3D, and 4D data currently stored in the DB.

DB Structure:

/* Metadata tables */
Case(CaseId, DeleteFlag, ...) On Delete Cascade CaseId
OptimizationRun(OptId, CaseId, ...) On Delete Cascade OptId
OptimizationStep(StepId, OptId, ...) On Delete Cascade StepId

/* Data tables */
Files(FileId, CaseId, Blob) /* deletes are near instantaneous here */

/* Data per run */
OnedDataX(OptId, ...)
TwoDDataY1(OptId, ...) /* packed representation of a 1D slice */

/* Data not only per run, but per step */
TwoDDataY2(StepId, ...)  /* packed representation of a 1D slice */
ThreeDDataZ(StepId, ...) /* packed representation of a 2D slice */
FourDDataZ(StepId, ...)  /* packed representation of a 3D slice */
/* ... About 10 or so of these tables exist */

A reaper script comes around daily, looks for cases with `DeleteFlag = 1`, and runs `DELETE FROM Case WHERE DeleteFlag = 1`, letting the cascades do the rest.

This strategy works great for read/write, but is now outstripping our capabilities when we want to purge data! The rub is that deleting a Case takes ~20-40 minutes, depending on its size, and often overloads our archiver space. The next major version of the product will take a "from the ground up" approach to solving the problem. The next minor release needs to stay within the confines of data stored in the database.

So, for the minor release we need an approach that improves delete performance and requires at most moderate changes to the database.

1. REF Partitioning, but the question is HOW? I would love to do INTERVAL on `Case` and REF on the rest, but that isn't supported. Is there some way to manually partition `OptimizationRun` by `CaseId` through a trigger?
2. Disable archiving/redo logs for deletes? Couldn't find a HINT to go with this one. Not sure it is even feasible.
3. ~~Truncate? This likely would need some sorta complicated table setup. But maybe I'm not considering all of my options.~~ (per answer, stricken)

To help illustrate the issue, the data in question per case ranges from 15MiB to 1.5GiB with anywhere from 20k to 2M rows.

Update: Current size of the DB is ~1.5TB.


asked Apr 26 '11 15:04

user7116

2 Answers

Deleting data is a hell of a job for the database. It has to create before images, update indexes, write redo logs, and remove the data. This is a slow process. If you can get a window to perform this task, the easiest and fastest way is to build new tables containing the data you want to keep, drop the old tables, and rename the new tables. This requires some setup work, obviously, but it is very doable. A less drastic step is to drop the indexes before the delete takes place. My vote would go to CTAS (Create Table As Select) to build the new tables. A nice partitioning scheme would certainly be helpful; maybe in a future release Oracle will combine interval and reference partitioning. It would be very nice to have.

Disabling logging cannot be done for deletes, but CTAS can use NOLOGGING. Make a backup when you are done, and make sure to transfer the datafiles to the standby database, if you have one.
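A minimal sketch of the CTAS-and-swap idea for one of the per-step data tables. It assumes the schematic table and column names from the question; the `_new`/`_old` suffixes and the index and constraint names are hypothetical. CTAS does not carry over indexes, foreign keys, grants, or triggers, so those have to be recreated by hand.

```sql
-- Sketch only: table names follow the question's schematic; the _new/_old
-- suffixes and the index/constraint names are made up for illustration.
-- Quote or rename Case if your environment treats it as a keyword.

-- 1. Build a copy that holds only rows NOT belonging to flagged cases.
--    NOLOGGING keeps redo for this direct-path load to a minimum
--    (it has no effect if the database is in FORCE LOGGING mode).
CREATE TABLE FourDDataZ_new NOLOGGING AS
SELECT d.*
FROM   FourDDataZ d
WHERE  NOT EXISTS (
         SELECT 1
         FROM   OptimizationStep s
         JOIN   OptimizationRun  r ON r.OptId  = s.OptId
         JOIN   Case             c ON c.CaseId = r.CaseId
         WHERE  s.StepId     = d.StepId
         AND    c.DeleteFlag = 1
       );

-- 2. Swap the tables during the maintenance window.
ALTER TABLE FourDDataZ     RENAME TO FourDDataZ_old;
ALTER TABLE FourDDataZ_new RENAME TO FourDDataZ;

-- 3. Recreate what CTAS does not copy. The names must not clash with the
--    indexes and constraints still attached to FourDDataZ_old.
CREATE INDEX FourDDataZ_StepId_ix2 ON FourDDataZ (StepId) NOLOGGING;
ALTER TABLE FourDDataZ ADD CONSTRAINT FourDDataZ_Step_fk2
  FOREIGN KEY (StepId) REFERENCES OptimizationStep (StepId) ON DELETE CASCADE;

-- 4. Switch the new table back to normal logging for day-to-day DML.
ALTER TABLE FourDDataZ LOGGING;

-- 5. After verifying the new table, drop the old copy.
DROP TABLE FourDDataZ_old PURGE;
```

Once every large data table has been swapped this way, the existing `DELETE FROM Case WHERE DeleteFlag = 1` only has to cascade through the small metadata tables. Because the NOLOGGING steps are not recoverable from the redo stream, the backup mentioned above should be taken right after the swap.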

answered Sep 21 '22 23:09

ik_zelf

Just some thoughts:

1. I assume you have indexes on all foreign keys. ON DELETE CASCADE will hold row-level locks until the Case delete is complete; without indexes it will, I believe, take table locks and of course be very slow.
2. Do you have any deferred constraints? These would most likely slow things down as Oracle cascades through the various table deletes.
3. Have you tried doing the deletes separately for all affected tables (instead of relying on ON DELETE CASCADE)? Not as easy, but you may be surprised.
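Points 1 and 3 could look roughly like the following. This is a sketch under the assumption that the schematic names from the question are the real ones; the index names are hypothetical, and any index that already exists should be skipped.

```sql
-- Sketch only: names follow the question's schematic; index names are made up.
-- Quote or rename Case if your environment treats it as a keyword.

-- Point 1: index every foreign-key column the cascade walks through.
CREATE INDEX OptRun_CaseId_ix   ON OptimizationRun  (CaseId);
CREATE INDEX OptStep_OptId_ix   ON OptimizationStep (OptId);
CREATE INDEX FourDDataZ_Step_ix ON FourDDataZ       (StepId);

-- Point 3: delete from the leaf tables first, then work up to Case.
-- Committing between tables keeps each transaction's undo smaller;
-- it does not reduce the total amount of redo generated.

-- Per-step data table
DELETE FROM FourDDataZ
WHERE  StepId IN (
         SELECT s.StepId
         FROM   OptimizationStep s
         JOIN   OptimizationRun  r ON r.OptId  = s.OptId
         JOIN   Case             c ON c.CaseId = r.CaseId
         WHERE  c.DeleteFlag = 1
       );
COMMIT;

-- Per-run data table
DELETE FROM OnedDataX
WHERE  OptId IN (
         SELECT r.OptId
         FROM   OptimizationRun r
         JOIN   Case            c ON c.CaseId = r.CaseId
         WHERE  c.DeleteFlag = 1
       );
COMMIT;

-- Repeat the two patterns above for the remaining data tables.

-- Finally remove the metadata; the cascade now only touches
-- OptimizationRun and OptimizationStep (and Files).
DELETE FROM Case WHERE DeleteFlag = 1;
COMMIT;
```

Timing each statement separately also shows which table dominates the 20-40 minutes, which is useful before committing to a bigger redesign.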

EDIT:

One more thought. You may consider doing a SOFT delete on the Case table, meaning you have a status field that tells your app whether that Case should be considered. This flag could have many different values, but maybe 'A' for active and 'I' for inactive. Assuming you always use Case as the driving/primary table in joins to other tables, you can avoid the HARD deletes altogether (and occasionally do a cleanup off-hours on whatever schedule you like). Apps would need to be aware of this flag, of course, and you'd be tied to joining back to the Case table. May or may not fit your situation...
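One way to keep apps from forgetting the flag is to route reads through a view. The sketch below reuses the question's existing `DeleteFlag` column in place of a separate 'A'/'I' status field; the view name `ActiveCase` is hypothetical.

```sql
-- Sketch only: ActiveCase is a made-up view name; DeleteFlag = 1 means
-- "soft deleted", as in the question. NULL flags are treated as active.
CREATE OR REPLACE VIEW ActiveCase AS
SELECT c.*
FROM   Case c
WHERE  NVL(c.DeleteFlag, 0) <> 1;

-- Application queries drive from the view instead of the base table,
-- so soft-deleted cases and their runs never show up.
SELECT r.OptId, r.CaseId
FROM   ActiveCase      a
JOIN   OptimizationRun r ON r.CaseId = a.CaseId;
```

The physical cleanup of flagged cases can then run off-hours without affecting what the application sees.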

answered Sep 19 '22 23:09

tbone
