Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimal way to DELETE specified rows from Oracle

Tags:

oracle

I have a project that needs to occasionally delete several tens of thousands of rows from one of six tables of varying sizes but that have about 30million rows between them. Because of the structure of the data I've been given, I don't know which of the six tables has the row that needs to be deleted in it so I have to run all deletes against all tables. I've built an INDEX against the ID column to try and speed things up, but it can be removed if that'll speed things up.

My problem is, that I can't seem to find an efficient way to actually perform the delete. For the purposes of my testing I'm running 7384 delete rows against single test-table which has about 9400 rows. I've tested a number of possible query solutions in Oracle SQL Developer:

7384 separate DELETE statements took 203 seconds:

delete from TABLE1 where ID=1000001356443294;
delete from TABLE1 where ID=1000001356443296;
etc...

7384 separate SELECT statements took 57 seconds:

select ID from TABLE1 where ID=1000001356443294
select ID from TABLE1 where ID=1000001356443296
etc...

7384 separate DELETE from (SELECT) statements took 214 seconds:

delete from (select ID from TABLE1 where ID=1000001356443294);
delete from (select ID from TABLE1 where ID=1000001356443296);
etc...

1 SELECT statement that has 7384 OR clauses in the where took 127.4s:

select ID from TABLE1 where ID=1000001356443294 or ID = 1000001356443296 or ...

1 DELETE from (SELECT) statement that has 7384 OR clauses in the where took 74.4s:

delete from (select ID from TABLE1 where ID=1000001356443294 or ID = 1000001356443296 or ...)

While the last may be the fastest, upon further testing its still very slow when scaled up from the 9000 row table to even just a 200,000 row table (which is still < 1% of the final tableset size) where the same statement takes 14mins to run. While > 50% faster per row, that still extrapolates up to about a day when being run against the full dataset. I have it on good authority that the piece of software we used to us to do this task could do it in about 20mins.

So my questions are:

  • Is there a better way to delete?
  • Should I use a round of SELECT statements (i.e., like the second test) to discover which table any given row is in and then shoot off delete queries? Even that looks quite slow but...
  • Is there anything else I can do to speed the deletes up? I don't have DBA-level access or knowledge.
like image 467
GIS-Jonathan Avatar asked Apr 10 '12 16:04

GIS-Jonathan


People also ask

How do I delete multiple rows in SQL Developer?

Another way to delete multiple rows is to use the IN operator. DELETE FROM table_name WHERE column_name IN (value 1, value 2, value 3, etc...); If you want to delete all records from the table then you can use this syntax.


2 Answers

In advance of my questions being answered, this is how I'd go about it:

Minimize the number of statements and the work they do issued in relative terms.

All scenarios assume you have a table of IDs (PURGE_IDS) to delete from TABLE_1, TABLE_2, etc.

Consider Using CREATE TABLE AS SELECT for really large deletes

If there's no concurrent activity, and you're deleting 30+ % of the rows in one or more of the tables, don't delete; perform a create table as select with the rows you wish to keep, and swap the new table out for the old table. INSERT /*+ APPEND */ ... NOLOGGING is surprisingly cheap if you can afford it. Even if you do have some concurrent activity, you may be able to use Online Table Redefinition to rebuild the table in-place.

Don't run DELETE statements you know won't delete any rows

If an ID value exists in at most one of the six tables, then keep track of which IDs you've deleted - and don't try to delete those IDs from any of the other tables.

CREATE TABLE TABLE1_PURGE NOLOGGING
AS 
SELECT ID FROM PURGE_IDS INNER JOIN TABLE_1 ON PURGE_IDS.ID = TABLE_1.ID;

DELETE FROM TABLE1 WHERE ID IN (SELECT ID FROM TABLE1_PURGE);

DELETE FROM PURGE_IDS WHERE ID IN (SELECT ID FROM TABLE1_PURGE);

DROP TABLE TABLE1_PURGE;

and repeat.

Manage Concurrency if you have to

Another way is to use PL/SQL looping over the tables, issuing a rowcount-limited delete statement. This is most likely appropriate if there's significant insert/update/delete concurrent load against the tables you're running the deletes against.

declare
  l_sql varchar2(4000);
begin
  for i in (select table_name from all_tables 
             where table_name in ('TABLE_1', 'TABLE_2', ...)
             order by table_name);
  loop
    l_sql := 'delete from ' || i.table_name || 
             ' where id in (select id from purge_ids) ' || 
             '   and rownum <= 1000000';
    loop
      commit;
      execute immediate l_sql;
      exit when sql%rowcount <> 1000000;  -- if we delete less than 1,000,000
    end loop;                             -- no more rows need to be deleted!
  end loop;
  commit;
end;
like image 121
Adam Musch Avatar answered Oct 11 '22 14:10

Adam Musch


Store all the to be deleted ID's into a table. Then there are 3 ways. 1) loop through all the ID's in the table, then delete one row at a time for X commit interval. X can be a 100 or 1000. It works on OLTP environment and you can control the locks.

2) Use Oracle Bulk Delete

3) Use correlated delete query.

Single query is usually faster than multiple queries because of less context switching, and possibly less parsing.

like image 36
Richard Chung Avatar answered Oct 11 '22 14:10

Richard Chung