
Hibernate batch-delete vs single delete

EDIT: Based on some of my debugging and logging, I think the question boils down to this: why is DELETE FROM table WHERE id = x so much faster than DELETE FROM table WHERE id IN (x), where x is just a single ID?

I recently tested batch-delete versus deleting each row one by one and noticed that batch-delete was much slower. The table had triggers for delete, update, and insert, but I've tested with and without the triggers, and each time batch-delete was slower. Can anyone shed some light on why this is the case, or share tips on how I can debug it? From what I understand, I can't really reduce the number of times the trigger fires, but I had originally figured that lowering the number of DELETE queries would help with performance.

I've included some information below; please let me know if I've left out anything relevant.

Deletions are done in batches of 10,000 and the code looks something like this:

private void batchDeletion( Collection<Long> ids ) {
  // HQL bulk delete; Hibernate expands :ids into one bind parameter per element
  Query sql = getSession().createQuery( "DELETE FROM ObjImpl WHERE id IN (:ids)" );
  sql.setParameterList( "ids", ids );

  sql.executeUpdate();
}
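
For context, the 10,000-ID batches are produced by a driver loop roughly like the one below (a sketch only - the method and variable names are illustrative, not my actual code):

private static final int BATCH_SIZE = 10000;

private void deleteAll( List<Long> allIds ) {
  // Split the full ID list into chunks so each DELETE ... WHERE id IN (...)
  // carries a bounded number of bind parameters.
  for ( int i = 0; i < allIds.size(); i += BATCH_SIZE ) {
    int end = Math.min( i + BATCH_SIZE, allIds.size() );
    batchDeletion( allIds.subList( i, end ) );
  }
}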

The code to delete just a single row is basically:

sessionFactory.getCurrentSession().delete(obj);
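
Spelled out, that loop looks roughly like the following; the periodic flush/clear is the usual Hibernate batching idiom to keep the persistence context from growing across hundreds of thousands of deletes (a sketch - the cadence and names are illustrative, not necessarily what my code does):

Session session = sessionFactory.getCurrentSession();
int count = 0;
for ( ObjImpl obj : objects ) {
  session.delete( obj );    // queues "delete from table where id=?"
  if ( ++count % 50 == 0 ) {
    session.flush();        // push the pending deletes to the database
    session.clear();        // detach entities so the session stays small
  }
}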

The table has two indexes, neither of which is used by any of the deletions. No cascade operations occur.

Here is a sample EXPLAIN ANALYZE of DELETE FROM table WHERE id IN (1, 2, 3):

Delete on table  (cost=12.82..24.68 rows=3 width=6) (actual time=0.143..0.143 rows=0 loops=1)
  ->  Bitmap Heap Scan on table  (cost=12.82..24.68 rows=3 width=6) (actual time=0.138..0.138 rows=0 loops=1)
        Recheck Cond: (id = ANY ('{1,2,3}'::bigint[]))
        ->  Bitmap Index Scan on pk_table  (cost=0.00..12.82 rows=3 width=0) (actual time=0.114..0.114 rows=0 loops=1)
              Index Cond: (id = ANY ('{1,2,3}'::bigint[]))
Total runtime: 3.926 ms

I've vacuumed and reindexed each time I reload my data for testing and my test data contains 386,660 rows.

The test is to delete all the rows. I'm not using TRUNCATE because normally there's a selection criterion; for testing purposes I've made the criterion include all rows. With triggers enabled, deleting each row one by one took 193,616 ms, whereas batch-delete took 285,558 ms. With the triggers disabled, I got 93,793 ms for single-row deletes and 181,537 ms for batch-delete. The trigger sums up values and updates another table - basically bookkeeping.

I've played around with lower batch sizes (100 and 1) and they all seem to perform worse.

EDIT: I turned on Hibernate logging; for single row-by-row deletes, it's basically doing delete from table where id=?. Here's the EXPLAIN ANALYZE:

Delete on table  (cost=0.00..8.31 rows=1 width=6) (actual time=0.042..0.042 rows=0 loops=1)
  ->  Index Scan using pk_table on table  (cost=0.00..8.31 rows=1 width=6) (actual time=0.037..0.037 rows=0 loops=1)
        Index Cond: (id = 3874904)
Total runtime: 0.130 ms

EDIT: I was curious whether Postgres would do something different if the list actually contained 10,000 IDs: nope.

Delete on table  (cost=6842.01..138509.15 rows=9872 width=6) (actual time=17.170..17.170 rows=0 loops=1)
  ->  Bitmap Heap Scan on table  (cost=6842.01..138509.15 rows=9872 width=6) (actual time=17.160..17.160 rows=0 loops=1)
        Recheck Cond: (id = ANY ('{NUMBERS 1 THROUGH 10,000}'::bigint[]))
        ->  Bitmap Index Scan on pk_table  (cost=0.00..6839.54 rows=9872 width=0) (actual time=17.139..17.139 rows=0 loops=1)
              Index Cond: (id = ANY ('{NUMBERS 1 THROUGH 10,000}'::bigint[]))
Total runtime: 17.391 ms

EDIT: Based on the EXPLAIN ANALYZE output above, I've retrieved some logging from the actual delete operations. Below is logging of the two variations of single row-by-row deletes.

Here are some single deletes:

2013-03-14 13:09:25,424:delete from table where id=?
2013-03-14 13:09:25,424:delete from table where id=?
2013-03-14 13:09:25,424:delete from table where id=?
2013-03-14 13:09:25,424:delete from table where id=?
2013-03-14 13:09:25,424:delete from table where id=?
2013-03-14 13:09:25,424:delete from table where id=?
2013-03-14 13:09:25,424:delete from table where id=?
2013-03-14 13:09:25,424:delete from table where id=?
2013-03-14 13:09:25,424:delete from table where id=?
2013-03-14 13:09:25,424:delete from table where id=?

Here is the other variation of single deletes (the list contains just one item):

2013-03-14 13:49:59,858:delete from table where id in (?)
2013-03-14 13:50:01,460:delete from table where id in (?)
2013-03-14 13:50:03,040:delete from table where id in (?)
2013-03-14 13:50:04,544:delete from table where id in (?)
2013-03-14 13:50:06,125:delete from table where id in (?)
2013-03-14 13:50:07,707:delete from table where id in (?)
2013-03-14 13:50:09,275:delete from table where id in (?)
2013-03-14 13:50:10,833:delete from table where id in (?)
2013-03-14 13:50:12,369:delete from table where id in (?)
2013-03-14 13:50:13,873:delete from table where id in (?)

Both use IDs that exist in the table and should be sequential.


EXPLAIN ANALYZE of DELETE FROM table WHERE id = 3774887;

Delete on table  (cost=0.00..8.31 rows=1 width=6) (actual time=0.097..0.097 rows=0 loops=1)
  ->  Index Scan using pk_table on table  (cost=0.00..8.31 rows=1 width=6) (actual time=0.055..0.058 rows=1 loops=1)
        Index Cond: (id = 3774887)
Total runtime: 0.162 ms

EXPLAIN ANALYZE of DELETE FROM table WHERE id IN (3774887);

Delete on table  (cost=0.00..8.31 rows=1 width=6) (actual time=0.279..0.279 rows=0 loops=1)
  ->  Index Scan using pk_table on table  (cost=0.00..8.31 rows=1 width=6) (actual time=0.210..0.213 rows=1 loops=1)
        Index Cond: (id = 3774887)
Total runtime: 0.452 ms

Is 0.162 ms vs 0.452 ms considered a significant difference?

EDIT: I set the batch size to 50,000 and Hibernate didn't like that idea:

java.lang.StackOverflowError
        at org.hibernate.hql.ast.util.NodeTraverser.visitDepthFirst(NodeTraverser.java:40)
        at org.hibernate.hql.ast.util.NodeTraverser.visitDepthFirst(NodeTraverser.java:41)
        at org.hibernate.hql.ast.util.NodeTraverser.visitDepthFirst(NodeTraverser.java:42)
....
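
The trace suggests the HQL parser recurses once per element of the expanded IN-list, so stack depth grows with the parameter count. A crude safeguard (my own sketch - the constant and method names are made up) is to fail fast on oversized lists instead of letting the parser overflow the stack:

private static final int MAX_IN_LIST = 10000;

private void batchDeletionGuarded( Collection<Long> ids ) {
  // Fail fast rather than triggering a StackOverflowError in the HQL parser.
  if ( ids.size() > MAX_IN_LIST ) {
    throw new IllegalArgumentException( "ID list too large for one IN clause: " + ids.size() );
  }
  batchDeletion( ids );
}
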
asked Mar 14 '13 by nevets1219


1 Answer

OK, the first thing to note is that the SQL has to be converted into a query plan in some way, and your EXPLAIN results show that the planner treats an equality fundamentally differently from an IN (vals) construct.

WHERE id = 1;

Is transformed to a simple equality filter.

WHERE id IN (1);

Is transformed into an array match of:

WHERE id = ANY(ARRAY[1]);

Apparently the planner is not smart enough to notice that these are mathematically identical when the array has exactly one member. So it plans for an array of any size, which is why you get the bitmap index scan plan.

What's interesting here is not just that the array plan is slower, but how it scales: with one member in the IN() clause it is 40 times slower, yet with 10,000 members it is only 170 times slower in total - which also means the 10,000-member version is about 50 times faster than 10,000 separate index scans on id.

So what is happening is that the planner selects a plan that performs well when a large number of IDs are checked, but poorly when there are only a few.
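
If the one-element case matters in practice, a workaround on the Hibernate side (a sketch under that assumption - the method name is illustrative) is to pick the query form based on list size, so a singleton gets the cheap equality plan while larger batches keep the IN form:

private void deleteByIds( Collection<Long> ids ) {
  if ( ids.size() == 1 ) {
    // Plain equality: the planner uses the simple index-scan plan.
    Query q = getSession().createQuery( "DELETE FROM ObjImpl WHERE id = :id" );
    q.setParameter( "id", ids.iterator().next() );
    q.executeUpdate();
  } else {
    // Array match: the planner's bitmap plan wins for large lists.
    Query q = getSession().createQuery( "DELETE FROM ObjImpl WHERE id IN (:ids)" );
    q.setParameterList( "ids", ids );
    q.executeUpdate();
  }
}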

answered by Chris Travers