
How does Hibernate batch insert work?

Can someone explain how

hibernate.jdbc.batch_size = 1000 

and

if (i % 100 == 0 && i > 0) {
    session.flush();
    session.clear();
}

work together?

Pratik Jaiswal asked Mar 08 '23 08:03


1 Answer

The Hibernate property hibernate.jdbc.batch_size lets Hibernate optimize your insert and update statements, whereas the flushing loop guards against memory exhaustion.

Without a batch size, Hibernate fires one insert statement every time you save an entity, so a big collection means one statement per save.

Consider the following chunk of code:

for (Entity e : entities) {
    session.save(e);
}

Here Hibernate will fire one insert statement per entity in your collection: if the collection has 100 elements, 100 insert statements are fired. This approach is not very efficient, for two main reasons:

    1. The first-level cache grows with every saved entity, so with a large collection you will probably end up with an OutOfMemoryError.
    2. Performance degrades because of the network round trip for each statement.

hibernate.jdbc.batch_size and the flushing loop have two different purposes, but they are complementary.

Hibernate uses the first to control how many statements go into one batch. Under the covers, Hibernate uses the java.sql.Statement.addBatch(...) and executeBatch() methods.

So hibernate.jdbc.batch_size tells Hibernate how many times it has to call addBatch() before calling executeBatch().
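To make the addBatch() / executeBatch() pattern concrete, here is a minimal plain-JDBC sketch of the same idea. It is not Hibernate's actual code: the class name, the method, and the person table are hypothetical, and the caller is assumed to supply an open Connection.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class JdbcBatchSketch {

    // Inserts the given names, sending them to the database in groups of batchSize.
    // Returns the number of executeBatch() calls, i.e. the number of batches sent.
    // The "person" table and its "name" column are hypothetical.
    static int insertInBatches(Connection conn, List<String> names, int batchSize)
            throws SQLException {
        int batchesSent = 0;
        try (PreparedStatement ps =
                 conn.prepareStatement("INSERT INTO person (name) VALUES (?)")) {
            int pending = 0;
            for (String name : names) {
                ps.setString(1, name);
                ps.addBatch();              // queue the row on the client side
                pending++;
                if (pending == batchSize) { // batch is full: send it in one go
                    ps.executeBatch();
                    batchesSent++;
                    pending = 0;
                }
            }
            if (pending > 0) {              // send the last, partially filled batch
                ps.executeBatch();
                batchesSent++;
            }
        }
        return batchesSent;
    }
}
```

With 5 rows and a batchSize of 2, this sketch calls addBatch() five times and executeBatch() three times, instead of executing five separate statements.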

Setting this property alone does not protect you from memory exhaustion.

To take care of memory you have to flush and clear your session on a regular basis, and this is the purpose of the flushing loop.

When you write:

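int i = 0;
for (Entity e : entities) {
    session.save(e);   // queue the insert
    i++;
    // every 100 saved entities, send the pending inserts and empty the first-level cache
    if (i % 100 == 0) {
        session.flush();
        session.clear();
    }
}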

you're telling Hibernate to flush and clear the session every 100 entities, which releases the memory held by the first-level cache.

So what is the link between the two?

To be optimal, set jdbc.batch_size and your flush interval to the same value.
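One way to keep the two values from drifting apart is to derive both from a single constant. The sketch below only uses java.util.Properties; the class name BatchSettings, the constant BATCH_SIZE and the helper shouldFlush are hypothetical names chosen for illustration, and the resulting Properties object is assumed to be handed to your Hibernate configuration however your application normally does that.

```java
import java.util.Properties;

class BatchSettings {

    // Single source of truth for both the JDBC batch size and the flush interval.
    static final int BATCH_SIZE = 100;

    // Builds the Hibernate setting from the shared constant.
    static Properties hibernateProperties() {
        Properties props = new Properties();
        props.setProperty("hibernate.jdbc.batch_size", String.valueOf(BATCH_SIZE));
        return props;
    }

    // True when the number of entities saved so far is a positive multiple of
    // BATCH_SIZE, i.e. when the loop should call session.flush() and session.clear().
    static boolean shouldFlush(int savedSoFar) {
        return savedSoFar > 0 && savedSoFar % BATCH_SIZE == 0;
    }
}
```

In the saving loop, the condition then becomes if (BatchSettings.shouldFlush(i)) instead of a hard-coded 100.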

If you define a flush interval lower than batch_size, Hibernate flushes the session more frequently, and each flush executes whatever has been queued so far. The batches it sends are therefore smaller than batch_size, which is not efficient.

When the two are the same, Hibernate only executes batches of the optimal size, except for the last one if the size of the collection is not a multiple of your batch_size.
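The effect of the two settings on the batches actually sent can be checked with a toy model. This is a simplified simulation and not Hibernate code: the class and method names are hypothetical, and it assumes that inserts are only queued on save and sent at flush time, in groups of at most batchSize, with a final flush at the end.

```java
import java.util.ArrayList;
import java.util.List;

class BatchSizeModel {

    // Returns the sizes of the batches executed when saving `total` entities
    // with the given batch size and a flush every `flushEvery` entities.
    static List<Integer> executedBatches(int total, int batchSize, int flushEvery) {
        List<Integer> batches = new ArrayList<>();
        int queued = 0;
        for (int i = 1; i <= total; i++) {
            queued++;                                  // save: the insert is queued
            if (i % flushEvery == 0 || i == total) {   // periodic flush, plus a final one
                while (queued > 0) {                   // a flush sends everything queued,
                    int size = Math.min(queued, batchSize); // at most batchSize at a time
                    batches.add(size);
                    queued -= size;
                }
            }
        }
        return batches;
    }

    public static void main(String[] args) {
        // Same value for both: only full batches, except the last one.
        System.out.println(executedBatches(250, 100, 100));  // [100, 100, 50]
        // The question's settings: batch_size 1000 never fills up with a flush every 100.
        System.out.println(executedBatches(250, 1000, 100)); // [100, 100, 50]
        // Flush interval lower than batch size: many small batches.
        System.out.println(executedBatches(250, 100, 30));
    }
}
```

In this model, a batch_size of 1000 combined with a flush every 100 entities behaves exactly like a batch_size of 100, because every flush empties the queue before the batch can fill up.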

You can see the following post for more details about this last point.

Abass A answered Mar 15 '23 09:03