In my application I need to massively improve insert performance. For example, a file with about 21K records takes over 100 minutes to insert. There are reasons it can take some time, say 20 minutes or so, but over 100 minutes is just too long.
Data is inserted into 3 tables (many-to-many). IDs are generated from a sequence, but I have already googled and set hibernate.id.new_generator_mappings = true
and the allocationSize + sequence increment to 1000.
Also, the amount of data is not extraordinary at all; the file is 90 MB.
I have verified with VisualVM that most of the time is spent in the JDBC driver (PostgreSQL) and Hibernate. I think the issue is related to a unique constraint in the child table. The service layer makes a manual check (= SELECT) before inserting; if the record already exists, it reuses it instead of waiting for a constraint exception.
So to sum it up: for this specific file there will be 1 insert per table per record (it could be different, but not for this file, which is the ideal (fastest) case). That means 60k inserts + 20k selects in total. Still, over 100 minutes seems very long (yes, hardware counts, and it is a simple PC with a 7200 rpm drive, no SSD or RAID). However, this is an improved version of a previous application (plain JDBC) in which the same insert on this hardware took about 15 minutes. Considering that in both cases about 4-5 minutes are spent on "pre-processing", the increase is massive.
Any tips on how this could be improved? Is there any batch loading functionality?
To optimize insert speed, combine many small operations into a single large operation. Ideally, you make a single connection, send the data for many new rows at once, and delay all index updates and consistency checking until the very end.
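For example, with plain JDBC this means keeping one connection and one transaction and sending the rows with addBatch()/executeBatch() instead of one statement per row. A minimal sketch, assuming a hypothetical records table and column name:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.util.List;

    public class BatchInsertSketch {

        /** Inserts all rows in JDBC batches over a single connection. */
        static void insertAll(String url, String user, String pw, List<String> names) throws Exception {
            try (Connection con = DriverManager.getConnection(url, user, pw)) {
                con.setAutoCommit(false); // commit once at the end, not per row
                // "records" and its column are hypothetical placeholders
                String sql = "INSERT INTO records (name) VALUES (?)";
                try (PreparedStatement ps = con.prepareStatement(sql)) {
                    int count = 0;
                    for (String name : names) {
                        ps.setString(1, name);
                        ps.addBatch();
                        if (++count % 1000 == 0) {
                            ps.executeBatch(); // send 1000 rows in one round trip
                        }
                    }
                    ps.executeBatch(); // send any remaining rows
                }
                con.commit();
            }
        }
    }

Committing once at the end and choosing a batch size in the hundreds or thousands keeps per-row round-trip and commit overhead out of the hot path; the value 1000 here is only an example.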
Drop indexes before inserting the data. We should drop the index before inserting a large amount of data; this makes the insert statements run faster, and the index can be recreated once the load is done.
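A minimal sketch of that pattern over JDBC, assuming a hypothetical index idx_records_name on a hypothetical records table:

    import java.sql.Connection;
    import java.sql.Statement;

    public class IndexDropSketch {
        /** Drops a (hypothetical) index, runs the bulk load, then rebuilds the index. */
        static void loadWithoutIndex(Connection con, Runnable bulkLoad) throws Exception {
            try (Statement st = con.createStatement()) {
                st.execute("DROP INDEX IF EXISTS idx_records_name");            // hypothetical index name
                bulkLoad.run();                                                  // the actual bulk inserts
                st.execute("CREATE INDEX idx_records_name ON records (name)");  // rebuild once, at the end
            }
        }
    }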
One of the most common ways to improve the performance of an INSERT operation in Oracle is to use the APPEND optimizer hint. APPEND forces the optimizer to perform a direct-path INSERT and appends new values above the high-water mark (the end of the table) while new blocks are being allocated.
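A minimal sketch of such a statement issued through JPA, assuming an Oracle database (the APPEND hint is Oracle-specific and applies to INSERT ... SELECT) and hypothetical table and column names:

    import javax.persistence.EntityManager;

    public class DirectPathInsertSketch {
        /** Direct-path insert via the Oracle APPEND hint; tables and columns are hypothetical. */
        static void copyIntoArchive(EntityManager em) {
            em.createNativeQuery(
                    "INSERT /*+ APPEND */ INTO records_archive (id, name) "
                  + "SELECT id, name FROM records")
              .executeUpdate();
        }
    }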
See spring-data JPA: manual commit transaction and restart new one.
Add entityManager.flush() and entityManager.clear() after every n-th call to the save() method. If you use Hibernate, also set hibernate.jdbc.batch_size=100, which seems like a reasonable choice.
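A minimal sketch of that pattern, assuming a hypothetical ImportRecord entity and repository and a batch size of 100 (kept in sync with hibernate.jdbc.batch_size; with Spring Boot that property would typically be set as spring.jpa.properties.hibernate.jdbc.batch_size=100):

    import java.util.List;
    import javax.persistence.Entity;
    import javax.persistence.EntityManager;
    import javax.persistence.GeneratedValue;
    import javax.persistence.GenerationType;
    import javax.persistence.Id;
    import javax.persistence.PersistenceContext;
    import org.springframework.data.jpa.repository.JpaRepository;
    import org.springframework.stereotype.Service;
    import org.springframework.transaction.annotation.Transactional;

    // Hypothetical entity and Spring Data repository, only for illustration.
    @Entity
    class ImportRecord {
        @Id @GeneratedValue(strategy = GenerationType.SEQUENCE)
        Long id;
        String name;
    }

    interface ImportRecordRepository extends JpaRepository<ImportRecord, Long> {}

    @Service
    public class RecordImportService {

        private static final int BATCH_SIZE = 100; // keep in sync with hibernate.jdbc.batch_size

        @PersistenceContext
        private EntityManager entityManager;

        private final ImportRecordRepository repository;

        public RecordImportService(ImportRecordRepository repository) {
            this.repository = repository;
        }

        @Transactional
        public void importAll(List<ImportRecord> records) {
            int count = 0;
            for (ImportRecord record : records) {
                repository.save(record);
                if (++count % BATCH_SIZE == 0) {
                    entityManager.flush(); // push the pending inserts to the driver as a JDBC batch
                    entityManager.clear(); // detach them so the persistence context stays small
                }
            }
            entityManager.flush(); // flush whatever is left of the last, partial batch
            entityManager.clear();
        }
    }

Flushing sends the accumulated inserts as one JDBC batch, and clearing prevents the persistence context from growing with every saved entity, which is what slows long-running imports down.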
Performance increase was > 10x, probably close to 100x.