
Bulk insert or update with Hibernate?

I need to consume a rather large amount of data from a daily CSV file. The CSV contains around 120K records. This slows to a crawl when using Hibernate. Basically, Hibernate seems to do a SELECT before every single INSERT (or UPDATE) when using saveOrUpdate(); for every instance being persisted with saveOrUpdate(), a SELECT is issued before the actual INSERT or UPDATE. I can understand why it's doing this, but it's terribly inefficient for bulk processing, and I'm looking for alternatives.

I'm confident that the performance issue lies with the way I'm using Hibernate for this, since I got another version working with native SQL (that parses the CSV in the exact same manner), and it literally runs circles around this new version.

So, to the actual question: does a Hibernate alternative to MySQL's "INSERT ... ON DUPLICATE" syntax exist?

Or, if I choose to use native SQL for this, can I do native SQL within a Hibernate transaction? Meaning, will it support commit/rollback?

JustDanyul asked Sep 08 '11



2 Answers

There are many possible bottlenecks in bulk operations. The best approach depends heavily on what your data looks like. Have a look at the Hibernate Manual section on batch processing.

At a minimum, make sure you are using the following pattern (copied from the manual):

Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

for ( int i=0; i<100000; i++ ) {
    Customer customer = new Customer(.....);
    session.save(customer);
    if ( i % 20 == 0 ) { //20, same as the JDBC batch size
        //flush a batch of inserts and release memory:
        session.flush();
        session.clear();
    }
}

tx.commit();
session.close();

If you are mapping a flat file to a very complex object graph you may have to get more creative, but the basic principle is that you have to find a balance between pushing good-sized chunks of data to the database with each flush/commit and avoiding exploding the size of the session-level cache.
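For the flush interval above to actually turn into batched JDBC statements, the JDBC batch size needs to be configured to match. A minimal sketch of that configuration (the property names are standard Hibernate settings; the value 20 is simply matched to the loop above):

// Sketch: enable JDBC batching so each flush() sends a small number of batched statements.
// Configuration is org.hibernate.cfg.Configuration; the value 20 matches the flush interval above.
Properties props = new Properties();
props.setProperty("hibernate.jdbc.batch_size", "20");
props.setProperty("hibernate.order_inserts", "true");   // group inserts by entity to keep batches intact
props.setProperty("hibernate.order_updates", "true");   // same for updates

SessionFactory sessionFactory = new Configuration()
        .configure()
        .addProperties(props)
        .buildSessionFactory();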

Lastly, if you don't need Hibernate to handle any collections or cascading for your data to be correctly inserted, consider using a StatelessSession.
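A rough sketch of what the StatelessSession variant could look like (it bypasses the first-level cache, dirty checking and cascades, so there is no flush()/clear() bookkeeping; parsedCsvRecords is just a stand-in for whatever your CSV parser produces):

// Sketch: StatelessSession sends each insert/update more or less straight to JDBC.
StatelessSession session = sessionFactory.openStatelessSession();
Transaction tx = session.beginTransaction();

for ( Customer customer : parsedCsvRecords ) {
    session.insert(customer);   // or session.update(customer) for rows that already exist
}

tx.commit();
session.close();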

jcwayne answered Sep 19 '22


From Hibernate Batch Processing: for updates I used the following:

Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

ScrollableResults employeeCursor = session.createQuery("FROM Employee").scroll();
int count = 0;

while ( employeeCursor.next() ) {
    Employee employee = (Employee) employeeCursor.get(0);
    employee.updateEmployee();
    session.update(employee);
    if ( ++count % 50 == 0 ) {
        session.flush();
        session.clear();
    }
}
tx.commit();
session.close();

But for inserts I would go with jcwayne's answer.
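If you do fall back to native SQL for the upsert itself, as the question suggests, it can be issued through the same Session and takes part in the surrounding transaction's commit/rollback. A rough sketch, with made-up table and column names:

// Sketch: MySQL "INSERT ... ON DUPLICATE KEY UPDATE" as a native query inside a Hibernate transaction.
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

session.createSQLQuery(
        "INSERT INTO employee (emp_id, name) VALUES (:id, :name) " +
        "ON DUPLICATE KEY UPDATE name = :name")
    .setParameter("id", employeeId)        // employeeId/employeeName are placeholder variables
    .setParameter("name", employeeName)
    .executeUpdate();

tx.commit();   // or tx.rollback() on failure
session.close();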

shareef answered Sep 19 '22