I have a text file of ~6 GB which I need to parse and later persist. By 'parsing' I mean reading a line from the file (usually 2000 characters), creating a Car object from that line, and later persisting it.
I'm using a producer-consumer pattern for the parsing and persisting, and I wonder whether it makes any difference (for performance reasons) to persist one object at a time versus 1000 (or any other amount) per commit.
At the moment it takes more than 2 hours to persist everything (3 million lines), which looks like too much time to me (or I may be wrong).
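For reference, the producer-consumer handoff described above can be sketched with a `BlockingQueue`. Everything here is illustrative, not the actual code: the class name, the poison-pill sentinel, and the string stand-ins for `Car` objects are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProducerConsumerSketch {
    private static final String POISON_PILL = "__EOF__"; // signals end of input

    // Simulates the pipeline: the producer feeds lines into a bounded queue,
    // the consumer turns each line into a "persisted" record.
    public static List<String> run(int lineCount) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(100);
        List<String> persisted = new ArrayList<>();

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String line = queue.take();
                    if (POISON_PILL.equals(line)) break;
                    // Real code would build a Car here and hand it to the DAO.
                    persisted.add("Car(" + line + ")");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        // Producer: stands in for the thread reading the 6 GB file line by line.
        for (int i = 0; i < lineCount; i++) {
            queue.put("line-" + i);
        }
        queue.put(POISON_PILL);
        consumer.join();
        return persisted;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(5).size()); // prints 5
    }
}
```

The bounded queue gives natural backpressure: if persisting is slower than parsing, the producer blocks instead of filling up the heap.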
Currently I'm doing this:
public void persistCar(Car car) throws Exception {
    try {
        carDAO.beginTransaction(); // get Hibernate session...
        // do all saves here
        carDAO.commitTransaction(); // commit the session
    } catch (Exception e) {
        carDAO.rollback();
        e.printStackTrace();
    } finally {
        carDAO.close();
    }
}
Before I make any design changes, I was wondering whether there is a reason why the following design is better (or not), and if so, what cars.size() should be. Also, is opening/closing a session considered expensive?
public void persistCars(List<Car> cars) throws Exception {
    try {
        carDAO.beginTransaction(); // get Hibernate session...
        for (Car car : cars) {
            // do the save for each car here
        }
        carDAO.commitTransaction(); // commit the session once for the whole batch
    } catch (Exception e) {
        carDAO.rollback();
        e.printStackTrace();
    } finally {
        carDAO.close();
    }
}
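Whatever batch size is chosen, the parsed objects have to be cut into fixed-size chunks before each chunk is handed to a persistCars-style method in a single transaction. A minimal, Hibernate-free sketch of that chunking (class name and the batch size of 50 are illustrative, not a recommendation from the original post):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSplitter {
    // Splits a list into consecutive chunks of at most batchSize elements,
    // so each chunk can be persisted in one transaction/commit.
    public static <T> List<List<T>> split(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 105; i++) ids.add(i);
        List<List<Integer>> batches = split(ids, 50);
        System.out.println(batches.size());        // prints 3
        System.out.println(batches.get(2).size()); // prints 5
    }
}
```

Note that `subList` returns views backed by the original list; copy them if the source list is mutated afterwards.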
A note on save() vs. persist(): save() is Hibernate-specific and returns the generated identifier immediately, which may force an INSERT right away. persist() is the JPA-standard method; it does not return the identifier and delays the INSERT until flush time, which makes it better suited to long-running conversations with an extended Session context.
A Hibernate session is more or less a database connection plus a cache of database objects, and you can run multiple successive transactions over a single connection.
Traditionally, Hibernate does not handle bulk inserts very well, but there are ways to optimize it to some extent.
Take this example from the API docs:
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for (int i = 0; i < 100000; i++) {
    Customer customer = new Customer(.....);
    session.save(customer);
    if (i % 20 == 0) { // 20, same as the JDBC batch size
        // flush a batch of inserts and release memory:
        session.flush();
        session.clear();
    }
}
tx.commit();
session.close();
In the above example the session is flushed and cleared after every 20 inserts, which keeps the first-level cache small and makes the operation a little faster.
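Flushing and clearing only keeps memory in check; to have Hibernate actually group those inserts into JDBC batches, the batch size property must also be set. A possible configuration (values are examples, and the batch size should match the modulo used in the loop):

```
# hibernate.properties (or the equivalent entries in hibernate.cfg.xml / persistence.xml)
hibernate.jdbc.batch_size=20
# optional: order inserts by entity so they batch more effectively
hibernate.order_inserts=true
```

Be aware that the IDENTITY id generator disables JDBC insert batching in Hibernate, since the generated key must be fetched after each insert; a sequence- or table-based generator avoids this.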
Here is an interesting article discussing the same topic.
We have successfully implemented an alternative approach to bulk inserts using stored procedures. In this case you pass the parameters to the stored procedure as a "|"-separated list and write the INSERT scripts inside the procedure. The code may look a bit more complex, but it is very effective.
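On the Java side, the stored-procedure approach above boils down to building that single "|"-separated string before the call. A small sketch of that step (the class and method names are illustrative; the actual procedure call and field layout depend on your schema):

```java
import java.util.List;
import java.util.StringJoiner;

public class PipeJoiner {
    // Builds the single "|"-separated parameter string that the stored
    // procedure splits back into individual values on the database side.
    public static String toPipeString(List<String> values) {
        StringJoiner joiner = new StringJoiner("|");
        for (String v : values) {
            joiner.add(v);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        String params = toPipeString(List.of("Ford", "Toyota", "BMW"));
        System.out.println(params); // prints Ford|Toyota|BMW
    }
}
```

One caveat with this scheme: any "|" occurring inside a field value must be escaped or a different delimiter chosen, otherwise the procedure will mis-split the list.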