I need to process a CSV file and for each record (line) persist an entity. Right now, I do it this way:
while ((line = reader.readNext()) != null) {
    Entity entity = createEntityObject(line);
    entityManager.save(entity);
    i++;
}
where the save(Entity) method is basically just an EntityManager.merge() call. There are about 20,000 entities (lines) in the CSV file. Is this an effective way to do it? It seems to be quite slow. Would it be better to use EntityManager.persist()? Is this solution flawed in any way?
EDIT
It's a lengthy process (over 400s) and I tried both solutions, with persist and merge. Both take approximately the same amount of time to complete (459s vs 443s). The question is if saving the entities one by one like this is optimal. As far as I know, Hibernate (which is my JPA provider) does implement some cache/flush functionality so I shouldn't have to worry about this.
The JPA API doesn't give you all the options you need to make this optimal. Depending on how fast you need this to be, you will have to look for ORM-specific options, which means Hibernate's in your case.
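Things to check: whether the JDBC batch API is being used, whether you need the generated keys back after each insert, and whether you can skip cascading persist.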
So in Ebean ORM this would be:
EbeanServer server = Ebean.getServer(null);
Transaction transaction = server.beginTransaction();
try {
    // Use JDBC batch API with a batch size of 100
    transaction.setBatchSize(100);
    // Don't bother getting generated keys
    transaction.setBatchGetGeneratedKeys(false);
    // Skip cascading persist
    transaction.setPersistCascade(false);

    // persist your beans ...
    Iterator<YourEntity> it = null; // obviously should not be null
    while (it.hasNext()) {
        YourEntity yourEntity = it.next();
        server.save(yourEntity);
    }

    transaction.commit();
} finally {
    transaction.end();
}
Oh, and if you do this via raw JDBC you skip the ORM overhead (less object creation, less garbage collection, and so on), so I wouldn't ignore that option.
So yes, this doesn't directly answer your question, but it might help your search for more ORM-specific batch insert tweaks.
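For the raw JDBC option mentioned above, a minimal sketch might look like the following. It uses only the standard `java.sql` batch calls (`addBatch`, `executeBatch`). The table name, the two columns, and the class and method names are assumptions made up for illustration, not something from the question.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class JdbcBatchInsert {

    // Hypothetical table and columns; adjust to the real schema.
    static final String INSERT_SQL =
            "INSERT INTO entity (name, amount) VALUES (?, ?)";

    /**
     * Inserts all rows inside one transaction, sending them to the driver
     * in batches of batchSize. Returns the number of executeBatch() calls.
     */
    static int insertAll(Connection conn, List<String[]> rows, int batchSize)
            throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // one transaction for the whole import
        int roundTrips = 0;
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            int pending = 0;
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch(); // queue the row instead of executing it
                if (++pending == batchSize) {
                    ps.executeBatch(); // send the queued rows together
                    roundTrips++;
                    pending = 0;
                }
            }
            if (pending > 0) { // send the last, partially filled batch
                ps.executeBatch();
                roundTrips++;
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
        return roundTrips;
    }
}
```

With a real database you would obtain the `Connection` from `DriverManager.getConnection(...)` or a `DataSource` and pass in the parsed CSV lines. How much batching actually helps depends on the JDBC driver, so it is worth measuring with a few different batch sizes.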
I think one common way to do this is with transactions. If you begin a new transaction and then persist a large number of objects, they typically won't be written to the DB until the transaction is flushed or committed. This can gain you some efficiency if you have a large number of items to save.
Check out EntityManager.getTransaction().
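The transaction approach described above could be sketched roughly as follows. To keep the example compilable without a JPA dependency, it is written against a small hypothetical interface, `PersistenceSession`, which is not part of JPA. With a real provider, the corresponding calls would be `EntityManager.getTransaction().begin()`, `EntityManager.persist()`, `EntityManager.flush()`, `EntityManager.clear()`, and `commit()` / `rollback()` on the `EntityTransaction`. The periodic flush and clear is a common refinement rather than something the answer states: it keeps the persistence context from growing to 20,000 managed objects.

```java
class ChunkedPersist {

    /** Hypothetical stand-in for the EntityManager / EntityTransaction calls used below. */
    interface PersistenceSession<T> {
        void begin();           // EntityTransaction.begin()
        void persist(T entity); // EntityManager.persist(entity)
        void flush();           // EntityManager.flush()
        void clear();           // EntityManager.clear()
        void commit();          // EntityTransaction.commit()
        void rollback();        // EntityTransaction.rollback()
    }

    /**
     * Persists all entities in a single transaction, flushing and clearing
     * the session after every chunkSize entities. Returns the entity count.
     */
    static <T> int persistAll(PersistenceSession<T> session,
                              Iterable<T> entities,
                              int chunkSize) {
        int count = 0;
        session.begin();
        try {
            for (T entity : entities) {
                session.persist(entity);
                if (++count % chunkSize == 0) {
                    session.flush(); // push pending inserts to the database
                    session.clear(); // detach entities that are already written
                }
            }
            session.commit(); // writes whatever is still pending
        } catch (RuntimeException e) {
            session.rollback();
            throw e;
        }
        return count;
    }
}
```

With Hibernate, the chunk size is usually chosen to match the `hibernate.jdbc.batch_size` setting so that each flush maps onto whole JDBC batches. Note that Hibernate does not batch inserts at the JDBC level for entities that use IDENTITY key generation.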