I'm trying to do a cascading save on a large object graph using JPA. For example (my object graph is a little bigger but close enough):
import java.util.Collection;
import javax.persistence.*;

@Entity
@Table(name = "a")
public class A {
    @Id
    @SequenceGenerator(name = "a_gen", sequenceName = "a_seq", allocationSize = 1)
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "a_gen")
    private long id;

    @OneToMany(cascade = CascadeType.ALL, mappedBy = "a")
    private Collection<B> bs;
}

@Entity
@Table(name = "b")
public class B {
    @Id
    @SequenceGenerator(name = "b_gen", sequenceName = "b_seq", allocationSize = 1)
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "b_gen")
    private long id;

    @ManyToOne
    private A a;
}
So I'm trying to persist an A which has a collection of 100+ Bs. The code is just:
em.persist(a);
Problem is, it's SLOW. My save is taking approximately 1300ms. I looked at the SQL being generated and it's horribly inefficient. Something like this:
select a_seq.nextval from dual;
select b_seq.nextval from dual;
select b_seq.nextval from dual;
select b_seq.nextval from dual;
...
insert into a (id) values (1);
insert into b (id, fk) values (1, 1);
insert into b (id, fk) values (2, 1);
insert into b (id, fk) values (3, 1);
...
I'm currently using TopLink as the persistence provider, but I've tried EclipseLink and Hibernate as well. The backend is Oracle 11g. The problem is really how the SQL is put together: each of these operations is done discretely rather than in bulk, so if there is a network latency of even 5ms between my app server and DB server, doing 200 discrete operations adds a second. I've tried increasing the allocationSize of my sequences, but that only helps a bit. I've also tried direct JDBC as a batch statement:
PreparedStatement statement = connection.prepareStatement(sql); // prepare once, outside the loop
for (...) {
    // ... set the bind parameters for this row ...
    statement.addBatch();
}
statement.executeBatch();
For my data model it takes about 33ms as a direct JDBC batch; Oracle itself takes about 5ms for the 100+ inserts.
Is there any way of making JPA (I'm stuck with 1.0 right now...) go faster without delving into vendor-specific things like Hibernate bulk inserts?
Thanks!
I'm curious why you consider increasing the INCREMENT BY dirty. It's an optimization that reduces the number of calls to the database to retrieve the next sequence value, and it's a common pattern in database clients where the id value is assigned in the client prior to the INSERT. I don't see this as a JPA or ORM issue, and it should be the same cost in your JDBC comparison, since that also has to retrieve a new sequence number for each new row prior to the INSERT. If you take a different approach in your JDBC case, then we should be able to get EclipseLink JPA to follow the same approach.
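To illustrate, here is roughly what the preallocation looks like on the mapping side (the generator name is made up, and the database sequence's INCREMENT BY must be changed to match the allocationSize):

// Sketch only: preallocate 100 ids per call to b_seq, so a single
// "select b_seq.nextval from dual" round trip covers ~100 new B rows.
@Id
@SequenceGenerator(name = "b_gen", sequenceName = "b_seq", allocationSize = 100)
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "b_gen")
private long id;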
The cost of JPA is probably most obvious in the isolated INSERT scenario because you aren't gaining any benefit from repeated reads against the transactional or shared cache, and, depending on your cache configuration, you are paying a price to put these new entities into the cache during the flush/commit.
Please note that there is also a cost to creating the first EntityManager, which is when all of the metadata processing, class loading, possible weaving, and metamodel initialization occur. Make sure you keep this time out of your comparison. In your real application this happens once, and all subsequent EntityManagers benefit from the shared metadata.
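As a rough sketch of what I mean by keeping that one-time cost out of the measurement ("my-pu" is a placeholder persistence unit name, and 'a' is the A instance from your example):

// Pay the one-time metadata/weaving cost up front, outside the timed section.
EntityManagerFactory emf = Persistence.createEntityManagerFactory("my-pu");
emf.createEntityManager().close(); // warm-up: triggers metadata processing

EntityManager em = emf.createEntityManager();
long start = System.nanoTime();
em.getTransaction().begin();
em.persist(a);
em.getTransaction().commit(); // the INSERTs are flushed here
System.out.println("persist+commit: " + (System.nanoTime() - start) / 1000000 + " ms");
em.close();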
If you have other scenarios that need to read these entities, then putting them in the cache can reduce the cost of retrieving them. In my experience I can make an application as a whole much faster than a typical hand-written JDBC solution, but it's a balance across the entire set of concurrent users, not an isolated test case.
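If you are open to an EclipseLink-specific setting, batch writing collapses the individual INSERT round trips into JDBC batches, which is the biggest cost in your scenario. A minimal sketch, assuming a persistence unit named "my-pu" (the same properties can also be set in persistence.xml):

// Sketch: enable EclipseLink JDBC batch writing when creating the factory.
Map<String, Object> props = new HashMap<String, Object>();
props.put("eclipselink.jdbc.batch-writing", "JDBC");     // or "Oracle-JDBC" to use the Oracle driver's batching
props.put("eclipselink.jdbc.batch-writing.size", "200"); // rows per batch
EntityManagerFactory emf = Persistence.createEntityManagerFactory("my-pu", props);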
I hope this helps. I'm happy to provide more guidance on EclipseLink JPA and its performance and scalability options.
Doug