I have a long-running (but rather simple) application that uses Hibernate (via JPA). It was experiencing a rather dramatic slowdown as it ran. I've been able to narrow it down to requiring an occasional entityManager.clear() call. When Hibernate's entity manager is tracking 100,000 entities, it is ~100x slower than when it is tracking only a few (see results below). My question is: why does Hibernate slow down so much when it is tracking a lot of entities? And are there any other ways around it?
!!! Update: I've been able to narrow this down to Hibernate's auto-flushing code. !!!
Specifically, to the flushEntities() method of org.hibernate.event.internal.AbstractFlushingEventListener (at least in Hibernate 4.1.1.Final). In it there is a loop that iterates over ALL entities in the persistence context, performing some extensive checks around flushing each of them (even though all the entities are already flushed in my example!).
So, partially answering the second part of my question: the performance problem can be addressed by setting the flush mode to FlushModeType.COMMIT on the query (see updated results below), e.g.
Place place = em.createQuery("from Place where name = :name", Place.class) .setParameter("name", name) .setFlushMode(FlushModeType.COMMIT) // <-- yay! .getSingleResult();
... but this seems like a rather ugly solution -- it passes the responsibility for knowing whether things are flushed to the query methods instead of keeping it in the updating methods. It also pretty much means that I either have to set the flush mode to COMMIT on all query methods or, more likely, set it on the EntityManager.
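For illustration, a minimal sketch of the EntityManager-wide variant (assuming a plain JPA EntityManager named em, as in the sample code below):

```java
// Sketch: set COMMIT flush mode once on the EntityManager so that no query
// triggers an automatic flush of the persistence context.
em.setFlushMode(FlushModeType.COMMIT);

// Subsequent queries then no longer need the per-query setFlushMode(...) call.
Place place = em.createQuery("from Place where name = :name", Place.class)
        .setParameter("name", name)
        .getSingleResult();
```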
This makes me wonder: is this expected behavior? Am I doing something wrong with flushing or how I define entities? Or is this a limitation of (or possibly bug in) Hibernate?
The sample code that I used to isolate the problem follows:
```java
@Entity
@Table(name = "place")
@Immutable
public class Place {
    private Long _id;
    private String _name;

    @Id
    @GeneratedValue
    public Long getId() {
        return _id;
    }

    public void setId(Long id) {
        _id = id;
    }

    @Basic(optional = false)
    @Column(name = "name", length = 700, updatable = false, nullable = false, unique = true,
            columnDefinition = "varchar(700) character set 'ascii' not null")
    public String getName() {
        return _name;
    }

    public void setName(String name) {
        _name = name;
    }

    @Override
    public boolean equals(Object o) { /* ... */ }

    @Override
    public int hashCode() {
        return getName().hashCode();
    }
}
```
The test code I have generates 100,000 random place names and inserts them. It then queries 5000 of those out randomly by name. There is an index on the name column.
```java
Place place = em.createQuery(
        "select p from Place p where p.name = :name", Place.class)
        .setParameter("name", name)
        .getSingleResult();
```
For comparison, and to make sure that it wasn't something in the database, I ran the following JDBC-based query (under em.unwrap(Session.class).doWork(...)) over a separate, randomly selected 5000 place names:
```java
PreparedStatement ps = c.prepareStatement(
        "select id, name from place where name = ?");
ps.setString(1, name);
ResultSet rs = ps.executeQuery();
while (rs.next()) {
    Place place = new Place();
    place.setId(rs.getLong(1));
    place.setName(rs.getString(2));
}
rs.close();
ps.close();
```
(Note: I do create and close a PreparedStatement for each of the 5000 queries in the benchmark.)
All results below are averaged over 5000 queries. The JVM was given -Xmx1G
| Seconds/Query | Approach |
|---|---|
| 0.000160s | JDBC |
| 0.000286s | Hibernate, calling clear() after the import and every 100 queries |
| 0.000653s | Hibernate, calling clear() once after the import |
| 0.012533s | Hibernate w/o calling clear() at all |
| 0.000292s | Hibernate w/o calling clear(), and with flush mode COMMIT |
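For reference, the "clear() after the import and every 100 queries" row corresponds roughly to the following pattern (a sketch; namesToLookUp and the counter are hypothetical names, not part of the benchmark code above):

```java
// Sketch of periodically clearing the persistence context so that it never
// tracks more than ~100 entities at a time.
em.clear();  // once, right after the import

int queryCount = 0;
for (String name : namesToLookUp) {
    Place place = em.createQuery(
            "select p from Place p where p.name = :name", Place.class)
            .setParameter("name", name)
            .getSingleResult();
    // ... use place ...
    if (++queryCount % 100 == 0) {
        em.clear();  // detach everything accumulated so far
    }
}
```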
Other observations: during the Hibernate queries (without any clear() calls), the Java process pegged a core at near 100% utilization. The JVM never exceeded a 500MB heap. There was a lot of GC activity during the queries too, but CPU utilization was clearly dominated by the Hibernate code.
But mainly I'm curious why Hibernate seems to exhibit O(n), or even O(n^2), lookups for the queries -- it seems like it should be able to use a hashtable or binary tree under the hood to keep the queries fast. Notice the two-orders-of-magnitude difference when it's tracking 100,000 entities vs. 100 entities.
The O(n²) complexity results from the way a query must be handled. Hibernate internally defers updates and inserts as long as it can (to take the chance to group similar updates/inserts together, especially if you set multiple properties of an object).

So before you can safely query objects in the database, Hibernate has to detect all object changes and flush them. The problem here is that Hibernate also has some notification and interception going on. So it iterates over every entity object managed by the persistence context. Even if the object itself is not mutable, it might hold mutable objects or even reference collections.

The interception mechanism also grants you access to any object being considered for being dirty, so that your own code can implement additional dirtiness checks or perform additional calculations, such as computing sums or average values, logging additional information, and so on.
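As an illustration of that hook, here is a minimal sketch of a custom interceptor (assuming the Hibernate 4.x org.hibernate.Interceptor API; the class name and behavior are purely illustrative):

```java
import java.io.Serializable;

import org.hibernate.EmptyInterceptor;
import org.hibernate.type.Type;

// Illustrative interceptor: it is called for entities inspected during flush,
// so it could log, collect statistics, or add its own dirtiness logic.
public class StatisticsInterceptor extends EmptyInterceptor {

    @Override
    public int[] findDirty(Object entity, Serializable id,
                           Object[] currentState, Object[] previousState,
                           String[] propertyNames, Type[] types) {
        // A custom dirty check could go here; returning null keeps
        // Hibernate's default dirty-checking behavior.
        return null;
    }
}
```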
But let's look at the code for a minute:
The flush call to prepare the query results in:
```
DefaultFlushEventListener.onFlush(..)
  -> AbstractFlushingEventListener.flushEverythingToExecution(event)
  -> AbstractFlushingEventListener.prepareEntityFlushes(..)
```
The implementation uses:
```java
for ( Map.Entry me : IdentityMap.concurrentEntries( persistenceContext.getEntityEntries() ) ) {
    EntityEntry entry = (EntityEntry) me.getValue();
    Status status = entry.getStatus();
    if ( status == Status.MANAGED || status == Status.SAVING || status == Status.READ_ONLY ) {
        cascadeOnFlush( session, entry.getPersister(), me.getKey(), anything );
    }
}
```
As you can see, a map of all entities in the persistence context is retrieved and iterated. That means for every invocation of a query you iterate over all former results to check for dirty objects. On top of that, cascadeOnFlush creates a new object and does even more work. Here is the code of cascadeOnFlush:
```java
private void cascadeOnFlush(EventSource session, EntityPersister persister, Object object, Object anything)
        throws HibernateException {
    session.getPersistenceContext().incrementCascadeLevel();
    try {
        new Cascade( getCascadingAction(), Cascade.BEFORE_FLUSH, session )
                .cascade( persister, object, anything );
    }
    finally {
        session.getPersistenceContext().decrementCascadeLevel();
    }
}
```
So this is the explanation. Hibernate just checks every object managed by the persistence context every time you issue a query.
So for everyone reading this, here is the complexity calculation:

- 1st query: 0 entities
- 2nd query: 1 entity
- 3rd query: 2 entities
- ...
- 100th query: 100 entities
- ...
- (100k + 1)th query: 100k entities
So we have O(0 + 1 + 2 + ... + n) = O(n(n+1)/2) = O(n²).
This explains your observation. In order to maintain a small CPU and memory footprint, the Hibernate-managed persistence context should be kept as small as possible. Letting Hibernate manage more than, let's say, 100 or 1000 entities slows Hibernate down considerably. Here one should consider changing the flush mode, using a second session for queries and one for changes (if this is possible at all), or using a StatelessSession; a sketch of the separate query session follows.
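A rough sketch of the "separate session for queries" idea (assuming a Hibernate SessionFactory is available; the variable names are illustrative):

```java
// Sketch: use a dedicated, read-only session for lookups so the main
// session's persistence context does not grow with every query result.
Session querySession = sessionFactory.openSession();
querySession.setDefaultReadOnly(true);
try {
    Place place = (Place) querySession
            .createQuery("from Place where name = :name")
            .setParameter("name", name)
            .uniqueResult();
    // ... use place; clear() or close the session periodically ...
} finally {
    querySession.close();
}
```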
So your observation is correct: there is indeed O(n²) behavior going on.
Perhaps you are familiar with the fact that the EntityManager keeps track of persistent objects (i.e. the ones created by calling em.createQuery(...).getSingleResult()). They accumulate in the so-called persistence context, or session (the Hibernate term), which enables some very neat features. For example, you can modify an object by calling the mutator method setName(...) and the EntityManager will synchronize this state change in memory with the database (it will issue an UPDATE statement) whenever it is appropriate. This happens without requiring you to call explicit save() or update() methods. All you need to do is work with the object as if it were a normal Java object, and the EntityManager will take care of persistence.
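For example, a minimal sketch of that behavior (plain JPA with a resource-local transaction; placeId is a hypothetical identifier, and this ignores that the Place entity above is marked @Immutable):

```java
// Sketch: no explicit save()/update() call is needed; the change to the
// managed object is detected and written back at flush/commit time.
em.getTransaction().begin();
Place place = em.find(Place.class, placeId);  // 'place' is now managed
place.setName("Somewhere else");              // just mutate the Java object
em.getTransaction().commit();                 // Hibernate issues the UPDATE here
```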
Why is this slow(er)?
For one, it ensures that there is just one single instance per primary key in memory. This means that if you load one and the same row twice, there will be just one object created in the heap (both results will be ==). This makes a lot of sense: imagine you had two copies of the same row; the EntityManager could not guarantee that it reliably synchronized the Java objects, as you could make changes to both objects independently. Perhaps there are a lot of other low-level operations that eventually slow down the EntityManager when there are a lot of objects to track. The clear() method actually removes these objects from the persistence context and makes the task easier (fewer objects to track = faster operation).
How can you get around it?
If your EntityManager implementation is Hibernate, you can use a StatelessSession, which is designed to work around these performance penalties. I think you can get it through:
```java
StatelessSession session = ((Session) entityManager.getDelegate())
        .getSessionFactory()
        .openStatelessSession();
```
(NB! code not tested, taken from another question)
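A sketch of using it for the lookup, under the same caveat (Hibernate 4.x API; untested, like the snippet above):

```java
// Sketch: the StatelessSession keeps no persistence context and performs no
// dirty checking, so repeated queries do not accumulate tracked entities.
Place place = (Place) session
        .createQuery("from Place where name = :name")
        .setParameter("name", name)
        .uniqueResult();
session.close();
```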