
Hibernate queries much slower with flushMode=AUTO until clear() is called

I have a long-running (but rather simple) application that uses Hibernate (via JPA). It was experiencing a rather dramatic slowdown as it ran. I've been able to narrow it down to requiring an occasional entityManager.clear() call. When Hibernate's entity manager is tracking 100,000 entities, it is ~100x slower than when it is tracking only a few (see results below). My question is: why does Hibernate slow down so much when it is tracking a lot of entities? And are there any other ways around it?


!!! Update: I've been able to narrow this down to Hibernate's auto-flushing code. !!!

Specifically to org.hibernate.event.internal.AbstractFlushingEventListener's flushEntities() method (at least in Hibernate 4.1.1.Final). In it there is a loop that iterates over ALL entities in the persistence context, performing some extensive checks around flushing each of them (even though all the entities are already flushed in my example!).

So, partially answering the second part of my question: the performance problem can be addressed by setting the flush mode to FlushModeType.COMMIT on the query (see updated results below), e.g.:

Place place = em.createQuery("from Place where name = :name", Place.class)     .setParameter("name", name)     .setFlushMode(FlushModeType.COMMIT)  // <-- yay!     .getSingleResult(); 

... but this seems like a rather ugly solution: it passes the responsibility for knowing whether things are flushed to the query methods instead of keeping it in the updating methods. It also pretty much means that I either have to set the flush mode to COMMIT on all query methods or, more likely, set it on the EntityManager.
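For reference, a sketch of setting it once on the EntityManager instead (FlushModeType is the standard javax.persistence enum; the setting then applies to every query created from that EntityManager):

    import javax.persistence.FlushModeType;

    // Queries created from this EntityManager no longer auto-flush;
    // pending changes are written out only at commit time.
    em.setFlushMode(FlushModeType.COMMIT);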

This makes me wonder: is this expected behavior? Am I doing something wrong with flushing or how I define entities? Or is this a limitation of (or possibly bug in) Hibernate?


The sample code that I used to isolate the problem follows:

The Test Entity

    @Entity @Table(name="place") @Immutable
    public class Place {
        private Long _id;
        private String _name;

        @Id @GeneratedValue
        public Long getId() { return _id; }
        public void setId(Long id) { _id = id; }

        @Basic(optional=false) @Column(name="name", length=700,
            updatable=false, nullable=false, unique=true,
            columnDefinition="varchar(700) character set 'ascii' not null")
        public String getName() { return _name; }
        public void setName(String name) { _name = name; }

        @Override
        public boolean equals(Object o) { /* ... */ }

        @Override
        public int hashCode() { return getName().hashCode(); }
    }

The Benchmark Code

The test code I have generates 100,000 random place names and inserts them, then queries 5000 of those randomly by name. There is an index on the name column.
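The import phase looks roughly like this (a sketch; randomName() is a hypothetical helper):

    // Import phase: persist 100,000 randomly named places in one transaction.
    em.getTransaction().begin();
    for (int i = 0; i < 100000; i++) {
        Place place = new Place();
        place.setName(randomName());  // hypothetical helper
        em.persist(place);
    }
    em.getTransaction().commit();

Each of the 5000 benchmark lookups is then: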

    Place place = em.createQuery(
        "select p from Place p where p.name = :name", Place.class)
        .setParameter("name", name)
        .getSingleResult();

For comparison, and to make sure that it wasn't something in the database, I ran the following JDBC-based query (under em.unwrap(Session.class).doWork(...)) over a separate, randomly selected 5000 place names:

    PreparedStatement ps = c.prepareStatement(
        "select id, name from place where name = ?");
    ps.setString(1, name);
    ResultSet rs = ps.executeQuery();
    while (rs.next()) {
        Place place = new Place();
        place.setId(rs.getLong(1));
        place.setName(rs.getString(2));
    }
    rs.close();
    ps.close();

(Note that I do create and close a PreparedStatement for each of the 5000 queries in the benchmark.)
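For context, the doWork(...) wrapper mentioned above looks roughly like this (a sketch; runJdbcBenchmark is a hypothetical method containing the loop above):

    import java.sql.Connection;
    import java.sql.SQLException;
    import org.hibernate.Session;
    import org.hibernate.jdbc.Work;

    // Borrow the Session's underlying JDBC connection for the raw benchmark.
    em.unwrap(Session.class).doWork(new Work() {
        @Override
        public void execute(Connection c) throws SQLException {
            runJdbcBenchmark(c);  // hypothetical: runs the PreparedStatement loop above
        }
    });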

The Results

All results below are averaged over 5000 queries. The JVM was given -Xmx1G.

    Seconds/Query    Approach
    0.000160s        JDBC
    0.000286s        Hibernate calling clear() after import and every 100 queries
    0.000653s        Hibernate calling clear() once after the import
    0.012533s        Hibernate w/o calling clear() at all
    0.000292s        Hibernate w/o calling clear(), and with flush-mode COMMIT

Other observations: during the Hibernate queries (w/o any clear() calls), the Java process pegged a core at near 100% utilization. The JVM never exceeded 500MB of heap. There was a lot of GC activity during the queries too, but CPU utilization was clearly dominated by the Hibernate code.

Asked by ɲeuroburɳ on Apr 13 '12


2 Answers

But mainly I'm curious why Hibernate seems to exhibit O(n) or even O(n^2) lookups for the queries; it seems like it should be able to use a hashtable or binary tree under the hood to keep the queries fast. Notice the two-orders-of-magnitude difference when it's tracking 100,000 entities vs. 100 entities.

The O(n²) complexity results from the way a query must be handled. Hibernate internally defers updates and inserts as long as it can, to use the chance to group similar updates/insertions together (especially if you set multiple properties of an object).

So before you can safely query objects in the database, Hibernate has to detect all object changes and flush them. The problem here is that Hibernate also has some notification and interception going on, so it iterates over every entity object managed by the persistence context. Even if the object itself is not mutable, it might hold mutable objects or even reference collections.

The interception mechanism also grants you access to any object being considered for dirtiness, allowing your own code to implement additional dirtiness checks or perform additional calculations, such as computing sums or average values, or logging additional information, and so on. A sketch of such a hook follows.
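As an illustration of that hook, here is a minimal interceptor sketch (not from the question's code; EmptyInterceptor is Hibernate's standard no-op base class):

    import java.io.Serializable;
    import org.hibernate.EmptyInterceptor;
    import org.hibernate.type.Type;

    // Invoked while Hibernate dirty-checks each entity during a flush.
    public class LoggingInterceptor extends EmptyInterceptor {
        @Override
        public int[] findDirty(Object entity, Serializable id,
                Object[] currentState, Object[] previousState,
                String[] propertyNames, Type[] types) {
            System.out.println("Dirty-checking " + entity.getClass().getSimpleName() + "#" + id);
            return null;  // null = fall back to Hibernate's default dirty check
        }
    }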

But let's look at the code for a minute:

The flush call to prepare the query results in:

    DefaultFlushEventListener.onFlush(..)
      -> AbstractFlushingEventListener.flushEverythingToExecution(event)
      -> AbstractFlushingEventListener.prepareEntityFlushes(..)

The implementation uses:

    for ( Map.Entry me : IdentityMap.concurrentEntries( persistenceContext.getEntityEntries() ) ) {
        EntityEntry entry = (EntityEntry) me.getValue();
        Status status = entry.getStatus();
        if ( status == Status.MANAGED || status == Status.SAVING || status == Status.READ_ONLY ) {
            cascadeOnFlush( session, entry.getPersister(), me.getKey(), anything );
        }
    }

As you can see, a map of all entities in the persistence context is retrieved and iterated.

That means that for every invocation of a query, you iterate over all former results to check for dirty objects. And on top of that, cascadeOnFlush creates a new object and does even more work. Here is the code of cascadeOnFlush:

    private void cascadeOnFlush(EventSource session, EntityPersister persister,
            Object object, Object anything) throws HibernateException {
        session.getPersistenceContext().incrementCascadeLevel();
        try {
            new Cascade( getCascadingAction(), Cascade.BEFORE_FLUSH, session )
                .cascade( persister, object, anything );
        }
        finally {
            session.getPersistenceContext().decrementCascadeLevel();
        }
    }

So this is the explanation. Hibernate just checks every object managed by the persistence context every time you issue a query.

So for everyone reading this, here is the complexity calculation:

    1. query:      0 entities
    2. query:      1 entity
    3. query:      2 entities
    ...
    100. query:    100 entities
    ...
    100k+1. query: 100k entities

So we have O(0 + 1 + 2 + ... + n) = O(n(n+1)/2) = O(n²).

This explains your observation. To maintain a small CPU and memory footprint, the Hibernate-managed persistence context should be kept as small as possible. Letting Hibernate manage more than, let's say, 100 or 1000 entities slows Hibernate down considerably. Here one should consider changing the flush mode, using one session for queries and a second one for changes (if this is possible at all), or using a StatelessSession. A sketch of the periodic-clear pattern follows.
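For completeness, a minimal sketch of keeping the persistence context small by clearing it periodically (the batch size of 100 is illustrative, matching the benchmark above; names is assumed to be the list of lookup keys):

    int i = 0;
    for (String name : names) {
        Place place = em.createQuery("from Place where name = :name", Place.class)
            .setParameter("name", name)
            .getSingleResult();
        // ... use place ...
        if (++i % 100 == 0) {
            em.clear();  // detach all tracked entities; nothing left to dirty-check
        }
    }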

So your observation is correct: it is O(n²) that is going on.

Answered by Martin Kersten on Oct 03 '22


Perhaps you are familiar with the fact that the EntityManager keeps track of persistent objects (i.e. the ones created by calling em.createQuery(...).getSingleResult()). They accumulate in the so-called persistence context, or session (the Hibernate term), and this enables very neat features. For example, you can modify an object by calling the mutator method setName(...), and the EntityManager will synchronize this state change in memory with the database (issuing an UPDATE statement) whenever it is appropriate. This happens without requiring you to call explicit save() or update() methods. All you need to do is work with the object as if it were a normal Java object, and the EntityManager will take care of persistence, as sketched below.
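A minimal sketch of that automatic synchronization (note that the Place entity in the question is marked @Immutable, so assume a mutable entity here; the id 42L is illustrative):

    em.getTransaction().begin();
    Place place = em.find(Place.class, 42L);
    place.setName("New Name");        // just mutate the managed object
    em.getTransaction().commit();     // Hibernate detects the change and issues an UPDATE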

Why is this slow(er)?

For one, it ensures that there is just one single instance per primary key in memory. This means that if you load one and the same row twice, there will be just one object created on the heap (both results will be ==). This makes a lot of sense: imagine you had two copies of the same row; the EntityManager couldn't guarantee that it reliably synchronized the Java object, since you could make changes to both objects independently. Perhaps there are a lot of other low-level operations that eventually slow down the EntityManager when there are a lot of objects to track. The clear() method actually removes these objects from the persistence context and makes the task easier (fewer objects to track = faster operation).
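That identity guarantee can be seen directly (a quick sketch; the id 1L is illustrative):

    // Within one persistence context, the same row yields the same instance.
    Place first  = em.find(Place.class, 1L);
    Place second = em.find(Place.class, 1L);
    assert first == second;  // same reference, not merely equals()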

How can you get around it?

If your EntityManager implementation is Hibernate, you can use StatelessSession, which is designed to work around these performance penalties. I think you can get it through:

StatelessSession session = ((Session) entityManager.getDelegate()).getSessionFactory().openStatelessSession();

(NB! code not tested, taken from another question)
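A rough sketch of using it for the lookup (a StatelessSession keeps no first-level cache and performs no dirty checking, so there is nothing to iterate at query time; entities it returns are detached):

    StatelessSession session = ((Session) entityManager.getDelegate())
        .getSessionFactory().openStatelessSession();
    try {
        Place place = (Place) session.createQuery("from Place where name = :name")
            .setParameter("name", name)
            .uniqueResult();
        // ... use place; it is detached, so changes are not tracked ...
    }
    finally {
        session.close();
    }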

Answered by Svilen Ivanov on Oct 03 '22