I'm trying to cut down the number of N+1 selects incurred by my application. The application uses EclipseLink as its ORM, and in as many places as possible I've tried to add the batch read hint to queries. In a large number of places in the app, however, I don't know in advance exactly which relationships I'll be traversing (my view displays fields based on user preferences). At that point I'd like to run one query to populate all of those relationships for my objects.
My dream is to call something like ReadAllRelationshipsQuery(Collection,RelationshipName) and populate all of these items, so that a later call to Collection.get(0).getMyStuff() finds the relationship already populated and does not cause a database query. How can I accomplish this? I'm willing to write any code I need to, but I can't find a way that works with the EclipseLink framework.
Why don't I just batch read all of the possible fields and let them load lazily? What I've found is that the batch value holders that implement batch reads don't behave well with the EclipseLink cache. If a batch read value holder isn't "evaluated" and ends up in the EclipseLink cache, it can become stale and return incorrect data. (This behavior was logged as an EclipseLink bug but rejected.) Edit: I found the link to the bug here: https://bugs.eclipse.org/bugs/show_bug.cgi?id=326197
How do I avoid N+1 selects for objects I already have a reference to?
You have three basic ways to load data into objects from a JPA-based solution. These are:
Each of these has pros and cons.
Regardless, I highly recommend using p6spy (http://sourceforge.net/projects/p6spy/) with any JPA-based application to see the actual effects of your tuning.
Unfortunately, JPA makes some things easy and some things hard, mainly because of the side effects of your choices. For example, you might fix one problem by setting the fetch mode to eager, and then create another problem where the eager fetch pulls in too much data. EclipseLink does provide tooling to help sort this out (EclipseLink Performance Tools).
In theory, if you wanted to, you could write a generic JavaBean property walker using something like Apache BeanUtils. Usually just calling a method like size() on a collection is enough to force it to load (although using a collection batch fetch size might complicate things a bit).
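A minimal sketch of such a walker, assuming your entities expose plain JavaBean getters. To stay dependency-free it uses the JDK's java.beans.Introspector instead of Apache BeanUtils; LazyLoadWalker and SampleOrder are hypothetical names. It only touches the first level of the object graph, and it only controls *when* loading happens: each size() call can still issue its own query unless batch fetching is also in effect for that relationship.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

// Hypothetical helper: walks each entity's JavaBean getters and calls size() on every
// Collection- or Map-valued property, which is usually enough to make an ORM's lazy
// (indirect) collection load itself.
final class LazyLoadWalker {
    private LazyLoadWalker() {}

    /** Returns how many collection- or map-valued properties were touched. */
    static int touchCollections(Collection<?> entities) {
        int touched = 0;
        for (Object entity : entities) {
            if (entity == null) {
                continue;
            }
            try {
                // Stop at Object so getClass() is not treated as a bean property.
                PropertyDescriptor[] props =
                        Introspector.getBeanInfo(entity.getClass(), Object.class).getPropertyDescriptors();
                for (PropertyDescriptor pd : props) {
                    Method getter = pd.getReadMethod();
                    if (getter == null) {
                        continue; // write-only property
                    }
                    Object value = getter.invoke(entity);
                    if (value instanceof Collection) {
                        ((Collection<?>) value).size(); // triggers loading of a lazy collection
                        touched++;
                    } else if (value instanceof Map) {
                        ((Map<?, ?>) value).size();
                        touched++;
                    }
                }
            } catch (IntrospectionException | IllegalAccessException | InvocationTargetException e) {
                throw new IllegalStateException("Could not walk " + entity.getClass(), e);
            }
        }
        return touched;
    }
}

// Hypothetical stand-in for an entity, used only to exercise the walker.
class SampleOrder {
    private final List<String> lines = new ArrayList<>();
    private final List<String> notes = new ArrayList<>();
    private final String name = "order";

    public List<String> getLines() { return lines; }
    public List<String> getNotes() { return notes; }
    public String getName() { return name; }
}
```

You would call LazyLoadWalker.touchCollections(results) right after the query returns, while the persistence context that loaded the entities is still available.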
One thing to pay particular attention to is the scope of your session and your use of caches (EclipseLink cache).
It isn't clear from your post what the scope of a session is. Is a session a one-shot affair (e.g. a web page request) or a long-running one (e.g. a classic client/server GUI app)?
It is very difficult to optimize the retrieval of relationships if you do not know what relationships you require.
If your application is requesting the relationships it wants, then at some level you must know which relationships you require, and you should be able to optimize those in your query for the objects.
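One way to make that concrete for a view driven by user preferences: keep a mapping from each displayable field to the relationship it traverses, and derive the batch paths from the fields the user actually chose. A minimal sketch; BatchHintPlanner and the field and relationship names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: turns the fields a user chose to display into the distinct,
// alias-qualified relationship paths (e.g. "o.myStuff") that need batch fetching.
final class BatchHintPlanner {
    private BatchHintPlanner() {}

    static List<String> batchPaths(String alias, Collection<String> displayedFields,
                                   Map<String, String> fieldToRelationship) {
        Set<String> paths = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        for (String field : displayedFields) {
            String relationship = fieldToRelationship.get(field);
            if (relationship != null) { // fields stored on the entity itself need no extra fetch
                paths.add(alias + "." + relationship);
            }
        }
        return new ArrayList<>(paths);
    }
}
```

Each returned path could then be passed to query.setHint("eclipselink.batch", path). As far as I know EclipseLink adds one batch attribute per such call, but verify that against the documentation for your version.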
For an overview of relationship optimization techniques, see:
http://java-persistence-performance.blogspot.com/2010/08/batch-fetching-optimizing-object-graph.html
For batch fetching there are three types: JOIN, EXISTS, and IN. The problem you outlined, where changes to the data affect the original query behind batched relationships held in the cache, only applies to JOIN and EXISTS, and only when the query has selection criteria based on updatable fields (if the query you are optimizing selects by id, or selects all instances, you are OK). IN batch fetching does not have this issue, so you can use IN batch fetching for all the relationships and avoid the problem.
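To illustrate why IN batching sidesteps the stale-criteria problem: it reads the children using the ids of the parents that are already loaded, instead of re-running the parents' original WHERE clause joined to the child table. The sketch below simulates that with an in-memory "table"; InBatchSketch, ChildRow, MY_STUFF and OWNER_ID are hypothetical, and the SQL string is a simplified shape, not the exact SQL EclipseLink generates.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical in-memory sketch of IN-style batch fetching.
final class InBatchSketch {
    private InBatchSketch() {}

    /** Minimal stand-in for a child table row: (owner id, payload). */
    static final class ChildRow {
        final long ownerId;
        final String value;
        ChildRow(long ownerId, String value) { this.ownerId = ownerId; this.value = value; }
    }

    /** Simplified SQL shape for IN batching: keyed only by parent ids, no original criteria. */
    static String inBatchSql(String table, String fkColumn, int idCount) {
        StringJoiner marks = new StringJoiner(", ", "(", ")");
        for (int i = 0; i < idCount; i++) {
            marks.add("?");
        }
        return "SELECT * FROM " + table + " WHERE " + fkColumn + " IN " + marks;
    }

    /** Plays the role of the single IN query, then groups the rows per parent id. */
    static Map<Long, List<String>> fetchChildren(Collection<Long> parentIds, List<ChildRow> table) {
        Set<Long> wanted = new HashSet<>(parentIds);
        Map<Long, List<String>> byOwner = new HashMap<>();
        for (Long id : wanted) {
            byOwner.put(id, new ArrayList<>()); // parents without children get an empty list
        }
        for (ChildRow row : table) { // one scan stands in for one database round trip
            if (wanted.contains(row.ownerId)) {
                byOwner.get(row.ownerId).add(row.value);
            }
        }
        return byOwner;
    }
}
```

In EclipseLink itself you would not write this by hand; you would select IN batching with the eclipselink.batch.type query hint set to IN, or with the @BatchFetch(BatchFetchType.IN) mapping annotation.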
ReadAllRelationshipsQuery(Collection,RelationshipName)
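How about:
Query query = em.createQuery("Select o from MyObject o where o.id in :ids");
query.setParameter("ids", ids); // ids: a List of the primary keys of the objects you already hold
query.setHint("eclipselink.batch", relationship); // alias-qualified path, e.g. "o.myStuff"
query.setHint("eclipselink.batch.type", "IN"); // IN batching avoids the stale-criteria issue above
List<MyObject> results = query.getResultList();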