
Load recursive object graph without N+1 Cartesian Product with JPA and Hibernate

While converting a project from Ibatis to JPA 2.1, I need to load a complete object graph for a set of objects without triggering N+1 selects or Cartesian products, both of which are unacceptable for performance reasons.

A user's query will yield a List<Task>, and I need to make sure that when I return the tasks, they have all properties populated, including parent, children, dependencies and properties. First let me explain the two entity objects involved.

A Task is part of a hierarchy. It can have a parent Task and it can also have children. A Task can be dependent on other tasks, expressed by the 'dependencies' property. A task can have many properties, expressed by the properties property.

The example objects have been simplified as much as possible and boilerplate code is removed.

@Entity
public class Task {
    @Id
    private Long id;

    @ManyToOne(fetch = LAZY)
    private Task parent;

    @ManyToOne(fetch = LAZY)
    private Task root;

    @OneToMany(mappedBy = "task")
    private List<TaskProperty> properties;

    @ManyToMany
    @JoinTable(name = "task_dependency", inverseJoinColumns = { @JoinColumn(name = "depends_on")})
    private List<Task> dependencies;

    @OneToMany(mappedBy = "parent")
    private List<Task> children;
}

@Entity
public class TaskProperty {
    @Id
    private Long id;

    @ManyToOne(fetch = LAZY)
    private Task task;

    private String name;
    private String value;
}

The Task hierarchy for a given task can be arbitrarily deep, so to make it easier to get the whole graph, a Task has a pointer to its root task via the 'root' property.

In Ibatis, I simply fetched all Tasks for the distinct list of root IDs, and then ran ad-hoc queries for all properties and dependencies with a "task_id IN ()" query. Once I had those, I used Java code to add properties, children and dependencies to all model objects so that the graph was complete. Since the 'parent' property indicates where to add the children, I didn't even have to query for those. For a task list of any size, I would then only run 3 SQL queries, and I'm trying to do the same with JPA.
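For illustration, the in-memory stitching step described above could look roughly like the following sketch. It uses plain, unmanaged Java objects and assumes the three flat result sets have already been loaded. The names TaskGraphAssembler, TaskNode and PropertyRow are hypothetical and not part of the original code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TaskGraphAssembler {

    /** Plain (unmanaged) model object; field names mirror the Task entity. */
    public static class TaskNode {
        public final long id;
        public final Long parentId; // null for a root task
        public TaskNode parent;
        public final List<TaskNode> children = new ArrayList<>();
        public final List<TaskNode> dependencies = new ArrayList<>();
        public final Map<String, String> properties = new HashMap<>();

        public TaskNode(long id, Long parentId) {
            this.id = id;
            this.parentId = parentId;
        }
    }

    /** One row of the "properties WHERE task_id IN (...)" query. */
    public static class PropertyRow {
        public final long taskId;
        public final String name;
        public final String value;

        public PropertyRow(long taskId, String name, String value) {
            this.taskId = taskId;
            this.name = name;
            this.value = value;
        }
    }

    /**
     * Wires up the graph from three flat result sets.
     * Each dependency row is a pair {taskId, dependsOnId}.
     */
    public static Map<Long, TaskNode> assemble(List<TaskNode> tasks,
                                               List<PropertyRow> propertyRows,
                                               List<long[]> dependencyRows) {
        // Index every task by id so the later passes are simple lookups.
        Map<Long, TaskNode> byId = new HashMap<>();
        for (TaskNode t : tasks) {
            byId.put(t.id, t);
        }
        // Children need no extra query: the parent id says where each task belongs.
        for (TaskNode t : tasks) {
            TaskNode p = (t.parentId == null) ? null : byId.get(t.parentId);
            if (p != null) {
                t.parent = p;
                p.children.add(t);
            }
        }
        for (PropertyRow row : propertyRows) {
            TaskNode t = byId.get(row.taskId);
            if (t != null) {
                t.properties.put(row.name, row.value);
            }
        }
        for (long[] row : dependencyRows) {
            TaskNode t = byId.get(row[0]);
            TaskNode dependsOn = byId.get(row[1]);
            if (t != null && dependsOn != null) {
                t.dependencies.add(dependsOn);
            }
        }
        return byId;
    }
}
```

Each pass is linear in the size of its result set, so the cost stays at three queries plus one in-memory sweep regardless of how many tasks are involved.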

I've tried different approaches, including:

Let lazy loading do its job

  • Performance suicide, no need to elaborate :)

JOIN FETCH children, JOIN FETCH dependencies, JOIN FETCH properties

  • This is problematic because the resulting Cartesian products are huge, and my JPA implementation (Hibernate) cannot fetch multiple bags (List collections) in one query, only Sets. A task can have a huge number of properties, which makes the Cartesian products very inefficient.

Ad-hoc queries the same way I did in ibatis

  • I cannot add children, dependencies and properties to the Lazy initialized collections on the Task objects, because Hibernate will then try to add them as new objects.

One possible solution could be to create new Task objects that are not managed by JPA and sew my hierarchy together using those, and I guess I can live with that, but it doesn't feel very "JPA", and then I couldn't use JPA for what it's good at - tracking and persisting changes to my objects automatically.

Any hints would be greatly appreciated. I'm open to using vendor-specific extensions if necessary. I'm running in Wildfly 8.1.0.Final (Java EE 7 Full Profile) with Hibernate 4.3.5.Final.

Edvin Syse asked Jun 21 '14 12:06


1 Answer

Available options

There are some strategies to achieve your goals:

  • sub-select fetching loads all lazy associations of a given type with one additional sub-select, the first time any of them is accessed. This sounds appealing at first, but the size of that extra fetch depends on how many entities the original query returned, and the mapping may also affect other service methods that use the same entities.

  • batch fetching is easier to control, since you can cap the number of entities loaded in one batch, and it is less likely to affect other use cases.

  • using a recursive common table expression if your DB supports it.
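To make the batch fetching option more concrete: in Hibernate the batch size is typically set with the @BatchSize annotation, and the lazy associations of many owners are then loaded with a few "IN (...)" queries instead of one query per owner. The plain-Java sketch below only illustrates that arithmetic. The class BatchFetchSketch is hypothetical, and Hibernate's own batch loader may pick batch sizes differently, so treat the resulting query count as an approximation.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchFetchSketch {

    /** Splits ids into consecutive chunks of at most batchSize elements. */
    public static <T> List<List<T>> partition(List<T> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += batchSize) {
            int end = Math.min(i + batchSize, ids.size());
            // Copy the sublist so each batch is independent of the input list.
            batches.add(new ArrayList<>(ids.subList(i, end)));
        }
        return batches;
    }

    /** Approximate number of SELECTs needed to initialise one lazy association for all owners. */
    public static int queryCount(int ownerCount, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return (ownerCount + batchSize - 1) / batchSize; // ceiling division
    }
}
```

With 1,000 tasks and a batch size of 50, this gives about 20 secondary queries per association instead of 1,000, which is bounded and predictable but still more than the three queries of the hand-written approach.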

Plan ahead

In the end, it's all about what you plan on doing with the selected rows. If it's just about displaying them in a view, then a native query is more than enough.

If you need to retain the entities across multiple requests (the first for the view, the second for the update), then entities are a better approach.

From your response, I see you need to issue an EntityManager.merge() and probably rely on cascading to propagate children's state transitions (add/remove).

Since we are talking about three JPA queries, you should be fine with JPA as long as you don't get a Cartesian product.

Conclusion

You should strive for the minimum number of queries, but that doesn't mean you must always have one and only one query. Two or three queries are not an issue at all.

As long as you control the number of queries and don't run into an N+1 query issue, you are fine with more than one query too. Trading a Cartesian Product (2 one-to-many fetches) for one join and one additional select is a good deal anyway.
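A rough worked example of that trade-off, as a sketch: it assumes every task has the same number of children and properties (at least one of each), and the class FetchRowCount is hypothetical.

```java
public class FetchRowCount {

    /** Rows returned when both one-to-many collections are join-fetched in a single query. */
    public static long singleJoinRows(long tasks, long childrenPerTask, long propertiesPerTask) {
        // Each child row is repeated once per property: a Cartesian product per task.
        return tasks * childrenPerTask * propertiesPerTask;
    }

    /** Rows returned by one join fetch plus one additional select for the second collection. */
    public static long splitQueryRows(long tasks, long childrenPerTask, long propertiesPerTask) {
        return tasks * childrenPerTask + tasks * propertiesPerTask;
    }
}
```

For 100 tasks with 10 children and 50 properties each, the single query returns 50,000 rows, while the two-query variant returns 6,000 rows in total.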

In the end, you should always check the EXPLAIN ANALYZE query plan and reinforce/rethink your strategy.

Vlad Mihalcea answered Nov 02 '22 10:11