
How to save and load a large Graph structure with JPA and Hibernate?

I'm trying to persist and load the following simple structure (resembling a directed graph) using JPA 2.1, Hibernate 4.3.7 and Spring Data:

Graph.java

@Entity
public class Graph extends PersistableObject {

    @OneToMany(fetch = FetchType.LAZY, cascade = CascadeType.ALL, mappedBy = "graph")
    private Set<Node> nodes = new HashSet<Node>();

    // getters, setters...
}

Node.java

@Entity
public class Node extends PersistableObject {

    @ManyToMany(fetch = FetchType.LAZY, cascade = { CascadeType.MERGE, CascadeType.PERSIST })
    private Set<Node> neighbors = new HashSet<Node>();

    @ManyToOne(fetch = FetchType.EAGER, cascade = { CascadeType.MERGE })
    private Graph graph;

    // getters, setters...
}

The Problem

In most cases, the lazy loading behaviour is fine. The problem is that, on some occasions in my application, I need to fully load a given graph (including all lazy references) and also persist a full graph in an efficient way, without performing N+1 SQL queries. Also, when storing a new graph, I get a StackOverflowError as soon as the graph becomes too big (> 1000 nodes).

Questions

  1. How can I store a new graph with 10,000+ nodes in the database, given that Hibernate already seems to choke with a StackOverflowError on a graph of 1,000 nodes? Any useful tricks?

  2. How can I fully load a graph and resolve all lazy references without performing N+1 SQL queries?

What I tried so far

I have no clue how to solve problem 1). As for problem 2), I am currently trying to use an HQL query with fetch joins:

FROM Graph g LEFT JOIN FETCH g.nodes node LEFT JOIN FETCH node.neighbors WHERE g.id = ?1

... where ?1 refers to a string parameter containing the graph id. However, this seems to result in one SQL SELECT per node stored in the graph, which leads to horrible performance on graphs with several thousand nodes. Using Hibernate's FetchProfiles produced the same result.
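
For reference, a minimal sketch of how such a fetch-join query could be wired into a Spring Data repository (the interface declaration below is illustrative, not my exact code; only the HQL itself and the fullyLoadById(...) call are the relevant parts):

import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;

public interface GraphRepository extends JpaRepository<Graph, String> {

    // the fetch joins are supposed to pull the nodes and their neighbors
    // in the same SELECT as the graph itself
    @Query("FROM Graph g LEFT JOIN FETCH g.nodes node "
            + "LEFT JOIN FETCH node.neighbors WHERE g.id = ?1")
    Graph fullyLoadById(String graphId);
}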

Important -EDIT-

EDIT 1: It turns out that Spring Data JpaRepositories perform their save(T) operation by first calling entityManager.merge(...) and then calling entityManager.persist(...). The StackOverflowError does not occur on a "raw" entityManager.persist(...), but it does occur in entityManager.merge(...). That still doesn't solve the issue, though: why does this happen on a merge?

EDIT 2: I think that this is really a bug in Hibernate. I've filed a bug report with a complete, self-contained JUnit test project. In case somebody is interested, you can find it here: Hibernate JIRA

Supplementary Material

Here's the PersistableObject class, which uses a UUID string for its @Id and Eclipse-generated hashCode() and equals(...) methods based on that ID.

PersistableObject.java

@MappedSuperclass
public abstract class PersistableObject {

    @Id
    private String id = UUID.randomUUID().toString();

    // hashCode() and equals() auto-generated by eclipse based on this.id

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + (this.id == null ? 0 : this.id.hashCode());
        return result;
    }

    @Override
    public boolean equals(final Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj == null) {
            return false;
        }
        if (this.getClass() != obj.getClass()) {
            return false;
        }
        PersistableObject other = (PersistableObject) obj;
        if (this.id == null) {
            if (other.id != null) {
                return false;
            }
        } else if (!this.id.equals(other.id)) {
            return false;
        }
        return true;
    }

    // getters, setters...

}

If you want to try it for yourself, here's a factory that generates a random graph:

GraphFactory.java

public class GraphFactory {

    public static Graph createRandomGraph(final int numberOfNodes, final int edgesPerNode) {
        Graph graph = new Graph();
        // we use this list for random index access
        List<Node> nodes = new ArrayList<Node>();
        for (int nodeIndex = 0; nodeIndex < numberOfNodes; nodeIndex++) {
            Node node = new Node();
            node.setGraph(graph);
            graph.getNodes().add(node);
            nodes.add(node);
        }
        Random random = new Random();
        for (Node node : nodes) {
            for (int edgeIndex = 0; edgeIndex < edgesPerNode; edgeIndex++) {
                int randomTargetNodeIndex = random.nextInt(nodes.size());
                Node targetNode = nodes.get(randomTargetNodeIndex);
                node.getNeighbors().add(targetNode);
            }
        }
        return graph;
    }
}

The Stack Trace

The stack trace of the StackOverflowError repeatedly contains the following sequence (directly one after the other):

at org.hibernate.engine.spi.CascadingActions$6.cascade(CascadingActions.java:277) ~[hibernate-core-4.3.7.Final.jar:4.3.7.Final]
at org.hibernate.engine.internal.Cascade.cascadeToOne(Cascade.java:350) ~[hibernate-core-4.3.7.Final.jar:4.3.7.Final]
at org.hibernate.engine.internal.Cascade.cascadeAssociation(Cascade.java:293) ~[hibernate-core-4.3.7.Final.jar:4.3.7.Final]
at org.hibernate.engine.internal.Cascade.cascadeProperty(Cascade.java:161) ~[hibernate-core-4.3.7.Final.jar:4.3.7.Final]
at org.hibernate.engine.internal.Cascade.cascade(Cascade.java:118) ~[hibernate-core-4.3.7.Final.jar:4.3.7.Final]
at org.hibernate.event.internal.AbstractSaveEventListener.cascadeBeforeSave(AbstractSaveEventListener.java:432) ~[hibernate-core-4.3.7.Final.jar:4.3.7.Final]
at org.hibernate.event.internal.DefaultMergeEventListener.entityIsTransient(DefaultMergeEventListener.java:248) ~[hibernate-core-4.3.7.Final.jar:4.3.7.Final]
at org.hibernate.event.internal.DefaultMergeEventListener.entityIsDetached(DefaultMergeEventListener.java:317) ~[hibernate-core-4.3.7.Final.jar:4.3.7.Final]
at org.hibernate.event.internal.DefaultMergeEventListener.onMerge(DefaultMergeEventListener.java:186) ~[hibernate-core-4.3.7.Final.jar:4.3.7.Final]
at org.hibernate.internal.SessionImpl.fireMerge(SessionImpl.java:886) ~[hibernate-core-4.3.7.Final.jar:4.3.7.Final]
at org.hibernate.internal.SessionImpl.merge(SessionImpl.java:868) ~[hibernate-core-4.3.7.Final.jar:4.3.7.Final]

1 Answer

During the last 24 hours I did a lot of web research on this topic and I'll try to give a tentative answer here. Please do correct me if I'm wrong on something.

Problem: Hibernate StackOverflowError on entityManager.merge(...)

This seems to be a general issue with ORM. By nature, the "merge" algorithm is recursive. If there is a path (from entity to entity) in your model that contains too many entities, without ever referencing an already-known entity in between, the recursion goes deeper than the JVM's thread stack allows and a StackOverflowError is thrown.

Solution 1: Increase the stack size of your JVM

If you know that your model is just slightly too large for the stack size of your JVM, you can raise the thread stack size with the JVM start parameter -Xss and a suitable value. However, note that this value is static, so if you later load an even larger model, you will likely have to increase it again.
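
For example (the value below is purely illustrative; how much stack the merge recursion actually needs depends on your model, and "my-application.jar" stands in for however you normally start the application):

# start the JVM with an 8 MB thread stack instead of the platform default
java -Xss8m -jar my-application.jar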

Solution 2: Breaking up the entity chains

This is definitely not a solution in the spirit of object-relational mapping, but to my current knowledge it is the only one that scales well with growing model size. The idea is to replace the normal Java reference in your @Entity class with a value that contains the @Id of the target entity instead. So if your target @Entity uses an id of type long, you would store a long value. It is then up to the application layer to resolve the reference as needed (by performing a findById(...) query on the database).

Applied to the graph scenario from the question post, we would have to change the Node class to this:

@Entity
public class Node extends PersistableObject {

    // note this new mapping!
    @ElementCollection(fetch = FetchType.EAGER)
    private Set<String> neighbors = new HashSet<String>();

    @ManyToOne(fetch = FetchType.LAZY, cascade = { CascadeType.MERGE })
    private Graph graph;

    // getters, setters...

}
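
The application layer then resolves the ids on demand. A minimal sketch of what that could look like, assuming access to a JPA EntityManager (the helper method below is illustrative and not part of the model classes):

// resolves the neighbor ids of a node into actual Node entities
public List<Node> resolveNeighbors(final Node node, final EntityManager entityManager) {
    List<Node> resolved = new ArrayList<Node>();
    for (String neighborId : node.getNeighbors()) {
        // one lookup per id; consider batching or caching if this becomes a hotspot
        Node neighbor = entityManager.find(Node.class, neighborId);
        if (neighbor != null) {
            resolved.add(neighbor);
        }
    }
    return resolved;
}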

Problem: N+1 SQL selects

I was actually fooled by Spring and Hibernate here. My unit test used a JpaRepository and called repository.save(graph) followed by repository.fullyLoadById(graphId) (which had an @Query annotation using the HQL fetch join query from the question post) and measured the time for each operation. The SQL SELECT queries that popped up in my console log did not come from the fullyLoadById query but from repository.save(graph). What Spring repositories do here is first call entityManager.merge(...) on the object we want to save. Merge, in turn, fetches the current state of the entity from the database, and this fetching produced the large number of SQL SELECT statements I was seeing. My load query was actually performed in a single SQL query, as intended.

Solution:

If you have a fairly large object graph and you know that it is definitely new, not contained in the database, and does not reference any entity that is already stored in the database, you can skip the merge(...) step and directly call entityManager.persist(...) on it for better performance. Spring repositories always use merge(...) for safety reasons. persist(...) will attempt an SQL INSERT statement, which will fail if a row with the given ID already exists in the database.
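
A minimal sketch of what that could look like (the service class and method name are illustrative; the important part is calling persist(...) directly instead of going through the repository's save(...)):

import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class GraphWritingService {

    @PersistenceContext
    private EntityManager entityManager;

    @Transactional
    public void saveNewGraph(final Graph graph) {
        // persist(...) cascades to the nodes via CascadeType.ALL on Graph.nodes;
        // unlike merge(...), this did not trigger the StackOverflowError (see EDIT 1)
        this.entityManager.persist(graph);
    }
}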

Also, note that Hibernate will always log all queries one by one if you use hibernate.show_sql = true. JDBC batching takes place after the queries have been generated. So if you see lots of queries in your log, it does not necessarily mean that you had as many DB roundtrips.
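
To actually get JDBC batching, the relevant Hibernate settings can be enabled explicitly, for example (the property names are standard Hibernate settings, the batch size of 50 is only an example value):

# e.g. in persistence.xml or your Spring configuration
hibernate.jdbc.batch_size = 50
hibernate.order_inserts = true
hibernate.order_updates = true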
