I'm trying to persist and load the following simple structure (resembling a directed graph) using JPA 2.1, Hibernate 4.3.7 and Spring Data:
Graph.java
@Entity
public class Graph extends PersistableObject {
@OneToMany(fetch = FetchType.LAZY, cascade = CascadeType.ALL, mappedBy = "graph")
private Set<Node> nodes = new HashSet<Node>();
// getters, setters...
}
Node.java
@Entity
public class Node extends PersistableObject {
@ManyToMany(fetch = FetchType.LAZY, cascade = { CascadeType.MERGE, CascadeType.PERSIST })
private Set<Node> neighbors = new HashSet<Node>();
@ManyToOne(fetch = FetchType.EAGER, cascade = { CascadeType.MERGE })
private Graph graph;
// getters, setters...
}
In most cases, the lazy loading behaviour is fine. The problem is that, on some occasions, my application needs to fully load a given graph (resolving all lazy references) and to persist a full graph efficiently, without issuing N+1 SQL queries. Also, when storing a new graph, I get a StackOverflowError as soon as the graph becomes too big (> 1000 nodes).
1) How can I store a new graph with 10,000+ nodes in the database, given that Hibernate already chokes on a graph with 1,000 nodes with a StackOverflowError? Any useful tricks?
2) How can I fully load a graph and resolve all lazy references without performing N+1 SQL queries?
I have no clue how to solve problem 1). As for problem 2), I tried the following HQL query with fetch joins:
FROM Graph g LEFT JOIN FETCH g.nodes node LEFT JOIN FETCH node.neighbors WHERE g.id = ?1
... where ?1 refers to a string parameter containing the graph id. However, this seems to result in one SQL SELECT per node stored in the graph, which leads to horrible performance on graphs with several thousand nodes. Using Hibernate's FetchProfiles produced the same result.
EDIT 1: It turns out that Spring Data JpaRepositories perform their save(T) operation by first calling entityManager.merge(...) and then calling entityManager.persist(...). The StackOverflowError does not occur on a "raw" entityManager.persist(...), but it does occur in entityManager.merge(...). This still doesn't solve the issue, though: why does this happen on a merge?
EDIT 2: I think that this is really a bug in Hibernate. I've filed a bug report with a complete, self-contained JUnit test project. In case somebody is interested, you can find it here: Hibernate JIRA
Here's the PersistableObject class, which uses a UUID for its @Id, and an Eclipse-generated hashCode() and equals(...) method based on that ID.
PersistableObject.java
@MappedSuperclass
public abstract class PersistableObject {
@Id
private String id = UUID.randomUUID().toString();
// hashCode() and equals() auto-generated by eclipse based on this.id
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + (this.id == null ? 0 : this.id.hashCode());
return result;
}
@Override
public boolean equals(final Object obj) {
if (this == obj) {
return true;
}
if (obj == null) {
return false;
}
if (this.getClass() != obj.getClass()) {
return false;
}
PersistableObject other = (PersistableObject) obj;
if (this.id == null) {
if (other.id != null) {
return false;
}
} else if (!this.id.equals(other.id)) {
return false;
}
return true;
}
// getters, setters...
}
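The id-based equals()/hashCode() is what keeps Set membership stable for these entities: two instances with the same id count as the same entity. A minimal, self-contained sketch of that contract (using a condensed stand-in class, not the actual entity above):

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

public class IdEqualityDemo {

    // Condensed stand-in for PersistableObject, with the same id-based
    // equals()/hashCode() contract as the class above.
    static class PersistableObject {
        final String id;

        PersistableObject(String id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return 31 + (id == null ? 0 : id.hashCode());
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) {
                return true;
            }
            if (obj == null || getClass() != obj.getClass()) {
                return false;
            }
            PersistableObject other = (PersistableObject) obj;
            return id == null ? other.id == null : id.equals(other.id);
        }
    }

    public static void main(String[] args) {
        String sharedId = UUID.randomUUID().toString();
        PersistableObject a = new PersistableObject(sharedId);
        PersistableObject b = new PersistableObject(sharedId);

        // Distinct Java objects, but equal by entity identity:
        System.out.println(a.equals(b)); // true

        Set<PersistableObject> set = new HashSet<>();
        set.add(a);
        set.add(b);
        System.out.println(set.size()); // 1
    }
}
```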
If you want to try it for yourself, here's a factory that generates a random graph:
GraphFactory.java
public class GraphFactory {
public static Graph createRandomGraph(final int numberOfNodes, final int edgesPerNode) {
Graph graph = new Graph();
// we use this list for random index access
List<Node> nodes = new ArrayList<Node>();
for (int nodeIndex = 0; nodeIndex < numberOfNodes; nodeIndex++) {
Node node = new Node();
node.setGraph(graph);
graph.getNodes().add(node);
nodes.add(node);
}
Random random = new Random();
for (Node node : nodes) {
for (int edgeIndex = 0; edgeIndex < edgesPerNode; edgeIndex++) {
int randomTargetNodeIndex = random.nextInt(nodes.size());
Node targetNode = nodes.get(randomTargetNodeIndex);
node.getNeighbors().add(targetNode);
}
}
return graph;
}
}
The Stack Trace
The stack trace of the StackOverflowError repeatedly contains the following sequence (directly one after the other):
at org.hibernate.engine.spi.CascadingActions$6.cascade(CascadingActions.java:277) ~[hibernate-core-4.3.7.Final.jar:4.3.7.Final]
at org.hibernate.engine.internal.Cascade.cascadeToOne(Cascade.java:350) ~[hibernate-core-4.3.7.Final.jar:4.3.7.Final]
at org.hibernate.engine.internal.Cascade.cascadeAssociation(Cascade.java:293) ~[hibernate-core-4.3.7.Final.jar:4.3.7.Final]
at org.hibernate.engine.internal.Cascade.cascadeProperty(Cascade.java:161) ~[hibernate-core-4.3.7.Final.jar:4.3.7.Final]
at org.hibernate.engine.internal.Cascade.cascade(Cascade.java:118) ~[hibernate-core-4.3.7.Final.jar:4.3.7.Final]
at org.hibernate.event.internal.AbstractSaveEventListener.cascadeBeforeSave(AbstractSaveEventListener.java:432) ~[hibernate-core-4.3.7.Final.jar:4.3.7.Final]
at org.hibernate.event.internal.DefaultMergeEventListener.entityIsTransient(DefaultMergeEventListener.java:248) ~[hibernate-core-4.3.7.Final.jar:4.3.7.Final]
at org.hibernate.event.internal.DefaultMergeEventListener.entityIsDetached(DefaultMergeEventListener.java:317) ~[hibernate-core-4.3.7.Final.jar:4.3.7.Final]
at org.hibernate.event.internal.DefaultMergeEventListener.onMerge(DefaultMergeEventListener.java:186) ~[hibernate-core-4.3.7.Final.jar:4.3.7.Final]
at org.hibernate.internal.SessionImpl.fireMerge(SessionImpl.java:886) ~[hibernate-core-4.3.7.Final.jar:4.3.7.Final]
at org.hibernate.internal.SessionImpl.merge(SessionImpl.java:868) ~[hibernate-core-4.3.7.Final.jar:4.3.7.Final]
During the last 24 hours I did a lot of web research on this topic and I'll try to give a tentative answer here. Please do correct me if I'm wrong on something.
This seems to be a general issue with ORM. The "merge" algorithm is recursive by nature. If there is a path (from entity to entity) in your model that contains too many entities, without ever referencing an already-known entity in between, the recursion depth of the algorithm exceeds the stack size of your JVM thread.
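A plain-Java sketch (not Hibernate code) of why the path length matters: the cascade walks references recursively, so stack usage grows with the longest unbroken entity path, regardless of how much heap is available.

```java
// Minimal illustration: a recursive walk over a chain of objects, the
// same shape as Cascade.cascade() following one association after
// another. Depth is bounded by the thread stack, not the heap.
public class ChainDepthDemo {

    static class Entity {
        Entity next;
    }

    // Recursive "cascade" over the chain; returns the chain length.
    static int cascade(Entity e) {
        if (e == null) {
            return 0;
        }
        return 1 + cascade(e.next);
    }

    static Entity buildChain(int length) {
        Entity head = null;
        for (int i = 0; i < length; i++) {
            Entity e = new Entity();
            e.next = head;
            head = e;
        }
        return head;
    }

    public static void main(String[] args) {
        // A short chain is fine...
        System.out.println(cascade(buildChain(1000)));
        // ...but a sufficiently long unbroken path blows the stack,
        // just like the merge cascade does:
        try {
            cascade(buildChain(1_000_000));
        } catch (StackOverflowError expected) {
            System.out.println("StackOverflowError at large depth");
        }
    }
}
```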
If you know that your model is just slightly too large for the stack size of your JVM, you can increase that size with the JVM start parameter -Xss (and a suitable value). Note, however, that this value is static, so if you later load an even larger model, you will have to increase it again.
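For example (the jar name is a placeholder for your own application; the value needs tuning to your model size):

```shell
# Start the JVM with a larger thread stack so the recursive merge
# cascade has more headroom. 32m is an arbitrary example value.
java -Xss32m -jar my-application.jar
```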
This is definitely not a solution in the spirit of object-relational mapping, but to my current knowledge it is the only one that scales well with growing model size. The idea is to replace a normal Java reference in your @Entity classes with a primitive value containing the @Id value of the target entity instead. So if your target @Entity uses an id of type long, you would store a long value. It is then up to the application layer to resolve the reference as needed (by performing a findById(...) query on the database).
Applied to the graph scenario from the question, we would have to change the Node class to this:
@Entity
public class Node extends PersistableObject {
// note this new mapping!
@ElementCollection(fetch = FetchType.EAGER)
private Set<String> neighbors = new HashSet<String>();
@ManyToOne(fetch = FetchType.LAZY, cascade = { CascadeType.MERGE })
private Graph graph;
// getters, setters...
}
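Resolving the stored neighbor ids back to Node objects then happens in the application layer. A rough sketch of that step, with a Map standing in for a repository or entityManager.find(...) lookup (NodeRef and resolveNeighbors are hypothetical names, not part of the entities above):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class NeighborResolutionDemo {

    // Simplified stand-in for the id-referencing Node entity.
    static class NodeRef {
        final String id;
        final Set<String> neighborIds = new HashSet<>();

        NodeRef(String id) {
            this.id = id;
        }
    }

    // Resolve stored id values back to objects, one lookup per id.
    // In a real application, the Map would be a repository or
    // entityManager.find(Node.class, id), and the lookups could be
    // collapsed into a single "WHERE id IN (...)" batch query.
    static List<NodeRef> resolveNeighbors(NodeRef node, Map<String, NodeRef> lookup) {
        List<NodeRef> resolved = new ArrayList<>();
        for (String neighborId : node.neighborIds) {
            NodeRef neighbor = lookup.get(neighborId);
            if (neighbor != null) {
                resolved.add(neighbor);
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        NodeRef a = new NodeRef("a");
        NodeRef b = new NodeRef("b");
        a.neighborIds.add("b");

        Map<String, NodeRef> lookup = new HashMap<>();
        lookup.put("a", a);
        lookup.put("b", b);

        System.out.println(resolveNeighbors(a, lookup).size()); // 1
    }
}
```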
I was actually fooled by Spring and Hibernate here. My unit test used a JpaRepository and called repository.save(graph) followed by repository.fullyLoadById(graphId) (which had an @Query annotation using the HQL fetch-join query from the question), measuring the time of each operation. The SQL SELECT queries that popped up in my console log did not come from the fullyLoadById query, but from repository.save(graph). Spring repositories first call entityManager.merge(...) on the object we want to save, and merge in turn fetches the current state of the entity from the database. That fetching produced the large number of SQL SELECT statements I saw. My load query was actually performed in a single SQL query, as intended.
If you have a fairly large object graph and you know that it is definitely new, i.e. not contained in the database and not referencing any entity that is stored in the database, you can skip the merge(...) step and directly call entityManager.persist(...) on it for better performance. Spring repositories always use merge(...) for safety reasons: persist(...) attempts an SQL INSERT, which fails if a row with the given ID already exists in the database.
Also note that Hibernate will always log queries one by one if you set hibernate.show_sql = true. JDBC batching takes place after the queries have been generated, so seeing lots of queries in your log does not necessarily mean you had as many DB roundtrips.
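For reference, JDBC batching itself must be enabled explicitly; these are the standard Hibernate settings (the batch size value is an example, to be tuned):

```properties
hibernate.jdbc.batch_size=50
hibernate.order_inserts=true
hibernate.order_updates=true
```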