While building a Spring application that fetches data from the web through an API, I repeatedly ran into OutOfMemoryError: GC overhead limit exceeded. After some profiling sessions I started to question my model, which looks something like this:
@Entity
class A {
    @Id
    private Integer id;
    private String name;
    @OneToMany
    private Set<B> b1;
    @OneToMany
    private Set<B> b2;
}

@Entity
class B {
    @Id
    private Integer id;
    @ManyToOne
    private A a1;
    @ManyToOne
    private A a2;
}
There is a CrudRepository assigned to manage these entities (JPA + EclipseLink). Entity loading is default, which in this case means eager AFAIK.
The program attempts to do the following:
// populates the set with 2500 A instances
Set<A> aCollection = fetchAFromWebAPI();
for (A a : aCollection) {
    // populates b1 and b2 of this A with 100 B instances each
    fetchBFromWebAPI(a);
    aRepository.save(a);
}
By the end of this process there would be 500k B instances, except it never reaches the end because of OutOfMemoryError: GC overhead limit exceeded. Now I could add more memory, but I want to understand why all these instances aren't garbage collected. I save an A to the database and forget about it. Is this because A instances have B instances in their b1 or b2 that in turn reference A instances?
Another observation I made is that the process runs significantly more smoothly the first time, when there is no data in the database.
Is there something fundamentally wrong with this model or this process?
A JPA transaction has an associated session cache (the persistence context) of all entities used in the transaction. By saving your entities you keep adding more instances to that cache. In your case I'd recommend calling EntityManager.clear() every n entities: that detaches the persisted entities from the session and makes them available for garbage collection.
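As a rough illustration of the "clear every n entities" idea, here is a minimal sketch. To keep it compilable without JPA on the classpath, it is written against a hypothetical PersistenceContext interface that only mirrors the persist(Object), flush() and clear() methods of EntityManager; BatchSaver and CountingContext are likewise made-up names. It flushes before clearing, because clear() discards changes that have not yet been flushed to the database.

```java
// Hypothetical stand-in for the three EntityManager methods used below
// (persist, flush, clear); in a real application you would call these
// on the injected EntityManager instead.
interface PersistenceContext {
    void persist(Object entity);
    void flush();
    void clear();
}

// Hypothetical helper: persists entities and empties the session cache every batchSize entities.
final class BatchSaver {
    private final PersistenceContext em;
    private final int batchSize;

    BatchSaver(PersistenceContext em, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.em = em;
        this.batchSize = batchSize;
    }

    /** Persists all entities; returns how many times the context was cleared. */
    int saveAll(Iterable<?> entities) {
        int pending = 0;
        int clears = 0;
        for (Object entity : entities) {
            em.persist(entity);
            pending++;
            if (pending == batchSize) {
                em.flush(); // write pending changes to the database first
                em.clear(); // then detach everything held by the context
                pending = 0;
                clears++;
            }
        }
        if (pending > 0) { // leftover partial batch
            em.flush();
            em.clear();
            clears++;
        }
        return clears;
    }
}

// Fake context that only counts managed instances, to show the effect of batching.
final class CountingContext implements PersistenceContext {
    int managed = 0;
    int maxManaged = 0;
    int persisted = 0;

    @Override
    public void persist(Object entity) {
        managed++;
        persisted++;
        maxManaged = Math.max(maxManaged, managed);
    }

    @Override
    public void flush() {
        // nothing to write in this fake
    }

    @Override
    public void clear() {
        managed = 0;
    }
}
```

With a batch size of 100, the fake context never tracks more than 100 instances at once, whatever the total number saved. Keep in mind that detached entities only become collectable if nothing else (such as your own collection) still references them.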
If you want to learn more about the lifecycle of JPA entities you can refer to e.g.
http://www.objectdb.com/java/jpa/persistence/managed
Edit: Additionally, BatScream's answer is also correct: you seem to accumulate more and more data in every iteration that is still referenced by the set. You might want to consider removing instances from the set once you have processed them.
The object graph reachable from the collection aCollection keeps growing with each iteration: every instance of A is populated with 200 B instances when the loop processes it. Hence your heap space gets eaten up.
All the A instances in the collection aCollection are always reachable when the garbage collector runs during this period, since you are not removing the just-saved A from the collection.
To avoid this, you can use the Set's Iterator to safely remove the just-processed A instance from the collection.
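A minimal sketch of that iterator-based removal, assuming the per-element work (fetching the B instances and saving the A) can be passed in as a callback. DrainingProcessor and processAndRemove are hypothetical names used only for illustration:

```java
import java.util.Iterator;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical helper: processes each element and drops it from the set right away,
// so the set no longer keeps already-processed elements reachable.
final class DrainingProcessor {

    private DrainingProcessor() {
    }

    /** Applies the action to each element, then removes it; returns the number processed. */
    static <T> int processAndRemove(Set<T> items, Consumer<? super T> action) {
        int processed = 0;
        Iterator<T> it = items.iterator();
        while (it.hasNext()) {
            T item = it.next();
            action.accept(item); // e.g. fetch the B instances and save the A
            it.remove();         // removing via the iterator avoids ConcurrentModificationException
            processed++;
        }
        return processed;
    }
}
```

In the loop from the question this would be called roughly as DrainingProcessor.processAndRemove(aCollection, a -> { fetchBFromWebAPI(a); aRepository.save(a); });. Calling aCollection.remove(a) inside a for-each loop instead would typically fail with a ConcurrentModificationException.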