
Are JPA entities that are not in use garbage collected and why?

While building a Spring application that fetches data from a web API, I repeatedly ran into OutOfMemoryError: GC overhead limit exceeded. After some profiling sessions I started to question my model, which looks something like this:

@Entity
class A {
  @Id
  private Integer id;
  private String name;

  @OneToMany
  private Set<B> b1;

  @OneToMany
  private Set<B> b2;
}

@Entity
class B {
  @Id
  private Integer id;

  @ManyToOne
  private A a1;

  @ManyToOne
  private A a2;
}

A CrudRepository manages these entities (JPA + EclipseLink). Fetch types are left at their defaults, which in this case means eager AFAIK.

The program attempts to do the following:

// populates the set with 2500 A instances.
Set<A> aCollection = fetchAFromWebAPI();
for (A a : aCollection) {
  // populates b1 and b2 of each A with 100 B instances each
  fetchBFromWebAPI(a);
  aRepository.save(a);
}

By the end of this process there would be 500k B instances, except that it never reaches the end because of OutOfMemoryError: GC overhead limit exceeded. I could add more memory, but I want to understand why these instances aren't garbage collected. I expected to save an A to the database and forget about it. Is it because the A instances hold B instances in b1 or b2, which in turn reference A instances?

Another observation: the process runs significantly more smoothly the first time, when there is no data in the database yet.

Is there something fundamentally wrong with this model or this process?

Limbo Exile asked Mar 18 '23 18:03



2 Answers

A JPA transaction has an associated session cache (the persistence context) that holds every entity used in the transaction. Each entity you save is added to that cache and stays referenced by it. In your case I'd recommend calling EntityManager.clear() every n entities: that detaches the persisted entities from the session and makes them eligible for garbage collection. Call EntityManager.flush() first, because clear() discards changes that have not been flushed yet.
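A minimal sketch of the "clear every n entities" idea. To keep it compilable without a JPA dependency, the persistence calls are passed in as callbacks; the class name BatchedSaver and its methods are hypothetical and not part of JPA or Spring.

```java
import java.util.function.Consumer;

/**
 * Hypothetical helper: persists items one at a time and releases the
 * persistence context after every batchSize items.
 * The JPA calls are injected as callbacks (an assumption of this sketch).
 */
final class BatchedSaver<T> {
    private final Consumer<T> persist;     // e.g. entityManager::persist
    private final Runnable flushAndClear;  // e.g. () -> { em.flush(); em.clear(); }
    private final int batchSize;
    private int pending = 0;               // items saved since the last clear

    BatchedSaver(Consumer<T> persist, Runnable flushAndClear, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.persist = persist;
        this.flushAndClear = flushAndClear;
        this.batchSize = batchSize;
    }

    /** Persists one item; flushes and clears once a full batch has accumulated. */
    void save(T item) {
        persist.accept(item);
        pending++;
        if (pending == batchSize) {
            flushAndClear.run();
            pending = 0;
        }
    }

    /** Flushes and clears whatever is left after the last full batch. */
    void finish() {
        if (pending > 0) {
            flushAndClear.run();
            pending = 0;
        }
    }
}
```

With a real EntityManager em inside an active transaction, the callbacks would presumably be em::persist and () -> { em.flush(); em.clear(); }. The right batch size depends on how large each entity graph is; with 200 B instances per A, a small n is probably enough.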

To learn more about the lifecycle of JPA entities, see for example:

http://www.objectdb.com/java/jpa/persistence/managed

Edit: BatScream's answer is also correct: you accumulate more and more data with every iteration, and all of it is still referenced by the set. Consider removing instances from the set once you have processed them.

Alex Stockinger answered Apr 06 '23 21:04



The memory held by aCollection keeps growing with each iteration: every A instance in it is populated with 200 B instances (100 in b1 and 100 in b2) as the loop reaches it. Hence your heap space gets eaten up.

All the A instances in aCollection remain reachable whenever the garbage collector runs during this period, since you are not removing the just-saved A from the collection.

To avoid this, you can use the set's Iterator to safely remove each A instance from the collection right after it has been processed.
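The iterator-based removal could look roughly like the sketch below. DrainingLoop and processAndRelease are hypothetical names; the process callback stands in for the fetchBFromWebAPI(a) and aRepository.save(a) calls from the question.

```java
import java.util.Iterator;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical helper that empties a set while processing its elements. */
final class DrainingLoop {

    /**
     * Processes each element and removes it from the set right afterwards,
     * so the set stops keeping the processed element reachable.
     * Returns the number of elements processed.
     */
    static <T> int processAndRelease(Set<T> items, Consumer<T> process) {
        int processed = 0;
        Iterator<T> it = items.iterator();
        while (it.hasNext()) {
            T item = it.next();
            process.accept(item); // stand-in for fetchBFromWebAPI(a) + aRepository.save(a)
            it.remove();          // safe removal during iteration
            processed++;
        }
        return processed;
    }
}
```

Removing the element through the iterator avoids the ConcurrentModificationException that calling Set.remove inside a for-each loop would typically cause. Note that an element only becomes collectable if nothing else, such as the persistence context, still references it.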

BatScream answered Apr 06 '23 19:04
