JPA starts to consume more and more memory after each iteration

I'm currently trying to store news from a web API using JPA. I have three entities to store: Webpage, NewsPost, and the Query that returned the news post, with one table for each of the three. My simplified JPA entities look like this:

@Entity
@Data
@Table(name = "NewsPosts", schema = "data")
@EqualsAndHashCode
@NoArgsConstructor
@AllArgsConstructor
@Builder
public class NewsPost {

    @Id
    @Column(name = "id")
    private long id;
    @Basic
    @Column(name = "subject")
    private String subject;
    @Basic
    @Column(name = "post_text")
    private String postText;

    @ManyToOne(fetch = FetchType.LAZY, cascade = CascadeType.MERGE)
    @JoinColumn(name = "newsSite")
    private NewsSite site;

    @ManyToMany(fetch = FetchType.EAGER, cascade = CascadeType.MERGE)
    @JoinTable(name = "query_news_post", joinColumns = @JoinColumn(name = "newsid"), inverseJoinColumns = @JoinColumn(name = "queryid"))
    private Set<QueryEntity> queries;
}


@Entity
@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
@Table(name = "queries", schema = "data")
@EqualsAndHashCode
public class QueryEntity {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    @Column(name = "id")
    private int id;
    @EqualsAndHashCode.Exclude
    @Basic
    @Column(name = "query")
    private String query;

    // needs to be excluded, otherwise we can get a stack overflow because of circular references...
    @EqualsAndHashCode.Exclude
    @ToString.Exclude
    @ManyToMany(mappedBy = "queries", fetch = FetchType.LAZY, cascade = CascadeType.MERGE)
    Set<NewsPost> posts;

}



@Entity
@Data
@Table(name = "sites", schema = "data")
@EqualsAndHashCode
@NoArgsConstructor
@AllArgsConstructor
@Builder
public class NewsSite {
    @Id
    @Column(name = "SiteId")
    private long id;
    @Basic
    @Column(name = "SiteName")
    private String site;

}

Currently I'm doing the following: I create the query and retrieve its ID. Then I start crawling: I get the objects back from the web API in paginated fashion with a page size of 100 NewsPosts, and I use an object mapper to map the JSON response to my entity classes.
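For illustration, the crawl loop is roughly shaped like the sketch below; the fetchPage helper, the queryString variable, and the JSON layout are assumptions, not code from the original application:

ObjectMapper objectMapper = new ObjectMapper();
int page = 0;
List<NewsPost> batch;
do {
    // hypothetical HTTP call returning the raw JSON of one page (100 posts per page)
    String json = fetchPage(queryString, page++);
    batch = objectMapper.readValue(json, new TypeReference<List<NewsPost>>() {});
    updatePosts(batch);   // merge in the queries of already-known posts (see below)
    writeBatch(batch);    // persist via the EntityManager (see below)
} while (batch.size() == 100); // the last page is shorter than the page size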

Afterwards I tried two different things:

  1. I added the query ID as a Set to the NewsPost and wrote it back to the DB with the EntityManager's merge. This worked quite well until I got a NewsPost again for another query; then the new query was overwritten by the old one. To solve this I tried 2.
  2. I check whether the NewsPost already exists; if it does, I retrieve the post, add the new query to the existing ones, and merge it back to the database as before. This works quite well and I get the expected result for the first batches, but then the application suddenly starts to consume more and more memory from the third batch on. I attached a screenshot from Java VisualVM. Does somebody have an idea why this happens?

Edit: As some questions were raised in the comments, I would like to answer them here.

I think the crawling itself works fine. The web API returns JSON. I'm using the Jackson mapper to map this to a POJO, and afterwards the Dozer mapper to convert the POJO to the entity. (Yes, I need the intermediate POJO for other purposes in the application; this is working fine.)
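In code, that mapping chain is essentially the following sketch; the NewsPostDto class is an assumption standing in for the real POJO, and the Dozer calls shown are the Dozer 6.x builder API:

ObjectMapper objectMapper = new ObjectMapper();
Mapper dozer = DozerBeanMapperBuilder.buildDefault();

NewsPostDto dto = objectMapper.readValue(json, NewsPostDto.class); // JSON -> POJO
NewsPost entity = dozer.map(dto, NewsPost.class);                  // POJO -> JPA entity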

Regarding the writing with the EntityManager I'm not sure if I'm doing that correctly.

At first I created a JPA repo to check whether a post already exists (to get the old query IDs and avoid the overwriting issue in the query/post join table). My JPA repo looks as follows:

@Repository
public interface PostRepo extends JpaRepository<NewsPost, Long> {

    NewsPost getById(long id);
}

To update the posts, I do the following:

private void updatePosts(List<NewsPost> posts) {
    posts.forEach(post -> {
        NewsPost foundPost = postRepo.getById(post.getId());
        if (foundPost != null) {
            post.getQueries().addAll(foundPost.getQueries());
        }
    });
}

I'm currently writing my entities as follows: I have a list of entities that also contains the updated posts, and I have an autowired EntityManagerFactory in the class that handles the writing.

EntityManager em = entityManagerFactory.createEntityManager();
try {
    EntityTransaction transaction = em.getTransaction();
    transaction.begin();
    entities.forEach(entity -> em.merge(entity));
    em.flush();
    transaction.commit();
} finally {
    em.clear();
    em.close();
}

I'm pretty sure it is the writing process. If I keep the logic of my software the same but skip the merge and just print the entities or dump them to a file, everything works fast and no error appears, so it seems to be an issue with the merge call.

Regarding the question whether my program dies because of the memory consumption: it depends. If I run it on my Mac, it consumes 8+ gigabytes of RAM, but macOS handles this and swaps the RAM to disk. If I run it as a Docker container on CentOS, the process is killed due to insufficient memory.

Don't know if this is relevant, but I'm using OpenJDK 11, Spring Boot 2.2.6, and a MySQL 8 database.

I configured JPA as follows in my application.yml:

spring:
  main:
    allow-bean-definition-overriding: true
  datasource:
    url: "jdbc:mysql://db"
    username: user
    password: secret
    driver-class-name: com.mysql.cj.jdbc.Driver
    test-while-idle: true
    validation-query: Select 1
  jpa:
    database-platform: org.hibernate.dialect.MySQL8Dialect
    hibernate:
      ddl-auto: none
    properties:
      hibernate:
        event:
          merge:
            entity_copy_observer: allow


2 Answers

If the merge process is the problem, a quick fix to keep memory consumption in the EntityManager low is to add an em.flush() and em.clear() after every merge. The persistence context otherwise keeps a managed copy of every merged entity until the transaction commits, which is what makes it grow batch after batch:

EntityTransaction transaction = em.getTransaction();
transaction.begin();
entities.forEach(entity-> {
    em.merge(entity);
    em.flush();
    em.clear();
});
transaction.commit();

However, I think you should change your model. Loading all the existing queries of every post just to add new ones is very inefficient. You could map the N:M relation as an entity of its own and persist only the new relations.
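A minimal sketch of such a join entity, reusing the existing query_news_post table; the class name and the composite-key class are assumptions:

@Entity
@Table(name = "query_news_post", schema = "data")
@IdClass(QueryNewsPostId.class) // a plain Serializable class with the same two fields
public class QueryNewsPost {

    @Id
    @Column(name = "newsid")
    private long newsId;

    @Id
    @Column(name = "queryid")
    private int queryId;

    // getters/setters (or Lombok @Data) omitted for brevity
}

With this mapping, recording that an existing post matched a new query is a single insert into the join table; no eagerly loaded query sets and no merge of the whole post are needed.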



Solved it on my own by trying things out. I created an entity for the many-to-many relation. Afterwards I created CRUD repositories for each entity and used saveAll from the CRUD repository. This works fine, also with respect to memory: the GC now produces the expected sawtooth pattern in the memory visualisation. But I still have no clue why the many-to-many relation I created before, with the join table in the annotation, caused the memory issues. Could somebody explain why this solves my problem? Is ManyToMany creating circular dependencies? As far as I know, the GC also handles circular references.
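For completeness, the resulting persistence step could look roughly like the following; the repository and variable names follow the QueryNewsPost sketch above and are assumptions:

// one Spring Data repository per table (separate files in practice, shown together for brevity)
interface NewsPostRepo extends CrudRepository<NewsPost, Long> {}
interface QueryNewsPostRepo extends CrudRepository<QueryNewsPost, QueryNewsPostId> {}

// per batch: save the posts, then insert only the new (post, query) pairs,
// so no existing query sets have to be loaded and the persistence context stays small
newsPostRepo.saveAll(posts);
queryNewsPostRepo.saveAll(newRelations);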
