
Spring Data JPA - simulate a "create + join" query for an existing collection

Let's say I have a List of entities:

List<SomeEntity> myEntities = new ArrayList<>();

SomeEntity.java:

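// javax.persistence.* (jakarta.persistence.* on Jakarta EE 9+ / Spring Boot 3)
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity
@Table(name = "entity_table")
public class SomeEntity {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private long id;
    private int score;

    // JPA requires a no-arg constructor
    public SomeEntity() {}

    public SomeEntity(long id, int score) {
        this.id = id;
        this.score = score;
    }

    public long getId() { return id; }

    public int getScore() { return score; }
}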

MyEntityRepository.java:

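import java.util.List;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.stereotype.Repository;

@Repository
public interface MyEntityRepository extends JpaRepository<SomeEntity, Long> {

    // Derived query: selects every row with score > the given value
    List<SomeEntity> findAllByScoreGreaterThan(int score);
}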

So when I run:

myEntityRepository.findAllByScoreGreaterThan(10);

Hibernate then loads every record with a score greater than 10 into memory. There are millions of records, so I don't want that. On top of that, to intersect the result with my List, I would have to compare each loaded record against it. In native MySQL, this is what I would do in this situation:

  1. Create a temporary table and insert the ids of the entities in the List into it.
  2. Join this temporary table with entity_table, apply the score filter, and pull only the entities that are relevant to me (the ones that were in the List in the first place).

This way I gain a big performance increase, avoid any OutOfMemoryErrors, and let the database machine do most of the work.
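For reference, the two MySQL steps above look roughly like this in plain JDBC. This is a sketch, assuming MySQL syntax, the entity_table id/score columns from the mapping above, and a hypothetical temporary table named tmp_ids; TempTableJoin, findMatching and multiRowInsert are illustrative names. Because a MySQL TEMPORARY table is visible only to the session that created it, every statement has to run on the same Connection:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;

// Sketch only: assumes MySQL syntax; tmp_ids and all names here are hypothetical.
class TempTableJoin {

    /** Builds "INSERT INTO tmp_ids (id) VALUES (?),(?),..." with one placeholder per row. */
    static String multiRowInsert(int rows) {
        return "INSERT INTO tmp_ids (id) VALUES " + String.join(",", Collections.nCopies(rows, "(?)"));
    }

    /** Returns {id, score} pairs for rows whose id is in ids and whose score is above minScore. */
    static List<long[]> findMatching(Connection conn, Collection<Long> ids, int minScore, int chunkSize)
            throws SQLException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Long> distinct = new ArrayList<>(new LinkedHashSet<>(ids)); // PRIMARY KEY rejects duplicates
        try (Statement st = conn.createStatement()) {
            // Step 1: a TEMPORARY table exists only within this connection's session.
            st.execute("CREATE TEMPORARY TABLE tmp_ids (id BIGINT PRIMARY KEY)");
        }
        try {
            // Insert the ids in chunks, one multi-row INSERT per chunk.
            for (int from = 0; from < distinct.size(); from += chunkSize) {
                List<Long> chunk = distinct.subList(from, Math.min(from + chunkSize, distinct.size()));
                try (PreparedStatement ps = conn.prepareStatement(multiRowInsert(chunk.size()))) {
                    for (int i = 0; i < chunk.size(); i++) {
                        ps.setLong(i + 1, chunk.get(i));
                    }
                    ps.executeUpdate();
                }
            }
            // Step 2: let the database do the join and the score filter.
            String sql = "SELECT e.id, e.score FROM entity_table e "
                    + "JOIN tmp_ids t ON t.id = e.id WHERE e.score > ?";
            List<long[]> rows = new ArrayList<>();
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setInt(1, minScore);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        rows.add(new long[] {rs.getLong("id"), rs.getInt("score")});
                    }
                }
            }
            return rows;
        } finally {
            // Drop it explicitly: pooled connections can outlive this call.
            try (Statement st = conn.createStatement()) {
                st.execute("DROP TEMPORARY TABLE IF EXISTS tmp_ids");
            }
        }
    }
}
```

A call such as `TempTableJoin.findMatching(connection, ids, 10, 1000)` returns only the matching rows, so memory use is bounded by the size of the intersection rather than by the size of entity_table.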

Is there a way to achieve such an outcome with Spring Data JPA's query methods (with Hibernate as the JPA provider)? I couldn't find such a use case in the documentation or on SO.

asked Mar 24 '16 at 13:03 by unlimitednzt


2 Answers

I understand you have a set of entity_table identifiers and you want to find every entity_table row whose identifier is in that set and whose score is greater than a given value.

So the obvious question is: how did you arrive at the initial subset of entity_table rows, and couldn't you just add that query's criteria to the query that checks "score is greater than x"?

But if we ignore that, I think there are two possible solutions. If the list of SomeEntity identifiers is small (what exactly counts as "small" depends on your database), you could just use an IN clause and define your method as:

List<SomeEntity> findByScoreGreaterThanAndIdIn(int score, Set<Long> ids);
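When the id set is only somewhat too large for a single statement, a common workaround (not part of this answer) is to split it into fixed-size chunks and run the derived query once per chunk. A minimal sketch; ChunkedInQuery and findInChunks are hypothetical names, and the chunk size you can safely use depends on your database's parameter limits:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper: runs an IN-clause query once per chunk of ids and merges the results.
class ChunkedInQuery {

    static <T> List<T> findInChunks(Set<Long> ids, int chunkSize, Function<Set<Long>, List<T>> query) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<T> results = new ArrayList<>();
        Set<Long> chunk = new LinkedHashSet<>(); // keeps the caller's iteration order
        for (Long id : ids) {
            chunk.add(id);
            if (chunk.size() == chunkSize) {
                results.addAll(query.apply(chunk));
                chunk = new LinkedHashSet<>(); // fresh set so the query never sees a mutated chunk
            }
        }
        if (!chunk.isEmpty()) {
            results.addAll(query.apply(chunk)); // last, partial chunk
        }
        return results;
    }
}
```

For example: `ChunkedInQuery.findInChunks(ids, 1000, chunk -> myEntityRepository.findByScoreGreaterThanAndIdIn(10, chunk))`. Because the chunks are disjoint, each matching entity is returned at most once.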

If the number of identifiers is too large to fit in an IN clause (or you're worried about the performance of using an IN clause) and you need to use a temporary table, the recipe would be:

  1. Create an entity that maps to your temporary table. Create a Spring Data JPA repository for it:

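    @Entity
    @Table(name = "temp_table")
    class TempEntity {
        @Id
        @Column(name = "entity_id")
        private Long entityId;
        protected TempEntity() {} // no-arg constructor required by JPA
        TempEntity(Long entityId) { this.entityId = entityId; }
    }

    interface TempEntityRepository extends JpaRepository<TempEntity, Long> { }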
    
  2. Use its save method to save all the entity identifiers into the temporary table. This should perform reasonably well as long as you enable insert batching. How to enable it differs per database and JPA provider; for Hibernate, at the very least set the hibernate.jdbc.batch_size property to a sufficiently large value. Also flush() and clear() your EntityManager regularly, or all your temp table entities will accumulate in the persistence context and you'll still run out of memory. Something along the lines of:

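    int count = 0;
    for (SomeEntity someEntity : myEntities) {
        // With an assigned id, save() goes through merge(), which may issue a SELECT per row;
        // entityManager.persist(...) avoids that if it turns out to be a bottleneck.
        tempEntityRepository.save(new TempEntity(someEntity.getId()));
        if (++count % 1000 == 0) { // every 1000 rows: write the batch and detach it
            entityManager.flush();
            entityManager.clear();
        }
    }
    entityManager.flush(); // write the last, partial batch
    entityManager.clear();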
    
  3. Add a find method to your MyEntityRepository that runs a native query selecting from entity_table joined to the temp table:

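    // "value =" is required once nativeQuery is also set; t.* returns every column SomeEntity maps
    @Query(value = "SELECT t.* FROM entity_table t INNER JOIN temp_table tt ON t.id = tt.entity_id WHERE t.score > ?1",
           nativeQuery = true)
    List<SomeEntity> findByScoreGreaterThan(int score);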
    
  4. Make sure you run both steps in the same transaction: create a method in a @Service class, annotate it with @Transactional(propagation = Propagation.REQUIRES_NEW), and have it call both repository methods in succession. Otherwise the entityManager.flush() calls fail for lack of an active transaction, and if temp_table is a session-scoped temporary table, its contents can be gone by the time the SELECT runs, giving you zero results.

You might be able to avoid native queries by giving your temp table entity a @ManyToOne to SomeEntity, since then you can do the join in JPQL. I'm just not sure whether you can avoid actually loading the SomeEntity instances to insert them in that case (or whether creating a new SomeEntity with just an ID would work). But since you say you already have a list of SomeEntity, that's perhaps not a problem.

I need something similar myself, so I will amend my answer once I have a working version of this.

answered Nov 20 '22 at 03:11 by Frans


You can:

1) Make a paginated native query via JPA (remember to add an ORDER BY clause to it) and process a fixed number of records at a time

2) Use Hibernate's StatelessSession, which does not keep loaded entities in a persistence context (see the Hibernate documentation)
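A generic sketch of option 1: fetch fixed-size pages in a stable order until a short page comes back. PageFetcher is a hypothetical interface; in a Spring Data repository you would back it with a query that has an ORDER BY plus LIMIT/OFFSET (or a Pageable). If the pages are loaded as managed entities, clear the EntityManager between pages, otherwise memory still grows:

```java
import java.util.List;
import java.util.function.Consumer;

class PagedProcessor {

    /** Hypothetical page source, e.g. a native query ending in "ORDER BY id LIMIT ? OFFSET ?". */
    interface PageFetcher<T> {
        List<T> fetch(int offset, int limit);
    }

    /** Feeds every record to handler, one page at a time; returns how many were processed. */
    static <T> int processInPages(PageFetcher<T> fetcher, int pageSize, Consumer<? super T> handler) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int processed = 0;
        int offset = 0;
        while (true) {
            List<T> page = fetcher.fetch(offset, pageSize);
            page.forEach(handler);
            processed += page.size();
            if (page.size() < pageSize) {
                return processed; // a short (or empty) page means we reached the end
            }
            offset += pageSize;
        }
    }
}
```

The ORDER BY matters because without it the database may return rows in a different order on each page, so some rows could be skipped or processed twice. Large OFFSET values get slower as the scan goes deeper; filtering on the last seen id (WHERE id > ? ORDER BY id) is a common alternative.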

answered Nov 20 '22 at 03:11 by Matteo Baldi