Let's say I have a List of entities:
List<SomeEntity> myEntities = new ArrayList<>();
SomeEntity.java:
@Entity
@Table(name = "entity_table")
public class SomeEntity {
    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private long id;
    private int score;

    public SomeEntity() {}

    public SomeEntity(long id, int score) {
        this.id = id;
        this.score = score;
    }

    public long getId() { return id; }
    public int getScore() { return score; }
}
MyEntityRepository.java:
@Repository
public interface MyEntityRepository extends JpaRepository<SomeEntity, Long> {
    List<SomeEntity> findAllByScoreGreaterThan(int score);
}
So when I run:
myEntityRepository.findAllByScoreGreaterThan(10);
Then Hibernate will load all of the matching records from the table into memory for me. There are millions of them, so I don't want that. After that, to intersect the result with my List, I'd have to compare each record in the result set against it. In native MySQL, what I would have done in this situation is:
This way I gain a big performance increase, avoid any OutOfMemoryErrors, and let the database server do most of the work.
Is there a way to achieve such an outcome with Spring Data JPA's query methods (with Hibernate as the JPA provider)? I couldn't find any such use case in the documentation or on SO.
I understand you have a set of entity_table identifiers and you want to find each entity_table row whose identifier is in that subset and whose score is greater than a given score.
So the obvious question is: how did you arrive at the initial subset of entity_table rows, and couldn't you just add that query's criteria to your query that also checks for "score is greater than x"?
But if we ignore that, I think there are two possible solutions. If the list of some_entity identifiers is small (what exactly counts as "small" depends on your database), you could just use an IN clause and define your method as:
List<SomeEntity> findByScoreGreaterThanAndIdIn(int score, Set<Long> ids);
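Before calling that method you need the identifiers as a Set<Long>. The sketch below is a minimal plain-Java illustration under a few assumptions: SomeEntityStub is a hypothetical stand-in for SomeEntity (which needs a getId() accessor), IdInQuery and its methods are hypothetical names, and the repository call is passed in as a BiFunction so the snippet runs without Spring. It also skips the query when the set is empty, so you don't depend on how your JPA provider and database handle an empty IN list.

```java
import java.util.List;
import java.util.Set;
import java.util.function.BiFunction;
import java.util.stream.Collectors;

// Hypothetical stand-in for SomeEntity, without the JPA annotations.
final class SomeEntityStub {
    private final long id;
    private final int score;

    SomeEntityStub(long id, int score) {
        this.id = id;
        this.score = score;
    }

    long getId() { return id; }
    int getScore() { return score; }
}

final class IdInQuery {
    // Extracts the primary keys that will be bound to the IN (...) parameter.
    static Set<Long> idsOf(List<SomeEntityStub> entities) {
        return entities.stream()
                .map(SomeEntityStub::getId) // long is boxed to Long here
                .collect(Collectors.toSet());
    }

    // repositoryCall stands in for the derived query method
    // findByScoreGreaterThanAndIdIn(int score, Set<Long> ids).
    static <E> List<E> findByScoreAndIds(int minScore, Set<Long> ids,
                                         BiFunction<Integer, Set<Long>, List<E>> repositoryCall) {
        if (ids.isEmpty()) {
            return List.of(); // nothing to intersect with, so skip the query entirely
        }
        return repositoryCall.apply(minScore, ids);
    }
}
```

With the real repository this would look something like IdInQuery.findByScoreAndIds(10, ids, myEntityRepository::findByScoreGreaterThanAndIdIn), where ids is built the same way from your List<SomeEntity>.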
If the number of identifiers is too large to fit in an IN clause (or you're worried about the performance of a large IN clause) and you need to use a temporary table, the recipe would be:
Create an entity that maps to your temporary table. Create a Spring Data JPA repository for it:
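@Entity
@Table(name = "temp_table")
class TempEntity {
    @Id
    @Column(name = "id") // matches tt.id in the native query below
    private Long entityId;

    protected TempEntity() {} // JPA requires a no-arg constructor
    TempEntity(Long entityId) { this.entityId = entityId; }
}
interface TempEntityRepository extends JpaRepository<TempEntity, Long> { }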
Use its save method to save all the entity identifiers into the temporary table. As long as you enable insert batching this should perform all right; how to enable it differs per database and JPA provider, but for Hibernate at the very least set the hibernate.jdbc.batch_size property to a sufficiently large value. Also call flush() and clear() on your entityManager regularly, or all your temp table entities will accumulate in the persistence context and you'll still run out of memory. Something along the lines of:
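int count = 0;
for (SomeEntity someEntity : myEntities) {
    tempEntityRepository.save(new TempEntity(someEntity.getId()));
    // Flush and clear after every 1000 inserts, not just the first 1000
    if (++count % 1000 == 0) {
        entityManager.flush();
        entityManager.clear();
    }
}
// Flush whatever is left over from the last, partial batch
entityManager.flush();
entityManager.clear();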
Add a find method to your SomeEntityRepository that runs a native query which selects from entity_table and joins to the temp table:
@Query(value = "SELECT t.id, t.score FROM entity_table t INNER JOIN temp_table tt ON t.id = tt.id WHERE t.score > ?1", nativeQuery = true)
List<SomeEntity> findByScoreGreaterThan(int score);
Finally, create a @Service class that you annotate with @Transactional(propagation = Propagation.REQUIRES_NEW) and that calls both repository methods in succession, so that the inserts and the SELECT run in the same transaction. Otherwise your temp table's contents will be gone by the time the SELECT query runs and you'll get zero results.

You might be able to avoid native queries by giving your temp table entity a @ManyToOne to SomeEntity, since then you can join in JPQL; I'm just not sure whether you'll be able to avoid actually loading the SomeEntity instances to insert them in that case (or whether creating a new SomeEntity with just an ID would work). But since you say you already have a list of SomeEntity objects, that's perhaps not a problem.
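The flush-and-clear cadence in the batching loop is easy to get subtly wrong: a counter compared with == fires only once, and a trailing partial batch can be left unflushed. Here is a hedged plain-Java sketch of the same loop that can be run and tested on its own; BatchedSaver and saveInBatches are hypothetical names, and the save and flush/clear steps are passed in as callbacks instead of calling the real repository and EntityManager.

```java
import java.util.List;
import java.util.function.Consumer;

final class BatchedSaver {
    // Saves every id and runs flushAndClear after each full batch, plus once more
    // for a trailing partial batch. Returns how many times flushAndClear ran.
    static int saveInBatches(List<Long> ids, int batchSize,
                             Consumer<Long> save, Runnable flushAndClear) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int count = 0;
        int flushes = 0;
        for (Long id : ids) {
            save.accept(id);
            // Modulo rather than == so the flush repeats for every batch
            if (++count % batchSize == 0) {
                flushAndClear.run();
                flushes++;
            }
        }
        if (count % batchSize != 0) {
            flushAndClear.run(); // push the remaining partial batch
            flushes++;
        }
        return flushes;
    }
}
```

Inside the transactional service you would pass something like id -> tempEntityRepository.save(new TempEntity(id)) and () -> { entityManager.flush(); entityManager.clear(); }, assuming TempEntity has a constructor that takes the id.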
I need something similar myself, so I will amend my answer once I have a working version of this.
You can:
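1) Make a paginated native query via JPA (remember to add an ORDER BY clause to it) and process a fixed number of records at a time.
2) Use a Hibernate StatelessSession (see the documentation), which has no persistence context, so loaded entities don't accumulate in memory.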
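Option 1 boils down to a loop that fetches one ordered page at a time and stops at the first short page. A minimal sketch, assuming a hypothetical fetchPage(offset, limit) callback that wraps your paginated native query (for example one ending in ORDER BY id LIMIT ? OFFSET ?, or a repository method taking a Spring Data PageRequest); PagedProcessor is likewise a hypothetical name. The loop itself holds only one page at a time, though with a regular EntityManager you would still want to clear() it between pages.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

final class PagedProcessor {
    // fetchPage.apply(offset, limit) is assumed to run an ORDER BY query and
    // return at most `limit` rows starting at `offset`.
    static <T> long processInPages(BiFunction<Integer, Integer, List<T>> fetchPage,
                                   int pageSize, Consumer<T> handler) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        long processed = 0;
        int offset = 0;
        while (true) {
            List<T> page = fetchPage.apply(offset, pageSize);
            for (T row : page) {
                handler.accept(row);
                processed++;
            }
            if (page.size() < pageSize) {
                return processed; // short or empty page: no more rows
            }
            offset += pageSize;
        }
    }
}
```

The stable ordering matters because without it the database is free to return rows in a different order on each page, so rows could be skipped or processed twice.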