
Spring Data repository: list vs stream

What are the recommendations for when to define a method returning List versus Stream in a Spring Data repository?



interface UserRepository extends Repository<User, Long> {

  List<User> findAllByLastName(String lastName);

  Stream<User> streamAllByFirstName(String firstName);

  // Other methods defined.
}

Please note: I am not asking about Page or Slice; they are clear to me, and I found their description in the documentation.

My assumptions (am I wrong?):

  1. Stream does not load all the records into the Java heap. Instead, it loads k records into the heap and processes them one by one; then it loads another k records, and so on.

  2. List does load all the records into the Java heap at once.

  3. If I need some background batch job (for example, calculating analytics), I could use a stream operation, because I will not load all the records into the heap at once.

  4. If I need to return a REST response with all the records, I will need to load them into RAM anyway and serialize them into JSON. In this case, it makes sense to load a list at once.
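Assumption 1 roughly describes what a JDBC driver does when it fetches rows in chunks (its fetch size). The plain-Java sketch below only simulates that behavior without a database: `BatchedRows`, its fields, and the integer "rows" are hypothetical. Whether a real Spring Data store and driver actually stream this way depends on the driver and its fetch-size configuration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical simulation of a result set fetched k rows at a time.
// No database is involved; "rows" are just consecutive integers.
class BatchedRows implements Iterator<Integer> {
    private final int total;       // rows in the simulated table
    private final int batchSize;   // k: rows loaded per round trip
    private final Deque<Integer> buffer = new ArrayDeque<>();
    private int nextRow = 0;       // next row index to fetch
    int batchesLoaded = 0;         // round trips performed so far

    BatchedRows(int total, int batchSize) {
        this.total = total;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        // Fetch the next batch only once the current one has been consumed.
        if (buffer.isEmpty() && nextRow < total) {
            int end = Math.min(nextRow + batchSize, total);
            for (int i = nextRow; i < end; i++) {
                buffer.add(i);
            }
            nextRow = end;
            batchesLoaded++;
        }
        return !buffer.isEmpty();
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return buffer.poll();
    }

    // Exposes the rows as a lazy, sequential Stream.
    Stream<Integer> stream() {
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(this, Spliterator.ORDERED), false);
    }
}
```

With `new BatchedRows(1000, 100)`, `stream().findFirst()` triggers a single batch load, while consuming the whole stream triggers ten; at no point are more than 100 rows buffered.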

I saw that some developers collect the stream into a list before returning a response.

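class UserController {

    private UserRepository userRepository; // assumed: injected by Spring

    public ResponseEntity<List<User>> getUsers(String firstName) {
        return new ResponseEntity<>(
                userRepository.streamAllByFirstName(firstName) // assumed source stream
                        // OK, for mapper - it is nice syntactic sugar.
                        // Let's imagine there is no map for now...
                        // .map(someMapper)
                        .collect(Collectors.toList()),
                HttpStatus.OK);
    }
}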

In this case, I do not see any advantage of Stream; using List produces the same end result.

Are there any examples where using list is justified?

Yan Khonski asked Jul 27 '20 at 12:07



1 Answer


The primary differences between Collection and Stream come down to the following two aspects:

  1. Time to first result – when does the client code see the first element?
  2. The state of resources while processing – in what state are the underlying infrastructure resources while the stream is processed?

Working with collections

Let's talk this through with an example. Let's say we need to read 100k Customer instances from a repository. The way you (have to) handle the result gives a hint at both of the aspects described above.

List<Customer> result = repository.findAllBy();

The client code receives that list only once all elements have been completely read from the underlying data store, not a moment before. By then, however, the underlying database connections may already have been closed. In a Spring Data JPA application, for example, you will see the underlying EntityManager closed and the entities detached unless you actively keep it open in a broader scope, e.g. by annotating surrounding methods with @Transactional or by using an OpenEntityManagerInViewFilter. On the upside, you don't need to actively close any resources.
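To make the "time to first result" aspect concrete, here is a plain-Java sketch with no Spring involved. `ListRepository`, the `connectionOpen` flag (standing in for the connection/EntityManager) and the String "customers" (standing in for Customer entities) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a List-returning repository method.
// Strings replace Customer entities; the flag mimics the connection/EntityManager.
class ListRepository {
    private final int rowCount;
    int rowsRead = 0;                 // rows pulled from the "data store"
    boolean connectionOpen = false;   // simulated underlying resource

    ListRepository(int rowCount) {
        this.rowCount = rowCount;
    }

    // Same shape as `List<Customer> findAllBy()`: all rows are read, and the
    // resource is released, before the caller gets anything back.
    List<String> findAllBy() {
        connectionOpen = true;
        try {
            List<String> result = new ArrayList<>(rowCount);
            for (int i = 0; i < rowCount; i++) {
                result.add("customer-" + i);
                rowsRead++;
            }
            return result;
        } finally {
            connectionOpen = false; // closed by the repository, not the caller
        }
    }
}
```

When `findAllBy()` returns, `rowsRead` already equals the total and `connectionOpen` is false, so the caller has nothing to close.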

Working with streams

A stream will have to be handled like this:

void someMethod() {

  try (Stream<Customer> result = repository.streamAllBy()) {
    // … processing goes here
  }
}

With a Stream, processing can start as soon as the first element (e.g. a row in a database) arrives and is mapped, i.e. you can already consume elements while the rest of the result set is still being processed. That also means the underlying resources need to be actively kept open, as they're usually bound to the repository method invocation. Note how the Stream also has to be actively closed (try-with-resources): it holds on to underlying resources, and we have to signal somehow that they should be released.
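The same idea in plain Java: a hypothetical `StreamRepository` whose stream reads a row only when it is pulled and releases its simulated connection through `Stream.onClose`, which runs only when `close()` is called (here via try-with-resources). This is a sketch of the lifecycle described above, not Spring Data's implementation.

```java
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical stand-in for a repository whose stream method keeps a
// "connection" open until the returned Stream is closed.
class StreamRepository {
    private final int rowCount;
    int rowsRead = 0;                 // rows pulled so far
    boolean connectionOpen = false;   // simulated underlying resource

    StreamRepository(int rowCount) {
        this.rowCount = rowCount;
    }

    // Same shape as `Stream<Customer> streamAllBy()`.
    Stream<String> streamAllBy() {
        connectionOpen = true;
        return IntStream.range(0, rowCount)
                .mapToObj(i -> {
                    rowsRead++;               // a row is read only when it is pulled
                    return "customer-" + i;
                })
                .onClose(() -> connectionOpen = false); // runs on Stream.close()
    }
}

class StreamDemo {
    // Returns how many rows had been read when the first element was processed.
    static int rowsReadAtFirstElement(StreamRepository repo) {
        int[] seen = {-1};
        try (Stream<String> result = repo.streamAllBy()) {
            result.forEach(customer -> {
                if (seen[0] < 0) {
                    seen[0] = repo.rowsRead;
                }
            });
        } // try-with-resources calls close(), which runs the onClose handler
        return seen[0];
    }
}
```

The first element is processed after only one row has been read. If the caller skips `close()`, a terminal operation such as `forEach` alone does not run the `onClose` handler, so the simulated connection stays open.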

With JPA, the Stream cannot be processed properly without @Transactional, as the underlying EntityManager is closed when the method returns. You'd see a few elements processed, followed by an exception in the middle of processing.
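The failure mode can be sketched in plain Java, assuming rows are fetched in batches: a hypothetical `Session` (standing in for the EntityManager) is closed in a `finally` block when the repository method returns, while the returned Stream still needs it for every batch after the first. The exception type and message are stand-ins, not what Hibernate or Spring Data actually throw.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical stand-in for an EntityManager/connection scope.
class Session {
    boolean open = true;
}

// Simulates a stream method called without a surrounding transaction.
class NonTransactionalRepository {
    private static final int TOTAL = 10;
    private static final int FETCH_SIZE = 3; // rows fetched per round trip

    Stream<Integer> streamAllBy() {
        Session session = new Session();
        try {
            Iterator<Integer> rows = new Iterator<Integer>() {
                private final Deque<Integer> buffer = new ArrayDeque<>();
                private int nextRow = 0;

                @Override
                public boolean hasNext() {
                    if (buffer.isEmpty() && nextRow < TOTAL) {
                        if (!session.open) {
                            throw new IllegalStateException("Session is closed");
                        }
                        for (int i = 0; i < FETCH_SIZE && nextRow < TOTAL; i++) {
                            buffer.add(nextRow++);
                        }
                    }
                    return !buffer.isEmpty();
                }

                @Override
                public Integer next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    return buffer.poll();
                }
            };
            rows.hasNext(); // the first batch is fetched while the session is open
            return StreamSupport.stream(
                    Spliterators.spliteratorUnknownSize(rows, Spliterator.ORDERED), false);
        } finally {
            session.open = false; // released on method return
        }
    }
}

class NonTransactionalDemo {
    // Returns the elements handled before processing failed.
    static List<Integer> processedBeforeFailure() {
        List<Integer> processed = new ArrayList<>();
        try (Stream<Integer> rows = new NonTransactionalRepository().streamAllBy()) {
            rows.forEach(processed::add);
        } catch (IllegalStateException e) {
            System.out.println("Failed after " + processed.size() + " rows: " + e.getMessage());
        }
        return processed;
    }
}
```

The first pre-fetched batch (three rows) is processed, and the attempt to fetch the second batch fails.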

Downstream usage

So while you can theoretically use a Stream to, e.g., build up JSON arrays efficiently, it significantly complicates the picture, as you need to keep the core resources open until you've written all elements. That usually means manually writing the code that maps objects to JSON and writes them to the response (using, e.g., Jackson's ObjectMapper and the HttpServletResponse).
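A minimal sketch of that manual approach, under two stated assumptions: a plain `java.io.Writer` stands in for the servlet response writer, and the hypothetical `toJson` method stands in for Jackson's ObjectMapper (it escapes only backslashes and quotes). The point is the shape: the source Stream is closed only after the closing `]` has been written.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Iterator;
import java.util.stream.Stream;

// Sketch of writing a JSON array element by element while the source
// Stream (and whatever resources it holds) stays open.
class JsonArrayWriter {

    // Hypothetical single-element serializer; escapes only backslashes and quotes.
    static String toJson(String name) {
        String escaped = name.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"name\":\"" + escaped + "\"}";
    }

    // Writes "[e1,e2,...]" and closes the stream only after the last element.
    static void writeArray(Stream<String> names, Writer out) {
        try (Stream<String> source = names) {
            out.write('[');
            Iterator<String> it = source.iterator();
            boolean first = true;
            while (it.hasNext()) {
                if (!first) {
                    out.write(',');
                }
                out.write(toJson(it.next()));
                first = false;
            }
            out.write(']');
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience wrapper that collects the output into a String.
    static String toJsonArray(Stream<String> names) {
        StringWriter out = new StringWriter();
        writeArray(names, out);
        return out.toString();
    }
}
```

An iterator is used instead of `forEach` so that the checked `IOException` from `Writer.write` does not have to be smuggled out of a lambda.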

Memory footprint

While the memory footprint will likely improve, this mostly stems from the fact that you're likely avoiding the intermediate creation of collections, and of additional collections in mapping steps (ResultSet -> Customer -> CustomerDTO -> JSON Object). Elements that have already been processed are not guaranteed to be evicted from memory, as they might be held onto for other reasons. Again, in JPA, for example, you'd have to keep the EntityManager open because it controls the resource lifecycle; thus all elements stay bound to that EntityManager and are kept around until all elements are processed.
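The retention effect can be illustrated with a hypothetical `FakePersistenceContext`, where a `HashMap` stands in for an EntityManager's first-level cache. It is a sketch of the reasoning above, not of JPA internals: every element the stream hands out is also referenced by the context, so processing an element does not make it collectable until the context is closed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical stand-in for a persistence context such as a JPA EntityManager:
// every entity it materializes stays referenced by the context until the
// context is closed, regardless of whether the caller still needs it.
class FakePersistenceContext implements AutoCloseable {
    private final Map<Integer, String> managed = new HashMap<>();

    Stream<String> streamAll(int count) {
        return IntStream.range(0, count).mapToObj(id -> {
            String entity = "customer-" + id;
            managed.put(id, entity); // the context keeps a reference
            return entity;
        });
    }

    int managedCount() {
        return managed.size();
    }

    @Override
    public void close() {
        managed.clear(); // references are dropped only here
    }
}

class RetentionDemo {
    // Processes every element and reports how many are still referenced
    // by the context just before it is closed.
    static int retainedAfterProcessing(int count) {
        try (FakePersistenceContext context = new FakePersistenceContext();
             Stream<String> customers = context.streamAll(count)) {
            int[] processed = {0};
            customers.forEach(c -> processed[0]++);
            return context.managedCount();
        }
    }
}
```

Even though each element is handled one at a time, `retainedAfterProcessing(1000)` reports all 1000 entities still referenced right before the context closes.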

Oliver Drotbohm answered Oct 03 '22 at 16:10
