What are the recommendations for when a Spring Data repository method should return a List and when it should return a Stream?
https://docs.spring.io/spring-data/jpa/docs/current/reference/html/#repositories.query-streaming
Example:
interface UserRepository extends Repository<User, Long> {
    List<User> findAllByLastName(String lastName);
    Stream<User> streamAllByFirstName(String firstName);
    // Other methods defined.
}
Please note that I am not asking about Page and Slice here; they are clear to me, and I found their description in the documentation.
My assumption (am I wrong?):
Stream does not load all the records into the Java heap. Instead it loads k records into the heap, processes them one by one, then loads another k records, and so on.
List loads all the records into the Java heap at once.
If I need some background batch job (for example, calculating analytics), I could use the stream variant because I will not load all the records into the heap at once.
If I need to return a REST response with all the records, I will have to load them into RAM anyway and serialize them into JSON. In this case, it makes sense to load the whole list at once.
I saw that some developers collect the stream into a list before returning a response.
class UserController {
    public ResponseEntity<List<User>> getUsers(String firstName) {
        return new ResponseEntity<>(
            repository.streamAllByFirstName(firstName)
                // OK, for a mapper it is nice syntactic sugar.
                // Let's imagine there is no map for now...
                // .map(someMapper)
                .collect(Collectors.toList()),
            HttpStatus.OK);
    }
}
In this case, I do not see any advantage of Stream; using a List will produce the same end result. Are there any examples where using a Stream is justified?
CrudRepository doesn't provide methods for pagination and sorting. JpaRepository ties your repositories to the JPA persistence technology, so it should be avoided where possible; use CrudRepository or PagingAndSortingRepository depending on whether you need sorting and paging or not. PagingAndSortingRepository provides methods to paginate and sort records, while JpaRepository adds JPA-specific methods such as flushing the persistence context and deleting records in a batch.
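For illustration, a minimal sketch of the paging-and-sorting variant, assuming the User entity from the question; the repository name, page size and sort property are made up (Page, PageRequest and Sort live in org.springframework.data.domain):
interface PagedUserRepository extends PagingAndSortingRepository<User, Long> {
}

// Somewhere in a service: fetch the first page of 20 users, sorted by last name.
Page<User> firstPage = pagedUserRepository.findAll(PageRequest.of(0, 20, Sort.by("lastName")));
firstPage.forEach(user -> System.out.println(user.getLastName()));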
JPAstreamer is a library for expressing JPA/Hibernate/Spring queries using standard Java streams. It gives Java developers a type-safe, expressive and intuitive means of obtaining data in database applications.
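A rough sketch of what that looks like, assuming a persistence unit named "my-persistence-unit" and the User entity from the question. With the metamodel JPAstreamer generates (e.g. a User$ class) the predicate can be pushed down into the query; the plain lambda used here for brevity is evaluated in memory:
JPAStreamer jpaStreamer = JPAStreamer.of("my-persistence-unit");

try (Stream<User> users = jpaStreamer.stream(User.class)) {
    users.filter(user -> user.getFirstName().startsWith("A"))
         .map(User::getLastName)
         .forEach(System.out::println);
}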
The goal of the Spring Data repository abstraction is to significantly reduce the amount of boilerplate code required to implement data access layers for various persistence stores.
The primary difference between a Collection and a Stream return type comes down to the following two aspects: when the elements are materialized in memory (eagerly, all at once, versus lazily, as they are consumed), and how long the underlying resources (database connection, EntityManager) have to be kept open and who is responsible for closing them.
Let's talk this through with an example. Let's say we need to read 100k Customer instances from a repository. The way you (have to) handle the result gives a hint at both of the aspects described above.
List<Customer> result = repository.findAllBy();
The client code will receive that list only once all elements have been completely read from the underlying data store, not a moment before. By that point the underlying database connection may already have been closed: in a Spring Data JPA application, for example, the underlying EntityManager is closed and the entities are detached, unless you actively keep it open in a broader scope, e.g. by annotating the surrounding method with @Transactional or using an OpenEntityManagerInViewFilter. On the other hand, you don't need to actively close the resources yourself.
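To make that concrete, a small sketch (the method and the CustomerName type are made up; only findAllBy() is from the example above): no transaction or try-with-resources is needed, but every Customer is in memory before the mapping starts.
List<CustomerName> customerNames() {
    // Blocks until all rows have been read and mapped to entities.
    List<Customer> customers = repository.findAllBy();
    // By now the connection is back in the pool and the entities are detached.
    return customers.stream()
            .map(customer -> new CustomerName(customer.getLastName()))
            .collect(Collectors.toList());
}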
A stream will have to be handled like this:
@Transactional
void someMethod() {
    try (Stream<Customer> result = repository.streamAllBy()) {
        // … processing goes here
    }
}
With a Stream, the processing can start as soon as the first element (e.g. a row in the database) arrives and is mapped, i.e. you can already consume elements while the rest of the result set is still being processed. That also means that the underlying resources need to be actively kept open, as they are usually bound to the repository method invocation. Note how the Stream also has to be actively closed (try-with-resources), as it binds underlying resources and we have to signal somehow that it can release them.
With JPA, without @Transactional the Stream cannot be processed properly, as the underlying EntityManager is closed on method return: you'd see a few elements processed and then an exception in the middle of the processing.
So while you theoretically can use a Stream to, e.g., build up JSON arrays efficiently, it significantly complicates the picture, as you need to keep the core resources open until you've written all elements. That usually means writing the code that maps objects to JSON and writes them to the response manually (using e.g. Jackson's ObjectMapper and the HttpServletResponse directly).
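For completeness, a sketch of what that manual approach could look like. Only the Jackson, Servlet and Spring types are real; CustomerRepository, its streamAllBy() method and the endpoint path are assumptions carried over from the snippets above:
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.stream.Stream;

import jakarta.servlet.http.HttpServletResponse; // javax.servlet.http on older stacks

import org.springframework.http.MediaType;
import org.springframework.transaction.annotation.Transactional;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.ObjectMapper;

@RestController
class CustomerExportController {

    private final CustomerRepository repository; // assumed to declare streamAllBy()
    private final ObjectMapper objectMapper;

    CustomerExportController(CustomerRepository repository, ObjectMapper objectMapper) {
        this.repository = repository;
        this.objectMapper = objectMapper;
    }

    // The transaction keeps the EntityManager (and thus the connection) open
    // while the JSON array is written element by element.
    @GetMapping(value = "/customers", produces = MediaType.APPLICATION_JSON_VALUE)
    @Transactional(readOnly = true)
    public void exportCustomers(HttpServletResponse response) throws IOException {
        response.setContentType(MediaType.APPLICATION_JSON_VALUE);
        try (Stream<Customer> customers = repository.streamAllBy();
             JsonGenerator json = objectMapper.getFactory().createGenerator(response.getOutputStream())) {
            json.writeStartArray();
            customers.forEach(customer -> {
                try {
                    objectMapper.writeValue(json, customer); // one element at a time, no intermediate list
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            json.writeEndArray();
        }
    }
}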
While the memory footprint will likely improve, this mostly stems from the fact that you're likely avoiding the intermediate creation of collections and additional collections in the mapping steps (ResultSet -> Customer -> CustomerDTO -> JSON object). Elements already processed are not guaranteed to be evicted from memory, as they might be held onto for other reasons. Again, in JPA for example you have to keep the EntityManager open, as it controls the resource lifecycle, and thus all elements stay bound to that EntityManager and are kept around until all elements have been processed.
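If the growing persistence context becomes a problem, one common workaround (not part of the point above, just a sketch) is to detach each entity once it has been processed so that it can be garbage-collected; process() is a hypothetical per-element step and the EntityManager is assumed to be injected:
@PersistenceContext
private EntityManager entityManager;

@Transactional(readOnly = true)
void computeStatistics() {
    try (Stream<Customer> customers = repository.streamAllBy()) {
        customers.forEach(customer -> {
            process(customer);              // hypothetical per-element processing
            entityManager.detach(customer); // drop it from the persistence context
        });
    }
}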