I'm rethinking how our Spring MVC application behaves: is it better to pull data from the database (Java 8 Stream), or to let the database push its data (Reactive / Observable) and use backpressure to control the amount?
Current situation:
1. User requests the 30 most recent articles
2. Service does a database query and puts the 30 results into a List
3. Jackson iterates over the List and generates the JSON response
Why switch the implementation?
It's quite memory-consuming, because we keep all 30 objects in memory for the whole request. That's not needed, because the application processes one object at a time. Instead, the application should be able to retrieve one object, process it, throw it away, and get the next one.
Java8 Streams? (pull)
With java.util.stream.Stream this is quite easy: the Service creates a Stream that uses a database cursor behind the scenes. Each time Jackson has written the JSON string for one element of the Stream, it asks for the next one, which then triggers the database cursor to return the next entry.
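To make the pull model concrete, here is a minimal sketch of a lazy Stream sitting on top of a cursor. It assumes nothing about Spring Data or Jackson: `fakeCursor` is a hypothetical stand-in for a database cursor, the `"write ..."` log entry stands in for JSON serialization, and all class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class CursorStreamSketch {

    // Hypothetical stand-in for a database cursor: returns one row per call
    // and logs every fetch so the order of operations is visible.
    static Iterator<String> fakeCursor(int rows, List<String> log) {
        return new Iterator<String>() {
            private int next = 1;

            @Override
            public boolean hasNext() {
                return next <= rows;
            }

            @Override
            public String next() {
                String row = "article-" + next++;
                log.add("fetch " + row);
                return row;
            }
        };
    }

    // Wraps the cursor in a lazy, sequential Stream; nothing is fetched
    // until a terminal operation starts pulling elements.
    static Stream<String> articles(int rows, List<String> log) {
        Spliterator<String> spliterator =
                Spliterators.spliteratorUnknownSize(fakeCursor(rows, log), Spliterator.ORDERED);
        return StreamSupport.stream(spliterator, false);
    }

    // Consumes the stream the way a serializer would: one element at a time.
    static List<String> run(int rows) {
        List<String> log = new ArrayList<>();
        try (Stream<String> stream = articles(rows, log)) {
            stream.forEach(row -> log.add("write " + row));
        }
        return log;
    }

    public static void main(String[] args) {
        run(3).forEach(System.out::println);
    }
}
```

The log alternates between a fetch and a write for each article, so at most one row is held by the consumer at a time. A real cursor would also need to be closed when the stream is closed, which is why the stream is used in a try-with-resources block here.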
RxJava / Reactive / Observable? (push)
Here we have the opposite scenario: The database has to push entry by entry and Jackson has to create the JSON String for each element until the onComplete method has been called.
That is, the Controller tells the Service: give me an Observable<Article>. Then Jackson can ask for as many database entries as it can process.
Differences and concerns:
With Streams there's always some delay between asking for the next database entry and retrieving / processing it. This could slow down the JSON response time if the network connection is slow or if a huge number of database requests have to be made to fulfill the response.
Using RxJava there should always be data available to process. And if it's too much, we can use backpressure to slow down the data transfer from the database to our application. In the worst-case scenario the buffer/queue will contain all requested database entries. Then the memory consumption will be equal to our current solution using a List.
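The backpressure idea can be sketched without any library. The example below is an assumption-laden illustration, not RxJava: it uses the `java.util.concurrent.Flow` interfaces (Java 9+) so that it needs no dependencies, `RowPublisher` is a hypothetical stand-in for a reactive database driver, and `BatchingSubscriber` stands in for the consumer. It is synchronous and skips parts of the Reactive Streams contract, such as rejecting non-positive requests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Flow;

class BackpressureSketch {

    // Hypothetical publisher: emits row numbers 1..total, but never more
    // than the subscriber has requested so far.
    static final class RowPublisher implements Flow.Publisher<Integer> {
        private final int total;

        RowPublisher(int total) {
            this.total = total;
        }

        @Override
        public void subscribe(Flow.Subscriber<? super Integer> subscriber) {
            subscriber.onSubscribe(new Flow.Subscription() {
                private int next = 1;
                private long demand = 0;
                private boolean draining = false;
                private boolean done = false;

                @Override
                public void request(long n) {
                    if (done) return;
                    demand += n;
                    if (draining) return; // re-entrant call from onNext; the outer loop continues
                    draining = true;
                    while (!done && demand > 0 && next <= total) {
                        demand--;
                        subscriber.onNext(next++);
                    }
                    draining = false;
                    if (!done && next > total) {
                        done = true;
                        subscriber.onComplete();
                    }
                }

                @Override
                public void cancel() {
                    done = true;
                }
            });
        }
    }

    // Subscriber that asks for batchSize rows and only asks for more
    // once the current batch has been processed.
    static final class BatchingSubscriber implements Flow.Subscriber<Integer> {
        final List<String> log = new ArrayList<>();
        private final int batchSize;
        private Flow.Subscription subscription;
        private int receivedInBatch = 0;

        BatchingSubscriber(int batchSize) {
            this.batchSize = batchSize;
        }

        @Override
        public void onSubscribe(Flow.Subscription subscription) {
            this.subscription = subscription;
            log.add("request " + batchSize);
            subscription.request(batchSize);
        }

        @Override
        public void onNext(Integer row) {
            log.add("row " + row);
            if (++receivedInBatch == batchSize) {
                receivedInBatch = 0;
                log.add("request " + batchSize);
                subscription.request(batchSize);
            }
        }

        @Override
        public void onError(Throwable error) {
            log.add("error " + error);
        }

        @Override
        public void onComplete() {
            log.add("complete");
        }
    }

    static List<String> run(int total, int batchSize) {
        BatchingSubscriber subscriber = new BatchingSubscriber(batchSize);
        new RowPublisher(total).subscribe(subscriber);
        return subscriber.log;
    }

    public static void main(String[] args) {
        run(5, 2).forEach(System.out::println);
    }
}
```

The number passed to `request` is the upper bound on how many rows can be in flight, so the buffer only grows to the full result set if the consumer asks for everything at once.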
Why am I asking / What am I asking for?
What did I miss? Are there any other pros / cons?
Why did the Spring Data team in particular extend their API to support Stream responses from the database, if there's always a (short) delay between each database request/response? This could add up to a noticeable delay for a huge number of requested entries.
Is it recommended to go for RxJava (or some other reactive implementation) for this scenario? Or did I miss any drawbacks?
You seem to be talking about the fetch size for an underlying database engine.
If you reduce it to one (fetching and processing one row at a time), then yes, you will save some memory while the request is being processed...
But it usually makes sense to have a reasonable chunk size. If it is too small, you will have a lot of expensive network round trips. If the chunk size is too large, you risk running out of memory or introducing too much latency per fetch. So it is a compromise, and the right chunk/fetch size depends on your specific use case.
As for whether to go reactive or not, I believe it is not relevant here. For example, with RxJava and, say, Cassandra, one can create an Observable from an asynchronous result set, and it is up to the query (configuration) how many items are fetched and pushed at a time.
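A rough way to see the compromise is to count round trips against rows held in memory. The sketch below uses a deliberately simplified model that is an assumption, not a description of any particular driver: one network round trip per chunk, and at most one chunk buffered at a time. The class and method names are hypothetical. With plain JDBC, the corresponding knob is the hint given via `java.sql.Statement#setFetchSize`, which drivers are free to interpret in their own way.

```java
class FetchSizeSketch {

    // Round trips needed to read all rows when each trip returns up to fetchSize rows.
    static int roundTrips(int rows, int fetchSize) {
        return (rows + fetchSize - 1) / fetchSize; // ceiling division
    }

    // Largest number of rows buffered at once under the one-chunk-at-a-time assumption.
    static int maxRowsBuffered(int rows, int fetchSize) {
        return Math.min(rows, fetchSize);
    }

    public static void main(String[] args) {
        int rows = 30; // the 30 articles from the question
        for (int fetchSize : new int[] {1, 10, 30}) {
            System.out.println("fetchSize=" + fetchSize
                    + " roundTrips=" + roundTrips(rows, fetchSize)
                    + " maxRowsBuffered=" + maxRowsBuffered(rows, fetchSize));
        }
    }
}
```

For the 30 articles in the question, a fetch size of 1 means 30 round trips with 1 row buffered, 10 means 3 round trips with 10 rows buffered, and 30 means a single round trip with all 30 rows buffered.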