I have a straightforward requirement: I need to read a list of items (from the DB), process them, and once processed, update them back into the DB.
I'm thinking of using Spring Batch chunks with a reader, processor, and writer. My reader will return one item at a time from the list and send it to the processor; once processing is over, the item goes to the writer, which updates the DB.
I may multithread this later, at some cost of synchronization in these methods.
Here I foresee a few concerns.
Is this approach correct for this kind of use case, or can something better be done? Is there any other way of processing a bunch of items in one call of the reader, processor, and writer? If so, do I need to create some mechanism where I extract, say, 10 items from the list and give them to the processor? It seems the writer updates each record as it comes; batch updates make sense only if the writer receives a bunch of processed items. Any suggestions?
Please shed some light on this design for better performance.
Thanks,
While Tasklets feel more natural for 'one task after the other' scenarios, chunks provide a simple solution for paginated reads or for situations where we don't want to keep a significant amount of data in memory.
With the @EnableBatchProcessing annotation, you can use Spring Batch features and provide a base configuration for setting up batch jobs in a @Configuration class. If the chunk size is set to 5 (the default chunk size is 1), the step reads, processes, and writes the data set 5 items at a time, as in the sketch below.
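For reference, a Java-config chunk step might look like the following. This is a minimal sketch, assuming a hypothetical Item domain class and hypothetical reader/processor/writer beans:

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableBatchProcessing
public class BatchConfig {

    // Item is a hypothetical domain class standing in for your DB record type
    @Bean
    public Step step1(StepBuilderFactory steps,
                      ItemReader<Item> myReader,
                      ItemProcessor<Item, Item> myProcessor,
                      ItemWriter<Item> myWriter) {
        return steps.get("step1")
                .<Item, Item>chunk(5)   // read/process one by one, write 5 at a time
                .reader(myReader)
                .processor(myProcessor)
                .writer(myWriter)
                .build();
    }

    @Bean
    public Job job(JobBuilderFactory jobs, Step step1) {
        return jobs.get("job").start(step1).build();
    }
}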
In Spring Batch, Tasklet is an interface that is called to perform a single task only, such as cleaning up or setting up resources before or after a step execution. A typical use is a Tasklet that cleans up a resource (e.g. folders) after a batch job has completed, as sketched below.
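As an illustration, such a cleanup Tasklet could be sketched like this (the directory path is a made-up example):

import java.io.File;
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.util.FileSystemUtils;

public class CleanupTasklet implements Tasklet {

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
        // Delete the (hypothetical) working directory the job left behind
        FileSystemUtils.deleteRecursively(new File("/tmp/batch-work"));
        // FINISHED tells the framework this single task is done
        return RepeatStatus.FINISHED;
    }
}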
Spring Batch uses a chunk-oriented style of processing: data is read one item at a time, and chunks are built up that will be written out within a transaction. Each item is read by the ItemReader and passed to the ItemProcessor; once the chunk is complete, it is written out by the ItemWriter.
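Mapped onto the scenario from the question, the reader and processor could be as simple as the following rough sketch (Item and markProcessed() are hypothetical placeholders for your record type and business logic):

import java.util.List;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.support.ListItemReader;

public class ChunkComponents {

    // Reader: hands out one item at a time from the list loaded from the DB
    public ItemReader<Item> reader(List<Item> itemsFromDb) {
        return new ListItemReader<>(itemsFromDb);
    }

    // Processor: transforms each item; returning null would filter it out
    public ItemProcessor<Item, Item> processor() {
        return item -> {
            item.markProcessed();  // hypothetical business logic
            return item;
        };
    }
}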
Spring Batch is the perfect tool to do what you need.
The chunk-oriented step lets you configure how many items you want to read/process/write per transaction with the commit-interval property:
<batch:step id="step1" next="step2">
    <batch:tasklet transaction-manager="transactionManager" start-limit="100">
        <batch:chunk reader="myReader" processor="myProcessor" writer="myWriter" commit-interval="800" />
        <batch:listeners>
            <batch:listener ref="myListener" />
        </batch:listeners>
    </batch:tasklet>
</batch:step>
Let's say your reader calls a SELECT statement that returns 10,000 records, and you set commit-interval=500.
Spring Batch will then call MyReader's read() method 500 times (in reality, the reader implementation typically pulls the items off the underlying ResultSet one by one). For each item read, it will also call MyProcessor's process() method.
But it will not call MyWriter's write() method until the commit-interval is reached.
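Schematically, the chunk loop behaves roughly like this simplified pseudocode (the real framework adds transaction management, fault tolerance, and listener callbacks around it):

List<Object> chunk = new ArrayList<>();
Object item;
while ((item = reader.read()) != null) {        // read one item at a time
    Object processed = processor.process(item); // process it right away
    if (processed != null) {                    // null means "filtered out"
        chunk.add(processed);
    }
    if (chunk.size() == commitInterval) {       // e.g. 500
        writer.write(chunk);                    // one write per chunk...
        chunk.clear();                          // ...then commit, start a new chunk
    }
}
if (!chunk.isEmpty()) {
    writer.write(chunk);                        // final, possibly smaller chunk
}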
If you look at the definition of the interface ItemWriter:
public interface ItemWriter<T> {

    /**
     * Process the supplied data element. Will not be called with any null items
     * in normal operation.
     *
     * @throws Exception if there are errors. The framework will catch the
     * exception and convert or rethrow it as appropriate.
     */
    void write(List<? extends T> items) throws Exception;
}
You can see that write() receives a List of items. This list will be the size of your commit-interval (or smaller if the end of the data is reached).
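This is exactly why JDBC batch updates fit naturally here: the writer can flush the whole chunk as a single batch instead of updating row by row. Below is a sketch using Spring Batch's JdbcBatchItemWriter, where the items table, its columns, and the Item bean properties are hypothetical:

import javax.sql.DataSource;
import org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class WriterConfig {

    @Bean
    public JdbcBatchItemWriter<Item> myWriter(DataSource dataSource) {
        JdbcBatchItemWriter<Item> writer = new JdbcBatchItemWriter<>();
        writer.setDataSource(dataSource);
        // The whole chunk passed to write() goes out as one JDBC batch update
        writer.setSql("UPDATE items SET status = :status WHERE id = :id");
        // Resolves :status and :id from the getters of the (hypothetical) Item bean
        writer.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>());
        return writer;
    }
}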
And by the way, 10,000 records is nothing. You may consider multithreading if you have to deal with millions of records, but even then, just tuning the commit-interval to its sweet spot will probably be enough.
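If you do go multithreaded later, one way to sketch it is to give the chunk step a TaskExecutor so that chunks execute concurrently; the reader, processor, and writer (hypothetical beans here) must then be thread-safe, which is the synchronization cost mentioned in the question. A step definition like this would live in your batch @Configuration:

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

public Step multiThreadedStep(StepBuilderFactory steps,
                              ItemReader<Item> myReader,
                              ItemProcessor<Item, Item> myProcessor,
                              ItemWriter<Item> myWriter) {
    return steps.get("step1")
            .<Item, Item>chunk(800)
            .reader(myReader)                            // must be thread-safe
            .processor(myProcessor)
            .writer(myWriter)
            .taskExecutor(new SimpleAsyncTaskExecutor()) // chunks run on worker threads
            .throttleLimit(4)                            // cap concurrent chunk executions
            .build();
}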
Hope it helps