Deciding between Spring Batch Step, Tasklet or Chunks

Tags:

spring-batch

I have a straightforward requirement: I need to read a list of items from a DB, process them, and write the results back to the DB once they are processed.

I'm thinking of using Spring Batch chunks with a reader, processor and writer. My reader will return one item at a time from the list and send it to the processor; once processing is over, the item goes to the writer, which updates the DB.

I may make it multithreaded later, at some cost of synchronization in these methods.

Here I foresee a few concerns:

1. The number of items to be processed could be large, maybe in the 10,000s or even more.
2. Some logical calculation is required in the processor, hence processing one item at a time. I'm not sure about the performance, even if it is multithreaded with 10 threads.
3. The writer can update the DB with the result for that processed item. I'm not sure how to do batch updates, because it always has only one processed item ready.

Is this approach correct for this kind of use case, or can something better be done? Is there any other way of processing a bunch of items in one call of the reader, processor and writer? If so, do I need to create some mechanism where I extract, say, 10 items from the list and give them to the processor? It seems the writer updates each record as it comes, and batch updates only make sense if the writer receives a bunch of processed items. Any suggestions?

Please shed some light on this design for better performance.

Thanks,


asked Jun 17 '13 08:06

Vimal

1 Answer

Spring Batch is the perfect tool to do what you need.

The chunk-oriented step lets you configure how many items you want to read/process/write with the commit-interval property.

    <batch:step id="step1" next="step2">
        <batch:tasklet transaction-manager="transactionManager" start-limit="100">
            <batch:chunk reader="myReader" processor="myProcessor" writer="MyWriter" commit-interval="800" />
            <batch:listeners>
                <batch:listener ref="myListener" />
            </batch:listeners>
        </batch:tasklet>
    </batch:step>

Let's say your reader runs a SELECT statement that returns 10,000 records, and you set commit-interval="500" (the sample configuration above uses 800).

The framework will call the read() method of MyReader 500 times. In practice, a reader implementation might simply take the next item from the ResultSet on each call. Each item that is read is also passed to the process() method of MyProcessor.

But the write() method of MyWriter will not be called until the commit-interval is reached.
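The flow described above can be sketched in plain Java. This is only an illustration under stated assumptions, not Spring Batch's actual implementation: `ChunkLoopSketch`, `run` and the doubling "processor" are made-up names, and the reader is simulated with an Iterator over 10,000 integers.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.stream.IntStream;

/** Plain-Java illustration of a chunk-oriented loop (hypothetical, not framework code). */
public class ChunkLoopSketch {

    /**
     * Reads up to commitInterval items, processes each one, then hands the
     * whole list to the writer in a single call. Returns the number of write calls.
     */
    public static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                                 Consumer<List<O>> writer, int commitInterval) {
        int writeCalls = 0;
        while (reader.hasNext()) {
            // "read()" is called once per item until the chunk is full or input ends.
            List<I> inputs = new ArrayList<>();
            while (inputs.size() < commitInterval && reader.hasNext()) {
                inputs.add(reader.next());
            }
            // "process()" is called once per item that was read.
            List<O> outputs = new ArrayList<>();
            for (I input : inputs) {
                outputs.add(processor.apply(input));
            }
            // "write()" is called once per chunk, with all processed items.
            writer.accept(outputs);
            writeCalls++;
        }
        return writeCalls;
    }

    public static void main(String[] args) {
        Iterator<Integer> reader = IntStream.rangeClosed(1, 10_000).boxed().iterator();
        Function<Integer, Integer> processor = n -> n * 2;
        List<Integer> chunkSizes = new ArrayList<>();
        Consumer<List<Integer>> writer = chunk -> chunkSizes.add(chunk.size());

        int writes = run(reader, processor, writer, 500);
        // prints "20 write calls, first chunk size 500"
        System.out.println(writes + " write calls, first chunk size " + chunkSizes.get(0));
    }
}
```

With 10,000 records and a commit interval of 500, the writer is invoked 20 times instead of 10,000 times.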

If you look at the definition of the ItemWriter interface:

    import java.util.List;

    public interface ItemWriter<T> {

        /**
         * Process the supplied data element. Will not be called with any null items
         * in normal operation.
         *
         * @throws Exception if there are errors. The framework will catch the
         * exception and convert or rethrow it as appropriate.
         */
        void write(List<? extends T> items) throws Exception;

    }

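You can see that write() receives a List of items. This list will be the size of your commit-interval (or smaller if the end of the input is reached). In Spring Batch 5 the parameter type became Chunk<? extends T>, but the idea is the same.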
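Inside write(), the whole list can then be sent to the database as one JDBC batch. Here is a minimal sketch, assuming a hypothetical `items` table with `id` and `result` columns and a hypothetical `ProcessedItem` class; none of these names come from the question. `PreparedStatement.addBatch()` and `executeBatch()` are standard JDBC calls. Spring Batch also ships a `JdbcBatchItemWriter`, which may save you from writing this by hand.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

/** Sketch of a batch update over one chunk of processed items (hypothetical names). */
public class BatchUpdateWriterSketch {

    /** Hypothetical processed item: the row id and the value computed by the processor. */
    public static final class ProcessedItem {
        final long id;
        final String result;

        public ProcessedItem(long id, String result) {
            this.id = id;
            this.result = result;
        }
    }

    /** Hypothetical table and column names, for illustration only. */
    public static final String UPDATE_SQL = "UPDATE items SET result = ? WHERE id = ?";

    /**
     * Queues one UPDATE per item on a statement prepared from UPDATE_SQL,
     * then sends them all to the database with a single executeBatch() call.
     */
    public static int[] write(PreparedStatement ps, List<ProcessedItem> items)
            throws SQLException {
        for (ProcessedItem item : items) {
            ps.setString(1, item.result);
            ps.setLong(2, item.id);
            ps.addBatch(); // queued locally, not yet executed
        }
        return ps.executeBatch(); // one batch for the whole chunk
    }
}
```

In a real writer, the statement would come from `connection.prepareStatement(BatchUpdateWriterSketch.UPDATE_SQL)` inside the step's transaction, and write() would be called once per chunk.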

And by the way, 10,000 records is nothing. You may consider multithreading if you have to deal with millions of records. But even then, just experimenting to find the sweet spot for the commit-interval value will probably be enough.

Hope it helps


answered Sep 22 '22 06:09

Cygnusx1
