Deciding between Spring Batch Step, Tasklet or Chunks

Tags:

spring-batch

I have a straightforward requirement: I need to read a list of items (from the DB), process them, and, once they are processed, update them in the DB.

I'm thinking of using Spring Batch chunks with a reader, a processor, and a writer. My reader would return one item at a time from the list and send it to the processor; once processing is over, the item goes to the writer, which updates the DB.

I may multithread it later, at some cost of synchronization in these methods.

Here I foresee a few concerns.

  1. The number of items to be processed could be large, maybe in the tens of thousands or even more.
  2. Some logical calculation is required in the processor, hence processing one item at a time. I'm not sure about the performance, even if it is multithreaded with 10 threads.
  3. The writer can update the DB with the result for each processed item, but I'm not sure how to do batch updates, because it only ever has one processed item ready.

Is this approach correct for this kind of use case, or can something better be done? Is there any other way of processing a bunch of items in one call of the reader, processor, and writer? If so, do I need to create some mechanism where I extract, say, 10 items from the list and give them to the processor? It seems the writer updates each record as it comes, and batch updates only make sense if the writer receives a bunch of processed items. Any suggestions?

Please shed some light on this design for better performance.

Thanks,

asked Jun 17 '13 08:06 by Vimal



People also ask

What is the difference between chunk and Tasklet?

While Tasklets feel more natural for "one task after the other" scenarios, chunks provide a simple solution for paginated reads or for situations where we don't want to keep a significant amount of data in memory.

What is the ideal chunk size in Spring Batch?

With the @EnableBatchProcessing annotation, you can use Spring Batch features and provide a base configuration for setting up batch jobs in a @Configuration class. If the chunk size is set to 5, Spring Batch reads, processes, and writes 5 items of the data set at a time. The default chunk size is 1.

What is the use of Tasklet in Spring Batch?

In Spring Batch, Tasklet is an interface whose implementation is called to perform a single task, such as cleaning up or setting up resources before or after a step executes. For example, a Tasklet can clean up resources (such as folders) after a batch job has completed.
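The real Spring Batch Tasklet has a single method, execute(StepContribution, ChunkContext), that returns a RepeatStatus. Because that needs the framework on the classpath, the sketch below uses hypothetical, simplified stand-ins (SimpleTasklet, SimpleRepeatStatus, FolderCleanupTasklet are all assumptions, not Spring Batch types) to show the shape of a one-shot cleanup task that deletes a working folder.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for Spring Batch's RepeatStatus (the real enum also has these two values).
enum SimpleRepeatStatus { CONTINUABLE, FINISHED }

// Hypothetical, simplified Tasklet: a single method that performs one task.
interface SimpleTasklet {
    SimpleRepeatStatus execute() throws Exception;
}

// Deletes a working folder and everything in it, e.g. as the last step of a job.
class FolderCleanupTasklet implements SimpleTasklet {
    private final Path folder;

    FolderCleanupTasklet(Path folder) {
        this.folder = folder;
    }

    @Override
    public SimpleRepeatStatus execute() throws IOException {
        if (Files.exists(folder)) {
            List<Path> paths;
            try (Stream<Path> walk = Files.walk(folder)) {
                // Reverse order puts children before their parent directories.
                paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            }
            for (Path p : paths) {
                Files.delete(p);
            }
        }
        // FINISHED means "do not call me again"; CONTINUABLE would ask to be re-invoked.
        return SimpleRepeatStatus.FINISHED;
    }
}
```

Unlike a chunk-oriented step, nothing here is read or written item by item; the tasklet runs once and reports that it is done.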

What is chunking in Spring Batch?

Spring Batch uses a chunk-oriented style of processing: it reads items one at a time and groups them into chunks that are written out within a transaction. Each item is read by the ItemReader and passed to the ItemProcessor; once a whole chunk has been read and processed, the items are written out together by the ItemWriter.
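A minimal, framework-free model of that loop is sketched below. SimpleReader, SimpleWriter, and ChunkLoop are hypothetical stand-ins named after ItemReader/ItemWriter; they leave out transactions, restart, and skip/retry. As in Spring Batch, the reader signals the end of input by returning null, and a processor that returns null filters the item out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-ins for ItemReader / ItemWriter (no transactions, no restart).
interface SimpleReader<T> {
    T read(); // returns null when there is no more input
}

interface SimpleWriter<T> {
    void write(List<? extends T> items);
}

final class ChunkLoop {
    private ChunkLoop() {}

    /** Runs read -> process -> write in chunks of chunkSize; returns the number of write() calls. */
    static <I, O> int run(SimpleReader<I> reader, Function<I, O> processor,
                          SimpleWriter<O> writer, int chunkSize) {
        int writes = 0;
        boolean exhausted = false;
        while (!exhausted) {
            // 1. Read up to chunkSize items, one read() call at a time.
            List<I> inputs = new ArrayList<>();
            while (inputs.size() < chunkSize) {
                I item = reader.read();
                if (item == null) {
                    exhausted = true;
                    break;
                }
                inputs.add(item);
            }
            // 2. Process each item; a null result filters the item out.
            List<O> outputs = new ArrayList<>();
            for (I in : inputs) {
                O out = processor.apply(in);
                if (out != null) {
                    outputs.add(out);
                }
            }
            // 3. Hand the whole chunk to the writer in a single call.
            if (!outputs.isEmpty()) {
                writer.write(outputs);
                writes++;
            }
        }
        return writes;
    }

    /** Adapts an Iterator to the reader contract (null signals end of input). */
    static <T> SimpleReader<T> fromIterator(Iterator<T> it) {
        return () -> it.hasNext() ? it.next() : null;
    }
}
```

With 10 items and a chunk size of 3, the writer is called four times, with lists of 3, 3, 3, and 1 items.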


1 Answer

Spring Batch is the perfect tool to do what you need.

The chunk-oriented step lets you configure how many items are read, processed, and written per chunk with the commit-interval attribute.

    <batch:step id="step1" next="step2">
        <batch:tasklet transaction-manager="transactionManager" start-limit="100">
            <batch:chunk reader="myReader" processor="myProcessor" writer="MyWriter" commit-interval="800" />
            <batch:listeners>
                <batch:listener ref="myListener" />
            </batch:listeners>
        </batch:tasklet>
    </batch:step>

Let's say your reader issues a SELECT statement that returns 10,000 records, and you set commit-interval="500".

Spring Batch will call the read() method of MyReader 500 times for each chunk. In practice, each read() call typically just takes the next row from the ResultSet. Every item that is read is also passed to the process() method of MyProcessor.

But it will not call the write() method of MyWriter until the commit-interval is reached.
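Put as arithmetic, assuming the processor filters nothing out: with $N$ records and a commit-interval of $c$, the writer is called $\lceil N/c \rceil$ times, and every call except possibly the last receives $c$ items (with $N = 10{,}100$, for example, the 21st call would receive the remaining 100).

```latex
\text{calls to } \texttt{write()} = \left\lceil \frac{N}{c} \right\rceil
  = \left\lceil \frac{10\,000}{500} \right\rceil = 20,
\qquad \text{items per call} = c = 500
```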

If you look at the definition of the ItemWriter interface:

    package org.springframework.batch.item;

    import java.util.List;

    public interface ItemWriter<T> {

        /**
         * Process the supplied data element. Will not be called with any null items
         * in normal operation.
         *
         * @throws Exception if there are errors. The framework will catch the
         * exception and convert or rethrow it as appropriate.
         */
        void write(List<? extends T> items) throws Exception;

    }

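You can see that write() receives a List of items. This list will be the size of your commit-interval, or smaller for the last chunk or when the processor filters some items out by returning null. (In Spring Batch 5 the parameter became a Chunk<? extends T> instead of a List, but it still carries the whole chunk.)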
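Because the writer receives the whole chunk, it can turn those items into one batched database call instead of one call per item, which addresses the question's third concern. Real code would typically use JDBC batching (PreparedStatement.addBatch() and executeBatch()) or Spring Batch's JdbcBatchItemWriter. To stay self-contained, the sketch below replaces the database with a hypothetical in-memory FakeDb that only counts round trips; FakeDb, ProcessedRow, and BatchingWriter are illustrative assumptions, and BatchingWriter does not implement the Spring interface.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a table keyed by id; it counts round trips.
class FakeDb {
    final Map<Long, String> rows = new HashMap<>();
    int roundTrips = 0;

    // One round trip that applies many updates, like a JDBC executeBatch().
    void batchUpdate(Map<Long, String> updates) {
        roundTrips++;
        rows.putAll(updates);
    }
}

// A processed item: the row id and its newly computed value (hypothetical shape).
class ProcessedRow {
    final long id;
    final String value;

    ProcessedRow(long id, String value) {
        this.id = id;
        this.value = value;
    }
}

// Writer that turns the whole chunk it receives into a single batched update.
class BatchingWriter {
    private final FakeDb db;

    BatchingWriter(FakeDb db) {
        this.db = db;
    }

    void write(List<? extends ProcessedRow> items) {
        Map<Long, String> updates = new HashMap<>();
        for (ProcessedRow r : items) {
            updates.put(r.id, r.value);
        }
        db.batchUpdate(updates); // one call per chunk, not one per item
    }
}
```

Fed 10,000 processed rows in chunks of 500, this writer makes 20 round trips to FakeDb instead of 10,000.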

And by the way, 10,000 records is nothing. You might consider multithreading if you have to deal with millions of records, but even then, just finding the sweet spot for the commit-interval value will probably be enough.
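If you do go multithreaded later, one common Spring Batch option is a multi-threaded step (a TaskExecutor configured on the step), where each thread runs its own chunks and the reader must be thread-safe. The sketch below only imitates that idea with plain java.util.concurrent to show where synchronization is needed; SynchronizedReader and MultiThreadedChunks are hypothetical names, not Spring Batch's implementation. Only read() is locked, so the expensive processing runs in parallel.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical thread-safe reader: the only shared, stateful piece, so read() is synchronized.
class SynchronizedReader<T> {
    private final Iterator<T> source;

    SynchronizedReader(Iterator<T> source) {
        this.source = source;
    }

    synchronized T read() {
        return source.hasNext() ? source.next() : null; // null means end of input
    }
}

final class MultiThreadedChunks {
    private MultiThreadedChunks() {}

    /** Each worker loops: read up to chunkSize items, process them, "write" them as one chunk. */
    static <I, O> List<List<O>> run(SynchronizedReader<I> reader, Function<I, O> processor,
                                    int chunkSize, int threads) throws InterruptedException {
        // Thread-safe collection standing in for a thread-safe writer.
        ConcurrentLinkedQueue<List<O>> writtenChunks = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                while (true) {
                    List<O> chunk = new ArrayList<>();
                    I item;
                    while (chunk.size() < chunkSize && (item = reader.read()) != null) {
                        chunk.add(processor.apply(item)); // processing holds no lock
                    }
                    if (chunk.isEmpty()) {
                        return; // reader exhausted
                    }
                    writtenChunks.add(chunk);
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        return new ArrayList<>(writtenChunks);
    }
}
```

Every item is processed exactly once, and each chunk holds at most chunkSize items; which thread handles which chunk varies from run to run.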

Hope it helps

answered Sep 22 '22 06:09 by Cygnusx1
