 

Chunk reading in Spring Batch - not only chunk writing

My assumption

In my understanding, "chunk oriented processing" in Spring Batch helps me to efficiently process multiple items in a single transaction. This includes efficient use of interfaces to external systems. As communication with external systems involves overhead, it should be limited and chunk-oriented too. That's why we have the commit interval for the ItemWriter. So what I don't get is: why does the ItemReader still have to read item by item? Why can't I read in chunks as well?

Problem description

In my step, the reader has to call a webservice. And the writer will send this information to another webservice. That's why I want to make as few calls as necessary.

The interface of the ItemWriter is chunk-oriented, as you surely know:

public abstract void write(List<? extends T> paramList) throws Exception;

But the ItemReader is not:

public abstract T read() throws Exception;

As a workaround I implemented a ChunkBufferingItemReader, which reads a list of items, stores them and returns items one-by-one whenever its read() method is called.

But when it comes to exception handling and restarting of a job, this approach gets messy. I have the feeling that I'm doing work here which the framework should do for me.

Question

So am I missing something? Is there any existing functionality in Spring Batch I just overlooked?

In another post it was suggested to change the return type of the ItemReader to a List. But then my ItemProcessor would have to emit multiple outputs from a single input. Is this the right approach?
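To illustrate the List-returning approach, here is a minimal, framework-free sketch. The `Reader` and `Processor` interfaces below are hypothetical stand-ins for Spring Batch's `ItemReader` and `ItemProcessor`, reduced to plain Java so the idea is self-contained; the paged data is invented:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-ins for ItemReader/ItemProcessor, to show the idea
// without pulling in Spring Batch itself.
interface Reader<T> { T read(); }
interface Processor<I, O> { O process(I item); }

public class ListItemDemo {
    public static void main(String[] args) {
        // The "item" type is a whole List: one read() call returns one
        // chunk, as it would be fetched from the remote service.
        Iterator<List<String>> remotePages =
                Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c")).iterator();
        Reader<List<String>> reader =
                () -> remotePages.hasNext() ? remotePages.next() : null;

        // The processor now maps a List to a List, because the framework
        // would see the whole chunk as a single item.
        Processor<List<String>, List<String>> processor =
                items -> items.stream().map(String::toUpperCase).collect(Collectors.toList());

        List<String> chunk;
        while ((chunk = reader.read()) != null) {
            System.out.println(processor.process(chunk));
        }
    }
}
```

The trade-off is visible here: the chunk boundary is now an application-level concern (each List is one item), so the framework's per-item restart and skip semantics no longer apply to the individual elements.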

I'm grateful for any best practices. Thanks in advance :-)

asked Dec 20 '12 by Peter Wippermann

People also ask

How do you read data in chunks in Spring Batch?

Spring Batch uses a chunk-oriented style of processing: data is read one item at a time, and chunks are created that will be written out within a transaction. Each item is read by an ItemReader and passed on to an ItemProcessor; once the chunk is complete, it is written out by the ItemWriter.

How does Spring Batch define chunk size?

The chunk size is configured on the step; the default is 1. With a chunk size of 5, the step reads, processes, and writes 5 items of the data set at a time. The reader is defined by implementing the ItemReader interface from the Spring Batch framework.
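To make that loop concrete, here is a minimal, framework-free simulation of chunk-oriented processing in plain Java (no Spring classes; the item source and chunk size are invented for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Simulates the chunk-oriented loop: items are read one at a time,
// collected into a chunk, and written together once the chunk size
// (the commit interval) is reached.
public class ChunkLoopDemo {
    public static void main(String[] args) {
        int chunkSize = 5;
        Iterator<Integer> source = Arrays.asList(1, 2, 3, 4, 5, 6, 7).iterator();
        List<Integer> chunk = new ArrayList<>();
        while (source.hasNext()) {
            chunk.add(source.next());           // read() is called per item
            if (chunk.size() == chunkSize) {
                System.out.println("write " + chunk); // one write per chunk/transaction
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            System.out.println("write " + chunk);     // final, partial chunk
        }
    }
}
```

This shows exactly the asymmetry the question is about: the read side is called once per item, while the write side receives the whole chunk at once.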

How does Spring Batch reader work?

An ItemReader reads data into the Spring Batch application from a particular source, whereas an ItemWriter writes data from the Spring Batch application to a particular destination. An ItemProcessor is a class that contains the code to process the data that was read in.

Which is the correct statement about chunk oriented processing?

Chunk-oriented processing refers to reading the data one item at a time and creating 'chunks' that are written out within a transaction boundary. One item is read from an ItemReader, handed to an ItemProcessor, and aggregated into the chunk.


1 Answer

This is a draft for an implementation of the read() interface method.

public T read() throws Exception {
    // Refill the buffer; loop in case the service returns an empty chunk.
    while (this.items.isEmpty()) {
        final List<T> newItems = readChunk();
        if (newItems == null) {
            return null; // end of input reached
        }
        this.items.addAll(newItems);
    }
    return this.items.pop();
}

Please note that items (a Deque, so that pop() is available) is a buffer for the items that were read in chunks but not yet requested by the framework.
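For completeness, here is a self-contained sketch of such a buffering reader. The web-service call is stubbed out by an iterator over canned pages, and the class name and structure are my own; in a real reader, `readChunk()` would perform the remote call:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// A buffering reader: fetches items in chunks, hands them out one by one.
public class ChunkBufferingReader<T> {
    private final Deque<T> items = new ArrayDeque<>();
    private final Iterator<List<T>> chunkSource;

    public ChunkBufferingReader(Iterator<List<T>> chunkSource) {
        this.chunkSource = chunkSource;
    }

    // Stub for the remote call: next chunk, or null when exhausted.
    private List<T> readChunk() {
        return chunkSource.hasNext() ? chunkSource.next() : null;
    }

    public T read() {
        // Refill the buffer; loop in case a chunk comes back empty.
        while (items.isEmpty()) {
            List<T> newItems = readChunk();
            if (newItems == null) {
                return null; // end of input
            }
            items.addAll(newItems);
        }
        return items.pop(); // removes and returns the head, preserving order
    }

    public static void main(String[] args) {
        ChunkBufferingReader<String> reader = new ChunkBufferingReader<>(
                Arrays.asList(
                        Arrays.asList("a", "b"),
                        Arrays.<String>asList(),   // empty chunk is skipped
                        Arrays.asList("c")).iterator());
        String s;
        while ((s = reader.read()) != null) {
            System.out.println(s);
        }
    }
}
```

Note that this sketch deliberately omits the restart handling the question complains about; to support restarts properly, the reader would also have to track its position in an ExecutionContext (e.g. by extending AbstractItemCountingItemStreamItemReader).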

answered Sep 18 '22 by Peter Wippermann