Spring Batch documentation about chunk-oriented step versus reality?

Tags:

spring-batch

The Spring Batch documentation on configuring a step contains a clear picture describing how reading, processing and writing are performed:

read
process
...
read
process
// until #amountOfReadsAndProcesses = commit interval
write

According to the documentation, this corresponds to:

List<Object> items = new ArrayList<>();
for (int i = 0; i < commitInterval; i++) {
    Object item = itemReader.read();
    Object processedItem = itemProcessor.process(item);
    items.add(processedItem);
}
itemWriter.write(items);

However, when I debug with a breakpoint in the reader's read method and another in the processor's process method, I see the following behaviour:

read
...
read
// until #amountOfReads = commit interval
process
...
process
// until #amountOfProcesses = commit interval
write
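
In code, the behaviour I observe corresponds roughly to the sketch below. This is my own reconstruction from the debugger, not actual Spring Batch source, reusing the same names (commitInterval, itemReader, itemProcessor, itemWriter) as above:

List<Object> items = new ArrayList<>();
// first, read the whole chunk
for (int i = 0; i < commitInterval; i++) {
    items.add(itemReader.read());
}
// then, process the whole chunk
List<Object> processedItems = new ArrayList<>();
for (Object item : items) {
    processedItems.add(itemProcessor.process(item));
}
// finally, write the processed chunk in one go
itemWriter.write(processedItems);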

So is the documentation wrong? Or am I missing some configuration that makes it behave as documented (I couldn't find anything there)?

The problem I have is that each consecutive read now depends on a status set by the processor. The reader is a composite that reads two sources in parallel; depending on the items read from one of the sources, only the first, only the second, or both sources are read during one read operation. But the decision about which sources to read is made in the processor. Currently the only solution is to use a commit-interval of 1, which isn't very good for performance.
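
For reference, my current workaround looks roughly like the sketch below (Spring Batch Java config; the step name and the compositeReader, statusProcessor and writer beans are placeholders for my actual components):

@Bean
public Step statusDependentStep(StepBuilderFactory steps,
                                ItemReader<Object> compositeReader,
                                ItemProcessor<Object, Object> statusProcessor,
                                ItemWriter<Object> writer) {
    // commit-interval 1: each item is read, processed and written before
    // the next read, so the status set by the processor is available to
    // the composite reader for the following item
    return steps.get("statusDependentStep")
            .<Object, Object>chunk(1)
            .reader(compositeReader)
            .processor(statusProcessor)
            .writer(writer)
            .build();
}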

Asked Oct 20 '22 by Juru

1 Answer

The short answer is: you are correct, our documentation isn't accurate about the chunking model. It's something that needs to be updated. There are reasons why it is the way it is (they mainly have to do with how fault tolerance is handled), but that doesn't address your issue. For your use case, there are a couple of options:

  • Configure your job using the JSR-352 configuration - The processing model for JSR-352 is what our documentation says (they took it as gospel instead of what Spring Batch really does). Since Spring Batch supports JSR-352, just by changing your configuration and how you launch your jobs, you'd get the results you expect (a minimal launch sketch is shown after this list). There are limitations to JSR-352 that are out of scope for this discussion, but it's one option.
  • Another option would be to do what Michael Pralow suggests - While I understand your concerns about the separation of concerns, it sounds like you're already breaking that rule given that your processor is generating output that the reader needs (or are you sharing that state in some other way?).
  • Other options - Without knowing more about your job, there may be other ways to structure it that work well (things like moving logic into multiple steps, etc.) and still achieve the separation of concerns that Spring Batch tries to allow for, but I'd need to see more of your configuration to be able to help there.
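
To illustrate the JSR-352 route, launching would look roughly like the sketch below. The job name myJob and the launcher class are illustrative; the actual job definition lives in a JSR-352 job XML under META-INF/batch-jobs/.

import java.util.Properties;
import javax.batch.operations.JobOperator;
import javax.batch.runtime.BatchRuntime;

public class JsrJobLauncher {
    public static void main(String[] args) {
        // Resolves and runs META-INF/batch-jobs/myJob.xml. Under the
        // JSR-352 processing model each item is read and then processed
        // before the next read, with the write happening once item-count
        // items have been buffered.
        JobOperator jobOperator = BatchRuntime.getJobOperator();
        long executionId = jobOperator.start("myJob", new Properties());
        System.out.println("Started JSR-352 execution " + executionId);
    }
}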
Answered Oct 24 '22 by Michael Minella