I have the following Step:
return stepBuilderFactory.get("billStep")
        .allowStartIfComplete(true)
        .<Bill, Bill>chunk(20000)
        .reader(billReader)
        .processor(billProcessor)
        .faultTolerant()
        .skipLimit(Integer.MAX_VALUE)
        .skip(BillSkipException.class)
        .listener(billReaderListener)
        .listener(billSkipListener)
        .writer(billRepoItemWriter)
        .build();
Is my understanding correct that faultTolerant means that when an exception is thrown in billProcessor, the failed item is handed to the skip listener and then the next row/item is processed in billProcessor?
After adding debug logs, I noticed that items/rows were "re-processed" when an exception was thrown in the processor (probably because of the faultTolerant config). But what if I am processing 2 million records and 300,000 of them are skipped, i.e. throw a skip exception: isn't it a performance issue if some of these are "re-processed"?
And the bigger problem is that the next row/item is skipped as well; it never reaches the processor at all.
If I remove faultTolerant and the SkipListener and instead save the skipped records to the database directly (which is what the SkipListener was doing), it works. But is this solution correct?
Spring Batch will first process the whole chunk at once (in your case 20,000 items). If that fails, it falls back to processing each item individually so it can determine which items were faulty and should be skipped.
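For illustration, here is a minimal sketch of what billProcessor might look like to trigger that behavior. Bill, getAmount(), and getId() are assumptions; only the exception type comes from your snippet:

import org.springframework.batch.item.ItemProcessor;

// Hypothetical processor: throwing the skippable exception causes the
// chunk's transaction to roll back, after which Spring Batch re-processes
// the chunk item by item to isolate exactly which item failed.
public class BillProcessor implements ItemProcessor<Bill, Bill> {

    @Override
    public Bill process(Bill bill) throws Exception {
        if (bill.getAmount() == null) {
            // Declared skippable via .skip(BillSkipException.class)
            throw new BillSkipException("missing amount for bill " + bill.getId());
        }
        return bill;
    }
}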
Spring Batch provides reusable functions that are essential in processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management.
By default, if there is an uncaught exception while processing the job, Spring Batch will stop the job. If the job is restarted with the same job parameters, it will pick up where it left off. It knows where it left off by consulting the job repository, where Spring Batch persists all job and step statuses.
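A hedged sketch of that restart behavior, assuming jobLauncher and billJob beans you would supply:

import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;

JobParameters params = new JobParametersBuilder()
        .addString("runDate", "2024-01-01")
        .toJobParameters();

// If this execution fails, the failure is recorded in the job repository.
jobLauncher.run(billJob, params);

// Launching again with the SAME parameters restarts the failed execution
// and resumes where it left off. Note that allowStartIfComplete(true), as
// in your step, lets an already-completed step run again on restart.
jobLauncher.run(billJob, params);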
Spring Batch is a lightweight, comprehensive framework designed to facilitate development of robust batch applications. It also provides more advanced technical services and features that support extremely high volume and high performance batch jobs through its optimization and partitioning techniques.
No job is perfect! Errors happen. You may receive bad data. You may forget one null check that causes a NullPointerException at the worst of times. How you handle errors using Spring Batch is our topic today. There are many scenarios where exceptions encountered while processing should not result in Step failure, but should be skipped instead.
Spring Batch skip technique: with the skip technique you may specify certain exception types and a maximum number of skipped items, and whenever one of those skippable exceptions is thrown, the batch job does not fail but skips the item and goes on with the next one. Only when the maximum number of skipped items is reached does the batch job fail. For example, Spring Batch provides the ability to skip a record when a specified exception is thrown while reading a record from your input. This section looks at how to use this technique to skip records based upon specific exceptions. There are two pieces involved in choosing when a record is skipped (a configuration sketch follows the list below).
1. Exception: under what conditions to skip the record, specifically which exceptions you will ignore. When any error occurs during the reading process, Spring Batch throws an exception. In order to determine what to skip, you need to identify which exceptions to tolerate.
2. Skipped records: how many input records you will allow the step to skip before considering the step execution failed. Skipping one or two records out of a million is not a big deal; skipping half a million out of a million is probably wrong. It is your responsibility to determine the threshold.
(Spring Batch Exception Handling Example)
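As a sketch, here is how those two pieces map onto the builder from your question (the 1,000 limit is illustrative, not a recommendation):

// Sketch only: reader/processor/writer names taken from the question.
return stepBuilderFactory.get("billStep")
        .<Bill, Bill>chunk(20000)
        .reader(billReader)
        .processor(billProcessor)
        .writer(billRepoItemWriter)
        .faultTolerant()
        .skip(BillSkipException.class) // 1. which exceptions to tolerate
        .skipLimit(1000)               // 2. how many skips before the step fails
        .build();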
This skip handling happens at the individual item level, not at the chunk level. Whenever Spring Batch is unable to process a chunk in a single go, it re-processes the individual items to determine exactly which item to skip. This is usually acceptable because batch jobs tolerate some latency; they typically deal with scheduled, high-volume work.
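If you keep faultTolerant, the usual place to persist skipped records is a SkipListener rather than removing the skip machinery. A minimal sketch, assuming a SkippedBillRepository you would supply:

import org.springframework.batch.core.SkipListener;

public class BillSkipListener implements SkipListener<Bill, Bill> {

    private final SkippedBillRepository skippedBillRepository; // hypothetical

    public BillSkipListener(SkippedBillRepository skippedBillRepository) {
        this.skippedBillRepository = skippedBillRepository;
    }

    @Override
    public void onSkipInProcess(Bill bill, Throwable t) {
        // Invoked once per skipped item, after the item-by-item rescan
        // has isolated the failure, shortly before the transaction commits.
        skippedBillRepository.save(bill, t.getMessage());
    }

    @Override
    public void onSkipInRead(Throwable t) {
        // No-op: this step only skips on processor failures.
    }

    @Override
    public void onSkipInWrite(Bill bill, Throwable t) {
        // No-op: this step only skips on processor failures.
    }
}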
I ran into the same re-processing issue and fixed it using the processorNonTransactional() method:
@Bean
public Step myStep() {
    return stepBuilderFactory.get("myStep")
            .<MyObject, MyObject>chunk(1000)
            .reader(myItemReader())
            .processor(myItemProcessor())
            .writer(jdbcBatchItemWriter())
            .faultTolerant()
            .processorNonTransactional()
            .skip(MyException.class)
            .skipLimit(200)
            .build();
}
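A note on why this helps: by default the processor is treated as transactional, so after a rollback Spring Batch re-runs it for every item while scanning the chunk. processorNonTransactional() declares that the processor has no transactional side effects, so its results can be cached and reused during the scan instead of being re-computed; only the write is retried. Only use it if your processor really is free of transactional side effects.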