How does Spring Batch transaction management work?

Tags:

spring-batch

I'm trying to understand how Spring Batch does transaction management. This is not a technical question but more of a conceptual one: what approach does Spring Batch use, and what are the consequences of that approach?

Let me try to clarify this question a bit. For instance, looking at the TaskletStep, I see that generally a step execution looks something like this:

1. several JobRepository transactions to prepare the step metadata
2. a business transaction for every chunk to process
3. more JobRepository transactions to update the step metadata with the results of chunk processing

This seems to make sense. But what about a failure between 2 and 3? This would mean the business transaction was committed but Spring Batch was unable to record that fact in its internal metadata. So a restart would reprocess the same items again even though they have already been committed. Right?

I'm looking for an explanation of these details and the consequences of the design decisions made in Spring Batch. Is this documented somewhere? The Spring Batch reference guide has very few details on this. It simply explains things from the application developer's point of view.

asked Mar 27 '15 14:03

klr8

1 Answer

There are two fundamental types of steps in Spring Batch: a tasklet step and a chunk-based step. Each has its own transaction details. Let's look at each:

Tasklet Based Step
When a developer implements their own tasklet, the transactionality is pretty straightforward. Each call to the Tasklet#execute method is executed within a transaction. You are correct that there are updates before and after a step's logic is executed. They are not technically wrapped in a transaction, since rollback isn't something we'd want to support for the job repository updates.
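
To make the "one transaction per execute call" idea concrete, here is a minimal plain-Java model. It is only a sketch and does not use Spring Batch: the class TaskletTxSketch, the Work interface and the Status enum are hypothetical stand-ins, and a "transaction" is modelled as a list of staged writes that is either copied into the committed list or thrown away.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of a tasklet step. All names are illustrative, not Spring Batch classes. */
class TaskletTxSketch {
    enum Status { CONTINUABLE, FINISHED }

    /** Hypothetical stand-in for a tasklet: one unit of work per call. */
    interface Work {
        Status execute(List<String> tx) throws Exception;
    }

    /** Calls the work repeatedly; each call gets its own transaction. Returns the committed rows. */
    static List<String> runStep(Work work) {
        List<String> committed = new ArrayList<>();
        Status status = Status.CONTINUABLE;
        while (status == Status.CONTINUABLE) {
            List<String> tx = new ArrayList<>();   // begin: writes are staged here
            try {
                status = work.execute(tx);
                committed.addAll(tx);              // commit: staged writes become visible
            } catch (Exception e) {
                // rollback: this call's staged writes are discarded, earlier commits stay
                break;
            }
        }
        return committed;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        List<String> result = runStep(tx -> {
            calls[0]++;
            tx.add("call-" + calls[0]);
            if (calls[0] == 3) {
                throw new IllegalStateException("simulated failure in third call");
            }
            return Status.CONTINUABLE;
        });
        System.out.println(result); // prints [call-1, call-2]
    }
}
```

The third call's write is lost because its transaction never commits, while the first two calls remain committed.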

Chunk Based Step
When a developer uses a chunk-based step, there is a bit more complexity involved due to the added abilities for skip/retry. However, at a basic level, each chunk is processed in a transaction. You still have the same updates before and after a chunk-based step that are non-transactional for the same reasons previously mentioned.
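
The basic case (no skip/retry) can be sketched in plain Java as follows. This is an illustrative model under simplifying assumptions, not Spring Batch code: ChunkTxSketch and runStep are hypothetical names, and commit/rollback is modelled by either appending the whole chunk to the committed list or appending nothing.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/** Plain-Java model of chunk-oriented processing. Names are illustrative only. */
class ChunkTxSketch {

    /** Reads up to chunkSize items per transaction and writes them as one unit. */
    static List<Integer> runStep(List<Integer> input, int chunkSize,
                                 Consumer<List<Integer>> writer) {
        List<Integer> committed = new ArrayList<>();
        Iterator<Integer> reader = input.iterator();
        while (reader.hasNext()) {
            // Gather one chunk of at most chunkSize items.
            List<Integer> chunk = new ArrayList<>();
            while (chunk.size() < chunkSize && reader.hasNext()) {
                chunk.add(reader.next());
            }
            try {
                writer.accept(chunk);      // business write for the whole chunk
                committed.addAll(chunk);   // commit: the chunk becomes visible as a unit
            } catch (RuntimeException e) {
                break;                     // rollback: nothing from this chunk is kept, step fails
            }
        }
        return committed;
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(1, 2, 3, 4, 5, 6, 7);
        List<Integer> result = runStep(input, 3, chunk -> {
            if (chunk.contains(5)) {
                throw new IllegalStateException("simulated write failure");
            }
        });
        System.out.println(result); // prints [1, 2, 3]
    }
}
```

Item 4 is not committed even though it was fine on its own: it shares a transaction with item 5, so the whole second chunk is rolled back.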

The "What if" scenario
In your question, you ask what would happen if the business logic completed but the updates to the job repository failed for some reason. Would the previously updated items be re-processed on a restart? As with most things, that depends. If you are using stateful readers/writers like the FlatFileItemReader, then with each commit of the business transaction, the job repository is updated with the current state of what has been processed (within the same transaction). So in that case, a restart of the job would pick up where it left off (in this case, at the end) and process no additional records.

If you are not using stateful readers/writers or have save state turned off, then it is a bit of buyer beware and you may end up with the situation you describe. The default behavior in the framework is to save state so that restartability is preserved.
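
The difference between the two cases can be illustrated with a small plain-Java model. It is a sketch under stated assumptions rather than the framework's implementation: RestartSketch, Store and the failAfterChunks parameter are hypothetical, the "checkpoint" stands in for the reader state kept in the job repository, and a crash is simulated right after the last chunk commits, before the step would be marked complete.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of restart behaviour with and without saved reader state. */
class RestartSketch {

    /** Hypothetical stand-ins for the business table and the saved read position. */
    static final class Store {
        final List<Integer> rows = new ArrayList<>();
        int checkpoint = 0; // number of items already committed
    }

    /**
     * Processes input in chunks. With saveState, the read position is committed together
     * with each chunk and used as the starting point of the next run; without it, every
     * run starts at item 0. A crash is simulated once failAfterChunks chunks have been
     * committed (use -1 for no crash).
     */
    static void run(Store store, List<Integer> input, int chunkSize,
                    boolean saveState, int failAfterChunks) {
        int position = saveState ? store.checkpoint : 0;
        int chunks = 0;
        while (position < input.size()) {
            int end = Math.min(position + chunkSize, input.size());
            // One transaction: business rows and (optionally) the checkpoint commit together.
            store.rows.addAll(input.subList(position, end));
            if (saveState) {
                store.checkpoint = end;
            }
            position = end;
            chunks++;
            if (chunks == failAfterChunks) {
                throw new IllegalStateException("simulated crash after commit");
            }
        }
    }

    /** First run crashes after the final chunk commits; second run is the restart. */
    static List<Integer> crashThenRestart(boolean saveState) {
        Store store = new Store();
        List<Integer> input = List.of(1, 2, 3, 4, 5, 6);
        try {
            run(store, input, 2, saveState, 3);
        } catch (IllegalStateException expected) {
            // the step never got marked complete, so the job is restarted
        }
        run(store, input, 2, saveState, -1);
        return store.rows;
    }

    public static void main(String[] args) {
        System.out.println(crashThenRestart(true));  // prints [1, 2, 3, 4, 5, 6]
        System.out.println(crashThenRestart(false)); // prints [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]
    }
}
```

With the checkpoint committed alongside each chunk, the restart finds it is already at the end of the input and writes nothing. Without it, the restart begins from the first item and every row is written twice, which is the situation described in the question.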

answered Oct 24 '22 21:10

Michael Minella
