
Spring Batch difference between pageSize and commit-interval

Tags:

spring-batch

What is the relationship (or difference) between the Spring Batch reader's 'pageSize' property and the writer's 'commit-interval'?

I may be wrong, but I see a pattern in my application: every time pageSize items have been read, one commit is made. Is this correct?

Thanks

Aditya asked Mar 02 '15 09:03




2 Answers

The commit-interval defines how many items are processed within a single chunk. That number of items is read, processed, and then written within the scope of a single transaction (skip/retry semantics notwithstanding).

The page-size attribute on the paging ItemReader implementations (JdbcPagingItemReader for example) defines how many records are fetched per read of the underlying resource. So in the JDBC example, it's how many records are requested with a single hit to the DB.
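To make the fetch behaviour concrete, here is a minimal plain-Java sketch of a page-buffering reader. It is not the Spring Batch API: the class `PagingReaderSketch`, its fields, and the in-memory "table" are all hypothetical stand-ins. It only illustrates that items are handed out one at a time while the underlying source is hit once per page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a paging reader; NOT a Spring Batch class.
class PagingReaderSketch {
    private final List<Integer> table;   // pretend database table
    private final int pageSize;          // rows requested per simulated query
    private List<Integer> page = new ArrayList<>();
    private int indexInPage = 0;
    private int nextOffset = 0;
    int fetchCount = 0;                  // number of simulated DB hits

    PagingReaderSketch(List<Integer> table, int pageSize) {
        this.table = table;
        this.pageSize = pageSize;
    }

    // Returns the next item, or null when the input is exhausted.
    Integer read() {
        if (indexInPage >= page.size()) {
            if (nextOffset >= table.size()) {
                return null;             // nothing left to fetch
            }
            // Current page is used up: fetch the next page in one "query".
            int end = Math.min(nextOffset + pageSize, table.size());
            page = new ArrayList<>(table.subList(nextOffset, end));
            nextOffset = end;
            indexInPage = 0;
            fetchCount++;
        }
        return page.get(indexInPage++);
    }

    // Drains a table of totalItems rows and reports how many fetches it took.
    static int fetchesFor(int totalItems, int pageSize) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < totalItems; i++) {
            rows.add(i);
        }
        PagingReaderSketch reader = new PagingReaderSketch(rows, pageSize);
        while (reader.read() != null) {
            // consume items one at a time
        }
        return reader.fetchCount;
    }

    public static void main(String[] args) {
        System.out.println(fetchesFor(25, 10)); // prints 3 (pages of 10, 10 and 5 rows)
    }
}
```

The sketch knows the table size up front, so it stops without an extra query. A real paging reader may issue one more query to discover that the input is exhausted.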

There is no direct correlation between the two attributes, but it's typically considered a good idea to make them match. Independently, they give you two knobs you can turn to tune the performance of your application.

With regard to your direct question: if you have the page-size set to the same value as the commit-interval, then yes, I'd expect a single commit for each page.
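The arithmetic can be checked with a small plain-Java simulation of a chunk-oriented loop. This is a sketch under stated assumptions, not Spring Batch code: `ChunkVsPageSketch` and `simulate` are hypothetical names, one "fetch" is counted each time the page buffer runs empty, and one "commit" is counted each time commitInterval items have been collected (plus one for a final partial chunk).

```java
import java.util.Arrays;

// Hypothetical simulation; it only counts events and uses no Spring Batch classes.
class ChunkVsPageSketch {

    // Returns {pageFetches, commits} for the given settings.
    static int[] simulate(int totalItems, int pageSize, int commitInterval) {
        int fetches = 0;
        int commits = 0;
        int read = 0;       // items read so far
        int buffered = 0;   // items remaining in the current page
        int inChunk = 0;    // items collected in the current chunk

        while (read < totalItems) {
            if (buffered == 0) {
                // Page buffer is empty: one hit to the underlying resource.
                buffered = Math.min(pageSize, totalItems - read);
                fetches++;
            }
            buffered--;
            read++;
            inChunk++;
            if (inChunk == commitInterval) {
                // Chunk is full: write it and commit the transaction.
                commits++;
                inChunk = 0;
            }
        }
        if (inChunk > 0) {
            commits++;      // final, partially filled chunk
        }
        return new int[] {fetches, commits};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(simulate(100, 10, 10))); // prints [10, 10]
        System.out.println(Arrays.toString(simulate(100, 50, 10))); // prints [2, 10]
        System.out.println(Arrays.toString(simulate(100, 10, 50))); // prints [10, 2]
    }
}
```

Only in the first case, where both values are equal, does each page line up with exactly one commit. The sketch ignores skip/retry handling, and a real step may record an additional commit when the reader signals end of input.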

Michael Minella answered Sep 22 '22 17:09


The commit interval determines how many items are processed in a chunk.

The page size determines how many items are fetched each time the reader needs more data.

Depending on the values you set, you may see the behavior you describe; for example, when the two values are equal, each page corresponds to one commit. Both settings exist for performance tuning.

andreadi answered Sep 20 '22 17:09