I have a batch job which reads data from bulk files, processes it, and inserts it into a DB. I'm using Spring's partitioning feature with the default partition handler:
<bean class="org.spr...TaskExecutorPartitionHandler">
    <property name="taskExecutor" ref="taskExecutor"/>
    <property name="step" ref="readFromFile" />
    <property name="gridSize" value="10" />
</bean>
What is the significance of gridSize here? I have configured it so that it is equal to the concurrency of the taskExecutor.
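For reference, a taskExecutor whose concurrency matches that gridSize of 10 could look like the following; a minimal sketch assuming a ThreadPoolTaskExecutor, not my actual bean definition:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.TaskExecutor;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
public class TaskExecutorConfig {

    // Pool size chosen to match gridSize = 10, so every partition
    // can run on its own thread.
    @Bean
    public TaskExecutor taskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(10);
        executor.setMaxPoolSize(10);
        executor.setThreadNamePrefix("partition-");
        return executor;
    }
}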
gridSize specifies the number of data blocks to create, to be processed by (usually) the same number of workers. Think of it as the number of mapped data blocks in a map/reduce job.
Given the data, the PartitionHandler uses a StepExecutionSplitter to "partition" / split it into gridSize parts, and sends each part to an independent worker, which in your case is a thread.
For example, say you have 10 rows in the DB that need to be processed. If you set the gridSize to 5 and use a straightforward partitioning logic, you end up with 10 / 5 = 2 rows per thread, i.e. 5 threads working concurrently on 2 rows each.
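A "straightforward partitioning logic" like that could be a custom Partitioner along these lines; a minimal sketch, assuming the rows are addressed by a contiguous id range (hard-coded as 1..10 here for illustration) and that each partition carries hypothetical minId / maxId keys for its worker:

import java.util.HashMap;
import java.util.Map;

import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

public class RangePartitioner implements Partitioner {

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        // 10 rows, gridSize = 5  =>  2 rows per partition.
        // The range is hard-coded for illustration; a real partitioner would
        // look up the actual bounds (e.g. MIN(id)/MAX(id)) instead.
        int minId = 1;
        int maxId = 10;
        int rowsPerPartition = (maxId - minId + 1) / gridSize;

        Map<String, ExecutionContext> partitions = new HashMap<>();
        for (int i = 0; i < gridSize; i++) {
            ExecutionContext context = new ExecutionContext();
            context.putInt("minId", minId + i * rowsPerPartition);
            context.putInt("maxId", minId + (i + 1) * rowsPerPartition - 1);
            partitions.put("partition" + i, context);
        }
        return partitions;
    }
}

Each map entry becomes one StepExecution for the worker step, with its ExecutionContext telling that thread which slice of the data to read.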
Per the API docs:
Passed to the StepExecutionSplitter in the handle(StepExecutionSplitter, StepExecution) method, instructing it how many StepExecution instances are required, ideally. The StepExecutionSplitter is allowed to ignore the grid size in the case of a restart, since the input data partitions must be preserved.
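On the worker side, each thread can then pull its own bounds back out of the step execution context via step scope. A minimal sketch, assuming the minId / maxId keys from the partitioner sketch above and a hypothetical my_table with id and payload columns:

import java.util.Map;

import javax.sql.DataSource;

import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.ColumnMapRowMapper;

@Configuration
public class WorkerStepConfig {

    // Step-scoped so each partition (thread) gets its own reader instance,
    // bound to the minId/maxId values carried by its partition's ExecutionContext.
    @Bean
    @StepScope
    public JdbcCursorItemReader<Map<String, Object>> workerReader(
            DataSource dataSource,
            @Value("#{stepExecutionContext['minId']}") Integer minId,
            @Value("#{stepExecutionContext['maxId']}") Integer maxId) {
        JdbcCursorItemReader<Map<String, Object>> reader = new JdbcCursorItemReader<>();
        reader.setDataSource(dataSource);
        reader.setSql("SELECT id, payload FROM my_table WHERE id BETWEEN ? AND ?");
        reader.setPreparedStatementSetter(ps -> {
            ps.setInt(1, minId);
            ps.setInt(2, maxId);
        });
        reader.setRowMapper(new ColumnMapRowMapper());
        return reader;
    }
}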