What is the difference between Spring Batch remote chunking and remote partitioning?
I can't understand the difference between remote chunking and remote partitioning in Spring Batch. Could anybody please explain?
In remote chunking, the Step processing is split across multiple processes that, in this case, communicate with each other through AWS SQS. The master reads the data and sends it to the slaves over SQS for processing. This pattern is useful when the master is not the bottleneck, that is, when reading the data is cheap compared with processing it.
Spring Batch partitioning lets you divide the execution of a Step into parts.
[Figure: Partitioning Overview, a Job with a partitioned Step]
In such a job, a Step called “Master” has its execution divided into several “Slave” steps.
One approach is tasklet-based: a Tasklet implements a simple interface with a single execute() method. The other approach, **chunk-oriented processing**, reads the data sequentially and groups it into "chunks" that are written out within a transaction boundary.
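To make the chunk-oriented loop concrete, here is a minimal plain-Java sketch of the idea. It is not Spring Batch's own API: `ChunkLoop`, `run`, and `commitInterval` are hypothetical names chosen for illustration. The loop reads up to `commitInterval` items, processes each one, and hands the whole chunk to the writer in a single call, which is the point where a real step would commit its transaction.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Plain-Java model of chunk-oriented processing (NOT the Spring Batch API):
// read up to `commitInterval` items, process them, then write the whole chunk
// in one call, which is where a real step would commit its transaction.
class ChunkLoop {

    /** Runs the read-process-write loop and returns how many chunks were written. */
    static <I, O> int run(Iterator<I> reader,
                          Function<I, O> processor,
                          Consumer<List<O>> writer,
                          int commitInterval) {
        if (commitInterval < 1) {
            throw new IllegalArgumentException("commitInterval must be >= 1");
        }
        int chunks = 0;
        while (reader.hasNext()) {
            // 1. Read items sequentially until the chunk is full or the input runs out.
            List<I> inputs = new ArrayList<>(commitInterval);
            while (inputs.size() < commitInterval && reader.hasNext()) {
                inputs.add(reader.next());
            }
            // 2. Process every item that was read.
            List<O> outputs = new ArrayList<>(inputs.size());
            for (I item : inputs) {
                outputs.add(processor.apply(item));
            }
            // 3. Write the chunk as one unit (the transaction boundary).
            writer.accept(outputs);
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<List<Integer>> written = new ArrayList<>();
        Function<Integer, Integer> timesTen = x -> x * 10;
        Consumer<List<Integer>> collect = written::add;
        int n = run(List.of(1, 2, 3, 4, 5, 6, 7).iterator(), timesTen, collect, 3);
        System.out.println(n + " chunks: " + written);
        // prints: 3 chunks: [[10, 20, 30], [40, 50, 60], [70]]
    }
}
```

A tasklet, by contrast, would be a single `execute()` call with no read/process/write loop imposed on it.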
A TaskExecutor with a throttle limit works by delegating to an existing task executor while limiting the number of tasks submitted to it. The throttle limit caps the number of pending requests, in addition to whatever the delegate task executor already provides.
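A throttled executor of this kind can be sketched in plain Java with a `Semaphore`. `ThrottledExecutor` and `peakConcurrency` are hypothetical names for illustration, not Spring's implementation. In this sketch, `execute()` blocks the caller once `throttleLimit` tasks are in flight and releases a permit when each task finishes, so the limit applies on top of whatever the delegate executor already enforces.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical throttling wrapper (not a Spring class): it delegates every task
// to an existing Executor but allows at most `throttleLimit` tasks in flight;
// callers of execute() block until a slot frees up.
class ThrottledExecutor implements Executor {
    private final Executor delegate;
    private final Semaphore permits;

    ThrottledExecutor(Executor delegate, int throttleLimit) {
        if (throttleLimit < 1) {
            throw new IllegalArgumentException("throttleLimit must be >= 1");
        }
        this.delegate = delegate;
        this.permits = new Semaphore(throttleLimit);
    }

    @Override
    public void execute(Runnable task) {
        permits.acquireUninterruptibly();              // wait for a free slot
        AtomicBoolean released = new AtomicBoolean();
        Runnable releaseOnce = () -> {                 // never release the same slot twice
            if (released.compareAndSet(false, true)) {
                permits.release();
            }
        };
        try {
            delegate.execute(() -> {
                try {
                    task.run();
                } finally {
                    releaseOnce.run();                 // free the slot when the task ends
                }
            });
        } catch (RuntimeException e) {
            releaseOnce.run();                         // e.g. the delegate rejected the task
            throw e;
        }
    }

    /** Runs `tasks` short tasks through a throttled 8-thread pool; returns the peak concurrency seen. */
    static int peakConcurrency(int throttleLimit, int tasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        ThrottledExecutor throttled = new ThrottledExecutor(pool, throttleLimit);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            throttled.execute(() -> {
                peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                try {
                    Thread.sleep(5);                   // simulate some work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                running.decrementAndGet();
                done.countDown();
            });
        }
        done.await();
        pool.shutdown();
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // The pool has 8 threads, but the throttle keeps at most 3 tasks running.
        System.out.println("peak concurrency: " + peakConcurrency(3, 20));
    }
}
```

Blocking the submitter rather than rejecting the task is only one possible policy; it applies back-pressure to whichever thread is producing the work.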
Remote Partitioning
Partitioning is a master/slave step configuration that allows partitions of data to be processed in parallel. Each partition is described by some metadata. For example, if you were processing a database table, partition 1 might be ids 0-100, partition 2 ids 101-200, and so on. In Spring Batch, a master step uses a Partitioner to generate ExecutionContexts that contain the metadata for each partition. These ExecutionContexts are distributed to the slave steps for processing by a PartitionHandler (for remote partitioning, the MessageChannelPartitionHandler is typically used). The slaves execute their step and return the resulting statuses to the master for aggregation.
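The partitioning flow described above (a partitioner creates one context per partition, the contexts are handed out, the master aggregates the statuses) can be modeled in plain Java. This is a sketch under stated assumptions: plain `Map`s stand in for Spring Batch's `ExecutionContext`, a local thread pool stands in for the remote slaves and the `MessageChannelPartitionHandler`, and `RangePartitioning`, `runSlave`, and `runMaster` are hypothetical names. Only the small range metadata (`minId`/`maxId`) is handed to each slave; the slave is then responsible for its own range.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java model of the partitioning flow (NOT the Spring Batch API).
// Plain maps stand in for ExecutionContext; a local thread pool stands in
// for remote slaves and the partition handler.
class RangePartitioning {

    /** Master side: splits ids [minId, maxId] into at most gridSize contiguous ranges. */
    static Map<String, Map<String, Long>> partition(long minId, long maxId, int gridSize) {
        if (gridSize < 1) {
            throw new IllegalArgumentException("gridSize must be >= 1");
        }
        Map<String, Map<String, Long>> contexts = new LinkedHashMap<>();
        long total = maxId - minId + 1;
        long size = (total + gridSize - 1) / gridSize;    // ceiling division
        long start = minId;
        int index = 0;
        while (start <= maxId) {
            long end = Math.min(start + size - 1, maxId);
            Map<String, Long> ctx = new LinkedHashMap<>(); // the partition's metadata
            ctx.put("minId", start);
            ctx.put("maxId", end);
            contexts.put("partition" + index++, ctx);
            start = end + 1;
        }
        return contexts;
    }

    /** Slave side: handles one range; here it only walks the ids and reports a status. */
    static String runSlave(Map<String, Long> ctx) {
        long processed = 0;
        for (long id = ctx.get("minId"); id <= ctx.get("maxId"); id++) {
            processed++;   // a real slave step would read, process and write row `id`
        }
        return processed > 0 ? "COMPLETED" : "FAILED";
    }

    /** Master side: hands each context to a slave and aggregates the returned statuses. */
    static String runMaster(Map<String, Map<String, Long>> contexts, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> results = new ArrayList<>();
            for (Map<String, Long> ctx : contexts.values()) {
                results.add(pool.submit(() -> runSlave(ctx)));
            }
            String overall = "COMPLETED";
            for (Future<String> result : results) {
                try {
                    if (!"COMPLETED".equals(result.get())) {
                        overall = "FAILED";
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    overall = "FAILED";
                } catch (ExecutionException e) {
                    overall = "FAILED";                // a slave threw an exception
                }
            }
            return overall;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, Map<String, Long>> contexts = partition(0, 299, 3);
        System.out.println(contexts);
        // {partition0={minId=0, maxId=99}, partition1={minId=100, maxId=199}, partition2={minId=200, maxId=299}}
        System.out.println(runMaster(contexts, 3));    // COMPLETED
    }
}
```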
Things to note about remote partitioning:
Remote Chunking
Remote chunking is similar to remote partitioning in that it is a master/slave configuration. However, with remote chunking the data is read by the master and sent over the wire to the slaves for processing. Once a slave has processed a chunk, it writes the items with its ItemWriter and sends a reply back to the master.
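The remote chunking flow can be simulated in memory, with `BlockingQueue`s standing in for the middleware (such as the SQS queues mentioned earlier) and threads standing in for the slaves. This is a hypothetical sketch, not the Spring Batch Integration API, and `RemoteChunking`, `run`, and `STOP` are illustrative names. Following the Spring Batch reference documentation's description of the worker side, each slave here both processes and writes its chunk, then replies with a count; the master only reads and sends. Unlike partitioning, the items themselves travel over the queue, so the master's reading speed and the channel's throughput can become limiting factors.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// In-memory model of remote chunking (NOT the Spring Batch Integration API).
// BlockingQueues stand in for the middleware and threads stand in for slaves.
// Each slave processes AND writes a chunk, then replies with the number of
// items it wrote (-1 on failure).
class RemoteChunking {
    private static final List<Integer> STOP = new ArrayList<>(); // shutdown sentinel, compared by identity

    static List<Integer> run(List<Integer> input, int chunkSize, int slaves,
                             Function<Integer, Integer> processor) throws InterruptedException {
        if (chunkSize < 1 || slaves < 1) {
            throw new IllegalArgumentException("chunkSize and slaves must be >= 1");
        }
        BlockingQueue<List<Integer>> requests = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> replies = new LinkedBlockingQueue<>();
        List<Integer> store = Collections.synchronizedList(new ArrayList<>()); // the slaves' "database"

        List<Thread> threads = new ArrayList<>();
        for (int s = 0; s < slaves; s++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        List<Integer> chunk = requests.take();
                        if (chunk == STOP) {
                            return;
                        }
                        int reply;
                        try {
                            List<Integer> processed = new ArrayList<>();
                            for (Integer item : chunk) {
                                processed.add(processor.apply(item));
                            }
                            store.addAll(processed);   // slave-side write
                            reply = processed.size();
                        } catch (RuntimeException e) {
                            reply = -1;                // report the failed chunk
                        }
                        replies.put(reply);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            threads.add(t);
        }

        // Master: read the input sequentially and send it over the "wire" in chunks.
        int sent = 0;
        for (int i = 0; i < input.size(); i += chunkSize) {
            requests.put(new ArrayList<>(input.subList(i, Math.min(i + chunkSize, input.size()))));
            sent++;
        }
        // Wait for one reply per chunk, then shut the slaves down.
        boolean failed = false;
        for (int i = 0; i < sent; i++) {
            if (replies.take() < 0) {
                failed = true;
            }
        }
        for (int s = 0; s < slaves; s++) {
            requests.put(STOP);
        }
        for (Thread t : threads) {
            t.join();
        }
        if (failed) {
            throw new IllegalStateException("at least one chunk failed on a slave");
        }
        return new ArrayList<>(store);
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> written = run(List.of(1, 2, 3, 4, 5, 6, 7), 3, 2, x -> x * x);
        Collections.sort(written);    // slaves may finish in any order
        System.out.println(written);  // [1, 4, 9, 16, 25, 36, 49]
    }
}
```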
Things to note about remote chunking:
I gave a talk on scaling Spring Batch that includes a demonstration of remote partitioning; you can watch it here: http://www.youtube.com/watch?v=CYTj5YT7CZU