Using Spring Batch, I want my steps to be distributed across nodes and executed for a given job. I have a use case where a job has multiple steps and each step can run on multiple nodes where the app is hosted. Has anybody tried this? Any ideas would be highly appreciated!
It's a computing paradigm where the tuples/records are batched and then distributed for processing across a cluster of nodes/processing units. Once each node completes processing its allocated batch, the results are collated and summarized into the final result.
Passing data between steps: Spring Batch provides an ExecutionContext scoped to each step and to the job. Using the execution context, data can be shared between the components within a step, and data promoted to the job-level context can be read by subsequent steps.
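As a minimal sketch (the tasklet class and the key name processedCount are hypothetical), a tasklet can put a value into the step's ExecutionContext:

```java
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

// Hypothetical tasklet: stores a value in the step's ExecutionContext
public class SaveCountTasklet implements Tasklet {
    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
        chunkContext.getStepContext().getStepExecution()
                .getExecutionContext().putLong("processedCount", 42L);
        return RepeatStatus.FINISHED;
    }
}
```

To make the value visible to later steps, register an ExecutionContextPromotionListener on that step and call setKeys(new String[] {"processedCount"}) on it; when the step completes, the listener copies the key into the job's ExecutionContext.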
The simplest way to start parallel processing is to add a TaskExecutor to your Step configuration, where the taskExecutor is a reference to a bean implementing the TaskExecutor interface. Note that this parallelizes chunk processing across threads within a single JVM, not across nodes.
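For example, a minimal Java-config sketch (the step name, reader, and writer beans are assumptions; this would live inside a @Configuration class):

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Bean
public Step parallelStep(StepBuilderFactory steps,
                         ItemReader<String> reader, ItemWriter<String> writer) {
    return steps.get("parallelStep")
            .<String, String>chunk(100)                           // commit interval
            .reader(reader)
            .writer(writer)
            .taskExecutor(new SimpleAsyncTaskExecutor("batch-"))  // chunks run concurrently
            .build();
}
```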
There are two approaches:
Remote chunking - you read the data on the master node and process/write it on the slave nodes.
Remote partitioning - you slice your data set into partitions and read/process/write each partition on a remote node, so the master just coordinates and decides how to slice the partitions (see the partitioner sketch after this list).
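To illustrate partitioning, here is a hedged sketch of a custom Partitioner that splits an ID range into one ExecutionContext per partition (the minId/maxId keys and the total of 1,000 records are assumptions; the slave step's reader would use those keys to select its slice):

```java
import java.util.HashMap;
import java.util.Map;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

public class IdRangePartitioner implements Partitioner {
    private static final int TOTAL_RECORDS = 1000; // assumption for this sketch

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> partitions = new HashMap<>();
        int range = TOTAL_RECORDS / gridSize;
        for (int i = 0; i < gridSize; i++) {
            ExecutionContext ctx = new ExecutionContext();
            ctx.putInt("minId", i * range + 1);   // lower bound of this slice
            ctx.putInt("maxId", (i + 1) * range); // upper bound of this slice
            partitions.put("partition" + i, ctx);
        }
        return partitions;
    }
}
```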
I wrote a book about Enterprise Spring and created examples of both approaches; they are hosted on GitHub. Look into examples 0939 and 0940. Unfortunately, the instructions for running them manually are only in the book, but hopefully you will be able to figure that out from the integration tests.
A prerequisite is to have messaging middleware (e.g. ActiveMQ or HornetQ) available for master-slave communication; both examples also use Spring Integration to facilitate that communication.
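As a rough sketch of that wiring (the broker URL and destination name are assumptions), the master side can forward partition/chunk requests to the slaves over JMS via a Spring Integration flow:

```java
import javax.jms.ConnectionFactory;
import org.apache.activemq.ActiveMQConnectionFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.integration.channel.DirectChannel;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;
import org.springframework.integration.jms.dsl.Jms;

@Bean
public ConnectionFactory connectionFactory() {
    return new ActiveMQConnectionFactory("tcp://localhost:61616"); // assumed broker URL
}

@Bean
public DirectChannel requests() {
    return new DirectChannel(); // channel the master-side step writes requests to
}

@Bean
public IntegrationFlow outboundFlow(ConnectionFactory connectionFactory) {
    // Forward anything on the requests channel to a JMS queue the slaves listen on
    return IntegrationFlows.from(requests())
            .handle(Jms.outboundAdapter(connectionFactory).destination("step.requests"))
            .get();
}
```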