
How to achieve distributed processing of steps using Spring Batch

Using Spring Batch, I want the steps of a given job to be distributed across nodes and executed there. My use case is a job with multiple steps, where each step can run on any of the nodes where the app is hosted. Has anybody tried this? Any ideas would be highly appreciated!

Ashu NCS asked Dec 15 '15 07:12


People also ask

What is distributed batch processing?

It's a computing paradigm where the tuples/records are batched and then distributed for processing across a cluster of nodes/processing units. Once each node completes the processing of its allocated batch, the results are collated and summarized for the final results.

How can we share data between the different steps of a job in Spring Batch?

Passing data between steps. Spring Batch provides an ExecutionContext that is available in the scope of each step and of the job. By using the execution context, data can be shared between the components in a step and, through the job-level context, between steps.

How do you implement parallel processing in Spring Batch?

The simplest way to start parallel processing is to add a TaskExecutor to your Step configuration. The taskExecutor is a reference to another bean definition that implements the TaskExecutor interface.


1 Answer

There are two approaches:

  1. Remote chunking: the master node reads the data and sends it in chunks to the slave nodes, which process and write it.

  2. Remote partitioning: the data set is sliced into partitions, and each partition is read, processed, and written on a remote node. The master only coordinates and decides how to slice the partitions.
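To make the partitioning idea concrete, here is a minimal sketch in plain Java. It is not the Spring Batch API and not the code from the examples mentioned in this answer: the names `PartitioningSketch`, `IdRange`, `slice`, `processPartition`, and `runJob` are hypothetical, a local thread pool stands in for the remote nodes, and summing IDs stands in for the real read/process/write work. It only shows the division of labour: the master slices an ID range, and each worker handles one slice on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration in plain Java; none of these names are Spring Batch APIs.
class PartitioningSketch {

    // One partition: an inclusive range of record IDs.
    static final class IdRange {
        final long minId;
        final long maxId;

        IdRange(long minId, long maxId) {
            this.minId = minId;
            this.maxId = maxId;
        }
    }

    // Master side: slice [minId, maxId] into at most gridSize contiguous ranges.
    static List<IdRange> slice(long minId, long maxId, int gridSize) {
        if (gridSize <= 0 || maxId < minId) {
            throw new IllegalArgumentException("need gridSize > 0 and minId <= maxId");
        }
        long total = maxId - minId + 1;
        long size = (total + gridSize - 1) / gridSize; // ceiling division
        List<IdRange> ranges = new ArrayList<>();
        for (long start = minId; start <= maxId; start += size) {
            ranges.add(new IdRange(start, Math.min(start + size - 1, maxId)));
        }
        return ranges;
    }

    // Worker side: stand-in for read/process/write; here it just sums the IDs.
    static long processPartition(IdRange range) {
        long sum = 0;
        for (long id = range.minId; id <= range.maxId; id++) {
            sum += id;
        }
        return sum;
    }

    // Master side: hand each partition to a worker, then collect the results.
    // Assumption: threads stand in for remote nodes reached via messaging.
    static long runJob(long minId, long maxId, int gridSize) {
        ExecutorService workers = Executors.newFixedThreadPool(gridSize);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (IdRange range : slice(minId, maxId, gridSize)) {
                Callable<Long> task = () -> processPartition(range);
                results.add(workers.submit(task));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("a partition failed", e);
        } finally {
            workers.shutdown();
        }
    }

    public static void main(String[] args) {
        for (IdRange range : slice(1, 10, 3)) {
            System.out.println(range.minId + ".." + range.maxId);
        }
        System.out.println(runJob(1, 10, 3));
    }
}
```

In remote chunking the split would be the other way around: the master would do the reading itself and ship chunks of already-read items to the workers, so only processing and writing would be distributed.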

I wrote a book about Enterprise Spring and created examples of both approaches, which are hosted on GitHub; look at examples 0939 and 0940. Unfortunately, the instructions for running them manually are only in the book, but hopefully you will be able to figure that out from the integration tests.

A prerequisite is messaging middleware (e.g. ActiveMQ or HornetQ) for master-slave communication; the setup also uses Spring Integration to facilitate this communication.

luboskrnac answered Oct 13 '22 01:10