Making Existing Spring Batch Application run on multiple nodes

Question

We have existing Spring Batch Application, that we want to make scalable to run on multiple nodes.

The scalabilty docs for Spring Batch involves code changes and configuration changes.

I am just wondering if this can be achieved by just configuration changes ( adding new classes and wiring it in configuration is fine but just want to avoid code changes to existing classes).

Thanks a lot for the help in advance.

Michael Minella · Accepted Answer

It really depends on your situation. Specifically, why do you do you want to run on multiple nodes? What is the bottle neck you're attempting to overcome? The typical two scenarios that Spring Batch handles out of the box for scaling across multiple nodes are remote chunking and remote partitioning. Both are master/slave configurations, but each have a different use case.

Remote chunking is used when the processor in a step is the bottle neck. In this case, the master node reads the input and sends it via a Spring Integration channel to remote nodes for processing. Once the item has been processed, the result is returned to the master for writing. In this case, reading and writing are done locally to the master. While this helps parallelize processing, it takes an I/O hit because every item is being sent over the wire (and requires guaranteed delivery, ala JMS for example).

Remote partitioning is the other scenario. In this case, the master generates a description of the input to be processed for each slave and only that description is sent over the wire. For example, if you're processing records in a database, the master may send a range of row ids to each slave (1-100, 101-200, etc). Reading and writing occur local to the slaves and guaranteed delivery is not required (although useful in certain situations).

Both of these options can be done with minimal (or no) new classes depending on your use case. There are a couple different places to look for information on these capabilities:

Spring Batch Integration Github repository - Spring Batch Integration is the project that supports the above use cases. You can read more about it here: https://github.com/spring-projects/spring-batch-admin/tree/master/spring-batch-integration
My remote partitioning example - This talk walks though remote partitioning and provides a working example to run on CloudFoundry (currently only works on CF v1 but updates for CF2 are coming in a couple days). The configuration is almost the same, only the connection pool for Rabbit is different: https://github.com/mminella/Spring-Batch-Talk-2.0 The video for this presentation can be found on YouTube here: http://www.youtube.com/watch?v=CYTj5YT7CZU
Gunnar Hillert's presentation on Spring Batch and Spring Integration: This was presented at SpringOne2GX 2013 and contains a number of examples: https://github.com/ghillert/spring-batch-integration-sample

In any of these cases, remote chunking should be accomplishable with zero new classes. Remote partitioning typically requires you to implement one new class (the Partitioner).

Making Existing Spring Batch Application run on multiple nodes

Tags:

java

spring

scalability

spring-batch

Sai

1 Answers

Michael Minella

Recent Activity

Donate For Us

Making Existing Spring Batch Application run on multiple nodes

Tags:

java

spring

scalability

spring-batch

Sai

1 Answers

Michael Minella

Related questions

Recent Activity

Donate For Us