Making an existing Spring Batch application run on multiple nodes

We have an existing Spring Batch application that we want to make scalable so that it can run on multiple nodes.

The scalability docs for Spring Batch describe approaches that involve both code changes and configuration changes.

I am wondering whether this can be achieved through configuration changes alone. Adding new classes and wiring them in the configuration is fine, but we want to avoid changing existing classes.

Thanks a lot for the help in advance.

Asked by Sai, Sep 16 '13 10:09


1 Answer

It really depends on your situation. Specifically, why do you want to run on multiple nodes? What bottleneck are you trying to overcome? Spring Batch handles two typical scenarios out of the box for scaling across multiple nodes: remote chunking and remote partitioning. Both are master/slave configurations, but each has a different use case.

Remote chunking is used when the processor in a step is the bottleneck. In this case, the master node reads the input and sends chunks of items via a Spring Integration channel to remote nodes. The remote nodes process the items and, in Spring Batch's implementation, also write them, then reply to the master with the outcome. Only the reading is done locally on the master. While this parallelizes processing, it takes an I/O hit because every item is sent over the wire (and the middleware must guarantee delivery, JMS for example).
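A minimal in-process sketch of the remote chunking flow, written in plain Java so it runs without Spring. A thread pool stands in for both the remote nodes and the messaging middleware, and `process`, `handleChunk`, `runStep`, and `OUTPUT` are hypothetical names, not Spring Batch API. It follows Spring Batch's own remote chunking design, in which the workers run the `ItemProcessor` and `ItemWriter` through the `ChunkProcessor` interface: the master only reads and ships items, and each worker processes and writes its chunk and then replies with a count. In a real deployment every chunk would travel as a message, which is where the per-item I/O cost comes from.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class RemoteChunkingSketch {

    // Stand-in for the shared output (e.g. a database table) that every worker writes to.
    static final Queue<String> OUTPUT = new ConcurrentLinkedQueue<>();

    // Hypothetical expensive processing step: the bottleneck that justifies remote chunking.
    static String process(String item) {
        return item.toUpperCase();
    }

    // Worker side: processes a whole chunk, writes it, and replies with the number of items written.
    static int handleChunk(List<String> chunk) {
        List<String> processed = new ArrayList<>();
        for (String item : chunk) {
            processed.add(process(item));
        }
        OUTPUT.addAll(processed);
        return processed.size();
    }

    // Master side: reads the input locally and ships every item, chunk by chunk, to the workers.
    static int runStep(List<String> input, int chunkSize, int workerCount) {
        // The pool stands in for the remote nodes plus the middleware between them and the master.
        ExecutorService workers = Executors.newFixedThreadPool(workerCount);
        try {
            List<Future<Integer>> replies = new ArrayList<>();
            for (int start = 0; start < input.size(); start += chunkSize) {
                // Copy the slice: in a real deployment this chunk would be serialized into a message.
                List<String> chunk = List.copyOf(input.subList(start, Math.min(start + chunkSize, input.size())));
                replies.add(workers.submit(() -> handleChunk(chunk)));
            }
            int written = 0;
            for (Future<Integer> reply : replies) {
                written += reply.get(); // the master waits for a reply to every chunk it sent
            }
            return written;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a chunk failed on a worker", e);
        } finally {
            workers.shutdown();
        }
    }

    public static void main(String[] args) {
        int written = runStep(List.of("a", "b", "c", "d", "e"), 2, 3);
        System.out.println(written + " items written: " + OUTPUT);
    }
}
```

The order of `OUTPUT` depends on which worker finishes first, which mirrors real remote chunking: chunks are written independently on whichever node processed them.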

Remote partitioning is the other scenario. In this case, the master generates a description of the input each slave should process, and only that description is sent over the wire. For example, if you're processing records in a database, the master may send a range of row ids to each slave (1-100, 101-200, etc.). Reading and writing happen locally on the slaves, and guaranteed delivery is not required (although it is useful in certain situations).
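A minimal sketch of the descriptions the master might generate, assuming the rows have contiguous numeric ids. In Spring Batch this logic would live in an implementation of the `Partitioner` interface, whose `partition(int gridSize)` method returns a `Map<String, ExecutionContext>`; here plain maps stand in for `ExecutionContext` so the sketch runs without the framework. The `partitionN` names and the `minId`/`maxId` keys are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RangePartitionSketch {

    // Splits the id range [minId, maxId] into at most gridSize contiguous ranges.
    // Each entry is the small description a slave receives: a partition name plus its bounds.
    static Map<String, Map<String, Long>> partition(long minId, long maxId, int gridSize) {
        if (gridSize < 1 || maxId < minId) {
            throw new IllegalArgumentException("invalid id range or grid size");
        }
        long total = maxId - minId + 1;
        long size = (total + gridSize - 1) / gridSize; // ceiling division so every id is covered
        Map<String, Map<String, Long>> partitions = new LinkedHashMap<>();
        long start = minId;
        for (int i = 0; i < gridSize && start <= maxId; i++) {
            long end = Math.min(start + size - 1, maxId);
            Map<String, Long> bounds = new LinkedHashMap<>();
            bounds.put("minId", start);
            bounds.put("maxId", end);
            partitions.put("partition" + i, bounds);
            start = end + 1;
        }
        return partitions;
    }

    public static void main(String[] args) {
        // First entry printed: partition0={minId=1, maxId=100}
        System.out.println(partition(1, 1000, 10));
    }
}
```

Each slave would then read only the rows between its `minId` and `maxId` (for example via a reader whose query is parameterized with those values) and write its results locally, so no individual items cross the network.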

Both of these options can be set up with few (or no) new classes, depending on your use case. There are a few places to look for more information on these capabilities:

  1. Spring Batch Integration GitHub repository: Spring Batch Integration is the project that supports the use cases above. You can read more about it here: https://github.com/spring-projects/spring-batch-admin/tree/master/spring-batch-integration
  2. My remote partitioning example: This talk walks through remote partitioning and provides a working example that runs on Cloud Foundry. It currently only works on CF v1, but updates for CF v2 are coming in a couple of days; the configuration is almost the same, and only the RabbitMQ connection pool differs: https://github.com/mminella/Spring-Batch-Talk-2.0 The video of this presentation is on YouTube: http://www.youtube.com/watch?v=CYTj5YT7CZU
  3. Gunnar Hillert's presentation on Spring Batch and Spring Integration: This was presented at SpringOne2GX 2013 and contains a number of examples: https://github.com/ghillert/spring-batch-integration-sample

In all of these cases, remote chunking should be achievable with zero new classes. Remote partitioning typically requires you to implement one new class (the Partitioner).

Answered by Michael Minella, Oct 19 '22 19:10