
File processing on two different machine using spring batch

My file processing scenario is:

 read input file -> process -> generate output file

I have two physically separate machines connected to one shared storage area, where all the input files arrive, and one database server. An application server runs on each machine (one per machine).


So how can I use Spring Batch to process the input files on both application servers in parallel? I mean, if there are 10 files, can 5 be processed on server 1 (P1) and 5 on server 2 (P2)?

asked May 02 '13 by neel.1708


2 Answers

You could schedule a job per input file (the input file location would be a parameter of the job). Spring Batch guarantees that no two job instances with the same job parameters are created; you'll get a JobExecutionAlreadyRunningException or a JobInstanceAlreadyCompleteException if the other node has already started or finished processing the same file.
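A minimal sketch of that approach, assuming both servers share the same Spring Batch job repository database (which the single database server in the question would provide). The job name `fileProcessingJob` and the parameter key `input.file` are illustrative, not from the answer:

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.repository.JobExecutionAlreadyRunningException;
import org.springframework.batch.core.repository.JobInstanceAlreadyCompleteException;

public class FileJobDispatcher {

    private final JobLauncher jobLauncher;
    private final Job fileProcessingJob;

    public FileJobDispatcher(JobLauncher jobLauncher, Job fileProcessingJob) {
        this.jobLauncher = jobLauncher;
        this.fileProcessingJob = fileProcessingJob;
    }

    public void launchFor(String inputFilePath) throws Exception {
        // The file path makes the job parameters unique per file, so the shared
        // job repository can detect when the other node already owns this file.
        JobParameters params = new JobParametersBuilder()
                .addString("input.file", inputFilePath)
                .toJobParameters();
        try {
            jobLauncher.run(fileProcessingJob, params);
        } catch (JobExecutionAlreadyRunningException e) {
            // The other server is currently processing this file; skip it.
        } catch (JobInstanceAlreadyCompleteException e) {
            // This file was already processed successfully; skip it.
        }
    }
}
```

Each server can then poll the shared directory and call launchFor for every file it sees; the shared job repository arbitrates which node actually runs each file.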

answered Oct 11 '22 by Jimmy Praet


The first thing is to decide whether you actually want to split the files evenly in half (5 and 5), or whether you want each server to keep processing until the work is done. If the files vary in size, with some small and others large, the optimal parallelization may be 6 files on one server and 4 on the other, or 7 and 3, if those 3 take as long as the other 7 because of the size difference.

A very rudimentary way would be to have a database table that represents active processing. Each job reads the directory, grabs the first file name, and inserts a row into the table recording that the file is being processed by that JVM. If the primary key of the table is the file name, then when both servers try to claim the same file at the same time, one insert fails and one succeeds. The one that succeeds in inserting the row wins and gets to process the file; the other handles the exception, picks the next file, and attempts to insert that one as a processing entry. In effect each insert takes a centralized lock (in the DB table), and you get processing that adapts to file size rather than forcing an even file distribution.
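A rough sketch of that table-as-lock idea, using Spring's JdbcTemplate. The table name FILE_PROCESSING, its columns, and the tryClaim method are assumptions made for illustration; the only essential detail from the answer is that the file name is the primary key:

```java
import org.springframework.dao.DuplicateKeyException;
import org.springframework.jdbc.core.JdbcTemplate;

public class FileClaimService {

    private final JdbcTemplate jdbcTemplate;

    public FileClaimService(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    /**
     * Tries to claim a file for this JVM. Returns true if this node won the
     * race; false if the other node inserted the row (and took the file) first.
     */
    public boolean tryClaim(String fileName, String nodeId) {
        try {
            // FILE_NAME is the primary key, so only one of the two concurrent
            // inserts for the same file can ever succeed.
            jdbcTemplate.update(
                "INSERT INTO FILE_PROCESSING (FILE_NAME, PROCESSED_BY) VALUES (?, ?)",
                fileName, nodeId);
            return true;  // insert succeeded: this node owns the file
        } catch (DuplicateKeyException e) {
            return false; // the other server claimed it first; pick the next file
        }
    }
}
```

The caller would list the shared directory, call tryClaim for each file name, and process only the files it successfully claimed, moving on whenever tryClaim returns false.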

answered Oct 11 '22 by IceBox13