 

Spring Batch - Duplicate STEP message

Tags:

spring-batch

I have a spring batch job which is expected to process 'N' job-ids sequentially, based on FIFO. There are 5 steps in this spring batch job.
We use a DECIDER to determine whether any more job-ids are present. If yes, we go back to the first step and run all the steps for that job-id.
I see a "duplicate step" message in the log emitted by Spring Batch, which appears to be harmless unless a step in the first job (say job-id=1) ends in an UNKNOWN state. In that event, the same step for the second job (job-id=2) fails to start, stating "Step is in UNKNOWN state, it is dangerous to restart....". Is there a better approach to defining a Spring Batch job that processes 'N' job-ids?

There is a table which holds the job information. Each job places orders into an Order table. It is possible that two jobs need to be processed on the same day. A job can insert/update the same order number with the same revision (differing in other details) or a different revision of the same order number. The batch program must process these jobs in FIFO order based on success_time in the job table.

Assume table structure as below

Job_Id      job_name    success_time
1           job1        2014-09-29 10:00:00
2           job2        2014-09-29 13:00:00

Order_id    order_number    order_revision  order_details   job_id
1           ABC             1               Test1            1
2           XYZ             1               Test2            1
3           ABC             2               Test1-Rev2       2

Sample configuration is shown below. For brevity, I have removed metadata definitions and reused the reader and writer.

<batch:step id="abstractParentStep" abstract="true">
    <batch:tasklet>
        <batch:chunk commit-interval="100" />
    </batch:tasklet>
</batch:step>

<!-- Using the same reader and writer to simplify the scenario depiction -->
<batch:job id="OrderProcessingJob">
    <batch:step id="Collect-Statistics-From-Staging-Tables" next="Validate-Order-Mandatory-Fields" parent="abstractParentStep">
        <batch:tasklet>
            <batch:chunk reader="orderReader" writer="orderWriter" />
        </batch:tasklet>
    </batch:step>
    <batch:step id="Validate-Order-Mandatory-Fields" next="Validate-Item-Mandatory-Fields" parent="abstractParentStep">
        <batch:tasklet>
            <batch:chunk reader="orderReader" writer="orderWriter" />
        </batch:tasklet>
    </batch:step>
    <batch:step id="Validate-Item-Mandatory-Fields" next="decision" parent="abstractParentStep">
        <batch:tasklet>
            <batch:chunk reader="orderReader" writer="orderWriter" />
        </batch:tasklet>
    </batch:step>
    <batch:decision id="decision" decider="processMoreJobsDecider">
        <batch:next on="REPEAT" to="Validate-Order-Mandatory-Fields" />
        <batch:end on="COMPLETED" />
    </batch:decision>

</batch:job>

In the first step, we check how many jobs (count) need to be processed and place that count into the ExecutionContext. In the decider, we check whether the total number of jobs processed matches the count and return a REPEAT status if there are more job_ids to process.

We ran into the exception mentioned above when the first job's step remained in an UNKNOWN state and the second job (since the decider determined there was one more job_id to process) failed with the error message shown above.
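The counting logic of such a decider can be sketched in plain Java as below. This is a hypothetical illustration of the decision rule only; in the real job this logic would live inside a Spring Batch JobExecutionDecider that reads the counts from the ExecutionContext (the class and method names here are assumptions, not the actual bean).

```java
// Hypothetical sketch of the decider's counting logic, outside of Spring Batch.
// In the real job, a JobExecutionDecider would read totalJobs and processedJobs
// from the ExecutionContext and return a FlowExecutionStatus with this value.
public class ProcessMoreJobsDecider {

    public static String decide(int processedJobs, int totalJobs) {
        // REPEAT loops the flow back to the first step; COMPLETED ends the job.
        return processedJobs < totalJobs ? "REPEAT" : "COMPLETED";
    }

    public static void main(String[] args) {
        System.out.println(decide(1, 2)); // more job_ids left
        System.out.println(decide(2, 2)); // all job_ids processed
    }
}
```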

asked Aug 21 '14 by tronline





2 Answers

You should give each step a unique name. If you use partitioning, this is done for you automatically.

See this gist, file partitionedSimple.groovy (you can run all the examples just by downloading the files and running groovy <filename.groovy>). In step1, we determine the number of steps we'll need subsequently (hardcoded there to 3) and save it in the job context (first in the step context, then we promote it). Then we create a partitioned step, partitionedStep, which will launch 3 steps. Their names will be repeatedStep:<partition name>. In each partition, we also put a key named partitionIndex in the context, so we can retrieve it in the tasklet that implements the repeated step.
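The naming scheme described above can be sketched without Spring Batch as follows. This is only an illustration of how a Partitioner-style split yields uniquely named step executions; the names `partition_N` and `partitionIndex` mirror the gist, but the class here is hypothetical, not Spring's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: a Partitioner-style split creates one context per
// partition, and the launched steps are named repeatedStep:<partition name>,
// so every execution has a unique step name.
public class PartitionNaming {

    public static Map<String, Integer> partitions(int gridSize) {
        Map<String, Integer> contexts = new LinkedHashMap<>();
        for (int i = 1; i <= gridSize; i++) {
            // key = partition name, value = the partitionIndex stored in its context
            contexts.put("partition_" + i, i);
        }
        return contexts;
    }

    public static void main(String[] args) {
        partitions(3).forEach((name, idx) ->
            System.out.println("repeatedStep:" + name + " (partitionIndex=" + idx + ")"));
    }
}
```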

Then we run an example where we force a failure while processing item 2. We get these step executions:

Status is: FAILED
Step executions: 
  1: step1 
  2: partitionedStep FAILED
  4: repeatedStep:partition_1 
  5: repeatedStep:partition_2 FAILED
  3: repeatedStep:partition_3 

If we then restart this job and remove the error triggering, only the second item will be processed:

Status is: COMPLETED
Step executions: 
  6: partitionedStep 
  null: repeatedStep:partition_1 STARTING
  7: repeatedStep:partition_2 
  null: repeatedStep:partition_3 STARTING

I also added a slightly more complicated example where the repeated step is actually a flow step and where the step names are generated dynamically by hand -- this is important if you want to repeat a flow, as you'll have to give unique names to the steps inside each execution of the flow.

This can also be done without partitioning, with a looping decider. The idea here is that you have a wrapping step that repeats (allowStartIfComplete) and wraps a flow with your desired steps. These steps are created on demand thanks to the step-scoped bean factories. The reason for the seemingly redundant wrapping step is that the flow builder inside the job() bean factory needs to know the step names ahead of time to build the transition states, so we "hide" the step names, unknown at that point, inside another step. Maybe there's a way to simplify it. The executions for the first run are:

Step executions: 
  1: step1 
  2: wrappingStep 
  3: repeated-1 
  4: wrappingStep FAILED
  5: repeated-2 FAILED

(notice repeated-3 is never executed)

and on the second run:

Step executions: 
  6: wrappingStep 
  7: wrappingStep 
  8: repeated-2 
  9: wrappingStep 
  10: repeated-3
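The dynamic step-name generation that makes the `repeated-N` executions distinguishable on restart can be sketched as a simple counter. This is an illustrative helper only (the class name is an assumption), not code from the gist:

```java
// Sketch of dynamic, unique step-name generation for a repeated flow.
// Each iteration gets its own name (repeated-1, repeated-2, ...), so a
// restarted job can tell the step executions apart instead of colliding
// on one shared step name.
public class RepeatedStepNames {

    private int iteration = 0;

    public String nextStepName() {
        iteration++;
        return "repeated-" + iteration;
    }

    public static void main(String[] args) {
        RepeatedStepNames names = new RepeatedStepNames();
        System.out.println(names.nextStepName());
        System.out.println(names.nextStepName());
    }
}
```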
answered Sep 28 '22 by Artefacto



Your problem is that you start your flow with a 'next' instead of a 'start'.

I use Java config rather than XML, but got a similar exception (not particularly helpful error output) with:

@Bean
public Flow insertGbDatabaseRecordsFlow(final Step populateFpSettlementsStep, final Step populateGbDatabaseStep) {
    FlowBuilder<Flow> flowBuilder = new FlowBuilder<>("insertGbDatabaseRecordsFlow");
    flowBuilder.next(populateFpSettlementsStep);
    flowBuilder.next(populateGbDatabaseStep);
    return flowBuilder.build();
}

The fix was changing the first next to start:

@Bean
public Flow insertGbDatabaseRecordsFlow(final Step populateFpSettlementsStep, final Step populateGbDatabaseStep) {
    FlowBuilder<Flow> flowBuilder = new FlowBuilder<>("insertGbDatabaseRecordsFlow");
    flowBuilder.start(populateFpSettlementsStep);
    flowBuilder.next(populateGbDatabaseStep);
    return flowBuilder.build();
}

Presumably the same applies to Spring Batch XML config.

answered Sep 28 '22 by jorrocks
