
Spring Batch: Write processed records to file

This is a continuation of my previous question, as the original question was closed.

As per the accepted answer, a tasklet can be used. I have also tried implementing a custom item writer in a chunk-oriented step which uses Jackson / JsonFileItemWriter. Can we use this, or does it have any performance impact?

public void write(final List<? extends Person> persons) throws Exception {
    for (Person person : persons) {
        objectMapper.writeValue(new File("D:/cp/dataTwo.json"), person);
    }
}

Question 1 : "Is the above approach recommended ?"

Question 2 : "Can we generate file in item processor itself and use no-op item writer ?"

Can some one please help ?

asked Jul 29 '20 by ravicandy1234

People also ask

What is ItemWriter in Spring Batch?

ItemWriter is the element of a batch-process step that writes data. An ItemWriter writes one chunk of items at a time. Spring Batch provides an ItemWriter interface, and all the writers implement this interface.
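For example, a minimal custom ItemWriter (using the Spring Batch 4.x List-based signature, as in the question above) might look roughly like this; the Person type and the println output are placeholders:

import java.util.List;

import org.springframework.batch.item.ItemWriter;

public class LoggingPersonWriter implements ItemWriter<Person> {

    // Called once per chunk, receiving all items of that chunk
    @Override
    public void write(List<? extends Person> items) throws Exception {
        for (Person person : items) {
            System.out.println("Writing person: " + person); // placeholder output
        }
    }
}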

What is Stepscope in Spring Batch?

The step scope means that Spring will create the bean only when the step asks for it and that values will be resolved then (this is the lazy instantiation pattern; the bean isn't created during the Spring application context's bootstrapping).
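For instance, a step-scoped bean can late-bind a job parameter. This is only a sketch: the 'outputFile' job parameter and the Person type are assumptions, and it reuses the JsonFileItemWriter mentioned in the question:

import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.json.JacksonJsonObjectMarshaller;
import org.springframework.batch.item.json.JsonFileItemWriter;
import org.springframework.batch.item.json.builder.JsonFileItemWriterBuilder;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;

@Configuration
public class WriterConfig {

    // Created only when the step runs, so the job parameter can be resolved
    @Bean
    @StepScope
    public JsonFileItemWriter<Person> personWriter(
            @Value("#{jobParameters['outputFile']}") String outputFile) { // 'outputFile' is an assumed job parameter
        return new JsonFileItemWriterBuilder<Person>()
                .name("personWriter")
                .jsonObjectMarshaller(new JacksonJsonObjectMarshaller<>())
                .resource(new FileSystemResource(outputFile))
                .build();
    }
}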

What is partitioner in Spring Batch?

In Spring Batch, “Partitioning” means using multiple threads, each processing a range of data. For example, assume you have 100 records in a table with a primary id assigned from 1 to 100, and you want to process all 100 records. Normally, a single thread would process the records from 1 to 100 sequentially.
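A Partitioner for that example could split the id range across partitions roughly like this (the 1-100 range mirrors the example above; remainder handling is omitted):

import java.util.HashMap;
import java.util.Map;

import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

public class IdRangePartitioner implements Partitioner {

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> partitions = new HashMap<>();
        int rangeSize = 100 / gridSize; // 100 ids split across gridSize partitions
        for (int i = 0; i < gridSize; i++) {
            // Each partition gets its own min/max id in its execution context
            ExecutionContext context = new ExecutionContext();
            context.putInt("minId", i * rangeSize + 1);
            context.putInt("maxId", (i + 1) * rangeSize);
            partitions.put("partition" + i, context);
        }
        return partitions;
    }
}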

How do I share data between Tasklets?

Data passing between steps using the tasklet model: to save and fetch the data being passed, get the ExecutionContext from the ChunkContext and set the value to be passed to the following step in the execution context.
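As a rough sketch, a tasklet can reach the execution context through the ChunkContext like this; for simplicity the value is put straight into the job execution context rather than being promoted from the step execution context, and the 'recordCount' key is just an example:

import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.repeat.RepeatStatus;

public class SavingTasklet implements Tasklet {

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
        // Store a value where a later step or tasklet can read it back
        ExecutionContext jobContext = chunkContext.getStepContext()
                .getStepExecution()
                .getJobExecution()
                .getExecutionContext();
        jobContext.put("recordCount", 42); // example key and value
        return RepeatStatus.FINISHED;
    }
}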


2 Answers

Question 1: Is the above approach recommended?

  • Your case is sequential and you have to write one file per record, so you are not gaining any additional advantage from a writer that receives a chunk of records.

  • If any error happens in your writer, Spring Batch will have to retry the whole chunk and rewrite the files that already succeeded in that chunk, because Spring Batch wouldn't know which record in the chunk failed to write. I see that as a downside compared to the tasklet-based answer on the other question.

Question 2: Can we generate the file in the item processor itself and use a no-op item writer?

  • I don't see a big performance or error-handling issue here, since the work is done record by record, even if an empty no-op writer is invoked for every chunk. However, Spring Batch presumably caches the chunk before passing it to the writer so that it can retry if the writer throws a skippable exception. So even if you use a no-op writer, the chunk is still cached, though I don't know how quickly it is released, given that yours is a no-op writer.

  • I am very uneasy about this approach from a best-practices point of view: if a new developer joins, they will not think to look into your processor to understand that it is acting as the writer.

Summary

I would go with the tasklet-based approach from the other question.
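For completeness, a rough sketch of what that tasklet-based approach could look like is below; PersonService, findAll(), getId() and the output path are made-up placeholders, not code from the other question:

import java.io.File;
import java.util.List;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

public class PersonFileWritingTasklet implements Tasklet {

    private final ObjectMapper objectMapper = new ObjectMapper();
    private final PersonService personService; // hypothetical source of the processed records

    public PersonFileWritingTasklet(PersonService personService) {
        this.personService = personService;
    }

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        List<Person> persons = personService.findAll(); // hypothetical lookup
        for (Person person : persons) {
            // One file per record; a failure only affects the record currently being written
            objectMapper.writeValue(new File("D:/cp/person-" + person.getId() + ".json"), person);
        }
        return RepeatStatus.FINISHED;
    }
}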

answered Oct 23 '22 by Kavithakaran Kanapathippillai


If you look at the Spring Batch framework, a step consists of three parts, as shown here:

[diagram: ItemReader → ItemProcessor → ItemWriter]

This means input and output are kept as separate operations. So, if you plan to mix writing and processing together, it is basically a violation of that purpose and will introduce tight coupling, which might impact your performance in the long run. (Think of it as a map-reduce operation: those need mutually exclusive, clearly defined inputs and outputs.)

Now, as for the question about the recommendation: yes. If you are using Spring Batch, this is the best way to process the records: read them in chunks, then write them in chunks. Batch is usually used to process isolated tasks, so that when the time comes, the work can be executed in parallel. So, as long as you are not modifying the same file concurrently, you should be good to go with this approach.
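As a rough illustration of that separation, a Spring Batch 4-style chunk-oriented step wires the three parts together as distinct beans; the bean names and the chunk size here are arbitrary:

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class PersonStepConfig {

    // Reader, processor and writer remain separate, swappable components
    @Bean
    public Step personStep(StepBuilderFactory stepBuilderFactory,
                           ItemReader<Person> reader,
                           ItemProcessor<Person, Person> processor,
                           ItemWriter<Person> writer) {
        return stepBuilderFactory.get("personStep")
                .<Person, Person>chunk(10) // chunk size chosen arbitrarily
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .build();
    }
}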

answered Oct 23 '22 by Anand Vaidya