 

Using FlatFileItemReader with a TaskExecutor (Thread Safety)

There are a lot of examples which use FlatFileItemReader along with TaskExecutor. I provide samples below (both with XML and Java Config):

  • Using Oracle Coherence with Spring Batch
  • Spring Batch Multithreading Example

I have used it myself with XML configuration for large CSV files (gigabytes in size), writing to a database with the out-of-the-box JpaItemWriter. There seem to be no issues, even without setting save-state to false or applying any other special handling.

Now, FlatFileItemReader is documented as not thread-safe.

My guess was that JpaItemWriter was "covering" the issue by persisting Sets, i.e. collections with no duplicates, as long as hashCode() and equals() cover the business key of the entity. However, even that would not be enough to prevent duplicates caused by non-thread-safe reading and processing.

Could you please clarify: is it proper/correct/safe to use the out-of-the-box FlatFileItemReader within a tasklet that has a TaskExecutor assigned, regardless of the Writer? If not, how could we explain in theory the lack of errors when a JpaItemWriter is used?

P.S.: The example links that I give above use FlatFileItemReader with a TaskExecutor without mentioning possible thread-safety issues at all...

kmandalas asked Feb 16 '17 10:02


People also ask

Is FlatFileItemReader thread safe?

A FlatFileItemReader is not thread safe because it maintains state in the form of a ResourceLineReader. Be careful to configure a FlatFileItemReader using an appropriate factory or scope so that it is not shared between threads.

Is JpaItemWriter thread safe?

From the Javadoc of class JpaItemWriter<T>: the writer is thread-safe after its properties are set (normal singleton behaviour), so it can be used to write in multiple concurrent transactions.


1 Answer

TL;DR: It is safe to use a FlatFileItemReader with a TaskExecutor provided the Writer is thread-safe (assuming that you are not concerned with restarting jobs, retrying steps, skipping, etc. at the moment).

Update: There is now a JIRA issue that officially confirms that saveState needs to be set to false (i.e. restartability disabled) if one wants to use FlatFileItemReader with a TaskExecutor in a thread-safe manner.
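As a rough sketch of what that looks like in Java config (assuming Spring Batch and Spring Core are on the classpath; the class name CsvReaderConfig, the bean name csvReader and the file input.csv are made up for illustration, and each line is passed through as a plain String to keep the example short):

```java
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.PassThroughLineMapper;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;

// Hypothetical configuration class; names and paths are placeholders.
@Configuration
public class CsvReaderConfig {

    @Bean
    public FlatFileItemReader<String> csvReader() {
        FlatFileItemReader<String> reader = new FlatFileItemReader<>();
        reader.setName("csvReader");
        // Placeholder input file; point this at the real CSV.
        reader.setResource(new FileSystemResource("input.csv"));
        // Returns each raw line unchanged instead of mapping it to a domain object.
        reader.setLineMapper(new PassThroughLineMapper());
        // Do not store the read position in the ExecutionContext.
        // This is the Java equivalent of save-state="false" in XML config.
        reader.setSaveState(false);
        return reader;
    }
}
```

The trade-off is the one named in the update: with saveState turned off, the reader keeps no record of how far it got, so a failed step cannot pick up where it stopped.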


Let's first hear it from the horse's mouth by seeing what the Spring documentation says about using multi-threaded steps with a TaskExecutor.

Spring Batch provides some implementations of ItemWriter and ItemReader. Usually they say in the Javadocs if they are thread safe or not, or what you have to do to avoid problems in a concurrent environment. If there is no information in Javadocs, you can check the implementation to see if there is any state.

Now let's address your questions:

Could you please clarify: is it proper/correct/safe to use the out-of-the-box FlatFileItemReader within a tasklet that has a TaskExecutor assigned, regardless of the Writer? If not, how could we explain in theory the lack of errors when a JpaItemWriter is used?

The statement "Regardless of the Writer" is incorrect. The Writer you use must be thread-safe. According to its Javadoc, the JpaItemWriter is thread-safe, so it can safely be used with a FlatFileItemReader that is not. Explaining how JpaItemWriter achieves thread safety would make this answer long. I recommend that you post another question if you are interested in how specific writers handle thread safety. (As the Spring Batch docs quoted above suggest, the Javadocs or the implementation of each writer are the place to check.)

P.S.: The example links that I give above use FlatFileItemReader with a TaskExecutor without mentioning possible thread-safety issues at all...

If you take a look at the Coherence example, you will see that they clearly modify CoherenceBatchWriter.java in Figure 6. They first make mapBatch a local variable so that each thread has its own copy of this Map. Moreover, if you dig further into the Coherence API, you should find that the NamedCache being returned is thread-safe.
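To illustrate the idea (this is a minimal sketch, not the article's actual code): the class name LocalBatchWriter is hypothetical, and a ConcurrentHashMap stands in for the thread-safe NamedCache so that the example runs with only the JDK.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a writer like CoherenceBatchWriter; not the original code.
final class LocalBatchWriter {

    // Assumption: plays the role of a thread-safe cache such as Coherence's NamedCache.
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    // May be called concurrently, once per chunk, by the step's worker threads.
    // Each item is a {key, value} pair.
    public void write(List<String[]> items) {
        // Local variable: every call gets its own map, so threads never
        // share this unsynchronized HashMap.
        Map<String, String> mapBatch = new HashMap<>();
        for (String[] item : items) {
            mapBatch.put(item[0], item[1]);
        }
        // The only shared state touched here is the thread-safe cache.
        cache.putAll(mapBatch);
    }

    public int size() {
        return cache.size();
    }
}
```

Had mapBatch been an instance field instead, all worker threads would put into (and presumably clear) the same HashMap at once, which is the kind of race the modification in Figure 6 avoids.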

The second link that you provide looks really dicey since the Writer does not do anything to avoid race conditions. That example is indeed an incorrect use of a multi-threaded step.


Chetan Kinger answered Sep 19 '22 21:09