I read a flat file (for example a .csv file with one line per User, e.g. UserId;Data1;Date2).
But how do I handle duplicate User items in the reader (there is no list of previously read users...)?
stepBuilderFactory.get("createUserStep1")
        .<User, User>chunk(1000)
        .reader(flatFileItemReader) // FlatFileItemReader
        .writer(itemWriter)         // for example, a JDBC writer
        .build();
When a job that fails is to be recovered simply by re-running it, the tasklet model can be chosen to keep recovery simple. In the chunk model, recovery must be handled either by restoring the processed data to its state before the job ran, or by creating a job that processes only the unprocessed data.
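As an illustration, here is a minimal sketch of a tasklet-model step; the class name and the work done inside execute() are assumptions, not part of the original question:

import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

// Hypothetical tasklet: all work happens in a single execute() call,
// so a failed run can be recovered by simply re-running the job.
public class CreateUsersTasklet implements Tasklet {

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        // read the whole file and write all users here (assumed logic)
        return RepeatStatus.FINISHED;
    }
}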
Spring Batch parallel processing runs each chunk in its own thread once you add a task executor to the step. If there are a million records to process, each chunk holds 1,000 records, and the task executor exposes four threads, you can handle 4,000 records in parallel instead of 1,000.
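For example, a minimal sketch of a multi-threaded step based on the question's configuration (note that FlatFileItemReader is stateful and not thread-safe, so it would need to be synchronized or replaced for real parallel use):

stepBuilderFactory.get("createUserStep1")
        .<User, User>chunk(1000)
        .reader(flatFileItemReader)
        .writer(itemWriter)
        .taskExecutor(new SimpleAsyncTaskExecutor()) // each chunk runs in its own thread
        .throttleLimit(4)                            // at most four concurrent chunks
        .build();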
An ItemReader reads data into the Spring Batch application from a particular source, whereas an ItemWriter writes data from the Spring Batch application to a particular destination. An ItemProcessor is a class containing the processing code applied to each item after it has been read and before it is written.
An ExecutionContext is a set of key-value pairs containing information that is scoped to either a StepExecution or a JobExecution. Spring Batch persists the ExecutionContext, which helps in cases where you want to restart a batch run (e.g., when a fatal error has occurred).
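As a sketch (the key name "seen.users" and the SeenUsersStream class are assumptions for illustration), state can be saved to and restored from the ExecutionContext via the ItemStream callbacks:

import java.util.HashSet;
import java.util.Set;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemStream;
import org.springframework.batch.item.ItemStreamException;

// Hypothetical component whose state survives a restart: the set of seen user ids
// is written to the step's ExecutionContext at each commit and reloaded on restart.
public class SeenUsersStream implements ItemStream {

    private Set<String> seenUserIds = new HashSet<>();

    @Override
    @SuppressWarnings("unchecked")
    public void open(ExecutionContext executionContext) throws ItemStreamException {
        if (executionContext.containsKey("seen.users")) {
            seenUserIds = (Set<String>) executionContext.get("seen.users"); // restore on restart
        }
    }

    @Override
    public void update(ExecutionContext executionContext) throws ItemStreamException {
        executionContext.put("seen.users", new HashSet<>(seenUserIds)); // persisted at each chunk commit
    }

    @Override
    public void close() throws ItemStreamException {
    }
}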
Filtering is typically done with an ItemProcessor. If the ItemProcessor returns null, the item is filtered and not passed to the ItemWriter. Otherwise, it is. In your case, you could keep a list of previously seen users in the ItemProcessor. If the user hasn't been seen before, pass it on. If it has been seen before, return null. You can read more about filtering with an ItemProcessor in the documentation here: http://docs.spring.io/spring-batch/trunk/reference/html/readersAndWriters.html#filiteringRecords
import java.util.HashSet;
import java.util.Set;
import org.springframework.batch.item.ItemProcessor;

/**
 * This implementation assumes that there is enough room in memory to store the duplicate
 * Users. Otherwise, you'd want to store them somewhere you can do a look-up on.
 */
public class UserFilterItemProcessor implements ItemProcessor<User, User> {

    // This assumes that User.equals() and User.hashCode() identify the duplicates
    private Set<User> seenUsers = new HashSet<User>();

    public User process(User user) {
        // Returning null filters the item: it is not passed on to the ItemWriter
        if (seenUsers.contains(user)) {
            return null;
        }
        seenUsers.add(user);
        return user;
    }
}
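To apply it, the processor just needs to be wired into the step from the question, for example:

stepBuilderFactory.get("createUserStep1")
        .<User, User>chunk(1000)
        .reader(flatFileItemReader)
        .processor(new UserFilterItemProcessor())
        .writer(itemWriter)
        .build();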