
Only half of the MongoDB database is being processed in Spring batch

I have a Spring Boot batch job that reads from a MongoDB database to feed a MySQL database. Only about half of my database is processed by the program, yet there are only around 200 errors in my logs.

The BATCH_STEP_EXECUTION table tells me the process went well (status COMPLETED) and shows a READ_COUNT of 5692, although I have 11800 documents in the database.

Did I forget something in the configuration that prevents it from going through the entire database?

Here is my configuration class:

@Configuration
@EnableBatchProcessing
@Import(PersistenceConfig.class)
public class BatchConfiguration {
    @Autowired
    MongoTemplate mongoTemplate;

    @Autowired
    SessionFactory sessionFactory;

    @Bean
    @StepScope
    public ItemReader<CourseData> reader() {
        MongoItemReader<CourseData> mongoItemReader = new MongoItemReader<>();
        mongoItemReader.setTemplate(mongoTemplate);
        mongoItemReader.setCollection("foo");
        mongoItemReader.setQuery("{}");
        mongoItemReader.setTargetType(CourseData.class);
        Map<String, Sort.Direction> sort = new HashMap<>();
        sort.put("_id", Sort.Direction.ASC);
        mongoItemReader.setSort(sort);

        return mongoItemReader;
    }

    @Bean
    public ItemProcessor<CourseData, MatrixOne> processor() {
        return new CourseDataMatrixOneProcessor();
    }

    @Bean
    public ItemWriter<MatrixOne> writer() {
        HibernateItemWriter writer = new HibernateItemWriter();
        writer.setSessionFactory(sessionFactory);
        System.out.println("writing stuff");
        return writer;
    }

    @Bean
    public Job importUserJob(JobBuilderFactory jobs, Step s1) {
        return jobs.get("importRawCourseJob")
                .incrementer(new RunIdIncrementer())
                .flow(s1)
                .end()
                .build();
    }

    @Bean
    @Transactional
    public Step step1(StepBuilderFactory stepBuilderFactory, ItemReader<CourseData> reader, ItemWriter<MatrixOne> writer, ItemProcessor<CourseData, MatrixOne> processor) {
        return stepBuilderFactory.get("step1")
                .<CourseData, MatrixOne>chunk(10)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .build();
    }
}
asked Apr 07 '16 by Labe


1 Answer

OK, so I solved it today by returning an empty POJO instead of null from my converter when something is wrong with the data. Then I just skip it in the processor.

It is kind of strange that it doesn't stop at the first null encountered, though. Maybe some parallelisation of the chunk elements made me misread the logs.
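A minimal sketch of that skip pattern, with the likely explanation for the early stop: in Spring Batch, a reader that returns null signals end of input (which would end the step after 5692 items), whereas a processor that returns null merely filters that item out of the chunk. The `ItemProcessor` interface below is a stand-in for `org.springframework.batch.item.ItemProcessor`, and the `CourseData`/`MatrixOne` records and `convert` helper are simplified, hypothetical versions of the question's classes:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkipPatternSketch {
    // Stand-in for org.springframework.batch.item.ItemProcessor<I, O>.
    interface ItemProcessor<I, O> { O process(I item) throws Exception; }

    // Simplified domain classes (hypothetical; the real ones come from the question).
    record CourseData(String title) {
        boolean isEmpty() { return title == null || title.isBlank(); }
    }
    record MatrixOne(String title) {}

    // Converter returns an empty POJO instead of null for bad data, so the
    // reader never emits null (null from a reader would end the step early).
    static CourseData convert(String raw) {
        return raw == null ? new CourseData("") : new CourseData(raw);
    }

    // Processor skips empty POJOs by returning null: a null from the
    // processor filters the item out of the chunk, it does NOT end the step.
    static final ItemProcessor<CourseData, MatrixOne> processor =
            item -> item.isEmpty() ? null : new MatrixOne(item.title());

    public static void main(String[] args) throws Exception {
        List<MatrixOne> written = new ArrayList<>();
        for (String raw : Arrays.asList("algebra", null, "geometry")) {
            MatrixOne out = processor.process(convert(raw));
            if (out != null) written.add(out); // the writer only sees non-null items
        }
        System.out.println(written.size()); // 2: the bad record was skipped, not fatal
    }
}
```

With the real `MongoItemReader`, every document still counts toward READ_COUNT; only the filtered items disappear between read and write, which is the behaviour the answer relies on.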

answered Nov 08 '22 by Labe