Batch with Spring Boot & JPA - use in-memory datasource for batch-related tables

Context

I'm trying to develop a batch service with Spring Boot, using a JPA repository. I want to use two different datasources so that the batch-related tables are created in an in-memory database and do not pollute my business database.

Following several discussions found on the web, I came up with this configuration for my two datasources:

@Configuration
public class DataSourceConfiguration {

    @Bean(name = "mainDataSource")
    @Primary
    @ConfigurationProperties(prefix="spring.datasource")
    public DataSource mainDataSource(){
        return DataSourceBuilder.create().build();
    }

    @Bean(name = "batchDataSource")
    public DataSource batchDataSource( @Value("${batch.datasource.url}") String url ){
        return DataSourceBuilder.create().url( url ).build();
    }   
}

The first one, mainDataSource, uses the default Spring Boot datasource configuration. The batchDataSource points to an embedded HSQLDB database, in which I want the batch job and step tables to be created.

# DATASOURCE (DataSourceAutoConfiguration & DataSourceProperties)
spring.datasource.url=jdbc:mariadb://localhost:3306/batch_poc
spring.datasource.username=root
spring.datasource.password=
spring.datasource.driver-class-name=org.mariadb.jdbc.Driver
spring.datasource.max-age=10000

spring.datasource.initialize=false

# JPA (JpaBaseConfiguration, HibernateJpaAutoConfiguration)
spring.jpa.generate-ddl=false
spring.jpa.show-sql=true
spring.jpa.database=MYSQL

# SPRING BATCH (BatchDatabaseInitializer)
spring.batch.initializer.enabled=false

# ----------------------------------------
# PROJECT SPECIFIC PROPERTIES
# ----------------------------------------

# BATCH DATASOURCE
batch.datasource.url=jdbc:hsqldb:file:C:/tmp/hsqldb/batchdb

Here is my batch configuration:

@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    private static final Logger LOG = Logger.getLogger( BatchConfiguration.class );

    @Bean
    public BatchConfigurer configurer(){
        return new CustomBatchConfigurer();
    }

    @Bean
    public Job importElementsJob( JobBuilderFactory jobs, Step step1 ){
        return jobs.get("importElementsJob")
                .incrementer( new RunIdIncrementer() )
                .flow( step1 )
                .end()
                .build();               
    }

    @Bean
    public Step step1( StepBuilderFactory stepBuilderFactory, ItemReader<InputElement> reader,
            ItemWriter<List<Entity>> writer, ItemProcessor<InputElement, List<Entity>> processor ){

        return stepBuilderFactory.get("step1")
                .<InputElement, List<Entity>> chunk(100)
                .reader( reader )
                .processor( processor )
                .writer( writer )
                .build();
    }

    @Bean
    public ItemReader<InputElement> reader() throws IOException {       
        return new CustomItemReader();
    }

    @Bean
    public ItemProcessor<InputElement, List<Entity>> processor(){
        return new CustomItemProcessor();
    }

    @Bean
    public ItemWriter<List<Entity>> writer(){
        return new CustomItemWriter();
    }

}

The BatchConfigurer, which uses the in-memory database:

public class CustomBatchConfigurer extends DefaultBatchConfigurer {

    @Override
    @Autowired
    public void setDataSource( @Qualifier("batchDataSource") DataSource dataSource) {
        super.setDataSource(dataSource);
    }

}
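A detail worth knowing about DefaultBatchConfigurer (sketched below in paraphrase, not the literal Spring source): when it is handed a DataSource, it builds a plain DataSourceTransactionManager bound to that DataSource. Chunk transactions then run against the batch database only, and the JPA EntityManager tied to the main datasource is never flushed, which would explain writes that silently go nowhere:

```java
// Paraphrased sketch of DefaultBatchConfigurer's behavior, not the
// actual Spring Batch source. The point: the injected DataSource also
// decides which transaction manager drives chunk transactions.
public class DefaultBatchConfigurerSketch {

    private DataSource dataSource;
    private PlatformTransactionManager transactionManager;

    public void setDataSource(DataSource dataSource) {
        this.dataSource = dataSource;
        // A JDBC transaction manager bound to the batch DataSource:
        // the JPA EntityManager on the main datasource is not enlisted,
        // so repository.save() changes are never flushed or committed.
        this.transactionManager = new DataSourceTransactionManager(dataSource);
    }
}
```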

And finally, my writer:

public class CustomItemWriter implements ItemWriter<List<Entity>> {

    private static final Logger LOG = Logger.getLogger( CustomItemWriter.class );

    @Autowired
    private EntityRepository entityRepository;

    @Override
    public void write(List<? extends List<Entity>> items)
            throws Exception {
        if( items != null && !items.isEmpty() ){

            for( List<Entity> entities : items ){
                for( Entity entity : entities ){                        
                    Entity fromDb = entityRepository.findById( entity.getId() );

                    // Insert
                    if( fromDb == null ){
                        entityRepository.save( entity );
                    }

                    // Update
                    else {
                        // TODO : entityManager.merge()
                    }
                }
            }

        }
    }

}
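The writer's control flow (insert when findById() returns null, otherwise update) can be illustrated outside of Spring with a plain-Java sketch; a HashMap keyed by id stands in for the JPA repository, and the counters only exist to make the two branches visible:

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration of the writer's insert-or-update decision.
// A HashMap stands in for the repository; this shows the control flow
// only, not actual JPA behavior.
class InMemoryEntityStore {

    private final Map<Long, String> byId = new HashMap<>();
    int inserts = 0;
    int updates = 0;

    String findById(Long id) {
        return byId.get(id);
    }

    void write(Map<Long, String> items) {
        for (Map.Entry<Long, String> item : items.entrySet()) {
            if (findById(item.getKey()) == null) {
                inserts++;   // insert branch (repository.save in the writer)
            } else {
                updates++;   // update branch (the entityManager.merge TODO)
            }
            byId.put(item.getKey(), item.getValue());
        }
    }
}
```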

The EntityRepository interface extends JpaRepository.

Problem

When I separate the datasources this way, nothing happens when I call the repository's save method: I can see the select queries issued by findById() in the logs, but nothing for the save, and my output database is empty at the end.

When I go back to a single-datasource configuration (removing the configurer bean and letting Spring Boot manage the datasource on its own), the insert queries work fine.

Maybe the main datasource configuration is not good enough for JPA to perform the inserts correctly. But what is missing?

Asked Oct 13 '15 by Eria

1 Answer

I finally solved the problem by implementing my own BatchConfigurer, based on Spring Boot's BasicBatchConfigurer class, and forcing the use of a Map-based JobRepository and JobExplorer. There is no more custom datasource configuration: there is only one datasource, which I let Spring Boot manage, and it's easier that way.

My custom BatchConfigurer:

public class CustomBatchConfigurer implements BatchConfigurer {

    private static final Logger LOG = Logger.getLogger( CustomBatchConfigurer.class );

    private final EntityManagerFactory entityManagerFactory;

    private PlatformTransactionManager transactionManager;

    private JobRepository jobRepository;

    private JobLauncher jobLauncher;

    private JobExplorer jobExplorer;

    /**
     * Create a new {@link CustomBatchConfigurer} instance.
     * @param entityManagerFactory the entity manager factory
     */
    public CustomBatchConfigurer( EntityManagerFactory entityManagerFactory ) {
        this.entityManagerFactory = entityManagerFactory;
    }

    @Override
    public JobRepository getJobRepository() {
        return this.jobRepository;
    }

    @Override
    public PlatformTransactionManager getTransactionManager() {
        return this.transactionManager;
    }

    @Override
    public JobLauncher getJobLauncher() {
        return this.jobLauncher;
    }

    @Override
    public JobExplorer getJobExplorer() throws Exception {
        return this.jobExplorer;
    }

    @PostConstruct
    public void initialize() {
        try {
            // transactionManager:
            LOG.info("Forcing the use of a JPA transactionManager");
            if( this.entityManagerFactory == null ){
                throw new Exception("Unable to initialize batch configurer : entityManagerFactory must not be null");
            }
            this.transactionManager = new JpaTransactionManager( this.entityManagerFactory );

            // jobRepository:
            LOG.info("Forcing the use of a Map based JobRepository");
            MapJobRepositoryFactoryBean jobRepositoryFactory = new MapJobRepositoryFactoryBean( this.transactionManager );
            jobRepositoryFactory.afterPropertiesSet();
            this.jobRepository = jobRepositoryFactory.getObject();

            // jobLauncher:
            SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
            jobLauncher.setJobRepository(getJobRepository());
            jobLauncher.afterPropertiesSet();
            this.jobLauncher = jobLauncher;

            // jobExplorer:
            MapJobExplorerFactoryBean jobExplorerFactory = new MapJobExplorerFactoryBean(jobRepositoryFactory);
            jobExplorerFactory.afterPropertiesSet();
            this.jobExplorer = jobExplorerFactory.getObject();
        }
        catch (Exception ex) {
            throw new IllegalStateException("Unable to initialize Spring Batch", ex);
        }
    }

}
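With this configurer in place, the job can be launched through the standard JobLauncher.run(Job, JobParameters) API. A hedged usage sketch follows; the bean name importElementsJob matches the job defined earlier, but the runner class itself and the "time" parameter are illustrative assumptions, not part of the original answer:

```java
// Hedged usage sketch: launching the configured job manually.
// Only JobLauncher.run(Job, JobParameters) is standard Spring Batch API;
// this wrapper class and its bean wiring are assumptions for illustration.
@Component
public class ImportJobRunner {

    @Autowired
    private JobLauncher jobLauncher;

    @Autowired
    private Job importElementsJob;

    public void launch() throws Exception {
        // A unique parameter value creates a new JobInstance on each run,
        // avoiding "job instance already exists" errors on relaunch.
        JobParameters params = new JobParametersBuilder()
                .addLong("time", System.currentTimeMillis())
                .toJobParameters();
        jobLauncher.run(importElementsJob, params);
    }
}
```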

My configuration class now looks like this:

@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Bean
    public BatchConfigurer configurer( EntityManagerFactory entityManagerFactory ){
        return new CustomBatchConfigurer( entityManagerFactory );
    }

    [...]

}

And my properties file:

# DATASOURCE (DataSourceAutoConfiguration & DataSourceProperties)
spring.datasource.url=jdbc:mariadb://localhost:3306/inotr_poc
spring.datasource.username=root
spring.datasource.password=root
spring.datasource.driver-class-name=org.mariadb.jdbc.Driver
spring.datasource.max-age=10000

spring.datasource.initialize=true

# JPA (JpaBaseConfiguration, HibernateJpaAutoConfiguration)
spring.jpa.generate-ddl=false
spring.jpa.show-sql=true
spring.jpa.database=MYSQL

# SPRING BATCH (BatchDatabaseInitializer)
spring.batch.initializer.enabled=false
Answered Apr 02 '23 by Eria