Context
I'm trying to develop a batch service with Spring Boot, using JPA Repository. Using two different datasources, I want the batch-related tables created in a in-memory database, so that it does not pollute my business database.
Following multiple topics on the web, I came up with this configuration of my two datasources :
@Configuration
public class DataSourceConfiguration {
@Bean(name = "mainDataSource")
@Primary
@ConfigurationProperties(prefix="spring.datasource")
public DataSource mainDataSource(){
return DataSourceBuilder.create().build();
}
@Bean(name = "batchDataSource")
public DataSource batchDataSource( @Value("${batch.datasource.url}") String url ){
return DataSourceBuilder.create().url( url ).build();
}
}
The first one, mainDataSource
, uses the default Spring database configuration. The batchDataSource
defines an embedded HSQL database, in which I want the batch and step tables to be created.
# DATASOURCE (DataSourceAutoConfiguration & DataSourceProperties)
spring.datasource.url=jdbc:mariadb://localhost:3306/batch_poc
spring.datasource.username=root
spring.datasource.password=
spring.datasource.driver-class-name=org.mariadb.jdbc.Driver
spring.datasource.max-age=10000
spring.datasource.initialize=false
# JPA (JpaBaseConfiguration, HibernateJpaAutoConfiguration)
spring.jpa.generate-ddl=false
spring.jpa.show-sql=true
spring.jpa.database=MYSQL
# SPRING BATCH (BatchDatabaseInitializer)
spring.batch.initializer.enabled=false
# ----------------------------------------
# PROJECT SPECIFIC PROPERTIES
# ----------------------------------------
# BATCH DATASOURCE
batch.datasource.url=jdbc:hsqldb:file:C:/tmp/hsqldb/batchdb
Here is my batch config :
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {
private static final Logger LOG = Logger.getLogger( BatchConfiguration.class );
@Bean
public BatchConfigurer configurer(){
return new CustomBatchConfigurer();
}
@Bean
public Job importElementsJob( JobBuilderFactory jobs, Step step1 ){
return jobs.get("importElementsJob")
.incrementer( new RunIdIncrementer() )
.flow( step1 )
.end()
.build();
}
@Bean
public Step step1( StepBuilderFactory stepBuilderFactory, ItemReader<InputElement> reader,
ItemWriter<List<Entity>> writer, ItemProcessor<InputElement, List<Entity>> processor ){
return stepBuilderFactory.get("step1")
.<InputElement, List<Entity>> chunk(100)
.reader( reader )
.processor( processor )
.writer( writer )
.build();
}
@Bean
public ItemReader<InputElement> reader() throws IOException {
return new CustomItemReader();
}
@Bean
public ItemProcessor<InputElement, List<Entity>> processor(){
return new CutsomItemProcessor();
}
@Bean
public ItemWriter<List<Entity>> writer(){
return new CustomItemWriter();
}
}
The BatchConfigurer, using the in-memory database :
public class CustomBatchConfigurer extends DefaultBatchConfigurer {
@Override
@Autowired
public void setDataSource( @Qualifier("batchDataSource") DataSource dataSource) {
super.setDataSource(dataSource);
}
}
And, finally, my writer :
public class CustomItemWriter implements ItemWriter<List<Entity>> {
private static final Logger LOG = Logger.getLogger( EntityWriter.class );
@Autowired
private EntityRepository entityRepository;
@Override
public void write(List<? extends List<Entity>> items)
throws Exception {
if( items != null && !items.isEmpty() ){
for( List<Entity> entities : items ){
for( Entity entity : entities ){
Entity fromDb = entityRepository.findById( entity.getId() );
// Insert
if( fromDb == null ){
entityRepository.save( entity );
}
// Update
else {
// TODO : entityManager.merge()
}
}
}
}
}
}
The EntityRepository interface extends JpaRepository.
Problem
When I separate the datasources this way, nothing happens when I call the save method of the repository. I see the select queries from the call of findById() in the logs. But nothing for the save. And my output database is empty at the end.
When I come back to a unique datasource configuration (removing the configurer bean and letting Spring Boot manage the datasource alone), the insert queries work fine.
Maybe the main datasource configuration is not good enough for JPA to perform the inserts correctly. But what is missing ?
That's where a framework like Spring Batch can be very handy. Spring Boot Batch provides reusable functions that are essential in processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management.
In this article, I am going to demonstrate batch processing using one of the projects of Spring which is Spring Batch. Spring Batch provides functions for processing large volumes of data in batch jobs.
Some knowledge in Spring Boot. On your browser, navigate to spring intializr. Set the project name to springbatch. Add lombok, spring web, h2 database, spring data jpa, and spring batch as the project dependencies. Click on generate to download the generated project zip file. Decompress the downloaded file and open it on your preferred IDE.
We need to include spring-boot-starter-batch dependency. Spring batch relies on job repository which is persistent data store. So we need one DB as well. I am using H2 (in-memory database) which integrates well with spring batch.
I finally solved the problem implementing my own BatchConfigurer, on the base of the Spring class BasicBatchConfigurer, and forcing the use of Map based jobRepository and jobExplorer. No more custom datasource configuration, only one datasource which I let Spring Boot manage : it's easier that way.
My custom BatchConfigurer :
public class CustomBatchConfigurer implements BatchConfigurer {
private static final Logger LOG = Logger.getLogger( CustomBatchConfigurer.class );
private final EntityManagerFactory entityManagerFactory;
private PlatformTransactionManager transactionManager;
private JobRepository jobRepository;
private JobLauncher jobLauncher;
private JobExplorer jobExplorer;
/**
* Create a new {@link CustomBatchConfigurer} instance.
* @param entityManagerFactory the entity manager factory
*/
public CustomBatchConfigurer( EntityManagerFactory entityManagerFactory ) {
this.entityManagerFactory = entityManagerFactory;
}
@Override
public JobRepository getJobRepository() {
return this.jobRepository;
}
@Override
public PlatformTransactionManager getTransactionManager() {
return this.transactionManager;
}
@Override
public JobLauncher getJobLauncher() {
return this.jobLauncher;
}
@Override
public JobExplorer getJobExplorer() throws Exception {
return this.jobExplorer;
}
@PostConstruct
public void initialize() {
try {
// transactionManager:
LOG.info("Forcing the use of a JPA transactionManager");
if( this.entityManagerFactory == null ){
throw new Exception("Unable to initialize batch configurer : entityManagerFactory must not be null");
}
this.transactionManager = new JpaTransactionManager( this.entityManagerFactory );
// jobRepository:
LOG.info("Forcing the use of a Map based JobRepository");
MapJobRepositoryFactoryBean jobRepositoryFactory = new MapJobRepositoryFactoryBean( this.transactionManager );
jobRepositoryFactory.afterPropertiesSet();
this.jobRepository = jobRepositoryFactory.getObject();
// jobLauncher:
SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
jobLauncher.setJobRepository(getJobRepository());
jobLauncher.afterPropertiesSet();
this.jobLauncher = jobLauncher;
// jobExplorer:
MapJobExplorerFactoryBean jobExplorerFactory = new MapJobExplorerFactoryBean(jobRepositoryFactory);
jobExplorerFactory.afterPropertiesSet();
this.jobExplorer = jobExplorerFactory.getObject();
}
catch (Exception ex) {
throw new IllegalStateException("Unable to initialize Spring Batch", ex);
}
}
}
My configuration class looks like this now :
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {
@Bean
public BatchConfigurer configurer( EntityManagerFactory entityManagerFactory ){
return new CustomBatchConfigurer( entityManagerFactory );
}
[...]
}
And my properties files :
# DATASOURCE (DataSourceAutoConfiguration & DataSourceProperties)
spring.datasource.url=jdbc:mariadb://localhost:3306/inotr_poc
spring.datasource.username=root
spring.datasource.password=root
spring.datasource.driver-class-name=org.mariadb.jdbc.Driver
spring.datasource.max-age=10000
spring.datasource.initialize=true
# JPA (JpaBaseConfiguration, HibernateJpaAutoConfiguration)
spring.jpa.generate-ddl=false
spring.jpa.show-sql=true
spring.jpa.database=MYSQL
# SPRING BATCH (BatchDatabaseInitializer)
spring.batch.initializer.enabled=false
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With