I've created a Spring batch application using Spring boot, and I have a Job
with 9 steps. These steps are using a DataSource
which I created its bean in a configuration file as follows:
@Configuration
public class DatabaseConfig {
@ConfigurationProperties(prefix = "spring.datasource")
@Bean
@Primary
public DataSource dataSource(){
return DataSourceBuilder.create().build();
}
}
This DataSource
is using properties declared in the application.yml
file:
spring:
datasource:
url: jdbc:mysql://localhost:3306/db_01?zeroDateTimeBehavior=convertToNull
username: xxxx
password: ****
So far, all works as expected.
What I want to do, is that I have 4 databases parameterized in a 5th database (db_settings), which I select using an SQL query. This query will return the 4 databases with their usernames and passwords as follows:
+--------+-----------------------------------+-----------------+-----------------+
| id | url | username_db | password_db |
+--------+-----------------------------------+-----------------+-----------------+
| 243 | jdbc:mysql://localhost:3306/db_01 | xxxx | **** |
| 244 | jdbc:mysql://localhost:3306/db_02 | xxxx | **** |
| 245 | jdbc:mysql://localhost:3306/db_03 | xxxx | **** |
| 247 | jdbc:mysql://localhost:3306/db_04 | xxxx | **** |
+--------+-----------------------------------+-----------------+-----------------+
So instead of running the steps using the database declared in 'application.yml', I want to run them on all the 4 databases. And considering the volume processed, it is necessary to be able to launch the batch processing on these databases in parallel.
Does anyone know how to implement this?
Spring Batch Parallel Processing is each chunk in its own thread by adding a task executor to the step. If there are a million records to process and each chunk is 1000 records, and the task executor exposes four threads, you can handle 4000 records in parallel instead of 1000 records.
Spring boot allows you to connect to multiple databases by configuring multiple data sources in a single spring boot application using hibernate and JPA. Spring boot enables repositories to connect to multiple databases using JPA from a single application.
Where is the bounty? :-)
Thanks KeatsPeeks, AbstractRoutingDataSource
is a good starter for the solution, and here is a good tutorial on this part.
Mainly the important parts are:
public class MyRoutingDataSource extends AbstractRoutingDataSource {
@Override
protected Object determineCurrentLookupKey() {
String language = LocaleContextHolder.getLocale().getLanguage();
System.out.println("Language obtained: "+ language);
return language;
}
}
register the multiple datasource
<bean id="abstractDataSource" class="org.apache.commons.dbcp.BasicDataSource"
destroy-method="close"
p:driverClassName="${jdbc.driverClassName}"
p:username="${jdbc.username}"
p:password="${jdbc.password}" />
<bean id="concreteDataSourceOne"
parent="abstractDataSource"
p:url="${jdbc.databaseurlOne}"/>
<bean id="concreteDataSourceTwo"
parent="abstractDataSource"
p:url="${jdbc.databaseurlTwo}"/>
So after that, the problem is become to:
How to load datasource config properties when spring startup and config the corresponding dataSource
using the config properties in database.
How to use multiple dataSource
in spring batch
Actually when I try to google it, seems this is a most common case, google give the suggestion search words - "spring batch multiple data sources", there are a lots articles, so I choose the answer in
How to define the lookup code based on the spring batch jobs(steps)
Typically this should be a business point, You need define the lookup strategy and can be injected to the com.example.demo.datasource.CustomRoutingDataSource#determineCurrentLookupKey
to routing to the dedicated data source.
The really interesting is actually it is supports the multiple dataSource
, but the db settings cannot store in the DB indeed. The reason is it will get the cycle dependencies issue:
The dependencies of some of the beans in the application context form a cycle:
batchConfiguration (field private org.springframework.batch.core.configuration.annotation.JobBuilderFactory com.example.demo.batch.BatchConfiguration.jobs)
↓
org.springframework.batch.core.configuration.annotation.SimpleBatchConfiguration (field private java.util.Collection org.springframework.batch.core.configuration.annotation.AbstractBatchConfiguration.dataSources)
┌─────┐
| routingDataSource defined in class path resource [com/example/demo/datasource/DataSourceConfiguration.class]
↑ ↓
| targetDataSources defined in class path resource [com/example/demo/datasource/DataSourceConfiguration.class]
↑ ↓
| myBatchConfigurer (field private java.util.Collection org.springframework.batch.core.configuration.annotation.AbstractBatchConfiguration.dataSources)
└─────┘
So obviously the solution is break the dependency between dataSource
and routingDataSource
dataSource
https://scattercode.co.uk/2013/11/18/spring-data-multiple-databases/ https://numberformat.wordpress.com/2013/12/27/hello-world-with-spring-batch-3-0-x-with-pure-annotations/
http://spring.io/guides/gs/batch-processing/
How to java-configure separate datasources for spring batch data and business data? Should I even do it?
Github to get the codes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With