Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to setup Hibernate to read/write to different datasources?

Tags:

Using Spring and Hibernate, I want to write to one MySQL master database, and read from one more more replicated slaves in cloud-based Java webapp.

I can't find a solution that is transparent to the application code. I don't really want to have to change my DAOs to manage different SessionFactories, as that seems really messy and couples the code with a specific server architecture.

Is there any way of telling Hibernate to automatically route CREATE/UPDATE queries to one datasource, and SELECT to another? I don't want to do any sharding or anything based on object type - just route different types of queries to different datasources.

like image 343
Deejay Avatar asked Dec 08 '10 09:12

Deejay


2 Answers

An example can be found here: https://github.com/afedulov/routing-data-source.

enter image description here

Spring provides a variation of DataSource, called AbstractRoutingDatasource. It can be used in place of standard DataSource implementations and enables a mechanism to determine which concrete DataSource to use for each operation at runtime. All you need to do is to extend it and to provide an implementation of an abstract determineCurrentLookupKey method. This is the place to implement your custom logic to determine the concrete DataSource. Returned Object serves as a lookup key. It is typically a String or en Enum, used as a qualifier in Spring configuration (details will follow).

package website.fedulov.routing.RoutingDataSource

import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;

public class RoutingDataSource extends AbstractRoutingDataSource {
    @Override
    protected Object determineCurrentLookupKey() {
        return DbContextHolder.getDbType();
    }
}

You might be wondering what is that DbContextHolder object and how does it know which DataSource identifier to return? Keep in mind that determineCurrentLookupKey method will be called whenever TransactionsManager requests a connection. It is important to remember that each transaction is "associated" with a separate thread. More precisely, TransactionsManager binds Connection to the current thread. Therefore in order to dispatch different transactions to different target DataSources we have to make sure that every thread can reliably identify which DataSource is destined for it to be used. This makes it natural to utilize ThreadLocal variables for binding specific DataSource to a Thread and hence to a Transaction. This is how it is done:

public enum DbType {
   MASTER,
   REPLICA1,
}

public class DbContextHolder {

   private static final ThreadLocal<DbType> contextHolder = new ThreadLocal<DbType>();

   public static void setDbType(DbType dbType) {
       if(dbType == null){
           throw new NullPointerException();
       }
      contextHolder.set(dbType);
   }

   public static DbType getDbType() {
      return (DbType) contextHolder.get();
   }

   public static void clearDbType() {
      contextHolder.remove();
   }
}

As you see, you can also use an enum as the key and Spring will take care of resolving it correctly based on the name. Associated DataSource configuration and keys might look like this:

  ....
<bean id="dataSource" class="website.fedulov.routing.RoutingDataSource">
 <property name="targetDataSources">
   <map key-type="com.sabienzia.routing.DbType">
     <entry key="MASTER" value-ref="dataSourceMaster"/>
     <entry key="REPLICA1" value-ref="dataSourceReplica"/>
   </map>
 </property>
 <property name="defaultTargetDataSource" ref="dataSourceMaster"/>
</bean>

<bean id="dataSourceMaster" class="org.apache.commons.dbcp.BasicDataSource">
  <property name="driverClassName" value="com.mysql.jdbc.Driver"/>
  <property name="url" value="${db.master.url}"/>
  <property name="username" value="${db.username}"/>
  <property name="password" value="${db.password}"/>
</bean>
<bean id="dataSourceReplica" class="org.apache.commons.dbcp.BasicDataSource">
  <property name="driverClassName" value="com.mysql.jdbc.Driver"/>
  <property name="url" value="${db.replica.url}"/>
  <property name="username" value="${db.username}"/>
  <property name="password" value="${db.password}"/>
</bean>

At this point you might find yourself doing something like this:

@Service
public class BookService {

  private final BookRepository bookRepository;
  private final Mapper               mapper;

  @Inject
  public BookService(BookRepository bookRepository, Mapper mapper) {
    this.bookRepository = bookRepository;
    this.mapper = mapper;
  }

  @Transactional(readOnly = true)
  public Page<BookDTO> getBooks(Pageable p) {
    DbContextHolder.setDbType(DbType.REPLICA1);   // <----- set ThreadLocal DataSource lookup key
                                                  // all connection from here will go to REPLICA1
    Page<Book> booksPage = callActionRepo.findAll(p);
    List<BookDTO> pContent = CollectionMapper.map(mapper, callActionsPage.getContent(), BookDTO.class);
    DbContextHolder.clearDbType();               // <----- clear ThreadLocal setting
    return new PageImpl<BookDTO>(pContent, p, callActionsPage.getTotalElements());
  }

  ...//other methods

Now we can control which DataSource will be used and forward requests as we please. Looks good!

...Or does it? First of all, those static method calls to a magical DbContextHolder really stick out. They look like they do not belong the business logic. And they don't. Not only do they not communicate the purpose, but they seem fragile and error-prone (how about forgetting to clean the dbType). And what if an exception is thrown between the setDbType and cleanDbType? We cannot just ignore it. We need to be absolutely sure that we reset the dbType, otherwise Thread returned to the ThreadPool might be in a "broken" state, trying to write to a replica in the next call. So we need this:

  @Transactional(readOnly = true)
  public Page<BookDTO> getBooks(Pageable p) {
    try{
      DbContextHolder.setDbType(DbType.REPLICA1);   // <----- set ThreadLocal DataSource lookup key
                                                    // all connection from here will go to REPLICA1
      Page<Book> booksPage = callActionRepo.findAll(p);
      List<BookDTO> pContent = CollectionMapper.map(mapper, callActionsPage.getContent(), BookDTO.class);
       DbContextHolder.clearDbType();               // <----- clear ThreadLocal setting
    } catch (Exception e){
      throw new RuntimeException(e);
    } finally {
       DbContextHolder.clearDbType();               // <----- make sure ThreadLocal setting is cleared         
    }
    return new PageImpl<BookDTO>(pContent, p, callActionsPage.getTotalElements());
  }

Yikes >_< ! This definitely does not look like something I would like to put into every read only method. Can we do better? Of course! This pattern of "do something at the beginning of a method, then do something at the end" should ring a bell. Aspects to the rescue!

Unfortunately this post has already gotten too long to cover the topic of custom aspects. You can follow up on the details of using aspects using this link.

like image 193
Alex Fedulov Avatar answered Oct 01 '22 22:10

Alex Fedulov


I don't think that deciding that SELECTs should go to one DB (one slave) and CREATE/UPDATES should go to a different one (master) is a very good decision. The reasons are:

  • replication is not instantaneous, so you could CREATE something in the master DB and, as part of the same operation, SELECT it from the slave and notice that the data hasn't yet reached the slave.
  • if one of the slaves is down, you shouldn't be prevented from writing data in the master, because as soon as the slave is back up, its state will be synchronized with master. In your case though, your write operations are dependent on both master and slave.
  • How would you then define transactionality if you're in fact using 2 dbs?

I would advise using the master DB for all the WRITE flows, with all the instructions they might require (whether they are SELECTs, UPDATE or INSERTS). Then, the application dealing with the read-only flows can read from the slave DB.

I'd also advise having separate DAOs, each with its own methods, so that you'll have a clear distinction between read-only flows and write/update flows.

like image 25
octav Avatar answered Oct 01 '22 23:10

octav