Django Multiple Databases Fallback to Master if Slave is down

I have a master-slave replication setup for the MySQL backend of my Django application. Currently I read and write on the master only, but my dashboards are quite query-intensive. I was looking for a way to define the DATABASES like the following:

DATABASES = {
    'default_slave': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'application',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '3306',
    },
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'application',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '3306',
    },
}

For dashboards, reports and various other apps, what I want to do is:

Try the 'default_slave' connection first: use 'default_slave' if it is reachable, otherwise use 'default'.

That is, if the slave is up, fetch the reports from the slave database itself; if not, fetch them from the master database.

The catch is that the slave can be up or down, and I want the database used for fetching reports to be selected dynamically, based on reachability.

Is this possible? Can I test the connection beforehand and proceed accordingly?

With this I would write and run syncdb against the master, and always read from the slave, if the slave is up.

I need some solution/hint for raw queries as well as ORM queries.

The router concept seems nice, but I don't know whether a fallback is possible when the slave is not reachable.

UPDATE

How to go about MULTI-DATABASE

DATABASES

DATABASES = {
    'default_slave': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'application',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '3306',
    },
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'application',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '3306',
    },
    'linux': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'application',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '3306',
    },
    'linux_slave': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'application',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '3306',
    },
    'mac': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'application',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '3306',
    },
    'mac_slave': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'application',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '3306',
    },
    'pc': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'application',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '3306',
    },
    'pc_slave': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'application',
        'USER': 'root',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '3306',
    },
}
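Since these definitions are near-identical and the naming convention is regular (`<alias>`, `<alias>_slave_<n>`), one option is to generate the DATABASES dict instead of spelling out every entry. A sketch, assuming every node shares the same credentials; `build_databases` and its hosts mapping are illustrative, not part of the original setup:

```python
def build_databases(nodes):
    """Build a Django DATABASES dict from a mapping of alias -> list of hosts.

    hosts[0] is the master for that alias; the remaining hosts become
    <alias>_slave_0, <alias>_slave_1, ... (identical credentials per node
    are an assumption).
    """
    base = {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'application',
        'USER': 'root',
        'PASSWORD': '',
        'PORT': '3306',
    }
    databases = {}
    for alias, hosts in nodes.items():
        databases[alias] = dict(base, HOST=hosts[0])
        for n, host in enumerate(hosts[1:]):
            databases['%s_slave_%d' % (alias, n)] = dict(base, HOST=host)
    return databases
```

This keeps the 20-plus raw databases and their slaves in one small mapping, so adding a node or a slave is a one-line change.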

Now, I have 1. static data and 2. dynamic data.

Static data is stored in 'default', which is replicated to 'default_slave'.

For dynamic data, the query first needs to determine where the dynamic data lives: in 'mac', in 'pc' or in 'linux'.

To achieve that, I added a field 'query_on' to the static table, which contains one of ['mac', 'linux', 'pc'].

Now, while querying, I simply write:

static = Static.objects.get(pk=1)
query_on = static.query_on
dynamic = Dynamic.objects.using(query_on).get(static=static)

This works well; the query is routed to the database where it needs to be executed. Here I need to decide:

  1. If the <'query_on'>_slave connection is up: use <'query_on'>_slave, OR
  2. If the <'query_on'>_slave connection is down: use <'query_on'>

How do I go about that?
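A sketch of how that decision could work (the `<alias>_slave` naming convention and the timeout are assumptions; the DATABASES mapping is taken as a parameter so the helper is easy to test in isolation):

```python
import socket


def choose_alias(databases, query_on, timeout=2):
    """Return '<query_on>_slave' if that alias exists and its host accepts
    TCP connections; otherwise fall back to the master alias '<query_on>'.

    `databases` is a DATABASES-style dict (e.g. settings.DATABASES).
    """
    slave_alias = query_on + '_slave'
    db = databases.get(slave_alias)
    if db is None:
        return query_on
    try:
        # cheap TCP-level reachability probe against the slave
        socket.create_connection((db['HOST'], int(db['PORT'])), timeout).close()
        return slave_alias
    except OSError:
        # refused, timed out, unreachable: fall back to the master
        return query_on
```

Usage would then be along the lines of `dynamic = Dynamic.objects.using(choose_alias(settings.DATABASES, static.query_on)).get(static=static)`.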

Further details for the application:

  1. There is one database, default (the configuration & analytics database), for maintaining the configuration data and the report-analytics data
  2. There are 20 databases (raw databases), for example mac, linux, rhel, windows, pc ... (example names), for collecting the raw data, which is not processed for analytics
  3. Each database has one or more slaves; the naming convention is default_slave_0, default_slave_1, default_slave_2, and likewise for the other databases

Now the analytics data needs to be queried every 5 minutes, 30 minutes, 1 hour ..., and each query needs to be sent to a specific database, because not every database carries the particular dataset required for analytics.

To do that, we need to:

  1. get the configuration data from default or any one of its slaves (the slave part is the question)
  2. once we have the configuration, we can easily look up where the "raw" data might be
  3. query for the raw data, gather the results, analyse them, and store the outcome in the "default" database

Now all the raw databases and the one default database require "sync", as we maintain the same database structure throughout all the nodes.

Now, since we are seeing CPU spikes on all the databases, it makes sense to use the "slave" databases to query for the "raw" data.

Hence the requirement of using slaves. I am not able to imagine how routers would be of help here.

asked Oct 28 '14 by Joddy


1 Answer

You are on the right track with using a router. I'm assuming the fact that your two db definitions are identical is just a typo.

(FYI, I'm going to refer to the database hierarchy using the more sensitive master->follower)

In your db_for_read() functions, you can check for connectivity to your follower. This might incur a bit more overhead, but that's the cost of having auto-failover for a database. An example database definition would be:

DATABASES = {
    'follower': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'follower',
        'USER': 'root',
        'HOST': '54.34.65.24',
        'PORT': '3306',
    },
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'application',
        'USER': 'root',
        'HOST': '54.34.65.23',
        'PORT': '3306',
    },
}

You can test the connection with a quick try/except like this example. A router using this that does what you need would look like:

from django.conf import settings
import socket


def test_connection_to_db(database_name):
    try:
        db_definition = settings.DATABASES[database_name]
        s = socket.create_connection((db_definition['HOST'], db_definition['PORT']), 5)
        s.close()
        return True
    except (KeyError, socket.error, socket.timeout):
        # unknown alias, refused connection, or timeout
        return False


class FailoverRouter(object):
    """A router that defaults reads to the follower but provides a failover back to the default"""

    def db_for_read(self, model, **hints):
        if test_connection_to_db('follower'):
            return 'follower'
        return 'default'

    def db_for_write(self, model, **hints):
        "Point all writes to the default db"
        return 'default'

    def allow_syncdb(self, db, model):
        "Make sure only the default db allows syncdb"
        return db == 'default'

This will still run syncdb against master like you want. Also, you could make the logic for both db_for_read() and db_for_write() more complicated (like picking the follower db only for certain models that are queried for your reports).

I don't know what overhead test_connection_to_db() will add to every read, since that will depend on the MySQL server and the timeout. Perhaps a better architecture is to cache these reports using memcached, or just work out the issues with the slave ever going down and update your database definitions in settings first.
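If the per-read probe turns out to be too expensive, one mitigation is to cache the reachability result for a short TTL instead of probing on every query. A sketch (the TTL and the module-level cache are illustrative; in a multi-process deployment you would likely store the flag in memcached so all workers share it):

```python
import socket
import time

# module-level cache: timestamp of the last probe and its result
_cache = {'checked': 0.0, 'alive': False}


def follower_is_alive(host, port, ttl=30, timeout=2):
    """Probe the follower at most once per `ttl` seconds.

    Reads between probes reuse the cached result, so the steady-state
    overhead per query is a dict lookup rather than a TCP connect.
    """
    now = time.time()
    if now - _cache['checked'] > ttl:
        try:
            socket.create_connection((host, int(port)), timeout).close()
            _cache['alive'] = True
        except OSError:
            _cache['alive'] = False
        _cache['checked'] = now
    return _cache['alive']
```

db_for_read() would then call `follower_is_alive(*follower_host_port)` instead of probing directly; the trade-off is that failover can lag behind an outage by up to `ttl` seconds.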

answered Sep 26 '22 by krimkus