Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

django multi db routing doesnt work with multiple schemas

I have django running on oracle backend. I need to use two schemas - one for a legacy DB, second one for all django related tables.

So this is my settings.DATABASES:

APPS_DB = 'apps'
DATABASES = {

    'default' : { 
        'ENGINE': 'django.db.backends.oracle'
        'NAME': 'django',                      
        'USER': 'django-tables',                      
        'PASSWORD': '****',                  
        'HOST': 'localhost',                       
        'PORT': '1531',                     
    },

    APPS_DB : { 
        'ENGINE': 'django.db.backends.oracle', 
        'NAME': 'django',                      
        'USER': 'legacy-stuff',                      
        'PASSWORD': '****',                 
        'HOST': 'localhost',                     
        'PORT': '1531',                      
     },
}

I also defined router:

class MyRouter(object):
    """A router to control all database operations on models"""

def __init__(self):
    aux = []
    for app in settings.INSTALLED_APPS:
        if not app.endswith('myapp'):
            aux.append(app)
    self.djangoStuff = tuple(map(lambda x: x[x.rfind('.')+1:], aux))

def is_django_stuff(self, model):
    return model._meta.app_label in self.djangoStuff

def db_for_read(self, model, **hints):
    "Point all django apps models to separate DB"
    logger.info("READ from " + model._meta.app_label)
    if self.is_django_stuff(model):
        logger.info("Will be directed to default DB")
        return None
    logger.info("Will be directed to legacy DB")    
    return settings.APPS_DB

def db_for_write(self, model, **hints):
    "Point all django apps models to separate DB"
    logger.info("WRITE")
    if self.is_django_stuff(model):
        return None
    return settings.APPS_DB

def allow_relation(self, obj1, obj2, **hints):
    "Allow any relation"
    logger.info("ALLOW REL")
    return True

def allow_syncdb(self, db, model):
    "Allow syncdb for all managed objects"
    logger.info("ALLOW SYNC")
    if db == 'default' and self.is_django_stuff(model):
        return True
    if db != 'default' and not self.is_django_stuff(model):
        return True
    return False

Now I have very simple model:

class Poll(models.Model):
    question = models.CharField(max_length=200)
    user = models.ForeignKey(User)
    pub_date = models.DateTimeField('date published')

I make two syncdbs:

python manage.py syncdb
python manage.py syndb --database apps

Everything goes fine. Then I create poll object using 'python manage.py shell'

superuser = User.objects.all()[0]
p = Poll(question="foo", user = superuser, pub_date = datetime.now())
p.save()

And I try to retrieve user from poll:

  a = Poll.objects.all()
  b = len(a)
  b = a[0]
  c = b.artist

I have logging enabled in router so I see that last query will be directed to correct DB:

READ from myapp
Will be directed to apps DB
READ from myapp
Will be directed to apps DB
READ from auth
Will be directed to default DB

I can even see actual SQL statement:

(0.005) SELECT "AUTH_USER"."ID", "AUTH_USER"."USERNAME", "AUTH_USER"."FIRST_NAME",    "AUTH_USER"."LAST_NAME", "AUTH_USER"."EMAIL", "AUTH_USER"."PASSWORD", "AUTH_USER"."IS_STAFF", "AUTH_USER"."IS_ACTIVE", "AUTH_USER"."IS_SUPERUSER", "AUTH_USER"."LAST_LOGIN", "AUTH_USER"."DATE_JOINED" FROM "AUTH_USER" WHERE "AUTH_USER"."ID" = :arg0 ; args=(1,)

But I'm getting error:

  File "<console>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/Django-1.4.1-py2.7.egg/django/db/models/fields/related.py", line 350, in __get__
    rel_obj = qs.get(**params)
  File "/usr/local/lib/python2.7/dist-packages/Django-1.4.1-py2.7.egg/django/db/models/query.py", line 361, in get
    num = len(clone)
  File "/usr/local/lib/python2.7/dist-packages/Django-1.4.1-py2.7.egg/django/db/models/query.py", line 85, in __len__
    self._result_cache = list(self.iterator())
  File "/usr/local/lib/python2.7/dist-packages/Django-1.4.1-py2.7.egg/django/db/models/query.py", line 291, in iterator
    for row in compiler.results_iter():
  File "/usr/local/lib/python2.7/dist-packages/Django-1.4.1-py2.7.egg/django/db/models/sql/compiler.py", line 763, in results_iter
    for rows in self.execute_sql(MULTI):
  File "/usr/local/lib/python2.7/dist-packages/Django-1.4.1-py2.7.egg/django/db/models/sql/compiler.py", line 818, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/lib/python2.7/dist-packages/Django-1.4.1-py2.7.egg/django/db/backends/util.py", line 40, in execute
return self.cursor.execute(sql, params)
  File "/usr/local/lib/python2.7/dist-packages/Django-1.4.1-py2.7.egg/django/db/backends/oracle/base.py", line 675, in execute
    return self.cursor.execute(query, self._param_generator(params))
DatabaseError: ORA-00942: table or view does not exist

So my question is - what am I doing wrong?

like image 536
mnowotka Avatar asked Oct 12 '12 15:10

mnowotka


1 Answers

Cross-database foreign keys are essentially invalid, because Django is modelling a "proper" relational database with referential integrity, and that can't be enforced at database level if the models are stored in completely different physical stores.

Anyway, for that reason Django must assume that any objects exist in the same database as the one you originally retrieved. In your case, it got your Poll object from your legacy database, so it must look for your user (or artist or whatever it is) there too.

For simple queries like this it's very easy to work around, for example:

poll = Poll.objects.all()[0]
user_id = poll.user_id # _id after the name of your "Foreign Key" field - which cannot really be an FK
user = User.objects.get(user_id) # This will be a new query and can use a different database, it will check the router

For more complex queries (joins etc) you will often find that you need to build lists or sets of ids, and do queries using filter(id__in=your_list_of_ids).

Depending on the number of records, there may be performance or memory usage penalties for doing this. (But in some cases your queries will actually be much faster than the original joins, at all depends on your application.) You might need to break up your lists of id's into batches or your queries may become too long, etc etc. But none of these problems are insurmountable.

When you are dealing with id's from another database, it's up to you to enforce referential integrity. Sometimes you need to set up batch processes to deal with tidying up the data.

This all makes it sounds wrong, but such separation of concerns, particularly if you can keep the dependencies limited, and in just one direction, can be exactly the right approach.

like image 187
George Lund Avatar answered Sep 28 '22 19:09

George Lund