I'm creating a REST-centric application that will use a NoSQL data store of some kind for most of the domain-specific models. For the primary site that I intend to build around the REST data framework, I still want to use a traditional relational database for users, billing info, and other metadata that's outside the scope of the domain data model.
I've been advised that this approach is only a good idea if I can, as much as possible, avoid performing I/O against both the RDBMS and the NoSQL store within the same request.
My questions:
I'm considering the same approach for my application, and I think it is generally safe, but it requires special care to handle cache consistency issues.
The way Django normally operates is that when a request is received, a query is run against the Session table to find the session associated with the cookie in the request. Then, when you access request.user, a query is run against the User table to find the user for that session (if any, because Django supports anonymous sessions). So, by default, Django needs two queries to associate each request with a user, which is expensive.
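For reference, a minimal sketch of where those two queries come from in a default configuration (the middleware classes are Django's standard ones; the view is purely illustrative):

```python
# settings.py (excerpt)
MIDDLEWARE = [
    # ...
    # Resolves the session cookie to a row in the Session table (query 1).
    "django.contrib.sessions.middleware.SessionMiddleware",
    # Attaches a lazy request.user based on the user id stored in the session.
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    # ...
]

# views.py
from django.http import HttpResponse

def profile(request):
    # Touching request.user resolves the lazy object with a lookup in the
    # User table (query 2); anonymous sessions get an AnonymousUser instead.
    return HttpResponse("Hello, %s" % request.user)
```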
A nice thing about the Django session is that it can be used as a key-value store without extending any model class (unlike, for example, the User class, which is hard to extend with additional fields). So you can, for example, do request.session['email'] = user.email to store additional data in the session. This is safe in the sense that whatever you read from the request.session dictionary is exactly what you put there; the client has no way to change these values. So you can indeed use this technique to avoid the query against the User table.
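A minimal sketch of that technique, assuming a standard username/password login; the view names and the separate welcome view are illustrative, not part of the original answer:

```python
from django.contrib.auth import authenticate, login
from django.http import HttpResponse

def login_view(request):
    user = authenticate(
        request,
        username=request.POST.get("username", ""),
        password=request.POST.get("password", ""),
    )
    if user is None:
        return HttpResponse("Invalid credentials", status=401)
    login(request, user)
    # Copy the data you will need on later requests into the session,
    # so those requests do not have to touch the User table.
    request.session["email"] = user.email
    return HttpResponse("OK")

def welcome(request):
    # Reads back the value put there at login; no User table query is needed.
    email = request.session.get("email", "anonymous")
    return HttpResponse("Welcome, %s" % email)
```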
To avoid the query against the Session table, you need to enable session caching (or store the session data in a client-side cookie with django.contrib.sessions.backends.signed_cookies, which is safe because such cookies are cryptographically protected against modification by the client).
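For the cookie-based option, a minimal settings.py sketch; the extra cookie flags are optional hardening I've added for illustration, not required by the backend:

```python
# settings.py (excerpt)
# Session data lives entirely in the client cookie, signed with SECRET_KEY,
# so no Session table query is needed at all.
SESSION_ENGINE = "django.contrib.sessions.backends.signed_cookies"

# Optional hardening for cookie-based sessions:
SESSION_COOKIE_HTTPONLY = True
SESSION_COOKIE_SECURE = True  # assumes the site is served over HTTPS
```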
With caching enabled, you need zero queries to associate a request with user data. But the problem is cache consistency. If you use a local in-memory cache with the write-through option (django.core.cache.backends.locmem.LocMemCache with django.contrib.sessions.backends.cached_db), the session data will be written to the database on each modification, but it won't be read from the database if it is present in the cache. This introduces a problem if you have multiple Django processes: if one process modifies a session (for example, changes session['email']), another process can still use the old, cached value.
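A sketch of that combination, shown only to illustrate the pitfall:

```python
# settings.py (excerpt) -- write-through sessions backed by a per-process cache.
SESSION_ENGINE = "django.contrib.sessions.backends.cached_db"
CACHES = {
    "default": {
        # Per-process, in-memory cache: each Django worker keeps its own copy,
        # so a session modified by one process can be read stale by another.
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
    }
}
```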
You can solve this by using a shared cache (the Memcached backend), which guarantees that changes made by one process are visible to all other processes. This way, you replace a query against the Session table with a request to Memcached, which should be much faster.
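A sketch of the shared-cache variant, assuming a recent Django version with the pymemcache backend and a Memcached instance at a local address (both assumptions for illustration):

```python
# settings.py (excerpt) -- write-through sessions backed by a shared cache.
SESSION_ENGINE = "django.contrib.sessions.backends.cached_db"
CACHES = {
    "default": {
        # One Memcached instance shared by all Django processes, so a session
        # written by one worker is immediately visible to the others.
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",
    }
}
```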
Storing session data in a client-side cookie can also solve the cache consistency issue: if you modify the email field in the cookie, all future requests sent by the client should carry the new email. The client can, however, deliberately send an old cookie, which still carries the old values; whether this is a problem is application-dependent.