Caching Flask-Login user_loader

I had this.

@login_manager.user_loader
def load_user(id=None):
    return User.query.get(id)

It was working fine until I introduced Flask-Principal.

@identity_loaded.connect_via(app)
def on_identity_loaded(sender, identity):

    # Set the identity user object
    identity.user = current_user
    # return
    if hasattr(current_user, 'id'):
        identity.provides.add(UserNeed(current_user.id))

    # Assuming the User model has a list of roles, update the
    # identity with the roles that the user provides
    if hasattr(current_user, 'roles'):
        for role in current_user.roles:
            identity.provides.add(RoleNeed(role.name))

Adding this caused a serious performance issue: SQLALCHEMY_ECHO showed that the User table was queried every time a static file was loaded.
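
For reference, a minimal sketch of how that diagnostic is enabled, assuming a standard Flask-SQLAlchemy setup:

# sketch: log every SQL statement SQLAlchemy issues, so repeated queries become visible
app.config['SQLALCHEMY_ECHO'] = True

As a quick experiment, I then tried a naive in-process dictionary cache: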

#Warning: Dummy Cache
users = {}
@login_manager.user_loader
def load_user(uid=None):
    if uid not in users:
        users[uid] = User.query.get(uid)
    return users[uid]

This experiment solved the repeated-query issue, but it made me realize that I need to introduce a cache to my Flask app. Here are the questions.

  1. How do I cache User.query.get(id)?
  2. When do I need to clear this user cache?
asked Nov 22 '14 by lustdante

1 Answer

This is an old question, and there appears to be no other answer on SO or via Google. It took me a while to solve, so maybe this answer will help someone.

First, you need a cache backend. I use Flask-Caching with Redis, together with the Python redis library from PyPI (sudo pip install redis).

Next, do from flask_caching import Cache and then cache = Cache(), which I do in a separate file called extensions.py. This matters if you are using the app factory pattern, because you will need to import cache later, and keeping it in its own module helps avoid circular-reference issues in larger Flask apps.
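
For clarity, here is a minimal sketch of what that extensions.py might contain:

# extensions.py -- instantiate extensions here, unbound from any app,
# so other modules can import them without circular-import problems
from flask_caching import Cache

cache = Cache()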

After this, you need to register the Flask-Caching extension on the Flask app, which I do in a separate app.py file like this:

from flask import Flask
from extensions import cache

def create_app(config_obj=None):
    """An application factory"""

    app = Flask(__name__)
    app.config.from_object(config_obj)
    cache.init_app(app, config={'CACHE_TYPE': 'redis',
                                'CACHE_REDIS_HOST': '127.0.0.1',
                                'CACHE_REDIS_PORT': '6379',
                                'CACHE_REDIS_URL': 'redis://127.0.0.1:6379'})
    return app
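
For completeness, a hypothetical entry point that uses this factory might look like the following (the module and config names here are assumptions, not part of the original answer):

# wsgi.py -- hypothetical entry point for the app factory above
from app import create_app

# config_obj can be anything accepted by app.config.from_object(),
# e.g. an import string or a config class; the name below is made up
app = create_app(config_obj='config.ProductionConfig')

if __name__ == '__main__':
    app.run()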

Now that the cache is registered on the Flask app, it can be imported from extensions.py and used throughout the app without circular-reference issues. Moving on to whatever file holds your user_loader:

import pickle
from flask_login import current_user
from extensions import cache
from models.user_models import User

@login_manager.user_loader
def load_user(user_id):
    """Load the user by ID from the cache; if not cached, query the DB and cache it."""
    # build a unique cache key for each user
    cache_key = 'user_{}'.format(user_id)
    # check whether the user object is already cached
    cached = cache.get(cache_key)
    if cached is not None:
        return pickle.loads(cached)
    # not in the cache: query the DB, cache the pickled object for an hour, return it
    user_obj = User.query.get(int(user_id))
    cache.set(cache_key, pickle.dumps(user_obj), timeout=3600)
    return user_obj

Finally, when you log out the user, you can remove them from the cache:

from flask import flash, redirect, session, url_for
from flask_login import current_user, login_required, logout_user
from extensions import cache

@blueprint.route('/logout/')
@login_required
def logout():
    """Logout."""
    # remove the user object from the redis cache (must happen before logout_user(),
    # while current_user is still the authenticated user)
    user = 'user_{}'.format(current_user.id)
    cache.delete(user)
    # remove the user information from the session
    logout_user()
    # remove session keys set by Flask-Principal
    for key in ('identity.name', 'identity.auth_type'):
        session.pop(key, None)
    flash('You are logged out.', 'info')
    return redirect(url_for('public.home'))

This seems to work great: it reduced SQLAlchemy query hits by three queries per page per user and improved my page-load times by about 200 ms in a few parts of my app, while eliminating a nasty issue with hitting the SQLAlchemy connection pool limit.

One last important point about this solution. If you change the user object for any reason, for example when assigning the user new roles or abilities, you must clear the stale user object from the cache, like below:

# get the id of the user whose cached object needs to be invalidated
user_id = current_user.id
# build the cache key for that user (user_{id})
cache_user = 'user_{}'.format(user_id)
# delete the stale user object; the next request will query and re-cache it
cache.delete(cache_user)
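
If you invalidate the cached user in more than one place, it may be worth wrapping this in a small helper. A hypothetical sketch (the function name is mine, not from the original answer):

def invalidate_cached_user(user_id):
    """Delete a cached user object so the next request reloads it from the DB."""
    cache.delete('user_{}'.format(user_id))

# for example, after assigning the current user new roles or abilities:
invalidate_cached_user(current_user.id)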

BACKGROUND:

My need to consider caching the flask-login user_loader arose from the fact that I've implemented access control list management by extending the Flask-Login classes UserMixin and AnonymousUserMixin with a few class methods such as get_roles and get_abilities. I'm also using flask-sqlalchemy with a PostgreSQL backend, and there is a role table and an ability table with relationships to the user object. These roles and abilities are checked mainly in the templates to present different views depending on the user's roles and abilities.

At some point I noticed that opening multiple browser tabs, or simply reloading a page in my app, started producing the error TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30. Flask-SQLAlchemy has SQLALCHEMY_POOL_SIZE and SQLALCHEMY_MAX_OVERFLOW settings, but increasing these values just masked the problem for me; the error still occurred once I opened more tabs or did more page reloads.

Digging deeper to find the root cause, I queried my PostgreSQL DB with SELECT * FROM pg_stat_activity; and found that on every request I was accumulating multiple connections in the idle in transaction state, with SQL queries clearly linked to the user, role, and ability access checks. These idle in transaction connections were what pushed my DB connection pool to capacity.

Further testing showed that caching the flask-login user_loader User object eliminated the idle in transaction connections, and even with SQLALCHEMY_POOL_SIZE and SQLALCHEMY_MAX_OVERFLOW left at their default values, I no longer hit the TimeoutError: QueuePool limit. Problem solved!

answered Oct 13 '22 by Bob Jordan