
Huge Django Session table, normal behaviour or bug?

Perhaps this is completely normal behaviour, but I feel like the django_session table is much larger than it should have to be.

First of all, I run the following cleanup command daily so the size is not caused by expired sessions:

DELETE FROM %s WHERE expire_date < NOW()

The numbers:

  • We've got about 5000 unique visitors (bots excluded) every day.
  • The SESSION_COOKIE_AGE is set to the default, 2 weeks
  • The table has a little over 1,000,000 rows
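
As a rough sanity check on those numbers, here is a minimal sketch of the row count one might expect. It assumes the worst case, in which every daily visitor is new and receives exactly one session; the variable names are only illustrative.

```python
# Numbers taken from the list above.
unique_visitors_per_day = 5000
session_cookie_age_days = 14  # SESSION_COOKIE_AGE default of 2 weeks
rows_in_table = 1_000_000     # "a little over" this many rows

# Assumption: worst case, every daily visitor is new and gets one session.
expected_max_rows = unique_visitors_per_day * session_cookie_age_days
excess_factor = rows_in_table / expected_max_rows

print(expected_max_rows)         # 70000
print(round(excess_factor, 1))   # 14.3
```

Even under that pessimistic assumption the table holds roughly fourteen times more rows than human visitors alone would explain.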

So, I'm guessing that Django also generates session keys for all bots that visit the site, and that because the bots don't store the cookies, it continuously generates new sessions for them.

But... is this normal behaviour? Is there a setting so Django won't generate sessions for anonymous users, or at least no sessions for users that aren't using sessions?

Wolph asked Dec 14 '10 22:12


2 Answers

After a bit of debugging I've managed to trace the cause of the problem. One of my middlewares (and most of my views) calls request.user.is_authenticated().

The django.contrib.auth middleware sets request.user to LazyUser():

Source: http://code.djangoproject.com/browser/django/trunk/django/contrib/auth/middleware.py?rev=14919#L13 (I don't see why there is a return None there, but ok...)

class AuthenticationMiddleware(object):
    def process_request(self, request):
        assert hasattr(request, 'session'), "The Django authentication middleware requires session middleware to be installed. Edit your MIDDLEWARE_CLASSES setting to insert 'django.contrib.sessions.middleware.SessionMiddleware'."
        request.__class__.user = LazyUser()
        return None

The LazyUser calls get_user(request) to get the user:

Source: http://code.djangoproject.com/browser/django/trunk/django/contrib/auth/middleware.py?rev=14919#L5

class LazyUser(object):
    def __get__(self, request, obj_type=None):
        if not hasattr(request, '_cached_user'):
            from django.contrib.auth import get_user
            request._cached_user = get_user(request)
        return request._cached_user

The get_user(request) function reads user_id = request.session[SESSION_KEY]:

Source: http://code.djangoproject.com/browser/django/trunk/django/contrib/auth/__init__.py?rev=14919#L100

def get_user(request):
    from django.contrib.auth.models import AnonymousUser
    try:
        user_id = request.session[SESSION_KEY]
        backend_path = request.session[BACKEND_SESSION_KEY]
        backend = load_backend(backend_path)
        user = backend.get_user(user_id) or AnonymousUser()
    except KeyError:
        user = AnonymousUser()
    return user

When it is accessed, the session sets accessed to True:

Source: http://code.djangoproject.com/browser/django/trunk/django/contrib/sessions/backends/base.py?rev=14919#L183

def _get_session(self, no_load=False):
    """
    Lazily loads session from storage (unless "no_load" is True, when only
    an empty dict is stored) and stores it in the current instance.
    """
    self.accessed = True
    try:
        return self._session_cache
    except AttributeError:
        if self._session_key is None or no_load:
            self._session_cache = {}
        else:
            self._session_cache = self.load()
    return self._session_cache

And that causes the session to initialize. The bug itself was caused by a faulty session backend, which also generated a session whenever accessed was set to True...
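
To illustrate the difference, here is a minimal, self-contained sketch. It does not use Django at all: SketchSession, save_on_access and handle_anonymous_request are hypothetical names made up for this example, and the "correct" branch assumes the usual rule that a session is only saved once it has been modified.

```python
class SketchSession:
    """Toy stand-in for a session store; NOT Django's real class."""

    def __init__(self, save_on_access=False):
        self.accessed = False
        self.modified = False
        # Hypothetical flag that mimics the faulty backend described above.
        self.save_on_access = save_on_access
        self._data = {}

    def __getitem__(self, key):
        # Reading marks the session as accessed, even if the key is missing.
        self.accessed = True
        return self._data[key]

    def __setitem__(self, key, value):
        # Writing marks the session as both accessed and modified.
        self.accessed = True
        self.modified = True
        self._data[key] = value

    def should_save(self):
        # Faulty backend: persists as soon as the session was looked at.
        if self.save_on_access:
            return self.accessed
        # Assumed correct behaviour: persist only after a modification.
        return self.modified


def handle_anonymous_request(session):
    """Mimic get_user(): read the user id and fall back on KeyError."""
    try:
        session["_auth_user_id"]  # illustrative key name
    except KeyError:
        pass  # anonymous user, nothing was written
    return session.should_save()


print(handle_anonymous_request(SketchSession()))                     # False
print(handle_anonymous_request(SketchSession(save_on_access=True)))  # True
```

With the faulty variant, every anonymous request that merely checks request.user would end up as a new row, which matches the symptom in the question.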

Wolph answered Nov 15 '22 10:11


Is it possible for robots to access any page where you set anything in a user session (even for anonymous users), or any page where you use session.set_test_cookie() (for example, Django's default login view calls this method)? In both of these cases a new session object is created. Excluding such URLs in robots.txt should help.
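
One way to check such an exclusion before deploying it is with Python's standard urllib.robotparser. This is only a sketch: the /accounts/login/ path and the ExampleBot user agent are assumptions for illustration, so substitute the URLs on your site that actually touch the session.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the login path is an assumed example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /accounts/login/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Well-behaved crawlers should skip the login page but still index the rest.
login_allowed = parser.can_fetch("ExampleBot", "/accounts/login/")
home_allowed = parser.can_fetch("ExampleBot", "/")

print(login_allowed)  # False
print(home_allowed)   # True
```

Note that this only helps with crawlers that honour robots.txt.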

Bartosz answered Nov 15 '22 08:11