
Running multiple sites on the same python process

In our company we make news portals for a fairly large number of local newspapers (currently 13, going to 30 next month and more in the future), each with 2k to 100k page views/day. Since we are evolving from a situation where each site was heavily customized to one where each difference is a matter of configuration or a custom template, our software is already pretty much the same for all sites. Right now our deployment strategy is one gunicorn instance per site (with 1-17 workers each, depending on the site's traffic), on a 16-core server with 12GB RAM. The problem with this setup is that each worker (regular pre-forked gunicorn) takes 110MB, whether it's being used or not. With the new sites we would need to add more RAM to serve not that many more requests, so basically it doesn't scale. Also, since we are moving away from the model where each site is independent, each site has its own database, and I quite like it that way, especially since we are using relational databases (mysql, but migrating to pgsql), so it's much easier to shard this way.

I'm doing some research and experimenting with running all sites on one gunicorn instance, so I could use the servers fully and add more servers behind a load balancer when it comes to that. The problem is that Django assumes in a lot of places that only one site is running per process, so from what I've thought of so far I'd have to implement:

  • A middleware that takes the HTTP_HOST from the request and places an identifier on a threadlocal variable.
  • A template loader that uses that variable to load custom templates accordingly.
  • Monkey patch django.db.models.Model, probably adding a metaclass (not even sure that's possible, but I think I would need it because of the custom managers we sometimes need to use) that would replace each manager with one that first calls db_manager(identifier) on the original manager and then calls the intended method. I would also need to override the save and delete methods to always include the using=identifier parameter.
  • I guess I would need to stop using inclusion_tag decorators, not a big problem, but I need to think of other cases like this.
  • Heavy and ugly patching of urlresolvers if I need custom or extra urls for each site. I don't need them now, but probably will at some point.
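
To make the first bullet concrete, here is a minimal sketch of what I have in mind. All names (`_state`, `current_site`, `SiteIdentifierMiddleware`) are made up for illustration, the middleware is written as a plain callable wrapping the next handler, and the host parsing is deliberately naive (it would not handle IPv6 literals). It is an assumption about how this could look, not tested against Django itself.

```python
import threading

_state = threading.local()  # hypothetical module-level holder, one value per thread


def current_site():
    """Return the identifier set for this thread, or None outside a request."""
    return getattr(_state, "site", None)


class SiteIdentifierMiddleware:
    """Callable middleware: wraps the next handler in the chain."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # Naive normalisation: drop the port and lowercase the host.
        host = request.META.get("HTTP_HOST", "").split(":")[0].lower()
        _state.site = host or None
        try:
            return self.get_response(request)
        finally:
            # Worker threads are reused, so the value must not leak
            # into whatever request this thread handles next.
            _state.site = None
```

A template loader or a manager wrapper would then call `current_site()` instead of receiving the identifier explicitly, which is exactly the hidden global state that makes me uneasy about this approach.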

And this is just what I came up with without even implementing it and seeing where it breaks; I'm sure I'd need many more changes for it to work. So I really don't want to do it, especially given the extra maintenance effort it would require, but I don't see any alternatives and would love to learn that someone has already solved this in a better way. Of course I could also stop using Django altogether (I already have many reasons to do so), but that would mean a major rewrite and having to maintain two incompatible branches of the software until the new one reached feature parity with the Django version, so to me it seems even worse than all the ugly hacks.

Luiz Geron asked Jun 24 '11 18:06


1 Answer

I've recently developed an e-commerce system with similar requirements -- many instances running from the same project sharing almost everything. The previous version of the system was a bunch of independent installations (~30) so it was pretty unmaintainable. I'm sure the requirements still differ from yours (for example, all instances shared the same models in my case), but it still might be useful to share my experience.

You are right that Django doesn't help with scenarios like this out of the box, but it's actually surprisingly easy to work around it. Here is a brief description of what I did.

I could see a synergy between what I wanted to achieve and django.contrib.sites. It also helps that many third-party Django apps out there know how to work with it and use it, for example, to generate absolute URLs to the current site. The major problem with sites is that it wants you to specify the current site id in settings.SITE_ID, which is a very naive approach to the multi-host problem. What one naturally wants, and what you also mention, is to determine the current site from the Host request header. To fix this problem, I borrowed the hook idea from django-multisite: https://github.com/shestera/django-multisite/blob/master/multisite/threadlocals.py#L19
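
To give an idea of what such a hook can look like, here is a rough sketch of an int-like object whose value is private to each thread. It is written from memory for illustration and is not the code from django-multisite; the class name `SiteID` and its `set`/`reset` methods are my own naming here. Whether a particular Django version accepts such an object everywhere it reads SITE_ID is something you would have to verify.

```python
import threading


class SiteID(threading.local):
    """Int-like object holding a separate value for every thread."""

    def __init__(self, default=1):
        # threading.local re-runs __init__ (with the same arguments) in each
        # thread that touches the object, so every thread starts at the default.
        self.default = default
        self.value = default

    def set(self, value):
        self.value = int(value)

    def reset(self):
        self.value = self.default

    def __int__(self):
        return self.value

    def __eq__(self, other):
        try:
            return self.value == int(other)
        except (TypeError, ValueError):
            return NotImplemented

    def __hash__(self):
        return hash(self.value)

    def __repr__(self):
        return str(self.value)


# In settings.py this would replace the usual integer.
SITE_ID = SiteID(default=1)
```

A middleware can then call `SITE_ID.set(...)` at the start of a request and `SITE_ID.reset()` at the end, while code that only does `int(settings.SITE_ID)` or compares it with a number keeps working.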

Next I created an app encapsulating all the functionality related to the multi host aspect of my project. In my case the app was called stores and among other things it featured two important classes: stores.middleware.StoreMiddleware and stores.models.Store.

The model class is a subclass of django.contrib.sites.models.Site. The good thing about subclassing Site is that you can pass a Store to any function where a Site is expected. So you are effectively still just using the old, well documented and tested sites framework. To the Store class I added all the fields needed to configure all the different stores. So it's got fields like urlconf, theme, robots_txt and whatnot.

The middleware class's job was to match the Host header with the corresponding Store instance in the database. Once the matching Store was retrieved, it would patch the SITE_ID in a way similar to https://github.com/shestera/django-multisite/blob/master/multisite/middleware.py. It also looked at the store's urlconf and, if it was not None, set request.urlconf to apply the store's special URL requirements. After that, the current Store instance was stored in request.store. This has proven to be incredibly useful, because I was able to do things like this in my views:

def homepage(request):
    featured = Product.objects.filter(featured=True, store=request.store)
    ...

request.store became a natural additional dimension of the request object throughout the project for me.
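
For illustration, the middleware described above could be sketched roughly like this. It is not my original code: the `Store` dataclass and the `STORES` dict are stand-ins for the model and the database lookup, `strip_suffix` is a hypothetical helper, the example domains are invented, and the SITE_ID patching is left out to keep it short.

```python
from dataclasses import dataclass
from typing import Optional

DOMAIN_SUFFIX = ".dev:8000"  # stand-in for settings.DOMAIN_SUFFIX


@dataclass
class Store:
    """Stand-in for the Store model; only the fields used here."""
    id: int
    domain: str
    urlconf: Optional[str] = None


# Stand-in for a database lookup such as Store.objects.get(domain=...).
STORES = {
    "shoes.example.com": Store(1, "shoes.example.com"),
    "books.example.com": Store(2, "books.example.com", urlconf="books.urls"),
}


def strip_suffix(host, suffix=DOMAIN_SUFFIX):
    """Remove the development suffix so the host matches Store.domain."""
    host = host.lower()
    if suffix and host.endswith(suffix):
        return host[: -len(suffix)]
    return host


class StoreMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        domain = strip_suffix(request.META.get("HTTP_HOST", ""))
        store = STORES.get(domain)
        if store is None:
            raise LookupError("No store configured for host %r" % domain)
        if store.urlconf is not None:
            # Django uses request.urlconf instead of ROOT_URLCONF when it is set.
            request.urlconf = store.urlconf
        request.store = store
        return self.get_response(request)
```

In a real project the unknown-host case would more likely end in a 404 response or a redirect to a default store than in an exception.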

Another thing that was defined on the Store class was a function get_absolute_url whose implementation looked roughly like this:

def get_absolute_url(self, to='/'):
    """
    Return an absolute url to this `Store` or to `to` on this store.

    The URL includes http:// and the domain name of the store.

    `to` can be an object with `get_absolute_url()` or an absolute path as string.

    """
    if isinstance(to, str):  # was `basestring` on Python 2
        path = to
    elif hasattr(to, 'get_absolute_url'):
        path = to.get_absolute_url()
    else:
        raise ValueError(
            'Invalid argument (need a string or an object with get_absolute_url): %s' % to
        )

    url = 'http://%s%s%s' % (
        self.domain,
        # This setting allowed for a sane development environment
        # where I just set it to ".dev:8000" and configured `dnsmasq`.
        # The same value was also removed from the `Host` value in the middleware
        # before looking up the `Store` in database. 
        settings.DOMAIN_SUFFIX,
        path
    )

    return url

So I could easily generate URLs to objects on stores other than the current one, e.g.:

# Redirect to `product` on `store`.
redirect(store.get_absolute_url(product)) 

This was basically all I needed to be able to implement a system allowing users to create a new e-shop living on its own domain via the Django admin.

Jakub Roztocil answered Nov 02 '22 07:11