 

Django for social networking [closed]

Tags:

python

django

I know this is a relatively broad question, but is Django robust enough to build a social network on? I am concerned mainly with performance/speed. For example, for a site with a small user base (<10,000 users), is it possible to create a Django-backed site that would perform at a speed similar to Facebook?

What are its potential weaknesses, and things that need to be focused on in order to make it as fast as possible?

asked Apr 27 '11 at 21:04 by David542





2 Answers

This question was asked in 2011, and Django has come a long way since then. I've previously built a social network with 2 million users on Django and found the process to be quite smooth. Part of getstream.io's infrastructure also runs on Django, and we've been quite happy with it. Here are some tips for getting the most out of your Django installation. It wasn't quite clear from the question, but I'll assume you're starting from a completely unoptimized Django installation.

Static files & CDN

Start by hosting your static files on S3 and putting the CloudFront CDN in front of them. Serving static files from your Django instance is a terrible idea; please don't do it.
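As a rough sketch of that setup, the settings fragment below assumes Django 4.2+ (which introduced the `STORAGES` setting) and the third-party django-storages package with its S3 backend and boto3 installed. The bucket name and CloudFront domain are placeholders, not values from this answer. With this in place, `python manage.py collectstatic` uploads the files to the bucket.

```python
# settings.py (fragment). Assumes django-storages + boto3 are installed.
# Bucket name and CloudFront domain below are placeholders.
AWS_STORAGE_BUCKET_NAME = "example-static-bucket"
# Domain of a CloudFront distribution whose origin is the bucket above.
AWS_S3_CUSTOM_DOMAIN = "d111111abcdef8.cloudfront.net"
# Key prefix inside the bucket for collected static files.
AWS_LOCATION = "static"

STORAGES = {
    # User uploads keep the default local storage in this sketch.
    "default": {"BACKEND": "django.core.files.storage.FileSystemStorage"},
    # collectstatic writes to S3; {% static %} URLs point at CloudFront.
    "staticfiles": {"BACKEND": "storages.backends.s3boto3.S3StaticStorage"},
}

STATIC_URL = f"https://{AWS_S3_CUSTOM_DOMAIN}/{AWS_LOCATION}/"
```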

Database & ORM: Select related

The second most common mistake is not optimizing your use of the ORM. Read the documentation on select_related and apply it where needed. Most pages on your site should take only 2-3 queries, not the N queries you'll typically see if you don't use select_related correctly: https://docs.djangoproject.com/en/1.11/ref/models/querysets/
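The sketch below makes the difference between N queries and one query concrete without a full Django project. It uses the standard-library `sqlite3` module and a hypothetical `users`/`posts` schema. The first function mimics touching `post.author` inside a loop. The second issues the single JOIN that something like `Post.objects.select_related("author")` would ask the ORM to generate (the `Post` model and its `author` field are assumptions).

```python
import sqlite3

# Hypothetical schema: every post has exactly one author.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY,
                        author_id INTEGER REFERENCES users(id),
                        body TEXT);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(i, f"user{i}") for i in range(1, 21)])
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)", [(i, i, f"post {i}") for i in range(1, 21)])

query_count = 0

def run(sql, params=()):
    """Execute one query and count it (a stand-in for Django's query log)."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()

def feed_n_plus_one():
    # 1 query for the posts, then 1 more query per post to fetch its author.
    posts = run("SELECT body, author_id FROM posts ORDER BY id")
    return [(body, run("SELECT name FROM users WHERE id = ?", (author_id,))[0][0])
            for body, author_id in posts]

def feed_joined():
    # A single JOIN: the kind of query select_related() makes the ORM emit.
    return run("SELECT p.body, u.name FROM posts p "
               "JOIN users u ON u.id = p.author_id ORDER BY p.id")

query_count = 0
slow = feed_n_plus_one()
n_plus_one_queries = query_count   # 21 for 20 posts

query_count = 0
fast = feed_joined()
joined_queries = query_count       # 1, with identical results
```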

Database: PGBouncer

Creating a new connection to your Postgres database is a rather heavy operation. You'll want to run PGBouncer on localhost so you don't pay unneeded overhead every time a database connection is created. This was more urgent with older versions of Django, but it's still a good idea in general.
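On the Django side, this mostly means pointing `DATABASES` at the pooler. The sketch below assumes PgBouncer listens on 127.0.0.1 on its default port 6432 in transaction pooling mode. The database name and credentials are placeholders. `DISABLE_SERVER_SIDE_CURSORS` is the setting Django's documentation recommends for transaction pooling, because server-side cursors can't be relied on across pooled transactions.

```python
# settings.py (fragment). Assumes a local PgBouncer in transaction pooling mode.
import os

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "social",                                # placeholder database name
        "USER": os.environ.get("DB_USER", "app"),        # placeholder credentials
        "PASSWORD": os.environ.get("DB_PASSWORD", ""),
        "HOST": "127.0.0.1",   # connect to PgBouncer, not to Postgres directly
        "PORT": "6432",        # PgBouncer's default listen port
        # Close Django's connection after each request (the default); reopening
        # a local PgBouncer connection is cheap compared to a new Postgres backend.
        "CONN_MAX_AGE": 0,
        # Required with transaction pooling: server-side cursors can break
        # when consecutive transactions land on different server connections.
        "DISABLE_SERVER_SIDE_CURSORS": True,
    }
}
```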

Basic Monitoring & Debugging

Next, you'll want to get some basic monitoring and debugging up and running. The Django Debug Toolbar is your first friend: https://github.com/jazzband/django-debug-toolbar

After that, have a look at tools such as New Relic, Datadog, Sentry and StatsD/Graphite for deeper insight.

Separate concerns

Another early step is separating concerns. Run your database on its own server, your search server on its own server, your web processes on their own servers, and so on. If you run everything on one machine, it's hard to see what's causing your app to break. Servers are cheap; split things up.

Load balancer

If you've never used a load balancer before, start here: https://aws.amazon.com/elasticloadbalancing/

Use the right tools

If you're doing tag clouds, tag search or general search, use a dedicated tool such as Elasticsearch.

If you have a counter or a list that changes rapidly, use Redis instead of your database to cache the latest version.

Celery and RabbitMQ

Use a task queue to move anything that doesn't need to happen immediately into the background. The most widely used task queue is Celery: http://www.celeryproject.org/
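Celery needs a broker such as RabbitMQ to run, so the sketch below only illustrates the idea with the standard library. The request handler enqueues the slow work and returns right away, and a worker thread processes it. `send_notification` and `handle_comment_request` are hypothetical names. In a real project the job would be a Celery task called with `.delay()`, and the worker would be a separate process, not a thread.

```python
import queue
import threading

tasks = queue.Queue()
sent = []  # stands in for an email/push notification service

def send_notification(user_id, message):
    """Hypothetical slow side effect we don't want inside the request."""
    sent.append((user_id, message))

def worker():
    # Pull (function, args) jobs until a None sentinel arrives.
    while True:
        func, args = tasks.get()
        try:
            if func is None:
                break
            func(*args)
        finally:
            tasks.task_done()

def handle_comment_request(user_id, text):
    # Only enqueue the slow work; the response doesn't wait for it.
    tasks.put((send_notification, (user_id, f"New comment: {text}")))
    return {"status": "ok"}

t = threading.Thread(target=worker, daemon=True)
t.start()
response = handle_comment_request(7, "nice photo")
tasks.join()              # demo only: wait until the background job has run
tasks.put((None, ()))     # stop the worker
t.join()
```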

Denormalize everything

You don't want to compute counts such as likes and comments on reads. Simply update the like and comment counts every time someone adds a new like or comment. This makes writes heavier but reads lighter. Since you'll probably have many reads and relatively few writes, that's exactly what you want.
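Here is a minimal sketch of such a denormalized counter, using `sqlite3` and a hypothetical `posts.like_count` column. The insert and the increment share one transaction, so a failed insert (for example a duplicate like) also rolls back the increment. In the Django ORM the increment would typically be written as `Post.objects.filter(pk=post_id).update(like_count=F("like_count") + 1)`, assuming a `Post` model. That form pushes the arithmetic into SQL and avoids lost updates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, like_count INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE likes (post_id INTEGER, user_id INTEGER, PRIMARY KEY (post_id, user_id));
    INSERT INTO posts (id) VALUES (1);
""")

def add_like(post_id, user_id):
    # Write path: record the like and bump the stored counter atomically.
    with conn:
        conn.execute("INSERT INTO likes VALUES (?, ?)", (post_id, user_id))
        # Increment inside SQL instead of read-modify-write in Python.
        conn.execute("UPDATE posts SET like_count = like_count + 1 WHERE id = ?", (post_id,))

def like_count(post_id):
    # Read path: a primary-key lookup instead of COUNT(*) over the likes table.
    return conn.execute("SELECT like_count FROM posts WHERE id = ?", (post_id,)).fetchone()[0]

for user_id in (10, 11, 12):
    add_like(1, user_id)
```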

News feeds and activity streams

If you're building feeds, have a look at this service for building news feeds & activity streams, or the open-source Stream-Framework.

In 2011 you had to build your own feed technology; nowadays that's no longer necessary.

Now that we've gone over the basics, let's review some more advanced tips.

CDN and 2 stage loading

You're already using CloudFront for your static files. As a next step, put CloudFront in front of your web traffic as well. This lets you cache certain pages on the CDN and reduce the load on your servers.

You can even cache pages for logged-in users on the CDN: serve the generic page from the CDN, then use JavaScript to load all page customizations and user-specific details afterwards.

Database: PGBadger

Tools such as PGBadger give you great insight into what your database is actually doing. You'll want to run daily reports on part of your log data.

Database: Indexes

You'll want to start reading up on database indexes. Most early scaling problems can be fixed by adding the right index and tuning your database a little. If you get your indexes right, you'll be doing better than most people. There is a lot more room for database optimization, and these books by the 2ndQuadrant folks are awesome: https://www.2ndquadrant.com/en/books/
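To see what an index changes, the sketch below asks SQLite for its query plan before and after adding one. Postgres's `EXPLAIN` plays the same role. The `posts` table, its `tag` column and the `idx_posts_tag` index are hypothetical. In Django you'd typically declare such an index with `db_index=True` on the field or with an entry in `Meta.indexes`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, tag TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO posts (tag, body) VALUES (?, ?)",
    [(f"tag{i % 500}", f"post {i}") for i in range(10_000)],
)

QUERY = "SELECT * FROM posts WHERE tag = ?"

def query_plan(sql, params):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

plan_before = query_plan(QUERY, ("tag42",))  # a full scan of posts
conn.execute("CREATE INDEX idx_posts_tag ON posts (tag)")
plan_after = query_plan(QUERY, ("tag42",))   # a search using idx_posts_tag
```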

Database: Tuning

If you're not using RDS, run a quick PGTune check on your database. Postgres' default configuration is pretty sluggish, and PGTune suggests better settings for your hardware: https://github.com/gregs1104/pgtune

Cache everything

Scaling your database is a pain. Eventually you'll get around to running multiple replica databases, sharding, partitioning and so on. Scaling your database is time-consuming, and the best way to avoid spending tons of time on it is caching. Redis is the go-to cache nowadays, but memcached is also a decent option. Basically, you'll want to cache everything. A page shows a list of posts? Read it from Redis. Looking up user profiles? Read them from Redis. You want to use your database as little as possible and put most of the load on your cache layer, since the cache layer is far simpler to scale.
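The usual pattern behind "cache everything" is cache-aside. You check the cache first and fall back to the database on a miss. You store the result with a TTL, and you delete the key when the underlying data changes. The sketch below uses a plain dict as a stand-in for Redis and a hypothetical `load_profile_from_db` function. With Django's cache framework, the same shape is written with `cache.get(key)` and `cache.set(key, value, timeout)`.

```python
import time

cache = {}      # stand-in for Redis/memcached: key -> (value, expires_at)
db_reads = 0

def load_profile_from_db(user_id):
    """Hypothetical database lookup; counts how often the DB is hit."""
    global db_reads
    db_reads += 1
    return {"id": user_id, "name": f"user{user_id}"}

def get_profile(user_id, ttl=300):
    key = f"profile:{user_id}"
    entry = cache.get(key)
    if entry is not None and entry[1] > time.monotonic():
        return entry[0]                      # hit: no database work
    profile = load_profile_from_db(user_id)  # miss or expired: read through
    cache[key] = (profile, time.monotonic() + ttl)
    return profile

def invalidate_profile(user_id):
    """Call after writing the profile to the DB so the next read refetches it."""
    cache.pop(f"profile:{user_id}", None)

for _ in range(3):
    get_profile(1)
reads_after_three_lookups = db_reads   # only the first lookup hit the DB
invalidate_profile(1)
get_profile(1)                         # refetched after invalidation
```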

Offsets

Postgres doesn't like large offsets: OFFSET n still has to read and throw away n rows before returning anything. Use ID filtering instead when you're paginating through large result sets.
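Here is a sketch of ID-based (keyset) pagination with `sqlite3`, assuming a hypothetical `posts` table paged newest-first. The client sends back the last ID it saw instead of a page number. The Django ORM equivalent would be along the lines of `Post.objects.filter(id__lt=last_id).order_by("-id")[:20]`, assuming a `Post` model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO posts (id, body) VALUES (?, ?)",
                 [(i, f"post {i}") for i in range(1, 101)])

def page_after(last_seen_id, page_size=20):
    """Newest-first page of posts older than last_seen_id (None = first page).

    The WHERE id < ? condition lets the primary-key index seek straight to the
    start of the page, so later pages cost the same as the first one.
    """
    if last_seen_id is None:
        sql, params = "SELECT id, body FROM posts ORDER BY id DESC LIMIT ?", (page_size,)
    else:
        sql = "SELECT id, body FROM posts WHERE id < ? ORDER BY id DESC LIMIT ?"
        params = (last_seen_id, page_size)
    return conn.execute(sql, params).fetchall()

first = page_after(None)            # ids 100..81
second = page_after(first[-1][0])   # ids 80..61
```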

Deadlocks

With a lot of traffic you'll eventually get deadlocks. They happen when multiple Postgres transactions try to lock the same data and end up waiting on each other in a cycle: A waits for B, B waits for C, and C waits for A. The obvious fix is to use smaller transactions, which reduces the chance of deadlocks occurring. Next, you'll want to batch updates to your most popular data. For example, instead of updating the count every time someone likes a post, store a list of like changes and sync it to the count every 5 minutes or so.
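Here is a minimal sketch of that batching idea. Likes are counted in an in-memory buffer, and a periodic job applies them to the database in one short transaction. The buffer lives in process memory only to keep the example self-contained; with several web servers it would have to be shared, for instance in Redis. Table and function names are hypothetical. Applying updates in sorted ID order is an added precaution (consistent lock ordering), not something the answer above prescribes.

```python
import sqlite3
import threading
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, like_count INTEGER NOT NULL DEFAULT 0)")
conn.executemany("INSERT INTO posts (id) VALUES (?)", [(1,), (2,), (3,)])
conn.commit()

pending = Counter()             # post_id -> likes not yet written to the DB
pending_lock = threading.Lock()

def record_like(post_id):
    """Hot path: an in-memory increment, no database row lock taken."""
    with pending_lock:
        pending[post_id] += 1

def flush_likes():
    """Periodic job (e.g. every 5 minutes): apply buffered likes in one short transaction."""
    global pending
    with pending_lock:
        batch, pending = pending, Counter()   # swap so new likes keep buffering
    with conn:
        # Sorted ids: every flush locks rows in the same order.
        for post_id in sorted(batch):
            conn.execute(
                "UPDATE posts SET like_count = like_count + ? WHERE id = ?",
                (batch[post_id], post_id),
            )

def like_counts():
    return dict(conn.execute("SELECT id, like_count FROM posts ORDER BY id"))

for post_id in (1, 1, 3, 1):
    record_like(post_id)
flush_likes()
```

Note that if the flush transaction fails, this sketch drops the swapped-out batch; a production version would need to merge it back into the buffer.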

Those are some of the basic tips. Have fun dealing with rapidly growing social networks :)

answered Oct 03 '22 at 22:10 by Thierry



"What are its potential weaknesses, and things that need to be focused on in order to make it as fast as possible?"

The one thing you might be worried about further down the road is that depending on how you create your models and connect them to one another, you may run into an issue where a single page generates many, many, many queries.

This is especially true if you're using a model that involves a generic relation.

Let's say you're using django-activity-stream to create a list of recent events (similar to Facebook's News Feed). django-activity-stream basically creates a list of generic relations. For each of these generic relations, you'll have to run a query to get information about that object. And since it's generic (i.e. you're not writing a custom query for each kind of object), if that object has its own relations that you want to output, you might be looking at 40-100 queries for an activity feed with just 20-30 items.

Running 40-100 queries for a single request is not optimal behavior.

The good news is that Django is really just a bunch of classes and functions written in Python. Almost anything you write in Python can be added to Django, so you can always write your own functions or code to optimize a given request.
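One way to avoid a query per feed item is to group the generic references by content type. You then fetch each type with a single `WHERE id IN (...)` query, so the query count grows with the number of types rather than the number of items. The sketch below shows the grouping with in-memory stand-ins for tables; all names are hypothetical. Django's `prefetch_related()` supports `GenericForeignKey` and performs a similar per-type batching.

```python
from collections import defaultdict

# Hypothetical "tables" keyed by primary key, one per content type.
TABLES = {
    "photo": {1: {"title": "Beach"}, 2: {"title": "Dog"}},
    "comment": {5: {"text": "Nice!"}, 6: {"text": "Cute"}},
}
queries = 0

def fetch_many(content_type, ids):
    """One 'query' per content type, like SELECT ... WHERE id IN (...)."""
    global queries
    queries += 1
    table = TABLES[content_type]
    return {pk: table[pk] for pk in ids if pk in table}

def hydrate_feed(activities):
    """activities: list of (verb, content_type, object_id), like generic-FK rows."""
    ids_by_type = defaultdict(set)
    for _, ctype, obj_id in activities:
        ids_by_type[ctype].add(obj_id)
    objects = {ctype: fetch_many(ctype, ids) for ctype, ids in ids_by_type.items()}
    # Reassemble in feed order from the per-type lookups.
    return [(verb, objects[ctype].get(obj_id)) for verb, ctype, obj_id in activities]

feed = [
    ("posted", "photo", 1),
    ("commented", "comment", 5),
    ("posted", "photo", 2),
    ("commented", "comment", 6),
]
hydrated = hydrate_feed(feed)   # 2 queries instead of 4
```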

Choosing another framework is not going to avoid the problem of scalability; it's just going to present different difficulties in different areas.

Also, look into caching to speed up responses and reduce server load.

answered Oct 03 '22 at 22:10 by Jordan Reiter
