Django only allows you to use one database in settings.py. Does that prevent you from scaling up to millions of users?

Can you really scale up with Django...given that you can only use one database? (In the models.py and settings.py)

5 Answers

Django now has support for multiple databases.
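As a rough illustration of what that support looks like (it arrived in Django 1.2), here is a minimal sketch of a settings fragment with two connections and a router that sends reads to a replica. The aliases `default` and `replica`, the host names, and the class name `PrimaryReplicaRouter` are all assumptions made up for this example; the four router method names are the ones Django's database-router interface looks for.

```python
# Hypothetical settings.py fragment: aliases, names and hosts are placeholders.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "app",
        "HOST": "db-primary.example.internal",
    },
    "replica": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "app",
        "HOST": "db-replica.example.internal",
    },
}


class PrimaryReplicaRouter:
    """Send reads to the replica and writes to the primary (illustrative only)."""

    def db_for_read(self, model, **hints):
        return "replica"

    def db_for_write(self, model, **hints):
        return "default"

    def allow_relation(self, obj1, obj2, **hints):
        # Both aliases hold the same data, so any relation is acceptable.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Migrate only the primary; the replica is assumed to copy it.
        return db == "default"
```

The router would be activated by listing its dotted path in the `DATABASE_ROUTERS` setting; a single query can also be pointed at a specific connection with `.using("replica")`.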

112

answered Oct 20 '22 01:10

code-zoop

The database isn't your bottleneck.

Check your browser carefully.

For each page of HTML you're sending (on average) 8 other files, some of which may be quite large. These are your JS, CSS, graphics, etc.

The actual performance bottleneck is the browser requesting those files and accepting the bytes s... l... o... w... l... y...

To scale, then, do this.

1. Use multiple front-ends balanced with a pure software solution like wackamole. http://www.backhand.org/wackamole/
2. Use proxy servers like squid to send the "other" files. They're largely static. This is where 7/8ths of the work is done downloading to the client. Don't scrimp on getting these right.
3. Use multiple, concurrent mod_wsgi/Django to create the -- rare -- piece of dynamic HTML based on DB queries. Be sure that mod_wsgi is in daemon mode so that you can have multiple Django servers available to Apache. Build as many of these as you need. They're all identical, all in parallel, and all shared by Wackamole.
4. Use a single, fast database like MySQL for the few things that must come from a database. MySQL will make use of multiple cores on its server, so it will scale reasonably well without you having to do anything other than buy memory. Put this on a separate box, all by itself, dedicated and tuned for just this.

You'll find that this scales nicely. You'll find that the load is shared nicely between squid, apache, the Django daemons and the actual database. You'll also find that each part of the load (from the boring static parts to the interesting database query) happens separately and concurrently.

Finally, buy Schlossnagle's book. http://www.amazon.com/Scalable-Internet-Architectures-Theo-Schlossnagle/dp/067232699X

answered Oct 20 '22 00:10

S.Lott

Scaling reads to millions of users is not a database problem; it is solved with load balancing, caching, and so on (see S. Lott's answer above).

Scaling writes can indeed be a database problem. "Sharding" across multiple databases is one solution, but it is hard to do with SQL while still retaining the relational nature of the database. The popular alternatives are the new types of "NoSQL" databases. But if you really have those problems, then you need serious expert help, not just answers from dudes on Stack Overflow. :)
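To make the idea of sharding concrete, here is a minimal sketch that maps a user id to one of several databases with a stable hash. The shard aliases and the function name `shard_for` are invented for this example, and it assumes each alias would be defined as a separate connection in Django's `DATABASES`.

```python
import zlib

# Hypothetical shard aliases; each would be its own entry in DATABASES.
SHARDS = ["shard0", "shard1", "shard2", "shard3"]


def shard_for(user_id: int) -> str:
    """Map a user id to a shard alias using a hash that is stable across processes."""
    digest = zlib.crc32(str(user_id).encode("utf-8"))
    return SHARDS[digest % len(SHARDS)]
```

A query for one user's rows could then be written as `SomeModel.objects.using(shard_for(user_id))`, where `SomeModel` is a placeholder. The hard part mentioned above shows up right away: rows on different shards cannot be joined in SQL, and changing the number of shards moves most users to a different database.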

answered Oct 20 '22 01:10

Lennart Regebro

Some great answers already (S. Lott's, for example); however, I thought I should chime in with a few more things:

Make sure not to use the database for logical operations

I understand the attraction of ORDER BY or stored procedures; however, you have only one database but multiple Django servers, so let the servers handle this work if you can.

Of course, if you only want the last ten rows according to a certain criterion (a date, say), then by all means specify that in the query ;) Just make sure not to overload your database with operations that could be handled elsewhere.

Throw more hardware at the problem

MySQL and Oracle scale quite well with hardware; if you have a small performance problem, you could begin by adding more hardware.

Split your database

I know that relationships force you to keep some tables together. However, if you ever have a load problem, try to group your tables: for example, if you have a "history" group of tables, perhaps it could work without the others and live on a separate server.
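In Django, one way to try this kind of split is a database router keyed on the app label. The sketch below assumes the history tables live in an app labelled `history` and that a connection alias `history_db` exists in `DATABASES`; both names, and the class name, are hypothetical.

```python
class HistoryRouter:
    """Route the (hypothetical) 'history' app to its own database alias."""

    route_app_labels = {"history"}
    history_alias = "history_db"  # assumed alias in DATABASES

    def _is_history(self, model_or_obj):
        return model_or_obj._meta.app_label in self.route_app_labels

    def db_for_read(self, model, **hints):
        # Returning None lets Django fall through to the default database.
        return self.history_alias if self._is_history(model) else None

    def db_for_write(self, model, **hints):
        return self.history_alias if self._is_history(model) else None

    def allow_relation(self, obj1, obj2, **hints):
        # Forbid relations that would span the two servers.
        first, second = self._is_history(obj1), self._is_history(obj2)
        if first or second:
            return first == second
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        if app_label in self.route_app_labels:
            return db == self.history_alias
        if db == self.history_alias:
            return False
        return None
```

The `allow_relation` check reflects the caveat in the answer: once a group of tables sits on another server, foreign keys and joins between it and the rest no longer work, so the group has to be reasonably self-contained.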

Do consider tuning, and watch your queries and indexes

You would need expert advice here, but I can tell from experience that even a single badly tuned query can wreak havoc... and it's quite difficult to track down. The Ask Tom website, for example, has examples of diagnosis and fine-tuning.

Don't decide on your table architecture in isolation; consider the queries too

Hierarchical queries and multiple joins can be really costly. You don't have to build a fully normalized relational schema, and you may consider some denormalization to better accommodate the type of queries the database will face.

Just a couple of thoughts :)

answered Oct 20 '22 01:10

Matthieu M.

A few miscellaneous pieces of advice:

- I'm surprised no one's mentioned this yet. Use memcached. If you're getting a lot of repetitive types of queries (which most webapps do), this can make a huge difference.
- Consider using Oracle's failover and load balancing. It allows you to add support for multiple databases on a single db connection.
- Another thing to consider is using a system similar to FriendFeed's. This solves the problem of "how do we make changes to the database without halting the world?" more than anything else.
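The memcached suggestion usually comes down to a cache-aside pattern: look in the cache first and hit the database only on a miss. The sketch below uses a small in-memory stand-in so that it runs on its own; `InMemoryCache` and `cached_query` are names invented here. In a real project the `cache` argument would be something like Django's `django.core.cache.cache` configured with a memcached backend, which offers `get` and `set` calls of the same shape.

```python
import time


class InMemoryCache:
    """Tiny stand-in with get/set, mimicking a memcached-style client."""

    def __init__(self):
        self._store = {}

    def get(self, key, default=None):
        value, expires_at = self._store.get(key, (default, None))
        if expires_at is not None and time.monotonic() >= expires_at:
            # Entry has expired: drop it and report a miss.
            del self._store[key]
            return default
        return value

    def set(self, key, value, timeout=300):
        self._store[key] = (value, time.monotonic() + timeout)


def cached_query(cache, key, run_query, timeout=300):
    """Return the cached result for key, running the query only on a miss."""
    result = cache.get(key)
    if result is None:
        result = run_query()
        cache.set(key, result, timeout)
    return result
```

With this in place, a repeated call such as `cached_query(cache, "front_page", load_front_page)` (where `load_front_page` is a placeholder for the real query) touches the database once per timeout window instead of once per request. Note that a query whose legitimate result is `None` would be re-run every time in this simple version.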

answered Oct 20 '22 00:10

Jason Baker


Tags: python, web-services, django, scalability

TIMEX
