
Is there a way to speed up the authenticate function in django?

We're using django to make a json webservice front-end for mysql. We have apache and django running on an EC2 instance and MySQL running on an RDS instance. We've started benchmarking performance using apache bench and got some really poor performance numbers. We also noticed that while running the tests, our apache/django instance goes to 100% cpu usage at very low load and the MySQL instance never gets above 2% cpu usage.

We're trying to make sense of this and isolate the problem, so we did several ab tests:

  1. A request for a static html page from apache -- ~2000 requests/second.
  2. A request that executes a small python function in django, and no db interaction -- ~1000 requests/second.
  3. A request that executes one of our django webservice functions that calls authenticate and then does a very simple query to fetch one record from a table -- 11 requests/second.
  4. Same as 3, but with the call to authenticate commented out -- 95 requests/second.

Why is authenticate so slow? Is it writing data to the db, finding a billion digits of pi, what?

We would like to keep the call to authenticate in these functions, because we don't want to leave them open to anyone that can guess the url, etc. Has anyone here noticed that authenticate is slow, and can anyone suggest a way to remedy it?

Thank you very much!

HansG600 asked Jan 22 '13 23:01




1 Answer

I am no expert in authentication and security but the following are some ideas as to why this might be happening and possibly how you can increase the performance somewhat.

Passwords are stored in the db, so to keep their storage secure, plaintext passwords are not stored; their hashes are stored instead. This way you can still validate a user logging in by comparing the hash computed from the typed password to the one stored in the db. This increases security: if a malicious party gets a copy of the db, the only way to recover the plaintext passwords is by either using rainbow tables or doing a brute-force attack.

This is where things get interesting. Computers keep getting faster (Moore's Law), so computing hash functions keeps getting cheaper, especially quick hash functions like MD5 or SHA1. This poses a problem because, with the computing power available today and a fast hash function, attackers can brute-force hashed passwords relatively easily. To combat this, two things can be done. One is to loop the hash function many times (the output of the hash is fed back into the hash), which multiplies the cost of every guess by the number of iterations. The other is to use a hash function that is itself designed to be computationally expensive. Either way, the hash takes more time to compute. Even if it takes a second to compute, it is not a big deal for end users, who log in once, but it is a big deal for a brute-force attack, because millions of hashes have to be computed. That's why, starting with Django 1.4, the default is a deliberately expensive function called PBKDF2, which takes the first approach and applies the underlying hash thousands of times.
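To get a feel for the difference in cost, here is a rough sketch that times a single round of SHA1 against PBKDF2. It uses Python 3's standard `hashlib` rather than Django's own hasher classes, and the password, salt size, and iteration count of 10,000 are assumptions chosen only for illustration. The absolute numbers will vary by machine; only the ratio is of interest.

```python
import hashlib
import os
import time

password = b"correct horse battery staple"  # illustrative password
salt = os.urandom(16)
ITERATIONS = 10_000  # assumed iteration count, chosen only for illustration


def time_call(func, repeats=20):
    """Return the average wall-clock seconds per call of func."""
    start = time.perf_counter()
    for _ in range(repeats):
        func()
    return (time.perf_counter() - start) / repeats


def fast_hash():
    # One round of a fast hash: cheap for the server, but also for an attacker.
    return hashlib.sha1(salt + password).digest()


def slow_hash():
    # PBKDF2 repeats HMAC-SHA256 ITERATIONS times, multiplying the cost per guess.
    return hashlib.pbkdf2_hmac("sha256", password, salt, ITERATIONS)


fast_seconds = time_call(fast_hash)
slow_seconds = time_call(slow_hash)
print(f"single SHA1: {fast_seconds * 1e6:.1f} microseconds per hash")
print(f"PBKDF2:      {slow_seconds * 1e6:.1f} microseconds per hash")
```

On any machine the PBKDF2 line should come out several orders of magnitude slower than the single SHA1 round, which is exactly the property that makes brute-forcing expensive.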

To get back to your question: it is because of this function that, when you enable authentication, your benchmark numbers drop drastically and your CPU usage goes up.
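As a back-of-the-envelope check, the throughput figures from the question can be turned into a per-request cost. This sketch assumes requests are processed one at a time on a saturated CPU, so time per request is simply the inverse of throughput; with several worker processes the absolute figures would differ.

```python
# Throughput figures reported in the question (requests per second).
with_authenticate = 11.0
without_authenticate = 95.0

# Assumption: requests are handled one at a time by a saturated CPU,
# so the time per request is simply 1 / throughput.
seconds_with = 1.0 / with_authenticate
seconds_without = 1.0 / without_authenticate
authenticate_cost = seconds_with - seconds_without

print(f"per request with authenticate:    {seconds_with * 1000:.1f} ms")
print(f"per request without authenticate: {seconds_without * 1000:.1f} ms")
print(f"estimated cost of authenticate:   {authenticate_cost * 1000:.1f} ms")
```

Under that assumption, authenticate accounts for roughly 80 ms of each 91 ms request, which is consistent with a deliberately slow password hash dominating the CPU time.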

Here are some ways you can increase the performance.

  • Starting with Django 1.4, you can change the default password hashing function via the PASSWORD_HASHERS setting (docs). If you don't need much security, you can change the default function to be either SHA1 or MD5. This should increase the performance, but keep in mind that the security will be much weaker. My personal opinion is that security is important and is worth the extra time, but if it is not warranted in your application, it's something you might want to consider.
  • Use sessions. The expensive hash function is only computed on the initial login. Once the user logs in, a session is created for that user and a cookie with the session id is sent back. Then on subsequent requests, the client sends the cookie and, if the session has not expired yet, the user is automatically authenticated (don't worry about security since session data is signed...). The point is that verifying a session is A LOT less computationally expensive than computing the expensive hash function. I guess that in your ab tests you did not send a session cookie. Try some tests that also send a session cookie and see how it performs. If sending cookies is not really an option since you are making a JSON API, you can modify the session back-end to accept the session id via a GET parameter instead of a cookie. I am not sure, however, what the security ramifications of doing that are.
  • Switch to nginx. I am not an expert in deployment, but in my experience nginx is much faster and more friendly to Django than Apache. One advantage which I think might be of particular interest to you is nginx's ability to have multiple worker processes and to use proxy_pass to hand off requests to Django process(es). If you have multiple worker processes, you can point each worker to a separate Django process via proxy_pass, which will effectively add multiprocessing to Django. Another alternative is to use something like the gevent WSGI server, which lets you make a pool in the Django process and might also increase performance. I am not sure if any of these will increase your performance drastically since your CPU load is already at 100%, but it might be something to look into.
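The idea behind the sessions suggestion, paying for the expensive hash once and then verifying something cheap on later requests, can be sketched with the standard library alone. This is not how Django implements sessions; it is a minimal illustration, and the names `SECRET_KEY`, `USERS`, `login`, and `user_from_token`, as well as the iteration count, are hypothetical.

```python
import hashlib
import hmac
import os

SECRET_KEY = os.urandom(32)  # hypothetical server-side signing key
ITERATIONS = 10_000          # assumed PBKDF2 iteration count

# Hypothetical credential store: username -> (salt, PBKDF2 hash of the password).
_salt = os.urandom(16)
USERS = {"alice": (_salt, hashlib.pbkdf2_hmac("sha256", b"s3cret", _salt, ITERATIONS))}


def sign(value):
    """Return a hex HMAC-SHA256 signature of value under SECRET_KEY."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()


def login(username, password):
    """Expensive path: run PBKDF2 once, then hand back a signed token."""
    record = USERS.get(username)
    if record is None:
        return None
    salt, expected = record
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    if not hmac.compare_digest(candidate, expected):
        return None
    return f"{username}:{sign(username)}"


def user_from_token(token):
    """Cheap path: a single HMAC check, with no password hashing involved."""
    username, _, signature = token.partition(":")
    if hmac.compare_digest(signature, sign(username)):
        return username
    return None


token = login("alice", "s3cret")         # pay the PBKDF2 cost once
print(user_from_token(token))            # later requests only verify the token
print(user_from_token("alice:forged"))   # a tampered token is rejected
```

In a Django view the equivalent would be to call authenticate() and login() once in a login endpoint and let the session middleware identify the user on later requests, so the benchmark would need to send the session cookie to measure the cheap path.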
miki725 answered Nov 15 '22 16:11
