
Django databases and threads

One of my models has an update() method that updates a few fields and creates one object of another model. The problem is that the data used for the update is fetched from another host (a different one for each object), and that can take a moment: the host may be offline, and the timeout is set to 3 seconds. I need to update a couple of hundred objects 3-4 times per hour. Updating them one after another is not an option, because it could take all day. My first thought was to split the work across 50-100 threads, so that each one updates its own share of the objects. 99% of the update function's time is spent waiting for the server to respond (only a few bytes of data are transferred, so latency is the problem), so I don't think the CPU will be a bottleneck. I'm more worried about:

  • The Django ORM. Can it handle this? Fetching all the objects, splitting them up, and updating them from more than 50 threads?
  • Is this a good way to solve the problem? If it is, how do I do it without messing up the database? Or should I not worry about it with so few records?
  • If it isn't a good way, how do I do it right?
Kiro asked Aug 24 '12 15:08



2 Answers

You can run the updates from separate threads manually (e.g. with a Queue and a pool of executors), but note that Django's ORM keeps database connections in thread-local variables, so each new thread means a new connection to the database. Opening 50-100 connections for a single request is not a good idea; that is too many. On the other hand, you should also check your database's "bandwidth", i.e. how much concurrent load it can actually take.

Alexey Kachayev answered Nov 16 '22 07:11



Threads should work wonderfully for this kind of work. (@g19fanatic: the GIL is not going to be a problem, since these tasks are I/O-bound rather than CPU-bound. For the same reason, there is no point in using multiprocessing or worrying about the number of cores.)

The Django ORM can handle this, but depending on what you're doing you might need to use transactions. If you do, try not to hold a transaction open for the 3 seconds the remote call can take.
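
One way to keep the transaction short is to do the slow fetch first and open the transaction only for the writes. The sketch below uses the standard library's `sqlite3` instead of the Django ORM so that it runs on its own. The `device` and `log` tables and the `fetch_status` function are invented for the example. The same ordering (fetch outside, write inside) should carry over to Django's transaction handling:

```python
import sqlite3
import time


def fetch_status(host):
    """Hypothetical stand-in for the slow remote call (up to 3 s in the question)."""
    time.sleep(0.01)
    return "up"


def update_device(conn, device_id, host):
    # Slow part first: no transaction is open while we wait on the network.
    status = fetch_status(host)

    # Short transaction: "with conn" commits on success and rolls back on error.
    with conn:
        conn.execute(
            "UPDATE device SET status = ? WHERE id = ?", (status, device_id)
        )
        conn.execute(
            "INSERT INTO log (device_id, status) VALUES (?, ?)", (device_id, status)
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE device (id INTEGER PRIMARY KEY, status TEXT)")
    conn.execute("CREATE TABLE log (device_id INTEGER, status TEXT)")
    conn.execute("INSERT INTO device (id, status) VALUES (1, 'unknown')")
    conn.commit()
    update_device(conn, 1, "host-1")
    print(conn.execute("SELECT id, status FROM device").fetchall())
```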

Normally, I would suggest using Queue.Queue (queue.Queue in Python 3) and a producer/consumer pattern (e.g. the bottom of this page). However, since you know that the number of tasks is reasonable and each task takes a long time (up to 3 seconds), you might as well just spawn them all and let the OS figure it out :-)
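
For reference, a producer/consumer setup could look roughly like the sketch below (Python 3, standard library only). `fetch_and_update` is a hypothetical stand-in for the model's update() method, and the worker count of 20 is an arbitrary choice, not a recommendation:

```python
import queue
import threading


def fetch_and_update(obj_id):
    """Hypothetical stand-in for the slow per-object update()."""
    return obj_id * 2


def worker(tasks, results):
    # Consumer: pull ids until the None sentinel arrives.
    while True:
        obj_id = tasks.get()
        if obj_id is None:
            break
        try:
            results.put((obj_id, fetch_and_update(obj_id)))
        except Exception as exc:     # keep the worker alive if one host fails
            results.put((obj_id, exc))


def run_all(ids, num_workers=20):
    tasks, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for obj_id in ids:               # producer: enqueue the work
        tasks.put(obj_id)
    for _ in threads:                # one sentinel per worker
        tasks.put(None)
    for t in threads:
        t.join()

    out = {}
    while not results.empty():       # safe: all workers have finished
        obj_id, value = results.get()
        out[obj_id] = value
    return out


if __name__ == "__main__":
    print(len(run_all(range(1, 301))), "objects updated")
```

The "spawn them all" variant is the same idea with `num_workers` equal to the number of objects, at which point the queue is no longer really needed.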

thebjorn answered Nov 16 '22 08:11
