In one model I've got an update() method that updates a few fields and creates one object of another model. The problem is that the data used for the update is fetched from another host (unique for each object), and fetching can take a while (the host may be offline, and the timeout is set to 3 seconds). Now I need to update a couple of hundred objects, 3-4 times per hour - updating them one after another is not an option, because it could take all day. My first thought was to split the work across 50-100 threads so each one updates its own share of the objects. Since 99% of the time in update() is spent waiting for the server to respond (only a few bytes of data are transferred, so latency is the problem), I don't think the CPU will be a bottleneck. I'm more worried about:
You can perform these actions from different threads manually (e.g. with a Queue
and an executor pool), but note that Django's ORM manages database connections in thread-local variables. So each new thread means a new connection to the database (which is not a good idea for 50-100 threads per request - too many connections). You should also check how many simultaneous connections your database can handle.
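The point above can be sketched with a bounded thread pool: the pool size, not the number of objects, caps how many simultaneous database connections you open. This is a minimal sketch with stand-in names - `fetch_remote` and `update_one` are hypothetical placeholders for the real per-host fetch and ORM update; in actual Django code you would also call `django.db.connection.close()` at the end of each worker so thread-local connections don't accumulate.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_remote(obj_id):
    # Placeholder for the slow per-host network call (up to 3 s timeout).
    time.sleep(0.01)
    return {"id": obj_id, "status": "updated"}


def update_one(obj_id):
    data = fetch_remote(obj_id)
    # In Django, each thread opens its own thread-local DB connection;
    # call django.db.connection.close() here so connections are released.
    return data


def update_all(obj_ids, workers=20):
    # A bounded pool caps simultaneous workers (and thus DB connections)
    # at `workers`, instead of one per object.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(update_one, obj_ids))


results = update_all(range(100))
```

With a 3-second worst case per object and 20 workers, a few hundred objects finish in well under a minute instead of many minutes serially.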
Threads should work wonderfully for this kind of work. (@g19fanatic: the GIL is not going to be a problem, since these tasks are not CPU-bound - which is also why there is no point in using multiprocessing or worrying about the number of cores.)
The Django ORM can handle this, but depending on what you're doing you might need to use transactions - just try not to hold a transaction open for the full 3 seconds if you can avoid it (do the slow network fetch first, then open the transaction only for the database write).
Normally I would suggest using threading.Queue and a producer/consumer pattern (e.g. the example at the bottom of this page); however, since you know the number of tasks is reasonable and each task takes a long time (up to 3 seconds), you might as well just spawn them all and let the OS figure it out :-)
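For completeness, here is a minimal producer/consumer sketch with threading.Queue of the kind this answer refers to. `do_update` is a hypothetical stand-in for the real network-bound update; each worker pulls object IDs from the queue until it receives a `None` sentinel.

```python
import queue
import threading


def do_update(obj_id, results):
    # Placeholder for the real fetch-and-save work.
    results.append(obj_id * 2)


def worker(q, results):
    while True:
        obj_id = q.get()
        if obj_id is None:      # sentinel: no more work for this worker
            q.task_done()
            break
        do_update(obj_id, results)
        q.task_done()


def run(obj_ids, n_workers=4):
    q = queue.Queue()
    results = []                # list.append is thread-safe in CPython
    threads = [threading.Thread(target=worker, args=(q, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for obj_id in obj_ids:
        q.put(obj_id)
    for _ in threads:
        q.put(None)             # one sentinel per worker
    for t in threads:
        t.join()
    return results


results = run(range(10))
```

The queue version is worth the extra code mainly when the task list is large or produced incrementally; for a few hundred known tasks, spawning one thread each is fine, as noted above.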