I have a simple function that go over a list of URLs, using GET
to retrieve some information and update the DB (PostgresSQL
) accordingly. The function works perfect. However, going over each URL one at a time talking too much time.
Using python, I'm able to do to following to parallel these tasks:
from multiprocessing import Pool
def updateDB(ip):
code goes here...
if __name__ == '__main__':
pool = Pool(processes=4) # process per core
pool.map(updateDB, ip)
This is working pretty well. However, I'm trying to find how do the same on django project. Currently I have a function (view) that go over each URL to get the information, and update the DB.
The only thing I could find is using Celery, but this seems to be a bit overpower for the simple task I want to perform.
Is there anything simple that i can do or do I have to use Celery?
Essentially Django serves WSGI request-response cycle which knows nothing of multiprocessing or background tasks.
In this example, at first we import the Process class then initiate Process object with the display() function. Then process is started with start() method and then complete the process with the join() method. We can also pass arguments to the function using args keyword.
The multiprocessing module has its own logger with the name “multiprocessing“. This logger is used within objects and functions within the multiprocessing module to log messages, such as debug messages that processes are running or have shutdown. We can get this logger and use it for logging.
multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads.
Currently I have a function (view) that go over each URL to get the information, and update the DB.
It means response time does not matter for you and instead of doing it in the background (asynchronously), you are OK with doing it in the foreground if your response time is cut by 4 (using 4 sub-processes/threads). If that is the case you can simply put your sample code in your view. Like
from multiprocessing import Pool
def updateDB(ip):
code goes here...
def my_view(request):
pool = Pool(processes=4) # process per core
pool.map(updateDB, ip)
return HttpResponse("SUCCESS")
But, if you want to do it asynchronously in the background then you should use Celery or follow one of @BasicWolf's suggestions.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With