How to use python multiprocessing module in django view

I have a simple function that goes over a list of URLs, using GET requests to retrieve some information and update the DB (PostgreSQL) accordingly. The function works perfectly. However, going over each URL one at a time takes too much time.

Using plain Python, I'm able to do the following to parallelize these tasks:

from multiprocessing import Pool

def updateDB(ip):
    # code goes here...
    pass

if __name__ == '__main__':
    ips = [...]                    # the list of URLs/IPs to process
    pool = Pool(processes=4)       # one worker process per core
    pool.map(updateDB, ips)

This works pretty well. However, I'm trying to find out how to do the same in a Django project. Currently I have a function (view) that goes over each URL to get the information and updates the DB.

The only thing I could find is Celery, but that seems like overkill for the simple task I want to perform.

Is there anything simpler I can do, or do I have to use Celery?

Asked Feb 29 '16 by Yakir Mordehay

People also ask

Does Django use multiprocessing?

Essentially, Django serves the WSGI request-response cycle, which knows nothing of multiprocessing or background tasks.

How do you use multiprocessing in Python?

In this example, we first import the Process class, then create a Process object with the display() function as its target. The process is launched with the start() method, and we wait for it to finish with the join() method. We can also pass arguments to the function using the args keyword.

Does logging work with multiprocessing?

The multiprocessing module has its own logger with the name "multiprocessing". This logger is used within the module's objects and functions to log messages, such as debug messages noting that processes are running or have shut down. We can get this logger and use it for our own logging.

What is multiprocess Library in Python?

multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads.


1 Answer

Currently I have a function (view) that goes over each URL to get the information and updates the DB.

That means response time does not matter to you; instead of doing the work in the background (asynchronously), you are OK with doing it in the foreground as long as the response time is cut by a factor of 4 (using 4 sub-processes/threads). If that is the case, you can simply put your sample code in your view, like this:

from multiprocessing import Pool

from django.http import HttpResponse

def updateDB(ip):
    # code goes here...
    pass

def my_view(request):
    ips = [...]                       # the list of URLs/IPs to process
    with Pool(processes=4) as pool:   # one worker process per core
        pool.map(updateDB, ips)
    return HttpResponse("SUCCESS")

But if you want to do it asynchronously in the background, then you should use Celery or follow one of @BasicWolf's suggestions.

Answered Oct 12 '22 by Muhammad Tahir