Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Mutliprocessing Queue vs. Pool

I'm having the hardest time trying to figure out the difference in usage between multiprocessing.Pool and multiprocessing.Queue.

To help, this is bit of code is a barebones example of what I'm trying to do.

def update():
    def _hold(url):
        soup = BeautifulSoup(url)
        return soup
    def _queue(url):
        soup = BeautifulSoup(url)
        li = [l for l in soup.find('li')]
        return True if li else False

    url = 'www.ur_url_here.org'
    _hold(url)
    _queue(url)

I'm trying to run _hold() and _queue() at the same time. I'm not trying to have them communicate with each other so there is no need for a Pipe. update() is called every 5 seconds.

I can't really rap my head around the difference between creating a pool of workers, or creating a queue of functions. Can anyone assist me?

The real _hold() and _queue() functions are much more elaborate than the example so concurrent execution actually is necessary, I just thought this example would suffice for asking the question.

like image 250
aseylys Avatar asked Feb 23 '18 01:02

aseylys


1 Answers

The Pool and the Queue belong to two different levels of abstraction.

The Pool of Workers is a concurrent design paradigm which aims to abstract a lot of logic you would otherwise need to implement yourself when using processes and queues.

The multiprocessing.Pool actually uses a Queue internally for operating.

If your problem is simple enough, you can easily rely on a Pool. In more complex cases, you might need to deal with processes and queues yourself.

For your specific example, the following code should do the trick.

def hold(url):
    ...
    return soup

def queue(url):
    ...
    return bool(li)

def update(url):
    with multiprocessing.Pool(2) as pool:
        hold_job = pool.apply_async(hold, args=[url])
        queue_job = pool.apply_async(queue, args=[url])

        # block until hold_job is done
        soup = hold_job.get()
        # block until queue_job is done
        li = queue_job.get()

I'd also recommend you to take a look at the concurrent.futures module. As the name suggest, that is the future proof implementation for the Pool of Workers paradigm in Python.

You can easily re-write the example above with that library as what really changes is just the API names.

like image 129
noxdafox Avatar answered Sep 24 '22 13:09

noxdafox