 

Python Multiprocessing - Just not getting it

I've been spending some time trying to understand multiprocessing, though its finer points evade my untrained mind. I've been able to get a pool to return a simple integer, but when the function doesn't just return a result, as in all of the examples I can find, I'm lost (even the documentation uses some obscure example I can't quite understand).

Here is an example I'm trying to get working, but it doesn't behave as intended, and I'm sure there's a simple reason why. I may need to use a queue or shared memory or a manager, but however many times I read the documentation, I can't wrap my brain around what it actually means and what it does. So far, the only thing I understand is the pool function.

Also, I'm using a class as I need to avoid using global variables as in this question's answer.

import random

class thisClass:
    def __init__(self):
        self.i = 0

def countSixes(myClassObject):
    newNum = random.randrange(0,10)
    #print(newNum) #this proves the function is being run if enabled
    if newNum == 6:
        myClassObject.i += 1

if __name__ == '__main__':
    import multiprocessing
    pool = multiprocessing.Pool(1) #use one core for now

    counter = thisClass()

    myList = list(range(1000))

    #it must be (args,) instead of just i, apparently
    async_results = [pool.apply_async(countSixes, (counter,)) for i in myList]

    for x in async_results:
        x.get(timeout=1)

    print(counter.i)

Can someone explain in dumb-dumb what needs to be done so I can finally understand what I'm missing and what it does?

asked Jun 15 '11 at 15:06 by squid808


1 Answer

It took me a while to understand what you want to happen. The problem has to do with the way multiprocessing works. Basically, you need to write your program in a functional style, instead of relying on side-effects as you do now.

Right now, you're sending objects out to your pool to be modified and returning nothing from countSixes. That won't work with multiprocessing: to sidestep the GIL, multiprocessing runs each task in a separate worker process with its own interpreter, so the arguments are pickled and sent across, and the worker receives a copy of counter. When you increment i, you're incrementing the i of that copy, and then, because you return nothing, the copy is simply discarded!
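A minimal sketch that makes the copying visible. The `Counter` class and `increment` function are hypothetical stand-ins for the question's `thisClass` and `countSixes`: each task mutates its own unpickled copy and returns the copy's value, while the parent's object is never touched.

```python
import multiprocessing


class Counter:
    """Hypothetical stand-in for the question's thisClass: holds one integer."""
    def __init__(self):
        self.i = 0


def increment(counter):
    # Runs in a worker process on an unpickled *copy* of the argument.
    counter.i += 1
    return counter.i  # only the return value travels back to the parent


def main():
    counter = Counter()
    with multiprocessing.Pool(1) as pool:
        results = [pool.apply_async(increment, (counter,)) for _ in range(3)]
        returned = [r.get(timeout=10) for r in results]
    # Every task started from a fresh copy with i == 0, so each returns 1,
    # and the parent's counter.i is still 0.
    return counter.i, returned


if __name__ == "__main__":
    print(main())  # (0, [1, 1, 1])
```

The only way information gets back is through the return value, which is why the fix below returns something from the worker function.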

To do something useful, you have to return something from countSixes. Here's a simplified version of your code that does something similar to what you want. I left an argument in, just to show what you ought to be doing, but really this could be done with a zero-arg function.

import random

def countSixes(start):
    newNum = random.randrange(0,10)
    if newNum == 6:
        return start + 1
    else:
        return start

if __name__ == '__main__':
    import multiprocessing
    pool = multiprocessing.Pool(1) #use one core for now

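    start = 0
    # each task returns start or start + 1 (here 0 or 1) back to the parent
    async_results = [pool.apply_async(countSixes, (start,)) for _ in range(1000)]

    # summing the returned values counts the sixes
    print(sum(r.get() for r in async_results))
    pool.close()
    pool.join()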
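If you really do need workers to update one shared object (the "manager" route mentioned in the question), here is a minimal sketch assuming `multiprocessing.Manager`: the manager process owns the counter, and workers receive picklable proxies to it instead of copies. The names `count_six` and `main` and the choice of a manager lock are illustrative, not part of the answer above. Expect it to be slower than returning values, since every update is a round trip to the manager process.

```python
import multiprocessing
import random


def count_six(counter, lock):
    # counter and lock are proxies to objects living in the manager process,
    # so updates made here are visible to the parent.
    if random.randrange(0, 10) == 6:
        # `counter.value += 1` is a separate read and write through the proxy,
        # so hold the lock to keep concurrent workers from losing updates.
        with lock:
            counter.value += 1


def main(n_rolls=1000, n_workers=2):
    with multiprocessing.Manager() as manager:
        counter = manager.Value('i', 0)
        lock = manager.Lock()
        with multiprocessing.Pool(n_workers) as pool:
            results = [pool.apply_async(count_six, (counter, lock))
                       for _ in range(n_rolls)]
            for r in results:
                r.get(timeout=10)  # wait, and re-raise any worker exception
        # read the value before the manager shuts down
        return counter.value


if __name__ == "__main__":
    print(main())
```

For 1000 rolls the result averages about 100. The functional version above is still the simpler choice; shared state is worth it only when workers genuinely must see each other's updates.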
answered Sep 18 '22 at 01:09 by senderle