
Using the multiprocessing module

I'm trying to use the multiprocessing module in Python 2.6, but there is apparently something I don't understand. I expect the class below to add up the numbers sent to it by add() and return the sum from get_result(). Instead, the code prints "0"; I'd like it to print "2". What have I missed?

import multiprocessing

class AdderProcess(multiprocessing.Process):

    def __init__(self):
        multiprocessing.Process.__init__(self)
        self.sum = 0
        self.queue = multiprocessing.JoinableQueue(5)
        self.daemon = True
        self.start()

    def run(self):
        while True:
            number = self.queue.get()
            self.sum += number
            self.queue.task_done()

    def add(self, number):
        self.queue.put(number)

    def get_result(self):
        self.queue.join()
        return self.sum


p = AdderProcess()
p.add(1)
p.add(1)
print p.get_result()

PS. This problem has been solved. Thanks for the answers! Just to make it easier for any readers, here's the complete working version:

import multiprocessing

class AdderProcess(multiprocessing.Process):

    def __init__(self):
        multiprocessing.Process.__init__(self)
        self.sum = multiprocessing.Value('d', 0.0)
        self.queue = multiprocessing.JoinableQueue(5)
        self.daemon = True
        self.start()

    def run(self):
        while True:
            number = self.queue.get()
            self.sum.value += number
            self.queue.task_done()

    def add(self, number):
        self.queue.put(number)

    def get_result(self):
        self.queue.join()
        return self.sum.value

p = AdderProcess()
p.add(1)
p.add(1)
print p.get_result()
Mats Ekberg asked Oct 15 '11 20:10




2 Answers

Change self.sum = 0 to self.sum = multiprocessing.Value('d', 0.0), and use self.sum.value to access or change the value.

class AdderProcess(multiprocessing.Process):    
    def __init__(self):
        ...
        self.sum = multiprocessing.Value('d', 0.0) 
        ...
    def run(self):
        while True:
            number = self.queue.get()
            self.sum.value += number    # <-- use self.sum.value
            self.queue.task_done()
    def get_result(self):
        self.queue.join()
        return self.sum.value           # <-- use self.sum.value

The problem is this: once you call self.start() in __init__, the main process forks off a child process, and every value is copied into it. There are now two independent copies of p. In the child process, the run method is called and the child's p.sum is incremented to 2. The main process's copy is never touched, so when the main process calls p.get_result(), its p.sum is still 0, and 0 is printed.
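
The copy-on-fork behaviour can be seen in isolation with a rough sketch like the one below. It is not the asker's code: Counter, bump and parent_and_child_totals are hypothetical names, it is written for Python 3 rather than 2.6, and it assumes a POSIX system where the "fork" start method is available.

```python
import multiprocessing

# Assumption: a POSIX system, where the "fork" start method exists.
ctx = multiprocessing.get_context("fork")


class Counter(object):
    """Hypothetical helper holding a plain (unshared) attribute."""

    def __init__(self):
        self.total = 0


def bump(counter, done):
    # Runs in the child: this changes only the child's copy of `counter`.
    counter.total += 2
    # Report what the child sees back to the parent.
    done.put(counter.total)


def parent_and_child_totals():
    counter = Counter()
    done = ctx.Queue()
    child = ctx.Process(target=bump, args=(counter, done))
    child.start()
    child_total = done.get()  # read before join() so the child can exit
    child.join()
    # The parent's object was never modified.
    return counter.total, child_total


if __name__ == "__main__":
    print(parent_and_child_totals())  # (0, 2)
```

The parent still sees 0 while the child reports 2, which is the same situation as in the question.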

When you want to share a float value between processes, you need to use a sharing mechanism, such as multiprocessing.Value.
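
One caveat: `+=` on a Value is a read followed by a write, so it is not atomic. With a single child doing all the writing, as in the question, that does not matter. If several processes were to add to the same Value, holding its lock around the update would be the safer pattern. A minimal sketch, assuming Python 3 and a POSIX "fork" start method; add_many and locked_sum are hypothetical names:

```python
import multiprocessing

# Assumption: a POSIX system, where the "fork" start method exists.
ctx = multiprocessing.get_context("fork")


def add_many(total, amount, times):
    # Runs in each worker process.
    for _ in range(times):
        # get_lock() returns the lock guarding the shared value;
        # holding it makes the read-modify-write safe.
        with total.get_lock():
            total.value += amount


def locked_sum(workers=4, times=1000):
    total = ctx.Value('d', 0.0)  # shared double, created with a lock by default
    procs = [ctx.Process(target=add_many, args=(total, 1.0, times))
             for _ in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return total.value


if __name__ == "__main__":
    print(locked_sum())  # 4000.0
```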

See "Sharing state between processes" for more options on how to share values.
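
One of the other options described there is a manager, which keeps the data in a server process and hands out proxies to it. The sketch below is only an illustration of that idea, not a drop-in replacement for AdderProcess: add_all and managed_sum are hypothetical names, and it assumes Python 3 with the "fork" start method.

```python
import multiprocessing

# Assumption: a POSIX system, where the "fork" start method exists.
ctx = multiprocessing.get_context("fork")


def add_all(shared, numbers):
    # Runs in the child; `shared` is a proxy to a dict that lives
    # in the manager's server process.
    for n in numbers:
        shared["sum"] = shared["sum"] + n


def managed_sum(numbers):
    # ctx.Manager() starts the server process; leaving the
    # `with` block shuts it down again.
    with ctx.Manager() as manager:
        shared = manager.dict()
        shared["sum"] = 0
        child = ctx.Process(target=add_all, args=(shared, numbers))
        child.start()
        child.join()
        return shared["sum"]


if __name__ == "__main__":
    print(managed_sum([1, 1]))  # 2
```

A manager is slower than shared memory such as Value, but it can hold ordinary Python objects like dicts and lists.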

unutbu answered Sep 21 '22 02:09


self.sum is 2, but only in the child process:

def run(self):
    while True:
        number = self.queue.get()
        print "got %s from queue" % number
        print "Before adding - self.sum = %d" % self.sum
        self.sum += number
        print "After adding - self.sum = %d" % self.sum
        self.queue.task_done()

[ 13:56 jon@host ~ ]$ ./mp.py
got 1 from queue
Before adding - self.sum = 0
After adding - self.sum = 1
got 1 from queue
Before adding - self.sum = 1
After adding - self.sum = 2

See section 16.3.1.4, "Sharing state between processes", of the multiprocessing documentation for how to make self.sum the same in different processes.

chown answered Sep 21 '22 02:09