Use the multiprocessing pool if your tasks are independent, meaning that each task does not depend on other tasks that could be executing at the same time. It may also mean that each task depends on no data other than what is provided to it via function arguments.
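As a minimal sketch of that pattern (square and the input range are just placeholders), each task depends only on its own argument:

import multiprocessing as mp

def square(x):
    # depends only on its argument, not on other tasks or shared state
    return x * x

if __name__ == '__main__':
    with mp.Pool() as pool:            # one worker per CPU by default
        results = pool.map(square, range(10))
    print(results)                     # [0, 1, 4, 9, ...]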
Pool is generally used for heterogeneous tasks, whereas multiprocessing.Process is generally used for homogeneous tasks. The Pool is designed to execute heterogeneous tasks, that is, tasks that do not resemble each other. For example, each task submitted to the process pool may use a different target function.
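For example, unrelated functions can be submitted to the same pool with apply_async; load and compute below are made-up placeholders:

import multiprocessing as mp

def load(path):
    return 'loaded %s' % path

def compute(x):
    return x * 2

if __name__ == '__main__':
    with mp.Pool(processes=2) as pool:
        # each submitted task can have a different target function
        r1 = pool.apply_async(load, ('data.csv',))
        r2 = pool.apply_async(compute, (21,))
        print(r1.get(), r2.get())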
The is_alive method determines whether the process is running. When we wait for the child process to finish with the join method, the process is already dead by the time we check it. If we comment out the join, the process is still alive.
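A small sketch of that behaviour (worker is a placeholder that just sleeps):

import multiprocessing as mp
import time

def worker():
    time.sleep(0.5)

if __name__ == '__main__':
    p = mp.Process(target=worker)
    p.start()
    print(p.is_alive())   # True: the child is still running
    p.join()              # wait for the child to finish
    print(p.is_alive())   # False: after join the process has exited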
So I have an algorithm I am writing, and the function multiprocess is supposed to call another function, CreateMatrixMp(), on as many processes as there are CPUs, in parallel. I have never done multiprocessing before, and cannot be certain which one of the below methods is more efficient. The word "efficient" is used in the context of the function CreateMatrixMp() potentially needing to be called thousands of times. I have read all of the documentation on the Python multiprocessing module, and have come to these two possibilities:
First is using the Pool class:
import multiprocessing as mp

def MatrixHelper(self, args):
    # unpack the argument tuple and forward it to the worker function
    return self.CreateMatrix(*args)

def Multiprocess(self, sigmaI, sigmaX):
    cpus = mp.cpu_count()
    print('Number of cpu\'s to process WM: %d' % cpus)
    poolCount = cpus * 2
    args = [(sigmaI, sigmaX, i) for i in range(self.numPixels)]
    pool = mp.Pool(processes=poolCount, maxtasksperchild=2)
    tempData = pool.map(self.MatrixHelper, args)
    pool.close()
    pool.join()
And next is using the Process class:
def Multiprocess(self, sigmaI, sigmaX):
    cpus = mp.cpu_count()
    print('Number of cpu\'s to process WM: %d' % cpus)
    # one Process object per pixel, all started at once
    processes = [mp.Process(target=self.CreateMatrixMp, args=(sigmaI, sigmaX, i))
                 for i in range(self.numPixels)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
Pool seems to be the better choice. I have read that it causes less overhead, and Process does not consider the number of CPUs on the machine. The only problem is that using Pool in this manner gives me error after error, and whenever I fix one, there is a new one underneath it. Process seems easier to implement, and for all I know it may be the better choice. What does your experience tell you?
If Pool should be used, then am I right in choosing map()? It would be preferred that order is maintained. I have tempData = pool.map(...) because the map function is supposed to return a list of the results of every process. I am not sure how Process handles its returned data.
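For what it is worth, a small sketch contrasting the two (work and work_to_queue are placeholder functions): pool.map does return results in the same order as the inputs, while a bare Process does not return the target's result at all and has to pass it back explicitly, for example through a Queue.

import multiprocessing as mp

def work(i):
    return i * i

def work_to_queue(i, queue):
    queue.put((i, i * i))

if __name__ == '__main__':
    # Pool.map gives back results in input order
    with mp.Pool() as pool:
        print(pool.map(work, range(5)))        # [0, 1, 4, 9, 16]

    # A Process has no return value; results come back via a Queue,
    # and their arrival order is not guaranteed
    queue = mp.Queue()
    procs = [mp.Process(target=work_to_queue, args=(i, queue)) for i in range(5)]
    for p in procs:
        p.start()
    results = [queue.get() for _ in procs]
    for p in procs:
        p.join()
    print(results)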