Use the multiprocessing pool if your tasks are independent, meaning that each task does not depend on other tasks that could be executing at the same time. It may also mean that each task depends on no data other than what is provided to it via function arguments.
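As a minimal sketch of that pattern (square and the input range are just placeholders), each task depends only on its own argument:

import multiprocessing as mp

def square(x):
    # depends only on its argument, not on other tasks or shared state
    return x * x

if __name__ == '__main__':
    with mp.Pool() as pool:            # one worker per CPU by default
        results = pool.map(square, range(10))
    print(results)                     # [0, 1, 4, 9, ...]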
Pool is generally used for heterogeneous tasks, whereas multiprocessing.Process is generally used for homogeneous tasks. The Pool is designed to execute heterogeneous tasks, that is, tasks that do not resemble each other. For example, each task submitted to the process pool may use a different target function.
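For example, unrelated functions can be submitted to the same pool with apply_async; load and compute below are made-up placeholders:

import multiprocessing as mp

def load(path):
    return 'loaded %s' % path

def compute(x):
    return x * 2

if __name__ == '__main__':
    with mp.Pool(processes=2) as pool:
        # each submitted task can have a different target function
        r1 = pool.apply_async(load, ('data.csv',))
        r2 = pool.apply_async(compute, (21,))
        print(r1.get(), r2.get())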
The is_alive method determines whether the process is running. When we wait for the child process to finish with the join method, the process is already dead by the time we check it. If we comment out the join, the process is still alive.
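A small sketch of that behaviour (worker is a placeholder that just sleeps):

import multiprocessing as mp
import time

def worker():
    time.sleep(0.5)

if __name__ == '__main__':
    p = mp.Process(target=worker)
    p.start()
    print(p.is_alive())   # True: the child is still running
    p.join()              # wait for the child to finish
    print(p.is_alive())   # False: after join the process has exited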
So I have an algorithm I am writing, and the function multiprocess is supposed to call another function, CreateMatrixMp(), on as many processes as there are CPUs, in parallel. I have never done multiprocessing before, and cannot be certain which one of the below methods is more efficient. The word "efficient" is used in the context of the function CreateMatrixMp() potentially needing to be called thousands of times. I have read all of the documentation on the Python multiprocessing module, and have come to these two possibilities:
First is using the Pool class:
import multiprocessing as mp

def MatrixHelper(self, args):
    # unpack the argument tuple and forward it to the worker function
    return self.CreateMatrix(*args)

def Multiprocess(self, sigmaI, sigmaX):
    cpus = mp.cpu_count()
    print('Number of cpu\'s to process WM: %d' % cpus)
    poolCount = cpus * 2
    args = [(sigmaI, sigmaX, i) for i in range(self.numPixels)]
    pool = mp.Pool(processes=poolCount, maxtasksperchild=2)
    tempData = pool.map(self.MatrixHelper, args)
    pool.close()
    pool.join()
And next is using the Process class:
def Multiprocess(self, sigmaI, sigmaX):
    cpus = mp.cpu_count()
    print('Number of cpu\'s to process WM: %d' % cpus)
    # one Process object per pixel, all started at once
    processes = [mp.Process(target=self.CreateMatrixMp, args=(sigmaI, sigmaX, i))
                 for i in range(self.numPixels)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
Pool seems to be the better choice. I have read that it causes less overhead, and Process does not consider the number of CPUs on the machine. The only problem is that using Pool in this manner gives me error after error, and whenever I fix one, there is a new one underneath it. Process seems easier to implement, and for all I know it may be the better choice. What does your experience tell you?
If Pool should be used, then am I right in choosing map()? It would be preferred that order is maintained. I have tempData = pool.map(...) because the map function is supposed to return a list of the results of every process. I am not sure how Process handles its returned data.
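For what it is worth, a small sketch contrasting the two (work and work_to_queue are placeholder functions): pool.map does return results in the same order as the inputs, while a bare Process does not return the target's result at all and has to pass it back explicitly, for example through a Queue.

import multiprocessing as mp

def work(i):
    return i * i

def work_to_queue(i, queue):
    queue.put((i, i * i))

if __name__ == '__main__':
    # Pool.map gives back results in input order
    with mp.Pool() as pool:
        print(pool.map(work, range(5)))        # [0, 1, 4, 9, 16]

    # A Process has no return value; results come back via a Queue,
    # and their arrival order is not guaranteed
    queue = mp.Queue()
    procs = [mp.Process(target=work_to_queue, args=(i, queue)) for i in range(5)]
    for p in procs:
        p.start()
    results = [queue.get() for _ in procs]
    for p in procs:
        p.join()
    print(results)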