
Python multiprocessing seems near impossible to do within classes/using any class instances. What is its intended use?

I have an algorithm that I am trying to parallelize because of very long run times in serial. However, the function that needs to be parallelized is inside a class. multiprocessing.Pool seems to be the best and fastest way to do this, but there is a problem: its target function cannot be a function of an object instance. Meaning this: you declare a Pool in the following way:

import multiprocessing as mp
cpus = mp.cpu_count()
poolCount = cpus*2
pool = mp.Pool(processes = poolCount, maxtasksperchild = 2)

And then actually use it like so:

pool.map(self.TargetFunction, args)

But this throws an error, because object instances cannot be pickled, and pickling is what Pool does to pass information to all of its child processes. But I have to use self.TargetFunction.
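
Pool pickles both the target callable and each argument before sending them to the workers, and that is easy to check directly. A quick, hedged sketch (on Python 2, which this 2015 question appears to be using, bound methods are not picklable; on Python 3 they generally are, provided the instance itself pickles cleanly):

import pickle

def plain_function(x):
    return x * 2

class Worker:
    def method(self, x):
        return x * 2

pickle.dumps(plain_function)       # fine: pickled by reference to its module
try:
    pickle.dumps(Worker().method)  # raises on Python 2; usually fine on Python 3
    print('bound method pickled (typical Python 3 behaviour)')
except (pickle.PicklingError, TypeError) as e:
    print('cannot pickle bound method:', e)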

So I had an idea: I would create a new Python file named parallel, write a couple of functions in it without putting them in a class, and call those functions from within my original class (the one whose function I want to parallelize).

So I tried this:

import multiprocessing as mp

def MatrixHelper(args):
    # Each element of args is (WM, sigmaI, sigmaX, i): unpack the instance
    # and call its method with the remaining arguments (computed once).
    WM = args[0]
    result = WM.CreateMatrixMp(*args[1:])
    print(result)
    return result

def Start(sigmaI, sigmaX, numPixels, WM):

    cpus = mp.cpu_count()
    poolCount = cpus * 2
    args = [(WM, sigmaI, sigmaX, i) for i in range(numPixels)]
    print('Number of CPUs to process WM: %d' % cpus)

    pool = mp.Pool(processes = poolCount, maxtasksperchild = 2)
    tempData = pool.map(MatrixHelper, args)
    pool.close()
    pool.join()

    return tempData

These functions are not part of a class, so using MatrixHelper in Pool's map function works fine. But I realized while doing this that there was no way out. The function in need of parallelization (CreateMatrixMp) expects an object to be passed to it (it is declared as def CreateMatrixMp(self, sigmaI, sigmaX, i)).

Since it is not being called from within its class, it doesn't get a self passed to it. To solve this, I passed the Start function the calling object itself. As in, I call parallel.Start(sigmaI, sigmaX, self.numPixels, self). The object self then becomes WM, so that I am finally able to call the desired function as WM.CreateMatrixMp().
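
For context, a minimal sketch of how that call would look from inside the class; the WeightMatrix name and the CreateMatrix wrapper are assumptions made for illustration, not the actual code:

import parallel  # the helper module described above

class WeightMatrix:
    def __init__(self, numPixels):
        self.numPixels = numPixels

    def CreateMatrixMp(self, sigmaI, sigmaX, i):
        ...  # the per-pixel computation to be parallelized

    def CreateMatrix(self, sigmaI, sigmaX):
        # Pass the instance itself; inside parallel.Start it becomes WM.
        return parallel.Start(sigmaI, sigmaX, self.numPixels, self)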

I'm sure that's a very sloppy way of coding, but I just wanted to see if it would work. But nope, pickling error again: the map function cannot handle any object instances at all.

So my question is: why is it designed this way? It seems useless; it seems to be completely dysfunctional in any program that uses classes at all.

I tried using Process rather than Pool, but this requires the array that I am ultimately writing to to be shared, which requires processes to wait for each other. If I don't want it to be shared, then I have each process write its own smaller array and do one big write at the end. But both of these result in slower run times than when I was doing this serially! Python's built-in multiprocessing seems absolutely useless!
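
For what it's worth, the "each process writes its own smaller array, then one big write at the end" idea maps naturally onto Pool.map over index chunks. A minimal, self-contained sketch (compute_chunk is a placeholder, not the real per-pixel work):

import multiprocessing as mp

def compute_chunk(bounds):
    start, stop = bounds
    return [i * i for i in range(start, stop)]   # placeholder computation

if __name__ == '__main__':
    numPixels, workers = 1000, mp.cpu_count()
    step = max(1, numPixels // workers)
    chunks = [(i, min(i + step, numPixels)) for i in range(0, numPixels, step)]
    pool = mp.Pool(processes = workers)
    pieces = pool.map(compute_chunk, chunks)      # each worker fills its own piece
    pool.close()
    pool.join()
    result = [x for piece in pieces for x in piece]  # one big "write" at the end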

Can someone please give me some guidance as to how to actually save time with multiprocessing, in the context of my target function being inside a class? I have read in posts here that I should use pathos.multiprocessing instead, but I am on Windows and am working on this project with multiple people who all have different setups. Having everyone try to install it would be inconvenient.

asked Jul 30 '15 16:07 by pretzlstyle


People also ask

What is multiprocessing used for in Python?

multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads.
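
A minimal sketch of that API, spawning a single child process:

import multiprocessing as mp

def greet(name):
    print('hello from', name)

if __name__ == '__main__':
    p = mp.Process(target=greet, args=('a child process',))
    p.start()
    p.join()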

Is multiprocessing possible in Python?

The Python language allows for something called multiprocessing, a term that describes the act of running many processes simultaneously. With it, you can write a program and assign many tasks to be completed at the same time, saving time and energy.

How does multiprocessing queue work in Python?

multiprocessing.Queue provides a first-in, first-out (FIFO) queue, which means that items are retrieved from the queue in the order they were added. The first items added to the queue will be the first items retrieved.
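
A minimal sketch of that FIFO behaviour:

import multiprocessing as mp

if __name__ == '__main__':
    q = mp.Queue()
    for item in ('first', 'second', 'third'):
        q.put(item)
    print(q.get())  # 'first'  -- retrieved in the order they were added
    print(q.get())  # 'second'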

Does Python multiprocessing use multiple cores?

Because of the GIL, a single Python process typically executes Python bytecode on only one core at a time. Despite the GIL, libraries that perform computationally heavy tasks like numpy, scipy and pytorch utilise C-based implementations under the hood, allowing the use of multiple cores.
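
A minimal sketch of spreading CPU-bound work across cores with a process pool (each worker is a separate process with its own GIL):

import multiprocessing as mp

def burn(n):
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    pool = mp.Pool(processes = mp.cpu_count())
    print(pool.map(burn, [10 ** 6] * 8))   # CPU-bound tasks run in parallel
    pool.close()
    pool.join()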

Does Python multiprocessing use shared memory?

torch.multiprocessing is a drop-in replacement for Python's multiprocessing module. It supports the exact same operations, but extends it so that all tensors sent through a multiprocessing.Queue will have their data moved into shared memory and will only send a handle to another process.
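
The snippet above refers to the torch.multiprocessing extension; with the standard library alone, shared memory is available through primitives such as multiprocessing.Array. A minimal sketch:

import multiprocessing as mp

def fill(shared, offset):
    # Each child writes its own slice of the shared array in place.
    for i in range(offset, offset + 5):
        shared[i] = i * i

if __name__ == '__main__':
    arr = mp.Array('d', 10)   # ten doubles backed by shared memory
    workers = [mp.Process(target=fill, args=(arr, off)) for off in (0, 5)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(list(arr))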


1 Answer

I was having a similar issue trying to use multiprocessing within a class. I was able to solve it with a relatively easy workaround I found online. Basically, you use a function outside of your class that unwraps/unpacks the method inside your class that you're trying to parallelize. Here are the two websites I found that explain how to do it; a sketch of the pattern follows the links.

Website 1 (joblib example)

Website 2 (multiprocessing module example)
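
Putting that workaround into code, here is a minimal sketch using the standard multiprocessing module; the class and method names echo the question and are assumptions, not the answerer's actual code:

import multiprocessing as mp

def _unwrap_create_matrix(args):
    # Module-level, so Pool can pickle it; unpack the instance and its
    # arguments, then call the bound method on the worker's copy.
    wm, sigmaI, sigmaX, i = args
    return wm.CreateMatrixMp(sigmaI, sigmaX, i)

class WeightMatrix:
    def __init__(self, numPixels):
        self.numPixels = numPixels

    def CreateMatrixMp(self, sigmaI, sigmaX, i):
        return (sigmaI + sigmaX) * i          # placeholder computation

    def CreateMatrix(self, sigmaI, sigmaX):
        args = [(self, sigmaI, sigmaX, i) for i in range(self.numPixels)]
        pool = mp.Pool(processes = mp.cpu_count())
        result = pool.map(_unwrap_create_matrix, args)
        pool.close()
        pool.join()
        return result

if __name__ == '__main__':
    print(WeightMatrix(4).CreateMatrix(1.0, 2.0))

The instance still has to be picklable for this to work, since each element of args carries a copy of it to the worker.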

answered Sep 29 '22 12:09 by DataMan