I have an algorithm that I am trying to parallelize because of very long serial run times. However, the function that needs to be parallelized is inside a class. multiprocessing.Pool seems to be the best and fastest way to do this, but there is a problem: its target function cannot be a method of an object instance. Meaning this: you declare a Pool in the following way:
import multiprocessing as mp

cpus = mp.cpu_count()
poolCount = cpus * 2
pool = mp.Pool(processes=poolCount, maxtasksperchild=2)
And then actually use it like so:
pool.map(self.TargetFunction, args)
But this throws an error, because object instances cannot be pickled, and pickling is how Pool passes information to all of its child processes. But I have to use self.TargetFunction.
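For context, here is a minimal reproduction of that failure (Worker is a made-up stand-in for my class; on Python 2, bound methods cannot be pickled at all, and the same error shows up on any version if the instance holds unpicklable state):

import multiprocessing as mp

class Worker:
    def TargetFunction(self, x):
        return x * x

if __name__ == '__main__':
    w = Worker()
    pool = mp.Pool()
    # PicklingError: Pool.map pickles its target to send it to the
    # child processes, and a bound method like w.TargetFunction
    # cannot be pickled on Python 2.
    print(pool.map(w.TargetFunction, range(4)))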
So I had an idea: I would create a new Python file named parallel, write a couple of functions in it without putting them in a class, and call those functions from within my original class (the one whose method I want to parallelize).
So I tried this:
import multiprocessing as mp

def MatrixHelper(args):
    # args is the tuple (WM, sigmaI, sigmaX, i); WM is the calling object
    WM = args[0]
    result = WM.CreateMatrixMp(*args[1:])
    print(result)
    return result

def Start(sigmaI, sigmaX, numPixels, WM):
    cpus = mp.cpu_count()
    poolCount = cpus * 2
    args = [(WM, sigmaI, sigmaX, i) for i in range(numPixels)]
    print("Number of CPUs to process WM: %d" % cpus)
    pool = mp.Pool(processes=poolCount, maxtasksperchild=2)
    tempData = pool.map(MatrixHelper, args)
    return tempData
These functions are not part of a class, and using MatrixHelper in Pool's map function works fine. But I realized while doing this that it was no way out. The function in need of parallelization (CreateMatrixMp) expects an object to be passed to it; it is declared as def CreateMatrixMp(self, sigmaI, sigmaX, i).
Since it is not being called from within its class, it doesn't get a self passed to it. To solve this, I passed the Start function the calling object itself; that is, I call parallel.Start(sigmaI, sigmaX, self.numPixels, self). The object self then becomes WM, so that I can finally call the desired function as WM.CreateMatrixMp().

I'm sure that is a very sloppy way of coding, but I just wanted to see if it would work. But nope, pickling error again: the map function cannot handle object instances at all.
So my question is: why is it designed this way? It seems useless, completely dysfunctional in any program that uses classes at all.
I tried using Process rather than Pool, but this requires sharing the array that I am ultimately writing to, which means the processes have to wait for each other. If I don't want it shared, I can have each process write its own smaller array and do one big write at the end. But both of these approaches result in slower run times than my serial version! Python's built-in multiprocessing seems absolutely useless!
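For reference, here is roughly what the shared-array Process attempt looked like (a simplified sketch with a placeholder computation, not my real code); each worker fills its own slice of a multiprocessing.Array:

import multiprocessing as mp

def fill_slice(shared, start, stop):
    # Each worker writes only its own disjoint slice, so no lock is needed.
    for i in range(start, stop):
        shared[i] = float(i * i)

if __name__ == '__main__':
    n = 16
    shared = mp.Array('d', n, lock=False)  # flat shared array of doubles
    step = n // 4
    procs = [mp.Process(target=fill_slice, args=(shared, s, s + step))
             for s in range(0, n, step)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(list(shared))

Even with the lock removed, the cost of spawning processes and shipping the work to them outweighed the parallel speedup for my workload.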
Can someone please give me some guidance on how to actually save time with multiprocessing, in the context of my target function being inside a class? I have read in posts here to use pathos.multiprocessing instead, but I am on Windows and am working on this project with multiple people who all have different setups. Having everyone try to install it would be inconvenient.
multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads.
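A minimal usage example in the style of the module documentation:

from multiprocessing import Process

def greet(name):
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=greet, args=('world',))
    p.start()
    p.join()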
Because of the GIL, a single Python process typically executes bytecode on only one core at a time. Despite the GIL, libraries that perform computationally heavy tasks, like numpy, scipy and pytorch, use C-based implementations under the hood, allowing the use of multiple cores.
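For instance, a numpy matrix multiply runs in compiled BLAS code, which releases the GIL and (depending on the BLAS build) may use several cores from a single Python process:

import numpy as np

a = np.random.rand(2000, 2000)
b = np.random.rand(2000, 2000)
c = np.dot(a, b)  # executes in C; may be multithreaded internally
print(c.shape)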
I was having a similar issue trying to use multiprocessing within a class. I was able to solve it with a relatively easy workaround I found online. Basically, you use a function outside of your class that unwraps/unpacks the instance method you're trying to parallelize, as sketched after the links below. Here are the two websites I found that explain how to do it.
Website 1 (joblib example)
Website 2 (multiprocessing module example)
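Here is a minimal sketch of that workaround with the plain multiprocessing module, reusing the names from the question (the body of CreateMatrixMp is a placeholder, since the real computation wasn't shown). Note that the workaround still requires the instance itself to be picklable, so its attributes should be plain data:

import multiprocessing as mp

def MatrixHelper(args):
    # Module-level unwrapper: picklable, unlike a bound method.
    # Unpack (instance, params) and call the method on the instance.
    WM, sigmaI, sigmaX, i = args
    return WM.CreateMatrixMp(sigmaI, sigmaX, i)

class WeightMatrix:
    def __init__(self, numPixels):
        self.numPixels = numPixels

    def CreateMatrixMp(self, sigmaI, sigmaX, i):
        # Placeholder for the real per-pixel computation.
        return (sigmaI + sigmaX) * i

    def CreateMatrixParallel(self, sigmaI, sigmaX):
        args = [(self, sigmaI, sigmaX, i) for i in range(self.numPixels)]
        pool = mp.Pool(processes=mp.cpu_count())
        try:
            return pool.map(MatrixHelper, args)
        finally:
            pool.close()
            pool.join()

if __name__ == '__main__':
    wm = WeightMatrix(numPixels=8)
    print(wm.CreateMatrixParallel(sigmaI=1.0, sigmaX=2.0))

One caveat: every task pickles the whole instance, so for objects carrying large arrays it is usually better to pass only the data the method actually needs.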