
Python multiprocessing: shared memory and pickle issue

I have already done some multiprocessing in the past, but this time, I can't figure out a workaround.

I know that I can only pickle functions if they are at the top level of a module. This has always worked well so far, but now I have to work with shared memory in an instance and I don't see a way to move the function to the top level.

Consider this

import numpy as np
import multiprocessing
from itertools import repeat

class Test:

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def my_task(self):

        # Create process pool
        p = multiprocessing.Pool(4)

        # Create shared memory arrays
        share1 = multiprocessing.Array("d", self.x, lock=False)
        share2 = multiprocessing.Array("d", self.y, lock=False)

        def mp(xc, yc, c):

            # This is just some random weird statement
            foo = np.sum(share1) + np.sum(share2) +xc + yc + c
            return foo


        def mp_star(args):
            return mp(*args)

        # Define some input for multiprocessing
        xs = [1,2,3,4,5]
        ys = [5,6,7,8,9]
        c = 10

        # Submit tasks
        result = p.map(mp_star, zip(xs, ys, repeat(c)))

        # Close pool
        p.close()

        return result



# Get some input data
x = np.arange(10)
y = x**2

# Run the thing
cl = Test(x=x, y=y)
cl.my_task()

You can see that I need to access shared data from the instance itself. For this reason I put the multiprocessing parts inside the method 'my_task'. As a result, I get the typical pickle error

_pickle.PicklingError: Can't pickle <function Test.my_task.<locals>.mp_star at 0x10224a400>: attribute lookup mp_star on __main__ failed

which I already know about. However, I can't move the multiprocessing functions to the top level, since they need to access the shared data. I also want to keep the number of dependencies low, so I need to work with the built-in multiprocessing library.

I hope the code makes sense. So, how can I use the shared memory space from an instance in multiprocessing? Is there a way to move the functions to the top level?

HansSnah asked Dec 01 '25 18:12



1 Answer

Since the only functions that can be pickled are those defined at the top level of a module (see the documentation for pickle), and multiprocessing needs to pickle the function you pass to it, you're stuck with putting it at the top level. You simply have to rework your requirement.
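To see the restriction in isolation, here is a minimal sketch that tries to pickle a module-level function and a nested one. The names top_level, make_local and can_pickle are made up for this illustration, and the exact exception raised for a nested function differs between Python versions, so both likely types are caught.

```python
import pickle


def top_level(x):
    """Module-level function: pickle stores only a reference to its name."""
    return x + 1


def make_local():
    """Return a function defined inside another function."""
    def local(x):
        return x + 1
    return local


def can_pickle(func):
    """Return True if pickle.dumps accepts func, False otherwise."""
    try:
        pickle.dumps(func)
        return True
    except (pickle.PicklingError, AttributeError):
        # The exception type for nested functions varies by Python version.
        return False


if __name__ == "__main__":
    print(can_pickle(top_level))     # module-level function
    print(can_pickle(make_local()))  # nested function
```

The nested function cannot be found again by name when unpickling, which is the same reason mp_star fails in the question.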

For example, your functions already take arguments, so why not supply the shared data that way? Or you could put the shared data in an instance that is pickleable and keep the function at the top level (you can still pass a class instance to a top-level function).

For example if you want to put the shared data in an instance you can simply define the method at top level as if it were a normal method (but put the definition at top level):

def fubar(self):
    return self.x

class C(object):
    def __init__(self, x):
        self.x = x

    # Bind the top-level function as a method
    foo = fubar

c = C(1)  # any value works here

Now you can pickle fubar. You can call it either as c.foo() or fubar(c), but you can only pickle it as pickle.dumps(fubar), so when it is unpickled and called it expects to be called the latter way. You therefore have to supply the self parameter along with the other arguments in p.map, i.e. p.map(mp_star, zip(repeat(self), xs, ys, repeat(c))). Of course, you have to make sure that self is pickleable too.
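Applied to the code in the question, the idea might look roughly like the sketch below. It assumes that self only holds plain NumPy arrays, which are pickleable; note that the instance is then pickled and sent along with each task, so the workers get copies of x and y and not a shared memory block.

```python
import multiprocessing
from itertools import repeat

import numpy as np


def mp(self, xc, yc, c):
    """Defined at top level, but written like a method: self is explicit."""
    return float(np.sum(self.x) + np.sum(self.y) + xc + yc + c)


def mp_star(args):
    # args is (self, xc, yc, c)
    return mp(*args)


class Test:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def my_task(self):
        xs = [1, 2, 3, 4, 5]
        ys = [5, 6, 7, 8, 9]
        c = 10

        # Pass the instance explicitly as the first element of every task
        with multiprocessing.Pool(4) as p:
            return p.map(mp_star, zip(repeat(self), xs, ys, repeat(c)))


if __name__ == "__main__":
    x = np.arange(10)
    print(Test(x=x, y=x**2).my_task())
```

If the instance is large, pickling it with every task can get expensive, which is worth weighing against the simplicity of this approach.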

skyking answered Dec 04 '25 06:12
