Modify object in Python multiprocessing

I have a large array of custom objects on which I need to perform independent (parallelizable) tasks, including modifying the objects' attributes. I've tried both a Manager().dict() and the sharedmem package, but neither works. For example:

import numpy as np
import multiprocessing as mp
import sharedmem as shm


class Tester:

    num = 0.0
    name = 'none'
    def __init__(self,tnum=num, tname=name):
        self.num  = tnum
        self.name = tname

    def __str__(self):
        return '%f %s' % (self.num, self.name)

def mod(test, nn):
    test.num = np.random.randn()
    test.name = nn


if __name__ == '__main__':

    num = 10

    tests = np.empty(num, dtype=object)
    for it in range(num):
        tests[it] = Tester(tnum=it*1.0)

    sh_tests = shm.empty(num, dtype=object)
    for it in range(num):
        sh_tests[it] = tests[it]
        print(sh_tests[it])

    print('\n')
    workers = [ mp.Process(target=mod, args=(test, 'some') ) for test in sh_tests ]

    for work in workers: work.start()

    for work in workers: work.join()

    for test in sh_tests: print(test)

prints out:

0.000000 none
1.000000 none
2.000000 none
3.000000 none
4.000000 none
5.000000 none
6.000000 none
7.000000 none
8.000000 none
9.000000 none


0.000000 none
1.000000 none
2.000000 none
3.000000 none
4.000000 none
5.000000 none
6.000000 none
7.000000 none
8.000000 none
9.000000 none

I.e. the objects aren't modified.

How can I achieve the desired behavior?

DilithiumMatrix asked Apr 07 '13 01:04


1 Answer

The problem is that each worker process operates on its own copy of the object. Depending on the start method, the arguments are either pickled, shipped to the other process, and unpickled there, or duplicated along with the rest of the parent's memory when the process is forked. Either way, your objects aren't so much passed to the other process as cloned. You don't return the objects, so the clones are happily modified and then thrown away.
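
The cloning can be imitated inside a single process. This is only a minimal sketch: the trimmed-down Tester class and the fixed value -1.0 are illustrative, and the pickle round trip merely stands in for whatever mechanism hands the object to the worker.

```python
import pickle


class Tester:
    def __init__(self, num=0.0, name="none"):
        self.num = num
        self.name = name


def mod(test, new_name):
    # The same kind of in-place change the worker function makes.
    test.num = -1.0
    test.name = new_name


original = Tester(num=3.0)

# Stand-in for the transfer to a worker: serialize, then rebuild.
clone = pickle.loads(pickle.dumps(original))

mod(clone, "some")

print(clone.num, clone.name)        # -1.0 some
print(original.num, original.name)  # 3.0 none
```

The clone changes, the original does not, which is the behavior seen in the question.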

It looks like this cannot be done directly (see the related question "Python: Possible to share in-memory data between 2 separate processes").

What you can do is return the modified objects.

import numpy as np
import multiprocessing as mp



class Tester:

    num = 0.0
    name = 'none'
    def __init__(self,tnum=num, tname=name):
        self.num  = tnum
        self.name = tname

    def __str__(self):
        return '%f %s' % (self.num, self.name)

def mod(test, nn, out_queue):
    print(test.num)
    test.num = np.random.randn()
    print(test.num)
    test.name = nn
    # send the modified copy back to the parent process
    out_queue.put(test)




if __name__ == '__main__':       
    num = 10
    out_queue = mp.Queue()
    tests = np.empty(num, dtype=object)
    for it in range(num):
        tests[it] = Tester(tnum=it*1.0)


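    print('\n')
    workers = [ mp.Process(target=mod, args=(test, 'some', out_queue) ) for test in tests ]

    for work in workers: work.start()

    # Collect the results before joining: a child that has put items on a
    # queue may not exit until those items have been consumed.
    # Results arrive in completion order, not input order.
    res_lst = []
    for j in range(len(workers)):
        res_lst.append(out_queue.get())

    for work in workers: work.join()

    for test in res_lst: print(test)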

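This does lead to an interesting observation: on platforms where the worker processes are forked from the parent, they all inherit the same NumPy random state, so they all generate the same 'random' number. Calling np.random.seed() at the start of mod reseeds each worker from the operating system and avoids this.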
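
One way to sidestep both the manual queue handling and the shared random state is to let a Pool return the modified objects and to give each task its own explicitly seeded generator. The following is a rough sketch under a few assumptions, not the method used above: the seeds 0 to 9 and the pool size of 4 are arbitrary, and the standard library's random.Random stands in for np.random to keep the example dependency-free. With NumPy, creating np.random.default_rng(seed) inside the worker would play the same role.

```python
import multiprocessing as mp
import random


class Tester:
    def __init__(self, num=0.0, name="none"):
        self.num = num
        self.name = name

    def __str__(self):
        return "%f %s" % (self.num, self.name)


def mod(task):
    """Modify the worker's copy of the object and return it."""
    test, new_name, seed = task
    rng = random.Random(seed)       # private generator, seeded per task
    test.num = rng.gauss(0.0, 1.0)
    test.name = new_name
    return test                     # the return value is sent back to the parent


if __name__ == "__main__":
    tests = [Tester(num=float(i)) for i in range(10)]
    # Illustrative seeds: each task simply uses its index.
    tasks = [(test, "some", seed) for seed, test in enumerate(tests)]

    with mp.Pool(processes=4) as pool:
        # map() returns the results in the same order as the inputs.
        tests = pool.map(mod, tasks)

    for test in tests:
        print(test)
```

Because map() preserves input order, the returned list can replace the original one directly, and fixed seeds make the run reproducible.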

tacaswell answered Sep 20 '22 14:09