How can global variables be accessed when using Multiprocessing and Pool?

Tags: python, global-variables, multiprocessing

I'm trying to avoid passing the same variables redundantly with every item in dataList (e.g. [(1, globalDict), (2, globalDict), (3, globalDict)]) and would like to access them globally instead. Declaring global globalDict in the worker function does not achieve this in the following code, however.

Is there a straightforward way to access data in a multiprocessing function globally?

I read the following here:

"Communication is expensive. In contrast to communication between threads, exchanging data between processes is much more expensive. In Python, the data is pickled in to binary format before transferring on pipes. Hence, the overhead of communication can be very significant when the task is small. To reduce the extraneous cost, better assign tasks in chunk."

I'm not sure if that would apply here, but I would like to simplify data access in any case.

import multiprocessing as mp

def MPfunction(data):
    global globalDict

    data += 1

    # use globalDict

    return data

if __name__ == '__main__':

    pool = mp.Pool(mp.cpu_count())

    try:
        globalDict = {'data':1}

        dataList = [0, 1, 2, 3]
        data = pool.map(MPfunction, dataList, chunksize=10)

    finally:
        pool.close()
        pool.join()
        pool.terminate()


asked May 03 '17 01:05

Phillip

1 Answer

On Linux, multiprocessing forks a copy of the parent process to run each pool worker (fork is the traditional default start method there). The child has a copy-on-write view of the parent's memory space, so as long as you allocate globalDict before creating the pool, it's already there. Note that any changes a worker makes to that dict stay in that worker; neither the parent nor the other workers see them.
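
To illustrate that changes stay in the child, here is a minimal sketch. The worker name bump and the pool size of 2 are assumptions made for the example, not part of the original code. globalDict is defined at module level, so it exists before the pool is created; each worker then increments its own copy while the parent's copy is left alone.

```python
import multiprocessing as mp

# Defined at import time, so it exists before any pool is created.
globalDict = {'data': 1}

def bump(_):
    # Hypothetical worker: mutates only the copy owned by this process.
    globalDict['data'] += 1
    return globalDict['data']

if __name__ == '__main__':
    with mp.Pool(2) as pool:
        seen_in_workers = pool.map(bump, range(4))

    # The exact values depend on which worker handled which item.
    print(seen_in_workers)
    # The parent's dict was never touched by the workers.
    print(globalDict['data'])
```

The second print shows the parent's original value, because each worker process modified its own memory only. If results need to flow back, return them from the worker function rather than writing to the global.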

On Windows, a new Python interpreter is started for each worker, and the state it needs is pickled in the parent and unpickled in the child. You can pass an initializer function when you create the pool and copy the dict into a global there. That's one copy per child process, which is better than one copy per item mapped.

(As an aside, start the try block after creating the pool, so that the finally clause doesn't reference an unassigned pool variable if creating the pool is what raises the error.)

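import multiprocessing as mp
import platform

def MPfunction(data):
    global globalDict

    data += 1

    # use globalDict

    return data

if platform.system() == "Windows":
    def init_pool(the_dict):
        # Runs once in each worker: keep the unpickled copy as a module global.
        global globalDict
        globalDict = the_dict

if __name__ == '__main__':
    globalDict = {'data':1}

    if platform.system() == "Windows":
        # Pass the function and its arguments separately; do not call it here.
        pool = mp.Pool(mp.cpu_count(), initializer=init_pool, initargs=(globalDict,))
    else:
        pool = mp.Pool(mp.cpu_count())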

    try:
        dataList = [0, 1, 2, 3]
        data = pool.map(MPfunction, dataList, chunksize=10)
    finally:
        pool.close()
        pool.join()
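
As a variation, the initializer can also be used on every platform, which avoids the platform check and does not depend on how worker processes are started. The following is only a sketch under that assumption: the helper name run, the pool size of 2, and the way MPfunction uses the dict are made up for illustration.

```python
import multiprocessing as mp

# Module-level name that each worker fills in through init_pool.
globalDict = None

def init_pool(the_dict):
    # Runs once in every worker process when the pool starts.
    global globalDict
    globalDict = the_dict

def MPfunction(data):
    # Reads the worker's own copy of globalDict; nothing extra is passed per item.
    return data + globalDict['data']

def run(dataList, shared):
    # Hypothetical helper: initargs is delivered once per worker, not once per item.
    with mp.Pool(2, initializer=init_pool, initargs=(shared,)) as pool:
        return pool.map(MPfunction, dataList)

if __name__ == '__main__':
    print(run([0, 1, 2, 3], {'data': 1}))
```

The trade-off is one explicit copy of the dict per worker even where forking would have shared it for free, in exchange for code that behaves the same way everywhere.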


answered Sep 22 '22 02:09

tdelaney
