Overcome memory constraints while using multiprocessing

I am optimizing a function with an evolutionary algorithm (CMA-ES). To run it faster I am using the multiprocessing module. The function I need to optimize takes large matrices as inputs (input_A_Opt and input_B_Opt in the code below).

They are several GB in size. When I run the function without multiprocessing, it works well. When I use multiprocessing, there seems to be a problem with memory: it works fine with small inputs, but when I run it with the full-size inputs I get the following error:

  File "<ipython-input-2-bdbae5b82d3c>", line 1, in <module>
    opt.myFuncOptimization()
  File "/home/joe/Desktop/optimization_folder/Python/Optimization.py", line 45, in myFuncOptimization
    f_values = pool.map_async(partial_function_to_optimize, solutions).get()
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 608, in get
    raise self._value
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 385, in _handle_tasks
    put(task)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 206, in send
    self._send_bytes(ForkingPickler.dumps(obj))
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
error: 'i' format requires -2147483648 <= number <= 2147483647

And here's a simplified version of the code (again, if I run it with inputs 10 times smaller, everything works fine):

import numpy as np
import cma
import multiprocessing as mp
import functools
import myFuncs
import hdf5storage



def myFuncOptimization ():

    temp = hdf5storage.loadmat('/home/joe/Desktop/optimization_folder/matlab_workspace_for_optimization')    

    input_A_Opt  = temp["input_A"]
    input_B_Opt  = temp["input_B"]

    del temp

    numCores = 20

    # Inputs: 23 optimization parameters
    P0 = np.array([4.66666667, 2.5, 2.66666667, 4.16666667, 0.96969697,
                   1.95959596, 0.44088176, 0.04040404, 6.05210421, 0.58585859,
                   0.46464646, 8.75751503, 0.16161616, 1.24248497, 1.61616162,
                   1.56312625, 5.85858586, 0.01400841, 1.0, 2.4137931,
                   0.38076152, 2.5, 1.99679872])
    LBOpt = np.zeros(23)               # lower bounds
    UBOpt = np.full(23, 10.0)          # upper bounds
    initialStdsOpt = np.full(23, 2.0)  # initial step sizes
    minStdsOpt = np.array([0.030, 0.40, 0.030, 0.40, 0.020, 0.020, 0.020,
                           0.020, 0.020, 0.020, 0.020, 0.020, 0.020, 0.020,
                           0.020, 0.020, 0.020, 0.020, 0.050, 0.050, 0.020,
                           0.40, 0.020])

    options = {'bounds':[LBOpt,UBOpt], 'CMA_stds':initialStdsOpt, 'minstd':minStdsOpt, 'popsize':numCores}
    es = cma.CMAEvolutionStrategy(P0, 1, options)

    pool = mp.Pool(numCores)

    partial_function_to_optimize = functools.partial(myFuncs.func1, input_A=input_A_Opt, input_B=input_B_Opt)

    while not es.stop():
        solutions = es.ask(es.popsize)            
        f_values = pool.map_async(partial_function_to_optimize, solutions).get()
        es.tell(solutions, f_values)
        es.disp(1)
        es.logger.add()

    return es.result_pretty()

Any suggestions on how to solve this issue? Am I not coding properly (I'm new to Python), or should I use a different multiprocessing package like SCOOP?

asked Nov 16 '16 at 21:11 by Joe



1 Answer

Your objects are too big to pass between processes. You are passing along more than 2147483647 bytes, which is over 2 GB. The pipe protocol isn't made for payloads that large, and the sheer cost of serializing and deserializing such large chunks of data would be a serious performance problem even if it worked.
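The limit comes from how multiprocessing frames each pipe message: as the last line of the traceback shows, the length of the pickled payload is packed as a signed 32-bit integer, so no single message larger than 2**31 - 1 bytes can be framed. A quick illustration using the same struct call:

import struct

# The pickled payload's length is packed as a signed 32-bit integer,
# so 2**31 - 1 bytes (~2 GiB) is the largest message that can be framed.
struct.pack("!i", 2**31 - 1)  # OK: exactly 2147483647
try:
    struct.pack("!i", 2**31)  # one byte more than the protocol can frame
except struct.error as e:
    print(e)  # 'i' format requires -2147483648 <= number <= 2147483647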

Reduce the size of the data passed to each process. If your workflow allows it, read the data inside the separate processes and pass along only the results, as in the sketch below.
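Here is a minimal sketch of that second suggestion, assuming the hdf5storage file and myFuncs.func1 from the question: each worker loads the matrices once through a Pool initializer, so that only the small solution vectors and the scalar results are pickled per task. In your code, the functools.partial bundles the multi-GB arrays into the callable, and that whole bundle gets pickled for every task.

import multiprocessing as mp

import hdf5storage
import myFuncs

# Worker-side globals, filled once per process by the initializer,
# so the multi-GB matrices never travel through the task pipe.
_input_A = None
_input_B = None

def _init_worker(mat_path):
    global _input_A, _input_B
    temp = hdf5storage.loadmat(mat_path)
    _input_A = temp["input_A"]
    _input_B = temp["input_B"]

def _evaluate(solution):
    # Only the 23-element solution vector is pickled for each task.
    return myFuncs.func1(solution, input_A=_input_A, input_B=_input_B)

# In myFuncOptimization, replace the Pool and the partial with:
#   pool = mp.Pool(numCores, initializer=_init_worker,
#                  initargs=('/home/joe/Desktop/optimization_folder/matlab_workspace_for_optimization',))
#   f_values = pool.map_async(_evaluate, solutions).get()

On Linux with the default fork start method, an alternative is to load the matrices into module-level globals before creating the pool, so the child processes inherit them copy-on-write instead of each loading its own copy.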

answered Oct 11 '22 at 13:10 by MisterMiyagi