I am new to Python and started using a genetic algorithm (GA) to do a kind of curve fitting. For the GA I am using the (awesome) pyevolve library (http://pyevolve.sourceforge.net/), which reduces the calculation time enormously by using multiprocessing.
This is where my problem occurs: the curve that I want to approximate is an array that is read from an Excel file and stored in a global variable at the beginning of my program. When using the Python multiprocessing module, every process creates its own Python instance with its own copy of the global variables. That causes every individual in every generation of the algorithm (i.e. every process) to open and read the Excel file again and again. Opening big Excel files takes an immense amount of time, so it would be nice to open the file only once and make the resulting array available to every process/individual.
The multiprocessing is initiated inside the pyevolve library, and I don't want to change the library so that it stays easy to update. Unfortunately that means simply passing the variable to the process via e.g.
p = Process(target=my_func, args=(my_array,))
is not an option for me, and that is the only solution I have found so far.
Does anyone know another way to make my_array accessible from every process?
Check out mmap, the Python interface for creating memory-mapped files that can be shared between processes. You probably want something like the following (Python 2 code):
import mmap
import os
import ctypes

mm = mmap.mmap(-1, 13)
mm.write('Hello world!')
mm_addr = id(mm)
with open('shared_id', 'w') as f:
    f.write(str(mm_addr))

pid = os.fork()
if pid == 0:  # in a child process
    id_from_file = long(open('shared_id').read())
    loaded_mm = ctypes.cast(id_from_file, ctypes.py_object).value
    loaded_mm.seek(0)
    print loaded_mm.readline()
    loaded_mm.close()
I used this question to figure out how to get the memory address of the shared memory-map object and convert it back into a Python object.
I suppose you could also do this with any object in memory instead of a mmap, but I haven't tried it.
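For comparison, here is a minimal self-contained sketch of mine (Python 3, Unix only, since it relies on os.fork) showing that an anonymous mmap created before the fork is already visible in the child, without writing the object's address to a file at all:

```python
import mmap
import os

# Anonymous mmap; on Unix this is MAP_SHARED by default, so writes are
# visible across a fork boundary.
mm = mmap.mmap(-1, 13)
mm.write(b'Hello world!')

pid = os.fork()
if pid == 0:
    # Child: the inherited mapping already contains the parent's data.
    mm.seek(0)
    data = mm.read(12)
    os._exit(0 if data == b'Hello world!' else 1)
else:
    # Parent: wait for the child and check that it saw the data.
    _, status = os.waitpid(pid, 0)
    print('child saw the shared data:', os.WEXITSTATUS(status) == 0)
```

Note that this only applies to fork-based start methods; on Windows (or with the "spawn" start method) the child does not inherit the mapping this way.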
I just wanted to let you know how I solved this problem, in case anyone else faces it:
My solution does not address the general Python problem, but it helps when using pyevolve, which was enough in my case. What I didn't know was that in pyevolve you can attach parameters to your genomes or your genetic algorithm instance via
my_genome.setParams(xyz=my_array)
or
my_ga.setParams(xyz=my_array)
and these parameters can be read back via
my_genome.getParam('xyz')
or
my_ga.getParam('xyz')
These parameters are accessible from every process, so my problem was solved and I didn't need to deal with the general Python multiprocessing issue. I hope this helps someone else!
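To illustrate why this works with multiprocessing, here is a rough, self-contained sketch. The Genome class below is a hypothetical stand-in for pyevolve's genome classes (it only mimics the setParams/getParam interface), but it shows the key point: parameters stored on the genome object travel with it when the object is pickled for a worker process, which is what multiprocessing does under the hood:

```python
import pickle

# Hypothetical stand-in for a pyevolve genome, NOT the real class: it only
# mimics the setParams/getParam interface used above.
class Genome(object):
    def __init__(self):
        self._params = {}

    def setParams(self, **kwargs):
        self._params.update(kwargs)

    def getParam(self, key, default=None):
        return self._params.get(key, default)

my_array = [1.0, 2.5, 4.1, 3.9]   # stands in for the curve read from Excel
genome = Genome()
genome.setParams(xyz=my_array)

# Simulate what multiprocessing does when handing a genome to a worker:
# the object is pickled in the parent and unpickled in the worker, and the
# params dict comes along for the ride.
worker_copy = pickle.loads(pickle.dumps(genome))
print(worker_copy.getParam('xyz'))
```

So the array only needs to be read from the Excel file once, before the GA starts, and then set as a parameter.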