I have the following function and dictionary comprehension:
def function(name, params):
    results = fits.open(name)
    <do something more to results>
    return results

dictionary = {name: function(name, params) for name in nameList}
and would like to parallelize this. Any simple way to do this?
I have seen that the multiprocessing module can be used for this, but I could not understand how to get the results back into my dictionary.
NOTE: If possible, please give an answer that can be applied to any function that returns a result.
NOTE 2: the function mainly manipulates the FITS file and assigns the results to a class.
UPDATE
So here's what worked for me in the end (from @code_onkel's answer):
def function(name, params):
    results = fits.open(name)
    <do something more to results>
    return results

def function_wrapper(args):
    return function(*args)

params = [..., ..., ...]  # etc.
p = multiprocessing.Pool(processes=max([2, multiprocessing.cpu_count() // 10]))
args_generator = ((name, params) for name in names)
dictionary = dict(zip(names, p.map(function_wrapper, args_generator)))
Using tqdm only worked partially, since I couldn't use my custom bar: tqdm reverts to a default bar that shows only the iteration count.
The dictionary comprehension itself cannot be parallelized. Here is an example of how to use the multiprocessing module with Python 2.7.
from __future__ import print_function
import time
import multiprocessing

params = [0.5]

def function(name, params):
    print('sleeping for', name)
    time.sleep(params[0])
    return time.time()

def function_wrapper(args):
    return function(*args)

names = list('onecharNAmEs')

p = multiprocessing.Pool(3)
args_generator = ((name, params) for name in names)
dictionary = dict(zip(names, p.map(function_wrapper, args_generator)))
print(dictionary)
p.close()
This works with any function, though the restrictions of the multiprocessing module apply. Most importantly, the function to be parallelized, as well as any classes passed as arguments or returned as values, have to be defined at the module level; otherwise the (de)serializer will not find them. The wrapper function is necessary since function() takes two arguments, but Pool.map() can only handle functions with one argument (like the built-in map() function).
With Python 3.3 and later it can be simplified by using the Pool as a context manager and the starmap() function.
import time
import multiprocessing

params = [0.5]

def function(name, params):
    print('sleeping for', name)
    time.sleep(params[0])
    return time.time()

names = list('onecharnamEs')

with multiprocessing.Pool(3) as p:
    args_generator = ((name, params) for name in names)
    dictionary = dict(zip(names, p.starmap(function, args_generator)))

print(dictionary)
This is a more readable version of the with block:

with multiprocessing.Pool(3) as p:
    args_generator = ((name, params) for name in names)
    results = p.starmap(function, args_generator)
    name_result_tuples = zip(names, results)
    dictionary = dict(name_result_tuples)
The Pool.map() function is for functions with a single argument, which is why the Pool.starmap() function was added in 3.3.
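Pool.starmap() follows the same calling convention as itertools.starmap() from the standard library, which makes the unpacking easy to see without spawning any processes:

```python
from itertools import starmap

pairs = [('a', 1), ('b', 2)]
# map would pass each tuple as one argument; starmap unpacks it,
# i.e. f(*('a', 1)) becomes f('a', 1)
result = list(starmap(lambda name, n: name * n, pairs))
# result == ['a', 'bb']
```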