How can I use multiprocessing in Python with generator functions?
Let's say I have a massive list of lists big_list, and I would like to use multiprocessing to compute values. If I use "traditional" functions which return values, this is straightforward:
import concurrent.futures

def compute_function(list_of_lists):
    return_values = []  ## empty list for the results
    for sublist in list_of_lists:
        new_value = compute_something(sublist)  ## compute something; just an example
        return_values.append(new_value)  ## append to the result list
    return return_values

with concurrent.futures.ProcessPoolExecutor(max_workers=N) as executor:
    new_list = list(executor.map(compute_function, big_list))
However, building up full lists in this manner is too memory intensive, so I would like to use generator functions instead:
import concurrent.futures

def generator_function(list_of_lists):
    for sublist in list_of_lists:
        new_value = compute_something(sublist)  ## compute something; just an example
        yield new_value

with concurrent.futures.ProcessPoolExecutor(max_workers=N) as executor:
    new_list = list(executor.map(generator_function, big_list))
My problem is that generators cannot be pickled, so they cannot be sent between the worker processes. There are workarounds for some other data structures, but I don't think any exist for generators.
How could I accomplish this?
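The pickling limitation is easy to confirm directly. This small sketch (using an illustrative generator, not the questioner's actual code) shows that `pickle.dumps` raises a `TypeError` on a generator object:

```python
import pickle

def generator_function(list_of_lists):
    for sublist in list_of_lists:
        yield sum(sublist)  ## placeholder computation

gen = generator_function([[1, 2], [3, 4]])
try:
    pickle.dumps(gen)
    pickled = True
except TypeError as err:
    ## pickling a generator raises TypeError
    pickled = False
    print(err)
```

Because `ProcessPoolExecutor` pickles both the callable and its results to move them between processes, anything yielded through a generator hits this wall.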
You can do your enumeration one level deeper in big_list: flatten it with itertools.chain.from_iterable and map over the individual sublists. executor.map then yields results lazily as they complete, so you never have to materialise the full output list.
import concurrent.futures
import itertools

def compute_function(item):
    return compute_something(item)

with concurrent.futures.ProcessPoolExecutor(max_workers=N) as executor:
    for result in executor.map(compute_function,
                               itertools.chain.from_iterable(big_list)):
        print(result)