
How can I use python multiprocessing with generators?

I would like to use multiprocessing in Python together with generator functions.

Let's say I have a massive list of lists big_list, and I would like to use multiprocessing to compute values. If I use "traditional" functions which return values, this is straightforward:

import concurrent.futures  # a bare "import concurrent" does not load the futures submodule

def compute_function(list_of_lists):
    return_values = []  # results for this batch
    for sublist in list_of_lists:  # avoid shadowing the built-in list
        new_value = compute_something(sublist)  # compute something; just an example
        return_values.append(new_value)
    return return_values

with concurrent.futures.ProcessPoolExecutor(max_workers=N) as executor:
    new_list = list(executor.map(compute_function, big_list))

However, using lists in this manner is too memory intensive. So I would like to use generator functions instead:

import concurrent.futures

def generator_function(list_of_lists):
    for sublist in list_of_lists:
        new_value = compute_something(sublist)  # compute something; just an example
        yield new_value

with concurrent.futures.ProcessPoolExecutor(max_workers=N) as executor:
    # Fails: each call returns a generator object, which cannot be pickled
    # and therefore cannot be sent back from the worker process.
    new_list = list(executor.map(generator_function, big_list))

My problem is that generators cannot be pickled, so a worker process cannot send a generator object back to the parent. There are workarounds for pickling some other data structures, but as far as I know there are none for generators.
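
The pickling limitation can be reproduced without any worker processes. The following is a minimal sketch; `can_pickle` is a hypothetical helper, and the squaring inside `generator_function` is only a stand-in for the real computation.

```python
import pickle

def generator_function(items):
    # Hypothetical stand-in for the real computation: square each item.
    for item in items:
        yield item * item

def can_pickle(obj):
    """Return True if pickle.dumps accepts obj, False if it raises TypeError."""
    try:
        pickle.dumps(obj)
    except TypeError:
        return False
    return True

if __name__ == "__main__":
    gen = generator_function([1, 2, 3])
    print(can_pickle(gen))        # the generator object itself
    print(can_pickle(list(gen)))  # the values it produces
```

The generator object is rejected, while the list of values it produces pickles without trouble, which is why returning plain values from the worker works.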

How could I accomplish this?

EB2127 asked Apr 17 '26 02:04


1 Answer

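Instead of returning a generator from each worker, iterate one level deeper: use itertools.chain.from_iterable to flatten big_list so that each item of each sublist becomes its own task. Each worker then returns a plain value, which can be pickled, and executor.map hands the results back one at a time through an iterator, so you never have to build the full result list.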

import concurrent.futures
import itertools

def compute_function(item):
    return compute_something(item)

with concurrent.futures.ProcessPoolExecutor(max_workers=N) as executor:
    for result in executor.map(compute_function,
            itertools.chain.from_iterable(big_list)):
        print(result)
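
For reference, here is a self-contained sketch of the same approach that can be run as a script. The body of `compute_something`, the helper name `iter_results`, the sample data, and the `chunksize` value are all assumptions made for illustration; they are not part of the original question.

```python
import concurrent.futures
import itertools

def compute_something(item):
    # Hypothetical placeholder for the real per-item computation.
    return item * item

def iter_results(executor, big_list, chunksize=1):
    """Return an iterator over compute_something(item) for every item
    of every sublist in big_list, in input order."""
    flat_items = itertools.chain.from_iterable(big_list)
    # With a ProcessPoolExecutor, chunksize > 1 sends items to the workers
    # in batches, which reduces inter-process overhead for cheap tasks.
    return executor.map(compute_something, flat_items, chunksize=chunksize)

if __name__ == "__main__":
    sample = [[1, 2, 3], [4, 5], [6]]  # illustrative data only
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
        for result in iter_results(executor, sample, chunksize=2):
            print(result)
```

Two caveats: the results come back as one flat sequence, so the grouping into sublists is lost, and executor.map collects its input iterable immediately rather than lazily, so the saving is mainly on the result side, where values are consumed one by one instead of being stored in a list.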

tdelaney answered Apr 19 '26 14:04