 

How to use multiprocessing pool.map with multiple arguments

In the Python multiprocessing library, is there a variant of pool.map which supports multiple arguments?

import multiprocessing

text = "test"

def harvester(text, case):
    X = case[0]
    text + str(X)

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=6)
    case = RAW_DATASET
    pool.map(harvester(text, case), case, 1)
    pool.close()
    pool.join()
user642897, asked Mar 26 '11


People also ask

How do you pass multiple arguments to a map in Python?

If we pass n iterables to map(), the function we supply must accept n arguments. The iterables are consumed in parallel, and the map iterator stops as soon as the shortest iterable is exhausted. A short sketch follows.
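For example, a minimal sketch using the built-in map() (not Pool.map) with two iterables and a simple addition function chosen just for illustration:

a = [1, 2, 3]
b = [10, 20]  # the shorter iterable

# map() consumes both iterables in parallel and stops at the shortest one,
# so only two results are produced.
print(list(map(lambda x, y: x + y, a, b)))  # [11, 22]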

How does pool map work Python?

The pool's map method chops the given iterable into chunks, which it submits to the process pool as separate tasks. Pool.map is a parallel equivalent of the built-in map function, and it blocks the main process until all computations finish. The Pool constructor takes the number of worker processes as a parameter; see the sketch below.
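A minimal sketch of this behaviour (the function square and the chunksize value are illustrative assumptions, not part of the question):

from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == '__main__':
    # Pool(processes=4) starts four worker processes; map() splits range(10)
    # into chunks (chunksize=2 here) and blocks until every result is back.
    with Pool(processes=4) as pool:
        print(pool.map(square, range(10), chunksize=2))
        # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]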

When would you use a multiprocessing pool?

Multiprocessing is useful when a long-running task has to be sped up or when several tasks have to run in parallel. Executing a program on a single core limits it, whereas multiprocessing lets the work spread across multiple cores.


2 Answers

is there a variant of pool.map which supports multiple arguments?

Python 3.3+ includes the pool.starmap() method:

#!/usr/bin/env python3
from functools import partial
from itertools import repeat
from multiprocessing import Pool, freeze_support

def func(a, b):
    return a + b

def main():
    a_args = [1, 2, 3]
    second_arg = 1
    with Pool() as pool:
        L = pool.starmap(func, [(1, 1), (2, 1), (3, 1)])
        M = pool.starmap(func, zip(a_args, repeat(second_arg)))
        N = pool.map(partial(func, b=second_arg), a_args)
        assert L == M == N

if __name__ == "__main__":
    freeze_support()
    main()

For older versions:

#!/usr/bin/env python2
import itertools
from multiprocessing import Pool, freeze_support

def func(a, b):
    print a, b

def func_star(a_b):
    """Convert `f([1,2])` to `f(1,2)` call."""
    return func(*a_b)

def main():
    pool = Pool()
    a_args = [1, 2, 3]
    second_arg = 1
    pool.map(func_star, itertools.izip(a_args, itertools.repeat(second_arg)))

if __name__ == "__main__":
    freeze_support()
    main()

Output

1 1
2 1
3 1

Notice how itertools.izip() and itertools.repeat() are used here.

Due to the bug mentioned by @unutbu, you can't use functools.partial() or similar capabilities on Python 2.6, so the simple wrapper function func_star() has to be defined explicitly. See also the workaround suggested by uptimebox.

jfs, answered Sep 21 '22


The answer to this is version- and situation-dependent. The most general answer for recent versions of Python (since 3.3) was first described below by J.F. Sebastian [1]. It uses the Pool.starmap method, which accepts a sequence of argument tuples. It then automatically unpacks the arguments from each tuple and passes them to the given function:

import multiprocessing
from itertools import product

def merge_names(a, b):
    return '{} & {}'.format(a, b)

if __name__ == '__main__':
    names = ['Brown', 'Wilson', 'Bartlett', 'Rivera', 'Molloy', 'Opie']
    with multiprocessing.Pool(processes=3) as pool:
        results = pool.starmap(merge_names, product(names, repeat=2))
    print(results)

# Output: ['Brown & Brown', 'Brown & Wilson', 'Brown & Bartlett', ...

For earlier versions of Python, you'll need to write a helper function to unpack the arguments explicitly. If you want to use with, you'll also need to write a wrapper to turn Pool into a context manager. (Thanks to muon for pointing this out.)

import multiprocessing
from itertools import product
from contextlib import contextmanager

def merge_names(a, b):
    return '{} & {}'.format(a, b)

def merge_names_unpack(args):
    return merge_names(*args)

@contextmanager
def poolcontext(*args, **kwargs):
    pool = multiprocessing.Pool(*args, **kwargs)
    yield pool
    pool.terminate()

if __name__ == '__main__':
    names = ['Brown', 'Wilson', 'Bartlett', 'Rivera', 'Molloy', 'Opie']
    with poolcontext(processes=3) as pool:
        results = pool.map(merge_names_unpack, product(names, repeat=2))
    print(results)

# Output: ['Brown & Brown', 'Brown & Wilson', 'Brown & Bartlett', ...

In simpler cases, with a fixed second argument, you can also use functools.partial, but only in Python 2.7+.

import multiprocessing
from functools import partial
from contextlib import contextmanager

@contextmanager
def poolcontext(*args, **kwargs):
    pool = multiprocessing.Pool(*args, **kwargs)
    yield pool
    pool.terminate()

def merge_names(a, b):
    return '{} & {}'.format(a, b)

if __name__ == '__main__':
    names = ['Brown', 'Wilson', 'Bartlett', 'Rivera', 'Molloy', 'Opie']
    with poolcontext(processes=3) as pool:
        results = pool.map(partial(merge_names, b='Sons'), names)
    print(results)

# Output: ['Brown & Sons', 'Wilson & Sons', 'Bartlett & Sons', ...

[1] Much of this was inspired by his answer, which should probably have been accepted instead. But since this one is stuck at the top, it seemed best to improve it for future readers.

senderle, answered Sep 25 '22