
Python: using map and multiprocessing

I'm trying to write a function that takes two arguments and run it in parallel with multiprocessing.Pool. I ran into trouble even with this simple function.

from multiprocessing import Pool
import pandas as pd

df = pd.DataFrame()
df['ind'] = [111, 222, 333, 444, 555, 666, 777, 888]
df['ind1'] = [111, 444, 222, 555, 777, 333, 666, 777]

def mult(elem1, elem2):
    return elem1 * elem2

if __name__ == '__main__':
    pool = Pool(processes=4)
    # This is the call that raises the TypeError
    print(pool.map(mult, df.ind.astype(int).values.tolist(), df.ind1.astype(int).values.tolist()))
    pool.terminate()

It raises an error:

TypeError: unsupported operand type(s) for //: 'int' and 'list'

I can't understand what's wrong. Can anybody explain what this error means and how I can fix it?

Petr Petrov asked Jan 30 '17 17:01




1 Answer

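Pool.map takes a single iterable and calls your function once per element, so the function only ever receives one argument. The third positional parameter of Pool.map is chunksize, so your second list was treated as a chunk size, which is most likely where the // between an int and a list in the error comes from. You can fix this by packing both values into one argument: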

from multiprocessing import Pool
import pandas as pd

df = pd.DataFrame()
df['ind'] = [111, 222, 333, 444, 555, 666, 777, 888]
df['ind1'] = [111, 444, 222, 555, 777, 333, 666, 777]

def mult(elements):
    # Each element is a pair (elem1, elem2); unpack it first
    elem1, elem2 = elements
    return elem1 * elem2

if __name__ == '__main__':
    pool = Pool(processes=4)
    # Pair up the two columns so every task carries both arguments
    inputs = zip(df.ind.astype(int).values.tolist(), df.ind1.astype(int).values.tolist())
    print(pool.map(mult, inputs))
    pool.terminate()

What I've done here is zip your two iterables so that each element is a pair holding the two arguments you wanted to pass in. I also changed your function to take that pair as its single argument and unpack it before multiplying.
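If you would rather keep mult as a two-argument function, Pool.starmap (available since Python 3.3) unpacks each pair for you. The following is only a sketch of that alternative, not part of the original answer: multiply_columns is a hypothetical helper name, and plain lists stand in for the df.ind and df.ind1 columns so the snippet runs without pandas.

```python
from multiprocessing import Pool


def mult(elem1, elem2):
    # Unchanged two-argument function from the question
    return elem1 * elem2


def multiply_columns(left, right, processes=4):
    # Hypothetical helper: multiplies two equal-length sequences pairwise.
    # starmap calls mult(*pair) for every pair produced by zip.
    # Leaving the with block shuts the pool down; starmap has already
    # returned every result by then.
    with Pool(processes=processes) as pool:
        return pool.starmap(mult, zip(left, right))


if __name__ == '__main__':
    # Stand-ins for df.ind.tolist() and df.ind1.tolist()
    ind = [111, 222, 333, 444, 555, 666, 777, 888]
    ind1 = [111, 444, 222, 555, 777, 333, 666, 777]
    print(multiply_columns(ind, ind1))
```

Like map, starmap returns the results in the same order as the inputs, so the list lines up with the original rows.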

tmwilson26 answered Sep 21 '22 02:09