
pyspark foreach with arguments

Reading the pyspark documentation, I know that a foreach is done like this:

def f(x): print(x)
sc.parallelize([1, 2, 3, 4, 5]).foreach(f)

But what if I want to use a function that takes several arguments?

An example:

def f(x, arg1, arg2, arg3):
    print(x * arg1 + arg2 + arg3)

The point is to use something similar to this syntax:

sc.parallelize([1, 2, 3, 4, 5]).foreach(f(arg1=11,arg2=21,arg3=31))
asked by PeCaDe
1 Answer

You can make a partial function:

from functools import partial

sc.parallelize([1, 2, 3, 4, 5]).foreach(
    partial(f, arg1=11, arg2=21, arg3=31)
)

partial takes a function together with any positional (*args) and keyword (**kwargs) arguments, and returns a new callable; invoking that callable calls the original function f with those arguments already filled in, so foreach only has to supply the remaining argument x.
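For a self-contained illustration, here is a minimal runnable sketch of the whole pipeline (the local SparkContext setup and the app name are just assumptions for the demo):

from functools import partial
from pyspark import SparkContext

sc = SparkContext("local", "foreach-partial-demo")

def f(x, arg1, arg2, arg3):
    # print runs on the executors, so on a real cluster the output
    # lands in the worker logs rather than the driver console
    print(x * arg1 + arg2 + arg3)

# partial fixes arg1..arg3; foreach supplies x for each element
g = partial(f, arg1=11, arg2=21, arg3=31)
g(2)  # plain Python call, equivalent to f(2, 11, 21, 31): prints 74

sc.parallelize([1, 2, 3, 4, 5]).foreach(g)

An equivalent alternative, if you prefer not to import functools, is a lambda that closes over the extra arguments: sc.parallelize([1, 2, 3, 4, 5]).foreach(lambda x: f(x, 11, 21, 31)).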

answered by Willem Van Onsem

