
How to pass additional parameters to user-defined methods in pyspark for filter method?

I am using spark with python and I have a filter constraint as follows:

my_rdd.filter(my_func)

where my_func is a method I wrote to filter the rdd items based on my own logic. I have defined my_func as follows:

def my_func(my_item):
    ...

Now I want to pass another, separate parameter to my_func besides the item that goes into it. I know my_item refers to one item that comes from my_rdd; how can I pass my own parameter (say, my_param) as an additional argument to my_func?

asked Dec 04 '15 by London guy

1 Answer

Use the lambda syntax below, and modify my_func to accept the extra parameter:

my_rdd.filter(lambda row: my_func(row, extra_parameter))
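The lambda simply captures the extra value in a closure, so my_func can be an ordinary two-argument function. A minimal sketch of that pattern, using Python's built-in filter (the closure works the same way when passed to RDD.filter on a Spark cluster); the threshold logic in my_func is hypothetical, purely for illustration:

```python
from functools import partial

def my_func(my_item, my_param):
    # hypothetical filter logic: keep items greater than the threshold my_param
    return my_item > my_param

items = [1, 5, 10, 20]

# closure via lambda, same shape as my_rdd.filter(lambda row: my_func(row, my_param))
result_lambda = list(filter(lambda item: my_func(item, 5), items))

# equivalent alternative: bind the extra argument with functools.partial
result_partial = list(filter(partial(my_func, my_param=5), items))

print(result_lambda)   # [10, 20]
print(result_partial)  # [10, 20]
```

functools.partial is a handy alternative when the extra parameters are fixed up front; both forms produce a single-argument callable, which is all that filter (or RDD.filter) requires.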
answered Nov 17 '22 by Shawn Guo