Global variables not recognized in lambda functions in Pyspark

I am working in Pyspark with a lambda function like the following:

from pyspark.sql.functions import UserDefinedFunction
from pyspark.sql.types import IntegerType
udf_func = UserDefinedFunction(lambda value: method1(value, dict_global), IntegerType())
result_col = udf_func(df[atr1])

The implementation of method1 is as follows:

def method1(value, dict_global):
    # Return the mapped value, or the dictionary size if the key is absent
    result = len(dict_global)
    if value in dict_global:
        result = dict_global[value]
    return result

'dict_global' is a global dictionary that contains some values.

The problem is that when I execute the lambda function the result is always None. For some reason the method1 function doesn't see the variable dict_global as an external variable. Why? What could I do?

asked Dec 13 '22 by jartymcfly
1 Answer

Finally I found a solution, which I describe below:

Lambda functions (as well as map and reduce functions) in Spark are scheduled across the different executors, and they run in separate worker processes. So the problem in my code was that global variables defined on the driver are sometimes not captured by the functions executed in parallel on the executors, so I looked for a way to solve it.

Fortunately, Spark provides a mechanism called broadcast variables, which lets you ship a variable to the executors so the functions running there can use it without problems. There are two types of shared variables: broadcast variables (immutable, read-only) and accumulators (mutable, but only numeric values are accepted by default).
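For completeness, here is a minimal accumulator sketch of my own (not part of the fix, assuming a SparkContext sc is available), just to show the other kind of shared variable:

none_count = sc.accumulator(0)

def tag_missing(value):
    if value is None:
        none_count.add(1)   # tasks can only add; they cannot read the value
    return value

sc.parallelize([1, None, 3, None]).map(tag_missing).collect()
print(none_count.value)     # only the driver can read the accumulated total (2 here)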

I rewrote my code to show you how I fixed the problem:

broadcastVar = sc.broadcast(dict_global)
# broadcastVar.value yields the broadcast dictionary, so method1 receives a plain dict
udf_func = UserDefinedFunction(lambda value: method1(value, broadcastVar.value), IntegerType())
result_col = udf_func(df[atr1])
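Another common pattern (a hypothetical sketch of my own, with method1_bc as an illustrative name) is to pass the broadcast handle itself and unwrap it with .value inside the function, which keeps the lookup logic next to the broadcast access:

def method1_bc(value, broadcast_dict):
    d = broadcast_dict.value      # the broadcast dictionary, fetched on the executor
    return d.get(value, len(d))   # same logic as method1: mapped value if present, size otherwise

udf_func = UserDefinedFunction(lambda value: method1_bc(value, broadcastVar), IntegerType())
result_col = udf_func(df[atr1])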

Hope it helps!

answered Jan 03 '23 by jartymcfly