Functions from Python packages for udf() of Spark dataframe

Question

For a Spark DataFrame in PySpark, we can use pyspark.sql.functions.udf to create a user-defined function (UDF).

Can I use any function from a Python package in udf(), e.g., np.random.normal from numpy?

karlson · Accepted Answer

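Yes, you can pass a function from a Python package to udf(), provided the package is also installed on the worker nodes. Assuming you want to add a column named new to your DataFrame df, with each value produced by a separate call to numpy.random.normal, you could do: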

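import numpy
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType

# float() ensures Spark receives a built-in Python float for DoubleType.
# asNondeterministic() (Spark 2.3+) tells the optimizer each call may return a different value.
normal_udf = udf(lambda: float(numpy.random.normal()), DoubleType()).asNondeterministic()

# Each row gets its own draw from the standard normal distribution.
df_with_new_column = df.withColumn('new', normal_udf())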
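
The same idea extends to package functions that take column values as arguments. One caveat, offered here as general PySpark behaviour rather than something stated in the answer: many numpy functions return numpy scalar types such as numpy.float64, which Spark typically cannot serialize into a DoubleType column, so the result is usually converted to a built-in Python type first. A minimal sketch, in which as_python_float and safe_log1p are hypothetical helper names chosen for illustration:

```python
import numpy


def as_python_float(func):
    """Wrap func so its result comes back as a built-in Python float."""
    def wrapper(*args):
        # Spark passes SQL NULLs to a Python UDF as None; propagate them.
        if any(arg is None for arg in args):
            return None
        # Convert numpy scalars (e.g. numpy.float64) to a plain float.
        return float(func(*args))
    return wrapper


# numpy.log1p returns a numpy.float64; the wrapped version returns a float.
safe_log1p = as_python_float(numpy.log1p)
```

With pyspark available, the wrapped function could then be registered as log1p_udf = udf(safe_log1p, DoubleType()) and applied with df.withColumn('log_x', log1p_udf(df['x'])), where x is an assumed numeric column and log1p_udf an illustrative name.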

Tags:

python

apache-spark

pyspark

Jie Chen