I'm using Spark with Databricks and have the following code:
def replaceBlanksWithNulls(column):
    return when(col(column) != "", col(column)).otherwise(None)
Both of these next statements work:
x = rawSmallDf.withColumn("z", replaceBlanksWithNulls("z"))
and using a UDF:
replaceBlanksWithNulls_Udf = udf(replaceBlanksWithNulls)
y = rawSmallDf.withColumn("z", replaceBlanksWithNulls_Udf("z"))
It is unclear to me from the documentation when I should use one over the other, and why.
A UDF can essentially be any sort of function (there are exceptions, of course); it is not necessary to use Spark structures such as when, col, etc. By using a UDF, the replaceBlanksWithNulls function can be written as normal Python code:
def replaceBlanksWithNulls(s):
    return s if s != "" else None
which can be used on a dataframe column after registering it:
replaceBlanksWithNulls_Udf = udf(replaceBlanksWithNulls)
y = rawSmallDf.withColumn("z", replaceBlanksWithNulls_Udf("z"))
Note: The default return type of a UDF is string (StringType). If another type is required, it must be specified when registering the UDF, e.g.
from pyspark.sql.types import LongType
squared_udf = udf(squared, LongType())
In this case, the column operation is not complex and there are Spark functions that can achieve the same thing (i.e. replaceBlanksWithNulls as in the question):
x = rawSmallDf.withColumn("z", when(col("z") != "", col("z")).otherwise(None))
This is always preferred whenever possible since it allows Spark to optimize the query; see e.g. Spark functions vs UDF performance?
The difference shows up in Spark SQL (as mentioned in the documentation). For example, if you write:
spark.sql("select replaceBlanksWithNulls(column_name) from dataframe")
it does not work unless you have registered the function replaceBlanksWithNulls as a UDF. In Spark SQL the return type of the function must be known for execution, so a custom function has to be registered as a user-defined function (udf) before it can be used in Spark SQL.