 

How to register UDF to use in SQL and DataFrame?

From what I have seen, in order to do this you have to

  1. define the UDF as a plain function
  2. register the function with SQLContext for SQL

    spark.sqlContext.udf.register("myUDF", myFunc)
    
  3. wrap it in a UserDefinedFunction for the DataFrame DSL

    def myUDF = udf(myFunc)
    

Is there no way to combine this into one step and make the udf available for both? Also, for cases where a function exists for DataFrame but not for SQL, how do you go about registering it without copying over the code again?
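For reference, the two-step approach described above looks like this in full (a sketch; the doubling function and the names myFunc/myUDF are illustrative, and spark is an existing SparkSession):

    import org.apache.spark.sql.functions.udf

    // 1. define the UDF as a plain Scala function
    val myFunc = (x: Int) => x * 2

    // 2. register it with the SQLContext for use in SQL
    spark.sqlContext.udf.register("myUDF", myFunc)

    // 3. wrap the same function for the DataFrame DSL
    val myUDF = udf(myFunc)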

totoromeow asked Apr 19 '17

1 Answer

The UDFRegistration.register variants that take a scala.FunctionN return a UserDefinedFunction, so you can register the SQL function and create a DSL-friendly UDF in a single step:

    val timesTwoUDF = spark.udf.register("timesTwo", (x: Int) => x * 2)

    spark.sql("SELECT timesTwo(1)").show
    +---------------+
    |UDF:timesTwo(1)|
    +---------------+
    |              2|
    +---------------+

    spark.range(1, 2).toDF("x").select(timesTwoUDF($"x")).show
    +------+
    |UDF(x)|
    +------+
    |     2|
    +------+
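For the reverse case raised in the question, where a UserDefinedFunction already exists for the DataFrame DSL but is not yet usable from SQL, Spark 2.2+ adds a register overload that accepts a UserDefinedFunction directly, so the function body does not need to be duplicated. A minimal sketch (assuming spark is an existing SparkSession):

    import org.apache.spark.sql.functions.udf

    // an existing DSL-only UDF
    val timesTwoUDF = udf((x: Int) => x * 2)

    // expose it under a SQL name without rewriting the function (Spark 2.2+)
    spark.udf.register("timesTwo", timesTwoUDF)
    spark.sql("SELECT timesTwo(21)").show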
zero323 answered Oct 22 '22