
Spark Scala Register UDF - Why do I need to pass an underscore (_) after the function name?

I have created a UDF in Scala, and when I tried to register it using just the function name, I got an error.

Not Working

def IPConvertUDF = spark.udf.register("IPConvertUDF", IPConvert)

Error

error: missing argument list for method IPConvert
Unapplied methods are only converted to functions when a function type is expected.
You can make this conversion explicit by writing `IPConvert _` or `IPConvert(_)` instead of `IPConvert`.
def IPConvertUDF = spark.udf.register("IPConvertUDF", IPConvert)

So I added an extra _ after the method name, and it worked.

Works perfectly

def IPConvertUDF = spark.udf.register("IPConvertUDF", IPConvert _)

Would someone be able to explain to me what the extra _ after the method name means?

Gaurang Shah asked Jan 27 '23 02:01


1 Answer

The short answer: you are trying to pass a method where a function is expected as an argument. Methods are not functions. Let's dig a bit deeper.

Let's try with a simple add function first:

 val add:(Int,Int) => Int = (val1,val2) => val1+val2

 spark.udf.register("add",add)

The above code compiles without any error because add is a function value.

Now let's try the same add as a method:

 def add(val1: Int, val2: Int): Int = {
   val1 + val2
 }

 spark.udf.register("add",add)

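Now you get the error: missing argument list for method add. As I mentioned, register(..) expects a function, and the method is not converted to one automatically here. The compiler only does that conversion when a single function type is expected, and register is overloaded, so there is no single expected type to convert to.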
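
The situation can be reproduced without Spark. The sketch below assumes Scala 2 behaviour and uses a hypothetical Registry object with two register overloads as a stand-in; it is not Spark's API, only an illustration of why an overloaded target makes the explicit conversion necessary.

```scala
// Hypothetical stand-in for an overloaded registration API (not Spark's).
object Registry {
  def register(name: String, f: Int => Int): String = s"$name/1"
  def register(name: String, f: (Int, Int) => Int): String = s"$name/2"
}

def add(val1: Int, val2: Int): Int = val1 + val2

// Explicit conversion: `add _` has type (Int, Int) => Int, so the
// two-argument overload is selected.
val viaUnderscore = Registry.register("add", add _)

// An explicitly typed lambda is another way to hand over a function.
val viaLambda = Registry.register("add", (a: Int, b: Int) => add(a, b))
```

Both calls pick the two-argument overload because the argument already has a function type before overload resolution starts.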

The trailing _ is shorthand for a partially applied function in which none of the arguments are supplied (the conversion is known as eta-expansion). In other words, add _ converts the add method into a function value, and that is why the error disappears.

spark.udf.register("add",add _)
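
To see what add _ produces on its own, here is a minimal sketch in plain Scala with no Spark involved. The names addFn and inc are illustrative only.

```scala
def add(val1: Int, val2: Int): Int = val1 + val2

// `add _` turns the method into a function value (a Function2 instance).
val addFn: (Int, Int) => Int = add _

// A function value is an ordinary object: it can be stored, passed around,
// and called directly or through apply.
val sum = addFn(1, 2)
val viaApply = addFn.apply(3, 4)

// Where exactly one function type is expected, the compiler performs the
// same conversion by itself, so no underscore is needed here.
def inc(x: Int): Int = x + 1
val incremented = List(1, 2, 3).map(inc)
```

The last line works without an underscore because map is not overloaded and expects exactly one function type.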

Balaji Reddy answered Apr 12 '23 15:04