Spark create UDF that doesn't take in input

I want to add a column with a randomly generated ID to my Spark DataFrame. To do that, I'm using a UDF that calls UUID's randomUUID method, like so:

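import java.util.UUID
import org.apache.spark.sql.functions.udf
// Spark 1.5: $"colName" also needs import sqlContext.implicits._ (assumes a SQLContext named sqlContext)

// The String argument is ignored; it exists only so a column can be passed in.
def getRandomId(s: String): String = {
    UUID.randomUUID().toString()
}

val idUdf = udf(getRandomId(_: String))
val newDf = myDf.withColumn("id", idUdf($"colName"))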

Obviously, my getRandomId function does not need an input parameter; however, I can't figure out how to create a UDF that does not take in a column as input. Is that possible in Spark?

I am using Spark 1.5

asked Jan 26 '17 06:01 by alexgbelov


1 Answer

You can create a UDF that takes no parameters. Passing a zero-argument function of type () => String to udf meets the requirement:

import org.apache.spark.sql.functions.udf
// () => String is a zero-argument function, so the resulting UDF takes no columns
val uuid = udf(() => java.util.UUID.randomUUID().toString)
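Why this works, as a plain-Scala sketch (no Spark needed): Spark's functions.udf has overloads for Scala functions of different arities, including zero arguments, so the arity of the function you pass decides how many columns the UDF expects. The object name ArityDemo is hypothetical; the snippet only contrasts the question's eta-expanded helper (a one-argument function) with the answer's zero-argument literal.

```scala
import java.util.UUID

object ArityDemo {
  // The question's helper: the String parameter is never used.
  def getRandomId(s: String): String = UUID.randomUUID().toString()

  // Placeholder syntax turns the method into a String => String (Function1),
  // so a UDF built from it must be applied to one column.
  val oneArg: String => String = getRandomId(_: String)

  // A zero-argument literal is a () => String (Function0); a UDF built from it
  // is applied with no columns, e.g. uuid().
  val noArg: () => String = () => UUID.randomUUID().toString

  def main(args: Array[String]): Unit = {
    println(oneArg("ignored")) // the argument has no effect on the result
    println(noArg())
  }
}
```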

Using the uuid UDF on a DataFrame:

val newDf = myDf.withColumn("uuid", uuid())
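A self-contained sketch of the same approach on newer Spark versions, assuming Spark 2.3 or later on the classpath. The object name UuidColumnSketch, the withUuid helper, and the sample data are hypothetical. Spark generally treats UDFs as deterministic, so this sketch marks the random UDF with asNondeterministic() to stop the optimizer from assuming every evaluation returns the same value. That does not make the IDs stable across separate actions; persisting or writing out the DataFrame before reusing it is a common precaution.

```scala
import java.util.UUID
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.udf

object UuidColumnSketch {
  // Zero-argument UDF as in the answer; asNondeterministic() (Spark 2.3+)
  // tells the optimizer the output can differ between calls.
  val uuid = udf(() => UUID.randomUUID().toString).asNondeterministic()

  // Appends a random-UUID column named "uuid" to any DataFrame.
  def withUuid(df: DataFrame): DataFrame = df.withColumn("uuid", uuid())

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("uuid-udf-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for myDf from the question.
    val myDf = Seq("a", "b", "c").toDF("colName")
    withUuid(myDf).show(false)

    spark.stop()
  }
}
```

On Spark 1.5, which the question uses, asNondeterministic() is not available; the answer's code is used as shown there.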
answered Oct 05 '22 13:10 by mrsrinivas