I want to add a column with a randomly generated ID to my Spark DataFrame. To do that, I'm using a UDF that calls UUID.randomUUID(), like so:
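import java.util.UUID
import org.apache.spark.sql.functions.udf

// The String parameter is ignored; it exists only so the UDF can take a column.
def getRandomId(s: String): String = {
  UUID.randomUUID().toString()
}
val idUdf = udf(getRandomId(_: String))
val newDf = myDf.withColumn("id", idUdf($"colName"))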
Obviously, my getRandomId function does not need an input parameter; however, I can't figure out how to create a UDF that does not take in a column as input. Is that possible in Spark?
I am using Spark 1.5.
Yes, you can define a UDF with no parameters. A function of type () => String meets the requirement:
import org.apache.spark.sql.functions.udf
val uuid = udf(() => java.util.UUID.randomUUID().toString)
Using the UDF (uuid) on a DataFrame:
val newDf = myDf.withColumn("uuid", uuid())
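For completeness, here is a minimal end-to-end sketch of the same idea, assuming a local Spark 1.5-style setup with SparkContext and SQLContext. The object name UuidColumnExample, the helper randomId, and the sample rows are hypothetical and only there for illustration. One thing to keep in mind: the IDs are generated when the DataFrame is evaluated, so each action may recompute them and produce different values unless the result is persisted. Caching is best-effort, so writing the DataFrame out is the safer option if the IDs must stay stable.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.udf

// Hypothetical example object; names and sample data are illustrative only.
object UuidColumnExample {
  // Plain zero-argument function that the UDF wraps.
  def randomId(): String = java.util.UUID.randomUUID().toString

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("uuid-column-example").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Small stand-in for myDf from the question.
    val myDf = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3))).toDF("colName", "n")

    // Zero-argument UDF: no input column is needed.
    val uuid = udf(() => randomId())
    val newDf = myDf.withColumn("uuid", uuid())

    // Cache so that later actions are likely to reuse the same generated IDs
    // instead of recomputing them (not a hard guarantee if blocks are evicted).
    newDf.cache()
    newDf.collect().foreach(println)

    sc.stop()
  }
}
```

On Spark 2.x and later the same UDF definition should work unchanged, with SparkSession taking the place of SQLContext.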