I made a simple UDF to convert or extract some values from a time field in a temp table in Spark. I register the function, but when I call it using SQL it throws a NullPointerException. Below is my function and the process of executing it. I am using Zeppelin. Strangely, this was working yesterday but it stopped working this morning.
Function
def convert(time: String): String = {
  val sdf = new java.text.SimpleDateFormat("HH:mm")
  val time1 = sdf.parse(time)
  sdf.format(time1)
}
Register the Function
sqlContext.udf.register("convert", convert _)
Test the function without SQL -- this works:
convert("12:12:12") // returns "12:12"
Test the function with SQL in Zeppelin -- this FAILS:
%sql
select convert(time) from temptable limit 10
Structure of temptable
root
|-- date: string (nullable = true)
|-- time: string (nullable = true)
|-- serverip: string (nullable = true)
|-- request: string (nullable = true)
|-- resource: string (nullable = true)
|-- protocol: integer (nullable = true)
|-- sourceip: string (nullable = true)
Part of the stack trace that I am getting:
java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:643)
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:652)
at org.apache.spark.sql.hive.HiveFunctionRegistry.lookupFunction(hiveUdfs.scala:54)
at org.apache.spark.sql.hive.HiveContext$$anon$3.org$apache$spark$sql$catalyst$analysis$OverrideFunctionRegistry$$super$lookupFunction(HiveContext.scala:376)
at org.apache.spark.sql.catalyst.analysis.OverrideFunctionRegistry$$anonfun$lookupFunction$2.apply(FunctionRegistry.scala:44)
at org.apache.spark.sql.catalyst.analysis.OverrideFunctionRegistry$$anonfun$lookupFunction$2.apply(FunctionRegistry.scala:44)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.catalyst.analysis.OverrideFunctionRegistry$class.lookupFunction(FunctionRegistry.scala:44)
By way of background: User-Defined Functions (UDFs) are user-programmable routines that act on one row at a time. In Spark, you create a UDF by writing an ordinary function in the language you are using with Spark. For example, if you are using Spark with Scala, you write the function in Scala and then either wrap it with the udf() function to use it on a DataFrame, or register it as a UDF to use it in SQL.
UDFs are very useful for extending Spark's vocabulary, but they come with significant performance overhead: they are a black box to the Spark optimizer, blocking several helpful optimizations such as whole-stage code generation and null-handling optimizations.
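You can see this for yourself by comparing query plans (a minimal sketch; df stands for any DataFrame with a string time column, and firstFive is a throwaway UDF invented just for the comparison):
import org.apache.spark.sql.functions._

// A toy UDF next to its native equivalent; df is an assumed DataFrame.
val firstFive = udf((s: String) => s.substring(0, 5))
df.select(firstFive(col("time"))).explain()       // plan shows an opaque UDF call
df.select(substring(col("time"), 1, 5)).explain() // plan uses a native expression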
It is well known that the use of UDFs in Apache Spark, and especially through the Python API, can hurt application performance. For this reason, at Damavis we try to avoid them as much as possible in favour of native functions or SQL.
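In this particular case the UDF can be avoided entirely: since time is a fixed-width HH:mm:ss string, the built-in substring function produces the same HH:mm value (a sketch; dataFrame stands for the DataFrame behind temptable):
import org.apache.spark.sql.functions._

// "HH:mm" is just the first five characters of "HH:mm:ss",
// so a native substring does the job with no UDF at all.
val withShortTime = dataFrame.withColumn("time", substring(col("time"), 1, 5))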
Use udf instead of defining a function directly
import org.apache.spark.sql.functions._

val convert = udf[String, String](time => {
  val sdf = new java.text.SimpleDateFormat("HH:mm")
  val time1 = sdf.parse(time)
  sdf.format(time1)
})
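You can then apply it to a column expression, which mirrors the original SQL query (a sketch; dataFrame again stands for the DataFrame behind temptable):
import org.apache.spark.sql.functions.col

// Roughly the DataFrame equivalent of:
//   select convert(time) from temptable limit 10
dataFrame.select(convert(col("time"))).show(10)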
A udf takes Columns (one or more) as input and returns a Column, as Spark's own definition shows:
case class UserDefinedFunction protected[sql] (
    f: AnyRef,
    dataType: DataType,
    inputTypes: Option[Seq[DataType]]) {

  def apply(exprs: Column*): Column = {
    Column(ScalaUDF(f, dataType, exprs.map(_.expr), inputTypes.getOrElse(Nil)))
  }
}
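Note the contrast with the direct call earlier: once wrapped as a udf, convert no longer accepts plain Strings; a literal has to be lifted into a Column with lit() (a small sketch to make that concrete; dataFrame is an assumed DataFrame):
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.lit

// apply() wraps the Scala function in a ScalaUDF expression and
// returns a lazily evaluated Column -- nothing runs until an action.
val c: Column = convert(lit("12:12:12"))
dataFrame.select(c).show(1)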
You have to define your function as a UDF.
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.udf

val convertUDF: UserDefinedFunction = udf((time: String) => {
  val sdf = new java.text.SimpleDateFormat("HH:mm")
  val time1 = sdf.parse(time)
  sdf.format(time1)
})
Next you would apply your UDF on your DataFrame.
// assuming your DataFrame is already defined
dataFrame.withColumn("time", convertUDF(col("time"))) // using the same name replaces existing
Now, as to your actual problem: one reason you are receiving this error could be that your DataFrame contains rows where time is null. If you filter those out before you apply the UDF, you should be able to continue with no problem.
// filter returns a new DataFrame; the original is left untouched
val withTime = dataFrame.filter(col("time").isNotNull)
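Alternatively (my own sketch, not part of the original answer), the UDF itself can be made null-safe, mapping null inputs to null outputs instead of throwing inside SimpleDateFormat.parse:
import org.apache.spark.sql.functions.udf

// Option(null) is None, so null rows flow through as null results
// instead of raising a NullPointerException inside the UDF.
val convertSafe = udf((time: String) =>
  Option(time).map { t =>
    val sdf = new java.text.SimpleDateFormat("HH:mm")
    sdf.format(sdf.parse(t))
  }.orNull
)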
I'm curious what else causes a NullPointerException when running a UDF, other than it encountering a null. If you found a reason different from my suggestion, I'd be glad to know.