I want to parse the date columns in a DataFrame, and for each date column the resolution for the date may change (e.g. 2011/01/10 => 2011/01 if the resolution is set to "Month").
I wrote the following code:
def convertDataFrame(dataframe: DataFrame,
                     schema: Array[FieldDataType],
                     resolution: Array[DateResolutionType]): DataFrame = {
  import org.apache.spark.sql.functions._

  val convertDateFunc = udf { (x: String, resolution: DateResolutionType) =>
    SparkDateTimeConverter.convertDate(x, resolution)
  }
  val convertDateTimeFunc = udf { (x: String, resolution: DateResolutionType) =>
    SparkDateTimeConverter.convertDateTime(x, resolution)
  }

  val allColNames = dataframe.columns
  val allCols = allColNames.map(name => dataframe.col(name))
  val mappedCols = for (i <- allCols.indices) yield {
    schema(i) match {
      case FieldDataType.Date     => convertDateFunc(allCols(i), resolution(i))
      case FieldDataType.DateTime => convertDateTimeFunc(allCols(i), resolution(i))
      case _                      => allCols(i)
    }
  }

  dataframe.select(mappedCols: _*)
}
However it doesn't work. It seems that I can only pass Columns to UDFs. I also wonder whether it would be very slow if I converted the DataFrame to an RDD and applied the function on each row.
Does anyone know the correct solution? Thank you!
Since 30 October 2017, Spark has introduced vectorized UDFs for PySpark. The reason a Python UDF is slow is probably that PySpark UDFs are not implemented in the most optimized way. According to the paragraph from the linked article, Spark added a Python API in version 0.7, with support for user-defined functions.
Just use a little bit of currying:
def convertDateFunc(resolution: DateResolutionType) = udf((x:String) => SparkDateTimeConverter.convertDate(x, resolution))
and use it as follows:
case FieldDataType.Date => convertDateFunc(resolution(i))(allCols(i))
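Putting it all together, the whole method could then look like this. This is only a minimal sketch, reusing the asker's FieldDataType, DateResolutionType and SparkDateTimeConverter from the question:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.udf

def convertDataFrame(dataframe: DataFrame,
                     schema: Array[FieldDataType],
                     resolution: Array[DateResolutionType]): DataFrame = {
  // The resolution is captured in the closure, so each UDF only receives Columns.
  def convertDateFunc(res: DateResolutionType) =
    udf((x: String) => SparkDateTimeConverter.convertDate(x, res))
  def convertDateTimeFunc(res: DateResolutionType) =
    udf((x: String) => SparkDateTimeConverter.convertDateTime(x, res))

  val allCols = dataframe.columns.map(dataframe.col)
  val mappedCols = allCols.indices.map { i =>
    schema(i) match {
      case FieldDataType.Date     => convertDateFunc(resolution(i))(allCols(i))
      case FieldDataType.DateTime => convertDateTimeFunc(resolution(i))(allCols(i))
      case _                      => allCols(i)
    }
  }
  dataframe.select(mappedCols: _*)
}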
On a side note you should take a look at sql.functions.trunc and sql.functions.date_format. These should do at least part of the job without using UDFs at all.
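For example, for a "Month" resolution the built-in functions alone can do the conversion. A small sketch, assuming Spark 2.2+ (for the to_date overload with a format string) and a hypothetical df with a string column eventDate holding values like "2011/01/10":

import org.apache.spark.sql.functions.{col, date_format, to_date, trunc}

// Parse the string into a proper date column first.
val parsed = df.withColumn("eventDate", to_date(col("eventDate"), "yyyy/MM/dd"))

// trunc keeps the date type, snapped to the first day of the month (2011-01-01)...
val truncated = parsed.withColumn("eventMonth", trunc(col("eventDate"), "month"))

// ...while date_format renders it as a string at the desired resolution ("2011/01").
val formatted = parsed.withColumn("eventMonth", date_format(col("eventDate"), "yyyy/MM"))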
Note: in Spark 2.2 or later you can use the typedLit function:
import org.apache.spark.sql.functions.typedLit
It supports a wider range of literals, such as Seq or Map.
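A quick sketch of what typedLit buys you over lit, with a hypothetical df and yearCol column:

import org.apache.spark.sql.functions.{col, typedLit, udf}

// typedLit can build literal Columns from Scala collections, which plain lit cannot.
val allowedYears = typedLit(Seq("2011", "2012"))
val monthLookup  = typedLit(Map("Jan" -> 1, "Feb" -> 2))

// A UDF can then bind the literal Seq as an ordinary argument.
val isAllowed = udf((year: String, years: Seq[String]) => years.contains(year))
df.select(isAllowed(col("yearCol"), allowedYears))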
You can create a literal Column to pass to a UDF using the lit(...) function defined in org.apache.spark.sql.functions. For example:
val takeRight = udf((s: String, i: Int) => s.takeRight(i))
df.select(takeRight($"stringCol", lit(1)))
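The lit(...) call is what turns the constant into a Column; without it the call would not compile, which mirrors the original problem of passing a non-Column value to a UDF. A minimal sketch, assuming a df with a stringCol column:

import org.apache.spark.sql.functions.{lit, udf}

val takeRight = udf((s: String, i: Int) => s.takeRight(i))

// Compiles: both arguments are Columns.
df.select(takeRight(df("stringCol"), lit(1)))

// Would not compile: 1 is a plain Int, not a Column.
// df.select(takeRight(df("stringCol"), 1))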