I have a dataframe in Spark with many columns and a udf that I defined. I want the same dataframe back, except with one column transformed. Furthermore, my udf takes in a string and returns a timestamp. Is there an easy way to do this? I tried
val test = myDF.select("my_column").rdd.map(r => getTimestamp(r))
but this returns an RDD, and only with the transformed column.
You can update a Spark DataFrame column using withColumn(), select(), or sql(). Since DataFrames are distributed, immutable collections, you can't really change column values in place; when you "change" a value using withColumn() or any other approach, Spark returns a new DataFrame with the updated values.
withColumn() creates a new column with the given name. If a column with that name already exists, the new column replaces it and the old one is dropped.
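For example, a minimal sketch of both behaviours (df and the amount column are hypothetical names):

import org.apache.spark.sql.functions.col

// Same name: the existing "amount" column is replaced by the cast version
val updated = df.withColumn("amount", col("amount").cast("double"))

// New name: "amount_doubled" is added and all original columns are kept
val extended = df.withColumn("amount_doubled", col("amount") * 2)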
You can add multiple columns to a Spark DataFrame in several ways. If you want to add a known set of columns, you can easily do so by chaining withColumn() calls or with a single select(). However, sometimes you may need to add multiple columns after applying some transformations; in that case you can use either map() or foldLeft(), as sketched below.
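A minimal sketch of the foldLeft() variant (the column names and expressions are assumptions for illustration):

import org.apache.spark.sql.functions.{col, lit, upper}

// Hypothetical new columns, each defined by a Column expression
val newCols = Seq(
  "flag"  -> lit(true),
  "upper" -> upper(col("my_column"))
)

// foldLeft threads the DataFrame through one withColumn call per entry
val withAll = newCols.foldLeft(myDF) { case (df, (name, expr)) =>
  df.withColumn(name, expr)
}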
If you really need to use your function, I can suggest two options:
1) Using map / toDF:
import org.apache.spark.sql.Row
import sqlContext.implicits._

def getTimestamp: (String => java.sql.Timestamp) = // your function here

val test = myDF.select("my_column").rdd.map {
  case Row(string_val: String) => (string_val, getTimestamp(string_val))
}.toDF("my_column", "new_column")
2) Using UDFs (UserDefinedFunction):
import org.apache.spark.sql.functions._

def getTimestamp: (String => java.sql.Timestamp) = // your function here

val newCol = udf(getTimestamp).apply(col("my_column")) // creates the new column
val test = myDF.withColumn("new_column", newCol)       // adds the new column to the original DF
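Both options leave getTimestamp as a placeholder. A minimal implementation might look like the following (the "yyyy-MM-dd HH:mm" input format is an assumption; adjust it to your data):

import java.sql.Timestamp
import java.text.SimpleDateFormat

// Assumed input format; null or unparseable values yield null so bad
// rows don't fail the whole job
def getTimestamp: (String => Timestamp) = { s =>
  if (s == null) null
  else try {
    val format = new SimpleDateFormat("yyyy-MM-dd HH:mm")
    new Timestamp(format.parse(s).getTime)
  } catch {
    case _: java.text.ParseException => null
  }
}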
There's more detail about Spark SQL UDFs in this nice article by Bill Chambers.
Alternatively, if you just want to transform a StringType column into a TimestampType column, you can use the unix_timestamp column function, available since Spark SQL 1.5:
val test = myDF.withColumn(
  "new_column",
  unix_timestamp(col("my_column"), "yyyy-MM-dd HH:mm").cast("timestamp")
)
Note: for Spark 1.5.x, it is necessary to multiply the result of unix_timestamp by 1000 before casting to timestamp (issue SPARK-11724). The resulting code would be:
val test = myDF.withColumn(
  "new_column",
  (unix_timestamp(col("my_column"), "yyyy-MM-dd HH:mm") * 1000L).cast("timestamp")
)
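For reference, here is a self-contained sketch of the unix_timestamp approach end to end (the sample data is made up for illustration):

import org.apache.spark.sql.functions.{col, unix_timestamp}
import sqlContext.implicits._

// Hypothetical sample data
val myDF = Seq("2015-11-24 10:15", "2015-11-25 09:00").toDF("my_column")

val test = myDF.withColumn(
  "new_column",
  unix_timestamp(col("my_column"), "yyyy-MM-dd HH:mm").cast("timestamp")
)

test.printSchema() // new_column should be of type timestamp
test.show(false)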
Edit: Added udf option