How can I create this Spark DataFrame with a timestamp column in one step? Here is how I am doing it in two steps, using Spark 2.4.
First I create a DataFrame with the timestamp strings:
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions.to_timestamp
import spark.implicits._ // enables the $"col" syntax (already imported in spark-shell)
val eventData = Seq(
  Row(1, "2014/01/01 23:00:01"),
  Row(1, "2014/11/30 12:40:32"),
  Row(2, "2016/12/29 09:54:00"),
  Row(2, "2016/05/09 10:12:43")
)
val schema = StructType(List(
  StructField("typeId", IntegerType, false),
  StructField("eventTimeString", StringType, false)
))
val eventDF = spark.createDataFrame(
  sc.parallelize(eventData),
  schema
)
eventDF.show()
+------+-------------------+
|typeId|    eventTimeString|
+------+-------------------+
|     1|2014/01/01 23:00:01|
|     1|2014/11/30 12:40:32|
|     2|2016/12/29 09:54:00|
|     2|2016/05/09 10:12:43|
+------+-------------------+
Then I convert the string to a timestamp and drop the string column:
val eventTimestampsDF = eventDF
.withColumn("eventTime", to_timestamp($"eventTimeString", "yyyy/MM/dd k:mm:ss"))
.drop($"eventTimeString")
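For comparison, the conversion step itself can be shortened to a single projection: one select that parses the string and names the result, so no separate drop() is needed. This is a minimal sketch, assuming a local SparkSession (the builder settings are illustrative) and tuples with toDF instead of Row objects. It uses the pattern HH (hour 0-23) instead of k (hour 1-24); both parse the sample rows. Note that it still parses strings at runtime, so it only shortens the second step rather than removing it.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.to_timestamp

// Illustrative local session; in spark-shell, getOrCreate() returns the existing one.
val spark = SparkSession.builder().master("local[*]").appName("ts-demo").getOrCreate()
import spark.implicits._

// Same sample data as the question, built from tuples instead of Row + schema.
val eventDF = Seq(
  (1, "2014/01/01 23:00:01"),
  (1, "2014/11/30 12:40:32"),
  (2, "2016/12/29 09:54:00"),
  (2, "2016/05/09 10:12:43")
).toDF("typeId", "eventTimeString")

// One projection: keep typeId, parse the string, and name the result eventTime.
// The string column is simply not selected, so no drop() is required.
val eventTimestampsDF = eventDF.select(
  $"typeId",
  to_timestamp($"eventTimeString", "yyyy/MM/dd HH:mm:ss").as("eventTime")
)

eventTimestampsDF.printSchema()
```

Because to_timestamp returns null for strings that do not match the pattern, it is worth checking for null values after the conversion.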
How can I eliminate the second step and create timestamps directly?
You can do it like this:
import java.sql.Timestamp
import spark.implicits._
val df = Seq(
  (1, Timestamp.valueOf("2014-01-01 23:00:01")),
  (1, Timestamp.valueOf("2014-11-30 12:40:32")),
  (2, Timestamp.valueOf("2016-12-29 09:54:00")),
  (2, Timestamp.valueOf("2016-05-09 10:12:43"))
).toDF("typeId", "eventTime")
There is no need to use Row objects or a custom schema: Spark infers TimestampType for the java.sql.Timestamp field.
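Timestamp.valueOf(String) only accepts the JDBC form yyyy-[m]m-[d]d hh:mm:ss[.f...], which is why the dates in the answer use dashes. If the input strings have to stay in the question's slash format, one option is to parse them on the driver with java.time before building the Seq. This is a minimal sketch; the helper name toTimestamp and the value inputFormat are hypothetical, and the result is interpreted in the JVM default time zone, as Timestamp.valueOf does.

```scala
import java.sql.Timestamp
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Pattern matching the question's input strings, e.g. "2014/01/01 23:00:01".
val inputFormat = DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss")

// Hypothetical helper: parse one string into a java.sql.Timestamp
// (local date-time in the JVM default time zone).
def toTimestamp(s: String): Timestamp =
  Timestamp.valueOf(LocalDateTime.parse(s, inputFormat))

// Tuples of (typeId, eventTime), already typed as Timestamp.
val rows = Seq(
  (1, toTimestamp("2014/01/01 23:00:01")),
  (1, toTimestamp("2014/11/30 12:40:32")),
  (2, toTimestamp("2016/12/29 09:54:00")),
  (2, toTimestamp("2016/05/09 10:12:43"))
)

// With a SparkSession in scope and `import spark.implicits._`:
// val df = rows.toDF("typeId", "eventTime")
```

The parsing happens once on the driver, so the resulting DataFrame has a timestamp column from the start and no string column ever exists.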