I have some data contained in an Array of String like below (just for example):
val myArray = Array("1499955986039", "1499955986051", "1499955986122")
I want to map my array to an array of Timestamp, in order to create an RDD (myRdd) and then a DataFrame like this:
val df = spark.createDataFrame(myRdd, StructType(Seq(StructField("myTymeStamp", TimestampType, true))))
My question is not how to create the RDD, but how to convert each string into a timestamp from its millisecond value. Do you have any idea? Thanks
Use java.sql.Timestamp, whose constructor takes milliseconds since the epoch:
import java.sql.Timestamp
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, TimestampType}
val myArray = Array("1499955986039", "1499955986051", "1499955986122")
// sc is the SparkContext provided by spark-shell; new Timestamp(Long) expects epoch milliseconds
val rdd = sc.parallelize(myArray).map(s => Row(new Timestamp(s.toLong)))
val schema = StructType(Array(StructField("myTymeStamp", TimestampType, true)))
spark.createDataFrame(rdd, schema)
// res25: org.apache.spark.sql.DataFrame = [myTymeStamp: timestamp]
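The conversion step itself does not need Spark, so it can be checked on its own. A minimal plain-Scala sketch, assuming every string holds epoch milliseconds; the object name MillisToTimestamp and the helper toTimestamps are hypothetical, only java.sql.Timestamp comes from the answer above:

```scala
import java.sql.Timestamp

object MillisToTimestamp {
  // Hypothetical helper: each string is assumed to hold epoch milliseconds, e.g. "1499955986039".
  // s.toLong throws NumberFormatException for non-numeric input.
  def toTimestamps(millis: Array[String]): Array[Timestamp] =
    millis.map(s => new Timestamp(s.toLong))

  def main(args: Array[String]): Unit = {
    val myArray = Array("1499955986039", "1499955986051", "1499955986122")
    // toInstant prints in UTC, independent of the JVM's default time zone
    toTimestamps(myArray).foreach(t => println(t.toInstant))
  }
}
```

Running main prints 2017-07-13T14:26:26.039Z, 2017-07-13T14:26:26.051Z and 2017-07-13T14:26:26.122Z, which confirms the values are read as milliseconds rather than seconds.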
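You don't need to build Timestamp objects yourself. Note, however, that with a TimestampType schema each Row must hold a java.sql.Timestamp; passing a Long fails at runtime once the DataFrame is evaluated. Instead, convert the strings to Long, declare the column as LongType, and convert it to a timestamp afterwards:
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{StructType, StructField, LongType, TimestampType}
val myArray = Array("1499955986039", "1499955986051", "1499955986122")
val myrdd = spark.sparkContext.parallelize(myArray.map(a => Row(a.toLong)))
val df = spark.createDataFrame(myrdd, StructType(Seq(StructField("myTymeStamp", LongType, true))))
// casting a number to timestamp reads it as seconds, so divide the milliseconds by 1000 first
df.withColumn("myTymeStamp", (col("myTymeStamp") / 1000).cast(TimestampType))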
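Otherwise, you can create a DataFrame with a String column and cast it to a timestamp later:
import spark.implicits._
import org.apache.spark.sql.types.{StructType, StructField, StringType, LongType, TimestampType}
val strRdd = spark.sparkContext.parallelize(myArray.map(a => Row(a)))
val strDf = spark.createDataFrame(strRdd, StructType(Seq(StructField("myTymeStamp", StringType, true))))
// cast myTymeStamp from String to Long, convert milliseconds to seconds, then cast to timestamp
strDf.withColumn("myTymeStamp", ($"myTymeStamp".cast(LongType) / 1000).cast(TimestampType))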
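Why the division by 1000 matters can be shown without Spark. The sketch below uses java.time.Instant as a stand-in for Spark's cast semantics (an analogy, not Spark's own code): reading the raw value as seconds lands tens of thousands of years in the future, while splitting it into seconds plus a millisecond remainder gives the intended 2017 date. The object and method names are hypothetical:

```scala
import java.time.Instant

object SecondsVsMillis {
  val raw: Long = 1499955986039L

  // Analogy for casting the Long directly: the number is read as seconds since the epoch.
  def asSeconds(n: Long): Instant = Instant.ofEpochSecond(n)

  // Analogy for dividing by 1000 first: whole seconds plus the remaining milliseconds as nanos.
  def divideThenSeconds(n: Long): Instant =
    Instant.ofEpochSecond(n / 1000, (n % 1000) * 1000000L)

  def main(args: Array[String]): Unit = {
    println(asSeconds(raw))         // a year far beyond 9999
    println(divideThenSeconds(raw)) // 2017-07-13T14:26:26.039Z
  }
}
```

divideThenSeconds agrees with Instant.ofEpochMilli for non-negative inputs, which is the behaviour the corrected cast above aims for.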
Hope this helps!