 

How to convert unix timestamp to date in Spark

I have a data frame with a column of Unix timestamps (e.g. 1435655706000), and I want to convert them to dates with the format 'yyyy-MM-dd'. I've tried nscala-time but it doesn't work.

val time_col = sqlc.sql("select ts from mr").map(_(0).toString.toDateTime)
time_col.collect().foreach(println)

and I got this error:

java.lang.IllegalArgumentException: Invalid format: "1435655706000" is malformed at "6000"

asked Jun 30 '15 by youngchampion

People also ask

How do I convert Unix epoch to timestamp in PySpark?

In PySpark SQL, unix_timestamp() is used to get the current time and to convert a time string in the format yyyy-MM-dd HH:mm:ss to a Unix timestamp (in seconds), and from_unixtime() is used to convert a number of seconds since the Unix epoch (1970-01-01 00:00:00 UTC) to a string representation of the timestamp.
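As a rough sketch in Scala (the underlying Spark SQL functions are the same ones PySpark exposes; the DataFrame df and its string column login are assumptions for illustration):

import org.apache.spark.sql.functions.{col, from_unixtime, unix_timestamp}

// login is assumed to hold strings like "2015-06-30 09:15:06"
val epochs = df
  .withColumn("now_epoch", unix_timestamp())                   // current time as Unix seconds
  .withColumn("login_epoch", unix_timestamp(col("login")))     // string -> seconds since epoch
  .withColumn("login_str", from_unixtime(col("login_epoch")))  // seconds -> "yyyy-MM-dd HH:mm:ss"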

How do I convert a String to a date in spark?

PySpark's to_date() function is used to convert a string column (StringType) to a date column (DateType). It takes a date string as its first argument, and the pattern that string is in as its second argument.
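A minimal Scala sketch of the same idea (the column name event_day and its pattern are assumptions; the two-argument to_date overload exists since Spark 2.2):

import org.apache.spark.sql.functions.{col, to_date}

// event_day is assumed to hold strings like "06/30/2015"
val parsed = df.withColumn("event_date", to_date(col("event_day"), "MM/dd/yyyy"))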

How do you convert long to date in PySpark?

The to_date() function in Apache PySpark is commonly used to convert a Timestamp to a date, which it does by truncating the time part of the Timestamp column.
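Sketched in Scala (assuming ts_col is already a TimestampType column):

import org.apache.spark.sql.functions.{col, to_date}

// to_date drops the time-of-day part, e.g. 2015-11-06 21:50:55 -> 2015-11-06
val dated = df.withColumn("day", to_date(col("ts_col")))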

How do I change the date format in spark?

The Spark DataFrame API also provides a to_date() function, which parses a date from a String column and converts it to Spark's DateType. When dates are in 'yyyy-MM-dd' format, the function auto-casts them to DateType by the standard casting rules. When dates are not in the expected format, the function returns null.
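A small Scala illustration of that null behaviour (assumes spark.implicits._ is in scope for toDF):

import org.apache.spark.sql.functions.{col, to_date}

val strings = Seq("2015-06-30", "30/06/2015").toDF("s")
strings.select(to_date(col("s"))).show()
// "2015-06-30" matches yyyy-MM-dd and parses to a date;
// "30/06/2015" does not, so to_date returns null for that row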


2 Answers

Here it is using the Scala DataFrame functions from_unixtime and to_date:

// NOTE: dividing by 1000 is required if the timestamps are in milliseconds
// e.g. 1446846655609 -> 2015-11-06 21:50:55 -> 2015-11-06
mr.select(to_date(from_unixtime($"ts" / 1000)))
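Applied to the question's table and millisecond values, a self-contained version might look like this (sqlc is the asker's SQLContext; the output follows from the example timestamp):

import org.apache.spark.sql.functions.{col, from_unixtime, to_date}

// ts holds milliseconds, e.g. 1435655706000, so divide by 1000 first
val result = sqlc.sql("select ts from mr")
  .select(to_date(from_unixtime(col("ts") / 1000)).as("date"))

result.show()  // 1435655706000 -> 2015-06-30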
answered Oct 29 '22 by Marsellus Wallace

Since Spark 1.5, there is a built-in function for this (note the lowercase yyyy — uppercase YYYY is the week-based year — and the division by 1000 for millisecond input):

val df = sqlContext.sql("select from_unixtime(ts / 1000, 'yyyy-MM-dd') as `ts` from mr")

Please check the Spark 1.5.2 API docs for more info.
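The same expression also works through the DataFrame API without writing a full SQL statement, e.g. (a sketch using the question's table and column names):

val df2 = sqlContext.table("mr")
  .selectExpr("from_unixtime(ts / 1000, 'yyyy-MM-dd') as ts")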

answered Oct 29 '22 by Yuan Zhao