
Convert string to timestamp for Spark using Scala

I have a dataframe called train, which has the following schema:

root
|-- date_time: string (nullable = true)
|-- site_name: integer (nullable = true)
|-- posa_continent: integer (nullable = true)

I want to cast the date_time column to timestamp and create a new column with the year value extracted from the date_time column.

To be clear, I have the following dataframe:

+-------------------+---------+--------------+
|          date_time|site_name|posa_continent|
+-------------------+---------+--------------+
|2014-08-11 07:46:59|        2|             3|
|2014-08-11 08:22:12|        2|             3|
|2015-08-11 08:24:33|        2|             3|
|2016-08-09 18:05:16|        2|             3|
|2011-08-09 18:08:18|        2|             3|
|2009-08-09 18:13:12|        2|             3|
|2014-07-16 09:42:23|        2|             3|
+-------------------+---------+--------------+

I want to get the following dataframe:

+-------------------+---------+--------------+--------+
|          date_time|site_name|posa_continent|year    |
+-------------------+---------+--------------+--------+
|2014-08-11 07:46:59|        2|             3|2014    |
|2014-08-11 08:22:12|        2|             3|2014    |
|2015-08-11 08:24:33|        2|             3|2015    |
|2016-08-09 18:05:16|        2|             3|2016    |
|2011-08-09 18:08:18|        2|             3|2011    |
|2009-08-09 18:13:12|        2|             3|2009    |
|2014-07-16 09:42:23|        2|             3|2014    |
+-------------------+---------+--------------+--------+
Aissa El Ouafi asked May 20 '16 14:05



1 Answer

Well, if you want to cast the date_time column to timestamp and create a new column with the year value, then do exactly that:

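import org.apache.spark.sql.functions.year
import spark.implicits._  // enables the $"..." syntax; assumes a SparkSession named spark (spark-shell imports this already)

train
  .withColumn("date_time", $"date_time".cast("timestamp"))  // cast the string column to timestamp
  .withColumn("year", year($"date_time"))  // add a year column extracted from the timestamp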
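
If you want to try this outside your own job, here is a minimal, self-contained sketch of the same two steps. The local SparkSession, the app name, and the withYear name are assumptions made for illustration and do not come from the question. The rows are a few taken from the sample above. It relies on the strings already being in yyyy-MM-dd HH:mm:ss form, as they are in the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, year}

// Assumption: a throwaway local session; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("year-from-string") // hypothetical app name
  .getOrCreate()
import spark.implicits._ // needed for toDF on a local Seq

// A few rows from the question, with date_time held as a string.
val train = Seq(
  ("2014-08-11 07:46:59", 2, 3),
  ("2015-08-11 08:24:33", 2, 3),
  ("2009-08-09 18:13:12", 2, 3)
).toDF("date_time", "site_name", "posa_continent")

val withYear = train
  .withColumn("date_time", col("date_time").cast("timestamp")) // string -> timestamp
  .withColumn("year", year(col("date_time")))                  // integer year column

withYear.printSchema() // date_time should now be reported as timestamp
withYear.show(false)
```

Using col("date_time") instead of $"date_time" is only a stylistic choice here; both refer to the same column.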

zero323 answered Oct 30 '22 07:10