
How to get date from different year, month and day columns in spark (scala)

I have a DataFrame including data like:

+----+-----+---+-----+
|Year|Month|Day|...  |
+----+-----+---+-----+
|2012|    2| 20|     |
|2011|    7|  6|     |
|2015|    3| 15|     |
+----+-----+---+-----+

and I would like to add a column containing the date built from the Year, Month and Day columns.

Mederr asked Nov 07 '17 06:11



1 Answer

This is less complex than Shaido's approach; just use:

from pyspark.sql import functions as F  # PySpark alias used below
df.withColumn("date", F.to_date(F.concat_ws("-", "Year", "Month", "Day"))).show()

Works on Spark 2.4.
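
The snippet above uses the `F` alias that is conventional in PySpark, while the question asks about Scala. A rough Scala equivalent might look like the sketch below. The object name `DateFromColumns`, the helper `withDateColumn`, and the local `SparkSession` settings are made up for illustration; the column names come from the question. In Scala, `concat_ws` takes `Column` arguments rather than plain column-name strings, so the names are wrapped in `col(...)`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat_ws, to_date}

// Hypothetical helper object, not part of the original answer.
object DateFromColumns {

  // Joins Year, Month and Day into a "Y-M-D" string and parses it as a date.
  // Assumes the three columns exist and hold valid calendar values;
  // rows that do not form a valid date are expected to yield null.
  def withDateColumn(df: DataFrame): DataFrame =
    df.withColumn(
      "date",
      to_date(concat_ws("-", col("Year"), col("Month"), col("Day")))
    )

  def main(args: Array[String]): Unit = {
    // Local session for demonstration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("date-from-columns")
      .getOrCreate()
    import spark.implicits._

    // Sample rows taken from the question.
    val df = Seq((2012, 2, 20), (2011, 7, 6), (2015, 3, 15))
      .toDF("Year", "Month", "Day")

    withDateColumn(df).show()
    spark.stop()
  }
}
```

On Spark 3.0 or later, the SQL function `make_date` should also be an option, for example `df.withColumn("date", expr("make_date(Year, Month, Day)"))` with `expr` imported from `org.apache.spark.sql.functions`; it is not available on Spark 2.4.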

Mithril answered Nov 08 '22 00:11