I have a DataFrame including data like:
+----+-----+---+---+
|Year|Month|Day|...|
+----+-----+---+---+
|2012|    2| 20|   |
|2011|    7|  6|   |
|2015|    3| 15|   |
+----+-----+---+---+
and I would like to add a date column built from Year, Month and Day.
Using the PySpark SQL functions datediff() and months_between(), you can calculate the difference between two dates in days, months, and years. You can also use them to calculate age.
current_date() returns the current date. date_format() returns a date as a string in a specified format. Both come from the Spark SQL functions package (pyspark.sql.functions in PySpark), which has to be imported before the date functions can be used. In the Scala API, Seq() is used to build the sample input, here the date 2021-02-14.
In PySpark, use the date_format() function to convert a DataFrame column from date to string format.
Spark to_date(): convert String to Date. The to_date() function converts a string (StringType) column to a date (DateType) column, optionally using a format pattern.
Not as complex as Shaido's answer, just:
from pyspark.sql import functions as F
df.withColumn("date", F.to_date(F.concat_ws("-", "Year", "Month", "Day"))).show()
Works on Spark 2.4.