I have a dataframe which has a date in the following format:
+----------------------+
|date |
+----------------------+
|May 6, 2016 5:59:34 AM|
+----------------------+
I intend to extract the date from this in the format YYYY-MM-DD
; so the result should be for the above date - 2016-05-06.
But when I extract is using the following:
df.withColumn('part_date', from_unixtime(unix_timestamp(df.date, "MMM dd, YYYY hh:mm:ss aa"), "yyyy-MM-dd"))
I get the following date
2015-12-27
Can anyone please advise on this? I do not intend to convert my df to rdd to use datetime function from python and want to use this in the dataframe it self.
There are some errors with your pattern. Here's a suggestion:
from_pattern = 'MMM d, yyyy h:mm:ss aa'
to_pattern = 'yyyy-MM-dd'
df.withColumn('part_date', from_unixtime(unix_timestamp(df['date'], from_pattern), to_pattern)).show()
+----------------------+----------+
|date |part_date |
+----------------------+----------+
|May 6, 2016 5:59:34 AM|2016-05-06|
+----------------------+----------+
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With