I'm looking to extract the year, month, day and hour from the date string after converting it to my current timezone. I have created the following standalone code, but it results in null. I'm not sure how to handle the T and Z delimiters in the time format of my data.
from pyspark.sql.functions import unix_timestamp, from_unixtime
df = spark.createDataFrame(
[("2020-02-28T09:49Z",)],
['date_str']
)
df2 = df.select(
'date_str',
from_unixtime(unix_timestamp('date_str', 'yyyy-MM-ddThh:mmZ')).alias('date')
)
df2.show()
Result from the above:
+-----------------+----+
| date_str|date|
+-----------------+----+
|2020-02-28T09:49Z|null|
+-----------------+----+
Can someone guide me on how to handle this and print the date here?
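We can use either the to_timestamp or the from_unixtime(unix_timestamp()) functions for this case.
Use the format "yyyy-MM-dd'T'HH:mm'Z'", enclosing T and Z in single quotes so that they are matched as literal characters. Use HH (hour of day, 0-23) rather than hh (clock hour of am/pm, 1-12); without an am/pm marker, hh can fail on afternoon times such as 13:49 (Spark 3's parser rejects them). Note that a quoted 'Z' is only a literal, so the time is interpreted in the Spark session time zone rather than as UTC.
Example: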
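from pyspark.sql.functions import to_timestamp, unix_timestamp, from_unixtime
# df is the DataFrame from the question; both statements print the same table
df.select('date_str', to_timestamp('date_str', "yyyy-MM-dd'T'HH:mm'Z'").alias('date')).show()
df.select('date_str', from_unixtime(unix_timestamp('date_str', "yyyy-MM-dd'T'HH:mm'Z'")).alias('date')).show()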
#+-----------------+-------------------+
#| date_str| date|
#+-----------------+-------------------+
#|2020-02-28T09:49Z|2020-02-28 09:49:00|
#+-----------------+-------------------+