How to handle T and Z in the date format using pyspark functions [duplicate]

I want to extract the year, month, day, and hour from a date string after converting it to my current time zone. The standalone code below returns null. I'm not sure how to handle the T and Z delimiters in the timestamp format of my data.

from pyspark.sql.functions import unix_timestamp, from_unixtime

df = spark.createDataFrame(
    [("2020-02-28T09:49Z",)], 
    ['date_str']
)

df2 = df.select(
    'date_str', 
    from_unixtime(unix_timestamp('date_str', 'yyyy-MM-ddThh:mmZ')).alias('date')
)

df2.show()

Result from the above:

+-----------------+----+
|         date_str|date|
+-----------------+----+
|2020-02-28T09:49Z|null|
+-----------------+----+

Can someone guide me on how to handle this and print the date here?

Atom asked Sep 12 '25 22:09


1 Answer

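Either to_timestamp or from_unixtime(unix_timestamp()) works for this case.

  • Enclose the literal T and Z in single quotes: "yyyy-MM-dd'T'HH:mm'Z'". Unquoted, T is not a recognized pattern letter and Z is read as a zone-offset field rather than a literal character.
  • Use HH (hour of day, 0-23) rather than hh (clock hour, 1-12) so that afternoon times also parse.

Example:

from pyspark.sql.functions import from_unixtime, to_timestamp, unix_timestamp

df.select('date_str', to_timestamp('date_str', "yyyy-MM-dd'T'HH:mm'Z'").alias('date')).show()
df.select('date_str', from_unixtime(unix_timestamp('date_str', "yyyy-MM-dd'T'HH:mm'Z'")).alias('date')).show()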

#+-----------------+-------------------+
#|         date_str|               date|
#+-----------------+-------------------+
#|2020-02-28T09:49Z|2020-02-28 09:49:00|
#+-----------------+-------------------+

notNull answered Sep 14 '25 12:09