Here is what my DataFrame looks like:
+----------------+-------------+
| Business_Date| Code|
+----------------+-------------+
|1539129600000000| BSD|
|1539129600000000| BTN|
|1539129600000000| BVI|
|1539129600000000| BWP|
|1539129600000000| BYB|
+----------------+-------------+
I want to convert the Business_Date column from bigint to a timestamp value while loading the data into a Hive table.
How can I do this?
You can use pyspark.sql.functions.from_unixtime(), which, per the docs,
"Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the given format."
It appears that your Business_Date values are in microseconds, so they need to be divided by 1,000,000 to convert them to seconds.
For example:
from pyspark.sql.functions import from_unixtime, col

df = df.withColumn(
    "Business_Date",
    from_unixtime(col("Business_Date") / 1000000).cast("timestamp")
)
df.show()
#+---------------------+----+
#|Business_Date |Code|
#+---------------------+----+
#|2018-10-09 20:00:00.0|BSD |
#|2018-10-09 20:00:00.0|BTN |
#|2018-10-09 20:00:00.0|BVI |
#|2018-10-09 20:00:00.0|BWP |
#|2018-10-09 20:00:00.0|BYB |
#+---------------------+----+
from_unixtime returns a string, which is why the result is cast to a timestamp.
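Alternatively, you can skip the intermediate string entirely. This is a minimal sketch that relies on Spark's numeric-to-timestamp cast semantics, where a double is interpreted as seconds since the Unix epoch:

from pyspark.sql.functions import col

df = df.withColumn(
    "Business_Date",
    # dividing by 1,000,000 converts microseconds to seconds;
    # casting the resulting double to timestamp interprets it as
    # seconds since the Unix epoch
    (col("Business_Date") / 1000000).cast("timestamp")
)

On Spark 3.1+, pyspark.sql.functions.timestamp_seconds() expresses the same seconds-to-timestamp conversion more explicitly.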
In either case, the new schema is:
df.printSchema()
#root
# |-- Business_Date: timestamp (nullable = true)
# |-- Code: string (nullable = true)
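Since you mention loading the data into a Hive table, here is a minimal sketch of one way to write the converted DataFrame, assuming the SparkSession was created with .enableHiveSupport() and using my_db.my_table as a placeholder table name:

# write the DataFrame (with Business_Date now a timestamp) to Hive;
# "my_db.my_table" is a placeholder target table
df.write.mode("overwrite").saveAsTable("my_db.my_table")

If the target Hive table already exists with a fixed schema, df.write.insertInto("my_db.my_table") is the usual alternative.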