Convert Integer Column to Date in PySpark

I have an Integer column called birth_date in this format: 20141130

I want to convert that to 2014-11-30 in PySpark.

This converts the date incorrectly:

.withColumn("birth_date", F.to_date(F.from_unixtime(F.col("birth_date"))))

This gives an error: argument 1 requires (string or date or timestamp) type, however, 'birth_date' is of int type

.withColumn('birth_date', F.to_date(F.unix_timestamp(F.col('birth_date'), 'yyyyMMdd').cast('timestamp')))

What is the best way to convert it to the date I want?

Dimpu asked Oct 30 '25 04:10


1 Answer

Convert the birth_date column from Integer to String before you pass it to the to_date function:

from pyspark.sql import functions as F

df.withColumn(
    "birth_date", F.to_date(F.col("birth_date").cast("string"), "yyyyMMdd")
).show()

+----------+
|birth_date|
+----------+
|2014-11-30|
+----------+
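
For context on why the first attempt in the question returns a wrong date: from_unixtime reads its input as a count of seconds since 1970-01-01, so 20141130 is treated as roughly 233 days after the epoch rather than as year, month, and day digits. The sketch below reproduces both interpretations in plain Python, without Spark. The variable names are illustrative only, and Python's "%Y%m%d" stands in for Spark's 'yyyyMMdd' pattern. The epoch result is computed in UTC; Spark would use the session time zone instead.

```python
from datetime import datetime, timezone

raw = 20141130  # the integer as stored in the birth_date column

# Interpreted as Unix seconds (the from_unixtime reading), the value lands in 1970.
as_epoch = datetime.fromtimestamp(raw, tz=timezone.utc)

# Interpreted as text with a year-month-day pattern, it parses to the intended date.
as_date = datetime.strptime(str(raw), "%Y%m%d").date()

print(as_epoch.date().isoformat())  # 1970-08-22
print(as_date.isoformat())          # 2014-11-30
```

This is why the cast to string comes first: to_date needs the digits as text so that the pattern can split them into year, month, and day.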
Cena answered Nov 01 '25 13:11


