Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert date from integer to date format

I have a column 'start_date' which is an integer 37823. This happened when I used xlrd library to convert xlsx to csv. Hence '2003/07/21' got converted to 37823.

I have gone through xlrd documentation and I understand there are several ways to convert it to date. However, I need to convert this to date format using PySpark in AWS Glue ETL jobs. Any suggestions?

I tried using to_date, date_format functions, but nothing worked.

like image 666
Raj Avatar asked Mar 05 '23 15:03

Raj


2 Answers

I was able to finally resolve this issue.

Using UDF:

def convert_date(x):
    mDt = datetime.datetime(1899, 12, 30)
    dlt = mDt + datetime.timedelta(days=x)
    return dlt.strftime("%Y-%m-%d")

convert_date_udf = udf(lambda z: convert_date(z), StringType())
df = df.withColumn('hire date', convert_date_udf('hire date').alias('hire date new'))

Without using UDF:

df = df.withColumn('hire date', F.expr("date_add(to_date('1899-12-30'), cast(`hire date` as int))").cast(StringType())

Hope it helps!

like image 185
Raj Avatar answered Mar 11 '23 15:03

Raj


I suppose that there are more elegant ways to done this but this is what I have come up with for now.

from datetime import date
df.hire_date = df.hire_date.apply(date.fromordinal) # this will give you date in dash format
df.hire_date = df.hire_date.apply(lambda x: str(x).replace('-', '/')) # this will simply replace dash with slash

Hope that this works for you :)

like image 29
Dawid_Sielski Avatar answered Mar 11 '23 17:03

Dawid_Sielski