I have a table that is updated every day, and I use it for analysis. I want a fixed window of the last 6 months of data as input for the analysis.
I know that in SQL I can write a filter like this to get the last 6 months of data every time I run the code:
date >= dateadd(mm, -6, getdate())
Can somebody suggest how I can do the same thing in PySpark? I can only think of this:
df.filter(col("date") >= date_add(current_date(), -6))
Thanks in advance!
date_add will add or subtract a number of days; in this case, use add_months instead:
import pyspark.sql.functions as F

# add_months(current_date(), -6) returns the date 6 months before today
df.filter(F.col("date") >= F.add_months(F.current_date(), -6))
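For reference, here is a minimal self-contained sketch you can run to check the behaviour; the sample data and column name are just placeholders, so swap in your own table (e.g. spark.table(...)):

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical example data; one row inside the last 6 months, one outside
df = spark.createDataFrame(
    [("2023-01-15",), ("2023-08-20",)],
    ["date"],
).withColumn("date", F.to_date("date"))

# Keep only rows from the last 6 months, relative to the day the job runs
recent = df.filter(F.col("date") >= F.add_months(F.current_date(), -6))
recent.show()

Note that the cutoff is recomputed from current_date() on every run, so the window always moves forward with the run date.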