I have a table that is updated every day, and I use it for analysis. I want a fixed window of the last 6 months of data as input for the analysis.
I know that in SQL I can write a filter like this to get the last 6 months of data every time I run the code:
date >= dateadd(mm, -6, getdate())
Can somebody suggest how I can do the same thing in PySpark? I can only think of this:
df.filter(col("date") >= date_add(current_date(), -6))
Thanks in advance!
date_add will add or subtract a number of days; in this case, use add_months instead:
import pyspark.sql.functions as F

# add_months(current_date(), -6) returns the date 6 months before today
df.filter(F.col("date") >= F.add_months(F.current_date(), -6))
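For reference, here is a minimal self-contained sketch you can run to check the behaviour; the sample data and column name are just placeholders, so swap in your own table (e.g. spark.table(...)):

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical example data; one row inside the last 6 months, one outside
df = spark.createDataFrame(
    [("2023-01-15",), ("2023-08-20",)],
    ["date"],
).withColumn("date", F.to_date("date"))

# Keep only rows from the last 6 months, relative to the day the job runs
recent = df.filter(F.col("date") >= F.add_months(F.current_date(), -6))
recent.show()

Note that the cutoff is recomputed from current_date() on every run, so the window always moves forward with the run date.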