How to get today - “6 months” date in PySpark(SQL) [duplicate]

I have a table that is updated every day, and I use it for analysis. I want a fixed window of the last 6 months of data as input for the analysis.

I know that in SQL I can write a filter like this to get 6 months of data every time I run the code:

date >= dateadd(mm, -6, getdate())

Can somebody suggest how I can do the same in PySpark? I can only think of this:

df.filter(col("date") >= date_add(current_date(), -6))

Thanks in advance!

asked Jul 11 '18 by James Taylor

1 Answer

date_add adds or subtracts a number of days, not months. To shift by months, use add_months instead:

import pyspark.sql.functions as F

df.filter(F.col("date") >= F.add_months(F.current_date(), -6))
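
To see why this matters: with -6, date_add shifts the date by 6 days while add_months shifts it by 6 calendar months. A plain-Python sketch of the month-shift semantics (not Spark itself; the helper below mimics Spark's add_months, including clamping the day to the end of shorter months, and is illustrative only):

```python
import calendar
from datetime import date, timedelta

def add_months(d, n):
    # Shift d by n calendar months, clamping the day to the
    # target month's length (e.g. Mar 31 - 1 month -> Feb 28),
    # which mirrors Spark's add_months behaviour.
    m = d.month - 1 + n
    year = d.year + m // 12
    month = m % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

run_date = date(2018, 7, 11)
print(add_months(run_date, -6))       # 2018-01-11 (6 months back)
print(run_date - timedelta(days=6))   # 2018-07-05 (only 6 days back)
```

Subtracting 6 days would keep almost all of the table in the window, which is why the original date_add attempt looks plausible but filters far too little.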
answered Nov 05 '22 by Shaido