
How to filter pyspark dataframe with last 14 days?

I have a date column in my dataframe.


I want to filter the dataframe down to the last 14 days using the date column. I tried the code below, but it's not working:

  last_14 = df.filter((df('Date')> date_add(current_timestamp(), -14)).select("Event_Time","User_ID","Impressions","Clicks","URL", "Date")

Event_Time, User_ID, Impressions, Clicks, and URL are my other columns.

Can anyone advise how to do this?

asked Sep 14 '25 by sthambi

1 Answer

from pyspark.sql import functions as F

df = spark.createDataFrame(
    [
        ('2022-03-10',),
        ('2022-03-09',),
        ('2022-03-08',),
        ('2022-02-02',),
        ('2022-02-01',),
    ], ['Date']
).withColumn('Date', F.to_date('Date', 'y-M-d'))

df \
    .filter(F.col('Date') > F.date_sub(F.current_date(), 14)) \
    .show()
+----------+
|      Date|
+----------+
|2022-03-10|
|2022-03-09|
|2022-03-08|
+----------+

In your code it would be (note that df('Date') is not valid column syntax — use F.col('Date') or df['Date'] — and your original snippet was also missing a closing parenthesis on the filter condition):

 last_14 = df.filter((F.col('Date') > F.date_sub(F.current_date(), 14))).select("Event_Time","User_ID","Impressions","Clicks","URL", "Date")
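If you want to sanity-check the cutoff semantics without a Spark session, the date arithmetic behind F.date_sub can be sketched in plain Python (the helper name here is illustrative, not part of PySpark):

```python
from datetime import date, timedelta

def cutoff_for_last_n_days(today: date, n: int = 14) -> date:
    # Mirrors F.date_sub(F.current_date(), n): subtract n calendar days.
    return today - timedelta(days=n)

# With a strict '>' filter, as in the answer above, rows dated exactly
# on the cutoff day are excluded, so the window is the 13 days before
# `today` plus `today` itself.
cutoff = cutoff_for_last_n_days(date(2022, 3, 10))
print(cutoff)  # 2022-02-24
```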
answered Sep 17 '25 by Luiz Viola

