
How to add an extra column with the current date to a Spark DataFrame

I am trying to add a column to my existing PySpark DataFrame using the withColumn method, and I want to populate it with the current date. My source has no date column, so I am adding this one before saving the DataFrame to a table, so that I can use it later for tracking purposes. I am using the code below:

    df2=df.withColumn("Curr_date",datetime.now().strftime('%Y-%m-%d'))

Here df is my existing DataFrame, and I want to save df2 as a table with the Curr_date column. However, withColumn expects an existing column or the lit function rather than datetime.now().strftime('%Y-%m-%d'). Could someone please guide me on how to add this date column to my DataFrame?

Rahul Patidar asked Sep 14 '25 22:09



1 Answer

withColumn needs a Column, not a plain Python string, so either wrap the value in lit or use current_date:

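    from datetime import datetime

    from pyspark.sql import functions as F

    # Option 1: wrap the formatted date string in lit() (produces a string column)
    df2 = df.withColumn("Curr_date", F.lit(datetime.now().strftime("%Y-%m-%d")))

    # OR

    # Option 2: let Spark supply the current date (produces a date column)
    df2 = df.withColumn("Curr_date", F.current_date())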
Steven answered Sep 17 '25 18:09
