 

Who executes the Python code in PySpark?

I am new to Spark and have a small question. Suppose I write some PySpark code that also contains plain Python code, as shown below:

from datetime import datetime
from pyspark.sql.functions import lit  # lit() must be imported

# `spark` is the SparkSession (predefined in the pyspark shell)
now = datetime.now()
current_time = now.strftime("%H:%M:%S")
print("Current Time =", current_time)
df = spark.read.format("csv").option("delimiter", ",").load('countries.csv')
df = df.withColumn('C_DT', lit(current_time))
print("new column added")

Here, does a single executor run datetime.now(), or does each executor run it? And who runs the print commands, the executors or the driver?

Asked Nov 06 '22 at 14:11 by Snehasish Das



1 Answer

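Both print calls and datetime.now() are executed on the Spark driver. current_time is an ordinary Python string computed once on the driver; lit(current_time) embeds that value in the DataFrame's plan, and it is sent to the executors only when the next action (for example show(), count() or a write) runs the job that actually adds the column. At the time print("new column added") runs, withColumn has only changed df's plan and schema; the new column has not been computed for any row yet.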
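The lazy behaviour described above can be imitated without a cluster. The sketch below is a toy, pure-Python model, not PySpark: `LazyFrame`, `with_column`, `collect` and `constant` are hypothetical names chosen to mirror `withColumn`, an action, and `lit`, and a single process stands in for both the driver and the executors. It shows that the timestamp is computed once, up front, and that no row is touched until the action runs.

```python
from datetime import datetime


class LazyFrame:
    """Hypothetical toy stand-in for a Spark DataFrame (not PySpark's API).

    Transformations only append a step to a plan; an action runs the plan.
    """

    def __init__(self, rows, plan=()):
        self.rows = list(rows)
        self.plan = tuple(plan)

    def with_column(self, name, value_fn):
        # Like withColumn: return a new frame whose plan has one more step.
        # No row is touched here.
        return LazyFrame(self.rows, self.plan + ((name, value_fn),))

    def collect(self):
        # Like an action: only now is the plan applied to every row
        # (the work Spark's executors would do).
        result = []
        for row in self.rows:
            new_row = dict(row)
            for name, value_fn in self.plan:
                new_row[name] = value_fn(new_row)
            result.append(new_row)
        return result


evaluations = []  # records every time a column value is produced


def constant(value):
    # Like lit(value): the value is captured when the plan is built,
    # so every row gets the same, already-computed string.
    def fn(row):
        evaluations.append(row["country"])
        return value
    return fn


# "Driver" side: runs once, immediately.
current_time = datetime.now().strftime("%H:%M:%S")
df = LazyFrame([{"country": "India"}, {"country": "Peru"}])
df = df.with_column("C_DT", constant(current_time))
print("new column added", evaluations)  # prints: new column added []

rows = df.collect()  # the plan executes only now
print(evaluations)  # prints: ['India', 'Peru']
```

In real PySpark, lit(current_time) likewise carries the already-computed string, so every executor writes the same value; calling datetime.now() inside a Python UDF would instead run on the executors as part of the per-row work.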

Answered Nov 15 '22 at 11:11 by Vapira
