 

Spark Standalone - Tmp Folder

I'm working in a Jupyter Notebook with a PySpark kernel on a node of a cluster. The problem is that my /tmp folder is always full. I have already set the following parameters:

SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.appDataTtl=172800"
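Both cleanup values in that line are in seconds, so 172800 is 48 hours. As a rough sketch, the hypothetical helper below rebuilds the same line from a TTL and can optionally add spark.worker.cleanup.interval (also in seconds, default 1800), which sets how often the worker looks for old application directories. Per the Spark docs, only directories of stopped applications are removed, and the cleanup targets the worker's application work directories, so it may not touch scratch files Spark writes to /tmp.

```python
from typing import Optional


def worker_cleanup_opts(ttl_seconds: int, interval_seconds: Optional[int] = None) -> str:
    """Build a SPARK_WORKER_OPTS line for standalone worker cleanup (hypothetical helper).

    spark.worker.cleanup.appDataTtl and spark.worker.cleanup.interval are both in seconds.
    """
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    opts = [
        "-Dspark.worker.cleanup.enabled=true",
        f"-Dspark.worker.cleanup.appDataTtl={ttl_seconds}",
    ]
    if interval_seconds is not None:
        # How often the worker checks for old application directories.
        opts.append(f"-Dspark.worker.cleanup.interval={interval_seconds}")
    return f'SPARK_WORKER_OPTS="{" ".join(opts)}"'


TWO_DAYS = 2 * 24 * 60 * 60  # 172800 s, the TTL used in the question
THIRTY_MINUTES = 30 * 60     # 1800 s

print(worker_cleanup_opts(TWO_DAYS))
# SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.appDataTtl=172800"
```

With a 30-minute TTL, `worker_cleanup_opts(THIRTY_MINUTES, interval_seconds=THIRTY_MINUTES)` would produce the corresponding line for conf/spark-env.sh.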

The problem is that the folder has only 200 GB. Is there a way to tell Spark to clean up when I shut down the Jupyter kernel? Or should I just set -Dspark.worker.cleanup.appDataTtl to 30 minutes (1800, since the value is in seconds), so that the temp files and logs are deleted every 30 minutes?
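Spark's worker cleanup runs on a timer (spark.worker.cleanup.interval), not when a Jupyter kernel shuts down. If a manual sweep is acceptable, here is a minimal sketch: the hypothetical `sweep_stale_dirs` deletes directories under a root that match a glob pattern and contain nothing modified within `ttl_seconds`. This is not Spark's own implementation, and the `spark-*` pattern is an assumption about what fills /tmp, so run it with `dry_run=True` first to see what it would remove.

```python
import os
import shutil
import time
from pathlib import Path
from typing import List, Optional


def newest_mtime(path: Path) -> float:
    """Return the most recent modification time of path or anything beneath it."""
    newest = path.stat().st_mtime
    for dirpath, dirnames, filenames in os.walk(path):
        for name in dirnames + filenames:
            try:
                mtime = os.stat(os.path.join(dirpath, name), follow_symlinks=False).st_mtime
            except FileNotFoundError:
                continue  # entry vanished while we were walking
            newest = max(newest, mtime)
    return newest


def sweep_stale_dirs(root: str, pattern: str, ttl_seconds: float,
                     now: Optional[float] = None, dry_run: bool = False) -> List[Path]:
    """Remove directories under root matching pattern that are older than ttl_seconds.

    A directory counts as stale only if nothing inside it was modified within
    ttl_seconds. Returns the directories that were (or, in dry_run, would be) removed.
    """
    now = time.time() if now is None else now
    removed = []
    for candidate in sorted(Path(root).glob(pattern)):
        # Skip plain files and symlinks; only real directories are swept.
        if not candidate.is_dir() or candidate.is_symlink():
            continue
        if now - newest_mtime(candidate) > ttl_seconds:
            if not dry_run:
                shutil.rmtree(candidate, ignore_errors=True)
            removed.append(candidate)
    return removed
```

For example, `atexit.register(sweep_stale_dirs, "/tmp", "spark-*", 1800)` in the notebook would run the sweep when the kernel exits normally; a kernel killed outright (for instance with `kill -9`) skips atexit handlers. The TTL guard lowers, but does not remove, the risk of deleting files that a still-running application needs.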

Antonio Lisi asked May 31 '26 00:05



1 Answer

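You might try pointing the spark.local.dir parameter at a different location with more space. It defaults to /tmp and is where Spark puts its scratch data (map output files and RDDs stored on disk); it can also be a comma-separated list of directories on different disks. In standalone mode it is overridden by the SPARK_LOCAL_DIRS environment variable set on the workers.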

See: https://spark.apache.org/docs/latest/configuration.html
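One way to choose the new location is sketched below, assuming you have several candidate disks: the hypothetical `pick_local_dirs` keeps only candidates that exist, are writable, and have at least `min_free_gb` free, and joins them into the comma-separated form spark.local.dir accepts.

```python
import os
import shutil
from typing import Iterable


def pick_local_dirs(candidates: Iterable[str], min_free_gb: float) -> str:
    """Return a comma-separated spark.local.dir value from usable candidates (hypothetical helper).

    A directory qualifies if it exists, is writable, and its filesystem has at
    least min_free_gb GiB free. Candidate order is preserved.
    """
    chosen = []
    for path in candidates:
        if not os.path.isdir(path) or not os.access(path, os.W_OK):
            continue
        free_gb = shutil.disk_usage(path).free / 1024**3
        if free_gb >= min_free_gb:
            chosen.append(path)
    if not chosen:
        raise RuntimeError("no candidate directory has enough free space")
    return ",".join(chosen)
```

The result could be passed to `SparkSession.builder.config("spark.local.dir", value)` before the first session is created in the notebook, or, for standalone workers, placed in `SPARK_LOCAL_DIRS` in conf/spark-env.sh on each node. The mount points you pass in are your own; none are implied by the answer.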

Aydin K. answered Jun 01 '26 20:06



