I'm trying to set up standalone Spark on Windows 10. I would like to set spark.local.dir to D:\spark-tmp\tmp, as it currently appears to be using C:\Users\<me>\AppData\Local\Temp, which in my case is on an SSD that might not have enough space for some of my larger datasets.
So I changed the file %SPARK_HOME%\conf\spark-defaults.conf to the following, without success:
spark.eventLog.enabled true
spark.eventLog.dir file:/D:/spark-tmp/log
spark.local.dir file:/D:/spark-tmp/tmp
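(For reference, the Spark configuration docs describe spark.local.dir as a plain local directory path, optionally a comma-separated list, while spark.eventLog.dir does take a URI. A scheme-less variant of the block above, offered only as a sketch that may be worth trying, would be:)
spark.eventLog.enabled true
spark.eventLog.dir file:/D:/spark-tmp/log
spark.local.dir D:\spark-tmp\tmp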
I also tried to run %HADOOP_HOME%\bin\winutils.exe chmod -R 777 D:/spark-tmp, but it didn't change anything.
The error that I get is the following:
java.io.IOException: Failed to create a temp directory (under file:/D:/spark-tmp/tmp) after 10 attempts!
If I start the path with file://D:/... (note the double slash), nothing changes. If I remove the scheme altogether, a different exception says that the scheme D: is not recognized.
I also noticed this warning:
WARN SparkConf:66 - In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).
So I tried to put the following line in %SPARK_HOME%\conf\spark-env.sh:
SPARK_LOCAL_DIRS=file:/D:/spark-tmp/tmp
If I put this line and comment out the spark.local.dir line in the .conf file, Spark works perfectly, but the temporary files are still saved in my AppData\Local\Temp folder, so the SPARK_LOCAL_DIRS line is not being read.
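(A likely explanation, noted here as an aside: spark-env.sh is a bash script, and Spark's Windows launch scripts look for conf\spark-env.cmd instead, so the .sh file is never sourced. A minimal batch-syntax sketch of the same setting, assuming your Spark build's load-spark-env.cmd picks the file up, would be:)
rem conf\spark-env.cmd: set the scratch directory before Spark launches
set SPARK_LOCAL_DIRS=D:\spark-tmp\tmp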
What's strange is that, if I let it run, it actually puts logs in D:/spark-tmp/log, which means that it's not a problem of syntax or permissions.
On Windows you will have to set these as system environment variables; spark-env.sh is not read there. Add the key-value pair
SPARK_LOCAL_DIRS -> D:\spark-tmp\tmp
to your system's environment variables. Note that the value is a plain Windows path, with no file: scheme.
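A quick way to do this from a command prompt is sketched below; setx only affects new sessions, so reopen the shell before restarting Spark, and add the /M flag (which needs an elevated prompt) if you want the variable system-wide rather than per user:
setx SPARK_LOCAL_DIRS "D:\spark-tmp\tmp"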