I am trying to change the location Spark writes temporary files to. Everything I've found online says to do this by setting the SPARK_LOCAL_DIRS parameter in the spark-env.sh file, but I am not having any luck with the changes actually taking effect.
Here is what I've done:
- The cluster uses the sparklyr package as a front end. The worker nodes are spun up using an auto scaling group.
- Created the directory /tmp/jaytest. There is one of these in each worker and one in the master.
- Opened home/ubuntu/spark-2.2.0-bin-hadoop2.7/conf/spark-env.sh on each node and modified the file to contain this line (sketched below): SPARK_LOCAL_DIRS="/tmp/jaytest"
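For reference, the edited part of each spark-env.sh looks like this (a minimal sketch of just the line in question):

    # conf/spark-env.sh (same file on the master and on each worker)
    # Scratch/shuffle directory for this node; a comma-separated list of
    # directories is also accepted.
    SPARK_LOCAL_DIRS="/tmp/jaytest"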
Permissions for each of the spark-env.sh files are -rwxr-xr-x, and for the jaytest folders are drwxrwxr-x.
As far as I can tell this is in line with all the advice I've read online. However, when I load some data into the cluster it still ends up in /tmp, rather than /tmp/jaytest.
I have also tried setting the spark.local.dir parameter to the same directory, but no luck there either.
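For reference, one common way to set that property for a single application is with --conf when launching a shell or submitting a job. This is only a sketch of the general form (the master URL is a placeholder), not exactly how I passed it:

    # Sketch: setting spark.local.dir for one application (placeholder master URL)
    spark-shell --master spark://<master-host>:7077 --conf spark.local.dir=/tmp/jaytest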
Can someone please advise on what I might be missing here?
Edit: I'm running this as a standalone cluster (the answer below indicates that the correct parameter to set depends on the cluster type).
As per the Spark documentation, if you have configured YARN as the cluster manager then it will override the spark-env.sh setting. Check the yarn-env or yarn-site file for the local dir setting.
"this will be overridden by SPARK_LOCAL_DIRS (Standalone, Mesos) or LOCAL_DIRS (YARN) environment variables set by the cluster manager." source - https://spark.apache.org/docs/2.3.1/configuration.html