I am trying to set up Apache Spark on a small standalone cluster (1 master node and 8 slave nodes). I have installed the "pre-built" version of Spark 1.1.0 built on top of Hadoop 2.4, set up passwordless ssh between the nodes, and exported a few necessary environment variables. The one that is probably most relevant is:
export SPARK_LOCAL_DIRS=/scratch/spark/
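One common way to make sure every Spark daemon sees that setting is to put it in conf/spark-env.sh and copy that file to each node. A minimal sketch, assuming $SPARK_HOME is the same path on every machine and that the slaves are reachable as node1 through node8 (placeholder hostnames):

# $SPARK_HOME/conf/spark-env.sh -- sourced by the master and worker daemons at startup
export SPARK_LOCAL_DIRS=/scratch/spark/

# push the same file to every slave (node1..node8 are placeholders)
for i in $(seq 1 8); do
    scp "$SPARK_HOME/conf/spark-env.sh" node$i:"$SPARK_HOME/conf/"
done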
I have a small piece of Python code that I know works with Spark. I can run it locally (on my desktop, not the cluster) with:
$SPARK_HOME/bin/spark-submit ~/My_code.py
I copied the code to the cluster. I then start all the processes from the head node:
$SPARK_HOME/sbin/start-all.sh
Each of the slaves is then listed as running as process xxxxx.
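For anyone reproducing this, a rough sketch of that startup step and a quick sanity check that the daemons are actually up (jps is the stock JDK process lister; node1 is a placeholder hostname):

$SPARK_HOME/sbin/start-all.sh    # launches the master here plus a worker on every host in conf/slaves
jps                              # on the head node this should list a Master process
ssh node1 jps                    # on each slave it should list a Worker process
# the master's web UI (http://<head-node>:8080 by default) also shows the registered workers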
If I then attempt to run my code with the same command above:
$SPARK_HOME/bin/spark-submit ~/My_code.py
I get the following error:
14/10/27 14:19:02 ERROR util.Utils: Failed to create local root dir in /scratch/spark/. Ignoring this directory.
14/10/27 14:19:02 ERROR storage.DiskBlockManager: Failed to create any local dir.
I have the permissions on both /scratch and /scratch/spark set to 777. Any help is greatly appreciated.
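For what it's worth, a quick way to verify those permissions on every node (node1 through node8 are placeholder hostnames):

for i in $(seq 1 8); do
    ssh node$i 'ls -ld /scratch /scratch/spark'    # expect drwxrwxrwx for both
done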
The problem was that I didn't realize the master node also needed a scratch directory. On each of my 8 worker nodes I had created the local /scratch/spark directory, but I neglected to do so on the master node. Adding the directory fixed the problem.
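In case it helps, a sketch of that fix as a one-off shell step run from the master; node1 through node8 are placeholders for your own worker hostnames, and the chmod simply mirrors the 777 permissions described above:

mkdir -p /scratch/spark && chmod 777 /scratch/spark    # on the master itself
for i in $(seq 1 8); do                                # and on every worker, if not already done
    ssh node$i 'mkdir -p /scratch/spark && chmod 777 /scratch/spark'
done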