Background
I have been battling with Apache Spark and have worked through every error except one. I have one master and one slave. I can start the master via
./sbin/start-master.sh
and then I can connect to it from the slave by
JAVA_OPTS="-Xmx10g" ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://10.17.16.43:7077
I then see the success message
14/08/25 08:47:04 INFO worker.Worker: Successfully registered with master spark://10.17.16.43:7077
All of these errors are repeatable (I have been at this for a while). I can telnet from the slave into the master just fine, as most other tutorials suggest checking, and passwordless SSH (RSA keys) is set up between master and slave, as mentioned elsewhere.
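For reference, that connectivity check from the slave is just a plain telnet to the master's port from the URL above:
telnet 10.17.16.43 7077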
I have spark/conf/spark-env.sh set to the following (the remaining lines are commented out):
export SPARK_DAEMON_JAVA_OPTS+=" -Dspark.local.dir=/mnt/spark,/mnt2/spark -Dspark.akka.logLifecycleEvents=true"
export SPARK_LOCAL_IP=`ifconfig | sed -En 's/127.0.0.1//;s/.*inet (addr:)?(([0-9]*\.){3}[0-9]*).*/\2/p' | head -1`
export SPARK_MASTER_IP=$SPARK_LOCAL_IP
export SPARK_MASTER_WEBUI_PORT=8090
export SPARK_WORKER_CORES=1
I pulled those from various tutorials in the hope that they would fix something.
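As a sanity check on that ifconfig/sed pipeline (purely illustrative, run from the spark directory on each machine), the values it produces can be inspected with:
source conf/spark-env.sh
echo "SPARK_LOCAL_IP=$SPARK_LOCAL_IP SPARK_MASTER_IP=$SPARK_MASTER_IP"
hostname; hostname -f   # compare against the /etc/hosts entries below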
Here is my master's /etc/hosts:
127.0.0.1 localhost
10.17.16.43 aidan-workstation
10.17.16.49 ubuntu
And the slave's:
127.0.0.1 localhost
10.17.16.49 ubuntu
10.17.16.43 aidan-workstation
The Error
When I run ./bin/spark-shell
I get the following in the master terminal (just the tail end of it; the full output is here):
14/08/25 08:58:25 INFO client.AppClient$ClientActor: Executor added: app-20140825085822-0002/8 on worker-20140825084704-ubuntu-49237 (ubuntu:49237) with 8 cores
14/08/25 08:58:25 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20140825085822-0002/8 on hostPort ubuntu:49237 with 8 cores, 512.0 MB RAM
14/08/25 08:58:25 INFO client.AppClient$ClientActor: Executor updated: app-20140825085822-0002/8 is now RUNNING
14/08/25 08:58:25 INFO client.AppClient$ClientActor: Executor updated: app-20140825085822-0002/8 is now FAILED (Command exited with code 1)
14/08/25 08:58:25 INFO cluster.SparkDeploySchedulerBackend: Executor app-20140825085822-0002/8 removed: Command exited with code 1
14/08/25 08:58:25 INFO client.AppClient$ClientActor: Executor added: app-20140825085822-0002/9 on worker-20140825084704-ubuntu-49237 (ubuntu:49237) with 8 cores
14/08/25 08:58:25 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20140825085822-0002/9 on hostPort ubuntu:49237 with 8 cores, 512.0 MB RAM
14/08/25 08:58:25 INFO client.AppClient$ClientActor: Executor updated: app-20140825085822-0002/9 is now RUNNING
14/08/25 08:58:25 INFO client.AppClient$ClientActor: Executor updated: app-20140825085822-0002/9 is now FAILED (Command exited with code 1)
14/08/25 08:58:25 INFO cluster.SparkDeploySchedulerBackend: Executor app-20140825085822-0002/9 removed: Command exited with code 1
14/08/25 08:58:25 ERROR client.AppClient$ClientActor: Master removed our application: FAILED; stopping client
14/08/25 08:58:25 WARN cluster.SparkDeploySchedulerBackend: Disconnected from Spark cluster! Waiting for reconnection...
And at the same time the slave outputs this (again just the tail; the full output is here as well):
14/08/25 09:04:18 INFO worker.ExecutorRunner: Launch command: "/usr/lib/jvm/java-8-oracle/bin/java" "-cp" ":/home/hduser/spark/conf:/home/hduser/spark/assembly/target/scala-2.10/spark-assembly_2.10-0.9.2-hadoop2.2.0.jar:/home/hduser/hadoop/etc/hadoop" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "akka.tcp://spark@aidan-workstation:60456/user/CoarseGrainedScheduler" "7" "ubuntu" "8" "akka.tcp://sparkWorker@ubuntu:55553/user/Worker" "app-20140825090434-0003"
14/08/25 09:04:18 INFO worker.Worker: Executor app-20140825090434-0003/7 finished with state FAILED message Command exited with code 1 exitStatus 1
14/08/25 09:04:18 INFO worker.Worker: Asked to launch executor app-20140825090434-0003/8 for Spark shell
14/08/25 09:04:18 INFO worker.ExecutorRunner: Launch command: "/usr/lib/jvm/java-8-oracle/bin/java" "-cp" ":/home/hduser/spark/conf:/home/hduser/spark/assembly/target/scala-2.10/spark-assembly_2.10-0.9.2-hadoop2.2.0.jar:/home/hduser/hadoop/etc/hadoop" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "akka.tcp://spark@aidan-workstation:60456/user/CoarseGrainedScheduler" "8" "ubuntu" "8" "akka.tcp://sparkWorker@ubuntu:55553/user/Worker" "app-20140825090434-0003"
14/08/25 09:04:19 INFO worker.Worker: Executor app-20140825090434-0003/8 finished with state FAILED message Command exited with code 1 exitStatus 1
14/08/25 09:04:19 INFO worker.Worker: Asked to launch executor app-20140825090434-0003/9 for Spark shell
14/08/25 09:04:19 INFO worker.ExecutorRunner: Launch command: "/usr/lib/jvm/java-8-oracle/bin/java" "-cp" ":/home/hduser/spark/conf:/home/hduser/spark/assembly/target/scala-2.10/spark-assembly_2.10-0.9.2-hadoop2.2.0.jar:/home/hduser/hadoop/etc/hadoop" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "akka.tcp://spark@aidan-workstation:60456/user/CoarseGrainedScheduler" "9" "ubuntu" "8" "akka.tcp://sparkWorker@ubuntu:55553/user/Worker" "app-20140825090434-0003"
14/08/25 09:04:19 INFO worker.Worker: Executor app-20140825090434-0003/9 finished with state FAILED message Command exited with code 1 exitStatus 1
You may notice that the timestamps don't line up. That is my fault: I had to re-run the programs at different times to get clean output, so it is not something the program is doing.
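Since every executor exits with code 1 before registering, the real exception should be in the executor's own stderr on the slave. With the standalone deploy mode the worker writes per-executor logs under its work directory (SPARK_HOME/work unless SPARK_WORKER_DIR is set), so something like the following on the slave should show it (app and executor IDs taken from the log above):
cat /home/hduser/spark/work/app-20140825090434-0003/9/stderr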
What I want
How can I connect my master and slave such that I can run Scala programs on a distributed system?
Answer
I note from your logs that Akka is using the simple hostname aidan-workstation rather than a fully qualified domain name such as aidan-workstation.acme.com:
akka.tcp://spark@aidan-workstation:60456/user/CoarseGrainedScheduler
akka.tcp://sparkWorker@ubuntu:55553/user/Worker
Judging from this user post, it "may" be the issue you're having:
I had to set SPARK_MASTER_IP in conf/start-master.sh to hostname -f instead of hostname, since akka seems not to work properly with host names / ip, it requires fully qualified domain names.
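A minimal sketch of that change against the spark-env.sh above (keeping it in conf/spark-env.sh rather than start-master.sh, and assuming hostname -f actually returns a qualified name on both machines):
export SPARK_LOCAL_IP=`hostname -f`
export SPARK_MASTER_IP=$SPARK_LOCAL_IP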
You can try editing your hosts file to include a faked domain name.
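For example, the same two entries on both machines, mapping each host to a made-up fully qualified name as well as its short name (".localdomain" is purely illustrative):
127.0.0.1     localhost
10.17.16.43   aidan-workstation.localdomain   aidan-workstation
10.17.16.49   ubuntu.localdomain              ubuntu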