Apache Spark shell crashes when trying to start executor on worker

Background

I have been battling with Apache Spark and have resolved every error except one. I have a master and one slave. I can start the master with

./sbin/start-master.sh

and then I can connect to it from the slave by

JAVA_OPTS="-Xmx10g" ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://10.17.16.43:7077

I then see the success message

14/08/25 08:47:04 INFO worker.Worker: Successfully registered with master spark://10.17.16.43:7077

All of these errors are repeatable (I have been at this for a while). I can telnet to the master from the slave without problems, as most other tutorials suggest checking. SSH between master and slave is configured with RSA keys so that no password is needed, as recommended elsewhere.

My spark/conf/spark-env.sh contains the following (plus some further lines that are commented out):

export SPARK_DAEMON_JAVA_OPTS+=" -Dspark.local.dir=/mnt/spark,/mnt2/spark -Dspark.akka.logLifecycleEvents=true"
export SPARK_LOCAL_IP=`ifconfig | sed -En 's/127.0.0.1//;s/.*inet (addr:)?(([0-9]*\.){3}[0-9]*).*/\2/p' | head -1`
export SPARK_MASTER_IP=$SPARK_LOCAL_IP
export SPARK_MASTER_WEBUI_PORT=8090
export SPARK_WORKER_CORES=1

I pulled those settings from various tutorials in the hope that they would fix something.

Here is /etc/hosts on the master:

127.0.0.1       localhost
10.17.16.43     aidan-workstation
10.17.16.49     ubuntu

And on the slave:

127.0.0.1   localhost
10.17.16.49 ubuntu
10.17.16.43 aidan-workstation

The Error

When I run ./bin/spark-shell

I get the following in the master terminal (I have posted only the tail end; the full output is here):

14/08/25 08:58:25 INFO client.AppClient$ClientActor: Executor added: app-20140825085822-0002/8 on worker-20140825084704-ubuntu-49237 (ubuntu:49237) with 8 cores
14/08/25 08:58:25 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20140825085822-0002/8 on hostPort ubuntu:49237 with 8 cores, 512.0 MB RAM
14/08/25 08:58:25 INFO client.AppClient$ClientActor: Executor updated: app-20140825085822-0002/8 is now RUNNING
14/08/25 08:58:25 INFO client.AppClient$ClientActor: Executor updated: app-20140825085822-0002/8 is now FAILED (Command exited with code 1)
14/08/25 08:58:25 INFO cluster.SparkDeploySchedulerBackend: Executor app-20140825085822-0002/8 removed: Command exited with code 1
14/08/25 08:58:25 INFO client.AppClient$ClientActor: Executor added: app-20140825085822-0002/9 on worker-20140825084704-ubuntu-49237 (ubuntu:49237) with 8 cores
14/08/25 08:58:25 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20140825085822-0002/9 on hostPort ubuntu:49237 with 8 cores, 512.0 MB RAM
14/08/25 08:58:25 INFO client.AppClient$ClientActor: Executor updated: app-20140825085822-0002/9 is now RUNNING
14/08/25 08:58:25 INFO client.AppClient$ClientActor: Executor updated: app-20140825085822-0002/9 is now FAILED (Command exited with code 1)
14/08/25 08:58:25 INFO cluster.SparkDeploySchedulerBackend: Executor app-20140825085822-0002/9 removed: Command exited with code 1
14/08/25 08:58:25 ERROR client.AppClient$ClientActor: Master removed our application: FAILED; stopping client
14/08/25 08:58:25 WARN cluster.SparkDeploySchedulerBackend: Disconnected from Spark cluster! Waiting for reconnection...

At the same time the slave outputs the following (again only the tail; the full output is here as well):

14/08/25 09:04:18 INFO worker.ExecutorRunner: Launch command: "/usr/lib/jvm/java-8-oracle/bin/java" "-cp" ":/home/hduser/spark/conf:/home/hduser/spark/assembly/target/scala-2.10/spark-assembly_2.10-0.9.2-hadoop2.2.0.jar:/home/hduser/hadoop/etc/hadoop" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "akka.tcp://spark@aidan-workstation:60456/user/CoarseGrainedScheduler" "7" "ubuntu" "8" "akka.tcp://sparkWorker@ubuntu:55553/user/Worker" "app-20140825090434-0003"
14/08/25 09:04:18 INFO worker.Worker: Executor app-20140825090434-0003/7 finished with state FAILED message Command exited with code 1 exitStatus 1
14/08/25 09:04:18 INFO worker.Worker: Asked to launch executor app-20140825090434-0003/8 for Spark shell
14/08/25 09:04:18 INFO worker.ExecutorRunner: Launch command: "/usr/lib/jvm/java-8-oracle/bin/java" "-cp" ":/home/hduser/spark/conf:/home/hduser/spark/assembly/target/scala-2.10/spark-assembly_2.10-0.9.2-hadoop2.2.0.jar:/home/hduser/hadoop/etc/hadoop" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "akka.tcp://spark@aidan-workstation:60456/user/CoarseGrainedScheduler" "8" "ubuntu" "8" "akka.tcp://sparkWorker@ubuntu:55553/user/Worker" "app-20140825090434-0003"
14/08/25 09:04:19 INFO worker.Worker: Executor app-20140825090434-0003/8 finished with state FAILED message Command exited with code 1 exitStatus 1
14/08/25 09:04:19 INFO worker.Worker: Asked to launch executor app-20140825090434-0003/9 for Spark shell
14/08/25 09:04:19 INFO worker.ExecutorRunner: Launch command: "/usr/lib/jvm/java-8-oracle/bin/java" "-cp" ":/home/hduser/spark/conf:/home/hduser/spark/assembly/target/scala-2.10/spark-assembly_2.10-0.9.2-hadoop2.2.0.jar:/home/hduser/hadoop/etc/hadoop" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "akka.tcp://spark@aidan-workstation:60456/user/CoarseGrainedScheduler" "9" "ubuntu" "8" "akka.tcp://sparkWorker@ubuntu:55553/user/Worker" "app-20140825090434-0003"
14/08/25 09:04:19 INFO worker.Worker: Executor app-20140825090434-0003/9 finished with state FAILED message Command exited with code 1 exitStatus 1

You may notice that the timestamps in the two logs do not line up. That is my doing: I had to rerun the programs at different times to get clean output. It is not caused by Spark.

What I want

How can I connect my master and slave such that I can run Scala programs on a distributed system?

212

asked Aug 25 '14 16:08

ignorance

1 Answer

I note from your logs that Akka is using simple hostnames (aidan-workstation, ubuntu) rather than fully qualified domain names like aidan-workstation.acme.com:

akka.tcp://spark@aidan-workstation:60456/user/CoarseGrainedScheduler
akka.tcp://sparkWorker@ubuntu:55553/user/Worker
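
To make that observation concrete, here is a minimal shell sketch that pulls the host out of each of the two URLs above and checks whether it contains a dot. The function names akka_host and looks_qualified are made up for this illustration, and "contains a dot" is only a rough heuristic (an IP address would pass it too), not a real DNS check.

```shell
# Extract the host from a URL of the form akka.tcp://system@host:port/path
akka_host() {
    rest="${1#*@}"                 # drop everything up to and including '@'
    printf '%s\n' "${rest%%:*}"    # drop ':port/path'
}

# Succeeds when the name contains at least one dot (heuristic only).
looks_qualified() {
    case "$1" in
        *.*) return 0 ;;
        *)   return 1 ;;
    esac
}

for url in \
    "akka.tcp://spark@aidan-workstation:60456/user/CoarseGrainedScheduler" \
    "akka.tcp://sparkWorker@ubuntu:55553/user/Worker"
do
    host="$(akka_host "$url")"
    if looks_qualified "$host"; then
        echo "$host: looks fully qualified"
    else
        echo "$host: short hostname"
    fi
done
```

For the two URLs from the logs, both hosts come out as short hostnames.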

According to this user post, the unqualified hostnames may be the issue you are having:

I had to set SPARK_MASTER_IP in conf/start-master.sh to hostname -f instead of hostname, since akka seems not to work properly with host names / ip, it requires fully qualified domain names.
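
As a rough sketch of what that could look like in your setup: the lines below set SPARK_MASTER_IP from hostname -f. Putting them in conf/spark-env.sh on the master (where you already set SPARK_MASTER_IP) rather than in the start script is my assumption, as is the fallback to uname -n. Note that hostname -f can only return a qualified name if /etc/hosts or DNS actually provides one.

```shell
# conf/spark-env.sh on the master (sketch; placement in this file is an assumption)
# `hostname -f` prints the fully qualified name, plain `hostname` only the short one.
# If `hostname -f` fails, fall back to the node name so the variable is never empty.
master_name="$(hostname -f 2>/dev/null || uname -n)"
export SPARK_MASTER_IP="$master_name"
```

If you try this, the worker would presumably need to connect with the same name, i.e. spark://<that name>:7077 instead of the IP address.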

You can try editing your hosts file to include a faked domain name.
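
A small sketch of what such entries might look like, assuming a made-up domain example.com (substitute whatever you like) and the two addresses from your hosts files. The helper hosts_entry is hypothetical; it only prints the lines, it does not touch /etc/hosts.

```shell
# Hypothetical domain, used only for illustration; substitute your own.
FAKE_DOMAIN="example.com"

# Print one /etc/hosts entry: IP, qualified name first, short name as an alias.
hosts_entry() {
    printf '%s\t%s.%s\t%s\n' "$1" "$2" "$FAKE_DOMAIN" "$2"
}

hosts_entry 10.17.16.43 aidan-workstation
hosts_entry 10.17.16.49 ubuntu
```

The printed lines would replace the existing entries for the two machines in /etc/hosts on both master and slave. Listing the qualified name first is deliberate: on typical Linux resolvers the first name on the line is treated as the canonical one, which is what hostname -f reports.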

125

answered Sep 28 '22 03:09

Brad

Tags: shell, scala, apache-spark
