 

Use spark-submit to submit an application to an EC2 cluster

I am new to Spark and I am trying to run it on EC2. I followed the tutorial on the Spark website and used spark-ec2 to launch a Spark cluster. Then I tried to use spark-submit to submit an application to the cluster. The command looks like this:

./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://ec2-54-88-9-74.compute-1.amazonaws.com:7077 --executor-memory 2G --total-executor-cores 1 ./examples/target/scala-2.10/spark-examples_2.10-1.0.0.jar 100

However, I got the following error:

ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.

Please let me know how to fix it. Thanks.

asked Jun 13 '14 by user2547081


2 Answers

You're seeing this issue because the master node of your Spark standalone cluster can't open a TCP connection back to the driver (on your machine). The default deploy mode of spark-submit is client, which runs the driver on the machine that submitted the job.
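To make that explicit: the command from the question is equivalent to passing --deploy-mode client, which spark-submit assumes by default:

./bin/spark-submit --deploy-mode client --class org.apache.spark.examples.SparkPi --master spark://ec2-54-88-9-74.compute-1.amazonaws.com:7077 --executor-memory 2G --total-executor-cores 1 ./examples/target/scala-2.10/spark-examples_2.10-1.0.0.jar 100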

A new cluster deploy mode was added to spark-submit that sends the job to the master, where the driver is then run inside the cluster itself, removing the need for a direct connection back to your machine. Unfortunately, this mode is not yet supported for standalone clusters.
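For reference, once the JIRA issue below is resolved, submission would presumably look like this (a sketch only; as noted above, it does not yet work against a standalone master):

./bin/spark-submit --deploy-mode cluster --class org.apache.spark.examples.SparkPi --master spark://ec2-54-88-9-74.compute-1.amazonaws.com:7077 ./examples/target/scala-2.10/spark-examples_2.10-1.0.0.jar 100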

You can vote for the JIRA issue here: https://issues.apache.org/jira/browse/SPARK-2260

Tunneling your connection via SSH is possible, but latency would be a big issue since the driver would still be running locally on your machine.
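If you do want to experiment with tunneling, a rough sketch follows (the key path is a placeholder). Be aware that Spark also opens extra ephemeral ports for the driver, and the standalone master may reject connections whose target URL doesn't exactly match its own spark:// URL, so treat this as a starting point only, not a tested recipe:

ssh -i /path/to/keypair.pem -L 7077:localhost:7077 root@ec2-54-88-9-74.compute-1.amazonaws.com

Then, in another terminal, point the submission at the forwarded port:

./bin/spark-submit --master spark://localhost:7077 --class org.apache.spark.examples.SparkPi ./examples/target/scala-2.10/spark-examples_2.10-1.0.0.jar 100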

answered Oct 15 '22 by jhappoldt


I'm curious whether you're still having this issue... but in case anyone is asking, here is a brief answer. As clarified by jhappoldt, the master node of your Spark standalone cluster can't open a TCP connection back to the driver (on your local machine). Two workarounds are possible; both were tested and succeeded.

(1) From the EC2 Management Console, create a new security group and add rules enabling TCP back and forth from your PC (public IP). (What I did was add inbound and outbound TCP rules.) Then add this security group to your master instance (right click --> Networking --> Change security groups). Note: add it alongside the already established security groups; don't remove them.
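If you prefer the command line over the console, a rough AWS CLI equivalent might look like the following (the group name and the /32 IP are placeholders for your own values; the console steps above are what was actually tested):

aws ec2 create-security-group --group-name spark-driver-access --description "Allow TCP to/from my PC"
aws ec2 authorize-security-group-ingress --group-name spark-driver-access --protocol tcp --port 0-65535 --cidr 203.0.113.5/32

An outbound rule can be added with the analogous authorize-security-group-egress command; then attach the group to the master instance as described above.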

This solution works well, but in your specific scenario (deploying your application from a local machine to an EC2 cluster) you will face further, resource-related problems, so the next option is the better one:

(2) Copy your .jar file (or .egg) to the master node using scp. You can check this link http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html for information about how to do that, and deploy your application from the master node. Note: Spark is already pre-installed there, so you need do nothing but write the exact same command you would write on your local machine, from ~/spark/bin. This should work perfectly.
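Concretely, the whole flow might look like this (the key path is a placeholder; spark-ec2 clusters are accessed as root):

scp -i /path/to/keypair.pem ./examples/target/scala-2.10/spark-examples_2.10-1.0.0.jar root@ec2-54-88-9-74.compute-1.amazonaws.com:~/
ssh -i /path/to/keypair.pem root@ec2-54-88-9-74.compute-1.amazonaws.com

Then, from the master itself:

cd ~/spark
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://ec2-54-88-9-74.compute-1.amazonaws.com:7077 --executor-memory 2G --total-executor-cores 1 ~/spark-examples_2.10-1.0.0.jar 100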

answered Oct 15 '22 by ElSioufy