Submitting jobs to Spark EC2 cluster remotely

I've set up an EC2 cluster with Spark. Everything works; all master/slave nodes are up and running.

I'm trying to submit a sample job (SparkPi). When I ssh into the cluster and submit it from there, everything works fine. However, when the driver is created on a remote host (my laptop), it doesn't work. I've tried both values for --deploy-mode:

--deploy-mode=client:

From my laptop:

./bin/spark-submit --master spark://ec2-52-10-82-218.us-west-2.compute.amazonaws.com:7077 --class SparkPi ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar

This results in the following warnings/errors repeating indefinitely:

WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

15/02/22 18:30:45 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 0

15/02/22 18:30:45 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 1

...and failed drivers appear in the Spark Web UI under "Completed Drivers" with "State=ERROR".

I've tried passing limits for cores and memory to the submit script, but it didn't help...
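For example, something along these lines (the memory and core values are only illustrative; --executor-memory and --total-executor-cores are the standard spark-submit options for limiting resources on a standalone master):

./bin/spark-submit \
  --master spark://ec2-52-10-82-218.us-west-2.compute.amazonaws.com:7077 \
  --class SparkPi \
  --executor-memory 1G \
  --total-executor-cores 2 \
  ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar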

--deploy-mode=cluster:

From my laptop:

./bin/spark-submit --master spark://ec2-52-10-82-218.us-west-2.compute.amazonaws.com:7077 --deploy-mode cluster --class SparkPi ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar

The result is:

....
Driver successfully submitted as driver-20150223023734-0007
... waiting before polling master for driver state
... polling master for driver state
State of driver-20150223023734-0007 is ERROR
Exception from cluster was: java.io.FileNotFoundException: File file:/home/oleg/spark/spark12/ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar does not exist.
java.io.FileNotFoundException: File file:/home/oleg/spark/spark12/ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:397)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:329)
    at org.apache.spark.deploy.worker.DriverRunner.org$apache$spark$deploy$worker$DriverRunner$$downloadUserJar(DriverRunner.scala:150)
    at org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:75)

So, I'd appreciate any pointers on what is going wrong and some guidance on how to deploy jobs from a remote client. Thanks.

UPDATE: Regarding the second issue (cluster mode): the jar must be globally visible to each cluster node, so it has to be in a location accessible to all of them. This solves the FileNotFoundException, but leads to the same issue as in client mode.
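For instance, one way to make the jar visible (a sketch; the /root/jobs path is a placeholder, and the copy-dir helper under /root/spark-ec2 assumes the cluster was launched with the spark-ec2 scripts) is to replicate it to the same path on every node before submitting, or to use an hdfs:// or s3n:// URL that all workers can read:

# Copy the application jar to the master, then replicate it to every worker
# so the same absolute path exists cluster-wide (paths are placeholders).
ssh root@ec2-52-10-82-218.us-west-2.compute.amazonaws.com 'mkdir -p /root/jobs'
scp ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar \
  root@ec2-52-10-82-218.us-west-2.compute.amazonaws.com:/root/jobs/
ssh root@ec2-52-10-82-218.us-west-2.compute.amazonaws.com '/root/spark-ec2/copy-dir /root/jobs'

# Submit against the path that now exists on every node.
./bin/spark-submit \
  --master spark://ec2-52-10-82-218.us-west-2.compute.amazonaws.com:7077 \
  --deploy-mode cluster \
  --class SparkPi \
  file:///root/jobs/ec2test_2.10-0.0.1.jar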

Oleg Shirokikh, asked Feb 23 '15



1 Answer

The documentation at:

http://spark.apache.org/docs/latest/security.html#configuring-ports-for-network-security

lists all the different communication channels used in a Spark cluster. As you can see, there are a number of them where the connection is made from the Executor(s) to the Driver. When you run with --deploy-mode=client, the driver runs on your laptop, so the executors will try to connect back to your laptop. If the AWS security group that your executors run under blocks outbound traffic to your laptop (the default security group created by the Spark EC2 scripts doesn't), or if you are behind a router/firewall (more likely), they fail to connect, and you get the errors you are seeing.

So to resolve it, you have to forward all the necessary ports to your laptop, or reconfigure your firewall to allow connections on those ports. Since a number of the ports are chosen at random, this means opening up a wide range of ports, if not all of them. Using --deploy-mode=cluster, or client mode from within the cluster, is probably less painful.
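If you do want to try client mode from outside the cluster, a possible workaround (a sketch only, not verified against your setup) is to pin the normally random driver-side ports to fixed values with --conf, set spark.driver.host to an address the executors can reach, and then forward/open just that known set of ports. The exact list of port properties depends on your Spark version; they are listed on the security page linked above.

# Sketch: pin the driver-side ports so only a known set needs forwarding.
# YOUR_PUBLIC_HOSTNAME is a placeholder for an address reachable from the EC2 nodes;
# the port numbers are arbitrary examples.
./bin/spark-submit \
  --master spark://ec2-52-10-82-218.us-west-2.compute.amazonaws.com:7077 \
  --class SparkPi \
  --conf spark.driver.host=YOUR_PUBLIC_HOSTNAME \
  --conf spark.driver.port=7001 \
  --conf spark.fileserver.port=7002 \
  --conf spark.broadcast.port=7003 \
  --conf spark.blockManager.port=7004 \
  ec2test/target/scala-2.10/ec2test_2.10-0.0.1.jar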

sgvd, answered Nov 16 '22