Cannot do simple task on ec2 spark cluster from local pyspark

I am trying to run pyspark on my Mac to do computation on an EC2 Spark cluster.
If I log in to the cluster, it works as expected:

$ ec2/spark-ec2 -i ~/.ec2/spark.pem -k spark login test-cluster2
$ spark/bin/pyspark

Then I run a simple task:

>>> data = sc.parallelize(range(1000), 10)
>>> data.count()

This works as expected:

14/06/26 16:38:52 INFO spark.SparkContext: Starting job: count at <stdin>:1
14/06/26 16:38:52 INFO scheduler.DAGScheduler: Got job 0 (count at <stdin>:1) with 10 output partitions (allowLocal=false)
14/06/26 16:38:52 INFO scheduler.DAGScheduler: Final stage: Stage 0 (count at <stdin>:1)
...
14/06/26 16:38:53 INFO spark.SparkContext: Job finished: count at <stdin>:1, took 1.195232619 s
1000

But if I try the same thing from my local machine,

$ MASTER=spark://ec2-54-234-204-13.compute-1.amazonaws.com:7077 bin/pyspark

it can't seem to connect to the cluster:

14/06/26 09:45:43 INFO AppClient$ClientActor: Connecting to master spark://ec2-54-234-204-13.compute-1.amazonaws.com:7077...
14/06/26 09:45:47 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
...
  File "/Users/anthony1/git/incubator-spark/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o20.collect.
: org.apache.spark.SparkException: Job aborted: Spark cluster looks down
14/06/26 09:53:17 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

I thought the problem was the EC2 security groups, but it still fails even after I added inbound rules to both the master and slave security groups to accept all ports.

Any help will be greatly appreciated!

Others are asking the same question on the mailing list: http://apache-spark-user-list.1001560.n3.nabble.com/Deploying-a-python-code-on-a-spark-EC2-cluster-td4758.html#a8465

asked Jun 26 '14 21:06

Anthony

2 Answers

The spark-ec2 script configures the Spark cluster in EC2 as standalone, which means it cannot work with remote submits. I struggled with the same error you describe for days before figuring out that it's not supported. The error message is unfortunately incorrect.

So you have to copy your code and data to the master and log in there to run your Spark task.

answered Oct 24 '22 05:10

Felix

In my experience, "Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory" usually means you have accidentally set the cores too high, or set the executor memory too high, i.e. higher than what your nodes actually have.
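A rough way to sanity-check this before submitting a job is to compare the executor memory you request with what each worker reports in the cluster UI. The sketch below is plain Python, not a Spark API; `parse_memory_mb`, `usable_workers`, and the worker figures are hypothetical names and numbers used only for illustration.

```python
def parse_memory_mb(value):
    """Convert a memory string such as '512m' or '2g' to megabytes."""
    units = {"m": 1, "g": 1024}
    text = value.strip().lower()
    if len(text) < 2 or text[-1] not in units or not text[:-1].isdigit():
        raise ValueError("expected something like '512m' or '2g', got %r" % value)
    return int(text[:-1]) * units[text[-1]]


def usable_workers(executor_memory, workers):
    """Return the workers that have enough memory and at least one core."""
    needed_mb = parse_memory_mb(executor_memory)
    return [w for w in workers if w["memory_mb"] >= needed_mb and w["cores"] > 0]


# Hypothetical figures, as you might read them off the cluster UI.
workers = [
    {"host": "worker-1", "memory_mb": 2048, "cores": 2},
    {"host": "worker-2", "memory_mb": 1024, "cores": 1},
]

for requested in ("1g", "4g"):
    hosts = [w["host"] for w in usable_workers(requested, workers)]
    print(requested, "->", hosts if hosts else "no worker can host this executor")
```

If the list comes back empty, lowering `spark.executor.memory` (or `spark.cores.max`) to fit the workers is the first thing to try.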

Other, less likely causes could be that you got the URI wrong and you're not really connecting to the master. And I once saw that problem when the /run partition was 100% full.
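To rule out a wrong URI, you can check from your local machine that the master's host and port accept a TCP connection at all. This is a minimal sketch using only the Python standard library; `master_reachable` is a hypothetical helper, and a successful connection only shows that the port is open, not that the workers can talk back to your driver.

```python
import socket
from urllib.parse import urlparse


def master_reachable(master_url, timeout=3.0):
    """Return True if a TCP connection to the spark:// host and port succeeds."""
    parsed = urlparse(master_url)
    if parsed.scheme != "spark" or parsed.hostname is None or parsed.port is None:
        raise ValueError("expected a URL like spark://host:7077, got %r" % master_url)
    try:
        # Open and immediately close a plain TCP connection.
        with socket.create_connection((parsed.hostname, parsed.port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False
```

For example, `master_reachable("spark://ec2-54-234-204-13.compute-1.amazonaws.com:7077")` should return True if the master URL from the question is correct and the security group lets your machine through.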

Even less likely, your cluster may actually be down, and you need to restart your Spark workers.

answered Oct 24 '22 06:10

samthebest

Tags: amazon-web-services, amazon-ec2, apache-spark
