How to submit code to a remote Spark cluster from IntelliJ IDEA

I have two clusters, one in a local virtual machine and another in a remote cloud. Both clusters run in standalone mode.

My Environment:

Scala: 2.10.4
Spark: 1.5.1
JDK:   1.8.40
OS:    CentOS Linux release 7.1.1503 (Core)

The local cluster:

Spark Master: spark://local1:7077

The remote cluster:

Spark Master: spark://remote1:7077

I want to finish this:

Write code (just a simple word count) in IntelliJ IDEA locally (on my laptop), set the Spark master URL to spark://local1:7077 or spark://remote1:7077, and then run the code from IntelliJ IDEA. That is, I don't want to use spark-submit to submit a job.
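
For reference, the kind of code I mean is roughly the sketch below (the master URL and input path are placeholders to be swapped for real values):

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // Point the application at the standalone master,
    // e.g. spark://local1:7077 or spark://remote1:7077.
    val conf = new SparkConf()
      .setAppName("WordCount")
      .setMaster("spark://remote1:7077")
    val sc = new SparkContext(conf)

    // Count words in a text file that the workers can reach.
    val counts = sc.textFile("hdfs:///tmp/input.txt")
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.collect().foreach(println)
    sc.stop()
  }
}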

But I ran into a problem:

When I use the local cluster, everything goes well. Running the code from IntelliJ IDEA or using spark-submit both submit the job to the cluster, and the job finishes.

But when I use the remote cluster, I get a warning in the log:

TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Note that it says sufficient resources, not sufficient memory!

This message keeps printing, with no further progress. Both spark-submit and running the code from IntelliJ IDEA give the same result.

I want to know:

  • Is it possible to submit code from IntelliJ IDEA to the remote cluster?
  • If so, what configuration is needed?
  • What are the possible causes of my problem?
  • How can I fix it?

Thanks a lot!

Update

There is a similar question here, but I think my situation is different. When I run my code in IntelliJ IDEA with the Spark master set to the local virtual machine cluster, it works; with the remote cluster I only get the Initial job has not accepted any resources;... warning.

I want to know whether a security policy or a firewall could cause this.

Asked Nov 09 '15 by xring


1 Answer

Submitting code programmatically (e.g. via SparkSubmit) is quite tricky. At the very least there are a variety of environment settings and considerations, handled by the spark-submit script, that are difficult to replicate from within a Scala program. I am still uncertain how to achieve it, and there have been a number of long-running threads in the Spark developer community on the topic.
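
(As a sketch only, and not something I have verified in your setup: one route for programmatic submission is the org.apache.spark.launcher.SparkLauncher API added in Spark 1.4, which wraps the spark-submit script rather than re-implementing its environment handling. The paths below are placeholders.)

import org.apache.spark.launcher.SparkLauncher

// Launch an already-built application jar through spark-submit from code.
val process = new SparkLauncher()
  .setSparkHome("/path/to/spark")            // placeholder: local Spark installation
  .setAppResource("/path/to/word-count.jar") // placeholder: jar built from the IDE project
  .setMainClass("WordCount")
  .setMaster("spark://remote1:7077")
  .launch()

process.waitFor()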

My answer here addresses one portion of your post, specifically the

TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

The typical reason is a mismatch between the memory and/or number of cores requested by your job and what is available on the cluster. Possibly, when submitting from IntelliJ IDEA, the settings in

$SPARK_HOME/conf/spark-defaults.conf

do not properly match the parameters required for your task on the existing cluster. You may need to update:

spark.driver.memory   4g
spark.executor.memory   8g
spark.executor.cores  8
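
When launching directly from the IDE, spark-defaults.conf is typically not picked up, so an alternative (shown here only as a sketch, with example values that must fit what your cluster actually offers) is to set the equivalent properties on the SparkConf in code:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("WordCount")
  .setMaster("spark://remote1:7077")
  .set("spark.executor.memory", "2g") // must not exceed the memory available per worker
  .set("spark.cores.max", "4")        // cap the total cores requested from the cluster
val sc = new SparkContext(conf)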

You can check the Spark UI on port 8080 to verify that the resources you requested are actually available on the cluster.

Answered Nov 02 '22 by WestCoastProjects