I wish to connect to a remote cluster and execute a Spark process. So, from what I have read, this is specified in the SparkConf.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("MyAppName")
  .setMaster("spark://my_ip:7077")
Where my_ip is the IP address of my cluster. Unfortunately, I get connection refused. So I am guessing some credentials must be added to connect correctly. How would I specify the credentials? It seems it would be done with .set(key, value), but I have no leads on this.
Connecting an Application to the Cluster
To run an application on the Spark cluster, simply pass the spark://IP:PORT URL of the master to the SparkContext constructor. You can also pass the option --total-executor-cores <numCores> to control the number of cores that spark-shell uses on the cluster.
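For example, a minimal sketch assuming a standalone master reachable at my_ip:7077 (the spark.cores.max property is the configuration counterpart of --total-executor-cores):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("MyAppName")
  .setMaster("spark://my_ip:7077")   // standalone master URL (placeholder host)
  .set("spark.cores.max", "4")       // caps total cores, like --total-executor-cores 4

val sc = new SparkContext(conf)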
Depending on the resource manager, Spark can run in two modes: local mode and cluster mode. The resource manager is specified with the command-line option --master. Local mode, also known as Spark in-process, is the default mode of Spark.
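For illustration, here is how the common --master values look when set programmatically (a sketch; my_ip:7077 is the placeholder address from the question):

import org.apache.spark.SparkConf

// Local mode: Spark runs in-process inside the driver JVM
val localConf = new SparkConf().setAppName("MyAppName").setMaster("local[*]")

// Standalone cluster manager, as in the question
val standaloneConf = new SparkConf().setAppName("MyAppName").setMaster("spark://my_ip:7077")

// YARN as the resource manager (see the answer below)
val yarnConf = new SparkConf().setAppName("MyAppName").setMaster("yarn")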
There are two things missing:

1. Set the master to yarn (setMaster("yarn")) and the deploy mode to cluster; your current setup is used for Spark standalone. More info here: http://spark.apache.org/docs/latest/configuration.html#application-properties
2. Copy the yarn-site.xml and core-site.xml files from the cluster and put them in HADOOP_CONF_DIR, so that Spark can pick up the YARN settings, such as the IP of your master node. More info: https://theckang.github.io/2015/12/31/remote-spark-jobs-on-yarn.html

By the way, this would work if you use spark-submit to submit a job; doing it programmatically is more complex to achieve, and you could only use yarn-client mode, which is tricky to set up remotely.
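For completeness, a rough sketch of the spark-submit route described above (the path, class name, and jar are placeholders; HADOOP_CONF_DIR must point at the directory holding the copied yarn-site.xml and core-site.xml):

# Directory containing yarn-site.xml and core-site.xml copied from the cluster
export HADOOP_CONF_DIR=/path/to/cluster-conf

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  my-app.jar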