I wish to connect to a remote cluster and execute a Spark process. So, from what I have read, this is specified in the SparkConf.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("MyAppName")
  .setMaster("spark://my_ip:7077")
Where my_ip is the IP address of my cluster. Unfortunately, I get connection refused. So I am guessing some credentials must be added to connect correctly. How would I specify the credentials? It seems it would be done with .set(key, value), but I have no leads on this.
Connecting an Application to the Cluster
To run an application on the Spark cluster, simply pass the spark://IP:PORT URL of the master to the SparkContext constructor. You can also pass the option --total-executor-cores <numCores> to control the number of cores that spark-shell uses on the cluster.
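For example, a minimal sketch assuming a standalone master reachable at my_ip:7077 (the spark.cores.max property is the configuration counterpart of --total-executor-cores):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("MyAppName")
  .setMaster("spark://my_ip:7077")   // standalone master URL (placeholder host)
  .set("spark.cores.max", "4")       // caps total cores, like --total-executor-cores 4

val sc = new SparkContext(conf)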
Depending on the resource manager, Spark can run in two modes: local mode and cluster mode. The resource manager is specified with the command-line option --master. Local mode, also known as Spark in-process, is the default mode of Spark.
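For illustration, here is how the common --master values look when set programmatically (a sketch; my_ip:7077 is the placeholder address from the question):

import org.apache.spark.SparkConf

// Local mode: Spark runs in-process inside the driver JVM
val localConf = new SparkConf().setAppName("MyAppName").setMaster("local[*]")

// Standalone cluster manager, as in the question
val standaloneConf = new SparkConf().setAppName("MyAppName").setMaster("spark://my_ip:7077")

// YARN as the resource manager (see the answer below)
val yarnConf = new SparkConf().setAppName("MyAppName").setMaster("yarn")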
There are two things missing:

1. Set the master to yarn (setMaster("yarn")) and the deploy mode to cluster; your current setup is used for Spark standalone. More info here: http://spark.apache.org/docs/latest/configuration.html#application-properties
2. Copy the yarn-site.xml and core-site.xml files from the cluster and put them in HADOOP_CONF_DIR, so that Spark can pick up the YARN settings, such as the IP of your master node. More info: https://theckang.github.io/2015/12/31/remote-spark-jobs-on-yarn.html

By the way, this would work if you use spark-submit to submit a job; doing it programmatically is more complex to achieve, and you could only use yarn-client mode, which is tricky to set up remotely.
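For completeness, a rough sketch of the spark-submit route described above (the path, class name, and jar are placeholders; HADOOP_CONF_DIR must point at the directory holding the copied yarn-site.xml and core-site.xml):

# Directory containing yarn-site.xml and core-site.xml copied from the cluster
export HADOOP_CONF_DIR=/path/to/cluster-conf

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  my-app.jar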