 

How to connect Apache Spark with Yarn from the SparkContext?

I have developed a Spark application in Java using Eclipse.
So far, I have been running it locally by setting the master's address to 'local[*]'.
Now I want to deploy this application on a YARN cluster.
The only official documentation I found is http://spark.apache.org/docs/latest/running-on-yarn.html

Unlike the documentation for deploying on a Mesos cluster or in standalone mode (http://spark.apache.org/docs/latest/running-on-mesos.html), it does not give any URL to use as the master's address in the SparkContext.
Apparently, I have to use the command line to deploy Spark on YARN.

Do you know if there is a way to configure the master's address in the SparkContext, as in the standalone and Mesos modes?
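For reference, the context is currently created roughly like this (class and application names are just placeholders):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class MyApp {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                    .setAppName("my-app")      // placeholder name
                    .setMaster("local[*]");    // runs fine locally
            JavaSparkContext sc = new JavaSparkContext(conf);
            // ... job logic ...
            sc.stop();
        }
    }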


1 Answer

There actually is a URL: you can set the master to yarn, and the ResourceManager's address is then picked up from the Hadoop configuration.

Ensure that HADOOP_CONF_DIR or YARN_CONF_DIR points to the directory which contains the (client side) configuration files for the Hadoop cluster. These configs are used to write to HDFS and connect to the YARN ResourceManager.
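With those variables in place, the master URL is simply yarn (on Spark 1.x the equivalent values were yarn-client and yarn-cluster). A minimal Java sketch, assuming Spark 2.x, client deploy mode, and that HADOOP_CONF_DIR is exported in the environment that starts the JVM; the application name and the spark.yarn.jars path are only placeholders:

    import java.util.Arrays;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class MyAppOnYarn {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                    .setAppName("my-app-on-yarn")
                    // The ResourceManager address is read from the configs found
                    // via HADOOP_CONF_DIR / YARN_CONF_DIR, not from this URL.
                    .setMaster("yarn")
                    .set("spark.submit.deployMode", "client")
                    // Often needed when launching from an IDE instead of
                    // spark-submit; the HDFS path is only an example.
                    .set("spark.yarn.jars", "hdfs:///spark/jars/*.jar");

            JavaSparkContext sc = new JavaSparkContext(conf);
            // Trivial job to confirm executors are actually allocated on YARN.
            long count = sc.parallelize(Arrays.asList(1, 2, 3, 4)).count();
            System.out.println("count = " + count);
            sc.stop();
        }
    }

In client mode the driver is the JVM launched from Eclipse, so that machine needs network access to the ResourceManager and the NodeManagers, and it needs the Spark and Hadoop client jars on its classpath.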

You should have at least hdfs-site.xml, yarn-site.xml, and core-site.xml files that specify all the settings and URLs for the Hadoop cluster you connect to.

Some properties from yarn-site.xml include yarn.nodemanager.hostname and yarn.nodemanager.address.

Since the address has a default of ${yarn.nodemanager.hostname}:0, you may only need to set the hostname.
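As a quick sanity check that those files are actually visible to the client, here is one possible sketch; it assumes the Hadoop client jars are on the classpath and that the configuration directory itself is on the classpath too, because YarnConfiguration reads the *-site.xml files from the classpath rather than from the environment variable:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class YarnConfCheck {
        public static void main(String[] args) {
            // Loads core-default/core-site and yarn-default/yarn-site from the classpath.
            Configuration conf = new YarnConfiguration();

            // Unset values fall back to the Hadoop defaults (e.g. 0.0.0.0 hosts).
            System.out.println("fs.defaultFS                  = " + conf.get("fs.defaultFS"));
            System.out.println("yarn.resourcemanager.hostname = " + conf.get("yarn.resourcemanager.hostname"));
            System.out.println("yarn.resourcemanager.address  = " + conf.get("yarn.resourcemanager.address"));
            System.out.println("yarn.nodemanager.hostname     = " + conf.get("yarn.nodemanager.hostname"));
            System.out.println("yarn.nodemanager.address      = " + conf.get("yarn.nodemanager.address"));
        }
    }

The address the SparkContext ultimately has to reach is the ResourceManager's; if the check still prints the 0.0.0.0 defaults, the client is not seeing the cluster's yarn-site.xml.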


OneCricketeer


