 

Value for HADOOP_CONF_DIR from Cluster

I have set up a YARN cluster using Ambari with 3 VMs as hosts.

Where can I find the value for HADOOP_CONF_DIR?

# Run on a YARN cluster
export HADOOP_CONF_DIR=XXX
# --master can also be yarn-client for client mode
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  --executor-memory 20G \
  --num-executors 50 \
  /path/to/examples.jar \
  1000
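
On an Ambari-managed (HDP) cluster the client configuration files are typically written to /etc/hadoop/conf on every host; that path is an assumption based on HDP defaults, so verify it on your VMs. A minimal sketch of checking the directory and pointing spark-submit at it:

ls /etc/hadoop/conf                     # should contain core-site.xml, hdfs-site.xml, yarn-site.xml
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf   # spark-submit also honours YARN_CONF_DIR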
Asked Dec 17 '15 by nish1013


People also ask

Where is HADOOP_CONF_DIR?

On a CDH cluster, HADOOP_CONF_DIR should by default be set to /etc/hadoop/conf.
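
On many distributions (CDH and HDP alike) /etc/hadoop/conf is an alternatives-managed symlink, so a quick way to see what it resolves to is (a sketch; not guaranteed on every layout):

readlink -f /etc/hadoop/conf   # follow the symlink to the active configuration directory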

How do I run Spark-submit in cluster mode?

You can submit a Spark batch application in cluster mode (the default) or client mode, either from inside the cluster or from an external client. In cluster mode, the driver runs on a host in your driver resource group; the spark-submit syntax is --deploy-mode cluster.
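
As a sketch of the two modes (the class and jar path are reused from the question's example; they are placeholders):

# Cluster mode: the driver runs inside the cluster, in a YARN container
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  /path/to/examples.jar 1000

# Client mode: the driver runs on the submitting machine
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode client \
  /path/to/examples.jar 1000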

Do you need to install Spark on all nodes of YARN cluster?

If you use YARN as the resource manager on a multi-node cluster, you do not need to install Spark on each node. YARN distributes the Spark binaries to the nodes when a job is submitted. Running Spark on YARN requires a binary distribution of Spark that is built with YARN support.
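
In other words, Spark only needs to be installed on the host you submit from. On Spark 2.x and later you can also pre-stage the Spark jars on HDFS via the spark.yarn.archive property so they are not re-uploaded on every submission; a rough sketch, where the HDFS path is an assumption:

# Build an archive of the local Spark jars (Spark 2.x+ directory layout assumed)
jar cv0f spark-libs.jar -C "$SPARK_HOME/jars/" .
# Upload it to an HDFS location of your choice (path here is illustrative)
hdfs dfs -mkdir -p /spark/jars
hdfs dfs -put spark-libs.jar /spark/jars/
# Then point Spark at it in conf/spark-defaults.conf:
# spark.yarn.archive  hdfs:///spark/jars/spark-libs.jar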


1 Answer

Install Hadoop as well. In my case, I've installed it in /usr/local/hadoop.

Set up the Hadoop environment variables:

export HADOOP_INSTALL=/usr/local/hadoop

Then set the conf directory:

export HADOOP_CONF_DIR=$HADOOP_INSTALL/etc/hadoop
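
To make these exports survive new shells, a common approach is to append them to ~/.bashrc (a sketch that assumes the answer's /usr/local/hadoop install path):

echo 'export HADOOP_INSTALL=/usr/local/hadoop' >> ~/.bashrc
echo 'export HADOOP_CONF_DIR=$HADOOP_INSTALL/etc/hadoop' >> ~/.bashrc
source ~/.bashrc
ls "$HADOOP_CONF_DIR"   # verify core-site.xml, yarn-site.xml, etc. are present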
Answered Sep 24 '22 by Saurabh