I want to configure the Apache Spark master to connect to ZooKeeper.
I have installed both of them and ZooKeeper is running.
In spark-env.sh, I added these two lines:
-Dspark.deploy.recoveryMode=ZOOKEEPER
-Dspark.deploy.zookeeper.url=localhost:2181
But when I start Apache Spark with ./sbin/start-all.sh, it shows these errors:
/home/deploy/spark-1.0.0/sbin/../conf/spark-env.sh: line 46: -Dspark.deploy.recoveryMode=ZOOKEEPER: command not found
/home/deploy/spark-1.0.0/sbin/../conf/spark-env.sh: line 47: -Dspark.deploy.zookeeper.url=localhost:2181: command not found
I want to know how to add the ZooKeeper settings to spark-env.sh.
To install Spark in standalone mode, you simply place a compiled version of Spark on each node of the cluster. You can obtain pre-built versions of Spark with each release or build it yourself.
For high availability you first need an established ZooKeeper cluster. Start the Spark Master on multiple nodes and ensure that these nodes have the same ZooKeeper configuration (ZooKeeper URL and directory). Masters can be added or removed at any time.
One master instance will take the role of the active master and the others will be in standby mode. If the current master dies, ZooKeeper will elect one of the standby instances as the new master, recover the old master's state, and then resume scheduling.
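As a rough sketch of that layout (the hostnames master1, master2 and the ZooKeeper ensemble zk1:2181,zk2:2181,zk3:2181 are placeholders, not values from the question), every master node would carry the same settings and you would start a master on each of them:
# conf/spark-env.sh -- identical on every master node
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 -Dspark.deploy.zookeeper.dir=/spark"
# start a master process on each node; ZooKeeper elects the active one
./sbin/start-master.sh
# workers and applications list every master so they can fail over
./bin/spark-shell --master spark://master1:7077,master2:7077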
Most probably you have added these lines directly to the file like so:
export SPARK_PREFIX=`dirname "$this"`/..
export SPARK_CONF_DIR="$SPARK_HOME/conf"
...
-Dspark.deploy.recoveryMode=ZOOKEEPER
-Dspark.deploy.zookeeper.url=localhost:2181
And when spark-env.sh is sourced by start-all.sh, bash complains that those -Dspark... lines are not valid commands. Note that spark-env.sh is a bash script and should contain valid bash expressions.
Following the configuration guide at High Availability, you should set SPARK_DAEMON_JAVA_OPTS with the options for spark.deploy.recoveryMode, spark.deploy.zookeeper.url, and spark.deploy.zookeeper.dir.
Using your data, you need to add a line to spark-env.sh like so:
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=localhost:2181"
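After adding it, restart the daemons so the master picks up the new options (a sketch; the path is the Spark home from your error messages):
cd /home/deploy/spark-1.0.0
./sbin/stop-all.sh    # stop the running master and workers
./sbin/start-all.sh   # start again; the master now registers with ZooKeeper at localhost:2181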
Try adding the line below to spark-env.sh:
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=ZK1:2181,ZK2:2181,ZK3:2181 -Dspark.deploy.zookeeper.dir=/sparkha"
Please replace ZK1, ZK2 and ZK3 with your ZooKeeper quorum hosts and ports. Here /sparkha is the directory in ZooKeeper where Spark stores its recovery data; by default it is /spark. Just tested it, and it worked for us. HTH.
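If you want to double-check that the master really wrote its recovery state, one way (a sketch, assuming the stock ZooKeeper CLI and the /sparkha directory from the line above) is to list the znode:
# connect to one quorum host with ZooKeeper's bundled CLI
bin/zkCli.sh -server ZK1:2181
# inside the CLI, the directory should exist once a master has started
ls /sparkha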