How to set up Spark with Zookeeper for HA?

I want to configure the Apache Spark master to connect to ZooKeeper.

I have installed both and started ZooKeeper.

In spark-env.sh, I added two lines:

-Dspark.deploy.recoveryMode=ZOOKEEPER

-Dspark.deploy.zookeeper.url=localhost:2181

But when I start Spark with ./sbin/start-all.sh, it shows these errors:

/home/deploy/spark-1.0.0/sbin/../conf/spark-env.sh: line 46: -Dspark.deploy.recoveryMode=ZOOKEEPER: command not found

/home/deploy/spark-1.0.0/sbin/../conf/spark-env.sh: line 47: -Dspark.deploy.zookeeper.url=localhost:2181: command not found

I want to know how to add the ZooKeeper settings in spark-env.sh.

asked Jun 12 '14 by Minh Ha Pham



2 Answers

Most probably you added these lines directly to the file, like so:

export SPARK_PREFIX=`dirname "$this"`/..
export SPARK_CONF_DIR="$SPARK_HOME/conf"
...
-Dspark.deploy.recoveryMode=ZOOKEEPER
-Dspark.deploy.zookeeper.url=localhost:2181

And when it is invoked by start-all.sh, bash complains that those -Dspark... lines are not valid commands. Note that spark-env.sh is a bash script and must contain valid bash expressions.

Following the configuration guide at High Availability, you should set SPARK_DAEMON_JAVA_OPTS with the options spark.deploy.recoveryMode, spark.deploy.zookeeper.url, and (optionally) spark.deploy.zookeeper.dir.

Using your data, you need to add a line to spark-env.sh like so:

export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=localhost:2181"
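With recovery configured on every master node, standalone HA works by pointing workers and applications at all masters at once; the leader is elected through ZooKeeper. A sketch of the commands, assuming two hypothetical master hosts master1 and master2 (the comma-separated master URL is standard Spark standalone syntax):

```
# Run on each master node, after setting SPARK_DAEMON_JAVA_OPTS in spark-env.sh:
./sbin/start-master.sh

# Applications list every master, so they can fail over when ZooKeeper
# elects a new leader:
./bin/spark-shell --master spark://master1:7077,master2:7077
```

Only one master is active at a time; the others stand by until ZooKeeper promotes them.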
answered Oct 03 '22 by maasg


Try adding the line below in spark-env.sh:

export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=ZK1:2181,ZK2:2181,ZK3:2181 -Dspark.deploy.zookeeper.dir=/sparkha"

Replace ZK1, ZK2, and ZK3 with your ZooKeeper quorum hosts and ports. Here /sparkha is the data store in ZooKeeper for Spark; by default it is /spark. Just tested, it worked for us. HTH
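To confirm the configuration took effect, you can check that Spark created the znode after the masters start. A sketch using the ZooKeeper CLI shipped with ZooKeeper (paths follow the spark.deploy.zookeeper.dir value above; ZK1 is a placeholder host):

```
# From the ZooKeeper installation directory:
bin/zkCli.sh -server ZK1:2181

# Inside the CLI, the recovery directory should exist:
ls /sparkha
```

If the znode is missing, check the master logs for ZooKeeper connection errors.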

answered Oct 02 '22 by Suman Banerjee