
What exactly is the zookeeper quorum setting in hbase-site.xml?


raj asked Dec 14 '10 09:12




2 Answers

As described in hbase-default.xml, here's the setting:

Comma separated list of servers in the ZooKeeper Quorum. For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com". By default this is set to localhost for local and pseudo-distributed modes of operation. For a fully-distributed setup, this should be set to a full list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh this is the list of servers which we will start/stop ZooKeeper on.
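To make the description concrete, here is a minimal sketch in Python that embeds a hypothetical hbase-site.xml fragment and reads the quorum list back out of it. The host names are the placeholders from the description above, and the helper name `read_quorum` is made up for illustration; it is not part of HBase.

```python
import xml.etree.ElementTree as ET

# Hypothetical hbase-site.xml content; the host names are placeholders
# taken from the property description, not real servers.
HBASE_SITE_XML = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>host1.mydomain.com,host2.mydomain.com,host3.mydomain.com</value>
  </property>
</configuration>
"""


def read_quorum(xml_text, default="localhost"):
    """Return the hbase.zookeeper.quorum hosts as a list of strings."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == "hbase.zookeeper.quorum":
            value = prop.findtext("value") or default
            # The value is a comma-separated list; tolerate stray spaces.
            return [h.strip() for h in value.split(",") if h.strip()]
    # Property absent: fall back to the documented default of localhost.
    return [default]


quorum_hosts = read_quorum(HBASE_SITE_XML)
print(quorum_hosts)
```

With the property missing entirely, the sketch falls back to `localhost`, mirroring the default mentioned for local and pseudo-distributed modes.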

What this actually does has been answered by Edward J. Yoon here. With editing on my part, for clarity:

Apache ZooKeeper is a coordination service for distributed applications, similar to Google's Chubby. Many projects use ZooKeeper, and we (Apache Hama) also use it for barrier synchronization in our Bulk Synchronous Parallel computing framework.

Today, I looked further into the Paxos and dynamic quorum features of the ZooKeeper project, to better name the class org.apache.hama.zookeeper.QuorumPeer. Because the documentation is sparse ( http://hadoop.apache.org/zookeeper/docs/r3.0.0/api/index.html ), I didn't understand the meaning of "quorum", as the term was somewhat odd to me. But "org.apache.hama.zookeeper.QuorumPeer" is the proper name!! xD

So, what is the Quorum and why do we need a Quorum?

According to Wikipedia, Quorum is the minimum number of members of a deliberative body necessary to conduct the business of that group. Ordinarily, this is a majority of the people expected to be there, although many bodies may have a lower or higher quorum.

As you know, fault tolerance is one of the most important functions of a distributed system. The quorum algorithm is used to prevent a split-brain condition. When a split-brain condition occurs, ZooKeeper uses the quorum algorithm to determine the "primary partition" and the "secondary partition". The servers in the primary partition then receive and process users' requests, and the servers in the secondary partition become read-only.

When does the system recover from a split-brain condition? When the partitions are merged into one again. Internally, ZooKeeper uses an atomic broadcast protocol (Zab) instead of Paxos.
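The majority rule behind this can be sketched in a few lines of Python. This assumes simple majority quorums (ZooKeeper's default); the function names are illustrative only and do not come from ZooKeeper's API.

```python
def majority(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be lost while a majority still remains."""
    return ensemble_size - majority(ensemble_size)


def has_quorum(partition_size, ensemble_size):
    """True if a partition holds a strict majority of the ensemble."""
    return partition_size >= majority(ensemble_size)


# A hypothetical 5-server ensemble split 3/2 by a network partition:
# only one side can hold a majority, which is what rules out split-brain.
ensemble = 5
for side in (3, 2):
    role = "primary (has quorum)" if has_quorum(side, ensemble) else "secondary (no quorum)"
    print(f"partition of {side}/{ensemble}: {role}")
```

Because a strict majority is required, an even-sized ensemble tolerates no more failures than the odd size below it (4 servers tolerate 1 failure, the same as 3), which is why quorum lists are usually given an odd number of hosts.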

You should also read the original version, in case I mistranslated the concepts he was trying to present.

My understanding of the quorum mechanism in Apache ZooKeeper is that it explicitly defines a replication quorum across several predefined hosts. If this quorum is not met, the partitions that disagree are split off into a secondary partition until ZooKeeper can reintegrate them with the primary partition.

This adds more granularity to Hadoop's eventual consistency model. HBase, meanwhile, is currently in the process of further integrating Zookeeper with its code.

MrGomez answered Sep 19 '22 11:09


From the hbase-default.xml file:

Comma separated list of servers in the ZooKeeper Quorum. For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com". By default this is set to localhost for local and pseudo-distributed modes of operation. For a fully-distributed setup, this should be set to a full list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh this is the list of servers which we will start/stop ZooKeeper on.

And from the Requirements section of Getting Started:

HBase depends on ZooKeeper as of release 0.20.0. HBase keeps the location of its root table, who the current master is, and what regions are currently participating in the cluster in ZooKeeper. Clients and Servers now must know their ZooKeeper Quorum locations before they can do anything else (Usually they pick up this information from configuration supplied on their CLASSPATH). By default, HBase will manage a single ZooKeeper instance for you. In standalone and pseudo-distributed modes this is usually enough, but for fully-distributed mode you should configure a ZooKeeper quorum (more info below).
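As a rough illustration of what "knowing the quorum locations" amounts to, the Python sketch below turns a quorum value into a ZooKeeper-style connect string of `host:port` pairs. It assumes the conventional ZooKeeper client port 2181 (which HBase exposes as `hbase.zookeeper.property.clientPort`); the function name `zk_connect_string` is hypothetical.

```python
def zk_connect_string(quorum, client_port=2181):
    """Build a 'host:port,host:port' string from a comma-separated quorum value."""
    hosts = [h.strip() for h in quorum.split(",") if h.strip()]
    return ",".join(f"{host}:{client_port}" for host in hosts)


# Placeholder host names, as in the property description.
print(zk_connect_string("host1.mydomain.com,host2.mydomain.com,host3.mydomain.com"))
```

A client can reach the ensemble through any host in that list, so every client and server should be given the same list.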

Hope that helps.

Jean-Daniel Cryans answered Sep 20 '22 11:09