HBase and ZooKeeper roles in Hadoop?

Tags:

hadoop

I have installed a single-node Hadoop cluster on my Ubuntu machine and can run the NameNode, DataNode, etc. Now I need to install HBase and ZooKeeper, but I don't really know what they are. Could anybody give me a brief description of these tools?

Thanks

438

asked Jul 29 '13 09:07

Radhakrishna

2 Answers

First of all, I would strongly recommend going through the official pages of both projects, HBase and ZooKeeper.

HBase is a NoSQL datastore that runs on top of your existing Hadoop cluster (HDFS). It gives you capabilities such as random, real-time reads and writes, which HDFS, being a file system, lacks. Since it is a NoSQL datastore, it doesn't follow SQL conventions and terminology. HBase provides a good set of APIs (including Java and Thrift) and integrates seamlessly with the MapReduce framework. Along with all these advantages, keep in mind that random reads and writes are quick but always carry additional overhead, so think carefully before you make any decision.

ZooKeeper is a high-performance coordination service for distributed applications (like HBase). It exposes common services like naming, configuration management, synchronization, and group services in a simple interface, so you don't have to write them from scratch. You can use it off the shelf to implement consensus, group management, leader election, and presence protocols, and you can build on it for your own specific needs.
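
To give a feel for one of these recipes, here is a toy, in-memory imitation of the usual ZooKeeper leader-election pattern, in which each candidate creates a sequentially numbered node and the one holding the lowest number leads. It does not talk to a real ZooKeeper server; the class and method names are invented for illustration.

```python
class ToyElection:
    """In-memory stand-in for sequential ephemeral nodes under an election path."""

    def __init__(self):
        self._next_sequence = 0
        self._candidates = {}  # sequence number -> candidate name

    def join(self, name: str) -> int:
        """Register a candidate and return its sequence number."""
        sequence = self._next_sequence
        self._next_sequence += 1
        self._candidates[sequence] = name
        return sequence

    def leave(self, sequence: int) -> None:
        """Simulate a session ending, which removes the candidate's node."""
        self._candidates.pop(sequence, None)

    def leader(self):
        """Candidate with the lowest sequence number, or None if nobody is registered."""
        if not self._candidates:
            return None
        return self._candidates[min(self._candidates)]
```

When the current leader leaves, the candidate with the next-lowest number takes over. A standby master replacing a failed one follows the same idea.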

HBase relies completely on ZooKeeper. HBase gives you the option to use its built-in ZooKeeper, which is started whenever you start HBase. That is not a good idea on a production cluster, though. In such scenarios it's better to run a dedicated ZooKeeper cluster and integrate it with your HBase cluster.

Note: You should always have an odd number of nodes in your ZK quorum.
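
The reason for the odd number is majority voting: a ZooKeeper ensemble stays available only while more than half of its servers are up. A small sketch of the arithmetic (the function names are mine, for illustration only):

```python
def majority(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still reachable."""
    return ensemble_size - majority(ensemble_size)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} servers: majority={majority(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Going from 3 to 4 servers, or from 5 to 6, adds a machine without adding any fault tolerance, which is why even sizes are usually avoided.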

HTH

164

answered Oct 07 '22 18:10

Tariq

An overview:

ZooKeeper: In short, ZooKeeper is a configuration and management tool for distributed applications (clusters), and it exists independently of HBase. From the docs:

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented there is a lot of work that goes into fixing the bugs and race conditions that are inevitable. Because of the difficulty of implementing these kinds of services, applications initially usually skimp on them, which make them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.

HBase: The NoSQL datastore on top of HDFS (it can also use a plain local file system, but then data durability is not guaranteed). HBase contains two primary services:

Master server - The master server (HMaster) coordinates the cluster and performs administrative operations, such as assigning regions and balancing the load.
Region servers - The region servers do the real work. A subset of the data of each table is handled by each region server. Clients talk to region servers to access data in HBase.
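
To make this division of labour concrete, here is a toy model, not HBase code: a table's row-key space is split into sorted, contiguous ranges (regions), and each region is served by one region server. The region boundaries and server names below are made up for illustration.

```python
import bisect

# Hypothetical layout: each region starts at a row key and runs up to the next start key.
# The first region starts at the empty key, so every row key falls into some region.
REGION_START_KEYS = ["", "g", "n", "t"]
REGION_SERVERS = ["rs1.example.com", "rs2.example.com", "rs1.example.com", "rs3.example.com"]


def locate_region_server(row_key: str) -> str:
    """Return the (made-up) server hosting the region whose key range contains row_key."""
    index = bisect.bisect_right(REGION_START_KEYS, row_key) - 1
    return REGION_SERVERS[index]


if __name__ == "__main__":
    for key in ("apple", "monkey", "zebra"):
        print(key, "->", locate_region_server(key))
```

In this picture the master decides which server owns which range, while reads and writes for a given row go straight to the owning region server.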

The connection between HBase and Zookeeper:

A distributed HBase relies completely on Zookeeper (for cluster configuration and management). In Apache HBase, ZooKeeper coordinates, communicates, and shares state between the Masters and RegionServers. HBase has a design policy of using ZooKeeper only for transient data (that is, for coordination and state communication). Thus if the HBase’s ZooKeeper data is removed, only the transient operations are affected — data can continue to be written and read to/from HBase.

Once you have HBase started, you can verify the processes it has started using the jps command:

$ jps

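The command lists all the Java processes on the machine (HBase itself is a Java application). For a simple single-machine, pseudo-distributed setup in which HBase manages ZooKeeper, the output should look something like this (in pure standalone mode all daemons share one JVM, so only HMaster shows up):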

62019 Jps
61098 HMaster        
61233 HRegionServer     
61003 HQuorumPeer
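
If you want to script this check, a minimal sketch could parse the jps output and report which of the three daemons are missing. The helper names are hypothetical, and `local_jps_output` assumes `jps` is on your PATH.

```python
import subprocess

EXPECTED_DAEMONS = ("HMaster", "HRegionServer", "HQuorumPeer")


def running_daemons(jps_output: str) -> set:
    """Extract process names from jps output lines of the form '<pid> <name>'."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            names.add(parts[1])
    return names


def missing_daemons(jps_output: str) -> list:
    """List the expected HBase daemons that do not appear in the jps output."""
    found = running_daemons(jps_output)
    return [name for name in EXPECTED_DAEMONS if name not in found]


def local_jps_output() -> str:
    """Run jps on this machine and return its standard output (requires a JDK)."""
    return subprocess.run(["jps"], capture_output=True, text=True, check=True).stdout
```

On a machine with HBase running, `missing_daemons(local_jps_output())` should return an empty list when all three processes are up.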

Technically speaking: by default HBase manages ZooKeeper itself, i.e. it starts and stops the ZooKeeper quorum (the cluster of ZooKeeper nodes) when we start and stop HBase. To verify the setting, look into the file conf/hbase-env.sh (in your HBase directory); there should be a line:

export HBASE_MANAGES_ZK=true

Once that is set, all we need to do is set the following properties in conf/hbase-site.xml (from the docs):

  <configuration>
    ...
    <property>
      <name>hbase.zookeeper.property.clientPort</name>
      <value>2181</value>
      <description> The port at which the clients will connect.
      </description>
    </property>
    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>rs1.example.com,rs2.example.com,rs3.example.com,rs4.example.com,rs5.example.com</value>
      <description>Comma separated list of servers in the ZooKeeper Quorum.
      For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
      By default this is set to localhost for local and pseudo-distributed modes
      of operation. For a fully-distributed setup, this should be set to a full
      list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
      this is the list of servers which we will start/stop ZooKeeper on.
      </description>
    </property>
    <property>
      <name>hbase.zookeeper.property.dataDir</name>
      <value>/usr/local/zookeeper</value>
      <description>Property from ZooKeeper's config zoo.cfg.
      The directory where the snapshot is stored.
      </description>
    </property>
    ...
  </configuration>
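
As a quick sanity check on such a file, the sketch below reads the `hbase.zookeeper.quorum` property using Python's standard library and reports whether the host count is odd. The function names and the trimmed-down sample content are assumptions for illustration; for a real check you would pass in the text of your own conf/hbase-site.xml.

```python
import xml.etree.ElementTree as ET


def read_property(site_xml: str, property_name: str):
    """Return the <value> text of the named <property>, or None if it is absent."""
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == property_name:
            return prop.findtext("value")
    return None


def quorum_hosts(site_xml: str) -> list:
    """Split hbase.zookeeper.quorum into a list of host names."""
    value = read_property(site_xml, "hbase.zookeeper.quorum") or ""
    return [host.strip() for host in value.split(",") if host.strip()]


# Shortened sample in the same shape as the configuration shown above.
SAMPLE = """
<configuration>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>rs1.example.com,rs2.example.com,rs3.example.com</value>
  </property>
</configuration>
"""

hosts = quorum_hosts(SAMPLE)
print(len(hosts), "quorum hosts, odd count:", len(hosts) % 2 == 1)
```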

answered Oct 07 '22 18:10

Nabeel Ahmed
