HBase HDFS zookeeper

Now I am learning about HBase. I set up my HBase Cluster and Hadoop Cluster like this:

server1: Namenode HMaster
server2: datanode1 RegionServer1 HQuorumPeer
server3: datanode2 RegionServer2 HQuorumPeer
server4: datanode3 RegionServer3 HQuorumPeer

I have several questions about the HBase cluster:

1: All RegionServers must be in the Hadoop cluster so they can use HDFS to store 
   data, even though the data ends up in the local file system, right?
2: What does a RegionServer do? Does the HMaster hand jobs to all RegionServers 
   and let them run in parallel, like a TaskTracker on a DataNode? 
3: What does ZooKeeper do? Do I need to set up ZooKeeper on all RegionServer 
   nodes and the master node? 
4: This is related to #3. I know HBase uses ZooKeeper for recovery once a 
   RegionServer goes down. How exactly does that work?

619

asked Sep 10 '13 21:09

user2597504

1 Answer

All RegionServers must be in the Hadoop cluster so they can use HDFS to store data, even though the data ends up in the local file system, right?

Yes. RegionServers are the daemons responsible for storing data in an HBase cluster. You store data in HBase tables, which are split into regions spread over several RegionServers across the cluster. Although the data goes through the RegionServers, it is actually stored in HDFS. In a standalone setup, however, HDFS is not used and the data is stored directly in the local FS. The relationship is analogous to that between any database and its file system; take MySQL and ext3, for example. And yes, all HDFS data ultimately lives on your local disks, where the DataNodes keep it as block files. You cannot see your files there directly, though.
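To make the "tables are spread over regions on several RegionServers" idea concrete, here is a minimal sketch in plain Python. It is a toy model, not the HBase client API: the split points (`""`, `"g"`, `"p"`) and the function `locate_region_server` are made up for illustration, and the server names are borrowed from the question. It only assumes that each region covers a contiguous, sorted range of row keys.

```python
from bisect import bisect_right

# Hypothetical layout: region i covers [REGION_START_KEYS[i], REGION_START_KEYS[i + 1]).
# The empty string marks the beginning of the key space.
REGION_START_KEYS = ["", "g", "p"]
REGION_HOSTS = ["RegionServer1", "RegionServer2", "RegionServer3"]


def locate_region_server(row_key: str) -> str:
    """Return the server hosting the region whose key range contains row_key."""
    # bisect_right finds the first start key greater than row_key;
    # the region just before that position owns the key.
    index = bisect_right(REGION_START_KEYS, row_key) - 1
    return REGION_HOSTS[index]


if __name__ == "__main__":
    for key in ["apple", "grape", "zebra"]:
        print(key, "->", locate_region_server(key))
```

Each row key maps to exactly one region, and each region is served by one RegionServer at a time, which is why reads and writes for different key ranges can be handled by different machines.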

What does a RegionServer do? Does the HMaster hand jobs to all RegionServers and let them run in parallel, like a TaskTracker on a DataNode?

As mentioned above, a RegionServer is the daemon that actually stores data in an HBase cluster. I'm sorry, I didn't quite get the second part of this question. What do you mean by "like a TaskTracker on a DataNode"? In an HBase cluster, the HMaster is the daemon responsible for monitoring all RegionServer instances in the cluster, and it is the interface for all metadata changes. Its job is monitoring and management. RegionServers don't run jobs the way TaskTrackers do. They store data and are responsible for serving and managing regions.

What does ZooKeeper do? Do I need to set up ZooKeeper on all RegionServer nodes and the master node?

ZooKeeper is the guy who coordinates everything behind the scenes. It is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. A distributed HBase setup depends on a running ZooKeeper cluster. All participating nodes and clients need to be able to access the running ZooKeeper ensemble. By default, HBase manages a ZooKeeper cluster itself and starts and stops it as part of the HBase start/stop process. You can also manage the ZooKeeper ensemble independently of HBase and just point HBase at the cluster it should use. You don't need ZooKeeper running on all the nodes. Just pick a number that suits your cluster. One thing to note here is that you should always use an odd number of ZooKeeper servers.
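The odd-number advice follows from the fact that a ZooKeeper ensemble stays available only while a strict majority of its servers can talk to each other. The short sketch below works through that arithmetic; the function names are mine and are not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble with this many servers."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be lost while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in range(1, 7):
        print(f"{n} servers: quorum={quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Going from 3 to 4 servers raises the quorum from 2 to 3 but still tolerates only one failure, so the extra node adds no fault tolerance. The three HQuorumPeer nodes in the question form an ensemble that keeps working with one of them down.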

This is related to #3. I know HBase uses ZooKeeper for recovery once a RegionServer goes down. How exactly does that work?

Each RegionServer is connected to ZooKeeper, and the master watches these connections. ZooKeeper manages a heartbeat with a timeout. So, on a timeout, the HMaster declares the region server dead and starts the recovery process. The following things happen during recovery:

Identifying that a node is down: a node can stop responding simply because it is overloaded, or because it is dead.
Recovering the writes in progress: this means reading the commit log and recovering the edits that were not flushed.
Reassigning the regions: the region server was handling a set of regions. This set must be reallocated to other region servers, depending on their respective workloads.
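The three steps above can be sketched as a small simulation. This is a simplified model under my own assumptions, not HBase's actual implementation: the 30-second timeout, the `(sequence_id, edit)` log format, the "least-loaded survivor" policy, and all function names are hypothetical and chosen only to mirror the description.

```python
import heapq

SESSION_TIMEOUT = 30.0  # seconds; illustrative value, not taken from HBase


def find_dead_servers(last_heartbeat, now, timeout=SESSION_TIMEOUT):
    """Step 1: servers whose last heartbeat is older than the timeout."""
    return sorted(s for s, seen in last_heartbeat.items() if now - seen > timeout)


def unflushed_edits(commit_log, flushed_up_to):
    """Step 2: edits in the commit log with a sequence id past the last flush."""
    return [edit for seq, edit in commit_log if seq > flushed_up_to]


def reassign_regions(assignments, dead_server):
    """Step 3: hand each orphaned region to the least-loaded surviving server.

    Mutates and returns `assignments`; assumes at least one server survives.
    """
    orphaned = assignments.pop(dead_server)
    heap = [(len(regions), server) for server, regions in assignments.items()]
    heapq.heapify(heap)
    for region in orphaned:
        load, server = heapq.heappop(heap)
        assignments[server].append(region)
        heapq.heappush(heap, (load + 1, server))
    return assignments


if __name__ == "__main__":
    heartbeats = {"rs1": 100.0, "rs2": 60.0, "rs3": 95.0}
    dead = find_dead_servers(heartbeats, now=100.0)
    print("dead:", dead)

    log = [(1, "put a"), (2, "put b"), (3, "put c")]
    print("replay:", unflushed_edits(log, flushed_up_to=1))

    layout = {"rs1": ["r1", "r2"], "rs2": ["r3", "r4"], "rs3": ["r5"]}
    print("after reassignment:", reassign_regions(layout, dead[0]))
```

Note that step 1 cannot tell an overloaded server from a dead one: both simply miss the timeout, which is why the choice of timeout is a trade-off between fast recovery and false alarms.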

The process is actually a bit more involved. You can find more on this here. I would also suggest going through the book HBase: The Definitive Guide by Lars George to get a better grip on HBase.

HTH

103

answered Sep 23 '22 06:09

Tariq

Tags:

apache-zookeeper

hadoop

hbase
