
HBase: how do put/get know which region server to write to?

In HBase, how do put/get operations know which region server a row should be written to? And when multiple rows are read, how are the multiple region servers contacted and the results retrieved?

Vinodh asked Sep 10 '13 12:09





1 Answer

I assume you are asking out of curiosity, since the client abstracts this behavior away and you normally don't need to care about it.


In HBase, how do put/get operations know which region server a row should be written to?

From the HBase documentation book:

The HBase client HTable is responsible for finding RegionServers that are serving the particular row range of interest. It does this by querying the .META. and -ROOT- catalog tables (TODO: Explain). After locating the required region(s), the client directly contacts the RegionServer serving that region (i.e., it does not go through the master) and issues the read or write request. This information is cached in the client so that subsequent requests need not go through the lookup process. Should a region be reassigned either by the master load balancer or because a RegionServer has died, the client will requery the catalog tables to determine the new location of the user region.

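So the first step is a lookup in the .META. and -ROOT- catalog tables to find which RegionServer serves the row. The client then contacts that RegionServer directly to do the work, and caches the location for later requests.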
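To make the lookup-then-cache behavior concrete, here is a minimal sketch in plain Python. It is a toy simulation, not the HBase client API: the `CATALOG` contents, the server names, and the `CachingLocator` class are all made up for illustration.

```python
import bisect

# Hypothetical catalog contents: (start_key, end_key, server).
# An empty end key means "no upper bound". All names and keys are invented.
CATALOG = [
    ("", "g", "rs1.example.com"),
    ("g", "p", "rs2.example.com"),
    ("p", "", "rs3.example.com"),
]


def catalog_lookup(row_key):
    """Simulates querying the catalog: find the region containing row_key."""
    starts = [region[0] for region in CATALOG]
    # The owning region has the greatest start key that is <= row_key.
    return CATALOG[bisect.bisect_right(starts, row_key) - 1]


def _contains(region, row_key):
    start, end, _server = region
    return start <= row_key and (end == "" or row_key < end)


class CachingLocator:
    """Toy client-side locator that remembers regions it has already looked up."""

    def __init__(self):
        self.cached_regions = []
        self.catalog_queries = 0

    def locate(self, row_key):
        for region in self.cached_regions:
            if _contains(region, row_key):
                return region[2]          # cache hit: no catalog round trip
        self.catalog_queries += 1         # cache miss: ask the catalog
        region = catalog_lookup(row_key)
        self.cached_regions.append(region)
        return region[2]

    def invalidate(self, row_key):
        """Forget the cached region for row_key, e.g. after the region moved."""
        self.cached_regions = [
            r for r in self.cached_regions if not _contains(r, row_key)
        ]


locator = CachingLocator()
print(locator.locate("apple"))    # first lookup goes to the catalog
print(locator.locate("banana"))   # same region, answered from the cache
print(locator.locate("kiwi"))     # different region, second catalog query
print(locator.catalog_queries)
```

The `invalidate` method stands in for the case the documentation mentions where a region is reassigned: the stale entry is dropped and the next `locate` call goes back to the catalog.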


When multiple rows are read, how are the multiple region servers contacted and the results retrieved?

There are two ways to read from HBase in general: scanners and gets.

If you run multiple gets, each one fetches its record separately, and each may go to a different region server.
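As a rough illustration of why a handful of gets can fan out across servers, the sketch below maps each row key to the server that owns it. The region boundaries and server names are hypothetical, and the grouping helper is only a way to visualize the fan-out, not a description of the real client internals.

```python
import bisect
from collections import defaultdict

# Hypothetical layout: region i starts at REGION_STARTS[i] and lives on
# REGION_SERVERS[i]. The first region starts at the empty key.
REGION_STARTS = ["", "g", "p"]
REGION_SERVERS = ["rs1.example.com", "rs2.example.com", "rs3.example.com"]


def server_for(row_key):
    """Return the server whose region covers row_key."""
    return REGION_SERVERS[bisect.bisect_right(REGION_STARTS, row_key) - 1]


def group_gets_by_server(row_keys):
    """Group the requested row keys by the server each get would be sent to."""
    groups = defaultdict(list)
    for key in row_keys:
        groups[server_for(key)].append(key)
    return dict(groups)


batches = group_gets_by_server(["apple", "kiwi", "zebra", "banana"])
for server, keys in batches.items():
    print(server, keys)
```

With this made-up layout, four gets end up touching three different servers, even though two of the rows happen to share a region.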

The scanner simply looks for the start of the range and then moves forward from there. Sometimes it needs to move on to a different region server when it reaches the end of a region, but the client handles that behind the scenes. If you can design the table so that your multiple gets become a single scan rather than a series of gets, you should, in theory, get better performance.
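The scanner behavior can be sketched the same way. This is a simplified model in plain Python, assuming made-up regions, rows, and server names. It shows one scan walking forward in key order and hopping to the next region when the current one is exhausted.

```python
import bisect

# Hypothetical regions, each with its own rows. An empty "end" means the
# region has no upper bound. All data here is invented for illustration.
REGIONS = [
    {"start": "", "end": "g", "server": "rs1", "rows": {"apple": 1, "banana": 2}},
    {"start": "g", "end": "p", "server": "rs2", "rows": {"grape": 3, "kiwi": 4, "mango": 5}},
    {"start": "p", "end": "", "server": "rs3", "rows": {"peach": 6, "zebra": 7}},
]


def scan(start_row, stop_row):
    """Yield (row, value, server) for start_row <= row < stop_row, in key order."""
    starts = [region["start"] for region in REGIONS]
    # Begin in the region that contains start_row.
    idx = bisect.bisect_right(starts, start_row) - 1
    while idx < len(REGIONS):
        region = REGIONS[idx]
        for row in sorted(region["rows"]):
            if row < start_row:
                continue
            if row >= stop_row:
                return
            yield row, region["rows"][row], region["server"]
        # Stop if this was the last region or the range ends inside it.
        if region["end"] == "" or region["end"] >= stop_row:
            return
        idx += 1  # move on to the next region, possibly on another server


for row, value, server in scan("banana", "m"):
    print(row, value, server)
```

The caller only iterates over results; the switch from `rs1` to `rs2` happens inside `scan`, which mirrors how the client hides region boundaries from the user.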

Donald Miner answered Oct 21 '22 11:10
