HBase: How does replication work?

I'm currently evaluating HBase as a datastore, but one question remains unanswered: HBase stores many copies of the same object on many nodes (i.e., replication). Since HBase offers so-called strong consistency (in contrast to eventual consistency), it guarantees that every replica returns the same value when read.

As I understand the HBase concept, when reading values, the HBase master is first queried for a RegionServer that provides the data (there must be more than one). After that, I can issue read and write requests without involving the master. How, then, can replication work?

How does HBase provide consistency?
How do write operations work internally?
Do write operations block until all replicas are written (i.e., synchronous replication)? If so, who manages this transfer?
How does HDFS come into play?

I have already read the Bigtable paper and searched the docs, but I found no further information on the architecture of HBase.

Thanks!

asked Mar 24 '11 10:03

theomega

1 Answer

HBase does not do any replication in the way that you are thinking. It is built on top of HDFS, which provides replication for the data blocks that make up the HBase tables. However, at any given time only one RegionServer serves reads and writes for any given row.
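To make the "one RegionServer per row" point concrete, here is a minimal Python sketch. It is a toy model, not the HBase client API: a table is split into regions covering sorted, non-overlapping row-key ranges, and each region is assigned to exactly one RegionServer at a time, so locating a row reduces to a lookup over the regions' start keys. The region boundaries and server names below are hypothetical.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    start_key: bytes      # inclusive lower bound; b"" means "start of the table"
    region_server: str    # hypothetical name of the single server hosting this region


# Hypothetical region layout for one table, sorted by start key.
REGIONS = [
    Region(b"", "rs1"),
    Region(b"g", "rs2"),
    Region(b"p", "rs3"),
]
_START_KEYS = [r.start_key for r in REGIONS]


def locate(row_key: bytes) -> Region:
    """Return the one region (and therefore the one RegionServer) that owns row_key."""
    # bisect_right gives the index of the first start key greater than row_key;
    # the region just before that index is the one whose range contains the row.
    idx = bisect.bisect_right(_START_KEYS, row_key) - 1
    return REGIONS[idx]


print(locate(b"apple").region_server)  # rs1
print(locate(b"grape").region_server)  # rs2
```

Because every read and write for a row goes to that single server, there is never a second "replica server" whose answer could disagree; the copies that do exist live one layer down, in HDFS.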

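Usually RegionServers are co-located with DataNodes. With HDFS's default block placement policy and a replication factor of 3, the first replica of each block goes to the writer's local DataNode when possible, the second to a node on a different rack, and the third to a different node on that same remote rack. Because a RegionServer writes its files through its local DataNode, it eventually ends up serving all of its data from the local server (compactions rewrite files that are not yet local, for example after a region has moved).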
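HDFS's default placement policy for a replication factor of 3 puts one replica on the writer's own DataNode (if the writer runs on one), one on a node in a different rack, and one on a different node in that same remote rack. The Python sketch below is a simplified model of that rule, not HDFS code: the rack topology and node names are hypothetical, and real HDFS also weighs node load, free space, and excluded nodes.

```python
import random

# Hypothetical topology: rack name -> DataNodes on that rack.
TOPOLOGY = {
    "/rack1": ["dn1", "dn2", "dn3"],
    "/rack2": ["dn4", "dn5", "dn6"],
    "/rack3": ["dn7", "dn8", "dn9"],
}


def rack_of(node: str) -> str:
    for rack, nodes in TOPOLOGY.items():
        if node in nodes:
            return rack
    raise KeyError(node)


def choose_targets(writer: str, rng: random.Random) -> list[str]:
    """Pick 3 replica locations, roughly following HDFS's default placement policy."""
    all_nodes = [n for nodes in TOPOLOGY.values() for n in nodes]
    # 1st replica: the writer's own DataNode (e.g. a co-located RegionServer),
    # otherwise a random node (simplification of the real fallback rule).
    first = writer if writer in all_nodes else rng.choice(all_nodes)
    # 2nd replica: a node on a different rack, so a whole-rack failure loses at most one copy.
    remote_rack = rng.choice([r for r in TOPOLOGY if r != rack_of(first)])
    second = rng.choice(TOPOLOGY[remote_rack])
    # 3rd replica: a different node on the same remote rack, limiting cross-rack traffic.
    third = rng.choice([n for n in TOPOLOGY[remote_rack] if n != second])
    return [first, second, third]


print(choose_targets("dn2", random.Random(42)))  # first entry is always "dn2"
```

The first rule is what gives a co-located RegionServer its data locality: every file it writes has one replica on its own machine.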

As for blocking: a write blocks only until its entry in the WAL (write-ahead log) has been synced to HDFS, whose own write pipeline copies the log to the other DataNodes. This guarantees that no acknowledged data is lost, because the log can always be replayed. Note that older versions of HBase did not have this worked out, because HDFS did not support a durable append operation until recently. At the moment we are in a strange state, as there is no official Apache release of Hadoop that supports both append and HBase. In the meantime, you can either apply the append patch yourself or use the Cloudera distribution (recommended).
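The WAL-first idea can be illustrated with a toy Python model. This is a sketch under stated assumptions, not HBase code: `ToyRegion` is a hypothetical class, a local file with `os.fsync` stands in for a WAL synced to HDFS, and real HBase orders and batches these steps differently internally. What the sketch preserves is the guarantee that a put is acknowledged only after its log entry is durable, so the in-memory store can always be rebuilt by replaying the log.

```python
import json
import os
import tempfile


class ToyRegion:
    """Toy model of a WAL-first write path (hypothetical; not HBase's implementation)."""

    def __init__(self, wal_path: str):
        self.wal_path = wal_path
        self.memstore: dict[str, str] = {}  # in-memory state, lost on a crash

    def put(self, row: str, value: str) -> None:
        # 1. Append the edit to the log and force it to stable storage
        #    (a stand-in for syncing the WAL to HDFS).
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps({"row": row, "value": value}) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        # 2. Apply the edit in memory; returning from put() is the "acknowledgement".
        self.memstore[row] = value

    @classmethod
    def recover(cls, wal_path: str) -> "ToyRegion":
        """Rebuild the memstore after a crash by replaying the log in order."""
        region = cls(wal_path)
        if os.path.exists(wal_path):
            with open(wal_path, encoding="utf-8") as wal:
                for line in wal:
                    edit = json.loads(line)
                    region.memstore[edit["row"]] = edit["value"]
        return region


wal_file = os.path.join(tempfile.mkdtemp(), "region.wal")
r = ToyRegion(wal_file)
r.put("row1", "a")
r.put("row1", "b")
del r  # simulate a crash: the memstore is gone, the log is not
print(ToyRegion.recover(wal_file).memstore)  # {'row1': 'b'}
```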

HBase does have a related replication feature, which asynchronously copies data from one cluster to another.
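Cluster-to-cluster replication works by shipping edits from the source cluster's WAL to a peer cluster in the background, so a write is acknowledged by the source before the peer sees it, and the peer is only eventually consistent with the source. The Python sketch below is a toy model of that behaviour; `ToyCluster`, `ToySourceCluster`, and `ship()` are hypothetical names, not HBase's replication classes.

```python
from collections import deque


class ToyCluster:
    def __init__(self, name: str):
        self.name = name
        self.rows: dict[str, str] = {}


class ToySourceCluster(ToyCluster):
    """Source cluster: acknowledges writes locally and queues edits for a peer."""

    def __init__(self, name: str, peer: ToyCluster):
        super().__init__(name)
        self.peer = peer
        self.pending: deque[tuple[str, str]] = deque()  # edits not yet shipped

    def put(self, row: str, value: str) -> None:
        self.rows[row] = value             # the local write completes here
        self.pending.append((row, value))  # replication happens later, off the write path

    def ship(self, max_edits: int = 100) -> int:
        """Background step: replay up to max_edits queued edits on the peer, in order."""
        shipped = 0
        while self.pending and shipped < max_edits:
            row, value = self.pending.popleft()
            self.peer.rows[row] = value
            shipped += 1
        return shipped


backup = ToyCluster("backup")
primary = ToySourceCluster("primary", backup)
primary.put("row1", "v1")
print(backup.rows)  # {}: the peer lags until edits are shipped
primary.ship()
print(backup.rows)  # {'row1': 'v1'}
```

This is why cross-cluster replication is useful for backups or a second data center, but plays no part in the strong consistency a single cluster provides.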

answered Sep 19 '22 11:09

David
