I'm new to HBase. Currently I'm using the Hortonworks Sandbox (HDP 2). While studying HBase, I came across some questions.
Where does HBase store its data?
If it stores data on HDFS, how does it perform update operations, given that HDFS is write-once, read-many?
All HRegion metadata of HBase is stored in the .META. table (named hbase:meta since HBase 0.96).
Just as in a relational database, data in HBase is stored in tables, and these tables are stored in regions. When a table becomes too big, it is partitioned into multiple regions, each holding a contiguous range of row keys. These regions are assigned to region servers across the cluster. Each region server hosts roughly the same number of regions.
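To make the table-to-region mapping concrete, here is a minimal sketch in plain Python. It is a toy model, not the HBase client API: the split points and server names are made up, and it only assumes that each region covers a contiguous range of row keys.

```python
import bisect

# Hypothetical split points: region i covers [start_key[i], start_key[i + 1]).
# The first region starts at the empty key, the last one is open-ended.
REGION_START_KEYS = ["", "g", "n", "t"]
# Hypothetical assignment of those four regions to three region servers.
REGION_SERVERS = ["rs1", "rs2", "rs3", "rs1"]


def locate_region(row_key):
    """Return (region_index, server) for the region whose key range contains row_key."""
    idx = bisect.bisect_right(REGION_START_KEYS, row_key) - 1
    return idx, REGION_SERVERS[idx]


print(locate_region("apple"))
print(locate_region("house"))
print(locate_region("zebra"))
```

A real client does the equivalent lookup against the region metadata table rather than a hard-coded list.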
Key differences between HDFS and HBase: HDFS is a distributed file system that is well suited to storing large files. HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables.
The column family name is part of the directory path where the HBase data is stored.
By default, HBase stores its data in HDFS. It is also possible to run HBase over other distributed file systems such as Amazon S3 or GFS. Files in HDFS cannot be edited in place, but data can be appended to them, because HDFS supports an append feature.
HBase uses HFile as the format for storing tables on HDFS. An HFile stores key-value pairs sorted lexicographically by row key. It is a block-indexed file format: the data is stored as a sequence of blocks, and a separate index is kept at the end of the file to locate the blocks. When a read request arrives, the index is searched for the block location, and the data is then read from that block.
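The block-index idea can be sketched in a few lines of Python. This is only an illustration of the lookup pattern described above, not the actual HFile layout: the block size, the helper names and the in-memory lists are all assumptions.

```python
import bisect


def build_blocks(sorted_items, block_size=2):
    """Split sorted (key, value) pairs into blocks and index each block by its first key."""
    blocks = [sorted_items[i:i + block_size]
              for i in range(0, len(sorted_items), block_size)]
    index = [block[0][0] for block in blocks]
    return blocks, index


def lookup(blocks, index, key):
    """Search the index for the one block that could hold key, then scan only that block."""
    pos = bisect.bisect_right(index, key) - 1
    if pos < 0:
        return None  # key sorts before the first block
    for k, v in blocks[pos]:
        if k == key:
            return v
    return None


items = sorted({"row1": "a", "row2": "b", "row3": "c", "row4": "d", "row5": "e"}.items())
blocks, index = build_blocks(items)
print(index)
print(lookup(blocks, index, "row4"))
```

Because the keys are sorted, a read touches the small index plus a single block instead of the whole file.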
The region server keeps an in-memory copy of recent table updates in the MemStore (called memcache in early HBase releases). The in-memory copy is flushed to disk periodically. Updates to an HBase table are also recorded in HLog files, which store redo records. During region recovery, these logs are applied on top of the last committed HFile to reconstruct the in-memory image of the table. After reconstruction, the in-memory copy is flushed to disk so that the on-disk copy is up to date.
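The write path and recovery described above can be modelled roughly as follows. This is a toy simulation under simplifying assumptions (one region, one log, no compaction); the class and method names are invented for illustration and are not HBase classes.

```python
class ToyRegion:
    """Toy model: a redo log, an in-memory store, and immutable flushed files."""

    def __init__(self):
        self.wal = []        # redo records for updates not yet flushed
        self.memstore = {}   # in-memory copy of recent updates
        self.hfiles = []     # flushed files; never modified once written

    def put(self, key, value):
        self.wal.append((key, value))   # log first, so the update survives a crash
        self.memstore[key] = value

    def flush(self):
        """Write the in-memory copy out as a new sorted, immutable file."""
        if self.memstore:
            self.hfiles.append(tuple(sorted(self.memstore.items())))
        self.memstore = {}
        self.wal = []        # flushed updates no longer need redo records

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):   # newest file wins
            for k, v in hfile:
                if k == key:
                    return v
        return None

    def crash(self):
        self.memstore = {}   # memory is lost; log and flushed files survive

    def recover(self):
        for key, value in self.wal:           # replay redo records
            self.memstore[key] = value
        self.flush()


region = ToyRegion()
region.put("a", "1")
region.flush()
region.put("a", "2")
region.put("b", "3")
region.crash()
print(region.get("a"), region.get("b"))
region.recover()
print(region.get("a"), region.get("b"))
```

Note that an "update" never rewrites an existing file: it is an append to the log plus a new file at flush time, which is why the write-once nature of HDFS is not an obstacle.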
HBase keeps versions of your updates. When you perform an update, a new copy is saved, and the earlier versions are preserved along with the latest one. In older releases three versions are preserved by default; since HBase 0.96 the default is one.
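As a rough illustration of versioned updates, the sketch below stores every write as a new timestamped cell and keeps only the newest few. It is a simplified model with invented names, assuming a limit of three versions; in HBase itself, surplus versions are dropped later rather than at write time.

```python
from collections import defaultdict


class VersionedStore:
    """Toy model: an update adds a new timestamped cell instead of overwriting."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        # (row, column) -> list of (timestamp, value), newest first
        self.cells = defaultdict(list)

    def put(self, row, column, value, timestamp):
        versions = self.cells[(row, column)]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        del versions[self.max_versions:]   # drop versions beyond the limit

    def get(self, row, column, versions=1):
        """Return up to `versions` values, newest first."""
        return [v for _, v in self.cells.get((row, column), [])[:versions]]


store = VersionedStore()
for ts, value in enumerate(["v1", "v2", "v3", "v4"], start=1):
    store.put("row1", "cf:col", value, ts)

print(store.get("row1", "cf:col"))
print(store.get("row1", "cf:col", versions=5))
```

A plain read returns only the newest value, while asking for more versions shows that the older copies are still there, up to the configured limit.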