
Where does Hbase store data?

I'm new to HBase and am currently using the Hortonworks Sandbox (HDP 2). While studying HBase, I came across some questions.

  1. Where does HBase store its data?

  2. If it stores data on HDFS, how does it perform update operations, given that HDFS is write-once, read-many?

Vijay_Shinde asked Aug 24 '15 06:08




1 Answer

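By default, HBase stores its data in HDFS. It is also possible to run HBase over other distributed file systems, such as Amazon S3 or GFS. Files in HDFS cannot be edited in place, but data can be appended to them, because HDFS supports an append feature. HBase therefore does not modify stored data in place: an update is saved as a new entry.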

HBase uses HFile as the format for storing tables on HDFS. An HFile keeps its entries sorted lexicographically by row key. It is a block-indexed file format for storing key-value pairs: the data is stored as a sequence of blocks, and a separate index is maintained at the end of the file to locate those blocks. When a read request arrives, the index is searched for the block location, and the data is then read from that block.
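To make the block-index idea concrete, here is a minimal in-memory sketch. It is a toy model, not the real HFile layout: the names `build_blocks` and `lookup` and the block size of two entries are assumptions chosen for illustration. It only shows how an index of first keys lets a reader jump straight to the one block that could hold a row key.

```python
import bisect

def build_blocks(sorted_items, block_size=2):
    """Split sorted (key, value) pairs into blocks and build an index.

    The index stores the first key of each block (hypothetical layout).
    """
    blocks = [sorted_items[i:i + block_size]
              for i in range(0, len(sorted_items), block_size)]
    index = [block[0][0] for block in blocks]
    return blocks, index

def lookup(blocks, index, key):
    """Search the index for the candidate block, then scan only that block."""
    # Last block whose first key is <= the requested key.
    pos = bisect.bisect_right(index, key) - 1
    if pos < 0:
        return None  # key sorts before every stored key
    for k, v in blocks[pos]:
        if k == key:
            return v
    return None

# Keys are kept in lexicographic order, as in the description above.
items = sorted({"row1": "a", "row2": "b", "row3": "c", "row5": "e"}.items())
blocks, index = build_blocks(items)
```

Calling `lookup(blocks, index, "row3")` consults the index first and then reads a single block, which is the access pattern the answer describes.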

Each region server keeps an in-memory copy of recent table updates in the MemStore (called memcache in early HBase versions). This in-memory copy is flushed to disk periodically. Updates to an HBase table are also recorded in HLog files, which store redo records. During region recovery, these logs are applied on top of the last committed HFile to reconstruct the in-memory image of the table. After reconstruction, the in-memory copy is flushed to disk so that the on-disk copy is up to date.
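The write, flush, and recovery cycle described above can be sketched as a toy model. This is not HBase code: the class `ToyRegion` and its methods are hypothetical, lists and dicts stand in for files on HDFS, and real details such as compaction and log rolling are left out. The point it illustrates is that a flush always writes a new immutable file and never edits an old one.

```python
class ToyRegion:
    """Toy model of a region: redo log + in-memory store + immutable files."""

    def __init__(self):
        self.wal = []        # redo records not yet covered by a flush
        self.memstore = {}   # in-memory copy of recent updates
        self.hfiles = []     # immutable sorted snapshots (stand-in for HDFS)

    def put(self, row, value):
        self.wal.append((row, value))   # record the update in the log first
        self.memstore[row] = value

    def flush(self):
        """Write the in-memory copy out as a new file; old files are untouched."""
        if self.memstore:
            self.hfiles.append(sorted(self.memstore.items()))
            self.memstore = {}
            self.wal = []    # flushed updates no longer need redo records

    def get(self, row):
        if row in self.memstore:
            return self.memstore[row]
        for hfile in reversed(self.hfiles):   # newest file wins
            for k, v in hfile:
                if k == row:
                    return v
        return None

    def crash(self):
        self.memstore = {}   # memory is lost; log and files survive

    def recover(self):
        """Replay redo records to rebuild the in-memory image, then flush."""
        for row, value in self.wal:
            self.memstore[row] = value
        self.flush()
```

In this sketch an update to an existing row never touches the older file. The newer value simply lives in a later file (or in memory) and shadows the older one on read.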

HBase keeps multiple versions of your updates: earlier versions are preserved alongside the latest one. The number of preserved versions is configurable per column family; the default was 3 in older releases and has been 1 since HBase 0.96. When you perform an update, a new copy is saved rather than the existing one being overwritten.
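As a rough illustration of version retention, the sketch below models a single cell that keeps its newest values. The class `VersionedCell` is hypothetical, and `max_versions=3` simply mirrors the figure quoted in this answer. The sketch trims old versions eagerly on every write, which is a simplification and should not be read as how HBase schedules that cleanup.

```python
class VersionedCell:
    """Toy cell that keeps at most max_versions timestamped values."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self.versions = []   # (timestamp, value) pairs, newest first

    def put(self, timestamp, value):
        # An update adds a new copy; existing versions are not modified.
        self.versions.append((timestamp, value))
        self.versions.sort(key=lambda tv: tv[0], reverse=True)
        del self.versions[self.max_versions:]   # drop the oldest extras

    def get(self, n=1):
        """Return up to n values, newest first."""
        return [value for _, value in self.versions[:n]]

cell = VersionedCell()
for ts in range(1, 5):
    cell.put(ts, f"v{ts}")
```

After four writes, `cell.get()` returns only the latest value, while `cell.get(3)` returns the three retained versions, newest first.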

Amal G Jose answered Sep 23 '22 17:09