
column based or row based for HBase

Tags:

hbase

I am wondering whether HBase uses column-based or row-based storage.

  • I read some technical documents that mention, as an advantage of HBase, that it uses column-based storage to keep similar data together, which improves compression. That would mean the same columns of different rows are stored together.
  • But I also learned that HBase is a sorted key-value map. It uses the key to address all related columns for that key (row), so it seems to be row-based storage?

I would appreciate it if anyone could clear up my confusion.

Thanks in advance, George

George2 asked Aug 05 '12 12:08



2 Answers

George, here's a presentation I gave about understanding HBase schemas from HBaseCon 2012:

http://www.cloudera.com/content/cloudera/en/resources/library/hbasecon/video-hbasecon-2012-hbasecon-2012.html

In short, each row in HBase is actually a key/value map, where you can have any number of columns (keys), each of which has a value. (And, technically, each of which can have multiple values with different timestamps).
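As a rough illustration of that model, here is a minimal Python sketch. It is a toy in-memory structure, not the HBase client API: plain dicts stand in for HBase's sorted maps, and the helper names `get_latest` and `scan` as well as the timestamps are made up for this example.

```python
# Toy model of one HBase table: row key -> column -> {timestamp: value}.
# Plain dicts stand in for HBase's sorted maps; this is not the HBase API.
table = {
    "rowkey1": {
        "c:col1": {1002: "value1-new", 1001: "value1-old"},
        "c:col2": {1001: "value2"},
    },
    "rowkey2": {
        "c:col1": {1001: "value10"},
    },
}


def get_latest(table, row_key, column):
    """Return the value with the highest timestamp, or None if absent."""
    versions = table.get(row_key, {}).get(column, {})
    if not versions:
        return None
    return versions[max(versions)]


def scan(table, start_row, stop_row):
    """Yield (row_key, columns) in sorted key order, start_row <= key < stop_row."""
    for row_key in sorted(table):
        if start_row <= row_key < stop_row:
            yield row_key, table[row_key]
```

Looking up a cell always starts from the row key, and each column can hold several timestamped versions, which is the sense in which a row is itself a key/value map.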

Additionally, "column families" allow you to host multiple key/value maps in the same row, in different physical (on-disk) files. This helps in situations where you have sets of values that are usually accessed separately from other sets (so you have less to read off disk). The trade-off, of course, is that it's more work to read all the values in a row if you separate columns into two column families, because twice as many disk accesses are needed.
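The trade-off can be sketched with a toy model in Python. This is only an illustration under simplifying assumptions: each column family is modeled as one in-memory dict standing in for its on-disk files, and the family names (`profile`, `stats`), row keys, and functions are hypothetical, not part of HBase.

```python
from collections import defaultdict


def split_by_family(rows):
    """Group cells into one store per column family, keyed by (row_key, qualifier)."""
    stores = defaultdict(dict)
    for row_key, columns in rows.items():
        for column, value in columns.items():
            family, qualifier = column.split(":", 1)
            stores[family][(row_key, qualifier)] = value
    # Sort each store by (row_key, qualifier), mimicking sorted on-disk files.
    return {family: dict(sorted(cells.items())) for family, cells in stores.items()}


def read_row(stores, row_key, families=None):
    """Read one row; return (cells, number of stores touched)."""
    wanted = list(stores) if families is None else list(families)
    cells = {}
    for family in wanted:
        for (key, qualifier), value in stores.get(family, {}).items():
            if key == row_key:
                cells[f"{family}:{qualifier}"] = value
    return cells, len(wanted)


# Hypothetical data: two column families on the same rows.
rows = {
    "user1": {"profile:name": "Ann", "stats:logins": "42"},
    "user2": {"profile:name": "Bob"},
}
stores = split_by_family(rows)
```

In this sketch, `read_row(stores, "user1", ["profile"])` touches one store, while `read_row(stores, "user1")` has to touch both to reassemble the full row.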

Unlike more standard "column oriented" databases, I've never heard of anyone creating an HBase table that had a column family for every logical column. There's overhead associated with column families, and the general advice is usually to have no more than 3 or 4 of them. Column families are "design time" information, meaning you must specify them at the time you create (or alter) the table.

Generally, I find column families to be an advanced design option that you'd only use once you have a deep understanding of HBase's architecture and can show that it would be a net benefit.

So overall, while it's true that HBase can act in a "column oriented" way, that is neither the default nor the most common design pattern in HBase. It's better to think of it as a row store with key/value maps.

Ian Varley answered Jan 04 '23 09:01


In addition to Ian's excellent answer, I would add that HBase is both a row-based key-value store and a column-based key-value store (if you know the row key).

If you prefer to think of it in terms of data structures, here's what a simple HBase table could look like:

'rowkey1' => {
    'c:col1' => 'value1',
    'c:col2' => 'value2',
},
'rowkey2' => {
    'c:col1' => 'value10',
    'c:col3' => 'value3'
}

Of course, you can also store even more complicated data structures in it, as you can see from Ian's presentation.
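For anyone who wants to experiment, the simple table above can be written as a runnable Python dict. This is just a sketch of the logical view, not how HBase stores or serves data, and the variable names are only for illustration.

```python
# The example table as a nested dict: row key -> {column: value}.
table = {
    "rowkey1": {"c:col1": "value1", "c:col2": "value2"},
    "rowkey2": {"c:col1": "value10", "c:col3": "value3"},
}

# Row-based access: fetch every column stored for one row key.
row = table["rowkey1"]

# Column-based access, given the row key: fetch a single cell.
cell = table["rowkey2"]["c:col3"]

# Rows are sparse: rowkey2 simply has no entry for c:col2.
missing = table["rowkey2"].get("c:col2")

# Collecting one column across all rows means visiting every row.
col1_values = {
    key: cols["c:col1"]
    for key, cols in sorted(table.items())
    if "c:col1" in cols
}
```

Note that reading `c:col1` for all rows walks the whole table in this model, which fits the "row store with key/value maps" way of thinking about it.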

Suman answered Jan 04 '23 09:01