HBase column family

Tags:

hbase

The HBase documentation says to avoid creating more than 2-3 column families, because HBase does not handle more than 2-3 column families very well. The reason is compaction and flushing, and hence the I/O. However, if all my columns are always populated (for every row), I think this reasoning matters much less. Given that my access to columns is completely random (I want to access any combination of columns), can I use a configuration with one column per column family (effectively trying to make it purely columnar)?

There are many blogs/wikis explaining this, but they all seem to contradict each other and add more confusion. I just can't digest the fact that HBase prefers one column family. If so, what's the point of calling it a column store?
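Part of the confusion is that HBase is column-family-oriented rather than purely columnar. The toy in-memory model below is a sketch to illustrate that layout; it is not HBase code, and the class `FamilyLayoutModel` and its methods are hypothetical. It assumes, as HBase does, one separate sorted store per column family, and it simplifies each cell key to a `"row/qualifier"` string (HBase's real key ordering also involves timestamps and key type).

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy model (not HBase code): each column family gets its own sorted store,
 * and inside a store cells are ordered by row, then qualifier (row-major).
 */
public class FamilyLayoutModel {
    // family -> sorted map of "row/qualifier" -> value (simplified cell key)
    private final Map<String, TreeMap<String, String>> stores = new TreeMap<>();

    /** Write one cell into the store of the given family. */
    public void put(String row, String family, String qualifier, String value) {
        stores.computeIfAbsent(family, f -> new TreeMap<>())
              .put(row + "/" + qualifier, value);
    }

    /** Number of separate stores a region would hold (one per family). */
    public int storeCount() {
        return stores.size();
    }

    /** Cell keys of one family's store, in sorted (on-disk) order. */
    public String layout(String family) {
        return String.join(",", stores.getOrDefault(family, new TreeMap<>()).keySet());
    }

    public static void main(String[] args) {
        FamilyLayoutModel oneFamily = new FamilyLayoutModel();
        FamilyLayoutModel perColumn = new FamilyLayoutModel();
        for (String row : new String[] {"r1", "r2"}) {
            for (String col : new String[] {"a", "b"}) {
                oneFamily.put(row, "cf", col, "v"); // all columns in family "cf"
                perColumn.put(row, col, col, "v");  // one family per column
            }
        }
        System.out.println(oneFamily.storeCount() + " " + oneFamily.layout("cf"));
        System.out.println(perColumn.storeCount() + " " + perColumn.layout("a"));
    }
}
```

Running `main` prints `1 r1/a,r1/b,r2/a,r2/b` and then `2 r1/a,r2/a`. With a single family, all of a row's columns sit next to each other in one store, so reading any combination of that row's columns touches one store. Giving every column its own family produces the column-at-a-time layout the question asks about, at the cost of one store per column, each of which takes part in every flush and compaction.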


asked Mar 05 '12 14:03

PrakashT

1 Answer

Currently (though this is expected to change), all of the column families in a region are flushed together. This is the primary reason people say "HBase doesn't do well with more than 2 or 3 column families". Consider two CFs, each with one column. Column A:A stores the full text of web pages. Column B:B stores the number of words in each page. So every time we flush A:A (which happens often, because A:A's data is far bigger), we also need to go through a whole separate file I/O routine for column B:B, even though there is no need to: with B:B holding only numbers, I could go for months without flushing it.

If you store A and B in the same column family (A:A and A:B), you will probably see vastly better flush I/O performance, and because many HBase reads are served from memory (the memstore or the block cache) rather than from disk, you will probably find that read speeds are equivalent.
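The flush argument can be made concrete with a toy model. This is a sketch, not HBase code: `RegionFlushModel`, its 1,000,000-byte threshold, and the per-row write sizes are made up for illustration. It assumes, as described above, that a region flushes all of its families together once the combined memstore size crosses a threshold; in real HBase that trigger is governed by settings such as `hbase.hregion.memstore.flush.size`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model (not HBase code): when a region's total memstore size reaches a
 * threshold, every family holding data is flushed to its own new file,
 * however little data that family has buffered.
 */
public class RegionFlushModel {
    private final long flushThresholdBytes;
    private final Map<String, Long> memstore = new LinkedHashMap<>();
    private final Map<String, List<Long>> files = new LinkedHashMap<>();

    public RegionFlushModel(long flushThresholdBytes) {
        this.flushThresholdBytes = flushThresholdBytes;
    }

    /** Buffer a write of the given size into a family's memstore. */
    public void put(String family, long bytes) {
        memstore.merge(family, bytes, Long::sum);
    }

    /** Called after each row: flush all families together if the region is full. */
    public void endRow() {
        long total = 0;
        for (long size : memstore.values()) {
            total += size;
        }
        if (total < flushThresholdBytes) {
            return;
        }
        for (Map.Entry<String, Long> e : memstore.entrySet()) {
            if (e.getValue() > 0) {
                // One new file per non-empty family, sized by what it buffered.
                files.computeIfAbsent(e.getKey(), f -> new ArrayList<>()).add(e.getValue());
                e.setValue(0L);
            }
        }
    }

    public int fileCount(String family) {
        return files.getOrDefault(family, List.of()).size();
    }

    public long largestFile(String family) {
        long max = 0;
        for (long size : files.getOrDefault(family, List.of())) {
            max = Math.max(max, size);
        }
        return max;
    }

    public static void main(String[] args) {
        RegionFlushModel twoFamilies = new RegionFlushModel(1_000_000);
        RegionFlushModel oneFamily = new RegionFlushModel(1_000_000);
        for (int row = 0; row < 1000; row++) {
            twoFamilies.put("A", 50_000); // A:A, page text
            twoFamilies.put("B", 8);      // B:B, word count
            twoFamilies.endRow();
            oneFamily.put("A", 50_000);   // A:A, page text
            oneFamily.put("A", 8);        // A:B, word count in the same family
            oneFamily.endRow();
        }
        System.out.println("two families: A files=" + twoFamilies.fileCount("A")
            + ", B files=" + twoFamilies.fileCount("B")
            + ", largest B file=" + twoFamilies.largestFile("B") + " bytes");
        System.out.println("one family: A files=" + oneFamily.fileCount("A")
            + ", B files=" + oneFamily.fileCount("B"));
    }
}
```

With these made-up numbers, both layouts flush every 20 rows (50 times over 1,000 rows), but the two-family layout writes 100 files instead of 50, and each B file holds only 160 bytes of word counts. That is the extra flush I/O, and the pile of tiny files, described in the answer.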

Also, and perhaps more importantly, if the cardinality of the columns (how many rows actually populate each one) is wildly different, your region servers will need to maintain useless, mostly-empty files for the less-dense column families. This will never change.

All of this is available in the HBase Book.

So, as in all such performance situations, measure before deciding what the "correct" path is.


answered Nov 13 '22 06:11

Chris Shain
