
Why does HBase need to store the Column Family for every Value?

Because HBase tables are sparse, HBase stores with every cell not only the value but also all the information required to identify that cell (often described as the Key, not to be confused with the RowKey). The Key looks as follows:

RowKey-ColumnFamily-ColumnQualifier-Timestamp

And all this information is stored for every entry. That's why there is the recommendation to use short names for Column Families and Column Qualifiers to reduce additional overhead.
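To make the overhead concrete, here is a rough sketch of the per-cell size, assuming the classic KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte key type). The function name `keyvalue_size` is made up for illustration, and the estimate ignores tags, compression, and data block encoding, all of which change the on-disk size.

```python
def keyvalue_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Estimate the serialized size of one cell in bytes (assumed layout, no tags)."""
    key_len = (
        2 + len(row)       # row length (short) + row key
        + 1 + len(family)  # family length (byte) + column family name
        + len(qualifier)   # column qualifier
        + 8                # timestamp (long)
        + 1                # key type (byte)
    )
    # 4-byte key length + 4-byte value length precede the key and value.
    return 4 + 4 + key_len + len(value)


short_cf = keyvalue_size(b"row1", b"d", b"q", b"v")
long_cf = keyvalue_size(b"row1", b"customer_details", b"q", b"v")
print(short_cf, long_cf, long_cf - short_cf)
```

Under these assumptions every extra byte in the Column Family name is paid once per cell, which is where the short-name recommendation comes from.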

My Question: Why do I need to store the ColumnFamily for every entry? From my understanding every Store File belongs to exactly one Column Family. Wouldn't it be enough to store the Column Family name once per Store File? This would reduce overhead, arbitrary Column Family names could be used and we would still be able to identify the Column Family for every entry. What am I missing here?

user3793764 asked Nov 11 '22 05:11


1 Answer

As in a relational database, tables in HBase consist of rows and columns. In HBase, the columns are grouped together in column families. This grouping is expressed logically as a layer in the map of maps. Column families are also expressed physically: each column family gets its own set of HFiles on disk. This physical separation allows the underlying HFiles of one column family to be managed independently of the others. As far as compactions are concerned, the HFiles for each column family are managed independently.
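The "map of maps" and the per-family grouping can be pictured with a small in-memory sketch. This is only a toy model, not how HBase is implemented; the helper names `new_table`, `put`, and `cells_by_family` are hypothetical, and the sort order (row, then qualifier, then newest timestamp first) is an assumption chosen to mirror how cells are usually described as ordered within a family.

```python
from collections import defaultdict


def new_table():
    # row key -> column family -> column qualifier -> timestamp -> value
    return defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))


def put(table, row, family, qualifier, timestamp, value):
    """Store one versioned cell in the nested map."""
    table[row][family][qualifier][timestamp] = value


def cells_by_family(table):
    """Group cells per column family, loosely mimicking one set of files per family."""
    stores = defaultdict(list)
    for row, families in table.items():
        for family, qualifiers in families.items():
            for qualifier, versions in qualifiers.items():
                for ts, value in versions.items():
                    # Each cell still carries its full key, including the family.
                    stores[family].append((row, family, qualifier, ts, value))
    for cells in stores.values():
        # Row ascending, qualifier ascending, newest timestamp first.
        cells.sort(key=lambda c: (c[0], c[2], -c[3]))
    return dict(stores)


t = new_table()
put(t, "r1", "cf1", "a", 1, "x")
put(t, "r1", "cf1", "a", 2, "y")
put(t, "r1", "cf2", "b", 1, "z")
for family, cells in cells_by_family(t).items():
    print(family, cells)
```

In this toy model the family name appears both as the grouping key and inside every cell tuple, which is exactly the redundancy the question is asking about.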

Fatih Yakut answered Jan 04 '23 03:01