
Will HBase store column families for the same row on different machines?

Tags:

hbase

Column families for the same row belong to the same RegionServer. So the question is: will a RegionServer store different column families on different machines?

James asked Nov 22 '10 10:11



1 Answer

Not necessarily, but it can happen. This is part of the basic HBase architecture. If you imagine an HBase table as a spreadsheet, with its rows and columns, then a region spans a range of successive rows and, for those rows, all columns of every column family. Within a region, each column family is kept in its own store, with its own set of files. This way, the whole sheet is covered with region tiles.
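The tiling idea can be illustrated with a toy model in Python. This is only a sketch, not HBase code: the `Region` class, the `build_regions` and `find_region` helpers, and the example split keys are hypothetical names made up for illustration. It assumes that regions are delimited by row key ranges only, so the column family of a cell plays no part in choosing the region.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    start_key: str  # inclusive; "" means "from the first row"
    end_key: str    # exclusive; "" means "to the last row"


def build_regions(split_keys):
    """Turn sorted split keys into contiguous row-key ranges."""
    bounds = [""] + list(split_keys) + [""]
    return [Region(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


def find_region(regions, cell):
    """Locate the region for a (row_key, family, qualifier) cell.

    Only the row key is used; family and qualifier are ignored on purpose.
    """
    row_key, _family, _qualifier = cell
    starts = [r.start_key for r in regions]
    return regions[bisect_right(starts, row_key) - 1]


regions = build_regions(["g", "p"])
in_info = find_region(regions, ("james", "info", "name"))
in_stats = find_region(regions, ("james", "stats", "visits"))
# Both cells belong to the same row, so they resolve to the same region.
same_region = in_info == in_stats
```

In this model, two cells of the same row always resolve to the same region, whatever their column families are.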

Each region is served by a single RegionServer, while its underlying files are stored in HDFS, which keeps copies on one or more (typically three) cluster nodes. (If you lost all nodes holding a specific region's files at once, you would lose all of that region's data. If you lose only one replica, it is replicated to another node from the remaining copies.)

Now, when the data contained in a region grows too big, HBase automatically initiates a region split, resulting in two new regions, each containing about one half of the data. Apart from replication, region splits are the only way data eventually gets distributed over an HBase cluster.
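A region split can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not HBase's actual split logic: `split_region` and `max_rows` are hypothetical names, and a plain row count stands in for the on-disk size that a real cluster would look at.

```python
def split_region(rows, max_rows):
    """Split a sorted list of row keys in two once it exceeds max_rows.

    Returns one list (no split needed) or two daughter lists that
    together cover the same row range, cut at the middle row key.
    """
    if len(rows) <= max_rows:
        return [rows]
    mid = len(rows) // 2
    return [rows[:mid], rows[mid:]]


rows = ["a", "b", "c", "d", "e"]
daughters = split_region(rows, max_rows=4)
unsplit = split_region(rows, max_rows=10)
```

The split happens along the row axis only, so each daughter region still holds complete rows.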

Storing data for one row in different columns of the same column family ensures that the data is stored together in one place.
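To make that concrete, here is a minimal sketch in Python of cells being grouped by column family. It is a toy model, not the HBase storage format: `group_into_stores` and the sample cells are hypothetical, and it simply assumes that each column family gets its own sorted collection of cells.

```python
from collections import defaultdict


def group_into_stores(cells):
    """Group (row, family, qualifier, value) cells into one sorted list per family."""
    stores = defaultdict(list)
    for row, family, qualifier, value in cells:
        stores[family].append((row, qualifier, value))
    for family in stores:
        stores[family].sort()  # order by row key, then by column qualifier
    return dict(stores)


cells = [
    ("row1", "info", "name", "Alice"),
    ("row1", "stats", "visits", "12"),
    ("row1", "info", "email", "alice@example.com"),
    ("row2", "info", "name", "Bob"),
]
stores = group_into_stores(cells)
```

Here the two `info` cells of `row1` end up next to each other in the same list, while the `stats` cell of the same row lives in a separate one.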

zillion1 answered Oct 02 '22 20:10