
Wide column vs column family vs columnar vs column oriented DB definition

There is a lot of confusion around these terms. I'd like to throw my understanding out there and see if people agree. I have seen conflicting and wrong definitions all over the web.

In my mind, wide column and column family DBs are essentially the same thing. In both:

  1. the data is organized logically as a group of key-value pairs (each pair is called a column);
  2. each row is identified by a unique row key;
  3. each row can have a different number or definition of columns; and
  4. rows are stored on disk one after another. So a column family (wide column) table is similar to a relational DB's table in that it is still organized as rows.

The main difference is that it doesn't have a fixed schema for columns and obviously can't do table joins.

An example of 3 rows (column families): each row has a different length and/or different columns, but on disk rowkey1's entire content is one contiguous run followed by the other rows, similar to a relational DB.

rowkey1 k1-v k2-v k3-v

rowkey2 k1-v k4-v

rowkey3 k2-v k4-v k5-v
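
To make the row-by-row layout concrete, here is a minimal Python sketch of the three rows above. The dict-of-dicts model and the `serialize_row_wise` helper are assumptions made for illustration, not how any particular database stores its data.

```python
# Hypothetical in-memory model of a wide-column table: row key -> {column: value}.
# Each row may carry a different set of columns; there is no fixed schema.
rows = {
    "rowkey1": {"k1": "v", "k2": "v", "k3": "v"},
    "rowkey2": {"k1": "v", "k4": "v"},
    "rowkey3": {"k2": "v", "k4": "v", "k5": "v"},
}


def serialize_row_wise(table):
    """Lay out each row's key-value pairs contiguously, one row after another."""
    lines = []
    for row_key in sorted(table):
        cells = " ".join(
            f"{col}-{val}" for col, val in sorted(table[row_key].items())
        )
        lines.append(f"{row_key} {cells}")
    return lines


layout = serialize_row_wise(rows)
for line in layout:
    print(line)
```

Every row's columns end up next to each other, so reading one full row touches one contiguous run, while reading a single column across all rows has to visit every row.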

On the other hand, the term columnar DB means the same thing as column oriented DB. These store data on disk one column at a time, not one row at a time. That is great for time series or any multi-series analytical workload. The fact that each column holds the same type of data and is stored together allows better data compression as an added bonus.

an example:


on disk:

a:1 b:2 c:3 d:4

10:1 9:2 8:3 7:4
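
The same idea as a small Python sketch, assuming a four-row table with two value columns. The values and row ids follow the on-disk example above; the field names `id`, `letter`, and `number` and both helper functions are hypothetical, since the original table image is not available.

```python
# Values and row ids follow the on-disk example above.
# The field names ("id", "letter", "number") are invented for illustration.
records = [
    {"id": 1, "letter": "a", "number": 10},
    {"id": 2, "letter": "b", "number": 9},
    {"id": 3, "letter": "c", "number": 8},
    {"id": 4, "letter": "d", "number": 7},
]


def serialize_column_wise(recs, id_field, columns):
    """Store one column at a time; each value is tagged with its row id."""
    return [
        " ".join(f"{rec[col]}:{rec[id_field]}" for rec in recs)
        for col in columns
    ]


def serialize_row_wise(recs, id_field, columns):
    """Store one row at a time, for contrast with the columnar layout."""
    return [
        " ".join(str(rec[field]) for field in [id_field, *columns])
        for rec in recs
    ]


column_layout = serialize_column_wise(records, "id", ["letter", "number"])
row_layout = serialize_row_wise(records, "id", ["letter", "number"])
print(column_layout)
print(row_layout)
```

An aggregate over `number` only needs the second line of the columnar layout, whereas the row layout forces a scan over every row.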

Kenneth asked Jul 30 '20 18:07




1 Answer

The definition from Wikipedia also helps further:

Wide-column stores such as Bigtable and Apache Cassandra are not column stores in the original sense of the term, since their two-level structures do not use a columnar data layout. In genuine column stores, a columnar data layout is adopted such that each column is stored separately on disk. Wide-column stores do often support the notion of column families that are stored separately. However, each such column family typically contains multiple columns that are used together, similar to traditional relational database tables. Within a given column family, all data is stored in a row-by-row fashion, such that the columns for a given row are stored together, rather than each column being stored separately. Wide-column stores that support column families are also known as column family databases.

Reference: https://en.wikipedia.org/wiki/Wide-column_store
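
As a rough illustration of the two-level structure the quote describes, the Python sketch below keeps each column family in its own segment and stores rows contiguously inside it. The family names, column names, values, and the `serialize_by_family` helper are all hypothetical; this is not the storage format of Bigtable or Cassandra.

```python
# Hypothetical two-level model: column family -> row key -> {column: value}.
# Family, column, and value names are invented for illustration.
table = {
    "profile": {
        "rowkey1": {"name": "ann", "city": "oslo"},
        "rowkey2": {"name": "bob"},
    },
    "activity": {
        "rowkey1": {"last_login": "2020-07-30"},
        "rowkey2": {"last_login": "2020-07-29", "visits": "12"},
    },
}


def serialize_by_family(families):
    """Return one row-by-row layout per column family; families stay separate."""
    segments = {}
    for family, rows in families.items():
        segments[family] = [
            row_key + " " + " ".join(f"{col}={val}" for col, val in cols.items())
            for row_key, cols in rows.items()
        ]
    return segments


segments = serialize_by_family(table)
for family, lines in segments.items():
    print(family, lines)
```

Families are separated from each other, but inside a family the columns of one row still sit together, which is why this is not a columnar layout.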

srth12 answered Oct 18 '22 10:10