
What is a columnar database?

Tags:

sql

database

I have been working with warehousing for a while now.

I am intrigued by columnar databases and the speed they offer for data retrieval.

I have a multi-part question:

  • How do Columnar Databases work?
  • How do they differ from relational databases?

Raj More asked Jan 25 '10 14:01


People also ask

What is a columnar in database?

A columnar database is a database management system (DBMS) that stores data in columns instead of rows. The purpose of a columnar database is to efficiently write and read data to and from hard disk storage in order to speed up the time it takes to return a query.

What is an example of column based database?

Commonly cited column-oriented databases include MariaDB, CrateDB, ClickHouse, Greenplum Database, Apache HBase, Apache Kudu, Hypertable, and MonetDB. Apache Parquet often appears on such lists as well, although it is a columnar file format rather than a database.

Is SQL a columnar database?

Most modern analytical and reporting relational databases such as Amazon Redshift, SAP HANA and Actian's Vector database use columnar storage. In fact, even Oracle and Microsoft SQL Server, traditionally row-based databases, provide columnar capabilities (at a cost) to improve query performance.

Where is columnar database used?

Columnar databases are used in data warehouses, where businesses send massive amounts of data from multiple sources for BI analysis. Column-oriented databases answer analytical queries faster because the values of a column are stored close together, which reduces seek time.


2 Answers

How do columnar databases work?
The defining concept of a column-store is that the values of a table are stored contiguously by column. Thus the classic supplier table from C. J. Date's suppliers-and-parts database:

SNO  STATUS CITY    SNAME
---  ------ ----    -----
S1       20 London  Smith
S2       10 Paris   Jones
S3       30 Paris   Blake
S4       20 London  Clark
S5       30 Athens  Adams

would be stored on disk or in memory something like:

S1S2S3S4S5;2010302030;LondonParisParisLondonAthens;SmithJonesBlakeClarkAdams  

This is in contrast to a traditional rowstore which would store the data more like this:

S120LondonSmith;S210ParisJones;S330ParisBlake;S420LondonClark;S530AthensAdams 

From this simple concept flows all of the fundamental differences in performance, for better or worse, between a column-store and a row-store. For example, a column store will excel at doing aggregations like totals and averages, but inserting a single row can be expensive, while the inverse holds true for row-stores. This should be apparent from the above diagram.
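
To make the two layouts concrete, here is a minimal Python sketch. It is an illustration only, not how any real engine serializes data: the function names are made up for this example, and plain lists stand in for disk pages. It rebuilds both strings shown above and contrasts an aggregate with a single-row insert.

```python
# Supplier table from the answer above: (SNO, STATUS, CITY, SNAME).
ROWS = [
    ("S1", 20, "London", "Smith"),
    ("S2", 10, "Paris", "Jones"),
    ("S3", 30, "Paris", "Blake"),
    ("S4", 20, "London", "Clark"),
    ("S5", 30, "Athens", "Adams"),
]


def row_layout(rows):
    """Row-store: concatenate values row by row, one ';' chunk per row."""
    return ";".join("".join(str(v) for v in row) for row in rows)


def column_layout(rows):
    """Column-store: transpose, then concatenate one ';' chunk per column."""
    return ";".join("".join(str(v) for v in col) for col in zip(*rows))


# Column-store view of the same data: one list per column.
COLUMNS = [list(col) for col in zip(*ROWS)]


def total_status_column_store(columns):
    """The aggregate reads a single contiguous column (STATUS is index 1)."""
    return sum(columns[1])


def total_status_row_store(rows):
    """The same aggregate has to walk every row to pick out STATUS."""
    return sum(row[1] for row in rows)


def insert_column_store(columns, new_row):
    """A single-row insert has to touch every column separately."""
    for column, value in zip(columns, new_row):
        column.append(value)


if __name__ == "__main__":
    print(row_layout(ROWS))
    print(column_layout(ROWS))
    print(total_status_column_store(COLUMNS))
```

In the row layout a new supplier is one append at the end; in the column layout the same insert becomes one write per column, which is the cost the answer refers to.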

How do they differ from relational databases?
A relational database is a logical concept. A columnar database, or column-store, is a physical concept. Thus the two terms are not comparable in any meaningful way. Column-oriented DBMSs may be relational or not, just as row-oriented DBMSs may adhere more or less to relational principles.

Paul Mansour answered Oct 08 '22 20:10


How do Columnar Databases work?
A columnar database is a concept rather than a particular architecture or implementation. In other words, there isn't one single description of how these databases work; indeed, several are built upon traditional, row-oriented DBMSs, simply storing the data in tables with one (or, rather often, two) columns and adding the layer needed to access the columnar data easily.
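
As a rough sketch of that "columns on top of a row-oriented DBMS" idea, the following uses Python's built-in sqlite3 module. The table and view names are hypothetical and chosen for this example; real products that take this route will differ in the details. Each logical column lives in its own two-column table (key, value), and a view plays the role of the access layer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One two-column table per logical column: (row key, value).
# The view is the access layer that stitches the columns back into rows.
conn.executescript("""
    CREATE TABLE supplier_status (sno TEXT PRIMARY KEY, value INTEGER);
    CREATE TABLE supplier_city   (sno TEXT PRIMARY KEY, value TEXT);
    CREATE TABLE supplier_sname  (sno TEXT PRIMARY KEY, value TEXT);

    CREATE VIEW supplier AS
        SELECT s.sno AS sno, s.value AS status,
               c.value AS city, n.value AS sname
        FROM supplier_status AS s
        JOIN supplier_city  AS c ON c.sno = s.sno
        JOIN supplier_sname AS n ON n.sno = s.sno;
""")

rows = [
    ("S1", 20, "London", "Smith"),
    ("S2", 10, "Paris", "Jones"),
    ("S3", 30, "Paris", "Blake"),
    ("S4", 20, "London", "Clark"),
    ("S5", 30, "Athens", "Adams"),
]
for sno, status, city, sname in rows:
    # Inserting one logical row means one write per column table.
    conn.execute("INSERT INTO supplier_status VALUES (?, ?)", (sno, status))
    conn.execute("INSERT INTO supplier_city VALUES (?, ?)", (sno, city))
    conn.execute("INSERT INTO supplier_sname VALUES (?, ?)", (sno, sname))
conn.commit()

# An aggregate only needs the one table that holds that column.
total_status = conn.execute(
    "SELECT SUM(value) FROM supplier_status"
).fetchone()[0]

# Row-style access goes through the view and pays for the joins.
paris = conn.execute(
    "SELECT sno, sname FROM supplier WHERE city = 'Paris' ORDER BY sno"
).fetchall()

print(total_status, paris)
```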

How do they differ from relational databases?
They generally differ from traditional (row-oriented) databases with regard to ...

  • performance...
  • storage requirements ...
  • ease of modification of the schema ...

...in specific use cases of DBMSes.
In particular, they offer advantages in the areas mentioned when the typical use is to compute aggregate values over a limited number of columns, as opposed to retrieving all or most columns for a given entity.
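
The access-pattern point can be put into numbers with a deliberately crude cost model. This is an assumption-laden sketch (the function names are invented here, and it ignores indexes, compression, caching and page sizes): it only counts how many stored values an aggregate has to scan, and how many separate storage regions a single-entity lookup has to visit.

```python
def aggregate_cost(n_rows, n_cols, cols_used):
    """Values scanned to aggregate `cols_used` columns over all rows.

    Assumption: a row-store reads whole rows, a column-store reads
    only the columns the query mentions.
    """
    return {
        "row_store": n_rows * n_cols,
        "column_store": n_rows * cols_used,
    }


def entity_fetch_cost(n_cols):
    """Separate storage regions visited to fetch all columns of one entity.

    Assumption: a row-store keeps the entity in one contiguous chunk,
    a column-store keeps one fragment of it per column.
    """
    return {"row_store": 1, "column_store": n_cols}


if __name__ == "__main__":
    # A 4-column table with a million rows, averaging a single column.
    print(aggregate_cost(n_rows=1_000_000, n_cols=4, cols_used=1))
    # Fetching every column of one entity from the same table.
    print(entity_fetch_cost(n_cols=4))
```

Under these assumptions the aggregate scans a quarter of the values in the column layout, while the single-entity fetch visits four regions instead of one.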

Is there a trial version of a columnar database I can install to play around? (I am on Windows 7)
Yes, there are commercial, free, and open-source implementations of columnar databases. See the list at the end of the Wikipedia article for a start.
Beware that several of these implementations were introduced to address a particular need (say, a very small footprint, a highly compressible distribution of data, or sparse matrix emulation) rather than to provide a general-purpose column-oriented DBMS per se.

Note: The remark about the "single purpose orientation" of several columnar DBMSes is not a critique of these implementations, but rather an additional indication that such an approach strays from the more "natural" (and certainly more broadly used) approach to storing record entities. As a result, this approach is used when the row-oriented approach isn't satisfactory, and therefore tends to
a) be targeted at a particular purpose, and
b) receive fewer resources and less interest than work on the "general purpose", "tried and tested" tabular approach.

Tentatively, the Entity-Attribute-Value (EAV) data model may be an alternative storage strategy worth considering. Although distinct from the "pure" columnar DB model, EAV shares several of its characteristics.
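
For readers who haven't met EAV, here is a small sketch using Python's built-in sqlite3 module. The table name, the attribute names and the extra PHONE attribute are made up for illustration; it is one possible way to lay out an EAV table, not a recommended schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per (entity, attribute) pair; all values are stored as text.
conn.execute("""
    CREATE TABLE supplier_eav (
        entity    TEXT,
        attribute TEXT,
        value     TEXT,
        PRIMARY KEY (entity, attribute)
    )
""")

ROWS = [
    ("S1", 20, "London", "Smith"),
    ("S2", 10, "Paris", "Jones"),
    ("S3", 30, "Paris", "Blake"),
    ("S4", 20, "London", "Clark"),
    ("S5", 30, "Athens", "Adams"),
]
ATTRIBUTES = ("STATUS", "CITY", "SNAME")

for sno, *values in ROWS:
    for attribute, value in zip(ATTRIBUTES, values):
        conn.execute(
            "INSERT INTO supplier_eav VALUES (?, ?, ?)",
            (sno, attribute, str(value)),
        )

# Adding a new attribute needs no schema change, only new rows.
# (PHONE and its value are hypothetical example data.)
conn.execute("INSERT INTO supplier_eav VALUES ('S1', 'PHONE', '555-0100')")
conn.commit()

# Column-style access: aggregate over a single attribute.
avg_status = conn.execute(
    "SELECT AVG(CAST(value AS INTEGER)) FROM supplier_eav "
    "WHERE attribute = 'STATUS'"
).fetchone()[0]

# Entity-style access: reassemble one supplier from its attribute rows.
s1 = dict(conn.execute(
    "SELECT attribute, value FROM supplier_eav WHERE entity = 'S1'"
))

print(avg_status, s1)
```

As with a column-store, aggregating one attribute touches only that attribute's rows, while rebuilding a whole entity means collecting one row per attribute.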

mjv answered Oct 08 '22 20:10