Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does columnar database optimization differ from relational database optimization?

Tags:

I have the following database structure, stored in a relational database:

  • Two fact tables with ~80 million rows each
  • Three dimension tables that have between 300,000 - 500,000 rows
  • Both fact tables have 3 foreign keys that are used to join to the dimension tables
  • One security table also has 3 foreign keys that are used to join to the dimension tables

A developer is using my data to create an application that utilizes a columnar database. They have been having issues with performance, and when I suggested adding indexes / keys to their tables, they said that indexing a columnar database does not improve performance. As a result, they are asking me to combine the fact tables with the dimension tables.

This seems to contradict what I know about the fundamental principles of database management. Is it true that columnar databases cannot use indexes to improve performance? What steps should be taken to optimize columnar performance?

I am seeking high-level information, but for the sake of completeness, the relational database is Teradata, and the columnar database is SAP HANA.

like image 227
Nick Morgan Avatar asked Jul 31 '17 20:07

Nick Morgan


1 Answers

At a high level, the difference between relational and columnar databases is in how the data is stored. Relational DB's store records by the row, columnar by the column.

For example: Records:

Name          ID number        zip code
smith         4444             98210
jones         1234             10125

An RDBMS stores this is blocks by record: smith, 4444, 98210 and jones, 1234, 10125 The columnar DB stores this in blocks by column: smith, jones and 4444, 1234 and 98210, 10125

You can create indexes. In HANA there are UNIQUE, BTREE, CPBTREE indexes. Unique indexes on a unique value - like primary keys in RDBMS, BTree is a binary search tree index and CPBTREE is the compressed prefix B+ tree index.

However, it is important to evaluate the performance problem before creating indexes hoping for a fix. Look at the logs, analyze the DB and find out what is causing the slow performance. The comment "a developer is using my data to create an application that uses a columnar database" is likely the crux of the problem. The way the data is stored and retrieved in each database type is completely different. RDBMS are better suited to transactional data. So if this app is taking advantage of a columnar database then it is better suited to efficiently searching for specific data in large amounts of data -- as only the impacted columns need to be loaded, not the entire record.

This application may not run correctly simply due to the different DB structure.

like image 105
Lena Weber Avatar answered Oct 11 '22 21:10

Lena Weber