I have the following database structure, stored in a relational database:
A developer is using my data to create an application that utilizes a columnar database. They have been having issues with performance, and when I suggested adding indexes / keys to their tables, they said that indexing a columnar database does not improve performance. As a result, they are asking me to combine the fact tables with the dimension tables.
This seems to contradict what I know about the fundamental principles of database management. Is it true that columnar databases cannot use indexes to improve performance? What steps should be taken to optimize columnar performance?
I am seeking high-level information, but for the sake of completeness, the relational database is Teradata, and the columnar database is SAP HANA.
At a high level, the difference between relational and columnar databases is in how the data is stored. Relational DB's store records by the row, columnar by the column.
For example: Records:
Name ID number zip code
smith 4444 98210
jones 1234 10125
An RDBMS stores this is blocks by record: smith, 4444, 98210
and jones, 1234, 10125
The columnar DB stores this in blocks by column: smith, jones
and 4444, 1234
and 98210, 10125
You can create indexes. In HANA there are UNIQUE, BTREE, CPBTREE indexes. Unique indexes on a unique value - like primary keys in RDBMS, BTree is a binary search tree index and CPBTREE is the compressed prefix B+ tree index.
However, it is important to evaluate the performance problem before creating indexes hoping for a fix. Look at the logs, analyze the DB and find out what is causing the slow performance. The comment "a developer is using my data to create an application that uses a columnar database" is likely the crux of the problem. The way the data is stored and retrieved in each database type is completely different. RDBMS are better suited to transactional data. So if this app is taking advantage of a columnar database then it is better suited to efficiently searching for specific data in large amounts of data -- as only the impacted columns need to be loaded, not the entire record.
This application may not run correctly simply due to the different DB structure.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With