
Is a table with lots of columns still an anti-pattern when using a clustered columnstore index in SQL Server 2014?

Reading about the clustered columnstore index in SQL Server 2014, I'm wondering whether a table with a huge number of columns is still an anti-pattern. Currently I use vertical partitioning to alleviate the problems of a single table with lots of columns, but with a clustered columnstore index available this shouldn't be needed. Is this correct, or am I missing something?

Example: take a log of performance counters. The raw data might have the following structure:

╔══════════════════╦═══════╦═══════╦═════╦═════╦═════╦══════════╗
║       Time       ║ Perf1 ║ Perf2 ║ ... ║ ... ║ ... ║ Perf1000 ║
╠══════════════════╬═══════╬═══════╬═════╬═════╬═════╬══════════╣
║ 2013-11-05 00:01 ║     1 ║     5 ║     ║     ║     ║        9 ║
║ 2013-11-05 00:01 ║     2 ║     9 ║     ║     ║     ║        9 ║
║ 2013-11-05 00:01 ║     3 ║     2 ║     ║     ║     ║        9 ║
║ 2013-11-05 00:01 ║     4 ║     3 ║     ║     ║     ║        9 ║
╚══════════════════╩═══════╩═══════╩═════╩═════╩═════╩══════════╝

Such a table with 1000 columns is evil for several reasons: one row will most likely span more than one page, and although a query is rarely interested in all the measures, it always incurs the I/O cost of reading whole rows, and so on. Vertical partitioning usually helps here; for example, one could split the performance counters into different tables by category (CPU, RAM, etc.).
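To make the vertical partitioning idea concrete, here is a minimal sketch in Python. The counter-to-category mapping, the `PerfLog_` table names, and both helper functions are hypothetical, chosen only for illustration; nothing here is SQL Server API.

```python
from collections import defaultdict

# Hypothetical mapping of counters to categories (illustrative names only).
COUNTER_CATEGORY = {
    "Perf1": "CPU",
    "Perf2": "CPU",
    "Perf3": "RAM",
    "Perf4": "Disk",
}


def partition_by_category(mapping):
    """Group counter columns by category.

    Each group stands for one narrower table that repeats the Time key.
    """
    groups = defaultdict(list)
    for counter, category in mapping.items():
        groups[category].append(counter)
    return {f"PerfLog_{category}": ["Time"] + cols for category, cols in groups.items()}


def columns_scanned(tables, wanted):
    """Count the columns a row-wise scan touches.

    Every column of each table that holds at least one wanted counter is read.
    """
    return sum(
        len(cols) for cols in tables.values() if any(c in cols for c in wanted)
    )


if __name__ == "__main__":
    tables = partition_by_category(COUNTER_CATEGORY)
    for name, cols in tables.items():
        print(name, cols)
    print("columns scanned for Perf3:", columns_scanned(tables, {"Perf3"}))
```

In this toy model a query on a single RAM counter touches only the columns of the RAM table rather than all of them, which is the saving vertical partitioning is meant to buy.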

Conversely, storing such a table as a clustered columnstore index should not be such a problem: data is stored column-wise, so the I/O for each query covers only the requested columns, regardless of the total number of columns in the table.
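The reasoning above can be put as a back-of-the-envelope calculation. This is only a rough sketch: the 4-byte value size and the row count are assumptions, and the model ignores compression, page overhead, and the Time column.

```python
# Rough I/O model: row-wise vs. column-wise scan of the performance counter log.
BYTES_PER_VALUE = 4      # assumption: each counter is stored as a 4-byte integer
TOTAL_COLUMNS = 1000     # Perf1 .. Perf1000 from the example table
ROWS = 1_000_000         # assumption: hypothetical number of logged rows


def rowstore_bytes_read(rows, total_columns):
    """A row-wise scan reads every column of every row, wanted or not."""
    return rows * total_columns * BYTES_PER_VALUE


def columnstore_bytes_read(rows, requested_columns):
    """A column-wise scan reads only the columns the query asks for."""
    return rows * requested_columns * BYTES_PER_VALUE


if __name__ == "__main__":
    requested = 3  # e.g. a query that touches three counters
    row_io = rowstore_bytes_read(ROWS, TOTAL_COLUMNS)
    col_io = columnstore_bytes_read(ROWS, requested)
    print(f"row-wise scan:    {row_io:,} bytes")
    print(f"column-wise scan: {col_io:,} bytes")
```

Under these assumptions the column-wise cost depends only on the number of requested columns, not on the table's total width, which is the point the question is making.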

asked Nov 04 '13 12:11 by marcob



1 Answer

It's certainly less "bad" than with horizontal (row) storage, but 1000 columns is pushing the limit a bit too far. Our data warehouse usually has tables with 100 to 200 columns, and they are zippy enough with a columnstore index. Assuming you have a perfect columnstore index, each query should look only at the specific vertical index it needs and hence be very efficient. But if your columnstore indexes are not optimal for the query, SQL Server has to do some jumping between the indexes, and that is not good.

There's no rule of thumb about this. You will have to benchmark to answer this question in your specific environment.

answered Oct 09 '22 03:10 by Code Different