I have a table with ~200 million rows and ~15 columns in it. I am planning to create the column store index on my table. Will there be any change in performance based on the order of columns that I use in the column store index? If Yes, what is the logic behind it?
The order of the columns in a composite index does matter on how a query against a table will use it or not. A query will use a composite index only if the where clause of the query has at least the leading/left most columns of the index in it.
Using ORDER BY on indexed column is not a good idea. Actually the purpose of using index is to making searching faster so the index column helps to maintain the data in sorted order.
The order of things in the WHERE does not matter; the order of the columns in an INDEX does matter, sometimes a lot.
Columnstore indexes are the standard for storing and querying large data warehousing fact tables. This index uses column-based data storage and query processing to achieve gains up to 10 times the query performance in your data warehouse over traditional row-oriented storage.
The most important aspect is to add ALL of the columns to the Columnstore index. I spoke to the MSFT product team about this, because I was skeptical about the claim that the 'column order doesn't matter'. But they confirmed that, so long as you add all of the table's columns, this is true.
I also tried partial columnstore indexes (i.e. adding only a subset of columns), while I was able to get the queries I tested to use that columnstore index, clearly the Query Optimizer is not built for this scenario, as the execution plans will be inconsitant and not always optimal, i.e. use columnstore and non-columnstore indices, etc..
No, it doesn't make any difference.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With