I have a non-clustered columnstore index on all columns a 40m record non-memory optimized table on SQL Server 2016 Enterprise Edition.
A query forcing the use of the columnstore index will perform significantly faster but the optimizer continues to choose to use the clustered index and other non-clustered indexes. I have lots of available RAM and am using appropriate queries against a dimensional model.
Why won't the optimizer choose the columnstoreindex? And how can I encourage its use (without using a hint)?
Here is a sample query not using columnstore:
SELECT
COUNT(*),
SUM(TradeTurnover),
SUM(TradeVolume)
FROM DWH.FactEquityTrade e
--with (INDEX(FactEquityTradeNonClusteredColumnStoreIndex))
JOIN DWH.DimDate d
ON e.TradeDateId = d.DateId
JOIN DWH.DimInstrument i
ON i.instrumentid = e.instrumentid
WHERE d.DateId >= 20160201
AND i.instrumentid = 2
It takes 7 seconds without hint and a fraction of a second with the hint. The query plan without the hint is here. The query plan with the hint is here.
The create statement for the columnstore index is:
CREATE NONCLUSTERED COLUMNSTORE INDEX [FactEquityTradeNonClusteredColumnStoreIndex] ON [DWH].[FactEquityTrade]
(
[EquityTradeID],
[InstrumentID],
[TradingSysTransNo],
[TradeDateID],
[TradeTimeID],
[TradeTimestamp],
[UTCTradeTimeStamp],
[PublishDateID],
[PublishTimeID],
[PublishedDateTime],
[UTCPublishedDateTime],
[DelayedTradeYN],
[EquityTradeJunkID],
[BrokerID],
[TraderID],
[CurrencyID],
[TradePrice],
[BidPrice],
[OfferPrice],
[TradeVolume],
[TradeTurnover],
[TradeModificationTypeID],
[InColumnStore],
[TradeFileID],
[BatchID],
[CancelBatchID]
)
WHERE ([InColumnStore]=(1))
WITH (DROP_EXISTING = OFF, COMPRESSION_DELAY = 0) ON [PRIMARY]
GO
Update. Plan using Count(EquityTradeID) instead of Count(*) and with hint included
Columnstore indexes give high performance gains for analytic queries that scan large amounts of data, especially on large tables. Use columnstore indexes on data warehousing and analytics workloads, especially on fact tables, because they tend to require full table scans rather than table seeks.
Columnstore index is a new type of index introduced in SQL Server 2012. It is a column-based non-clustered index geared toward increasing query performance for workloads that involve large amounts of data, typically found in data warehouse fact tables.
Columnstore indexes achieve up to 10x greater data compression than rowstore indexes. This greatly reduces the I/O required to execute analytics queries and therefore improves query performance.
Columnstore indexes work well when the data is stable. Queries should be updating and deleting less than 10% of the rows.
You're asking SQL Server to choose a complicated query plan over a simple one. Note that when using the hint, SQL Server has to concatenate the columnstore index with a rowstore non-clustered index (IX_FactEquiteTradeInColumnStore
). When using just the rowstore index, it can do a seek (I assume TradeDateId
is the leading column on that index). It does still have to do a key lookup, but it's simpler.
I can see two options to get this behavior without a hint:
First, remove InColumnStore
from the columnstore index definition and cover the entire table. That's what you're asking from the columnstore - to cover everything.
If that's not possible, you can use a UNION ALL
to explicitly split the data:
WITH workaround
AS (
SELECT TradeDateId
, instrumentid
, TradeTurnover
, TradeVolume
FROM DWH.FactEquityTrade
WHERE InColumnStore = 1
UNION ALL
SELECT TradeDateId
, instrumentid
, TradeTurnover
, TradeVolume
FROM DWH.FactEquityTrade
WHERE InColumnStore = 0 -- Assuming this is a non-nullable BIT
)
SELECT COUNT(*)
, SUM(TradeTurnover)
, SUM(TradeVolume)
FROM workaround e
JOIN DWH.DimDate d
ON e.TradeDateId = d.DateId
JOIN DWH.DimInstrument i
ON i.instrumentid = e.instrumentid
WHERE d.DateId >= 20160201
AND i.instrumentid = 2;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With