I have a Clustered Columnstore Index Table for our IOT metrics (timeseries data). It contains more than 1 billion rows and structured like this: <pre class="prettyprint"><code>CREATE TABLE [dbo].[Data]( [DeviceId] [bigint] NOT NULL, [MetricId] [smallint] NOT NULL, [TimeStamp] [datetime2](2) NOT NULL, [Value] [real] NOT NULL ) CREATE CLUSTERED INDEX [PK_Data] ON [dbo].[Data] ([TimeStamp],[DeviceId],[MetricId]) --WITH (DROP_EXISTING = ON) CREATE CLUSTERED COLUMNSTORE INDEX [PK_Data] ON [dbo].[Data] WITH (DROP_EXISTING = ON, MAXDOP = 1, DATA_COMPRESSION = COLUMNSTORE_ARCHIVE) </code></pre> There are some 10,000 distinct DeviceId values and TimeStamps range from 2008 till now. A typical query against this table looks like this: <pre class="prettyprint"><code>SET STATISTICS TIME, IO ON SELECT [DeviceId] ,[MetricId] ,DATEADD(hh, DATEDIFF(day, '2005-01-01', [TimeStamp]), '2005-01-01') As [Date] ,MIN([Value]) as [Min] ,MAX([Value]) as [Max] ,AVG([Value]) as [Avg] ,SUM([Value]) as [Sum] ,COUNT([Value]) as [Count] FROM [dbo].[Data] WHERE [DeviceId] = 6077129891325167032 AND [MetricId] = 1000 AND [TimeStamp] BETWEEN '2017-07-01' AND '2017-07-30' GROUP BY [DeviceId] ,[MetricId] ,DATEDIFF(day, '2005-01-01', [TimeStamp]) ORDER BY [DeviceId] ,[MetricId] ,DATEDIFF(day, '2005-01-01', [TimeStamp]) </code></pre> When I execute this query, I get this for performance metrics: Because at the moment a query like stated above does too many Segment reads I believe: <pre class="prettyprint"><code>Table 'Data'. Scan count 2, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 5257, lob physical reads 9, lob read-ahead reads 4000. Table 'Data'. Segment reads 11, segment skipped 764. </code></pre> Query plan: <img src="https://i.stack.imgur.com/m72lC.png" alt="Query plan"> This is not well optimized I believe as there were 11 segments read to retrieve only 212 out of 1 billion source rows (before grouping/aggregation) So then I ran Niko Neugebauer's great scripts to validate our setup and the Columnstore Alignment https://github.com/NikoNeugebauer/CISL/blob/master/Azure/alignment.sql, I get this result after rebuilding the Columnstore Clustered Index: <img src="https://i.stack.imgur.com/jjuKH.png" alt="Columnstore Alignment"> MetricId and TimeStamp columns have optimal alignment score of 100%. How can we ensure that the DeviceId column is also well aligned? I played with the column order in the initial Clustered (Rowstore) index, is that where things can be optimized?

the key solution to align your table by DeviceId is to build a clustered rowstore index on your table and then build over it a clustered Columnstore Index with MAXDOP = 1 (in order not to introduce any overlaps which take place when index build runs with multiple cores). So the possible code will look something like this: <pre class="prettyprint"><code>CREATE CLUSTERED INDEX [PK_Data] ON [dbo].[Data] ([DeviceId],[TimeStamp],[MetricId]) --WITH (DROP_EXISTING = ON) CREATE CLUSTERED COLUMNSTORE INDEX [PK_Data] ON [dbo].[Data] WITH (DROP_EXISTING = ON, MAXDOP = 1, DATA_COMPRESSION = COLUMNSTORE_ARCHIVE) </code></pre> Another possibility would be to do it all within CISL, by preparing and then executing the alignment funcitonality: <pre class="prettyprint"><code>insert into dbo.cstore_Clustering( TableName, Partition, ColumnName ) VALUES ('[dbo].[Data]', 1, 'DeviceId' ); </code></pre> This is though just for 1 partition, but you should consider partitioning your table anyway, once you get into the numbers you are using. After setting up, you can start executing dbo.cstore_doAlignment, which will automatically re-align and optimise your table. (You will have some parameters to configure the threshold of the optimisations, if you like) Best regards, Niko

How to optimize SQL Server Columnstore Alignment

Tags:

sql-server

azure-sql-database

time-series

columnstore

I have a Clustered Columnstore Index Table for our IOT metrics (timeseries data). It contains more than 1 billion rows and structured like this:

CREATE TABLE [dbo].[Data](
[DeviceId] [bigint] NOT NULL,
[MetricId] [smallint] NOT NULL,
[TimeStamp] [datetime2](2) NOT NULL,
[Value] [real] NOT NULL
)

CREATE CLUSTERED INDEX [PK_Data] ON [dbo].[Data] ([TimeStamp],[DeviceId],[MetricId]) --WITH (DROP_EXISTING = ON)
CREATE CLUSTERED COLUMNSTORE INDEX [PK_Data] ON [dbo].[Data] WITH (DROP_EXISTING = ON, MAXDOP = 1, DATA_COMPRESSION = COLUMNSTORE_ARCHIVE)

There are some 10,000 distinct DeviceId values and TimeStamps range from 2008 till now. A typical query against this table looks like this:

SET STATISTICS TIME, IO ON
SELECT
    [DeviceId]
    ,[MetricId]
    ,DATEADD(hh, DATEDIFF(day, '2005-01-01', [TimeStamp]), '2005-01-01') As [Date]
    ,MIN([Value]) as [Min]
    ,MAX([Value]) as [Max]
    ,AVG([Value]) as [Avg]
    ,SUM([Value]) as [Sum]
    ,COUNT([Value]) as [Count]
FROM
    [dbo].[Data]
WHERE
    [DeviceId] = 6077129891325167032
    AND [MetricId] = 1000
    AND [TimeStamp] BETWEEN '2017-07-01' AND '2017-07-30'
GROUP BY
    [DeviceId]
    ,[MetricId]
    ,DATEDIFF(day, '2005-01-01', [TimeStamp])
ORDER BY
    [DeviceId]
    ,[MetricId]
    ,DATEDIFF(day, '2005-01-01', [TimeStamp])

When I execute this query, I get this for performance metrics:

Because at the moment a query like stated above does too many Segment reads I believe:

Table 'Data'. Scan count 2, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 5257, lob physical reads 9, lob read-ahead reads 4000.
Table 'Data'. Segment reads 11, segment skipped 764.

Query plan:

This is not well optimized I believe as there were 11 segments read to retrieve only 212 out of 1 billion source rows (before grouping/aggregation)

So then I ran Niko Neugebauer's great scripts to validate our setup and the Columnstore Alignment https://github.com/NikoNeugebauer/CISL/blob/master/Azure/alignment.sql, I get this result after rebuilding the Columnstore Clustered Index:

Columnstore Alignment

MetricId and TimeStamp columns have optimal alignment score of 100%. How can we ensure that the DeviceId column is also well aligned? I played with the column order in the initial Clustered (Rowstore) index, is that where things can be optimized?

420

asked Aug 05 '17 07:08

Ted van der Veen

2 Answers

the key solution to align your table by DeviceId is to build a clustered rowstore index on your table and then build over it a clustered Columnstore Index with MAXDOP = 1 (in order not to introduce any overlaps which take place when index build runs with multiple cores). So the possible code will look something like this:

CREATE CLUSTERED INDEX [PK_Data] ON [dbo].[Data] ([DeviceId],[TimeStamp],[MetricId]) --WITH (DROP_EXISTING = ON)
CREATE CLUSTERED COLUMNSTORE INDEX [PK_Data] ON [dbo].[Data] WITH (DROP_EXISTING = ON, MAXDOP = 1, DATA_COMPRESSION = COLUMNSTORE_ARCHIVE)

Another possibility would be to do it all within CISL, by preparing and then executing the alignment funcitonality:

insert into dbo.cstore_Clustering( TableName, Partition, ColumnName )
    VALUES ('[dbo].[Data]', 1, 'DeviceId' );

This is though just for 1 partition, but you should consider partitioning your table anyway, once you get into the numbers you are using. After setting up, you can start executing dbo.cstore_doAlignment, which will automatically re-align and optimise your table. (You will have some parameters to configure the threshold of the optimisations, if you like)

Best regards, Niko

answered Sep 20 '22 10:09

Niko Neugebuer

When Max dop set as 1 for creating clustered column store will sort your records effectively, but for a table with 1 billion rows this max dop 1 will not help u. It is better to partition the table with any of the date columns and then create a clustered column store index with max dop any thing higher than 1 or 0. But in that case sorting will not be guaranteed but clustered column store index will do the segment elimination effectively. One should clearly noted, do not drop or create clustered column store index by keeping any other non clustered index in your table, this will impact your clustered column store index creation/drop performance. If you drop a clustered column store index by keeping other index SQL server will do lot of works on the other indexes.

answered Sep 19 '22 10:09

Kannan.C

Related questions
                            
                                Left join to same table SQL
                            
                                Create index failed because of ANSI_Padding
                            
                                USING Common Table Expression and perform multiple update commands
                            
                                Execute sql-server transaction using python
                            
                                Source control in SSIS and Concurrent work on dtsx file
                            
                                How to get values alternate for ROW_NUMBER()?
                            
                                List row count of each view and table
                            
                                SQL Server could not find stored procedure 'show'
                            
                                How to make LIKE '%Search% faster in SQL Server
                            
                                Response data truncation when running SP through job
                            
                                How to insert NULL into SQL Server DATE field *from XML*
                            
                                Check if column value is zero MS SQL Server
                            
                                Trigger: How does the inserted table work? How to access its rows?
                            
                                SQL Output Multiple Local Variables into one Column
                            
                                How to assign 2 default values to SQL table column?
                            
                                Using INSERT and/or UPDATE together from a single CTE
                            
                                After Installing Visual Studio 2017 I am unable to debug in SQL Server Management Studio (SSMS)
                            
                                SSIS steps to load CSV from Azure blob to Azure SQL
                            
                                Why does the FOR Clause not work with an alias in SQL Server 2016 with temporal tables?
                            
                                SQL Server Compare Schema in Visual Studio 2017

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With