Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimizing indices for ranking in SQL Server

We've got a wide table that we're currently trying to optimize. The table has 50 columns (stats) that we eventually want to rank in descending order. Currently there's well over 5 million rows.

We're looking for ways to optimize this table both in terms of reducing complexity and improving read speed. Write speed is also important to us, but read is more critical. The ranks of these statistics should be as close to real time as possible with an optimal solution being one that ranks quickly on a per request basis (new rows are being added all the time and we want to show ranks for these rows as soon as possible.)

We're currently evaluating whether or not a vertical table layout would be a.) more performant, and b.) easier to work with.

Because the stats that are being inserted are not necessarily well defined, it's easier for us if they aren't hard coded into the table (hence the preference for a vertical table structure.)

Here's a look at our current table structure and query:

CREATE TABLE Stats 
(
    Id BIGINT PRIMARY KEY NOT NULL,
    UserId INT,
    Name VARCHAR(32) NOT NULL,
    Value DECIMAL(10,4) DEFAULT ((0)) NOT NULL,
    UpdatedAt DATETIME
);

CREATE INDEX Leaderboard__index ON Stats (Name, Value DESC);

SELECT
    Id,
    Name,
    Value,
    RANK() OVER (PARTITION BY Name ORDER BY Value DESC) AS Rank
FROM 
    Stats
ORDER BY 
    Value DESC

Typically we'd either be searching for top N rows for any given stat (like a leaderboard), or we'd be selecting a single UserId and getting the rank of all stats associated with that UserId.

The data is of considerable size (as I mentioned above, because there's a lot of rows and a lot of columns, a vertical table structure might be in the range of 250 million rows and will continue to grow.)

We're looking to fetch this data as fast as possible on whatever hardware is required, seconds is our target, as we're currently in the minutes range.

In a test of the vertical table structure we've inserted over 400,000 rows of data and the query above takes a little less than 3 minutes (though it also only took about 18 seconds less to rank 10,000 rows.)

I'd love to hear any suggestions. Thanks for your time!

like image 875
iLoch Avatar asked Jun 12 '16 06:06

iLoch


People also ask

What is index optimization in SQL Server?

This means that a non-clustered index can point to either a heap or a clustered index. If it's a clustered index, it's going to use a key as its pointer, and if it's a heap it's going to use row ID as its pointer. Non-clustered SQL Server indexes are most commonly used for searching columns.

Which index is faster in SQL Server?

A clustered index may be the fastest for one SELECT statement but it may not necessarily be correct choice. SQL Server indices are b-trees. A non-clustered index just contains the indexed columns, with the leaf nodes of the b-tree being pointers to the approprate data page.

Is it good to have multiple indexes on a table?

Standard indexes on a column can lead to substantial decreases in query execution times as shown in this article on optimizing queries. Multi-column indexes can achieve even greater decreases in query time due to its ability to move through the data quicker.


1 Answers

The index you have is not usefull for your window function because

1.To get ID column value, SQL may end up doing key lookups or even end up scanning whole other index if it crosses Tipping point.So your index may not be used at all.

2.You are ordering by val desc which requires a sort with no suitable index and may even endup spilling to TEMPDB

3.For one more interesting fragmenation aspect ,see below

Typically for a Window function to perform well,you will need a POCindex which means

P,O--Partition and order by columns should be in key clause
C--covering --columns you are including in select should be included

So for below query to work optimally.

SELECT
    Id,
    Name,
    Value,
    RANK() OVER (PARTITION BY Name ORDER BY Value DESC) AS Rank
FROM 
    Stats
ORDER BY 
    Value DESC

You will need below index

create index nci_test on dbo.table(name,value desc)
include(id)

There is one more issue with your index created with " value desc".

Normally in an index all values will be stored in Ascending order by default,but with this index you are asking to store in a reverse way which can cause logical fragmentation which can be seen from answer of Martin Smith here ..Pasting relevant terms from the answer here ...

If the index is created with keys descending but new rows are appended with ascending key values then you can end up with every page out of logical order. This can severely impact the size of the IO reads when scanning the table and it is not in cache

So few options..

1.Run the index rebuild based on your frequency to see if it helps

2.Changing the query to order by partition clause will eliminate the need for index to be created with "val desc" option

SELECT
        Id,
        Name,
        Value,
        RANK() OVER (PARTITION BY Name ORDER BY Value DESC) AS Rank
    FROM 
        Stats
    ORDER BY 
        name DESC

The above query doesnt need an index to be created like the one you created .you can change it like below..which also takes care of Fragmentation aspects noted above

CREATE INDEX Leaderboard__index ON Stats (Name, Value)
include(id);

References:
Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions

like image 166
TheGameiswar Avatar answered Sep 23 '22 05:09

TheGameiswar