Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Statistics and Indexes - How detailed are they?

Do the statistics (which help decide whether an index is to be used) take into account the number of rows per actual column value, or does it just use the average number of rows per value.

Suppose I have a table with an bit column called active which has a million of rows, but with 99.99% set to false. If I have an index on this column, then is Sql smart enough to know to use the index if searching for active=1 but that there is no point if searching for active=0.

Another example, if I have a table which has say 1,000,000 records with a indexed column which contains about 50,000 different values with an average number of rows per value of 10, but then one special value which has 500,000 rows. The index may not be useful if searching for this special record, but would be very useful when looking for any of the other codes.

But does this special case ruin the effectiveness of the index.

like image 393
sgmoore Avatar asked Oct 22 '10 17:10

sgmoore


1 Answers

You can see for yourself:

CREATE TABLE IndexTest (
Id int not null primary key identity(1,1),
Active bit not null default(0),
IndexedValue nvarchar(10) not null
)

CREATE INDEX IndexTestActive ON IndexTest (Active)
CREATE INDEX IndexTestIndexedValue ON IndexTest (IndexedValue)

DECLARE @values table
(
    Id int primary key IDENTITY(1, 1),
    Value nvarchar(10)
)

INSERT INTO @values(Value) VALUES ('1')
INSERT INTO @values(Value) VALUES ('2')
INSERT INTO @values(Value) VALUES ('3')
INSERT INTO @values(Value) VALUES ('4')
INSERT INTO @values(Value) VALUES ('5')
INSERT INTO @values(Value) VALUES ('Many')
INSERT INTO @values(Value) VALUES ('Many')
INSERT INTO @values(Value) VALUES ('Many')
INSERT INTO @values(Value) VALUES ('Many')
INSERT INTO @values(Value) VALUES ('Many')

DECLARE @rowCount int
SET @rowCount = 100000

WHILE(@rowCount > 0)
BEGIN
    DECLARE @valueIndex int
    SET @valueIndex = CAST(RAND() * 10 + 1 as int)
    DECLARE @selectedValue nvarchar(10)
    SELECT @selectedValue = Value FROM @values WHERE Id = @valueIndex
    DECLARE @isActive bit
    SELECT @isActive = CASE 
            WHEN RAND() < 0.001 THEN 1 
            ELSE 0
           END
    INSERT INTO IndexTest(Active, IndexedValue) VALUES (@isActive, @selectedValue)
    SET @rowCount = @rowCount - 1
END

SELECT count(*) FROM IndexTest WHERE Active = 1
SELECT count(*) FROM IndexTest WHERE Active = 0

SELECT count(*) FROM IndexTest WHERE IndexedValue = '1'
SELECT count(*) FROM IndexTest WHERE IndexedValue = 'Many'

It looks to me like it always uses the indexes on this query plan:

query plan

like image 110
codekaizen Avatar answered Sep 22 '22 06:09

codekaizen