Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to speed up a SQL Server query involving count(distinct())

I have a deceptively simple SQL Server query that's taking a lot longer than I would expect.

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SELECT COUNT(DISTINCT(guid)) FROM listens WHERE url='http://www.sample.com/'

'guid' is varchar(64) NULL

'url' is varchar(900) NULL

There is an index on guid and url.

There are over 7 million rows in the 'listens' table, of which 17,000 match the url in question, and the result of the query is 5,500.

It is taking over 1 minute to run this query on SQL Server 2008 on a fairly idle Dual-Core AMD Opteron 2GHz with 1GB RAM.

Any ideas how to get the execution time down? Ideally it should be under 1 second!

like image 682
Tim Norman Avatar asked Jan 23 '23 18:01

Tim Norman


2 Answers

Create an index on url which would cover the GUID:

CREATE INDEX ix_listens_url__guid ON listens (url) INCLUDE (guid)

When dealing with urls as identifiers, it is much better to store and index the URL hash rather than the whole URL.

like image 124
Quassnoi Avatar answered Jan 29 '23 12:01

Quassnoi


scaning indexes that large will take long no matter what.
what you need to do is to shorten the indexes.
what you can do is have an integer column where the checksum of the url is calculated and stored. this way your index will be narrow and count will be fast.

note that checksum is not unique but it's unique enough. here's a complete code example of how to do it. I've included checksums for both columns but it probably needs only one. you could also calculate the checksum on the insert or update by yourself and remove the trigger.

CREATE TABLE MyTable
(
    ID INT IDENTITY(1,1) PRIMARY KEY,
    [Guid] varchar(64),
    Url varchar(900),
    GuidChecksum int,
    UrlChecksum int
)
GO

CREATE TRIGGER trgMyTableCheckSumCalculation ON MyTable
FOR UPDATE, INSERT
as
UPDATE t1
SET    GuidChecksum = checksum(I.[Guid]),
       UrlChecksum = checksum(I.Url)
FROM   MyTable t1 
       join inserted I on t1.ID = I.ID

GO
CREATE NONCLUSTERED INDEX NCI_MyTable_GuidChecksum ON MyTable(GuidChecksum)
CREATE NONCLUSTERED INDEX NCI_MyTable_UrlChecksum ON MyTable(UrlChecksum)

INSERT INTO MyTable([Guid], Url)
select NEWID(), 'my url 1' union all
select NEWID(), 'my url 2' union all
select null, 'my url 3' union all
select null, 'my url 4'

SELECT  *
FROM    MyTable

SELECT  COUNT(GuidChecksum)
FROM    MyTable
WHERE   Url = 'my url 3'
GO

DROP TABLE MyTable
like image 20
Mladen Prajdic Avatar answered Jan 29 '23 10:01

Mladen Prajdic