Slow distinct query in SQL Server over large dataset

We're using SQL Server 2005 to track a fair amount of constantly incoming data (5-15 updates per second). After it had been in production for a couple of months, we noticed that one of the tables had started to take an obscene amount of time to query.

The table has 3 columns:

id -- autonumber (clustered)
typeUUID -- GUID generated before the insert happens; used to group the types together
typeName -- The type name (duh...)

One of the queries we run is a distinct on the typeName field:

SELECT DISTINCT [typeName] FROM [types] WITH (nolock);

The typeName field has a non-clustered, non-unique ascending index on it. The table contains approximately 200M records at the moment. When we ran this query, it took 5m 58s to return! Perhaps we're not understanding how the indexes work... But I didn't think we misunderstood them that much.

To test this a little further, we ran the following query:

SELECT DISTINCT [typeName] FROM (SELECT TOP 1000000 [typeName] FROM [types] WITH (nolock)) AS [subtbl]

This query returns in about 10 seconds, as I would expect; it's scanning the table.

Is there something we're missing here? Why does the first query take so long?
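One way to see where the time goes is to have SQL Server report I/O and timing for the statement. This is only a minimal sketch using the table and column names from the question; the numbers it reports depend entirely on the data, so none are shown here.

```sql
-- Report page reads and CPU/elapsed time for each statement in this session
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- The slow query from the question, unchanged
SELECT DISTINCT [typeName] FROM [types] WITH (nolock);

-- Turn the reporting back off
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The figures appear in the Messages output. A logical read count close to the total page count of the index would suggest that every index row is being read.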

Edit: Ah, my apologies, the first query returns 76 records, thank you ninesided.

Follow up: Thank you all for your answers, it makes more sense to me now (I don't know why it didn't before...). Without an index, it's doing a table scan across 200M rows; with an index, it's doing an index scan across 200M rows...
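For a column with very few distinct values (76 here), one workaround that is sometimes used is to walk the index with a recursive CTE, seeking to the next larger value each time instead of scanning every row. This is not what was done in this thread; it is a sketch of a different technique, assuming the non-clustered index on typeName exists and that the table is reachable as [types].

```sql
WITH [DistinctNames] AS (
    -- Anchor: the smallest typeName, found at the start of the index
    SELECT MIN(t.[typeName]) AS [typeName]
    FROM [types] AS t

    UNION ALL

    -- Recursive step: the next typeName greater than the previous one.
    -- TOP is not allowed in the recursive member, so ROW_NUMBER() = 1
    -- is used to pick the first qualifying row instead.
    SELECT nxt.[typeName]
    FROM (
        SELECT t.[typeName],
               ROW_NUMBER() OVER (ORDER BY t.[typeName]) AS rn
        FROM [types] AS t
        JOIN [DistinctNames] AS d
          ON d.[typeName] < t.[typeName]
    ) AS nxt
    WHERE nxt.rn = 1
)
SELECT [typeName]
FROM [DistinctNames]
WHERE [typeName] IS NOT NULL      -- the anchor yields NULL on an empty table
OPTION (MAXRECURSION 0);          -- lift the default limit of 100 recursions
```

Whether this beats the plain DISTINCT depends on the plan SQL Server chooses, so it is worth comparing both. Note that rows where typeName is NULL are not reported by this form.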

SQL Server does prefer the index, and it does give a little bit of a performance boost, but nothing to be excited about. Rebuilding the index did take the query time down to just over 3m instead of 6m, an improvement, but not enough. I'm just going to recommend to my boss that we normalize the table structure.
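For reference, checking fragmentation and rebuilding the index could look roughly like the following. The index name IX_types_typeName and the dbo schema are assumptions made for illustration; the real names are not given above.

```sql
-- Fragmentation of every index on the table (LIMITED is the cheapest scan mode)
SELECT i.[name] AS index_name,
       s.avg_fragmentation_in_percent,
       s.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.types'), NULL, NULL, 'LIMITED') AS s
JOIN sys.indexes AS i
  ON i.[object_id] = s.[object_id]
 AND i.index_id    = s.index_id;

-- Rebuild the index on typeName (IX_types_typeName is a hypothetical name)
ALTER INDEX [IX_types_typeName] ON [dbo].[types] REBUILD;
```

On a table that takes constant inserts, an offline rebuild can block writers while it runs, so it is probably best scheduled for a quiet period.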

Once again, thank you all for your help!!

557

asked Apr 16 '09 06:04

Miquella

1 Answer

You do misunderstand the index. Even when the query uses the index, it still has to do an index scan across 200M entries, because every entry must be read to find the distinct values. That is going to take a long time, plus the time it takes to do the DISTINCT itself (which needs a sort or aggregation step), so it's a bad thing to run. Seeing a DISTINCT in a query always raises a red flag and makes me double-check the query. In this case, perhaps you have a normalization issue?
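A rough sketch of what normalizing might look like: the type names move into a small lookup table, and the big table stores only a key. All table names, column names, and data types below are assumptions for illustration, since the real column definitions are not shown in the question.

```sql
-- Hypothetical lookup table: one row per distinct type name
CREATE TABLE [typeNames] (
    [typeNameId] INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    [typeName]   NVARCHAR(255)     NOT NULL UNIQUE   -- length is assumed
);

-- Hypothetical replacement for the large table: stores the key, not the name
CREATE TABLE [types_normalized] (
    [id]         INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    [typeUUID]   UNIQUEIDENTIFIER  NOT NULL,
    [typeNameId] INT               NOT NULL
        REFERENCES [typeNames] ([typeNameId])
);

-- Listing the distinct names now reads the small lookup table only
SELECT [typeName] FROM [typeNames];
```

With this layout the list of names comes from a table with one row per name (76 rows at the moment) rather than from 200M rows, at the cost of looking up or inserting the name's key on each insert.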

183

answered Oct 22 '22 09:10

Al W

Tags: sql, sql-server, sql-server-2005
