COUNT (DISTINCT column_name) Discrepancy vs. COUNT (column_name) in SQL Server 2008?

I'm running into a problem that's driving me nuts. When I run the query below, I get a count of 233,769:

 SELECT COUNT(distinct  Member_List_Link.UserID)  
 FROM Member_List_Link  with (nolock)   
 INNER JOIN MasterMembers with (nolock)  
     ON Member_List_Link.UserID = MasterMembers.UserID   
  WHERE MasterMembers.Active = 1 And
        Member_List_Link.GroupID = 5 AND 
        MasterMembers.ValidUsers = 1 AND 
        Member_List_Link.Status = 1

But if I run the same query without the distinct keyword, I get a count of 233,748:

 SELECT COUNT(Member_List_Link.UserID)  
 FROM Member_List_Link  with (nolock)   
 INNER JOIN MasterMembers with (nolock)
   ON Member_List_Link.UserID = MasterMembers.UserID   
 WHERE MasterMembers.Active = 1 And Member_List_Link.GroupID = 5 
  AND MasterMembers.ValidUsers = 1 AND Member_List_Link.Status = 1

To test, I recreated all the tables, placed them into temp tables, and ran the queries again:

  SELECT COUNT(distinct  #Temp_Member_List_Link.UserID)  
  FROM #Temp_Member_List_Link  with (nolock)   
  INNER JOIN #Temp_MasterMembers with (nolock)
    ON #Temp_Member_List_Link.UserID = #Temp_MasterMembers.UserID   
  WHERE #Temp_MasterMembers.Active = 1 And 
        #Temp_Member_List_Link.GroupID = 5 AND 
        #Temp_MasterMembers.ValidUsers = 1 AND 
        #Temp_Member_List_Link.Status = 1

And without the distinct keyword

  SELECT COUNT(#Temp_Member_List_Link.UserID)  
  FROM #Temp_Member_List_Link  with (nolock)   
  INNER JOIN #Temp_MasterMembers with (nolock)
    ON #Temp_Member_List_Link.UserID = #Temp_MasterMembers.UserID   
  WHERE #Temp_MasterMembers.Active = 1 And 
        #Temp_Member_List_Link.GroupID = 5 AND 
        #Temp_MasterMembers.ValidUsers = 1 AND 
        #Temp_Member_List_Link.Status = 1

On a side note, I recreated the temp tables by simply running (select * into #temp... from Member_List_Link)

And now when I check to see the difference between COUNT(column) vs. COUNT(distinct column) with these temp tables, I don't see any!

So why is there a discrepancy with the original tables?

I'm running SQL Server 2008 (Dev Edition).

UPDATE - Including statistics profile

PhysicalOp column only for the first query (without distinct)

NULL
Compute Scalar
Stream Aggregate
Clustered Index Seek

PhysicalOp column only for the second query (with distinct)

NULL
Compute Scalar
Stream Aggregate
Parallelism
Stream Aggregate
Hash Match
Hash Match
Bitmap
Parallelism
Index Seek
Parallelism
Clustered Index Scan

Rows and Executes for the 1st query (without distinct)

Rows and Executes for the 2nd query (with distinct)

Rows      Executes
1         1
0         0
1         1
16        1
16        16
233767    16
233767    16
281901    16
281901    16
281901    16
234787    16
234787    16

Adding OPTION(MAXDOP 1) to the 2nd query (with distinct)

Rows      Executes
1         1
0         0
1         1
233767    1
233767    1
281901    1
548396    1

And the resulting PhysicalOp

NULL
Compute Scalar
Stream Aggregate
Hash Match
Hash Match
Index Seek
Clustered Index Scan
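
For reference, a sketch of how the MAXDOP hint attaches to the DISTINCT query. It assumes the hint was appended unchanged to the first statement shown at the top of the question; the exact statement that was run is not shown in the post.

```sql
-- Assumed form: the original DISTINCT query with a query-level hint appended.
SELECT COUNT(DISTINCT Member_List_Link.UserID)
FROM Member_List_Link WITH (NOLOCK)
INNER JOIN MasterMembers WITH (NOLOCK)
    ON Member_List_Link.UserID = MasterMembers.UserID
WHERE MasterMembers.Active = 1
  AND Member_List_Link.GroupID = 5
  AND MasterMembers.ValidUsers = 1
  AND Member_List_Link.Status = 1
OPTION (MAXDOP 1);  -- restrict this statement to a serial plan
```

With the hint in place, the Parallelism and Bitmap operators disappear from the plan, which matches the shorter PhysicalOp list above.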


asked Sep 30 '11 17:09

Ray

1 Answer

From http://msdn.microsoft.com/en-us/library/ms187373.aspx: "NOLOCK is equivalent to READUNCOMMITTED. For more information, see READUNCOMMITTED later in this topic."

READUNCOMMITTED can read rows twice if they are the subject of a transaction, since both the roll-forward and roll-back versions of the rows exist within the database while the transaction is in progress.

By default, all queries run as READ COMMITTED, which excludes uncommitted rows.

When you insert into a temp table, the SELECT gives you only committed rows. I believe this covers all the symptoms you are trying to explain.
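
One way to check this is to rerun the comparison against the original tables without the NOLOCK hints, so the default READ COMMITTED level applies. The following is only a sketch: the aliases `mll` and `mm` and the output column names are made up for readability, and the table and column names are taken from the question.

```sql
-- Compare both counts in one statement, without NOLOCK.
SELECT COUNT(mll.UserID)          AS TotalRows,
       COUNT(DISTINCT mll.UserID) AS DistinctUsers
FROM Member_List_Link AS mll
INNER JOIN MasterMembers AS mm
    ON mll.UserID = mm.UserID
WHERE mm.Active = 1
  AND mll.GroupID = 5
  AND mm.ValidUsers = 1
  AND mll.Status = 1;

-- List any UserID that appears more than once in the joined result.
SELECT mll.UserID, COUNT(*) AS Occurrences
FROM Member_List_Link AS mll
INNER JOIN MasterMembers AS mm
    ON mll.UserID = mm.UserID
WHERE mm.Active = 1
  AND mll.GroupID = 5
  AND mm.ValidUsers = 1
  AND mll.Status = 1
GROUP BY mll.UserID
HAVING COUNT(*) > 1;
```

If the second query returns no rows and the two counts in the first query agree, the earlier mismatch most likely came from reading data that was changing while the NOLOCK queries ran, rather than from duplicate rows in the tables.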


answered Oct 21 '22 22:10

Ian P

Tags: sql, sql-server, tsql, sql-server-2008
