Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

T-SQL distinct count query...can it be optimized with an EXISTS or something else?

Is there any way to make this query run faster (meaning take less reads/IO from SQL Server). The logic essentially is

  1. I count distinct values from a column
  2. if there is any more than 1 distinct value, it is considered as existing
  3. List is built with the name of the column and a 1 or 0 if it is existing

I would like to do something with EXISTS (in t-sql that terminates the scan of the table/index if SQL Server finds the match to the EXISTS predicate). I am not sure if that is possible in this query.

Note: I am not looking for answers like is there an index on the table...well beyond that :)

    with SomeCTE as
            (                   
    select 
    count(distinct(ColumnA)) as ColumnA,        
    count(distinct(ColumnB)) as ColumnB,        
    count(distinct(ColumnC)) as ColumnC
from VERYLARGETABLE
        )
select 'NameOfColumnA', case when ColumnA > 1 then 1 else 0 end from SomeCTE
UNION ALL
select 'NameOfColumnB', case when ColumnB > 1 then 1 else 0 end from SomeCTE
UNION ALL
select 'NameOfColumnC', case when ColumnC > 1 then 1 else 0 end from SomeCTE

Just to copy what I posted below in the comments. So after testing this solution. It makes the queries run "faster". To give two examples..one query went from 50 seconds to 3 seconds. Another went from 9+ minutes (stopped running it) went down to 1min03seconds. Also I am missing indexes (so according to DTA should run 14% faster) also I am running this in the SQL Azure DB (where you are being throttled drastically in terms of I/O, CPU and tempddb memory)...very nice solution all around. One downside is that min/max does not work on bit columns, but those can be converted.

like image 650
Bart Czernicki Avatar asked Feb 17 '23 15:02

Bart Czernicki


1 Answers

If the datatype of the columns allow aggregate functions and if there are indexes, one for every column, this will be fast:

SELECT 'NameOfColumnA' AS ColumnName,
       CASE WHEN MIN(ColumnA) < MAX(ColumnA)
                THEN 1 ELSE 0
       END AS result
FROM VERYLARGETABLE 

UNION ALL 

SELECT 'NameOfColumnB',
       CASE WHEN MIN(ColumnB) < MAX(ColumnB)
                THEN 1 ELSE 0
       END 
FROM VERYLARGETABLE 

UNION ALL 

SELECT 'NameOfColumnC' 
       CASE WHEN MIN(ColumnC) < MAX(ColumnC)
                THEN 1 ELSE 0
       END 
FROM VERYLARGETABLE ;
like image 163
ypercubeᵀᴹ Avatar answered Apr 26 '23 15:04

ypercubeᵀᴹ