Counting DISTINCT over multiple columns

People also ask

How do I count distinct values in multiple columns in SQL?

select count(distinct col1 || '-' || col2) from mytable; or use a subquery: select count(*) from (select distinct col1, col2 from mytable);

Can I use distinct with multiple columns?

Answer. Yes, the DISTINCT clause can be applied to any valid SELECT query. It is important to note that DISTINCT will filter out all rows that are not unique in terms of all selected columns.

How do I count distinct columns?

The COUNT DISTINCT function returns the number of unique values in the column or expression, as the following example shows. SELECT COUNT (DISTINCT item_num) FROM items; If the COUNT DISTINCT function encounters NULL values, it ignores them unless every value in the specified column is NULL.

If you are trying to improve performance, you could try creating a persisted computed column on either a hash or concatenated value of the two columns.

Once it is persisted, provided the column is deterministic and you are using "sane" database settings, it can be indexed and / or statistics can be created on it.

I believe a distinct count of the computed column would be equivalent to your query.

Edit: Altered from the less-than-reliable checksum-only query I've discovered a way to do this (in SQL Server 2005) that works pretty well for me and I can use as many columns as I need (by adding them to the CHECKSUM() function). The REVERSE() function turns the ints into varchars to make the distinct more reliable

SELECT COUNT(DISTINCT (CHECKSUM(DocumentId,DocumentSessionId)) + CHECKSUM(REVERSE(DocumentId),REVERSE(DocumentSessionId)) )
FROM DocumentOutPutItems

What is it about your existing query that you don't like? If you are concerned that DISTINCT across two columns does not return just the unique permutations why not try it?

It certainly works as you might expect in Oracle.

SQL> select distinct deptno, job from emp
  2  order by deptno, job
  3  /

    DEPTNO JOB
---------- ---------
        10 CLERK
        10 MANAGER
        10 PRESIDENT
        20 ANALYST
        20 CLERK
        20 MANAGER
        30 CLERK
        30 MANAGER
        30 SALESMAN

9 rows selected.


SQL> select count(*) from (
  2  select distinct deptno, job from emp
  3  )
  4  /

  COUNT(*)
----------
         9

SQL>

edit

I went down a blind alley with analytics but the answer was depressingly obvious...

SQL> select count(distinct concat(deptno,job)) from emp
  2  /

COUNT(DISTINCTCONCAT(DEPTNO,JOB))
---------------------------------
                                9

SQL>

edit 2

Given the following data the concatenating solution provided above will miscount:

col1  col2
----  ----
A     AA
AA    A

So we to include a separator...

select col1 + '*' + col2 from t23
/

Obviously the chosen separator must be a character, or set of characters, which can never appear in either column.

To run as a single query, concatenate the columns, then get the distinct count of instances of the concatenated string.

SELECT count(DISTINCT concat(DocumentId, DocumentSessionId)) FROM DocumentOutputItems;

In MySQL you can do the same thing without the concatenation step as follows:

SELECT count(DISTINCT DocumentId, DocumentSessionId) FROM DocumentOutputItems;

This feature is mentioned in the MySQL documentation:

http://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_count-distinct

How about something like:

select count(*)
from
  (select count(*) cnt
   from DocumentOutputItems
   group by DocumentId, DocumentSessionId) t1

Probably just does the same as you are already though but it avoids the DISTINCT.

Many (most?) SQL databases can work with tuples like values so you can just do: SELECT COUNT(DISTINCT (DocumentId, DocumentSessionId)) FROM DocumentOutputItems; If your database doesn't support this, it can be simulated as per @oncel-umut-turer's suggestion of CHECKSUM or other scalar function providing good uniqueness e.g. COUNT(DISTINCT CONCAT(DocumentId, ':', DocumentSessionId)).

A related use of tuples is performing IN queries such as: SELECT * FROM DocumentOutputItems WHERE (DocumentId, DocumentSessionId) in (('a', '1'), ('b', '2'));

Related questions
                            
                                How do I create a foreign key in SQL Server?
                            
                                SQL JOIN and different types of JOINs
                            
                                Fastest way to count exact number of rows in a very large table?
                            
                                SQL WITH clause example [duplicate]
                            
                                How to speed up insertion performance in PostgreSQL
                            
                                How to list active connections on PostgreSQL?
                            
                                Is there a performance difference between CTE , Sub-Query, Temporary Table or Table Variable?
                            
                                Is it possible to GROUP BY multiple columns using MySQL?
                            
                                Can I use multiple "with"?
                            
                                How to list table foreign keys
                            
                                How to randomly select rows in SQL?
                            
                                SQL select join: is it possible to prefix all columns as 'prefix.*'?
                            
                                WHERE vs HAVING
                            
                                Storing JSON in database vs. having a new column for each key
                            
                                Need to list all triggers in SQL Server database with table name and table's schema
                            
                                ERROR: permission denied for sequence cities_id_seq using Postgres
                            
                                SQL: How to properly check if a record exists
                            
                                How do you import a large MS SQL .sql file?
                            
                                Generate class from database table
                            
                                How do you return the column names of a table?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Counting DISTINCT over multiple columns

Tags:

performance

sql

sql-server

tsql

query-optimization

People also ask

Recent Activity

Donate For Us