
Running count distinct over a column - Oracle SQL

I want a running distinct count of CLIENT_ID for each DAY, with one catch: a CLIENT_ID that was already seen on a previous day should not be counted again. How can I do this in Oracle SQL?

Based on the table below (let's call this table DAY_CLIENT):

DAY CLIENT_ID

1   10
1   11
1   12 
2   10
2   11
3   10
3   11
3   12
3   13
4   10

I want to get (let's call this table DAY_AGG):

DAYS CNT_CLIENT_ID

1    3
2    3
3    4
4    4

So, on day 1 there are 3 distinct client IDs. On day 2 the count is still 3, because CLIENT_ID 10 and 11 were already seen on day 1. On day 3 the count becomes 4, because CLIENT_ID 13 was not seen on any previous day. On day 4 it stays at 4, because CLIENT_ID 10 was already counted.

Timothy Quiros asked Nov 28 '25 04:11



1 Answer

Here's an alternative solution that may or may not be more performant than the other solutions:

WITH your_table AS (SELECT 1 DAY, 10 CLIENT_ID FROM dual UNION ALL
                    SELECT 1 DAY, 11 CLIENT_ID FROM dual UNION ALL
                    SELECT 1 DAY, 12 CLIENT_ID FROM dual UNION ALL
                    SELECT 2 DAY, 10 CLIENT_ID FROM dual UNION ALL
                    SELECT 2 DAY, 11 CLIENT_ID FROM dual UNION ALL
                    SELECT 3 DAY, 10 CLIENT_ID FROM dual UNION ALL
                    SELECT 3 DAY, 11 CLIENT_ID FROM dual UNION ALL
                    SELECT 3 DAY, 12 CLIENT_ID FROM dual UNION ALL
                    SELECT 3 DAY, 13 CLIENT_ID FROM dual UNION ALL
                    SELECT 4 DAY, 10 CLIENT_ID FROM dual)
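-- The inner query numbers each client's rows by day, so rn = 1 marks the
-- first day on which that client appears.
-- The outer running COUNT only counts those first appearances; DISTINCT
-- collapses the result to one row per day.
SELECT DISTINCT DAY,
                COUNT(CASE WHEN rn = 1 THEN client_id END) OVER (ORDER BY DAY) num_distinct_client_ids
FROM   (SELECT DAY,
               client_id,
               row_number() OVER (PARTITION BY client_id ORDER BY DAY) rn
        FROM   your_table)
ORDER BY DAY;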

       DAY NUM_DISTINCT_CLIENT_IDS
---------- -----------------------
         1                       3
         2                       3
         3                       4
         4                       4
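
The idea is that each client is counted only on the first day it shows up, and those first appearances are then accumulated across days. If you would rather not put DISTINCT on top of an analytic function, the same idea can probably be written with GROUP BY instead. This is only a sketch, and it assumes the data sits in the DAY_CLIENT table from the question, with columns DAY and CLIENT_ID:

```sql
-- Sketch only: assumes a table DAY_CLIENT(DAY, CLIENT_ID) as described in the question.
SELECT day,
       -- COUNT(...) is the number of clients first seen on this day;
       -- SUM(...) OVER (ORDER BY day) accumulates that number across days.
       SUM(COUNT(CASE WHEN rn = 1 THEN client_id END)) OVER (ORDER BY day) AS num_distinct_client_ids
FROM   (SELECT day,
               client_id,
               -- rn = 1 on the earliest row for each client
               ROW_NUMBER() OVER (PARTITION BY client_id ORDER BY day) AS rn
        FROM   day_client)
GROUP BY day
ORDER BY day;
```

Days with no new clients (2 and 4 in the sample data) still get a row, because the GROUP BY keeps every day and simply adds 0 to the running total.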

I recommend you test all the solutions against your data to see which one works best for you.

Boneist answered Nov 30 '25 17:11
