 

Cumulative distinct count

Tags: sql, presto

I am working on a query to get the cumulative distinct count of uids on a daily basis.

Example: say 2 uids (100, 200) appeared on 2016-11-01, and they appeared again the next day, 2016-11-02, together with a new uid 300 (100, 200, 300). At that point I want the stored cumulative count to be 3, not 5, because uids 100 and 200 already appeared on the previous day.

Input table:

date                uid
2016-11-01          100
2016-11-01          200
2016-11-01          300
2016-11-01          400         
2016-11-02          100
2016-11-02          200                 
2016-11-03          300
2016-11-03          400
2016-11-03          500
2016-11-03          600
2016-11-04          700

Expected query result:

date            daily_cumulative_count
2016-11-01              4   
2016-11-02              4
2016-11-03              6
2016-11-04              7

So far I can get a running total of the per-day distinct counts, but it counts again uids that already appeared on earlier days.

SELECT 
  date, 
  SUM(count) OVER (
    ORDER BY date ASC 
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  )
FROM (
  SELECT 
    date, 
    COUNT(DISTINCT uid) AS count
  FROM sample_table
  GROUP BY 1
)
ORDER BY date DESC;
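
To make the overcount concrete, here is a minimal Python sketch (not part of the original query) that computes both quantities on the sample data above: the running sum of per-day distinct counts, which is what the query returns, and the size of the set of all uids seen so far, which is what is wanted. The variable names are illustrative only.

```python
from collections import defaultdict
from itertools import accumulate

# Sample rows from the question: (date, uid)
rows = [
    ("2016-11-01", 100), ("2016-11-01", 200), ("2016-11-01", 300), ("2016-11-01", 400),
    ("2016-11-02", 100), ("2016-11-02", 200),
    ("2016-11-03", 300), ("2016-11-03", 400), ("2016-11-03", 500), ("2016-11-03", 600),
    ("2016-11-04", 700),
]

# Distinct uids per day, i.e. what the inner COUNT(DISTINCT uid) sees.
uids_by_day = defaultdict(set)
for day, uid in rows:
    uids_by_day[day].add(uid)
days = sorted(uids_by_day)

# Running SUM of the daily distinct counts (what the query computes).
running_sum = list(accumulate(len(uids_by_day[d]) for d in days))

# Cumulative distinct count (what is wanted): size of the union so far.
seen = set()
cumulative_distinct = []
for d in days:
    seen |= uids_by_day[d]
    cumulative_distinct.append(len(seen))

for d, s, c in zip(days, running_sum, cumulative_distinct):
    print(d, s, c)
# 2016-11-01 4 4
# 2016-11-02 6 4
# 2016-11-03 10 6
# 2016-11-04 11 7
```

The middle column drifts away from the expected result as soon as a uid repeats on a later day, because each day's distinct count is computed in isolation.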

Any kind of help would be greatly appreciated.

asked Nov 09 '16 20:11 by user1495744



1 Answer

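A version similar to @stepan-blr's, but with the final result you are looking for. The idea is to find the first date on which each uid appears and then count uids cumulatively over those first dates, so each uid is counted exactly once. (The queries below call the table input_table and the date column dt.)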

Version using WITH:

WITH t as (
    SELECT uid
           , min(dt) fst_date 
    FROM input_table 
    GROUP BY uid
)
SELECT DISTINCT fst_date
                , count(uid) over (order by fst_date ) daily_cumulative_count
FROM t

Version using a subquery (SELECT FROM SELECT):

SELECT DISTINCT fst_date
                , count(uid) over (order by fst_date ) daily_cumulative_count
FROM (
    SELECT uid
           , min(dt) fst_date 
    FROM input_table 
    GROUP BY uid
     ) t
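
As a quick way to try the first-appearance approach without a Presto cluster, the sketch below loads the question's sample data into an in-memory SQLite database (via Python's sqlite3 module, assuming an SQLite build with window-function support, 3.25 or later) and runs the WITH version. SQLite stands in for Presto here, so treat this as an illustration rather than a tested Presto query. On this data the query returns no row for 2016-11-02, since no uid first appears on that day; the second query is a hypothetical variant, not from the answer, that joins back to the list of dates present in the table so that such days are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE input_table (dt TEXT, uid INTEGER)")
conn.executemany(
    "INSERT INTO input_table VALUES (?, ?)",
    [
        ("2016-11-01", 100), ("2016-11-01", 200), ("2016-11-01", 300), ("2016-11-01", 400),
        ("2016-11-02", 100), ("2016-11-02", 200),
        ("2016-11-03", 300), ("2016-11-03", 400), ("2016-11-03", 500), ("2016-11-03", 600),
        ("2016-11-04", 700),
    ],
)

# First-appearance approach: count each uid once, on its first date.
first_seen_sql = """
WITH t AS (
    SELECT uid, MIN(dt) AS fst_date
    FROM input_table
    GROUP BY uid
)
SELECT DISTINCT fst_date,
       COUNT(uid) OVER (ORDER BY fst_date) AS daily_cumulative_count
FROM t
ORDER BY fst_date
"""
first_seen = conn.execute(first_seen_sql).fetchall()

# Hypothetical variant: keep every date that occurs in the table, even when
# no new uid appears on it, by left-joining the dates to the first-seen dates.
all_days_sql = """
WITH t AS (
    SELECT uid, MIN(dt) AS fst_date
    FROM input_table
    GROUP BY uid
),
days AS (
    SELECT DISTINCT dt FROM input_table
),
new_per_day AS (
    SELECT days.dt AS dt, COUNT(t.uid) AS new_uids
    FROM days
    LEFT JOIN t ON t.fst_date = days.dt
    GROUP BY days.dt
)
SELECT dt, SUM(new_uids) OVER (ORDER BY dt) AS daily_cumulative_count
FROM new_per_day
ORDER BY dt
"""
all_days = conn.execute(all_days_sql).fetchall()

print(first_seen)
# [('2016-11-01', 4), ('2016-11-03', 6), ('2016-11-04', 7)]
print(all_days)
# [('2016-11-01', 4), ('2016-11-02', 4), ('2016-11-03', 6), ('2016-11-04', 7)]
```

The variant only covers dates that occur somewhere in the table; calendar days with no rows at all would need a separate date series to join against.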

answered Oct 14 '22 11:10 by Pin_Eipol