Query for count of distinct values in a rolling date range

Tags: date, sql, postgresql, count, rolling-computation

I have a data set of email addresses and the dates on which those addresses were added to a table. The same email address can appear multiple times, on different dates. For example, given the data set below, I want to get each date along with the count of distinct emails seen between that date and 3 days ago.

Date   | email  
-------+----------------
1/1/12 | test@test.com
1/1/12 | test1@test.com
1/1/12 | test2@test.com
1/2/12 | test1@test.com
1/2/12 | test2@test.com
1/3/12 | test@test.com
1/4/12 | test@test.com
1/5/12 | test@test.com
1/5/12 | test@test.com
1/6/12 | test@test.com
1/6/12 | test@test.com
1/6/12 | test1@test.com

The result set would look something like this with a date period of 3 days:

date   | count(distinct email)
-------+------
1/1/12 | 3
1/2/12 | 3
1/3/12 | 3
1/4/12 | 3
1/5/12 | 1
1/6/12 | 2

I can get a distinct count per day for a fixed date range using the query below, but I am looking for the count over a rolling range for each day, so that I do not have to update the range manually for hundreds of dates.

select test.date, count(distinct test.email)  
from test_table as test  
where test.date between '2012-01-01' and '2012-05-08'  
group by test.date;

asked May 11 '12 01:05

harold

2 Answers

Test case:

CREATE TABLE tbl (date date, email text);
INSERT INTO tbl VALUES
  ('2012-01-01', 'test@test.com')
, ('2012-01-01', 'test1@test.com')
, ('2012-01-01', 'test2@test.com')
, ('2012-01-02', 'test1@test.com')
, ('2012-01-02', 'test2@test.com')
, ('2012-01-03', 'test@test.com')
, ('2012-01-04', 'test@test.com')
, ('2012-01-05', 'test@test.com')
, ('2012-01-05', 'test@test.com')
, ('2012-01-06', 'test@test.com')
, ('2012-01-06', 'test@test.com')
, ('2012-01-06', 'test1@test.com')
;

Query - returns only days where an entry exists in tbl:

SELECT date
     ,(SELECT count(DISTINCT email)
       FROM   tbl
       WHERE  date BETWEEN t.date - 2 AND t.date -- period of 3 days
      ) AS dist_emails
FROM   tbl t
WHERE  date BETWEEN '2012-01-01' AND '2012-01-06'  
GROUP  BY 1
ORDER  BY 1;

Or - return all days in the specified range, even if there are no rows for the day:

SELECT date
     ,(SELECT count(DISTINCT email)
       FROM   tbl
       WHERE  date BETWEEN g.date - 2 AND g.date
      ) AS dist_emails
FROM  (SELECT generate_series(timestamp '2012-01-01'
                            , timestamp '2012-01-06'
                            , interval  '1 day')::date) AS g(date);

Result:

date       | dist_emails
-----------+------------
2012-01-01 | 3
2012-01-02 | 3
2012-01-03 | 3
2012-01-04 | 3
2012-01-05 | 1
2012-01-06 | 2
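
To check a single row of this result by hand, it can help to list the distinct emails inside one window. A minimal sketch, assuming the tbl(date, email) test table above; the date literal is just the row being inspected:

```sql
-- Distinct emails in the 3-day window ending on 2012-01-05
-- (2012-01-03 through 2012-01-05, both ends inclusive).
SELECT DISTINCT email
FROM   tbl
WHERE  date BETWEEN date '2012-01-05' - 2 AND date '2012-01-05'
ORDER  BY email;
```

With the sample data, only test@test.com appears on those three days, which is why dist_emails is 1 for that date.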

This sounded like a job for window functions at first, but I did not find a way to define a suitable window frame. Also, per the documentation:

Aggregate window functions, unlike normal aggregate functions, do not allow DISTINCT or ORDER BY to be used within the function argument list.

So I solved it with correlated subqueries instead. I guess that's the smartest way.

BTW, "between said date and 3 days ago" would be a period of 4 days. Your definition is contradictory there.
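
The off-by-one can be checked with plain date arithmetic. A small sketch for PostgreSQL, where subtracting an integer from a date yields a date and subtracting two dates yields a number of days; the alias x(d) and the output column names are only illustrative:

```sql
-- BETWEEN is inclusive on both ends, so a window from d - n to d covers n + 1 days.
SELECT d               AS window_end
     , d - 2           AS start_minus_2
     , d - (d - 2) + 1 AS days_covered_minus_2  -- 3 days: d-2, d-1, d
     , d - 3           AS start_minus_3
     , d - (d - 3) + 1 AS days_covered_minus_3  -- 4 days: d-3, d-2, d-1, d
FROM  (SELECT date '2012-01-06') AS x(d);
```

This is why the queries above use "date - 2" for a period of 3 days.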

Slightly shorter, but slower when the range covers only a few days:

SELECT g.date, count(DISTINCT email) AS dist_emails
FROM  (SELECT generate_series(timestamp '2012-01-01'
                            , timestamp '2012-01-06'
                            , interval  '1 day')::date) AS g(date)
LEFT   JOIN tbl t ON t.date BETWEEN g.date - 2 AND g.date
GROUP  BY 1
ORDER  BY 1;

Related:
Generating time series between two dates in PostgreSQL
Rolling count of rows withing time interval

answered Nov 07 '22 21:11

Erwin Brandstetter

A lateral join is useful for such "sliding window" needs, like this:

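-- Column names follow the test table tbl(date, email) defined above.
-- Returns one output row per row of tbl in the range.
SELECT
       t.date
     , ljl.dist_emails
FROM   tbl t
LEFT JOIN LATERAL (
        SELECT
               count(DISTINCT email) AS dist_emails
        FROM   tbl
        WHERE  date BETWEEN t.date - 2 AND t.date -- period of 3 days
       ) AS ljl ON TRUE
WHERE  t.date BETWEEN '2012-01-01' AND '2012-01-06';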

Note that this is a variant of the earlier query by Erwin Brandstetter. It surprises me that he hadn't suggested it, since lateral joins are excellent for this type of need.
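
The lateral join can also be driven from a generated series of days, so that each day in the range appears exactly once, whether or not tbl has rows for it. This is only a sketch combining the two answers, assuming the tbl(date, email) test table from the first answer; the aliases d and ljl are illustrative:

```sql
-- One row per calendar day in the range, each with its 3-day rolling distinct count.
SELECT d.date, ljl.dist_emails
FROM  (SELECT generate_series(timestamp '2012-01-01'
                            , timestamp '2012-01-06'
                            , interval  '1 day')::date) AS d(date)
LEFT   JOIN LATERAL (
   SELECT count(DISTINCT email) AS dist_emails
   FROM   tbl
   WHERE  tbl.date BETWEEN d.date - 2 AND d.date  -- period of 3 days
   ) AS ljl ON true
ORDER  BY d.date;
```

Because count() returns 0 when no rows match, days without any entries in their window still show up with dist_emails = 0.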

answered Nov 07 '22 20:11

Paul Maxwell
