Count cumulative total in Postgresql

Tags:

sql

postgresql

aggregate-functions

I am using count and group by to get the number of subscribers registered each day:

    SELECT created_at, COUNT(email)
      FROM subscriptions
     GROUP BY created_at;

Result:

created_at  count
-----------------
04-04-2011  100
05-04-2011   50
06-04-2011   50
07-04-2011  300

I want to get the cumulative total of subscribers every day instead. How do I get this?

created_at  count
-----------------
04-04-2011  100
05-04-2011  150
06-04-2011  200
07-04-2011  500

243

asked Apr 18 '11 04:04

khairul

2 Answers

With larger datasets, window functions are usually the most efficient way to run this kind of query: the table is scanned only once, instead of once for each date as a self-join would do. It also looks a lot simpler. :) PostgreSQL 8.4 and up support window functions.

This is what it looks like:

    SELECT created_at,
           sum(count(email)) OVER (ORDER BY created_at)
    FROM subscriptions
    GROUP BY created_at;

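Here count(email) is computed for each created_at group first, and OVER then turns sum into a window function over those per-day counts; ORDER BY created_at means each row gets the sum of the counts up to and including its own date, in created_at order.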
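If you'd like to sanity-check the running total without a PostgreSQL server at hand, here is a minimal sketch using Python's built-in sqlite3 module (it needs SQLite 3.25 or newer for window functions). This is an illustration, not part of the query above: the example.com addresses are made up to match the daily counts in the question, running_totals is a hypothetical helper name, and the per-day count is computed in a subquery rather than nested as sum(count(email)), which should give the same result.

```python
import sqlite3


def running_totals(rows):
    """Return (created_at, cumulative_count) pairs for (created_at, email) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE subscriptions (created_at TEXT, email TEXT)")
    conn.executemany("INSERT INTO subscriptions VALUES (?, ?)", rows)
    query = """
        SELECT created_at,
               sum(daily) OVER (ORDER BY created_at) AS running
        FROM (SELECT created_at, count(email) AS daily
              FROM subscriptions
              GROUP BY created_at) AS per_day
        ORDER BY created_at
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Assumed sample data: the daily sign-up counts from the question (100, 50, 50, 300),
# with ISO dates so that text ordering matches date ordering.
daily_counts = {
    "2011-04-04": 100,
    "2011-04-05": 50,
    "2011-04-06": 50,
    "2011-04-07": 300,
}
sample_rows = [
    (day, f"user-{day}-{i}@example.com")
    for day, n in daily_counts.items()
    for i in range(n)
]

totals = running_totals(sample_rows)
for day, running in totals:
    print(day, running)
```

The printed totals should climb 100, 150, 200, 500, matching the output the question asks for.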

Edit: If you want to remove duplicate emails within a single day, you can use sum(count(distinct email)) instead. Unfortunately this won't remove duplicates that span different dates.

If you want to remove all duplicates, I think the easiest is to use a subquery and DISTINCT ON. This will attribute emails to their earliest date (because I'm sorting by created_at in ascending order, it'll choose the earliest one):

    SELECT created_at,
           sum(count(email)) OVER (ORDER BY created_at)
    FROM (
        SELECT DISTINCT ON (email) created_at, email
        FROM subscriptions
        ORDER BY email, created_at
    ) AS subq
    GROUP BY created_at;

If you create an index on (email, created_at), this query shouldn't be too slow either.
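To see what "attribute each email to its earliest date" does to the totals, here is a small sketch, again using Python's sqlite3 module as a stand-in for PostgreSQL. SQLite has no DISTINCT ON, so I've swapped in min(created_at) with GROUP BY email, which I'd expect to pick the same earliest row per email. The helper name deduplicated_running_totals and the sample addresses are made up for illustration.

```python
import sqlite3


def deduplicated_running_totals(rows):
    """Running total of distinct emails, each counted once on its earliest date."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE subscriptions (created_at TEXT, email TEXT)")
    conn.executemany("INSERT INTO subscriptions VALUES (?, ?)", rows)
    query = """
        SELECT first_seen,
               sum(new_emails) OVER (ORDER BY first_seen) AS running
        FROM (SELECT first_seen, count(*) AS new_emails
              FROM (SELECT email, min(created_at) AS first_seen
                    FROM subscriptions
                    GROUP BY email) AS firsts
              GROUP BY first_seen) AS per_day
        ORDER BY first_seen
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Assumed sample data with duplicates within a day and across days.
sample_rows = [
    ("2011-04-04", "a@example.com"),
    ("2011-04-04", "b@example.com"),
    ("2011-04-04", "a@example.com"),  # duplicate on the same day
    ("2011-04-05", "a@example.com"),  # duplicate on a later day
    ("2011-04-05", "c@example.com"),
    ("2011-04-06", "b@example.com"),  # duplicate on a later day, nothing new
]

dedup_totals = deduplicated_running_totals(sample_rows)
for day, running in dedup_totals:
    print(day, running)
```

Note that a day with no first-time emails (2011-04-06 in this sample) produces no row at all. I'd expect the DISTINCT ON query to behave the same way, since those rows are filtered out before the grouping happens.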

(If you want to test, this is how I created the sample dataset)

    create table subscriptions as
       select date '2000-04-04' + (i/10000)::int as created_at,
              'foofoobar@foobar.com' || (i%700000)::text as email
       from generate_series(1,1000000) i;

    create index on subscriptions (email, created_at);

183

answered Oct 01 '22 03:10

intgr

Use:

    SELECT a.created_at,
           (SELECT COUNT(b.email)
              FROM SUBSCRIPTIONS b
             WHERE b.created_at <= a.created_at) AS count
      FROM SUBSCRIPTIONS a
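One thing worth checking with the correlated-subquery approach: since the outer query selects from the table without grouping, it returns one row per subscription rather than one per date. The sketch below, run through Python's sqlite3 module with made-up sample rows, compares the query as written with a variant that adds DISTINCT to collapse the repeats. The DISTINCT variant is my own addition for illustration, and the alias is renamed to running.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriptions (created_at TEXT, email TEXT)")

# Assumed sample data: 2 sign-ups on the first day, 1 on the second, 3 on the third.
sample_rows = [
    ("2011-04-04", "a@example.com"),
    ("2011-04-04", "b@example.com"),
    ("2011-04-05", "c@example.com"),
    ("2011-04-06", "d@example.com"),
    ("2011-04-06", "e@example.com"),
    ("2011-04-06", "f@example.com"),
]
conn.executemany("INSERT INTO subscriptions VALUES (?, ?)", sample_rows)

# The correlated subquery as written: one output row per subscription row.
per_row = conn.execute("""
    SELECT a.created_at,
           (SELECT count(b.email)
              FROM subscriptions b
             WHERE b.created_at <= a.created_at) AS running
      FROM subscriptions a
     ORDER BY a.created_at
""").fetchall()

# Variant with DISTINCT (an addition for illustration): one row per date.
per_day = conn.execute("""
    SELECT DISTINCT a.created_at,
           (SELECT count(b.email)
              FROM subscriptions b
             WHERE b.created_at <= a.created_at) AS running
      FROM subscriptions a
     ORDER BY a.created_at
""").fetchall()
conn.close()

print(len(per_row), "rows without DISTINCT")
print(per_day)
```

Both forms re-run the inner count for every outer row, which is the per-date rescanning that the window-function answer avoids.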

answered Oct 01 '22 03:10

OMG Ponies
