
Cumulative Monthly Totals and Postgresql

Tags: sql, postgresql

I am trying to calculate the cumulative number of users per month for the dellstore2 database. Based on answers here and on other forums, I used this:

select 
date_trunc('month',orderdate),
sum(count(distinct(customerid)))
   over (order by date_trunc('month',orderdate))
from orders group by date_trunc('month',orderdate)

This returns

2004-01-01 00:00:00.0   979
2004-02-01 00:00:00.0   1,952
2004-03-01 00:00:00.0   2,922
2004-04-01 00:00:00.0   3,898
2004-05-01 00:00:00.0   4,873
2004-06-01 00:00:00.0   5,846
2004-07-01 00:00:00.0   6,827
2004-08-01 00:00:00.0   7,799
2004-09-01 00:00:00.0   8,765
2004-10-01 00:00:00.0   9,745
2004-11-01 00:00:00.0   10,710
2004-12-01 00:00:00.0   11,681

The distinct count for each month on its own is

979
973
970
976
975
973
981
972
966
980
965
971

Judging by the first few rows, the running total adds up correctly. But when I ran

select count(distinct(customerid)) from orders

for the entire thing, I get

8996

which does not agree with the last value in the first output, 11,681. I guess the calculation above cannot determine uniqueness across months, so a customer who orders in several months is counted once per month. What is the fastest way to do this calculation, preferably without using self-joins?

BBSysDyn asked Jun 12 '12 11:06



1 Answer

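Instead of selecting directly from orders, you could use a subquery that reduces each customer to the month of their first order, so that every customer is counted exactly once: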

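SELECT  OrderDate,
        -- Customers whose first order falls in this month, accumulated in month order
        SUM(COUNT(DISTINCT CustomerID)) OVER (ORDER BY OrderDate) AS CumulativeTotal
FROM    (   -- One row per customer, dated to the month of their first order
            SELECT  CustomerID, 
                    DATE_TRUNC('MONTH', MIN(OrderDate)) AS OrderDate
            FROM    Orders
            GROUP BY CustomerID
        ) AS Orders
GROUP BY OrderDate
ORDER BY OrderDate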

I think this would work as required.

http://sqlfiddle.com/#!1/7a8cc/1
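
To see why counting each customer in the month of their first order avoids the double counting, here is a minimal sketch on a toy data set. The four rows below are made up for illustration and are not taken from dellstore2; the CTE names (first_orders) and column aliases are also my own. It assumes PostgreSQL 8.4 or later for CTEs and window functions.

```sql
-- Hypothetical sample rows (NOT from dellstore2): customer 1 orders in two months.
WITH orders(customerid, orderdate) AS (
    VALUES (1, DATE '2004-01-05'),
           (1, DATE '2004-02-10'),
           (2, DATE '2004-01-20'),
           (3, DATE '2004-02-15')
),
first_orders AS (
    -- One row per customer, stamped with the month of their first order.
    SELECT customerid,
           date_trunc('month', MIN(orderdate)) AS first_month
    FROM   orders
    GROUP  BY customerid
)
SELECT first_month,
       COUNT(*)                                  AS new_customers,
       SUM(COUNT(*)) OVER (ORDER BY first_month) AS cumulative_customers
FROM   first_orders
GROUP  BY first_month
ORDER  BY first_month;
```

On this toy data January has two new customers (1 and 2) and February has one (3), so the running total ends at 3, which matches COUNT(DISTINCT customerid) over the whole table. Summing the per-month distinct counts instead would give 2 + 2 = 4, because customer 1 is counted in both months. One thing to be aware of: a month in which no customer places a first order produces no row at all with this approach.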

EDIT

If you still need both figures (i.e. the per-month distinct count and the running totals), you could use this:

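SELECT  OrderMonth,
        COUNT(DISTINCT CustomerID) AS MonthTotal,
        SUM(COUNT(DISTINCT CustomerID)) OVER (ORDER BY OrderMonth) AS CumulativeTotal,
        -- Only each customer's first order (OrderNumber = 1) counts towards the distinct running total
        SUM(COUNT(DISTINCT CASE WHEN OrderNumber = 1 THEN CustomerID END)) OVER (ORDER BY OrderMonth) AS CumulativeDistinctTotal
FROM    (   SELECT  CustomerID, 
                    -- Truncate to the month so the outer GROUP BY is per month, not per order date
                    DATE_TRUNC('MONTH', OrderDate) AS OrderMonth,
                    ROW_NUMBER() OVER(PARTITION BY CustomerID ORDER BY OrderDate) AS OrderNumber
            FROM    Orders
        ) AS Orders
GROUP BY OrderMonth
ORDER BY OrderMonth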

Example here:

http://sqlfiddle.com/#!1/7a8cc/10
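
As a possible variation, on PostgreSQL 9.4 or later the "first order only" condition can be written with an aggregate FILTER clause instead of COUNT(DISTINCT CASE ...). The sketch below is not the query from the fiddle; it runs on made-up rows, and the names numbered, order_month and order_number are placeholders of my own.

```sql
-- Hypothetical sample rows (NOT from dellstore2): customer 1 orders in two months.
WITH orders(customerid, orderdate) AS (
    VALUES (1, DATE '2004-01-05'),
           (1, DATE '2004-02-10'),
           (2, DATE '2004-01-20'),
           (3, DATE '2004-02-15')
),
numbered AS (
    -- Number each customer's orders chronologically; order_number = 1 marks the first one.
    SELECT customerid,
           date_trunc('month', orderdate) AS order_month,
           ROW_NUMBER() OVER (PARTITION BY customerid ORDER BY orderdate) AS order_number
    FROM   orders
)
SELECT order_month,
       COUNT(DISTINCT customerid)                                   AS month_total,
       SUM(COUNT(DISTINCT customerid)) OVER (ORDER BY order_month)  AS cumulative_total,
       -- Exactly one row per customer has order_number = 1, so COUNT(*) is already distinct here.
       SUM(COUNT(*) FILTER (WHERE order_number = 1))
           OVER (ORDER BY order_month)                              AS cumulative_distinct_total
FROM   numbered
GROUP  BY order_month
ORDER  BY order_month;
```

With these rows, month_total is 2 for both months, cumulative_total climbs to 4 (customer 1 is counted twice), and cumulative_distinct_total ends at 3, the true number of distinct customers. Whether this is faster than the CASE form on the real table is something I have not measured.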

GarethD answered Sep 30 '22 13:09
