
SQL sum of column value, unique per user per day

I have a Postgres table (orders) that looks like this:

id | user_id | state | created_at

The state can be any of the following:

new, paying, paid, completing, complete, payment_failed, completion_failed

I need a statement that returns a report with the following:

  1. sum of all paid states by date
  2. sum of all complete states by date
  3. sum of all new, paying, completing states by date with only one per user per day to be counted
  4. sum of all payment_failed, completion_failed by date with only one per user per day to be counted

So far I have this:

SELECT
  DATE(created_at) AS date,
  SUM(CASE WHEN state = 'complete' THEN 1 ELSE 0 END) AS complete,
  SUM(CASE WHEN state = 'paid' THEN 1 ELSE 0 END) AS paid
FROM orders
WHERE created_at BETWEEN ? AND ?
GROUP BY DATE(created_at)

A sum of the in progress and failed states is easy enough by adding this to the select:

SUM(CASE WHEN state IN('new','paying','completing') THEN 1 ELSE 0 END) AS in_progress,
SUM(CASE WHEN state IN('payment_failed','completion_failed') THEN 1 ELSE 0 END) AS failed 

But I'm having trouble figuring out how to count at most one in_progress state and one failed state per user_id per day.

I need this to adjust the failure rate in our stats: many users who trigger a failed or incomplete order go on to trigger more, which inflates our failure rate.

Thanking you in advance.

Marc Greenstock asked Jan 11 '13 21:01


2 Answers

SELECT created_at::date AS the_date
      ,SUM(CASE WHEN state = 'complete' THEN 1 ELSE 0 END) AS complete
      ,SUM(CASE WHEN state = 'paid' THEN 1 ELSE 0 END) AS paid
      ,COUNT(DISTINCT CASE WHEN state IN('new','paying','completing')
                      THEN user_id ELSE NULL END) AS in_progress
      ,COUNT(DISTINCT CASE WHEN state IN('payment_failed','completion_failed')
                      THEN user_id ELSE NULL END) AS failed 
FROM   orders
WHERE  created_at BETWEEN ? AND ?
GROUP  BY created_at::date

I use the_date as the alias, since it is unwise (though allowed) to use the key word date as an identifier.

You could use a similar technique for complete and paid; one is as good as the other there:

COUNT(CASE WHEN state = 'complete' THEN 1 ELSE NULL END) AS complete
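
To see that the two forms agree, here is a minimal sketch that runs both against a few made-up rows. It uses Python's built-in sqlite3 module as a stand-in for Postgres (an assumption made only to keep the example self-contained); the sample rows and the column aliases via_sum, via_count and wrong_count are hypothetical. The third column shows the one pitfall: COUNT counts every non-NULL value, so the ELSE branch must be NULL (or omitted), not 0.

```python
import sqlite3

# In-memory SQLite database standing in for the Postgres orders table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id INTEGER, state TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "complete", "2013-01-10 09:00:00"),
        (2, "complete", "2013-01-10 10:00:00"),
        (3, "paid", "2013-01-10 11:00:00"),
    ],
)

row = conn.execute(
    """
    SELECT SUM(CASE WHEN state = 'complete' THEN 1 ELSE 0 END)      AS via_sum,
           COUNT(CASE WHEN state = 'complete' THEN 1 ELSE NULL END) AS via_count,
           -- pitfall: 0 is not NULL, so COUNT counts every row here
           COUNT(CASE WHEN state = 'complete' THEN 1 ELSE 0 END)    AS wrong_count
    FROM orders
    GROUP BY DATE(created_at)
    """
).fetchone()

print(row)
```

With these three rows, via_sum and via_count both come out as 2, while wrong_count is 3 because the ELSE 0 branch is still counted.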
Erwin Brandstetter answered Oct 03 '22 07:10


Try something like:

SELECT
  DATE(created_at) AS date,
  SUM(CASE WHEN state = 'complete' THEN 1 ELSE 0 END) AS complete,
  SUM(CASE WHEN state = 'paid' THEN 1 ELSE 0 END) AS paid,
  COUNT(DISTINCT CASE WHEN state IN('new','paying','completing') THEN user_id ELSE NULL END) AS in_progress,
  COUNT(DISTINCT CASE WHEN state IN('payment_failed','completion_failed') THEN user_id ELSE NULL END) AS failed
FROM orders
WHERE created_at BETWEEN ? AND ?
GROUP BY DATE(created_at);

The main idea: COUNT(DISTINCT ...) counts each unique user_id once and ignores NULL values.
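
As a rough illustration of that idea, here is a minimal sketch that runs the report against a handful of made-up rows. It uses Python's built-in sqlite3 module as a stand-in for Postgres (an assumption, chosen only so the example is self-contained); the sample data is hypothetical, the WHERE clause is left out for brevity, and ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# In-memory SQLite database standing in for the Postgres orders table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, state TEXT, created_at TEXT)"
)

# Hypothetical sample rows: user 1 fails three times on the same day.
rows = [
    (1, "payment_failed", "2013-01-10 09:00:00"),
    (1, "payment_failed", "2013-01-10 09:05:00"),
    (1, "completion_failed", "2013-01-10 10:00:00"),
    (2, "payment_failed", "2013-01-10 11:00:00"),
    (2, "new", "2013-01-10 11:30:00"),
    (3, "paid", "2013-01-10 12:00:00"),
    (3, "paid", "2013-01-10 12:30:00"),
    (1, "payment_failed", "2013-01-11 08:00:00"),
    (3, "complete", "2013-01-11 09:00:00"),
]
conn.executemany(
    "INSERT INTO orders (user_id, state, created_at) VALUES (?, ?, ?)", rows
)

report = conn.execute(
    """
    SELECT DATE(created_at) AS the_date,
           COUNT(CASE WHEN state = 'complete' THEN 1 END) AS complete,
           COUNT(CASE WHEN state = 'paid' THEN 1 END) AS paid,
           -- non-matching rows yield NULL, which COUNT ignores;
           -- DISTINCT collapses repeated user_ids within the day
           COUNT(DISTINCT CASE WHEN state IN ('new','paying','completing')
                          THEN user_id END) AS in_progress,
           COUNT(DISTINCT CASE WHEN state IN ('payment_failed','completion_failed')
                          THEN user_id END) AS failed
    FROM orders
    GROUP BY DATE(created_at)
    ORDER BY the_date
    """
).fetchall()

for line in report:
    print(line)
```

On 2013-01-10 there are four failed rows but only two distinct failing users, so failed is 2 rather than 4, while paid still counts both of user 3's rows.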

Details: see Aggregate Functions and section 4.2.7, Aggregate Expressions, in the PostgreSQL documentation.

The whole query with all counts in the same style and a simplified CASE WHEN ... (an omitted ELSE defaults to NULL):

SELECT
  DATE(created_at) AS date,
  COUNT(CASE WHEN state = 'complete' THEN 1 END) AS complete,
  COUNT(CASE WHEN state = 'paid' THEN 1 END) AS paid,
  COUNT(DISTINCT CASE WHEN state IN('new','paying','completing') THEN user_id END) AS in_progress,
  COUNT(DISTINCT CASE WHEN state IN('payment_failed','completion_failed') THEN user_id END) AS failed
FROM orders
WHERE created_at BETWEEN ? AND ?
GROUP BY DATE(created_at);
Ihor Romanchenko answered Oct 03 '22 07:10