Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fill in missing rows when aggregating over multiple fields in Postgres

I am aggregating sales for a set of products per day using Postgres and need to know not just when sales do happen, but also when they do not for further processing.

SELECT 
sd.date, 
COUNT(sd.sale_id) AS sales, 
sd.product

FROM sales_data sd
-- sales per product, per day
GROUP BY sd.product, sd.date
ORDER BY sd.product, sd.date

This produces the following:

    date    | sales |       product                           
------------+-------+-------------------
 2017-08-17 |  10   | soap
 2017-08-19 |   2   | soap
 2017-08-20 |   5   | soap
 2017-08-17 |   2   | shower gel
 2017-08-21 |   1   | shower gel

As you can see - the date ranges per product are not continuous as sales_data just didn't contain any info for these products on some days.

What I'm aiming to do is to add a sales = 0 row for each product that is not sold on any day in a range - for example here, between 2017-08-17 and 2017-08-21 to give something like the the following:

    date    | sales |      product                           
------------+-------+-------------------
 2017-08-17 |  10   | soap
 2017-08-18 |   0   | soap
 2017-08-19 |   2   | soap
 2017-08-20 |   5   | soap
 2017-08-21 |   0   | soap
 2017-08-17 |   2   | shower gel
 2017-08-18 |   0   | shower gel
 2017-08-19 |   0   | shower gel
 2017-08-20 |   0   | shower gel
 2017-08-21 |   1   | shower gel

In a simpler case where there was only a single product, it seems like the solution would be to use generate_series() i.e.:

  • create a full range of dates using generate_series
  • LEFT JOIN the already aggregated sales data onto the date series
  • COALESCE any NULL counts to 0 in the missing rows

The problem I have is that this approach does not seem to work dates repeat in the aggregated data as I'm grouping over not just multiple dates, but multiple products also.

It feels like I should be able to do something cunning with window functions here to solve this e.g. joining onto the full date range over partitions defined by the product name - but I can't see a way of actually getting this to work.

like image 604
Chris Whittleston Avatar asked Aug 17 '17 14:08

Chris Whittleston


1 Answers

You could use:

WITH cte AS (
   SELECT date, s.product
   FROM  ... -- some way to generate date series
   CROSS JOIN (SELECT DISTINCT product FROM sales_data) s
)
SELECT 
    c.date,
    c.product,
    COUNT(sd.sale_id) AS sales
FROM cte c
LEFT JOIN sales_data sd
  ON c.date = sd.date AND c.product= sd.product
GROUP BY c.date, c.product
ORDER BY c.date, c.product;

First create Cartesian product of dates and products, then LEFT JOIN to actual data and do calculations.


Oracle has great feature for this scenarios called Partitioned Outer Joins:

SELECT times.time_id, product, quantity 
FROM inventory  PARTITION BY  (product) 
RIGHT OUTER JOIN times ON (times.time_id = inventory.time_id) 
WHERE times.time_id BETWEEN TO_DATE('01/04/01', 'DD/MM/YY') 
      AND TO_DATE('06/04/01', 'DD/MM/YY') 
ORDER BY  2,1; 
like image 89
Lukasz Szozda Avatar answered Nov 20 '22 12:11

Lukasz Szozda