Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Select rows with max value from groups of rows grouped by multiple columns (PSQL)

Tags:

sql

postgresql

I have a table of transactional data, which are forecasts for the future. The same forecasts, identified by same date, type, location and product, are thus read in multiple times as the forecasts become more accurate as time goes by and are resent.

I would like to create a query that would group the transactions that are of the same type and for the same location, product and date and then select from these groups only the ones that have the newest updated timestamps.

The table has now hundreds of thousands of rows and as time goes by, millions, so a reasonably efficient solution would be appreciated :)

Example table:

date    |  location_code   | product_code  | quantity |   type   | updated_at 
------------+------------------+---------------+----------+----------+------------
2013-02-04 | ABC | 123 |  -26.421 | TRANSFER | 2013-01-12
2013-02-07 | ABC | 123 |    -48.1 | SALE | 2013-01-10
2013-02-06 | BCD | 234 |  -58.107 | SALE | 2013-01-11
2013-02-06 | BCD | 234 |      -60 | SALE | 2013-01-10
2013-02-04 | ABC | 123 |   -6.727 | TRANSFER | 2013-01-10

The desired result:

date    |  location_code   | product_code  | quantity |   type   | updated_at 
------------+------------------+---------------+----------+----------+------------
2013-02-04 | ABC | 123 |  -26.421 | TRANSFER | 2013-01-12
2013-02-07 | ABC | 123 |    -48.1 | SALE | 2013-01-10
2013-02-06 | BCD | 234 |  -58.107 | SALE | 2013-01-11

I tried for example:

SELECT t.date, t.location_code, t.product_code, t.quantity, t.type, t.updated_at
FROM transactions t
INNER JOIN
(
   SELECT MAX(updated_at) as max_updated_at
   FROM transactions
   GROUP BY product_code, location_code, type, date
) s on t.updated_at=max_updated_at; 

But this seems to take ages and doesn't seem to work.

Thank you for the help!

like image 729
jesseniem Avatar asked Mar 16 '13 22:03

jesseniem


3 Answers

select distinct on ("date", location_code, product_code, type)
    "date",
    location_code,
    product_code,
    quantity,
    type,
    updated_at
from transactions t
order by t."date", t.location_code, t.product_code, t.type, t.updated_at desc
like image 171
Clodoaldo Neto Avatar answered Oct 20 '22 03:10

Clodoaldo Neto


This is probably more efficient than the join with the derived table

select *
from (
    select date, 
           location_code, 
           product_code, 
           quantity, 
           type, 
           updated_at, 
           max(updated_at) over (partition by product_code, location_code, type, date) as max_updated
    from transactions
) t
where updated_at = max_updated;
like image 23
a_horse_with_no_name Avatar answered Oct 20 '22 05:10

a_horse_with_no_name


Thanks Dan Bracuk!

This was the correct query:

SELECT t.date, t.location_code, t.product_code, t.quantity, t.type, t.updated_at
FROM transactions t
INNER JOIN
(
   SELECT MAX(updated_at) as max_updated_at, product_code prod, location_code loc, type     typ, date dat
   FROM transactions
   GROUP BY product_code, location_code, type, date
   ) s ON t.updated_at=max_updated_at AND t.location_code=loc AND t.product_code=prod AND  t.type=typ AND t.date=dat;
like image 1
jesseniem Avatar answered Oct 20 '22 03:10

jesseniem