I have a table of transactional data, which are forecasts for the future. The same forecasts, identified by same date, type, location and product, are thus read in multiple times as the forecasts become more accurate as time goes by and are resent.
I would like to create a query that would group the transactions that are of the same type and for the same location, product and date and then select from these groups only the ones that have the newest updated timestamps.
The table has now hundreds of thousands of rows and as time goes by, millions, so a reasonably efficient solution would be appreciated :)
Example table:
date | location_code | product_code | quantity | type | updated_at
------------+------------------+---------------+----------+----------+------------
2013-02-04 | ABC | 123 | -26.421 | TRANSFER | 2013-01-12
2013-02-07 | ABC | 123 | -48.1 | SALE | 2013-01-10
2013-02-06 | BCD | 234 | -58.107 | SALE | 2013-01-11
2013-02-06 | BCD | 234 | -60 | SALE | 2013-01-10
2013-02-04 | ABC | 123 | -6.727 | TRANSFER | 2013-01-10
The desired result:
date | location_code | product_code | quantity | type | updated_at
------------+------------------+---------------+----------+----------+------------
2013-02-04 | ABC | 123 | -26.421 | TRANSFER | 2013-01-12
2013-02-07 | ABC | 123 | -48.1 | SALE | 2013-01-10
2013-02-06 | BCD | 234 | -58.107 | SALE | 2013-01-11
I tried for example:
SELECT t.date, t.location_code, t.product_code, t.quantity, t.type, t.updated_at
FROM transactions t
INNER JOIN
(
SELECT MAX(updated_at) as max_updated_at
FROM transactions
GROUP BY product_code, location_code, type, date
) s on t.updated_at=max_updated_at;
But this seems to take ages and doesn't seem to work.
Thank you for the help!
select distinct on ("date", location_code, product_code, type)
"date",
location_code,
product_code,
quantity,
type,
updated_at
from transactions t
order by t."date", t.location_code, t.product_code, t.type, t.updated_at desc
This is probably more efficient than the join with the derived table
select *
from (
select date,
location_code,
product_code,
quantity,
type,
updated_at,
max(updated_at) over (partition by product_code, location_code, type, date) as max_updated
from transactions
) t
where updated_at = max_updated;
Thanks Dan Bracuk!
This was the correct query:
SELECT t.date, t.location_code, t.product_code, t.quantity, t.type, t.updated_at
FROM transactions t
INNER JOIN
(
SELECT MAX(updated_at) as max_updated_at, product_code prod, location_code loc, type typ, date dat
FROM transactions
GROUP BY product_code, location_code, type, date
) s ON t.updated_at=max_updated_at AND t.location_code=loc AND t.product_code=prod AND t.type=typ AND t.date=dat;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With