I have a table in a PostgreSQL 9.2 DB, created and filled as follows:
CREATE TABLE foo( id integer, date date );
INSERT INTO foo
SELECT (id % 10) + 1, now() - (id % 50) * interval '1 day'
FROM generate_series(1, 100000) AS id;
Now, I need to find all pairs (id, date)
such that the date is a maximum one among all pairs with the same id
. The query is kind of well known and it's common to use the window function called ROW_NUMBER()
SELECT id, date
FROM (
SELECT id, date, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) rn
FROM foo
) sbt
WHERE sbt.rn = 1;
Now, I ask for the plan of that query and figured out that the WindowAgg
node requires a table to be sorted first.
Subquery Scan on sbt (cost=11116.32..14366.32 rows=500 width=8) (actual time=71.650..127.809 rows=10 loops=1)
Filter: (sbt.rn = 1)
Rows Removed by Filter: 99990
-> WindowAgg (cost=11116.32..13116.32 rows=100000 width=8) (actual time=71.644..122.476 rows=100000 loops=1)
-> Sort (cost=11116.32..11366.32 rows=100000 width=8) (actual time=71.637..92.081 rows=100000 loops=1)
Sort Key: foo.id, foo.date
Sort Method: external merge Disk: 1752kB
-> Seq Scan on foo (cost=0.00..1443.00 rows=100000 width=8) (actual time=0.006..6.138 rows=100000 loops=1)
As expected, the sorting took the majority of the query execution time and it would definitely helpful to use index
es.
So I create the CREATE INDEX ON foo(id, date)
and expect that now it'll be use indexes. But it doesn't. I got the same plan with the external merge
and even turning off the sequential scan
wasn't useful. I just ended up with Bitmap Index Scan
-> Sort (cost=12745.58..12995.58 rows=100000 width=8) (actual time=69.247..90.003 rows=100000 loops=1)
Sort Key: foo.id, foo.date
Sort Method: external merge Disk: 1752kB
-> Bitmap Heap Scan on foo (cost=1629.26..3072.26 rows=100000 width=8) (actual time=5.359..12.639 rows=100000 loops=1)
-> Bitmap Index Scan on foo_id_date_idx (cost=0.00..1604.26 rows=100000 width=0) (actual time=5.299..5.299 rows=100000 loops=1)
QUESTION
Can WindowAgg
ever use indexes for sorting? I think that it can't but don't understand why... GroupAggreagte
s can and it gives good performance improving.
To match the index you created:
CREATE INDEX ON foo(id, date)
you would have to make that:
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC NULLS LAST)
which is the perfect reverse order of ASC
.
That aside, you could just run:
SELECT DISTINCT ON (id)
id, date
FROM foo
ORDER BY id, date DESC NULLS LAST;
But that's probably not what you wanted to ask. Either way, I would make the index:
CREATE INDEX ON foo(id, date DESC NULLS LAST)
so that max(date)
is the first index entry per id
.
Related:
You can rewrite the logic "the date is a maximum one among all pairs with the same id" to a direct translation:
SELECT id, date
FROM (
SELECT id, date, MAX(date) OVER (PARTITION BY id) as maxDate
FROM foo
) sbt
WHERE date = maxDate;
It's not exactly the same as a ROW_NUMBER
, but a RANK
, it might return multiple rows with the same date
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With